During training, all data (including weights, gradients, and metrics) are visible only to the current replica. For example, if you calculate the batch loss after every iteration and print it to stdout, you will see only the batch loss of that single replica.
Allreduce allows replicas to communicate. You can calculate the sum of a value across all replicas using
eml.sync.replica_sum. Note that this function accepts only a numpy array or a numpy scalar.
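For example, the per-replica batch loss can be summed and averaged to obtain a global mean loss. The sketch below stubs out `replica_sum` so it runs on a single machine; in real code you would call `eml.sync.replica_sum` directly, and `num_replicas` is a hypothetical variable holding your replica count (the source does not show how to query it).

```python
import numpy as np

# Stand-in for eml.sync.replica_sum so this sketch runs locally.
# It enforces the documented contract: numpy arrays or numpy scalars only.
def replica_sum(x):
    assert isinstance(x, (np.ndarray, np.generic)), "wrap the value with numpy first"
    return x  # with a single replica, the sum is the value itself

batch_loss = 0.25          # plain Python float computed on this replica
num_replicas = 1           # hypothetical; obtain the real count from your setup

# Wrap the Python number before calling replica_sum, then average
total_loss = replica_sum(np.float32(batch_loss))
mean_loss = total_loss / num_replicas
```

Note the explicit `np.float32(...)` wrapper: passing a plain Python `float` would violate the function's numpy-only input requirement.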
Calculate user-defined metrics
```python
# Distribute the calculation of your model metrics
test_set = eml.data.distribute(test_set)

# Calculate the number of true positives on this replica
replica_true_positives = model.calculate_true_positives(test_set)

# Calculate the total number of true positives across all replicas
true_positives = eml.sync.replica_sum(np.float32(replica_true_positives))
```
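The same pattern extends to metrics built from several counts: sum each raw count across replicas first, then combine the global totals. The sketch below computes recall this way, with `replica_sum` stubbed out so it runs on one machine (real code would call `eml.sync.replica_sum`) and with hypothetical per-replica counts for illustration.

```python
import numpy as np

# Stand-in for eml.sync.replica_sum on a single replica (sketch only)
def replica_sum(x):
    return x

# Hypothetical per-replica counts
replica_true_positives = np.float32(90)
replica_false_negatives = np.float32(10)

# Sum each raw count across replicas...
true_positives = replica_sum(replica_true_positives)
false_negatives = replica_sum(replica_false_negatives)

# ...then compute the metric from the global totals
recall = true_positives / (true_positives + false_negatives)
```

Combining global counts is the correct order of operations: averaging per-replica recall values directly would weight replicas equally regardless of how many examples each one processed.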