eml.sync

replica_mean

During training, all data (including weights, gradients, and metrics) is available only to the current replica. For example, if you calculate the batch loss after every iteration and print it to stdout, you will see only that replica's batch loss.

Allreduce allows replicas to communicate. You can calculate the mean across replicas using eml.sync.replica_mean. Note that this function accepts only a numpy array or a numpy scalar.

Calculate the average loss across replicas

# Calculate this replica's batch losses as a numpy array
replica_losses = model.calculate_losses()
# Calculate the average of the losses across all replicas
mean_losses = eml.sync.replica_mean(replica_losses)
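
Putting this into a training loop, the sketch below logs the replica-averaged loss at every iteration. Here train_step, model, and train_set are hypothetical placeholders for your own training code; only eml.sync.replica_mean comes from the API described above:

import numpy as np

for step, batch in enumerate(train_set):
    # train_step is a hypothetical stand-in for your own per-replica
    # update; assume it returns this replica's batch loss as a float
    replica_loss = train_step(model, batch)
    # Average the scalar loss across all replicas before logging
    mean_loss = eml.sync.replica_mean(np.float32(replica_loss))
    print(f"step {step}: mean loss across replicas = {mean_loss}")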

Calculate the accuracy on your test set

import numpy as np

# Distribute the calculation of your model metrics
test_set = eml.data.distribute(test_set)
# Calculate this replica's accuracy metric
replica_accuracy = model.calculate_accuracy(test_set)
# Calculate the mean accuracy across all replicas
accuracy = eml.sync.replica_mean(np.float32(replica_accuracy))
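
Because replica_mean accepts numpy arrays, several scalar metrics can be averaged in a single call. A minimal sketch, assuming the usual elementwise allreduce semantics (the metric values here are illustrative):

import numpy as np

# Two per-replica scalar metrics (values here are illustrative)
replica_loss = 0.42
replica_accuracy = 0.91
# Pack them into one array and average with a single allreduce call;
# assuming standard allreduce semantics, the mean is taken elementwise
mean_loss, mean_accuracy = eml.sync.replica_mean(
    np.array([replica_loss, replica_accuracy], dtype=np.float32)
)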

replica_sum

As with replica_mean, data computed during training is local to each replica, and allreduce lets replicas combine their values. You can calculate the sum across replicas using eml.sync.replica_sum; like replica_mean, it accepts only a numpy array or a numpy scalar.

Calculate user-defined metrics

import numpy as np

# Distribute the calculation of your model metrics
test_set = eml.data.distribute(test_set)
# Calculate the number of true positives on this replica
replica_true_positives = model.calculate_true_positives(test_set)
# Calculate the total number of true positives across all replicas
true_positives = eml.sync.replica_sum(np.float32(replica_true_positives))
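
Summed counts compose into exact aggregate metrics. Continuing from the snippet above, the sketch below derives precision; calculate_false_positives is a hypothetical counterpart to calculate_true_positives:

# Count false positives on this replica (calculate_false_positives
# is a hypothetical counterpart to calculate_true_positives)
replica_false_positives = model.calculate_false_positives(test_set)
false_positives = eml.sync.replica_sum(np.float32(replica_false_positives))
# Summing raw counts (rather than averaging per-replica ratios) keeps
# the result exact even when replicas process different numbers of examples
precision = true_positives / (true_positives + false_positives)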