To wrap your optimizer in a distributed optimizer that allows replicas to communicate with each other during the optimization step, use eml.optimizer.distribute.

import engineml.tensorflow as eml
import tensorflow as tf

# Create a standard TensorFlow optimizer, then wrap it so that
# replicas exchange gradients during each optimization step
opt = tf.train.AdamOptimizer()
opt = eml.optimizer.distribute(opt)
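Conceptually, a distributed optimizer of this kind averages each parameter's gradient across all replicas before applying the update, so every replica takes the same step. The sketch below is an illustration of that averaging only; the function name is hypothetical and not part of the engineml API.

```python
def allreduce_average(per_replica_grads):
    """Average one parameter's gradient across all replicas.

    Illustration of what a Horovod-style distributed optimizer
    does under the hood; not part of the engineml API.
    """
    num_replicas = len(per_replica_grads)
    return sum(per_replica_grads) / num_replicas

# Two replicas computed different gradients for the same parameter;
# after the allreduce, both apply the averaged gradient.
avg = allreduce_average([0.2, 0.4])
```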


When training your model across multiple replicas, you increase the effective batch size: with k replicas each processing the same per-replica batch, the effective batch is k times larger. To compensate for this larger batch size, it is a best practice to also scale your optimizer's learning rate.

Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k.

Source: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
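As a concrete instance of the rule, the plain-Python helper below (illustrative only, not the engineml function) applies the linear scaling:

```python
def scale_lr(base_lr, num_replicas):
    """Linear scaling rule: multiply the single-replica learning
    rate by the number of replicas (the batch-size multiplier k)."""
    return base_lr * num_replicas

# Training on 8 replicas with a single-replica learning rate of 0.01
scaled = scale_lr(0.01, 8)  # 0.08
```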

# Set the base (single-replica) learning rate, then scale it
# by the number of replicas in the run
learning_rate = 0.01
learning_rate = eml.optimizer.scale_learning_rate(learning_rate)