My Job Failed During the Build Step
Sometimes you will encounter errors when submitting jobs to Engine ML. Often these errors occur while building your Docker image.
To view messages from the Docker build, run
engine job docker-build-log JOB_ID. For example:
    engine job docker-build-log adaptive-strut
    Getting docker build output for job adaptive-strut
    Step 1/14 : FROM ...
My Job Failed After Running
Check your logs for errors with engine job log. This displays logs from the
master node. Keep in mind that there are actually n copies of your code
running, so if your job dies, the cause may be a crash in any one of those
copies, not necessarily the one on the master node. To download the logs from
all replicas, add the --download-all flag to engine job log.
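Once the logs from every replica are on disk, reading them one by one is tedious. The following is a minimal sketch of a helper that scans a directory of log files for lines that look like a crash. It assumes the downloaded logs sit in a local directory (the name logs is hypothetical, as are the function name and the error patterns); it is not part of the engine CLI, so adjust it to wherever your logs actually land.

```python
import re
from pathlib import Path

# Patterns that commonly mark a crash in job output. This list is an
# assumption; extend it for the framework you use.
ERROR_PATTERN = re.compile(
    r"Traceback \(most recent call last\)|Error|Killed|Out of memory",
    re.IGNORECASE,
)


def find_error_lines(log_dir, pattern=ERROR_PATTERN):
    """Return {relative file name: [(line number, text), ...]} for matching lines."""
    hits = {}
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        matches = []
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    matches.append((number, line.rstrip("\n")))
        if matches:
            hits[str(path.relative_to(log_dir))] = matches
    return hits


if __name__ == "__main__":
    # "logs" is a hypothetical directory holding the downloaded replica logs.
    for name, matches in find_error_lines("logs").items():
        print(f"{name}: {len(matches)} suspicious line(s)")
        for number, text in matches[:5]:
            print(f"  {number}: {text}")
```

Files with no matching lines are left out of the result, so a replica that crashed stands out from the ones that ran cleanly.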
My Job is Hanging
Often a job that looks stuck is not hanging at all; it is just loading data slowly. Some questions to ask yourself if your job looks like it is hanging:
- Does the code use
- Is the data loading implemented correctly?
- Are the data files small?
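Two of the questions above can be checked with a quick measurement before you submit a job. The sketch below is one way to do that, under assumptions: summarize_file_sizes and time_batches are hypothetical helper names, and batches stands for whatever iterable your own data loading code produces. The first function reports how many files a dataset has and how large they are, since many small files are often slower to read than a few large ones. The second times how long each of the first few batches takes to arrive, which helps tell slow loading apart from a real hang.

```python
import time
from pathlib import Path
from statistics import median


def summarize_file_sizes(data_dir):
    """Return (file count, total bytes, median bytes) for files under data_dir."""
    sizes = [p.stat().st_size for p in Path(data_dir).rglob("*") if p.is_file()]
    if not sizes:
        return 0, 0, 0
    return len(sizes), sum(sizes), median(sizes)


def time_batches(batches, limit=10):
    """Return the seconds spent waiting for each of the first `limit` batches."""
    durations = []
    iterator = iter(batches)
    for _ in range(limit):
        start = time.perf_counter()
        try:
            next(iterator)
        except StopIteration:
            # The iterable ran out before `limit` batches were produced.
            break
        durations.append(time.perf_counter() - start)
    return durations
```

If the per-batch times are long but steady, the job is probably loading slowly, not hanging. If the first call never returns, the problem is more likely in the data loading code itself.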