Engine ML provides a dashboard for monitoring GPU, CPU, network, and disk utilization. You can launch this dashboard in your web browser from the CLI:
engine job metrics adaptive-strut
At the top of the dashboard you will see metrics averaged across all GPUs. Drilldown metrics (per GPU, CPU, and node) are listed below these averages.
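To make the relationship between the top-level averages and the drilldown values concrete, here is a minimal sketch of averaging per-GPU readings. It is not Engine ML code. The metric names (`utilization_pct`, `memory_pct`), the GPU labels, and the sample values are all hypothetical and chosen only for illustration. How the dashboard actually aggregates its data is not described here.

```python
from statistics import mean


def average_metrics(per_gpu):
    """Average each metric across all GPUs.

    per_gpu maps a GPU label to a dict of metric name -> value.
    A metric missing on one GPU is averaged over the GPUs that report it.
    """
    metric_names = sorted({name for metrics in per_gpu.values() for name in metrics})
    return {
        name: mean(m[name] for m in per_gpu.values() if name in m)
        for name in metric_names
    }


# Hypothetical drilldown readings; these values are made up for illustration.
drilldown = {
    "gpu0": {"utilization_pct": 90.0, "memory_pct": 70.0},
    "gpu1": {"utilization_pct": 70.0, "memory_pct": 50.0},
}

summary = average_metrics(drilldown)
print(summary)
```

A large gap between a single GPU's drilldown value and the average is the kind of imbalance the per-GPU view can help you spot.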
You can quickly add annotations to your metrics dashboard with
See Launch your Experiment for more details.