Introduction
gpu-seff is a command developed at OSC for use on OSC's systems. It reports GPU resource data for a job, similar to the CPU resource data reported by the seff command.
Availability
CARDINAL | PITZER | ASCEND |
---|---|---|
X | X | X |
Usage
gpu-seff takes the following options and parameters.
    $ gpu-seff -h
    usage: gpu-seff [-h] [-M {pitzer,ascend,cardinal}] [-d] [-t TIMEBUFFER] [-v] [-j]
                    [-mu] [-me] [-gu] [-ge] [-ji] [-gi]
                    jobid

    positional arguments:
      jobid                 Jobid

    optional arguments:
      -h, --help            show this help message and exit
      -M {pitzer,ascend,cardinal}, -c {pitzer,ascend,cardinal}, --cluster {pitzer,ascend,cardinal}
                            Cluster
      -d, --debug           Debug
      -t TIMEBUFFER, --timebuffer TIMEBUFFER
                            Time buffer (seconds, default 60)
      -v, --verbose         Detailed per-gpu report
      -j, --json            Output in json format

    optional output options:
      By default these options are enabled. If any flags are passed in, only those flags will be used.

      -mu, --memory-util    Include memory utilization data
      -me, --memory-efficiency
                            Include memory efficiency data
      -gu, --gpu-util       Include GPU utilization data
      -ge, --gpu-efficiency
                            Include GPU utilization data
      -ji, --job-info       Include basic information about the job
      -gi, --gpu-info       GPU-related information about the job
Default behavior
By default, the gpu-seff command will give an overview of the job resource usage, including total memory utilization, average memory efficiency, total utilization, and total efficiency across all allocated GPUs.
    $ gpu-seff 100
    Job ID: 100
    Cluster: cardinal
    User/Group: xxx/PPP1234
    State: COMPLETED
    Nodes: 1
    Job Wall-clock time: 00:03:40
    GPUs per node: 2
    Total GPUs: 2
    GPU Memory Utilized: 77.17 GB
    GPU Memory Efficiency: 41.49% of 186.01 GB
    GPU Utilization: 00:01:14
    GPU Efficiency: 17.03% of 00:07:20 gpu-walltime
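The summary figures are related by simple arithmetic. The sketch below recomputes two of them from the sample output above. It assumes, based only on the sample numbers and not on any statement by the tool, that gpu-walltime is the job wall-clock time multiplied by the total number of GPUs, and that memory efficiency is memory utilized divided by total GPU memory. The helper names are illustrative and not part of gpu-seff.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Convert a number of seconds to an HH:MM:SS string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Values copied from the sample output for job 100.
wall_clock = "00:03:40"
total_gpus = 2
mem_used_gb = 77.17
mem_total_gb = 186.01

# Assumption: gpu-walltime = wall-clock time x number of GPUs.
gpu_walltime = seconds_to_hms(hms_to_seconds(wall_clock) * total_gpus)

# Assumption: memory efficiency = memory utilized / total GPU memory.
mem_efficiency = 100 * mem_used_gb / mem_total_gb

print(gpu_walltime)               # 00:07:20
print(f"{mem_efficiency:.2f}%")   # 41.49%
```

The reported GPU efficiency (17.03%) cannot be reproduced exactly this way from the displayed utilization of 00:01:14, presumably because the displayed times are rounded to whole seconds.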
Running from another cluster
    xxx@ascend-login01:~$ gpu-seff -M cardinal 100
    Job ID: 100
    Cluster: cardinal
    User/Group: xxx/PPP1234
    State: COMPLETED
    Nodes: 1
    Job Wall-clock time: 00:03:40
    GPUs per node: 2
    Total GPUs: 2
    GPU Memory Utilized: 77.17 GB
    GPU Memory Efficiency: 41.49% of 186.01 GB
    GPU Utilization: 00:01:14
    GPU Efficiency: 17.03% of 00:07:20 gpu-walltime
Get per-GPU statistics
To get per-GPU statistics, rather than summary statistics across all GPUs, pass the --verbose flag.
    $ gpu-seff 100 --verbose
    Job ID: 100
    Cluster: cardinal
    User/Group: xxx/PPP1234
    State: COMPLETED
    Nodes: 1
    Job Wall-clock time: 00:03:40
    GPUs per node: 2
    Total GPUs: 2
    GPU Memory Utilized:
        Host c0818 GPU #0: 32.08 GB
        Host c0818 GPU #1: 45.09 GB
    GPU Memory Efficiency:
        Host c0818 GPU #0: 34.50% of 93.00 GB
        Host c0818 GPU #1: 48.48% of 93.00 GB
    GPU Utilization:
        Host c0818 GPU #0: 00:00:35
        Host c0818 GPU #1: 00:00:39
    GPU Efficiency:
        Host c0818 GPU #0: 15.96% of 00:03:40 gpu-walltime
        Host c0818 GPU #1: 18.11% of 00:03:40 gpu-walltime
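The per-GPU memory figures add up to the total shown in the default summary. As a minimal sketch, the snippet below extracts the per-GPU values from the two "GPU Memory Utilized" lines of the verbose output above and sums them. The regular expression is an assumption based on this one sample and may not match every output format gpu-seff produces.

```python
import re

# Per-GPU "GPU Memory Utilized" lines copied from the verbose sample output.
verbose_lines = """\
Host c0818 GPU #0: 32.08 GB
Host c0818 GPU #1: 45.09 GB
"""

# Assumed line shape: "Host <name> GPU #<index>: <value> GB".
pattern = re.compile(r"Host (\S+) GPU #(\d+): ([\d.]+) GB")

# Map (host, gpu index) to memory used in GB.
per_gpu = {
    (host, int(index)): float(gigabytes)
    for host, index, gigabytes in pattern.findall(verbose_lines)
}

total_gb = sum(per_gpu.values())
print(f"{total_gb:.2f} GB")  # 77.17 GB
```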
Output as JSON
To display the resource information in an easily parsable JSON format, pass the --json flag.
    $ gpu-seff 100 --json
    {
      "jobid": "100",
      "cluster": "cardinal",
      "user": "xxx",
      "group": "PP1234",
      "nodes": 1,
      "walltime": "00:03:40",
      "gpu_per_node": 2.0,
      "total_gpus": 2,
      "gputime": 440,
      "mem_util": "77.17 GB",
      "mem_eff": 41.48684832257049,
      "gpu_util": 74,
      "gpu_eff": 17.034495
    }
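The JSON output can be consumed from a script. The following sketch parses the sample report above with Python's standard json module and formats a few fields. It assumes that gputime and gpu_util are expressed in seconds, which is consistent with the text output (440 s is 00:07:20 and 74 s is 00:01:14) but is not stated by the tool. The function names are illustrative.

```python
import json


def seconds_to_hms(total: int) -> str:
    """Convert a number of seconds to an HH:MM:SS string."""
    hours, rest = divmod(int(total), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def summarize(raw: str) -> dict:
    """Pick a few fields out of a gpu-seff JSON report."""
    report = json.loads(raw)
    return {
        "jobid": report["jobid"],
        # Assumption: both fields are in seconds.
        "gpu_walltime": seconds_to_hms(report["gputime"]),
        "gpu_utilization": seconds_to_hms(report["gpu_util"]),
        "gpu_efficiency": f'{report["gpu_eff"]:.2f}%',
    }


# Sample report copied from the output above.
sample = """
{ "jobid": "100", "cluster": "cardinal", "user": "xxx", "group": "PP1234",
  "nodes": 1, "walltime": "00:03:40", "gpu_per_node": 2.0, "total_gpus": 2,
  "gputime": 440, "mem_util": "77.17 GB", "mem_eff": 41.48684832257049,
  "gpu_util": 74, "gpu_eff": 17.034495 }
"""

summary = summarize(sample)
print(summary["gpu_walltime"])     # 00:07:20
print(summary["gpu_utilization"])  # 00:01:14
print(summary["gpu_efficiency"])   # 17.03%
```

On a cluster, the raw string could instead be captured from the command itself, for example with subprocess.run(["gpu-seff", "--json", jobid], capture_output=True, text=True).stdout.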
Enabling/disabling certain statistics
By default, gpu-seff displays all of the following: memory utilization, memory efficiency, GPU utilization, GPU efficiency, basic job information, and GPU resource details. If any of the optional output options are specified, only the specified sections are shown.
To display only the GPU details and exclude basic job information:
    $ gpu-seff 100 -gi -mu -me -gu -ge
    GPUs per node: 2
    Total GPUs: 2
    GPU Memory Utilized: 77.17 GB
    GPU Memory Efficiency: 41.49% of 186.01 GB
    GPU Utilization: 00:01:14
    GPU Efficiency: 17.03% of 00:07:20 gpu-walltime
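The "all sections unless some are named" rule can be illustrated with a small argparse sketch. This is not gpu-seff's source code. It is a hypothetical stand-in that reuses the documented flag names to show one way the selection rule could behave: if no output flag is given, every section is selected, otherwise only the flagged ones are.

```python
import argparse

# Section name -> (short flag, long flag), mirroring the documented options.
SECTIONS = {
    "memory_util": ("-mu", "--memory-util"),
    "memory_efficiency": ("-me", "--memory-efficiency"),
    "gpu_util": ("-gu", "--gpu-util"),
    "gpu_efficiency": ("-ge", "--gpu-efficiency"),
    "job_info": ("-ji", "--job-info"),
    "gpu_info": ("-gi", "--gpu-info"),
}


def selected_sections(argv: list) -> list:
    """Return the sections to print for a given command line (hypothetical)."""
    parser = argparse.ArgumentParser(prog="gpu-seff-sketch")
    for dest, (short_flag, long_flag) in SECTIONS.items():
        parser.add_argument(short_flag, long_flag, dest=dest, action="store_true")
    parser.add_argument("jobid")
    args = parser.parse_args(argv)

    chosen = [name for name in SECTIONS if getattr(args, name)]
    # No output flag given: fall back to every section.
    return chosen or list(SECTIONS)


print(selected_sections(["100"]))
print(selected_sections(["100", "-gi", "-mu", "-me", "-gu", "-ge"]))
```

The first call selects all six sections. The second selects everything except job_info, matching the example above in which the basic job information is omitted.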