gpu-seff

Introduction

gpu-seff is a command developed at OSC for use on OSC's systems. It reports GPU resource usage data for a job, similar to the CPU resource data reported by the seff command.

Availability

CARDINAL  PITZER  ASCEND
X         X       X

Usage

gpu-seff takes the following options and parameters.

$ gpu-seff -h
usage: gpu-seff [-h] [-M {pitzer,ascend,cardinal}] [-d] [-t TIMEBUFFER] [-v] [-j] [-mu] [-me] [-gu] [-ge] [-ji] [-gi] jobid

positional arguments:
  jobid                 Jobid

optional arguments:
  -h, --help            show this help message and exit
  -M {pitzer,ascend,cardinal}, -c {pitzer,ascend,cardinal}, --cluster {pitzer,ascend,cardinal}
                        Cluster
  -d, --debug           Debug
  -t TIMEBUFFER, --timebuffer TIMEBUFFER
                        Time buffer (seconds, default 60)
  -v, --verbose         Detailed per-gpu report
  -j, --json            Output in json format

optional output options:
  By default these options are enabled. If any flags are passed in, only those flags will be used.

  -mu, --memory-util    Include memory utilization data
  -me, --memory-efficiency
                        Include memory efficiency data
  -gu, --gpu-util       Include GPU utilization data
  -ge, --gpu-efficiency
                        Include GPU efficiency data
  -ji, --job-info       Include basic information about the job
  -gi, --gpu-info       GPU-related information about the job


Default behavior

By default, gpu-seff gives an overview of the job's resource usage: basic job information, plus total GPU memory utilization, average memory efficiency, total GPU utilization, and overall GPU efficiency across all allocated GPUs.

$ gpu-seff 100
Job ID: 100
Cluster: cardinal
User/Group: xxx/PPP1234
State: COMPLETED
Nodes: 1
Job Wall-clock time: 00:03:40
GPUs per node: 2
Total GPUs: 2
GPU Memory Utilized: 77.17 GB
GPU Memory Efficiency: 41.49% of 186.01 GB
GPU Utilization: 00:01:14
GPU Efficiency: 17.03% of 00:07:20 gpu-walltime
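The denominators in this summary follow from the other fields. In the example above, the gpu-walltime of 00:07:20 (440 seconds) is the job wall-clock time of 00:03:40 (220 seconds) multiplied by the 2 allocated GPUs, and the memory efficiency is the memory used divided by the total GPU memory allocated (77.17 GB / 186.01 GB, about 41.49%). The Python sketch below reproduces only that arithmetic; the helper names are hypothetical and this is not gpu-seff's own implementation. GPU Efficiency is also expressed against the gpu-walltime, but it is not exactly the displayed GPU Utilization divided by the gpu-walltime (74 / 440 seconds is about 16.8%, versus the reported 17.03%), so the sketch does not attempt that calculation.

```python
# Hypothetical helpers illustrating the arithmetic behind the summary values;
# not taken from gpu-seff itself.


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Convert a number of seconds back to an HH:MM:SS string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def gpu_walltime(wall_clock: str, total_gpus: int) -> int:
    """GPU-walltime in seconds: wall-clock time times the number of allocated GPUs."""
    return hms_to_seconds(wall_clock) * total_gpus


def memory_efficiency(used_gb: float, allocated_gb: float) -> float:
    """Percentage of the allocated GPU memory that was actually used."""
    return 100.0 * used_gb / allocated_gb


if __name__ == "__main__":
    # Values from the default gpu-seff output for job 100.
    gpu_seconds = gpu_walltime("00:03:40", total_gpus=2)
    print(gpu_seconds, seconds_to_hms(gpu_seconds))
    print(f"{memory_efficiency(77.17, 186.01):.2f}%")
```

Working in whole seconds keeps the time arithmetic exact; converting back to HH:MM:SS is only needed for display.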

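Running from another cluster

To check a job that ran on a different cluster, pass the cluster name with -M (also accepted as -c or --cluster). For example, to check a Cardinal job from an Ascend login node: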

xxx@ascend-login01:~$ gpu-seff -M cardinal 100
Job ID: 100
Cluster: cardinal
User/Group: xxx/PPP1234
State: COMPLETED
Nodes: 1
Job Wall-clock time: 00:03:40
GPUs per node: 2
Total GPUs: 2
GPU Memory Utilized: 77.17 GB
GPU Memory Efficiency: 41.49% of 186.01 GB
GPU Utilization: 00:01:14
GPU Efficiency: 17.03% of 00:07:20 gpu-walltime

Get per-GPU statistics

To get per-GPU statistics rather than summary statistics across all GPUs, pass the -v/--verbose flag:

$ gpu-seff 100 --verbose
Job ID: 100
Cluster: cardinal
User/Group: xxx/PPP1234
State: COMPLETED
Nodes: 1
Job Wall-clock time: 00:03:40
GPUs per node: 2
Total GPUs: 2
GPU Memory Utilized:
  Host c0818 GPU #0: 32.08 GB
  Host c0818 GPU #1: 45.09 GB
GPU Memory Efficiency:
  Host c0818 GPU #0: 34.50% of 93.00 GB
  Host c0818 GPU #1: 48.48% of 93.00 GB
GPU Utilization:
  Host c0818 GPU #0: 00:00:35
  Host c0818 GPU #1: 00:00:39
GPU Efficiency:
  Host c0818 GPU #0: 15.96% of 00:03:40 gpu-walltime
  Host c0818 GPU #1: 18.11% of 00:03:40 gpu-walltime
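In this example the summary values can be recovered from the per-GPU values: memory used and utilization time add up (32.08 + 45.09 = 77.17 GB; 35 + 39 = 74 seconds, or 00:01:14), and the efficiencies average out ((15.96 + 18.11) / 2 is about 17.03%). The Python sketch below parses the verbose text and performs those aggregations. It assumes the "Host <name> GPU #<n>: <value>" line layout shown above, which is inferred from this example rather than a documented interface; the function names are hypothetical, and for scripting the --json output is likely the sturdier choice.

```python
import re
from collections import defaultdict

# Per-GPU lines look like "  Host c0818 GPU #0: 32.08 GB" in the example output.
# This layout is inferred from the example, not a documented interface.
GPU_LINE = re.compile(r"^\s+Host (\S+) GPU #(\d+): (.+)$")

SAMPLE = """GPU Memory Utilized:
  Host c0818 GPU #0: 32.08 GB
  Host c0818 GPU #1: 45.09 GB
GPU Memory Efficiency:
  Host c0818 GPU #0: 34.50% of 93.00 GB
  Host c0818 GPU #1: 48.48% of 93.00 GB
GPU Utilization:
  Host c0818 GPU #0: 00:00:35
  Host c0818 GPU #1: 00:00:39
GPU Efficiency:
  Host c0818 GPU #0: 15.96% of 00:03:40 gpu-walltime
  Host c0818 GPU #1: 18.11% of 00:03:40 gpu-walltime
"""


def parse_verbose(text):
    """Collect the per-GPU values listed under each section header."""
    sections = defaultdict(list)
    current = None
    for line in text.splitlines():
        match = GPU_LINE.match(line)
        stripped = line.rstrip()
        if match and current is not None:
            sections[current].append(match.group(3))
        elif stripped.endswith(":"):
            # Headers such as "GPU Utilization:" end with a colon.
            current = stripped[:-1]
    return dict(sections)


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def leading_number(value):
    """Return the first number in a value such as "34.50% of 93.00 GB"."""
    return float(re.match(r"[\d.]+", value).group())


def summarize(sections):
    """Aggregate per-GPU values: totals for usage, means for efficiencies."""
    memory = [leading_number(v) for v in sections["GPU Memory Utilized"]]
    mem_eff = [leading_number(v) for v in sections["GPU Memory Efficiency"]]
    util = [hms_to_seconds(v) for v in sections["GPU Utilization"]]
    gpu_eff = [leading_number(v) for v in sections["GPU Efficiency"]]
    return {
        "memory_gb": round(sum(memory), 2),
        "mean_memory_efficiency": sum(mem_eff) / len(mem_eff),
        "utilization_seconds": sum(util),
        "mean_gpu_efficiency": sum(gpu_eff) / len(gpu_eff),
    }


if __name__ == "__main__":
    print(summarize(parse_verbose(SAMPLE)))
```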

Output as JSON

To display the resource information in an easily parsable JSON format, pass the -j/--json flag:

$ gpu-seff 100 --json
{
    "jobid": "100",
    "cluster": "cardinal",
    "user": "xxx",
    "group": "PPP1234",
    "nodes": 1,
    "walltime": "00:03:40",
    "gpu_per_node": 2.0,
    "total_gpus": 2,
    "gputime": 440,
    "mem_util": "77.17 GB",
    "mem_eff": 41.48684832257049,
    "gpu_util": 74,
    "gpu_eff": 17.034495
}
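Because the JSON output is machine-readable, it can be loaded directly into a script. In the example above, mem_util is a string that includes a unit ("77.17 GB"), mem_eff and gpu_eff are plain percentages, and gputime and gpu_util appear to be in seconds (440 seconds matches the 00:07:20 gpu-walltime, and 74 seconds matches 00:01:14). The Python sketch below assumes those field types, loads the record, and flags low efficiency; the function name and the 50% cutoff are hypothetical choices for illustration.

```python
import json

# Output of `gpu-seff 100 --json` from the example above.
SAMPLE = """
{
    "jobid": "100",
    "cluster": "cardinal",
    "user": "xxx",
    "group": "PPP1234",
    "nodes": 1,
    "walltime": "00:03:40",
    "gpu_per_node": 2.0,
    "total_gpus": 2,
    "gputime": 440,
    "mem_util": "77.17 GB",
    "mem_eff": 41.48684832257049,
    "gpu_util": 74,
    "gpu_eff": 17.034495
}
"""

# Hypothetical cutoff (percent) below which a job is flagged.
THRESHOLD = 50.0


def review_job(record, threshold=THRESHOLD):
    """Return warnings for a gpu-seff JSON record with low efficiency."""
    warnings = []
    job = record["jobid"]
    if record["gpu_eff"] < threshold:
        # gputime is assumed to be in GPU-seconds.
        warnings.append(
            f"job {job}: GPU efficiency {record['gpu_eff']:.1f}% "
            f"of {record['gputime']} GPU-seconds"
        )
    if record["mem_eff"] < threshold:
        # mem_util is a string with a unit, e.g. "77.17 GB".
        used_gb = float(record["mem_util"].split()[0])
        warnings.append(
            f"job {job}: GPU memory efficiency {record['mem_eff']:.1f}% "
            f"({used_gb:.2f} GB used)"
        )
    return warnings


if __name__ == "__main__":
    for warning in review_job(json.loads(SAMPLE)):
        print(warning)
```

On an OSC system, the record would come from gpu-seff itself rather than a pasted sample, for example from the stdout of Python's subprocess.run(["gpu-seff", "100", "--json"], capture_output=True, text=True).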


Enabling/disabling certain statistics

By default, gpu-seff displays basic job information, GPU resource details, memory utilization, memory efficiency, GPU utilization, and GPU efficiency. If any of the optional output options are specified, only those sections are shown.

To display only the GPU details and exclude basic job information, pass every output option except -ji:

$ gpu-seff 100 -gi -mu -me -gu -ge
GPUs per node: 2
Total GPUs: 2
GPU Memory Utilized: 77.17 GB
GPU Memory Efficiency: 41.49% of 186.01 GB
GPU Utilization: 00:01:14
GPU Efficiency: 17.03% of 00:07:20 gpu-walltime
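When gpu-seff is called from a script, the output options can be assembled programmatically. The Python sketch below builds an argument list using only flags listed in the gpu-seff -h output; the function names, the section names in OUTPUT_FLAGS, and the parameters are hypothetical, and run_gpu_seff only works on a system where gpu-seff is installed.

```python
import subprocess

# Short output-option flags as listed by `gpu-seff -h`.
# The dictionary keys are hypothetical names chosen for this sketch.
OUTPUT_FLAGS = {
    "memory_util": "-mu",
    "memory_efficiency": "-me",
    "gpu_util": "-gu",
    "gpu_efficiency": "-ge",
    "job_info": "-ji",
    "gpu_info": "-gi",
}


def gpu_seff_args(jobid, cluster=None, sections=(), verbose=False, as_json=False):
    """Build a gpu-seff command line as a list of arguments."""
    args = ["gpu-seff", str(jobid)]
    if cluster is not None:
        args += ["-M", cluster]
    if verbose:
        args.append("--verbose")
    if as_json:
        args.append("--json")
    # An empty `sections` keeps gpu-seff's default of showing every section.
    # Unknown section names raise KeyError.
    args += [OUTPUT_FLAGS[name] for name in sections]
    return args


def run_gpu_seff(*args, **kwargs):
    """Run gpu-seff and return its standard output (requires gpu-seff on PATH)."""
    result = subprocess.run(
        gpu_seff_args(*args, **kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, gpu_seff_args(100, sections=["gpu_info", "memory_util", "memory_efficiency", "gpu_util", "gpu_efficiency"]) returns ["gpu-seff", "100", "-gi", "-mu", "-me", "-gu", "-ge"], the same command as the example above. Passing a list to subprocess.run avoids shell quoting issues.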

