Nsight GPU profiler not working due to DCGM conflict

Resolution: Resolved

UPDATE (Mar 15, 2023)

After the downtime on Mar. 14, 2023, OSC enabled a new Slurm option, --gres=nsight. When a job requests this option, DCGM is disabled on the nodes allocated to that job, and Nsight functions normally.
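
As an illustration, the option can be added to a batch script like any other Slurm request. The sketch below is not an official OSC template: the account PAS1234, the GPU request, the cuda module name, and the application ./my_app are placeholders or assumptions. Only --gres=nsight comes from this notice.

```shell
#!/bin/bash
# Sketch: write a job script that requests the nsight gres, then submit it.
# Everything except --gres=nsight is an assumption; adjust for your project.
cat > nsight_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=nsight-profile
# Hypothetical project account; replace with your own.
#SBATCH --account=PAS1234
#SBATCH --nodes=1
# Assumed GPU request for a single-GPU profiling run.
#SBATCH --gpus-per-node=1
#SBATCH --time=00:30:00
# Option from this notice: DCGM is disabled on the job's nodes.
#SBATCH --gres=nsight

# Assumed module name that provides Nsight Compute (ncu).
module load cuda
# ./my_app is a placeholder for the application being profiled.
ncu -o profile ./my_app
EOF

# Submit only where Slurm is available; otherwise leave the file in place.
if command -v sbatch >/dev/null 2>&1; then
    sbatch nsight_job.sh
else
    echo "sbatch not found; wrote nsight_job.sh without submitting"
fi
```

Jobs submitted without --gres=nsight still run with DCGM enabled, so the option needs to be requested for every profiling job.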

==================================

We are experiencing an issue with the Nsight GPU profiler: it conflicts with the GPU monitoring service (DCGM) that we are running.

The conflict causes Nsight to malfunction and produce the following error message:

 ==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.

We are looking for a workaround to resolve this issue.

Please contact oschelp@osc.edu if you have questions.