python – Low GPU utilization with NVIDIA / FFmpeg

I’m trying to run a Docker container on runpod.io to offload media transcoding to serverless GPUs. The container image is based on “nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04”.

Here’s how FFmpeg is built and configured in my Dockerfile:

    RUN git clone https://github.com/FFmpeg/nv-codec-headers.git && \
        make install -C ./nv-codec-headers && \
        git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg_source/ && \
        /ffmpeg_source/configure --prefix=/usr --ld="g++" \
            --enable-nonfree --enable-gpl --enable-gnutls \
            --enable-cuda-nvcc --enable-cuda --enable-cuda-llvm --enable-cuvid \
            --enable-nvenc --enable-ffnvcodec --enable-libnpp \
            --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libvpx \
            --enable-libfreetype --enable-libvorbis --enable-libfdk-aac --enable-libopus \
            --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 \
            --disable-static --enable-shared --disable-stripping
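To double-check the build, a quick test like this (a minimal sketch, assuming the compiled ffmpeg is on PATH inside the container) confirms that the binary actually has the CUDA hwaccel and the NVENC encoder compiled in:

    import subprocess

    # List the hardware acceleration methods compiled into this ffmpeg build;
    # "cuda" should appear in the output.
    hwaccels = subprocess.run(
        ["ffmpeg", "-hide_banner", "-hwaccels"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(hwaccels)

    # Check that the NVENC H.264 encoder was compiled in.
    encoders = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("h264_nvenc" in encoders)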

I’ve requested the GPU capability in my compose.yml:

    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
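And to verify that the container actually sees the card (a minimal sketch; assumes nvidia-smi is exposed by the NVIDIA container runtime):

    import subprocess

    # If the NVIDIA runtime is wired up correctly, nvidia-smi inside the
    # container lists the RTX A6000 along with the driver and CUDA versions.
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)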

I see the CUDA banner when the container deploys. It’s running on an RTX A6000, which supports NVIDIA hardware-accelerated encoding and decoding (NVENC/NVDEC) with FFmpeg.

My FFmpeg command is built as follows:

command = "ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i - "

command += f"-vf scale_npp=1920:1080 -c:v h264_nvenc -b:v 5M -preset p2 -tune ll -f mp4 -bufsize 5M -maxrate 10M -qmin 0 -g 250 -bf 3 -b_ref_mode middle -temporal-aq 1 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 {tempfile_1080.name} "

command += f"-vf scale_npp=1280:720 -c:v h264_nvenc -b:v 3M -preset p2 -tune ll -f mp4 -bufsize 3M -maxrate 6M -qmin 0 -g 250 -bf 3 -b_ref_mode middle -temporal-aq 1 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 {tempfile_720.name} "

command += f"-vf scale_npp=640:480 -c:v h264_nvenc -b:v 1M -preset p2 -tune ll -f mp4 -bufsize 1M -maxrate 2M -qmin 0 -g 250 -bf 3 -b_ref_mode middle -temporal-aq 1 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 {tempfile_480.name}"

I’m running this from Python and piping the input bytes to ffmpeg’s stdin.
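The invocation looks roughly like this (a simplified sketch of my setup; video_bytes stands in for the media bytes I pipe in):

    import shlex
    import subprocess

    # Split the assembled command string into an argv list and feed the raw
    # media bytes to ffmpeg's stdin ("-i -" reads the input from the pipe).
    proc = subprocess.Popen(
        shlex.split(command),
        stdin=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, err = proc.communicate(input=video_bytes)
    print(err.decode(errors="replace"))  # ffmpeg writes its log to stderr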

The CPU stays pinned at 100%, while the GPU barely leaves 0%; the most I’ve ever seen is roughly 4% utilization.

I’ve tried simpler commands. I thought it might be the audio, so I dropped the audio track, but it made no difference.

I’ve tried different base images: CUDA 11.8, 12.0, 12.1, and 12.2.

I’ve tried the runtime and devel images for each of those versions.

The drivers are up to date.

It clearly touches the GPU, because utilization briefly bumps up to a few percent before dropping back to zero. On top of that, the output is corrupted: no video player will open the files, they all report that the file can’t be played.
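A quick way to inspect the outputs without a media player is ffprobe (a minimal sketch, probing the 1080p temp file from the command above):

    import subprocess

    # ffprobe prints the stream info for a valid file and errors out when the
    # container is broken, so it makes a quick validity check.
    result = subprocess.run(
        ["ffprobe", "-hide_banner", tempfile_1080.name],
        capture_output=True, text=True,
    )
    print(result.returncode)
    print(result.stderr)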

I have also swapped “-hwaccel cuda” for “-hwaccel nvdec”.

No errors are thrown and nothing changes. I have also tried hevc_nvenc (H.265) as the encoder; that made no difference either.

Not sure what I’m doing wrong. Maybe this can’t be done via piping?
