python – FFmpeg filter_complex to merge audio from two files fails without an error due to album cover PNG stream

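When you run an FFmpeg filter_complex command to process two MP3 files and merge them into one, the command completes and the output MP3 file is generated. No errors are reported, but the output file fails to play. The FFmpeg log looks something like the one below. It points to a PNG stream in the output: the album cover attached to the first MP3 (Stream #0:1, mjpeg) is mapped into the output as a video stream and re-encoded to PNG, and the finished file contains 3696kB of video but only 2kB of audio.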

[mp3 @ 00000215d251e7c0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'C:\\vid gen\\Track_ Outwild x She Is Jules - Golden [NCS Release].mp3':
  Metadata:
    artist          : Outwild x She Is Jules
    album_artist    : Outwild x She Is Jules
    TCM             : Outwild, She Is Jules
    album           : Golden [Single]
    title           : Golden
    genre           : Electronic
    date            : 2021
  Duration: 00:04:09.60, start: 0.000000, bitrate: 378 kb/s
  Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp, 320 kb/s
  Stream #0:1: Video: mjpeg (Baseline), yuvj444p(pc, bt470bg/unknown/unknown), 3000x3000, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment         : Other
[mp3 @ 00000215d253f380] Estimating duration from bitrate, this may be inaccurate
Input #1, mp3, from 'E:\voiceover.mp3':
  Duration: 00:00:51.96, start: 0.000000, bitrate: 32 kb/s
  Stream #1:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
Stream mapping:
  Stream #0:0 (mp3float) -> volume:default (graph 0)
  Stream #1:0 (mp3float) -> adelay:default (graph 0)
  amerge:default (graph 0) -> Stream #0:0 (libmp3lame)
  Stream #0:1 -> #0:1 (mjpeg (native) -> png (native))
Press [q] to stop, [?] for help
[swscaler @ 00000215d46ff480] deprecated pixel format used, make sure you did set range correctly
    Last message repeated 3 times
[Parsed_amerge_6 @ 00000215d470d2c0] No channel layout for input 1
Output #0, mp3, to 'output_test.mp3':
  Metadata:
    TPE1            : Outwild x She Is Jules
    TPE2            : Outwild x She Is Jules
    TCM             : Outwild, She Is Jules
    TALB            : Golden [Single]
    TIT2            : Golden
    TCON            : Electronic
    TDRC            : 2021
    TSSE            : Lavf60.18.100
  Stream #0:0: Audio: mp3, 48000 Hz, stereo, fltp
frame=    0 fps=0.0 q=0.0 size=       0kB time=-00:00:00.02 bitrate=N/A speed=N/A
frame=    0 fps=0.0 q=0.0 size=       0kB time=-00:00:00.02 bitrate=N/A speed=N/A
[out#0/mp3 @ 00000215d2537400] video:3696kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.011831%
frame=    1 fps=0.8 q=-0.0 Lsize=    3698kB time=00:00:00.07 bitrate=415140.5kbits/s speed=0.0601x
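
One possible way to keep the cover art out of the output is to label the filter graph's audio output and map only that label, with -vn as an extra guard. The sketch below builds such a command as a Python argument list. The original command is not shown in the post, so everything here is an assumption for illustration: the helper name build_merge_command, its parameters and default values, and the filter graph itself. The graph also uses amix rather than the amerge filter seen in the log, and it adds aformat to give the mono voiceover an explicit stereo layout, which may or may not match what the original command intended.

```python
def build_merge_command(music_path, voice_path, output_path,
                        music_volume=0.3, voice_delay_ms=1000):
    """Build an FFmpeg argument list that mixes two audio files into one MP3.

    Hypothetical helper: the names, defaults, and filter graph are
    illustrative and are not taken from the original command.
    """
    filter_graph = (
        # Lower the volume of the music track (input 0).
        f"[0:a]volume={music_volume}[music];"
        # Give the voiceover (input 1) an explicit stereo layout, then delay it.
        f"[1:a]aformat=channel_layouts=stereo,"
        f"adelay={voice_delay_ms}:all=1[voice];"
        # Mix both tracks; amix is used here in place of amerge.
        "[music][voice]amix=inputs=2:duration=first[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", music_path,
        "-i", voice_path,
        "-filter_complex", filter_graph,
        "-map", "[out]",        # map only the filtered audio, nothing else
        "-vn",                  # extra guard: no video (cover art) in the output
        "-c:a", "libmp3lame",
        output_path,
    ]


cmd = build_merge_command("music.mp3", "voiceover.mp3", "output_test.mp3")
print(" ".join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True). With an explicit -map, FFmpeg no longer selects streams automatically, so the attached picture from the first input should not be picked up as a video stream.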
