I made a list of what I tried and why I think each attempt failed. Hopefully this will help.
Attempt 1: Enable Hardware Encoding
My first attempt was to simply tell ffmpeg to use the VA-API hardware encoder for the final output.
The idea was that ffmpeg would perform all the software filtering (scaling, stacking) on the CPU and then hand off the final, composited video to the GPU for encoding.
Why It Failed: This failed with the error "A hardware device reference is required to upload frames to". Selecting the h264_vaapi encoder wasn't enough on its own: the hwupload filter also needed to know which specific hardware device to upload the frames to.
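For reference, here is a minimal sketch of this kind of command, written as a small Python helper that only assembles the argument list and does not run ffmpeg. The file names, the fps/scale/hstack graph, the label names and the device alias va are placeholders, not the exact command from this attempt. With device=None it has the shape that produces the error above. Passing a render node adds -init_hw_device and -filter_hw_device, which is one way to give hwupload a device without turning on hardware decoding.

```python
import shlex


def encode_only_cmd(inputs, output, device=None):
    """CPU filtering, then hwupload so h264_vaapi can encode on the GPU."""
    cmd = ["ffmpeg"]
    if device is not None:
        # Create a VA-API device (alias "va" is arbitrary) and make it the
        # device that filters such as hwupload will use.
        cmd += ["-init_hw_device", f"vaapi=va:{device}", "-filter_hw_device", "va"]
    for path in inputs:
        cmd += ["-i", path]

    # Placeholder software graph: normalise each input, then stack side by side.
    chains = [f"[{i}:v]fps=30,scale=640:360[v{i}]" for i in range(len(inputs))]
    labels = "".join(f"[v{i}]" for i in range(len(inputs)))
    # format=nv12 converts to a layout the encoder accepts before the upload.
    chains.append(f"{labels}hstack=inputs={len(inputs)},format=nv12,hwupload[out]")

    cmd += ["-filter_complex", ";".join(chains)]
    cmd += ["-map", "[out]", "-c:v", "h264_vaapi", output]
    return cmd


print(shlex.join(encode_only_cmd(["a.mp4", "b.mp4"], "out.mp4",
                                 device="/dev/dri/renderD128")))
```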
Attempt 2: Initialize the Hardware Device
To fix the "missing hardware device" error, I added hardware-acceleration options to the beginning of the ffmpeg command to initialize the VA-API device.
I added -hwaccel vaapi, -hwaccel_device /dev/dri/renderD128, and -hwaccel_output_format vaapi to the start of the ffmpeg command. I was trying to tell ffmpeg to use the GPU for hardware acceleration from the very beginning.
Why It Failed: This created the opposite problem, with the error "Impossible to convert between the formats supported by the filter 'Parsed_fps_0' and the filter 'auto_scale_0'". Because the hardware was initialized at the start, ffmpeg decoded the incoming streams on the GPU. The decoded frames were now "stuck" on the GPU in a hardware-specific format that the software-based fps filter couldn't understand.
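One detail worth spelling out: -hwaccel, -hwaccel_device and -hwaccel_output_format are per-input options in ffmpeg, so they only affect the -i that follows them. Below is a minimal sketch (Python, only building the argument list, with placeholder file names) of how they sit in front of each input, and of the one option that keeps the decoded frames on the GPU.

```python
import shlex

DEVICE = "/dev/dri/renderD128"  # render node used in this attempt


def hw_decode_inputs(paths, keep_on_gpu):
    """Return the input part of an ffmpeg command with VA-API decoding."""
    args = []
    for path in paths:
        # Per-input options: they must come before the -i they apply to.
        args += ["-hwaccel", "vaapi", "-hwaccel_device", DEVICE]
        if keep_on_gpu:
            # Decoded frames stay as VA-API surfaces in GPU memory.
            args += ["-hwaccel_output_format", "vaapi"]
        args += ["-i", path]
    return args


print(shlex.join(["ffmpeg"] + hw_decode_inputs(["a.mp4", "b.mp4"], keep_on_gpu=True)))
```

As far as I understand, with keep_on_gpu=False (no -hwaccel_output_format vaapi) ffmpeg still decodes on the GPU but copies the frames back to system memory, so software filters such as fps and scale receive an ordinary pixel format.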
Attempt 3: The Full CPU-to-GPU Pipeline
To resolve the format conflict, I tried to create a full, explicit pipeline to manage the data flow between the CPU and GPU.
I kept the initial -hwaccel flags to decode on the GPU, and then in the complex filter, I added a hwdownload filter for each input stream. This was supposed to move the decoded video from the GPU back to the CPU's memory before the software filters were applied.
Why It Failed: This is where I hit the core of the problem. Decoding on the GPU, downloading to the CPU for filtering, and then uploading back to the GPU for encoding requires a pixel-format hand-off at each step. This failed with the error "Impossible to convert between the formats supported by the filter 'graph 0 input from stream 1:1' and the filter 'auto_scale_0'", indicating that the hand-off between the hardware decoder and the hwdownload filter was failing due to an incompatible pixel format.
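The ffmpeg documentation for hwdownload notes that it may be necessary to put a format filter directly after it so the output lands in a supported software format. Below is a minimal sketch of a filter graph built that way. It assumes 8-bit 4:2:0 content (hence nv12), uses placeholder fps/scale values and made-up label names, and is untested against the command from this attempt.

```python
def download_chain(index, sw_format="nv12", sw_filters="fps=30,scale=640:360"):
    """One input branch: GPU surface -> system memory -> software filters."""
    # format= directly after hwdownload pins the software pixel format,
    # so ffmpeg does not have to guess one during format negotiation.
    return f"[{index}:v]hwdownload,format={sw_format},{sw_filters}[v{index}]"


def filter_graph(n_inputs):
    """Download every input, stack on the CPU, then upload for the encoder."""
    chains = [download_chain(i) for i in range(n_inputs)]
    labels = "".join(f"[v{i}]" for i in range(n_inputs))
    chains.append(f"{labels}hstack=inputs={n_inputs},format=nv12,hwupload[out]")
    return ";".join(chains)


print(filter_graph(2))
```

This only covers the -filter_complex string. It also assumes every input was opened with the -hwaccel options, since hwdownload expects hardware frames on its input.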