OMX Hardware Transcoding on Raspberry Pi

Much better: 5s start and seek with Experimental, and 2-3s without. That's with a 1080p output.

With 1080->720p (i.e. scaling), start takes tens of seconds, and seeking is similar, sometimes refusing to seek at all. Odd that recorded playback is less stable/performant than live playback (though I'd guess that's due to Channels trying to extrapolate forward and overfill its buffer sufficiently to last the entire program, with mixed success).

BTW, the Gentoo 64bit distro maintainer gave some good pointers about MMAL/OMX 64bit compatibility. Also learned through following links that MMAL has (two) GPU resizers built in: vc.ril.resize, and vc.ril.isp, with the latter claiming to be able to do format conversion and resize realtime for 1080p inputs on Pi3 hardware. There may even be v4l2 drivers that work. Lowest hanging fruit for further optimization?

MMAL resizing could be interesting. So far my experiments with MMAL have not been very fruitful though. It's a pain to set up and use, and often just blows up with obscure errors.

If v4l2 is the future, that's great news.

Seems that way from what I've been reading. I've been playing with the h264_v4l2m2m encoder instead of omx, but keep getting hung up on fastdeint=blend from your ffmpeg. Where does that guy come from? It's impossible to compare apples to apples without it. I've seen and tried yadif as the standard ffmpeg deinterlacer, but it's slow, which makes for an unfair comparison.

BTW, without any deinterlacing or scaling, h264_v4l2m2m is for me noticeably faster (like 40%) than h264_omx encoding, so maybe worth a try.
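A rough way to set up that kind of comparison is to run both encoders over the same clip with no filters and discard the output. The sketch below only builds the command lines; the input name `sample.ts` and the `4M` bitrate are placeholders, not values from this thread, and it assumes an ffmpeg build that includes both encoders.

```python
import shlex

def build_benchmark_cmd(input_path, encoder, bitrate="4M"):
    """Return an ffmpeg argument list that encodes video only and discards the result."""
    return [
        "ffmpeg", "-hide_banner", "-benchmark",
        "-i", input_path,
        "-an",               # drop audio so only the video path is measured
        "-c:v", encoder,     # e.g. h264_omx or h264_v4l2m2m
        "-b:v", bitrate,     # placeholder target bitrate
        "-f", "null", "-",   # null muxer: encode, then throw the output away
    ]

# Print one command per encoder; no deinterlacing or scaling filter is added.
for enc in ("h264_omx", "h264_v4l2m2m"):
    print(shlex.join(build_benchmark_cmd("sample.ts", enc)))
```

Comparing the speed that ffmpeg reports for each run gives a like-for-like number, as long as the source needs neither deinterlacing nor scaling.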

I haven't found anywhere that actually discusses which SoCs have which v4l2m2m features implemented, but it clearly targets scaling among many other goodies (not sure about deinterlacing). Presumably if MMAL can do it the v4l2 drivers can do it with the same hardware. "It's all just software(TM)".

Interesting, so the v4l2m2m encoder works out of the box? Or did you have to install some drivers first?

Built in to my newer ffmpeg:

$ ffmpeg -encoders | grep 264
ffmpeg version N-94563-g3aeb681f07 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8 (Raspbian 8.3.0-6+rpi1)
  configuration: --arch=armhf --target-os=linux --enable-gpl --enable-omx --enable-omx-rpi --enable-nonfree --enable-libx264
  libavutil      56. 33.100 / 56. 33.100
  libavcodec     58. 55.100 / 58. 55.100
  libavformat    58. 30.100 / 58. 30.100
  libavdevice    58.  9.100 / 58.  9.100
  libavfilter     7. 58.100 /  7. 58.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx264rgb           libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
 V..... h264_omx             OpenMAX IL H.264 video encoder (codec h264)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
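For anyone who wants to check their own build from a script, here is a minimal sketch that pulls the H.264 encoder names out of a listing like the one above. It assumes the row layout shown here (a six-character flags column, then the name) and treats the `_omx` and `_v4l2m2m` suffixes as the hardware-backed wrappers, which is a naming assumption rather than anything ffmpeg guarantees.

```python
SAMPLE_LISTING = """\
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx264rgb           libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
 V..... h264_omx             OpenMAX IL H.264 video encoder (codec h264)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
"""

def h264_encoders(listing):
    """Return encoder names from `ffmpeg -encoders` text whose rows mention codec h264."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Video encoder rows start with a 6-character flags field beginning with "V".
        is_video_row = len(fields) >= 2 and len(fields[0]) == 6 and fields[0].startswith("V")
        if is_video_row and "(codec h264)" in line:
            names.append(fields[1])
    return names

def hardware_candidates(names):
    """Keep names with the wrapper suffixes seen in this thread (assumed hardware-backed)."""
    return [n for n in names if n.endswith(("_omx", "_v4l2m2m"))]

print(hardware_candidates(h264_encoders(SAMPLE_LISTING)))
```

Seeing an encoder in the list only means it was compiled in; whether it works still depends on the kernel driver being present.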

If it's faster and clearly the target of future development, v4l2 seems like the obvious option to go with.

Yea V4L2 is the future. I was expecting it to work already. I think it does both decoding and encoding as well.

I'm surprised Emby is shipping h264_omx. I thought they were using v4l2 for other things already.

Well I think it was introduced to ffmpeg quite recently, and they have a "custom" patched version (which they don't share and which is getting them in trouble for license violations). Here's what mine has:

 V..... h263_v4l2m2m         V4L2 mem2mem H.263 decoder wrapper (codec h263)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 decoder wrapper (codec h264)
 V..... hevc_v4l2m2m         V4L2 mem2mem HEVC decoder wrapper (codec hevc)
 V..... mpeg1_v4l2m2m        V4L2 mem2mem MPEG1 decoder wrapper (codec mpeg1video)
 V..... mpeg2_v4l2m2m        V4L2 mem2mem MPEG2 decoder wrapper (codec mpeg2video)
 V..... mpeg4_v4l2m2m        V4L2 mem2mem MPEG4 decoder wrapper (codec mpeg4)
 V..... vc1_v4l2m2m          V4L2 mem2mem VC1 decoder wrapper (codec vc1)
 V..... vp8_v4l2m2m          V4L2 mem2mem VP8 decoder wrapper (codec vp8)
 V..... vp9_v4l2m2m          V4L2 mem2mem VP9 decoder wrapper (codec vp9)

Still can't find fastdeint filter; is that something custom?

Yea fastdeint is something I've been cobbling together. The patch is a mess which is why I haven't sent it to ffmpeg-devel yet, but I will make that a priority.

I tried v4l2m2m on my rpi3+ and the performance is quite good. However when I tried to watch the encoded video, the colors are all off.

I'm going out of town for a few days but will pick this back up next week. My rpi4 should be here by then as well.

The latest build (click and hold Check For Update) is using the fast_bilinear scaler for improved performance.

FYI, once I fixed h264_v4l2m2m to send the entire video frame to the encoder (instead of only half the colors), the speed slowed down and now it's the same as h264_omx.

I've sent about a dozen patches to the ffmpeg-devel mailing list with omx and v4l2m2m improvements. You can also see them on https://github.com/ffmpeg/ffmpeg/compare/tmm1:v4l2-rpi and https://github.com/ffmpeg/ffmpeg/compare/tmm1:omx-patches

Nice.

Very interesting. v4l2 may still be a more convenient interface if de-interlacing and scaling get hardware treatment.

I tried out the new version and it can de-interlace and transcode 1080i->720p! Well, it's not perfectly stable, as it still stalls out (not buffering, just stops) occasionally. And some 1080i channels dip down below 1x and so do end up buffering, but getting much closer.

What's fun is 720p->720p. Since there is no interlacing or scaling, it happens in hardware, and CPU load goes very low at times.

Anyway, getting there. I feel with a bit more optimization this would be perfectly usable.

v4l2 has a scaling API, but the broadcom driver does not implement it. There's no deinterlacing available in the hardware.

I pulled some patches earlier which should fix this. I'm uploading a new build now with those included. Let me know if you still see stalling after that.

Awesome. Both h264 and mpeg2 720p sources do very well in my tests.

The ISP component is complicated to use, but the RESIZE one seems like it would give us hardware scaling and remove the major bottleneck. I'll take a stab at it although I don't know if I have the time/patience required to actually pull it off.

I wrote a basic hardware scaler. Performance is improved, but still not reaching 30fps to do real-time encode.

720p: swscaler=12fps fastswscaler=19fps hwscaler=24fps
480p: swscaler=13fps fastswscaler=27fps hwscaler=29fps

My initial approach is very naive and copies the video frame out of the GPU rescaler and then back into the GPU encoder, so there are still some more performance gains possible.
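To put the numbers above in terms of playback speed, each frame rate can be divided by the 30fps real-time target mentioned in the post. This is only a back-of-the-envelope sketch: the 30fps target is taken from the text, and the dictionary layout is just for illustration.

```python
REALTIME_FPS = 30.0  # real-time target named in the post

# Encode rates reported above, in frames per second.
results = {
    "720p": {"swscaler": 12, "fastswscaler": 19, "hwscaler": 24},
    "480p": {"swscaler": 13, "fastswscaler": 27, "hwscaler": 29},
}

def speed_factor(fps, target=REALTIME_FPS):
    """Fraction of real time achieved; 1.0 or more means the encode keeps up."""
    return fps / target

for resolution, scalers in results.items():
    for name, fps in scalers.items():
        print(f"{resolution} {name}: {speed_factor(fps):.2f}x")
```

By this measure the hardware scaler at 720p runs at 0.8x, so a live stream would still fall behind and buffer.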

Finally got my Pi4 today!

The latest build runs very nicely on it. Getting 1.2x converting 1080i mpeg2 -> 720p h264.

Very nice. That's with your new hardware scaler? Sounds like you must have further improved its performance. I'll give it a try. BTW, I'm not able to view remotely in Channels app with the most recent betas, only in the browser. Remotely the app just says "Your Channels DVR computer is not powerful enough".

BTW, did your scaler go the v4l2 route? I'm sure lots of people would appreciate playing with that (thinking Pi-based security cameras, home automation video monitoring systems, etc., where you want to scale for mobile live feeds).

Yea this is with the scaler. It uses the ISP hardware via V4L2. I had been testing on the RPI3+ where the perf is still not great, but the RPI4 is so much faster it doesn't matter.

Will fix.

Hey this is fantastic! Tried 2019.08.27.0251 remotely in the web interface, and it works really well for 1080i content. With my weak 5 Mbps upload speed, I had to dial it back to 720p 3 Mbps. Monitoring channels-dvr upload usage revealed about 350-420 KB/s, or 2.8-3.3 Mbps, so it seems the hardware encoder is respecting the streaming rate. Quality is decent. It loads pretty quickly (5-10s) and is quite stable at the 720p 3 Mbps (or 4 Mbps) setting for 1080i content. Thanks to your hardware scaler, CPU stays below 150%, and often less than 100%. That means the Pi will have plenty of time to do other things. Truly remarkable progress here!

Unfortunately 720p content is a different story: it either stalls regularly, or refuses to start playing entirely. While attempting to stream 720p content, CPU is humming along at 125% or less (seems about right), but what's strange is it's completely saturating my upstream pipe at ~570 KB/s for minutes at a time. Kind of seems like it isn't respecting the data rate limits for 720p content when the scaler isn't used.
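The upload figures quoted in this post are in kilobytes per second, while the stream settings are in megabits per second. A small conversion sketch, assuming 1 KB = 1000 bytes (the monitoring tool may use 1024, which would shift the results slightly):

```python
def kbytes_per_s_to_mbps(kb_per_s):
    """Convert kilobytes per second to megabits per second (1 KB = 1000 bytes assumed)."""
    return kb_per_s * 8 / 1000

# Upload rates observed in the post, in KB/s.
for rate in (350, 420, 570):
    print(f"{rate} KB/s is about {kbytes_per_s_to_mbps(rate):.2f} Mbps")
```

Under that assumption, 570 KB/s works out to roughly 4.56 Mbps, well above a 3 Mbps setting and close to the whole 5 Mbps uplink.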

Interestingly 480p also seems close to correct for upload speed (assuming 720p at 4 Mbps corresponds to about 1.8 Mbps at 480p). I would guess you don't "upscale", so I'm not sure why 720p has such specific issues; maybe some asymmetry in how the encoder is set up with and without scaling. Also odd because 720p content used to be the easiest!
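One way to arrive at a figure like 1.8 Mbps is to scale the bitrate by pixel count. The sketch below assumes that linear rule and an 854x480 frame for 480p; neither is confirmed as how Channels actually chooses its bitrates.

```python
def scaled_bitrate(ref_mbps, ref_size, new_size):
    """Scale a bitrate linearly with pixel count (a rule of thumb, not a codec law)."""
    ref_w, ref_h = ref_size
    new_w, new_h = new_size
    return ref_mbps * (new_w * new_h) / (ref_w * ref_h)

# 720p at 4 Mbps scaled down to an assumed 854x480 frame.
estimate = scaled_bitrate(4.0, (1280, 720), (854, 480))
print(f"Estimated 480p bitrate: {estimate:.2f} Mbps")
```

With a 720x480 frame instead, the same rule gives 1.5 Mbps, so the estimate is sensitive to which 480p frame size is in use.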

Anyway, great progress. Looking forward to seeing how this evolves.

Can you try 720p again with the latest build? I'm not seeing the issues you were having, but maybe it got fixed along the way.

You should also be able to stream from the Channels app now.

EDIT: Never mind, I see that bitrate is not being respected for some reason.

@jdts Your analysis was correct in that bitrate controls were not working in some cases. The issue was not related to the resolution, but rather the framerate of the video source.

The issue is resolved in the latest build, so you should be able to transcode and stream all channels and recordings, including to the Channels app.