KEYWORDS: Video coding, Video, Power consumption, Energy efficiency, Video compression, Open source software, Scalable video coding, Video processing, Clocks
In this paper, we present a methodology for benchmarking the coding efficiency and energy efficiency of software and hardware video transcoding implementations. This study builds upon our previous work, which focused on software encoders such as x264, x265, libvpx, vvenc, and SVT-AV1. We have since added a closed-source video software encoder implementation, EVE-VP9, as well as Meta’s MSVP VP9 encoder as a hardware representative, and expanded the test set to include a wider variety of test content in our analysis. To ensure a fair comparison between software and hardware encoders, we normalize the video encoding efficiency to energy used in watt-hours. Our proposed test methodology includes a detailed description of the process for measuring compression efficiency and energy consumption. We summarize limitations of our methodology and identify future opportunities for improvement.
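As a rough illustration of normalizing to energy in watt-hours, the sketch below converts an average power reading and a wall-clock encode duration into watt-hours and divides the amount of work by it. The function names, and the choice of frames per watt-hour as the normalized quantity, are assumptions made for illustration; they are not taken from the paper.

```python
def energy_wh(avg_power_watts: float, duration_seconds: float) -> float:
    """Energy in watt-hours from average power (W) and duration (s)."""
    if avg_power_watts < 0 or duration_seconds < 0:
        raise ValueError("power and duration must be non-negative")
    # 1 Wh = 3600 J, and watts * seconds = joules.
    return avg_power_watts * duration_seconds / 3600.0


def frames_per_wh(num_frames: int, avg_power_watts: float,
                  duration_seconds: float) -> float:
    """Hypothetical normalization: encoded frames per watt-hour consumed."""
    energy = energy_wh(avg_power_watts, duration_seconds)
    if energy == 0:
        raise ValueError("energy must be positive to normalize")
    return num_frames / energy
```

With such a normalization, a software encoder running on a CPU and a hardware encoder on an ASIC can be placed on the same axis, provided power is measured at a comparable boundary for both.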
One of the biggest challenges in Meta’s video delivery system is device fragmentation. Because of the large user base of Meta’s Family of Apps, our supported devices range from the single-core Galaxy Y to the latest Galaxy S22, and from the first-generation iPad mini to the iPhone 14 Pro. Moreover, for devices that do not support hardware decoding of more advanced codecs, such as AV1, we have to rely on software decoders, which require high compute power and memory bandwidth. Ideally, we would deliver high-resolution AVC- or VP9-encoded ABR lanes along with low-resolution AV1-encoded ABR lanes for the same video, so that the client device can choose which one to play based on its compute capacity. In addition, when users upload videos that are already encoded with advanced codecs, such as VP9 or HEVC, we want to deliver the original upload as a passthrough to maximize quality, alongside the ABR encoding ladder generated from it using AVC. Both use cases require streaming an ABR manifest that contains multiple encoded bitstreams from different codecs and playing them smoothly on the client side. On top of that, we can optimize the ABR encoding and delivery system to select ABR lanes encoded with different codecs for different bitrate targets. In this paper, we describe how we implemented end-to-end mixed-codec manifest support and deployed the solution in production. The proposed approach generalizes encoding selection into a filtering pipeline that evaluates device capacity via a feedback loop and adapts to bandwidth estimation, viewport size, and device capacity.
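The idea of encoding selection as a filtering pipeline can be sketched as a chain of filters applied to the lanes of a mixed-codec manifest. Everything in this sketch is hypothetical: the `Lane` type, the three filters, the per-codec height limits, and the tie-breaking rule are illustrative assumptions, not the production logic, and the feedback loop that would update the device limits is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Lane:
    """Hypothetical ABR lane: one encoded bitstream in the manifest."""
    codec: str
    height: int
    bitrate_kbps: int


LaneFilter = Callable[[List[Lane]], List[Lane]]


def by_device_capacity(max_height_per_codec: Dict[str, int]) -> LaneFilter:
    # Keep lanes the device can decode smoothly; unknown codecs are dropped.
    def apply(lanes: List[Lane]) -> List[Lane]:
        return [l for l in lanes
                if l.height <= max_height_per_codec.get(l.codec, 0)]
    return apply


def by_viewport(viewport_height: int) -> LaneFilter:
    # Cap at the smallest lane height that still covers the viewport.
    def apply(lanes: List[Lane]) -> List[Lane]:
        covering = [l.height for l in lanes if l.height >= viewport_height]
        cap = min(covering) if covering else max(
            (l.height for l in lanes), default=0)
        return [l for l in lanes if l.height <= cap]
    return apply


def by_bandwidth(estimated_kbps: int) -> LaneFilter:
    # Keep lanes whose bitrate fits the current bandwidth estimate.
    def apply(lanes: List[Lane]) -> List[Lane]:
        return [l for l in lanes if l.bitrate_kbps <= estimated_kbps]
    return apply


def select_lane(lanes: Iterable[Lane],
                filters: Iterable[LaneFilter]) -> Optional[Lane]:
    """Run all filters in order, then pick the tallest surviving lane,
    breaking ties in favor of the lower bitrate. Returns None if no lane
    survives."""
    candidates = list(lanes)
    for f in filters:
        candidates = f(candidates)
    return max(candidates,
               key=lambda l: (l.height, -l.bitrate_kbps),
               default=None)


MANIFEST = [
    Lane("avc", 1080, 4000),
    Lane("avc", 720, 2500),
    Lane("avc", 360, 800),
    Lane("av1", 720, 1500),
    Lane("av1", 360, 450),
]
# Assumed device: hardware AVC up to 1080p, software AV1 only up to 360p.
DEVICE_LIMITS = {"avc": 1080, "av1": 360}
```

In this toy manifest, a client with a 720-pixel-high viewport and a 3000 kbps estimate ends up on the 720p AVC lane, while the same client at 600 kbps falls back to the 360p AV1 lane, which is the kind of cross-codec switch a mixed-codec manifest makes possible.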
Videos uploaded to Meta's Family of Apps are transcoded into multiple bitstreams of various codec formats, resolutions, and quality levels to provide the best video quality across a wide variety of devices and connection bandwidth constraints. On Facebook alone, there are more than 4 billion video views per day. To handle video processing at this scale, we needed a solution that delivers the best possible video quality with the shortest encoding time, all while being energy efficient, programmable, and scalable. In this paper, we present the Meta Scalable Video Processor (MSVP), which performs video processing at quality on par with software (SW) solutions but at a small fraction of the compute time and energy. Each MSVP ASIC offers a peak SIMO (Single Input Multiple Output) transcoding performance of 4K at 15 fps at the highest quality configuration and can scale up to 4K at 60 fps at the standard quality configuration. This performance is achieved at ~10 W of PCIe module power. We achieved a throughput gain of ~9x for H.264 compared with libx264 SW encoding. For VP9, we achieved a throughput gain of ~50x compared with the libvpx speed 2 preset. Key components of MSVP transcoding include video decoding, scaling, encoding, and quality metric computation. We go over the ASIC architecture of MSVP and the design of its individual components, and compare perf/W versus quality against industry-standard SW encoders.
The trade-offs between compression performance and encoding complexity are key in software video encoding, even more so with increasing pressure on sustainability. Previous work, “Towards much better SVT-AV1 quality-cycles tradeoffs for VOD applications” [1], described three approaches to evaluating compression efficiency vs cycles trade-offs within a convex-hull framework using the Dynamic Optimizer (DO) algorithm developed in [2], [3] for VOD applications. In parallel, the new video codec enhancer LCEVC (Low Complexity Enhancement Video Coding) [4], designed to provide gains in speed-quality trade-offs, has recently been standardized as MPEG-5 Part 2. The core idea of LCEVC is to use any video coding standard (such as AV1) as a base encoder at a lower resolution, and then reduce artifacts and reconstruct a full-resolution output by combining the decoded low-resolution output with up to two low-complexity reconstruction enhancement sub-layers of residual data. This paper starts by applying LCEVC to SVT-AV1 [5], as well as x264 [6] and x265 [7], while using two of the approaches presented in [1] to evaluate the resulting compression efficiency vs cycles trade-offs. The paper then discusses the benefits of LCEVC for higher playback speed and lower battery power consumption when using AV1 software decoding. Results show that, with fast-encoding parameter selection using the discrete convex hull methodology, LCEVC improves the quality-cycles trade-offs for all the tested codecs and across the full complexity range. In the case of SVT-AV1, LCEVC yields a ~40% reduction in computations while achieving the same quality levels according to VMAF_NEG [8]. LCEVC also enlarges the set of mobile devices capable of playing HD as well as high-frame-rate content encoded with AV1 and extends mobile battery life by up to 50% with respect to state-of-the-art AV1 software decoding.
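To make the convex-hull notion concrete, here is a minimal sketch that extracts the upper convex hull of a discrete set of (bitrate, quality) operating points, which is the generic geometric step underlying convex-hull encoding analyses. It is not the Dynamic Optimizer or the authors' implementation; the function name and the point format are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (bitrate, quality), e.g. (kbps, VMAF)


def _cross(o: Point, a: Point, b: Point) -> float:
    # Positive for a counter-clockwise turn o -> a -> b, negative for clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_quality_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of (bitrate, quality) points, truncated at the
    highest-quality point so that quality never decreases with bitrate."""
    # For duplicate bitrates keep only the best quality.
    best: Dict[float, float] = {}
    for rate, quality in points:
        if rate not in best or quality > best[rate]:
            best[rate] = quality
    ordered = sorted(best.items())

    hull: List[Point] = []
    for p in ordered:
        # Pop points that fall on or below the line from hull[-2] to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    if not hull:
        return hull
    # Drop the descending tail: more bits for less quality is never useful.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[:peak + 1]
```

For example, given the points (100, 60), (200, 75), (250, 70), (300, 78), (400, 90), (500, 92), and (600, 91), the function keeps (100, 60), (200, 75), (400, 90), and (500, 92); the remaining points lie under the hull or spend more bits for lower quality.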
This paper describes the FB-MOS metric, which measures video quality at scale in the Facebook ecosystem. Because the quality of uploaded UGC sources varies widely, FB-MOS consists of both a no-reference component, which assesses input (upload) quality, and a full-reference component based on SSIM, which assesses the quality preserved through the transcoding and delivery pipeline. The same video may be watched on a variety of devices (mobile, laptop, TV) under varying network conditions that cause quality fluctuations; moreover, the viewer can switch between in-line view and full-screen view during the same viewing session. We show how the FB-MOS metric accounts for all this variation in viewing conditions while minimizing computation overhead. Validation of the metric on Facebook content, using internally selected videos, yielded an SROCC of 0.9147. The paper also discusses optimizations that reduce metric computation complexity and scale that complexity in proportion to video popularity.
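For readers unfamiliar with SROCC (Spearman rank-order correlation coefficient), the following sketch shows how such a validation score is typically computed between metric predictions and subjective scores: both lists are converted to ranks (ties share their average rank) and the Pearson correlation of the ranks is taken. The function names are illustrative, and nothing here reflects how the FB-MOS validation itself was implemented.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srocc(predicted: Sequence[float], subjective: Sequence[float]) -> float:
    """Spearman rank-order correlation between two equal-length sequences."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(predicted), average_ranks(subjective)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks matter, SROCC rewards a metric for ordering videos the same way viewers do, regardless of whether the relationship between metric values and opinion scores is linear.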