Huawei 2025 Tech Challenge: HEVC/H.265 Performance Improvement - Personal Notes

Disclaimer: This post is a work in progress and serves as my personal research notes. Information may be incomplete, unverified, or subject to change as I continue developing the content.

The objective of this task is to design a pair of pre-processing and post-processing filters to enhance the performance of an HEVC/H.265 video codec. Performance improvement is defined as reducing the encoded video stream size while maintaining reconstruction quality, enhancing reconstruction quality while keeping the stream size constant, or a combination of both.


Video Dataset - Why YUV files #

The dataset provided is in YUV 4:2:0 10-bit format. The framework is based on the HEVC/H.265 standard.

A foundational principle of modern video compression, including HEVC, is the separation of color information from brightness information. This is achieved by converting the video from the R’G’B’ color space into a luma-chroma format.

Y (Luma) represents the brightness or luminance of a pixel. U and V carry the color information: more precisely, the color differences relative to Y.

(A color is described as a Y′ component (luma) and two chroma components U and V. The prime symbol (’) denotes that the luma is calculated from gamma-corrected RGB input and that it is different from true luminance. - https://en.wikipedia.org/wiki/Y%E2%80%B2UV)

YUV files store raw pixel values directly, so there are no compression artifacts from a prior lossy encode, unlike video delivered in MP4 or MKV containers, which has typically already been compressed. This ensures all competitors start from the exact same reference quality.

HEVC/H.265 encoders don’t actually work in RGB internally. They use a YUV color space, often yuv420p (planar, 4:2:0 chroma subsampling).

| Chroma Subsampling Scheme | Bandwidth Reduction vs. 4:4:4 | Typical Use Cases | Impact on Quality |
|---|---|---|---|
| 4:4:4 | None (uncompressed) | High-end post-production, PC monitors, cinematic content | Highest quality, best for colored text and fine details |
| 4:2:2 | One-third reduction | Digital broadcasting, professional video formats (e.g., ProRes) | Good quality, generally imperceptible loss in most scenarios |
| 4:2:0 | Halved bandwidth | Consumer streaming services (e.g., Netflix, YouTube), digital television | Lowest data rate, potential for artifacts in gradients and fine-textured text on colored backgrounds |

A YUV file is basically just a binary dump of all frames one after the other in a fixed format. No headers, no timing metadata.
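
For illustration, reading one frame out of such a dump only requires knowing the resolution and sample layout. A minimal sketch, assuming the common yuv420p10le layout (each 10-bit sample stored as a little-endian uint16); the file name and resolution are hypothetical:

import numpy as np

def read_yuv420p10_frame(f, width, height):
    # One yuv420p10le frame: a full-resolution Y plane followed by
    # quarter-resolution U and V planes, two bytes per sample.
    y = np.frombuffer(f.read(width * height * 2), dtype="<u2").reshape(height, width)
    u = np.frombuffer(f.read(width * height // 2), dtype="<u2").reshape(height // 2, width // 2)
    v = np.frombuffer(f.read(width * height // 2), dtype="<u2").reshape(height // 2, width // 2)
    return y, u, v

with open("video.yuv", "rb") as fh:  # hypothetical file
    y, u, v = read_yuv420p10_frame(fh, 1920, 1080)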


Project base structure: #

.  
├── img/  
│   ├── default_pipeline.png  
│   ├── pre_and_post_pipeline.png  
│   ├── thumbnails/             (.y4m.jpg files)
│   └── thumbnails_animated/    (.y4m.gif files) 
├── originals/                  (.y4m files)
├── README.md  
├── requirements.txt  
├── src/  
│   ├── common/  
│   │   ├── codec.py  
│   │   ├── common.py  
│   │   ├── dataset.py  
│   │   ├── frame.py  
│   │   ├── GLOBAL_VARIABLES.py  
│   │   ├── metadata.py  
│   │   ├── metrics.py  
│   │   ├── submission.py  
│   │   └── validation.py  
│   ├── postprocess/  
│   │   └── postprocess_script.py  
│   └── preprocess/  
│       └── preprocess_script.py  
└── submission_check.ipynb

Expected output #

Participants must submit both the implementation code and a detailed description of their algorithms for the pre-processing and post-processing stages.

Participants must submit the compressed video streams generated by their solution as well as the corresponding reconstructed video frames.

Evaluation #

Submissions will be evaluated primarily based on BD-Rate, which measures the rate-distortion trade-off between compressed size and reconstruction quality across multiple quality points. Additional metrics include PSNR, MS-SSIM, and VMAF, complemented by subjective visual quality assessments. The computational complexity of solutions will also be considered but with lower priority.
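
For intuition on the primary distortion metric: PSNR compares each reconstructed frame against the original via mean squared error. A minimal sketch for 10-bit planes (my own helper for intuition, not the framework's metrics.py):

import numpy as np

def psnr_10bit(ref, rec):
    # Peak signal is 1023 for 10-bit samples; higher PSNR = closer match.
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(1023.0 ** 2 / mse)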


Guidelines and Restrictions #

The HEVC/H.265 codec from the FFmpeg library is used for encoding and decoding. Ratified in 2013, HEVC was engineered to meet the growing demands of high-resolution content, supporting resolutions up to 8192x4320 pixels (8K UHD). A primary objective of the standard was superior compression efficiency: the ability to encode video at the lowest possible bit rate while maintaining a desired level of quality.
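
As a concrete illustration of this fixed encode path, a raw YUV dump can be pushed through FFmpeg's libx265 encoder with the ffmpeg-python wrapper mentioned in the setup below; the file names, resolution, frame rate, and CRF value here are placeholders:

import ffmpeg  # the ffmpeg-python wrapper around the ffmpeg CLI

# Sketch: encode a raw 10-bit 4:2:0 dump with libx265; all concrete values
# (file names, 1920x1080, 60 fps, crf=28) are illustrative placeholders.
(
    ffmpeg
    .input("input.yuv", f="rawvideo", pix_fmt="yuv420p10le", s="1920x1080", r=60)
    .output("encoded.hevc", vcodec="libx265", crf=28)
    .run()
)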

The enhanced efficiency of HEVC stems from several core architectural innovations. The most notable change is the move from AVC’s fixed 16x16 pixel macroblocks to HEVC’s more flexible Coding Tree Units (CTUs), which can range up to 64x64 pixels.

Modifications to the codec itself are prohibited; only pre-processing and post-processing modules may be altered.

Metadata transmission is allowed but counts toward the total bitrate and must be included in performance considerations.


Environment setup #

Since the project involves heavy processing of large video files, my laptop crashes quite often, and limiting the number of CPU cores it uses makes it run very slowly. So I got a VPS that should handle the task properly.

Set up a virtual environment, install the requirements from requirements.txt plus scikit-image and ffmpeg-python, and optionally set up a Jupyter server.
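
For example, on a fresh server (jupyter added for the optional notebook server):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt scikit-image ffmpeg-python jupyter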

Forward a port from the server so you can edit code and open the notebook locally:

ssh -i ~/.ssh/pubkey -L 8888:localhost:8888 username@ip

And on the server:

jupyter notebook --no-browser --port=8888

Executing the Jupyter notebook #

YUV datasets can be tens of GB in size, so disk space and fast I/O (an SSD) are important. After running the program with the default, almost no-op functions (subtract one bit and add one bit), the dataset folder is around 12 GB, while the originals folder is 6.7 GB.

import multiprocessing
import os

import numpy as np

# Validation and folder_submission are provided by the framework's common
# modules (see src/common/ in the project structure above).

def submission():
    dataset = Validation()
    dataset.prepare_dataset()

    # One worker process per CPU core; a Manager dict collects the
    # per-folder BD-rates across processes.
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    manager = multiprocessing.Manager()
    return_values = manager.dict()

    print("Starting submission testing")
    print("It can take 15mins to run on full dataset on a 32 cores machine")
    for folder in dataset.folders:
        pool.apply_async(folder_submission, args=(os.path.basename(folder), return_values))
    pool.close()
    pool.join()

    # Average the per-folder BD-rates, ignoring any folder that produced NaN.
    bdrates = return_values.values()
    bdrates = np.asarray(bdrates)
    score = np.nanmean(bdrates)
    print(f"Your score is {score}")
    print("The lower the better, even better if it's lower than 0")

submission()

Note: The notebook’s code can also be run as a plain Python script on the server without Jupyter if needed, which is often faster and more stable in headless environments.


Strategies #

The following example strategies may serve as inspiration:

  • Color Distribution Adjustment
  • Resolution Scaling
  • Noise Management
  • Intensity Transformation

Preprocessing: Things to Consider #

The core of H.265’s efficiency is its ability to find and compress redundancies using prediction. A preprocessing filter, if not designed carefully, can disrupt this process. An operation that enhances details or sharpens edges can introduce new high-frequency components that the encoder’s prediction models cannot anticipate. This forces the codec to encode these new, unpredictable details as residuals, which requires more bits. This can lead to a direct contradiction of the goal of preprocessing: an increase in bitrate rather than a reduction.

High-frequency components significantly impact the coding bitrate, necessitating higher bits for the accurate representation and reconstruction involved in coding and transmission. This leads to increased bandwidth and storage costs.

https://arxiv.org/html/2508.08849v1
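
One quick way to see this effect is to measure how much of a frame's spectral energy sits above some frequency cutoff before and after a filter. A rough diagnostic sketch of my own (the cutoff value is arbitrary; this is not part of the challenge framework):

import numpy as np

def high_freq_energy_ratio(plane, cutoff=0.25):
    # Fraction of spectral energy above `cutoff` of Nyquist: a crude proxy
    # for how much unpredictable detail the encoder must pay for in residuals.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(plane.astype(np.float64)))) ** 2
    h, w = plane.shape
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    return spectrum[radius > cutoff].sum() / spectrum.sum()

# Sharpening raises this ratio (more residual bits); mild blurring lowers it.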

Postprocessing: Things to Consider #

Banding artifacts appear as false, staircase-like edges in areas of a video that should have smooth, continuous gradients. This is a frequent problem caused by the limitations of low bit depths (e.g., 8-bit) and the quantization errors introduced by lossy compression. A post-processing operation like tone mapping can expose or exacerbate these artifacts if it further stretches an already limited tonal range. To combat this, a critical post-processing technique known as dithering is used.

Dither is an intentionally applied form of noise used to randomize quantization error, preventing large-scale patterns such as color banding in images.

https://en.wikipedia.org/wiki/Dither
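
A minimal sketch of TPDF (triangular probability density function) dithering, shown here in the 10-bit-to-8-bit direction and assuming planes are NumPy arrays; the function name and noise amplitude are my own choices:

import numpy as np

def quantize_10bit_to_8bit_with_dither(plane10, rng=None):
    # Triangular-PDF noise (sum of two uniforms) spanning +/- one 8-bit LSB
    # (= 4 at 10-bit scale) decorrelates the rounding error, so smooth
    # gradients dissolve into fine noise instead of visible bands.
    rng = np.random.default_rng() if rng is None else rng
    noise = (rng.random(plane10.shape) - rng.random(plane10.shape)) * 4.0
    return np.clip(np.round((plane10 + noise) / 4.0), 0, 255).astype(np.uint8)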

Resolution Scaling #

When downscaling YUV video, it is a best practice to handle the separate Y (luma), U, and V (chroma) planes independently. Since human vision is more sensitive to brightness (luma) than color (chroma), chrominance data is often stored at a lower resolution than luma.

By separating the planes, we can apply a high-quality filter to the luma plane and a simpler, more efficient filter to the chroma planes without a noticeable drop in perceived quality. This leads directly to better performance, which is a key objective of the task.

For the luma channel, the goal is to retain as much detail and sharpness as possible while preventing aliasing. High-quality filters like Lanczos resampling are an excellent choice: Lanczos provides a good balance between sharpness and the reduction of aliasing artifacts. However, it can introduce ringing artifacts, which appear as subtle halos around sharp edges.

Since the kernel assumes negative values for a > 1, the interpolated signal can be negative even if all samples are positive. More generally, the range of values of the interpolated signal may be wider than the range spanned by the discrete sample values. In particular, there may be ringing artifacts just before and after abrupt changes in the sample values, which may lead to clipping artifacts - https://en.wikipedia.org/wiki/Lanczos_resampling

This ties in directly with:

The main cause of ringing artifacts is due to a signal being bandlimited (specifically, not having high frequencies) or passed through a low-pass filter; this is the frequency domain description. - https://en.wikipedia.org/wiki/Ringing_artifacts

For the chroma planes, efficiency matters more: because those channels are less visually important, we can use less computationally expensive filters. A common practice is to use simpler interpolation filters such as bicubic or bilinear for the U and V planes; these are also less prone to ringing artifacts than Lanczos. A sketch of this per-plane approach follows.
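
This sketch assumes OpenCV is available (it is not among the stated requirements); cv2.resize handles the uint16 planes from the reader sketched earlier directly:

import cv2  # assumption: OpenCV installed separately

def downscale_yuv420_planes(y, u, v, scale=0.5):
    # Lanczos for the detail-critical luma plane, cheaper bilinear for chroma.
    y_small = cv2.resize(y, None, fx=scale, fy=scale, interpolation=cv2.INTER_LANCZOS4)
    u_small = cv2.resize(u, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    v_small = cv2.resize(v, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    return y_small, u_small, v_small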

Noise Management #

Grain is high-frequency noise that is spatially random, which makes it very expensive to compress. By denoising before compression, you reduce the bitrate.

But why is noise needed? Even though noise might seem undesirable, film grain is part of the original artistic content. Removing it entirely changes the look of the video in a way that can be visually “flat”. To address this, there is a new approach known as Film Grain Synthesis (FGS) that could be used:

  • The process begins by applying a denoiser to the source video to remove the film grain. The encoder then compresses this now “clean” or denoised video, which is far more compressible than the original noisy version. A film grain parameterization process analyzes the difference between the original noisy video and the denoised version to estimate a statistical model of the grain’s characteristics, such as its variance, spatial frequency, and color correlation.

  • Instead of compressing the grain itself, the encoder transmits this compact film grain model as metadata alongside the compressed video bitstream. For HEVC, this is accomplished through Supplemental Enhancement Information (SEI) messages, specifically the Film Grain Characteristics (FGC) message.

  • At the decoder, the clean video is first reconstructed. The player then parses the FGC SEI messages, which contain the film grain model parameters. A film grain synthesis process uses these parameters to generate new, simulated grain and blend it back into the decoded video before it is displayed.

https://norkin.org/research/film_grain/index.html
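
To make the first step concrete, here is a toy sketch of the denoise-and-measure split, using SciPy's Gaussian filter as a stand-in denoiser (a real FGS pipeline would use a stronger denoiser and a much richer grain model; sigma is illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def split_grain(y_plane, sigma=1.5):
    # The residual (original - denoised) approximates the grain layer;
    # its variance is one statistic a parameterization step would model.
    clean = gaussian_filter(y_plane.astype(np.float32), sigma=sigma)
    grain = y_plane.astype(np.float32) - clean
    return clean, float(grain.var())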


BD-Rate #

The code runs the bench_submission function twice, once with is_anchor=True and once with is_anchor=False, to generate two separate rate-distortion curves. This is the core methodology for calculating the Bjontegaard Delta (BD) rate, which is the final score we are trying to compute.

The BD-rate is a single-value metric that quantifies the average bitrate savings of a new video compression method compared to a reference method (the “anchor”) for the same level of quality.

  • A negative BD-rate means the method is more efficient, saving bits.
  • A positive BD-rate means the method is less efficient, requiring more bits.

To calculate this, you need two rate-distortion curves:

  • A set of (bitrate, PSNR) points representing the baseline (anchor) performance.
  • A set of (bitrate, PSNR) points representing the performance of the new method.
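
For reference, here is a compact sketch of the standard Bjontegaard computation (my own reimplementation for intuition, not the framework's metrics.py; the classic method fits a cubic in PSNR, which needs four rate points, so the polynomial degree drops when fewer points are available):

import numpy as np

def bd_rate_percent(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    # Fit log-rate as a polynomial in PSNR for each curve, integrate both
    # fits over the overlapping PSNR range, and convert the mean log-rate
    # gap into a percentage. Negative output = average bitrate savings.
    deg = min(3, len(anchor_rates) - 1)
    p_anchor = np.polyfit(anchor_psnrs, np.log(anchor_rates), deg)
    p_test = np.polyfit(test_psnrs, np.log(test_rates), deg)
    lo = max(min(anchor_psnrs), min(test_psnrs))
    hi = min(max(anchor_psnrs), max(test_psnrs))
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_anchor) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0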

After running it with very simple, dummy preprocess and postprocess functions and only two QPs (the minimum: each curve needs at least two points, i.e., two QPs producing two bitrate-quality pairs, for BD-Rate to be computable):

BD-rate for AOV_1920x1080_60_yuv420p_04: 0.1780999638542724
bdrates: [np.float64(0.1780999638542724)]
Your score is 0.1780999638542724

A positive BD-Rate means the pre/post-processing is worse than the anchor. In this case, 0.1781 (~17.8%) indicates that, on average, the processed frames require 17.8% more bitrate to reach the same quality as the original frames. In other words, the current dummy preprocessing/postprocessing hurts compression efficiency rather than improving it; to get a negative BD-Rate, I need strategies that reduce bitrate without degrading reconstructed quality. This was to be expected; what matters is that the setup works, and I can now start the real problem-solving in the processing modules.

Next Steps #

This document establishes the groundwork for basic preprocessing and postprocessing strategies in HEVC/H.265 compression. While no optimized solution has been finalized yet, the environment and dataset-handling pipeline are in place.