YUV4MPEG2

From MultimediaWiki
Jump to navigation Jump to search

YUV4MPEG2 is a simple file format designed to hold uncompressed frames of YCbCr video formatted as YCbCr 4:2:0, YCbCr 4:2:2 or YCbCr 4:4:4 data for the purpose of encoding, likely to MPEG-2.

The part "YUV" in its name just derives from the fact that the color space YCbCr (used for color encoding in digital media) is often falsely mixed up with the color space YUV (used in analog PAL based applications, including analog TV and video tapes).

The file format does not specify what color matrix is used. According to the original MJPEGTools documentation, it is always supposed to be BT.601, but nobody actually cares: everyone just adapts y4m to whatever needs arise for testing their codec. There is also no widely-recognized extension field for indicating which matrix is in use – good luck!

Data format

File header

A Y4M file begins with a plaintext, quasi-freeform header. The first 10 bytes are a file signature of YUV4MPEG2 (last character is a space, ASCII 0x20). Following the signature is any number of parameters preceeded by a space (ASCII 0x20). The parameters that should definitely be present are width, height, and frame rate:

  • frame width: W followed by a plaintext integer; example: W720
  • frame height: H followed by a plaintext integer; example: H480
  • frame rate: F followed by the number of frames per second, expressed as a fraction in the form numerator:denominator. Examples:
    • F30:1 = 30 FPS
    • F25:1 = 25 FPS (PAL/SECAM standard)
    • F24:1 = 24 FPS (Film)
    • F30000:1001 = 29.97 FPS (NTSC standard)
    • F24000:1001 = 23.976 FPS (Film transferred to NTSC)

The following parameters are optional:

I, interlacing
Ip = Progressive
It = Top field first
Ib = Bottom field first
Im = Mixed modes (detailed in FRAME headers)
A, pixel aspect ratio
Note that this is not the ratio of the picture as a whole, just the pixels. Any integer ratio is acceptable. Examples:
  • A0:0 = unknown
  • A1:1 = square pixels
  • A4:3 = NTSC-SVCD (480x480 stretched to 4:3 screen)
  • A4:5 = NTSC-DVD narrow-screen (720x480 compressed to a 4:3 display)
  • A32:27 = NTSC-DVD wide-screen (720x480 stretched to a 16:9 display)
C, Colour space
Called "colorspace", actually defines pixel format and color model.
  • C420jpeg = 4:2:0 with biaxially-displaced chroma planes ("center": 0.5 right, 0.5 down)
  • C420paldv = 4:2:0 with coincident R and vertically-displaced B (down 1 px)
  • C420mpeg2 = 4:2:0 with vertically-displaced chroma planes (MPEG-2: B and R have same displacement of down 0.5 px)
  • C420 = 4:2:0. Not found in original spec, so interpretations differ:
    • with coincident chroma planes, according to rav1e (arguably the more useful interpretation)
    • with JPEG displacement, according to ffmpeg and aom.
  • C422 = 4:2:2 coincident
  • C444 = 4:4:4 coincident
  • C444alpha = 4:4:4 plus a alpha channel of the same size (not supported by libwebm)
  • Cmono = YCbCr plane only
For a much more exhaustive list, check out documentation for y4m::Colorspace, aom-av1 y4menc.c, and ffmpeg yuv4mpegenc.c. Yes, there are higher bit-depth options. If you tell ffmpeg to be less strict, there are even wackier possibilities.
If you have trouble figuring out the difference between C420, C420jpeg, C420paldv, and C420mpeg2 this might help: https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/y4minput.c
X, Comment or extension.
Ignored, but preserved and passed down, by a YUV4MPEG2 processor.
FFMPEG defines, among others, the following extensions:
  • XCOLORRANGE, defines the color range.
    • XCOLORRANGE=FULL JPEG / Full swing / ITU T.871 range
    • XCOLORRANGE=LIMITED MPEG / Studio swing / TV range
  • XYSCSS, indicates non-standard pixel formats not quite described by C

Frame data

Following the header is any number of frames coded in YCbCr format in Y-Cb-Cr plane order. Each frame begins with the 5 bytes FRAME followed by zero or more parameters each preceded by 0x20, ending with 0x0A. The original specification defines an I parameter for describing the interlace (in a format much more complex than the file-header I) and an X parameter for comments. However, tools rarely go through the trouble of trying to understand frame headers.

This is then followed by the raw bytes from each plane.

The length of each frame (excluding its header) can be computed as:

  • frame length = width * height * 3 / 2 (4:2:0)
  • frame length = width * height * 2 (4:2:2)
  • frame length = width * height * 3 (4:4:4)

Notes

  • FFmpeg yuv4mpegdec's interpretation of C sample locations surprisingly differ from the original. Instead of noting C422 as coincidental, the decoder only reads "unknown".
  • FFmpeg's yuv4mpegenc taxonomy uses C420paldv for "top-left", i.e. coincidental. This is obviously wrong, but nothing can be done without sacrificing the real PAL-DV: ffmpeg uses one AvChromaLocation to describe where both chroma samples are taken.