H4M

From MultimediaWiki
Revision as of 04:20, 29 July 2021 by Kostya (talk | contribs) (update information)

H4M is a game video format that uses image compression based on AOT (Adaptive Orthogonalized Transform) vector quantization.

  • HVQM3 is used on N64 Mario Party series games
  • HVQM4 is used on some Nintendo GameCube Biohazard/Resident Evil series games

H4M files begin with the ASCII characters "HVQM4" followed by versioning information. After the 68-byte header comes a number of blocks, each grouping several audio and video frames.

H4M header

All values are big-endian.

 5 bytes - "HVQM4"
 4 bytes - version (e.g. " 1.3" or " 1.5")
 7 bytes - padding?
 4 bytes - header size
 4 bytes - data size
 4 bytes - number of blocks
 4 bytes - total number of video frames
 4 bytes - total number of audio frames
 4 bytes - frame duration in microseconds
 4 bytes - maximum video frame size?
 4 bytes - unknown
 4 bytes - maximum audio frame size?
 2 bytes - width
 2 bytes - height
 1 byte  - horizontal subsampling
 1 byte  - vertical subsampling
 1 byte  - video mode
 1 byte  - unknown
 1 byte  - number of channels
 1 byte  - bits per audio sample
 1 byte  - audio format
 1 byte  - number of audio tracks minus one
 4 bytes - sampling rate
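
The header layout above can be read with a single big-endian struct. The following is a minimal sketch, not a reference implementation: the field names (`unknown1`, `max_video_frame_size` and so on) are made up for illustration, and fields marked with a question mark in the table are simply passed through without interpretation.

```python
import struct
from collections import namedtuple

# 5s4s7s = magic, version, padding; 9I = the nine 32-bit fields;
# 2H = width, height; 8B = the eight single-byte fields; I = sampling rate.
H4M_HEADER = struct.Struct(">5s4s7s9I2H8BI")

# Field names are illustrative only; they are not taken from any official source.
H4MHeader = namedtuple("H4MHeader", [
    "magic", "version", "padding",
    "header_size", "data_size", "num_blocks",
    "video_frames", "audio_frames", "frame_duration_us",
    "max_video_frame_size", "unknown1", "max_audio_frame_size",
    "width", "height",
    "h_subsampling", "v_subsampling", "video_mode", "unknown2",
    "channels", "bits_per_sample", "audio_format", "audio_tracks_minus_one",
    "sample_rate",
])


def parse_h4m_header(data):
    """Parse the 68-byte file header from the start of `data`."""
    if len(data) < H4M_HEADER.size:
        raise ValueError("need at least %d bytes" % H4M_HEADER.size)
    hdr = H4MHeader._make(H4M_HEADER.unpack_from(data, 0))
    if hdr.magic != b"HVQM4":
        raise ValueError("not an HVQM4 file")
    return hdr


def frames_per_second(hdr):
    """Frame rate derived from the frame duration given in microseconds."""
    return 1_000_000 / hdr.frame_duration_us
```

`H4M_HEADER.size` evaluates to 68, which matches the header size stated above.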

Each block may contain several audio and video frames and starts with the following 20-byte header:

 4 bytes - previous block size
 4 bytes - current block size
 4 bytes - number of video frames in the block
 4 bytes - number of audio frames in the block
 4 bytes - last block flag?
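
As a sketch, the block header can be unpacked as five big-endian 32-bit integers. The field names are illustrative, and the meaning of the last field is as uncertain here as it is in the table above. Whether the block sizes include the 20-byte header itself is not stated, so the sketch does not use them for seeking.

```python
import struct
from collections import namedtuple

BLOCK_HEADER = struct.Struct(">5I")  # five big-endian 32-bit fields, 20 bytes

# Illustrative names; "last_block_flag" is a guess, as in the field table.
BlockHeader = namedtuple("BlockHeader", [
    "prev_block_size", "block_size",
    "video_frames", "audio_frames", "last_block_flag",
])


def parse_block_header(data, offset=0):
    """Parse one 20-byte block header located at `offset` in `data`."""
    return BlockHeader._make(BLOCK_HEADER.unpack_from(data, offset))
```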

A frame has its own header:

 2 bytes - frame data type (0 - audio, 1 - video)
 2 bytes - frame type (for video 0x10 - I-frame, 0x20 - P-frame, 0x30 - B-frame)
 4 bytes - frame size
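
The frame header can be decoded along the same lines. This sketch assumes the header is big-endian like the file header; the lookup tables and the function name are illustrative. It does not assume whether the frame size covers the 8-byte header, since the description above does not say.

```python
import struct

FRAME_HEADER = struct.Struct(">HHI")  # data type, frame type, frame size

DATA_TYPES = {0: "audio", 1: "video"}
VIDEO_FRAME_TYPES = {0x10: "I", 0x20: "P", 0x30: "B"}


def parse_frame_header(data, offset=0):
    """Return (kind, picture_type, raw_frame_type, frame_size).

    picture_type is "I", "P" or "B" for known video frame types and
    None for audio frames or unrecognised values.
    """
    data_type, frame_type, size = FRAME_HEADER.unpack_from(data, offset)
    kind = DATA_TYPES.get(data_type, "unknown")
    picture = VIDEO_FRAME_TYPES.get(frame_type) if data_type == 1 else None
    return kind, picture, frame_type, size
```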

Video coding

Video compression is based on the so-called Adaptive Orthogonalized Transform (AOT), Huffman coding, zero-run coding and halfpel motion compensation. AOT decomposes a block into several orthogonal bases and codes only the positions of those bases in a special matrix (called the nest) along with their scaling coefficients.

Frame data is split into chunks, each containing data of a single type (DC values, zero-run values, etc.). Some of these chunks are coupled: one chunk contains the Huffman tree definition used by several other data sources. For example, the DC values for all planes use the definition stored in the luma DC data, there is a single tree definition for motion vector components, and all zero-run data sources share a single tree definition as well. One special data source, used for raw block coefficients and basis parameters, stores only bytes and 16-bit values and thus has no tree definition.

Many block types in a plane are coded using zero runs: if the decoded value is zero, the length of the following zero run is read from the paired tree.
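
The zero-run rule can be sketched as follows. The two callables stand in for the Huffman-coded value source and its paired zero-run source; they are hypothetical placeholders, not actual decoder functions. The sketch also assumes that the run length counts the zeros following the one just decoded, which the description suggests but does not state outright.

```python
def decode_zero_run_values(read_value, read_run, count):
    """Decode `count` values from a value source with a paired zero-run source.

    read_value and read_run are callables returning the next decoded symbol
    from the value tree and the zero-run tree respectively (placeholders).
    """
    out = []
    while len(out) < count:
        value = read_value()
        out.append(value)
        if value == 0:
            # Assumption: the run covers additional zeros after this one.
            run = read_run()
            out.extend([0] * min(run, count - len(out)))
    return out
```

For example, with the value symbols 5, 0, 3, 0 and the run symbols 2, 1, decoding seven values yields `[5, 0, 0, 0, 3, 0, 0]`.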

I-frame structure

I-frame header contains the following:

  • DC shift
  • post-transform shift
  • nest parameters
  • offsets to block type and zero run data for both luma and chroma
  • offsets to DC, AOT basis and raw block data for each plane
  • offsets to DC zero run data for each plane

Frame reconstruction involves the following steps:

  • reading transform type for each 4x4 block on each plane
  • decoding block DCs (with simple top and left prediction)
  • reconstructing each block depending on transform
    • type 0 interpolates the block contents from its own DC and the DCs of its 4 neighbours (top, left, right and bottom)
    • types 1-5 combine 1-5 bases selected from the 70x38 nest
    • type 6 is raw block (with coefficients coming from a special data source)
    • type 8 is simple DC-only transform (fill)
    • other types are unused
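
The per-block dispatch on transform type can be sketched like this. Only the classification and the trivial DC fill (type 8) are shown, because the interpolation weights for type 0 and the basis arithmetic for types 1-5 are not specified above. The function names and the returned labels are illustrative.

```python
def classify_intra_block(block_type):
    """Map a transform type to (method label, number of AOT bases)."""
    if block_type == 0:
        return "dc_interpolation", 0
    if 1 <= block_type <= 5:
        return "aot", block_type  # the type doubles as the basis count
    if block_type == 6:
        return "raw", 0
    if block_type == 8:
        return "dc_fill", 0
    raise ValueError("unused block type %d" % block_type)


def fill_block(dc):
    """Type 8: a 4x4 block where every sample equals the block DC."""
    return [[dc] * 4 for _ in range(4)]
```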

P- and B-frame structure

P-frames are decoded in the same way as B-frames except that the previous reference frame is also used as the next reference.

P- and B-frames contain these data sources in addition to the I-frame data (but there's no DC zero run data):

  • motion vector components
  • MC block type and procedure

Reconstruction is slightly different as well:

  • decode P/B block types
  • decode DC values for intra blocks
  • decode motion block sub-block types
  • if block is intra, decode it the same way as in I-frames
  • otherwise
    • decode halfpel motion vector (there's no MV prediction though)
    • for block type 6, just read the raw data; for block type 0, just perform motion compensation; for block types 1-5, reconstruct the coefficients as in an intra frame and add them to the motion-compensated block.
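
The last step for non-intra blocks can be sketched as below. Blocks are plain 4x4 lists of integers, `mc_block` is assumed to be the already motion-compensated prediction, and `residual` is assumed to be the output of the same coefficient reconstruction used in intra frames. Clamping the sum to 0..255 is an assumption of this sketch; the description above does not mention it.

```python
def clamp8(value):
    """Clamp to the 8-bit sample range (an assumption, not from the text)."""
    return max(0, min(255, value))


def reconstruct_inter_block(block_type, mc_block, residual=None, raw_block=None):
    """Combine prediction and coded data for one non-intra 4x4 block."""
    if block_type == 0:
        # Motion compensation only.
        return [row[:] for row in mc_block]
    if block_type == 6:
        # Raw block data replaces the prediction.
        return [row[:] for row in raw_block]
    if 1 <= block_type <= 5:
        # Reconstructed coefficients are added to the prediction.
        return [[clamp8(p + r) for p, r in zip(prow, rrow)]
                for prow, rrow in zip(mc_block, residual)]
    raise ValueError("unused block type %d" % block_type)
```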

Audio coding

Formats:

  • 0 - IMA ADPCM (data starts with status word, top 9 bits - predictor, low 7 bits - step)
  • 1 - PCM
  • 4 - simple DPCM (with 8-bit quantiser and one of three step tables to select from)
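
For format 0, splitting the 16-bit status word into its two fields is a matter of shifting and masking. The sketch below returns the raw fields only: how the 9-bit predictor is scaled or sign-extended into a sample, and whether the step is an index into the standard IMA step table, are not stated above and are left open here. The function name is illustrative.

```python
def split_ima_status_word(word):
    """Split a 16-bit status word into (predictor_bits, step).

    predictor_bits is the raw unsigned top 9 bits, step the low 7 bits.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("status word must fit in 16 bits")
    predictor_bits = word >> 7
    step = word & 0x7F
    return predictor_bits, step
```

For example, the status word 0x1234 splits into a predictor field of 36 and a step of 52.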

Games Using H4M

These games use H4M files. Observed format versions are noted where known.