CMV

CMV is a video format introduced with Creative's ZEN MX. The name probably stands for 'Creative Media Video' (as WMV stands for Windows Media Video), but no resources confirm this, so it is only a guess. With this format Creative apparently tried to use as few licensed codecs and as little complexity as possible. The compression performance is poor: about 10 MB per minute at 320x240 and 25 fps.

Technical description

CMV consists of two parts: the audio and the video.

General layout of a CMV file:
0        ->              n
+-------+----------------+
| Audio | Video          |
+-------+----------------+

The audio and video are not interleaved; the player hops between the audio part and the video part every second. There is also no global format header: the file starts immediately with the audio. The two parts are completely separate, so a CMV file can be produced by simply concatenating them, e.g. with the 'cat' command.

Audio

The first part of the file starts immediately with a RIFF header: it is a Microsoft Wave (.WAV) file. The audio is IMA ADPCM encoded, so one sample is 4 bits. It is a 24 kHz stereo signal, which gives a bit rate of 4 bits x 24000 samples/s x 2 channels = 192 kbps, but don't expect MP3 quality at that rate. The duration is the same as that of the video. See the specifications for more details.
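
Because the audio part is a complete RIFF/WAVE file, its length can be read from the RIFF header itself: bytes 4 to 7 hold a little-endian 32-bit size that excludes the 8-byte "RIFF" tag and size prefix. The Python sketch below uses that to find where the video part would begin. The function name wav_length is made up for illustration, and the sketch assumes the size field is written correctly and that nothing sits between the WAV data and the video header.

```python
import struct


def wav_length(data: bytes) -> int:
    """Return the total byte length of the RIFF/WAVE file at the start of data."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("file does not start with a RIFF/WAVE header")
    # The RIFF size field counts everything after the first 8 bytes.
    (riff_size,) = struct.unpack_from("<I", data, 4)
    return riff_size + 8


# Bit rate check for the audio described above:
# 4 bits per sample, 24000 samples per second, 2 channels.
bit_rate = 4 * 24000 * 2  # bits per second
```

If the assumptions hold, data[wav_length(data):] would start with the "CMV" signature of the video header.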

Video

Right after the WAV file, the video starts. This is not a complicated format like MPEG-4, MPEG-2 or WMV; basically it is a variant of Motion JPEG.

The layout is as follows:
+--------+---------+---------+--- - ---+---------+
| HEADER | CHUNK 1 | CHUNK 2 |   ...   | CHUNK n |
+--------+---------+---------+--- - ---+---------+

Header

The header describes the video part of the file. All values are little-endian encoded, e.g. the decimal number 1025 is encoded in 3 bytes as 01 04 00 (hex). The odd thing is that the whole header seems to use a 3-byte word size... to make things easy :).
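
As a small illustration of the 3-byte little-endian encoding, the following Python sketch decodes one such field. The helper name read_u24 is hypothetical, not part of any existing tool.

```python
def read_u24(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 3-byte little-endian integer at the given offset."""
    field = buf[offset:offset + 3]
    if len(field) != 3:
        raise ValueError("need 3 bytes to decode a field")
    return int.from_bytes(field, "little")


# The example from the text: 01 04 00 (hex) is decimal 1025.
print(read_u24(bytes([0x01, 0x04, 0x00])))  # prints 1025
```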

Offset Length  Description
------ ------- ----------------------------------------------------------------------------
0      3       Signature, "CMV"
3      3       Major version, char encoded = "001"
6      3       Minor version, char encoded = "000"
9      3       Width of video in pixels
12     3       Height of video in pixels
15     3       ? horizontal block size, usually 16
                 - used for rounding the dimensions of the video
18     3       ? vertical block size, usually 16
                 - used for rounding the dimensions of the video
21     3       chunk-size in bytes
24     3       number of frames per chunk (usually == FPS) <--+
27     3       Number of frames in file                       |
30     3       unknown, value = 1                             | May be the other way around
33     3       unknown, value = 0                             |
36     3       FPS (usually 25)                            <--+
39     12      0x010101 0x010101 0x010101 0x010101 unknown / reserved
51             end of header 

All descriptions starting with a question mark are guesses.
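
The table above can be turned into a small parser. The Python sketch below is only an illustration: the function and field names are invented here, the fields marked as guesses or unknown in the table are just as uncertain in the code, and the frames-per-chunk and FPS fields may be swapped, as noted in the table.

```python
HEADER_SIZE = 51

# (offset, name) for every 3-byte little-endian numeric field in the table.
# Names are illustrative; "block_w"/"block_h" are guesses per the table.
NUMERIC_FIELDS = [
    (9, "width"),
    (12, "height"),
    (15, "block_w"),
    (18, "block_h"),
    (21, "chunk_size"),
    (24, "frames_per_chunk"),  # may be swapped with "fps"
    (27, "frame_count"),
    (30, "unknown_30"),
    (33, "unknown_33"),
    (36, "fps"),               # may be swapped with "frames_per_chunk"
]


def parse_cmv_header(buf: bytes) -> dict:
    """Parse the 51-byte CMV video header into a dictionary."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("header is shorter than 51 bytes")
    if buf[0:3] != b"CMV":
        raise ValueError("missing CMV signature")
    header = {
        "major": buf[3:6].decode("ascii"),  # e.g. "001"
        "minor": buf[6:9].decode("ascii"),  # e.g. "000"
        "reserved": buf[39:51],             # unknown / reserved bytes
    }
    for offset, name in NUMERIC_FIELDS:
        header[name] = int.from_bytes(buf[offset:offset + 3], "little")
    return header
```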

Chunk

A chunk starts with a 3-byte value, LEN, which is the length of the actual data. The rest of the chunk is zero-padded until it reaches the chunk size defined in the header.

|--------- chunk size ------------------------|
      |------------ LEN ------------|
+-----+-----------------------------+---------+--- -
| LEN | DATA (JPEG)                 | padding | next LEN
+-----+-----------------------------+---------+--- -
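
A minimal Python sketch of walking the chunks might look like the following. It assumes, as the diagram suggests but the text does not state outright, that the chunk size includes the 3-byte LEN field, and that the first chunk starts directly after the 51-byte header. The function name iter_chunks is hypothetical.

```python
from typing import Iterator


def iter_chunks(video: bytes, chunk_size: int,
                header_size: int = 51) -> Iterator[bytes]:
    """Yield the JPEG payload of each fixed-size chunk in the video part."""
    pos = header_size
    while pos + 3 <= len(video):
        # LEN: 3-byte little-endian length of the actual data.
        length = int.from_bytes(video[pos:pos + 3], "little")
        if length > chunk_size - 3:
            raise ValueError("LEN does not fit inside the chunk")
        yield video[pos + 3:pos + 3 + length]
        # Skip the zero padding by jumping a whole chunk ahead.
        pos += chunk_size
```

Each yielded payload should then be a complete JPEG that an ordinary JPEG decoder can open.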

The data is a single JPEG image containing "number of frames per chunk" frames, stacked vertically one below the other. So each frame is one sub-image of the big JPEG.

The dimensions of one frame image are calculated by taking the width and height specified in the header and rounding each up to a multiple of the corresponding block size, usually 16. For example, if the video is 320x200, the actual image size is 320x208, since 320 is a multiple of 16 but 200 is not. The remainder of the image (the 8 lines at the bottom) is just black pixels.
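
The rounding rule can be written as a one-line helper. This is a sketch; the name round_up is chosen here for illustration.

```python
def round_up(value: int, block: int) -> int:
    """Round value up to the nearest multiple of block."""
    return (value + block - 1) // block * block


print(round_up(320, 16), round_up(200, 16))  # prints 320 208
```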

The dimensions of the actual JPEG are as follows:

width = rounded image width
height = rounded image height * frames per chunk.

So, if the video is 320x200, each frame image is 320x208 and the JPEG is 320 x (208 * 25) = 320 x 5200, looking like this:

|---- 320 -------|
+----------------+  -
|                |  |
|    Frame 1     | 208 (of which only the first 200 lines contain video,
|                |  |   rest are black pixels)
+----------------+  -
|                |
|    Frame 2     |
|                |
+----------------+
|                |
       ...
|                |
+----------------+
|                |
|    Frame n     | (usually 25)
|                |
+----------------+
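
To get the individual frames back out of the decoded JPEG, one needs the crop rectangle of each frame. The Python sketch below computes the JPEG dimensions and the rectangles from the header values. The function names are invented for illustration, and the rectangles use the (left, upper, right, lower) convention that, for example, Pillow's Image.crop accepts.

```python
def jpeg_size(width, height, block_w, block_h, frames_per_chunk):
    """Return (width, height) of the stacked JPEG for one chunk."""
    padded_w = (width + block_w - 1) // block_w * block_w
    padded_h = (height + block_h - 1) // block_h * block_h
    return padded_w, padded_h * frames_per_chunk


def frame_boxes(width, height, block_h, frames_per_chunk):
    """Return one (left, upper, right, lower) box per frame, padding excluded."""
    padded_h = (height + block_h - 1) // block_h * block_h
    return [(0, i * padded_h, width, i * padded_h + height)
            for i in range(frames_per_chunk)]


print(jpeg_size(320, 200, 16, 16, 25))    # prints (320, 5200)
print(frame_boxes(320, 200, 16, 25)[1])   # prints (0, 208, 320, 408)
```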

The last chunk may end at any frame. Remaining frames in the JPEG will be black.
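
If the "number of frames in file" header field means what its name suggests, the number of chunks and the number of real frames in the last chunk follow from a ceiling division. The Python sketch below shows that arithmetic; the function name is hypothetical, and the result depends on the header fields not being swapped as the table warns they might be.

```python
def chunk_layout(frame_count: int, frames_per_chunk: int):
    """Return (number of chunks, real frames in the last chunk)."""
    if frame_count <= 0:
        return 0, 0
    n_chunks = (frame_count + frames_per_chunk - 1) // frames_per_chunk
    frames_in_last = frame_count - (n_chunks - 1) * frames_per_chunk
    return n_chunks, frames_in_last


print(chunk_layout(60, 25))  # prints (3, 10)
```

In this example the third JPEG would hold 10 real frames followed by 15 black ones.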