NXV is a container format used by the 2006 MP4 Player Watch manufactured by "Shenzhen Adragon Digitek" (China). The wristwatch is reported to support MP4, WMV and WMA; however, this is achieved by first converting the files to the NXV format.
The format consists of a sequence of 512-byte packets. The first 512 bytes form the header:
 bytes 0-11    ASCIIZ magic ("NXV File")
 bytes 12-19   ASCIIZ version ("1.0.0", "3.0.1" or "3.0.2")
 byte 20       width (pixels)
 byte 21       height (pixels)
 byte 22       always 0
 bytes 23-511  unknown but required for playback (see below)
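As a rough illustration, the fixed fields of this layout can be read with Python's struct module. The names NxvHeader and parse_nxv_header are made up for this sketch, and the unknown region is returned untouched because its meaning is only partly understood.

```python
import struct
from typing import NamedTuple


class NxvHeader(NamedTuple):
    """Hypothetical container for the documented header fields."""
    magic: str
    version: str
    width: int
    height: int
    unknown: bytes  # bytes 23-511, returned as-is


def parse_nxv_header(block: bytes) -> NxvHeader:
    """Parse the first 512 bytes of an NXV file."""
    if len(block) < 512:
        raise ValueError("NXV header needs 512 bytes")
    # 12-byte magic, 8-byte version, then three single bytes (width, height, zero).
    magic_raw, version_raw, width, height, _zero = struct.unpack_from(
        "<12s8sBBB", block, 0
    )
    # ASCIIZ: keep everything before the first NUL byte.
    magic = magic_raw.split(b"\0", 1)[0].decode("ascii")
    version = version_raw.split(b"\0", 1)[0].decode("ascii")
    if magic != "NXV File":
        raise ValueError("not an NXV file")
    return NxvHeader(magic, version, width, height, bytes(block[23:512]))
```

The function only checks the magic string; byte 22 is read but not validated, since "always 0" is an observation rather than a confirmed requirement.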
The unknown sequence requires a couple of bytes to be set for playback to work on the watch. The official encoder sets different values for different files, but using a single known sequence seems to work fine. One known good sequence is byte 0x17 = 0x87 and byte 0x87 = 0x02, with every other byte in the block set to 0.
The second value in this sequence is the number of audio buffers between video buffers. The first value appears to be an offset to the second.
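A minimal sketch of writing such a header follows, assuming the offsets 0x17 and 0x87 are counted from the start of the file (0x17 is byte 23, the first byte of the unknown region). The function name build_nxv_header and its parameters are hypothetical; only the known good values described above are used, and nothing here has been checked against the official encoder.

```python
def build_nxv_header(width: int, height: int,
                     version: str = "1.0.0",
                     audio_per_video: int = 2) -> bytes:
    """Build a 512-byte NXV header using the known good byte sequence."""
    version_raw = version.encode("ascii")
    if len(version_raw) > 7:
        raise ValueError("version must leave room for the NUL terminator")
    if not (0 <= width <= 255 and 0 <= height <= 255):
        raise ValueError("width and height are stored in one byte each")

    header = bytearray(512)              # everything defaults to 0
    header[0:8] = b"NXV File"            # ASCIIZ magic, bytes 0-11
    header[12:12 + len(version_raw)] = version_raw  # ASCIIZ version, bytes 12-19
    header[20] = width
    header[21] = height
    # byte 22 stays 0
    header[0x17] = 0x87                  # appears to be an offset to the next value
    header[0x87] = audio_per_video       # audio buffers between video buffers
    return bytes(header)
```

With the defaults this reproduces the known good sequence (0x87 at 0x17, 0x02 at 0x87); other values of audio_per_video are an extrapolation from the description above.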
The a/v sequence commences at byte 512 with an audio packet:
 uint curSequence = 0
 while !eof
   u8   audio payload
   u8   unknown
   be32 sequence number
   le16 length (bytes)
   le16 unknown (pixels == bytes/2?)
   if (sequence == curSequence)
     curSequence += 1
     u8[length] video payload
Note: This parsing scheme accepts a superset of what the actual watch supports, but will correctly decode all known NXV files from the encoder. On the watch, the number of audio packets between video packets is taken from the number in the header rather than detected dynamically from the sequence counter. Values of 1, 2, and 4 have been tested (anything larger gives too low a frame rate to be interesting for most videos).
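The packet fields mix byte orders: the sequence number is big-endian while the two 16-bit values are little-endian. The sketch below decodes just those eight bytes and applies the sequence test from the listing. Where the fields sit inside a packet is not assumed here, so the caller passes them in directly; parse_packet_fields and video_follows are illustrative names.

```python
import struct
from typing import Tuple


def parse_packet_fields(raw: bytes) -> Tuple[int, int, int]:
    """Decode be32 sequence, le16 length, le16 unknown from 8 bytes."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    (sequence,) = struct.unpack(">I", raw[0:4])       # big-endian 32-bit
    length, unknown = struct.unpack("<HH", raw[4:8])  # little-endian 16-bit pair
    return sequence, length, unknown


def video_follows(sequence: int, cur_sequence: int) -> bool:
    """A video payload follows when the packet carries the expected sequence number."""
    return sequence == cur_sequence
```

For example, the bytes 00 00 00 05 00 60 00 30 decode to sequence 5, length 24576 and unknown 12288, which would match a full-resolution 128x96 frame if the second value really is a pixel count.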
The audio payload is an MP3 stream, including RIFF WAVE headers.
The video payload is raw video: be16 pixels in 565 RGB format. If the version is "1.0.0", the video is at full resolution. If the version is "3.0.1", the video is at quarter resolution; that is, the video data is 1/4 of the size indicated by the header dimensions and is scaled by two in each direction on playback. Version "3.0.2" seems to indicate a different format, likely sixteenth resolution.
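To make the pixel format concrete, here is a small sketch that expands be16 565 pixels to 8-bit RGB triples, doubles a quarter-resolution frame, and computes the expected payload size per version. It assumes red occupies the top five bits and that scaling is plain pixel doubling; neither detail is confirmed for the watch, and all function names are invented for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def rgb565_be_to_rgb888(data: bytes) -> List[Pixel]:
    """Expand big-endian RGB565 pixels to (r, g, b) tuples in the 0-255 range."""
    pixels = []
    for i in range(0, len(data) - 1, 2):
        value = (data[i] << 8) | data[i + 1]   # big-endian 16-bit pixel
        r5 = (value >> 11) & 0x1F
        g6 = (value >> 5) & 0x3F
        b5 = value & 0x1F
        # Replicate the high bits into the low bits so full scale maps to 255.
        pixels.append(((r5 << 3) | (r5 >> 2),
                       (g6 << 2) | (g6 >> 4),
                       (b5 << 3) | (b5 >> 2)))
    return pixels


def upscale_2x(pixels: List[Pixel], width: int, height: int) -> List[Pixel]:
    """Double a row-major frame in both directions (nearest neighbour)."""
    out: List[Pixel] = []
    for y in range(height):
        row = pixels[y * width:(y + 1) * width]
        doubled = [p for p in row for _ in range(2)]
        out.extend(doubled)   # each source row is emitted twice
        out.extend(doubled)
    return out


def stored_frame_bytes(width: int, height: int, version: str) -> int:
    """Expected video payload size for the header dimensions and version."""
    full = width * height * 2            # two bytes per pixel
    if version == "1.0.0":
        return full
    if version == "3.0.1":
        return full // 4                 # quarter resolution
    raise ValueError("layout for this version is not known")
```

Version "3.0.2" is deliberately rejected rather than guessed at, since its layout is only suspected.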
- Supported video resolutions are: 96x64, 96x80, 96x96, 128x96, 128x128, 160x128, 176x128. Despite the reduction in video resolution, the resulting NXV file size exceeds that of the input video file.
- Intermediate files are used by the NxvConverter program to store the audio and video payloads (filename.mp3 and filename.tmp) prior to muxing to the NXV file.
- All NXV files examined so far have one frame per video packet and the packets appear at regular intervals. Low and Mid quality have one frame per four audio packets; High quality has one frame for every two audio packets.
- The content of the sequence and length values in an audio packet that is not followed by a video packet is unknown; they probably carry nothing meaningful.
- Example file is an encoding of public domain footage from Internet Archive: Moving Image Archive