VMware Video

From MultimediaWiki

VMware Workstation (a commercial x86 virtual machine emulator) can record sessions using this lossless codec. Read the brief story behind the reverse engineering of this codec.

Details of the initial, cursory investigation are in this blog post: http://codecs.multimedia.cx/?p=9

The format is now fully documented.

Bitstream Format

Essentially, it is a recorded session of the RFB protocol used by VNC. In VMware products, the VNC bitstream is generated by an encoder built into the virtual hardware. This allows movies to be recorded without running a VNC server inside the virtual machine (e.g. when recording in VGA mode, or for virtual machines that do not have networking enabled).

Specifically, the bitstream consists of a series of FrameBufferUpdate RFB messages. Special encoding types within these messages describe the image format (equivalent to ServerInitialisation messages). Timestamps are provided by the enclosing container format; everything else is contained in the messages documented here.

VMware's extensions to the RFB protocol are documented below.

Bitstream structure (this is the VNC FrameBufferUpdate format); all multi-byte fields are big-endian:

 8bit   message type (always 0, indicating FrameBufferUpdate)
 8bit   padding
 16bit  number of rectangles coded

For each rectangle:

 16bit  x position
 16bit  y position
 16bit  coded width
 16bit  coded height
 32bit  encoding type
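A minimal C sketch of reading these headers, assuming the whole message is already in memory. The names rd16, rd32, RectHeader, parse_update_header and parse_rect_header are hypothetical. Each rectangle's payload starts right after its 12-byte header, and its length depends on the encoding type.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian readers: all multi-byte RFB fields are big-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

typedef struct {
    uint16_t x, y, width, height;
    uint32_t encoding;
} RectHeader;

/* Parses the 4-byte FrameBufferUpdate header. Returns the number of
   rectangles, or -1 if the buffer is short or the message type is not 0. */
static int parse_update_header(const uint8_t *buf, size_t len) {
    if (len < 4 || buf[0] != 0)
        return -1;
    /* buf[1] is padding */
    return rd16(buf + 2);
}

/* Parses one 12-byte rectangle header. Returns 0 on success, -1 if short. */
static int parse_rect_header(const uint8_t *buf, size_t len, RectHeader *r) {
    if (len < 12)
        return -1;
    r->x = rd16(buf);
    r->y = rd16(buf + 2);
    r->width = rd16(buf + 4);
    r->height = rd16(buf + 6);
    r->encoding = rd32(buf + 8);
    return 0;
}
```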

The encoding type may be any of types 0 to 5 listed in the RFB protocol specification, but current samples show that type 5 (Hextile) is the most common and that type 0 (Raw) occurs occasionally.

Since the VMNC stream is simply a repacketized RFB stream, a decoder should not be surprised to find any rectangle encoding described by the official protocol specification or in this document (for example, ZRLE).
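An RFB rectangle carries no length field, so a decoder must recognise every encoding type it meets in order to find the next rectangle. The sketch below assumes that the VMware extension names used on this page (WMVd to WMVj) are four ASCII characters packed big-endian into the 32-bit encoding type, so that 'WMVd' is 0x574D5664. FOURCC and encoding_name are hypothetical helpers, and ZRLE's number (16) comes from the RFB specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs four characters big-endian (assumed layout of the VMware types). */
#define FOURCC(a, b, c, d)                                  \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |        \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Returns a name for a known encoding type, or NULL if unknown. An unknown
   type means the stream cannot be parsed further, because the payload
   length of the rectangle is encoding-specific. */
static const char *encoding_name(uint32_t enc) {
    switch (enc) {
    case 0: return "Raw";
    case 1: return "CopyRect";
    case 2: return "RRE";
    case 4: return "CoRRE";
    case 5: return "Hextile";
    case 16: return "ZRLE";
    case FOURCC('W', 'M', 'V', 'd'): return "WMVd (cursor data)";
    case FOURCC('W', 'M', 'V', 'e'): return "WMVe (cursor state)";
    case FOURCC('W', 'M', 'V', 'f'): return "WMVf (cursor position)";
    case FOURCC('W', 'M', 'V', 'g'): return "WMVg (keyboard typematic info)";
    case FOURCC('W', 'M', 'V', 'h'): return "WMVh (keyboard LED state)";
    case FOURCC('W', 'M', 'V', 'i'): return "WMVi (display mode change)";
    case FOURCC('W', 'M', 'V', 'j'): return "WMVj (virtual machine errata state)";
    default: return NULL;
    }
}
```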

RFB encoding types

  • 0x0 Raw block
width * height * bits-per-pixel bits of raw picture data
  • 0x1 CopyRect: copy a rectangle from x,y in the previous frame
16bit  x position in previous frame
16bit  y position in previous frame
  • 0x2 RRE (rise-and-run length encoding) encoded data
  • 0x4 CoRRE encoded data
  • 0x5 Hextile encoded data
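Because Hextile is the most common encoding in current samples, here is a minimal sketch of a Hextile rectangle decoder following the tile rules of the RFB specification: 16x16 tiles in row-major order (smaller at the right and bottom edges), one subencoding byte per tile, and background/foreground colors carried over from tile to tile. It assumes pixels are copied as opaque groups of bytes_pp bytes (1, 2 or 4) into a byte framebuffer that uses the stream's own pixel layout. hextile_decode and fill_area are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hextile subencoding bits, as defined by the RFB specification. */
enum {
    HT_RAW = 1,
    HT_BACKGROUND = 2,
    HT_FOREGROUND = 4,
    HT_ANY_SUBRECTS = 8,
    HT_SUBRECTS_COLOURED = 16
};

/* Fills a w x h area at (x, y) with one pixel value of bytes_pp bytes. */
static void fill_area(uint8_t *fb, size_t stride, int bytes_pp, int x, int y,
                      int w, int h, const uint8_t *px) {
    for (int j = 0; j < h; j++)
        for (int i = 0; i < w; i++)
            memcpy(fb + (size_t)(y + j) * stride + (size_t)(x + i) * bytes_pp,
                   px, (size_t)bytes_pp);
}

/* Decodes one Hextile rectangle (rx, ry, rw, rh) from src into fb.
   Returns the number of bytes consumed, or -1 if src is truncated.
   Assumes the rectangle lies inside the framebuffer and bytes_pp <= 4. */
static long hextile_decode(const uint8_t *src, size_t len, uint8_t *fb,
                           size_t stride, int bytes_pp,
                           int rx, int ry, int rw, int rh) {
    size_t pos = 0, psz = (size_t)bytes_pp;
    uint8_t bg[4] = {0}, fg[4] = {0}; /* carried over from tile to tile */

    for (int ty = 0; ty < rh; ty += 16) {
        for (int tx = 0; tx < rw; tx += 16) {
            int tw = rw - tx < 16 ? rw - tx : 16;
            int th = rh - ty < 16 ? rh - ty : 16;
            int x0 = rx + tx, y0 = ry + ty;
            if (pos >= len) return -1;
            uint8_t sub = src[pos++];

            if (sub & HT_RAW) { /* raw tile: the other bits are ignored */
                size_t row = (size_t)tw * psz;
                if (len - pos < row * (size_t)th) return -1;
                for (int j = 0; j < th; j++, pos += row)
                    memcpy(fb + (size_t)(y0 + j) * stride + (size_t)x0 * psz,
                           src + pos, row);
                continue;
            }
            if (sub & HT_BACKGROUND) {
                if (len - pos < psz) return -1;
                memcpy(bg, src + pos, psz);
                pos += psz;
            }
            if (sub & HT_FOREGROUND) {
                if (len - pos < psz) return -1;
                memcpy(fg, src + pos, psz);
                pos += psz;
            }
            fill_area(fb, stride, bytes_pp, x0, y0, tw, th, bg);
            if (!(sub & HT_ANY_SUBRECTS)) continue;

            if (pos >= len) return -1;
            int count = src[pos++];
            for (int k = 0; k < count; k++) {
                const uint8_t *color = fg;
                if (sub & HT_SUBRECTS_COLOURED) { /* per-subrect color */
                    if (len - pos < psz) return -1;
                    color = src + pos;
                    pos += psz;
                }
                if (len - pos < 2) return -1;
                int sx = src[pos] >> 4, sy = src[pos] & 15;
                int sw = (src[pos + 1] >> 4) + 1, sh = (src[pos + 1] & 15) + 1;
                pos += 2;
                if (sx + sw > tw) sw = tw - sx; /* clip malformed subrects */
                if (sy + sh > th) sh = th - sy;
                fill_area(fb, stride, bytes_pp, x0 + sx, y0 + sy, sw, sh, color);
            }
        }
    }
    return (long)pos;
}
```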

WMVd (cursor data)

This block contains the cursor image bits and mask. The rectangle's width and height give the cursor size, and its x,y fields give the position of the cursor hot spot within the cursor image.

WMVd data:

 8bit cursor type - Either 0 to indicate color cursor, or 1 to indicate an alpha cursor.
 8bit padding - ignore

A color cursor type is followed by:

 width*height*bpp bits - cursor bits
 width*height*bpp bits - cursor mask

Color cursors are drawn with a simple AND/XOR operation on each pixel:

 dst[i] = (dst[i] & bits[i]) ^ mask[i];
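A sketch of the rule above applied over a whole cursor, with clipping at the framebuffer edges. It assumes the cursor bits and mask have already been expanded to 32-bit pixels, and that (x, y) is the top-left corner of the cursor image, i.e. the hot-spot position minus the hot-spot offset within the image. draw_color_cursor is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Applies dst = (dst & bits) ^ mask for every cursor pixel that falls
   inside the fb_w x fb_h framebuffer; cursor pixels outside are skipped. */
static void draw_color_cursor(uint32_t *fb, int fb_w, int fb_h,
                              const uint32_t *bits, const uint32_t *mask,
                              int cur_w, int cur_h, int x, int y) {
    for (int j = 0; j < cur_h; j++) {
        int fy = y + j;
        if (fy < 0 || fy >= fb_h) continue;
        for (int i = 0; i < cur_w; i++) {
            int fx = x + i;
            if (fx < 0 || fx >= fb_w) continue;
            uint32_t *dst = &fb[(size_t)fy * fb_w + fx];
            size_t k = (size_t)j * cur_w + i;
            *dst = (*dst & bits[k]) ^ mask[k];
        }
    }
}
```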

An alpha cursor type is followed by:

 width*height*4 bytes of 32-bit RGBA data describing the cursor.

Alpha cursors should be drawn by compositing the cursor image into the framebuffer.
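This page does not say how the alpha channel is to be interpreted, so the sketch below assumes straight (non-premultiplied) alpha with bytes in R, G, B, A order, and a framebuffer of 4-byte pixels whose first three bytes use the same channel order as the cursor; the fourth framebuffer byte is left alone. It composites with the usual "over" operator, rounding to the nearest integer. blend and draw_alpha_cursor are hypothetical names, and (x, y) is the cursor's top-left corner.

```c
#include <stddef.h>
#include <stdint.h>

/* Blends one 8-bit channel: s over d with straight alpha a (0..255). */
static uint8_t blend(uint8_t s, uint8_t d, uint8_t a) {
    return (uint8_t)((s * a + d * (255 - a) + 127) / 255);
}

/* Composites an RGBA cursor over a framebuffer of 4-byte pixels,
   clipping at the framebuffer edges. */
static void draw_alpha_cursor(uint8_t *fb, int fb_w, int fb_h,
                              const uint8_t *rgba, int cur_w, int cur_h,
                              int x, int y) {
    for (int j = 0; j < cur_h; j++) {
        int fy = y + j;
        if (fy < 0 || fy >= fb_h) continue;
        for (int i = 0; i < cur_w; i++) {
            int fx = x + i;
            if (fx < 0 || fx >= fb_w) continue;
            const uint8_t *s = rgba + ((size_t)j * cur_w + i) * 4;
            uint8_t *d = fb + ((size_t)fy * fb_w + fx) * 4;
            for (int c = 0; c < 3; c++)
                d[c] = blend(s[c], d[c], s[3]);
        }
    }
}
```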

WMVe (cursor state)

 16bit   flags

Describes the state of the cursor.

  bit 0x01 - cursor visible.
  bit 0x02 - cursor absolute.  
  bit 0x04 - cursor warp.

If cursor visible is not set, the cursor should not be rendered into the framebuffer until another cursor state packet is received that turns the cursor back on.

The cursor absolute bit indicates whether the virtual machine was using an absolute or relative mouse at the time of the recording session. It is irrelevant for playback.

The cursor warp bit is set when the virtual machine artificially moves the position of the cursor. It is also irrelevant for playback.
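Only the visible bit matters for playback, so a player can track the cursor state as a single boolean. A minimal sketch; the macro, type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CURSOR_VISIBLE  0x01
#define CURSOR_ABSOLUTE 0x02 /* irrelevant for playback */
#define CURSOR_WARP     0x04 /* irrelevant for playback */

typedef struct {
    bool visible;
} CursorState;

/* Applies a WMVe payload: one big-endian 16-bit flags word. The cursor
   stays hidden until a later WMVe packet sets the visible bit again. */
static void apply_cursor_state(CursorState *st, const uint8_t payload[2]) {
    uint16_t flags = (uint16_t)((payload[0] << 8) | payload[1]);
    st->visible = (flags & CURSOR_VISIBLE) != 0;
}
```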

WMVf (cursor position)

This block is empty; its x,y position gives the new position of the cursor hot spot (NOT the top-left corner of the cursor image).
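Since WMVd supplies the hot-spot offset inside the cursor image (in its x,y fields) and WMVf supplies the hot spot's position on screen, the image's top-left corner is their difference. A minimal sketch with hypothetical names:

```c
typedef struct {
    int x, y;
} Point;

/* hotspot_pos: x,y from the latest WMVf block (screen position).
   hotspot_offset: x,y from the latest WMVd block (offset in the image).
   Returns where the top-left corner of the cursor image should be drawn. */
static Point cursor_top_left(Point hotspot_pos, Point hotspot_offset) {
    Point p = {hotspot_pos.x - hotspot_offset.x,
               hotspot_pos.y - hotspot_offset.y};
    return p;
}
```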

WMVg (keyboard typematic info)

This block describes the typematic info for the virtual keyboard device. It is irrelevant for playback.

  16bit   typematic on - 1 if the VNC client should handle key repeat, 0 if not.
  32bit   period - The period to wait between key repeats.
  32bit   delay - The delay for the first key repeat.
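A parsing sketch for this 10-byte payload, in case a player wants to log it rather than just skip it. The units of period and delay are not documented here, so the code keeps the raw values; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t typematic_on; /* 1 if the VNC client should handle key repeat */
    uint32_t period;       /* wait between key repeats (raw value) */
    uint32_t delay;        /* delay before the first repeat (raw value) */
} Typematic;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parses a WMVg payload. Returns bytes consumed (10) or -1 if short. */
static int parse_typematic(const uint8_t *p, size_t len, Typematic *t) {
    if (len < 10)
        return -1;
    t->typematic_on = rd16(p);
    t->period = rd32(p + 2);
    t->delay = rd32(p + 6);
    return 10;
}
```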

WMVh (keyboard LED state)

This block describes the keyboard LED state of the virtual machine. It's irrelevant for playback (unless you *really* want to change the state of the client keyboard when the video is playing back. That might be neat, actually).

  32bit   LED flags.

Each bit indicates whether the LED is on or off.

  bit 0x01 - Scroll lock
  bit 0x02 - Num lock
  bit 0x04 - Caps lock
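A minimal sketch decoding the big-endian LED word into the three documented bits; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool scroll_lock, num_lock, caps_lock;
} Leds;

/* Decodes a 4-byte WMVh payload. */
static Leds parse_leds(const uint8_t p[4]) {
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    Leds l = {(v & 0x01) != 0, (v & 0x02) != 0, (v & 0x04) != 0};
    return l;
}
```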

WMVi (display mode change)

Indicates a change in the display size and pixel format. VMware added this encoding for uses not addressed by the DesktopSize RFB pseudo-encoding: the DesktopSize encoding does not allow the VNC server to redefine both the color depth and the size of the framebuffer. Redefining both is useful when the client prefers to always receive the framebuffer in its native color depth. The payload is defined to resemble the ServerInitialisation header, to ease client implementation.

The new width and height of the display are specified in the rectangle header for the packet. The rest of the format is described below:

 8bit   bits per pixel
 8bit   depth
 8bit   big-endian flag (nonzero if multi-byte pixel values are stored big-endian)
 8bit   true-color flag (nonzero if pixels are true color, i.e. not requiring a palette)
 16bit  maximum value of red
 16bit  maximum value of green
 16bit  maximum value of blue
 8bit   red value shift
 8bit   green value shift
 8bit   blue value shift
 24bit  padding
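A sketch of parsing this 16-byte payload and of using it the way RFB true-color pixel formats are normally used: shift the pixel value right, mask it with the channel maximum, then scale to 8 bits. It assumes the pixel has already been assembled into an integer according to the big-endian flag; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t bits_per_pixel, depth, big_endian, true_color;
    uint16_t red_max, green_max, blue_max;
    uint8_t red_shift, green_shift, blue_shift;
} PixelFormat;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Parses a WMVi payload. Returns bytes consumed (16) or -1 if short. */
static int parse_wmvi(const uint8_t *p, size_t len, PixelFormat *f) {
    if (len < 16)
        return -1;
    f->bits_per_pixel = p[0];
    f->depth = p[1];
    f->big_endian = p[2];
    f->true_color = p[3];
    f->red_max = rd16(p + 4);
    f->green_max = rd16(p + 6);
    f->blue_max = rd16(p + 8);
    f->red_shift = p[10];
    f->green_shift = p[11];
    f->blue_shift = p[12];
    /* p[13..15]: 24 bits of padding */
    return 16;
}

/* Extracts one channel of a true-color pixel and scales it to 0..255.
   Assumes shift < 32, as in any sane pixel format. */
static uint8_t channel8(uint32_t px, uint8_t shift, uint16_t max) {
    if (max == 0)
        return 0;
    uint32_t v = (px >> shift) & max;
    return (uint8_t)(v * 255u / max);
}
```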

This block occurs at the beginning of the file, before every keyframe, and whenever the format of the stream changes (for example, when the virtual machine changes display resolution).

WMVj (virtual machine errata state)

  16bit flags.

These flags describe various pieces of virtual machine state. They are useful in the implementation of a remote VNC client, and are irrelevant for playback. They are documented here for completeness.

  bit 0x01 - an end-user sitting at a local virtual machine console has entered fullscreen mode.
  bit 0x02 - an end-user sitting at a local virtual machine console has temporarily disabled VNC updates.
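Since WMVg, WMVh and WMVj are irrelevant for playback, a player mostly needs to know how many payload bytes to skip after the 12-byte rectangle header. The sketch below tabulates the fixed payload sizes implied by the tables on this page, assuming the four-character encoding types are packed big-endian (so 'WMVd' is 0x574D5664). WMVd and the pixel encodings have data-dependent sizes and are not covered. The names are hypothetical.

```c
#include <stdint.h>

#define FOURCC(a, b, c, d)                                  \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |        \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Returns the fixed payload size in bytes, or -1 if the encoding has a
   variable-size payload or is not a fixed-size VMware extension. */
static long fixed_payload_size(uint32_t enc) {
    switch (enc) {
    case FOURCC('W', 'M', 'V', 'e'): return 2;  /* cursor state flags */
    case FOURCC('W', 'M', 'V', 'f'): return 0;  /* cursor position: empty */
    case FOURCC('W', 'M', 'V', 'g'): return 10; /* 16 + 32 + 32 bits */
    case FOURCC('W', 'M', 'V', 'h'): return 4;  /* LED flags */
    case FOURCC('W', 'M', 'V', 'i'): return 16; /* pixel format */
    case FOURCC('W', 'M', 'V', 'j'): return 2;  /* errata flags */
    default: return -1;
    }
}
```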