RE process
The not so simple introduction to RE'ing of multimedia codecs.
Information gathering
Try to collect as much public knowledge as possible.
- The first thing to do is to collect or create sample files. Without sample files there is nothing to test or verify a reimplementation against.
- The second thing is to collect different decoders for the codec. Sometimes debug symbols are available in one binary but not the other.
- Read the product/codec whitepapers. They are mostly useless but can give a hint of the techniques used in the codec.
- Sometimes looking into frame data can give you enough information without even having a decoder. For example, constant frame sizes usually mean DPCM or similar compression; zlib-deflated data usually starts with 'x' (0x78); MS RLE data can be detected by special codes at the end; some individuals can even recognise an H.26x bitstream.
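The frame-data heuristics from the last bullet can be sketched in C. This is only a rough illustration: the function names are made up for this example, the zlib check follows the header rules of RFC 1950, and the MS RLE check assumes the frame ends with the end-of-bitmap escape (00 01), which not every file is guaranteed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer starts with a plausible zlib header (RFC 1950). */
int looks_like_zlib(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    if (size < 2)
        return 0;
    /* Low nibble of the first byte is the method (8 = deflate),
     * high nibble is the window size code (at most 7). */
    if ((buf[0] & 0x0f) != 8 || (buf[0] >> 4) > 7)
        return 0;
    /* The first two bytes, read as a big-endian 16-bit value,
     * must be a multiple of 31. */
    return ((buf[0] << 8) | buf[1]) % 31 == 0;
}

/* Returns 1 if the buffer ends with the MS RLE end-of-bitmap escape 00 01. */
int ends_like_msrle(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    return size >= 2 && buf[size - 2] == 0x00 && buf[size - 1] == 0x01;
}

/* Returns 1 if at least two frames were seen and all have the same size,
 * which hints at DPCM or a similar fixed-rate scheme. */
int constant_frame_size(const size_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (sizes[i] != sizes[0])
            return 0;
    return n > 1;
}
```

A positive result from any of these is only a hint. Short frames can match by accident, so run the checks over many frames from several sample files before drawing conclusions.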
Get it running
If you get a binary running it is much easier to figure out exactly how it works. Getting it running in a controlled environment is even better.
- MPlayer has a DLL loader which makes it very easy to load ACM and DMO codecs.
- Use the technique described here to load a binary under Linux: http://multimedia.cx/pre/re-xan.html
- Run it on the original platform in some form.
Picking it apart
Load up the codec in a disassembler.
- Get IDA Pro and let it chew through the binary.
- Somehow locate the code paths in the codec that are actually used. http://multimedia.cx/eggs/category/reverse-engineering/callret-monitor/ describes one way.
Hijack code flow
If the codec is running in a deterministic way it is possible to runtime patch the code to replace call addresses. The following macros can be used:
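#include <stdint.h>

/* Overwrite a 32-bit address slot (for example a call table entry) with replacement. */
#define insert_native_addr(address, replacement) \
do { \
    uint32_t *padd = (uint32_t *)(uintptr_t)(address); \
    *padd = (uint32_t)(uintptr_t)(replacement); \
} while (0)

/* Overwrite the 5 bytes at address with a 32-bit x86 "call rel32" to replacement. */
#define insert_native_call(address, replacement) \
do { \
    uint8_t *pcall = (uint8_t *)(uintptr_t)(address); \
    pcall[0] = 0xe8; \
    *(int32_t *)(pcall + 1) = (int32_t)((uintptr_t)(replacement) - (uintptr_t)(address) - 5); \
} while (0)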
When you redirect the functions, make sure you declare your replacement function with the right calling convention.
Alternatively, you can install a debug trap and use it to gather data at any given point of execution. It may be slow, but it certainly replaces debugging with GDB.
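To make the displacement arithmetic behind insert_native_call concrete, here is a small sketch that builds the five bytes of a 32-bit x86 call into an ordinary buffer instead of patching live code. The helper names encode_call_rel32 and decode_call_rel32 are invented for this example, and it assumes 32-bit addresses as in the macros above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the 5-byte encoding of "call rel32" into out.
 * site   - address where the instruction will live in the codec
 * target - address of the replacement function
 */
void encode_call_rel32(uint8_t out[5], uint32_t site, uint32_t target)
{
    assert(out != NULL);
    /* The displacement is relative to the next instruction (site + 5).
     * Unsigned arithmetic wraps modulo 2^32, which gives the right
     * two's complement value for backward calls as well. */
    uint32_t rel = target - site - 5;
    out[0] = 0xe8;                 /* opcode: call rel32 */
    out[1] = (uint8_t)(rel);       /* displacement, little-endian */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Recover the call target from an existing "call rel32" at address site. */
uint32_t decode_call_rel32(const uint8_t in[5], uint32_t site)
{
    assert(in != NULL && in[0] == 0xe8);
    uint32_t rel = (uint32_t)in[1] | ((uint32_t)in[2] << 8) |
                   ((uint32_t)in[3] << 16) | ((uint32_t)in[4] << 24);
    return site + 5 + rel;
}
```

Decoding the original call before overwriting it is a cheap way to record which function a call site used to reach, so the patch can be undone or the original can be called from the replacement.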
Strategy
Most codecs work in an init, decode, close fashion. The init step allocates a codec-private context that is then passed to the decode function. The codec uses the private context to store the internal state it needs for decoding. The close function just cleans up the context. Start by reverse engineering the init function, and use a malloc wrapper to figure out the structure of the private context. Then go for the main decode function. It usually takes the private context, the input data buffer and its size as arguments. From here the progress is open-ended: try to tackle functions both at the end of the decode call tree and at the start. That way knowledge about parameters can propagate through the code.
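One possible shape for the malloc wrapper mentioned above is sketched below. All names (traced_malloc, find_alloc, MAX_TRACKED) are hypothetical. It assumes the codec's allocation calls have already been redirected to traced_malloc, for example with the patching macros from the previous section, and it ignores frees for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 256

struct tracked_alloc {
    uint8_t *base;  /* start of the block */
    size_t   size;  /* size requested by the codec */
};

static struct tracked_alloc tracked[MAX_TRACKED];
static size_t tracked_count;

/* Drop-in malloc replacement that logs and remembers every allocation. */
void *traced_malloc(size_t size)
{
    void *p = malloc(size);
    if (p && tracked_count < MAX_TRACKED) {
        tracked[tracked_count].base = p;
        tracked[tracked_count].size = size;
        fprintf(stderr, "alloc #%zu: %zu bytes at %p\n", tracked_count, size, p);
        tracked_count++;
    }
    return p;
}

/*
 * Map an address seen while tracing the decoder back to the allocation
 * that contains it. Returns the allocation index and stores the byte
 * offset in *offset, or returns -1 if no tracked block contains addr.
 */
int find_alloc(const void *addr, size_t *offset)
{
    assert(offset != NULL);
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < tracked_count; i++) {
        uintptr_t base = (uintptr_t)tracked[i].base;
        if (a >= base && a - base < tracked[i].size) {
            *offset = a - base;
            return (int)i;
        }
    }
    return -1;
}
```

The allocation sizes logged during init give the size of the private context and of any tables hanging off it. During decode, passing a pointer argument to find_alloc tells you which block it points into and at what offset, which is a starting point for naming the fields of the context structure.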
Reimplementation
When all the code is understood, replace it with libavcodec (lavc) equivalents where they exist and reimplement the stuff that is missing.