FFmpeg codec HOWTO: Difference between revisions

From MultimediaWiki
(added a skip_bits step)
(30 intermediate revisions by 12 users not shown)
Line 2: Line 2:
It will also show how the codecs are connected with the demuxers. This is by
no means a complete guide but enough to understand how to add a codec to FFmpeg.
[[RealAudio cook|Cook]] is used as an example throughout.


== registering the codec ==

=== libavcodec/avcodec.h ===
The first thing to look at is the AVCodec struct.


Line 27: Line 29:
pointers for init/encode/decode and close. Now let's see how it is used.

=== libavcodec/cook.c ===
If we look at the bottom of this file we can see this code:


  AVCodec cook_decoder =
  {
     .name           = "cook",
     .type           = CODEC_TYPE_AUDIO,
     .id             = CODEC_ID_COOK,
     .priv_data_size = sizeof(COOKContext),
     .init           = cook_decode_init,
     .close          = cook_decode_close,
     .decode         = cook_decode_frame,
  };


First we define an AVCodec struct named cook_decoder and set its fields. Note that we only set the fields that are needed. Currently there is no encoder, so the encode pointer is left unset. If we now look at the id field we can see that CODEC_ID_COOK isn't defined in libavcodec/cook.c. It is declared in avcodec.h.


=== libavcodec/avcodec.h ===

Here we will find the CodecID enumeration.
Line 61: Line 63:
This is enough to declare a codec. Now we must also register it for internal use. This is done at runtime.

=== libavcodec/allcodecs.c ===
In this file we have the avcodec_register_all() function, which has an entry like this for every codec.


  ...
    REGISTER_DECODER(COOK, cook);
  ...

This macro expands to a register_avcodec() call, which registers a codec for internal use.
Note that register_avcodec() will only be called when CONFIG_COOK_DECODER is defined.
This makes it possible to leave the decoder code for a specific codec out of the build.
But where is it defined? configure derives it from allcodecs.c with this command line:

 sed -n 's/^[^#]*DEC.*, *\(.*\)).*/\1_decoder/p' libavcodec/allcodecs.c

So adding a REGISTER_DECODER(NEW, new) entry in allcodecs.c and rerunning configure is enough to add the needed define. Now we have everything needed to hook up a codec.

=== libavcodec/Makefile ===
In this file we define the objects on which a codec depends. For example, cook uses fft and mdct code so it depends on the mdct.o and fft.o object files as well as the cook.o object file.

 ...
 OBJS-$(CONFIG_COOK_DECODER)            += cook.o mdct.o fft.o
 ...

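The AVCodec struct carries a next pointer, which suggests that registered codecs are chained into a list that can later be searched by id. The following is only a minimal sketch of that idea; every name in it (SketchCodec, sketch_register, sketch_find_decoder and the ids) is hypothetical and it is not the actual register_avcodec() implementation.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the real types; only the
 * fields needed to show the registration idea are kept. */
enum SketchCodecID { SKETCH_ID_NONE, SKETCH_ID_COOK, SKETCH_ID_GLUE };

typedef struct SketchCodec {
    const char *name;
    enum SketchCodecID id;
    struct SketchCodec *next;   /* link to the next registered codec */
} SketchCodec;

static SketchCodec *first_codec = NULL;

/* Append a codec to the end of the list, as a register call might do. */
static void sketch_register(SketchCodec *codec)
{
    SketchCodec **p = &first_codec;
    while (*p)
        p = &(*p)->next;
    codec->next = NULL;
    *p = codec;
}

/* Walk the list and return the codec with the given id, or NULL. */
static SketchCodec *sketch_find_decoder(enum SketchCodecID id)
{
    SketchCodec *c;
    for (c = first_codec; c; c = c->next)
        if (c->id == id)
            return c;
    return NULL;
}

static SketchCodec sketch_cook = { "cook", SKETCH_ID_COOK, NULL };
static SketchCodec sketch_glue = { "glue", SKETCH_ID_GLUE, NULL };
```

After both sketch codecs are registered, a lookup by id returns the matching struct, and an id that was never registered returns NULL. This is the kind of lookup that lets a codec id found by a demuxer be turned into a decoder.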

== FFmpeg demuxer connection ==

: ''[[FFmpeg demuxer howto]]''

=== libavformat/rm.c ===
If we think of an imaginary rm file that ffmpeg is about to process, the first thing that happens is that it is identified
as an rm file. It is passed on to the rm demuxer ([http://svn.mplayerhq.hu/ffmpeg/trunk/libavformat/rmdec.c?view=markup rmdec.c]). The rm demuxer looks through the file and finds out that it is a
cook file.


Line 94: Line 103:
Now ffmpeg knows what codec to init and where to send the payload from the container. So back to cook.c and the initialization process.

== codec code ==

=== libavcodec/cook.c Init ===
After ffmpeg knows what codec to use, it calls the initialization function pointer declared in the codec's AVCodec struct. In
cook.c it is called cook_decode_init. Here we set up as much as we can before we start decoding. The following things should be handled in the init: vlc table initialization, table generation, memory allocation and extradata parsing.


=== libavcodec/cook.c Close ===
The cook_decode_close function is the codec clean-up call. All memory, vlc tables, etc. should be freed here.

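The pairing of init and close can be pictured with plain C allocation calls. This is a minimal sketch under assumptions: SketchContext, sketch_init and sketch_close are hypothetical names, and calloc/free stand in for whatever allocation helpers a real codec would use.

```c
#include <stdlib.h>

/* Hypothetical private context with two buffers set up at init time. */
typedef struct {
    float *sample_buffer;
    short *window;
} SketchContext;

/* Clean-up call: release everything init may have allocated. */
static int sketch_close(SketchContext *ctx)
{
    free(ctx->sample_buffer);   /* free(NULL) is a no-op */
    free(ctx->window);
    ctx->sample_buffer = NULL;
    ctx->window = NULL;
    return 0;
}

/* Init call: allocate zeroed buffers, return 0 on success and -1 on failure. */
static int sketch_init(SketchContext *ctx, size_t nb_samples)
{
    ctx->sample_buffer = calloc(nb_samples, sizeof(*ctx->sample_buffer));
    ctx->window        = calloc(nb_samples, sizeof(*ctx->window));
    if (!ctx->sample_buffer || !ctx->window) {
        sketch_close(ctx);      /* undo a partial init */
        return -1;
    }
    return 0;
}
```

Letting the init function call the close function on failure keeps all of the clean-up logic in one place, so a partially initialized context is released the same way as a fully initialized one.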

=== libavcodec/cook.c Decode ===
In cook.c the name of the decode call is cook_decode_frame.


Line 110: Line 121:


The function has 5 arguments:
* avctx is a pointer to an AVCodecContext
* data is the pointer to the output buffer
* data_size is a variable that should be set to the output buffer size in bytes (this is usually the number of samples decoded * the number of channels * the byte size of a sample)
* buf is the pointer to the input buffer
* buf_size is the byte size of the input buffer

The decode function shall return the number of bytes consumed from the input buffer or -1 in case of an error. If there is no error during decoding, the return value is usually buf_size as buf should only contain one 'frame' of data. Bitstream parsers to split the bitstream into 'frames' used to be part of the codec so a call to the decode function could have consumed less than buf_size bytes from buf. It is now encouraged that bitstream parsers be separate.

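The contract described above (set data_size to the output size in bytes, return the number of input bytes consumed or -1) can be sketched with a toy decoder. Everything here is an assumption made for illustration: sketch_decode_frame is a hypothetical function, the avctx argument is left out, and the "codec" simply expands unsigned 8 bit mono samples to signed 16 bit.

```c
#include <stdint.h>

/* Hypothetical toy "codec": every input byte is an unsigned 8 bit mono
 * sample that is expanded to a signed 16 bit sample. */
static int sketch_decode_frame(void *data, int *data_size,
                               const uint8_t *buf, int buf_size)
{
    int16_t *out = data;
    int i;

    if (buf_size <= 0) {
        *data_size = 0;
        return -1;                  /* error: nothing to decode */
    }

    for (i = 0; i < buf_size; i++)
        out[i] = (int16_t)((buf[i] - 128) * 256);

    /* samples * channels * bytes per sample */
    *data_size = buf_size * 1 * (int)sizeof(int16_t);

    /* buf held exactly one frame, so all of it was consumed */
    return buf_size;
}
```

For a 3 byte input frame this reports 6 bytes of output and returns 3, which shows that the two numbers are independent: one measures the output buffer, the other the input consumed.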



That's how it works without too much detail.

=== The Glue codec template ===
The imaginary Glue audio codec will serve as a base to exhibit bitstream reading, vlc decoding and other things.
The code is purely fictional and is sometimes written purely for the sake of example. No attempt is made to prevent invalid
Line 129: Line 139:
The Glue codec follows.

 /* The following includes have the bitstream reader, various dsp functions and the various defaults */
 #define ALT_BITSTREAM_READER
 #include "avcodec.h"
 #include "bitstream.h"
 #include "dsputil.h"
 
 /* This includes the tables needed for the Glue codec template */
 #include "gluedata.h"
 
 /* Here we declare the struct used for the codec private data */
 typedef struct {
     GetBitContext       gb;
     FFTContext          fft_ctx;
     VLC                 vlc_table;
     MDCTContext         mdct_ctx;
     float*              sample_buffer;
 } GLUEContext;
 
 /* The init function */
 static int glue_decode_init(AVCodecContext *avctx)
 {
     GLUEContext *q = avctx->priv_data;
 
     /* This imaginary codec uses one fft, one mdct and one vlc table. */
     ff_mdct_init(&q->mdct_ctx, 10, 1);    // 2^10 == size of mdct, 1 == inverse mdct
     ff_fft_init(&q->fft_ctx, 9, 1);       // 2^9 == size of fft, 1 == inverse fft
     init_vlc (&q->vlc_table, 9, 24,
            vlctable_huffbits, 1, 1,
            vlctable_huffcodes, 2, 2, 0);  // look in bitstream.h for the meaning of the arguments
 
     /* We also need to allocate a sample buffer */
     q->sample_buffer = av_mallocz(sizeof(float)*1024);  // here we used av_mallocz instead of av_malloc
                                                         // av_mallocz memsets the whole buffer to 0
 
     /* Check if the allocation was successful */
     if(q->sample_buffer == NULL)
         return -1;
 
     /* return 0 for a successful init, -1 for failure */
     return 0;
 }
 
 /* This is the main decode function */
 static int glue_decode_frame(AVCodecContext *avctx,
            void *data, int *data_size,
            uint8_t *buf, int buf_size)
 {
     GLUEContext *q = avctx->priv_data;
     int16_t *outbuffer = data;
 
     /* We know what the arguments for this function are from above
        now we just have to decode this imaginary codec, the made up
        bitstream format is as follows:
        12 bits representing the amount of samples
        1 bit fft or mdct coded coeffs, 1 for fft/0 for mdct
          if fft:  13 bits representing the amount of vlc coded fft data coeffs
          if mdct: 10 bits representing the amount of vlc coded mdct data coeffs
        (...bits representing the coeffs...)
        5 bits of dummy data that should be ignored
        32 bits the hex value 0x12345678, used for integrity check
     */
 
     /* Declare the needed variables */
     int samples, coeffs, i, fft;
     float mdct_tmp[1024];
 
     /* Now we init the bitstream reader, we start at the beginning of the inbuffer */
     init_get_bits(&q->gb, buf, buf_size*8);  //the buf_size is in bytes but we need bits
 
     /* Now we take 12 bits to get the amount of samples the current frame has */
     samples = get_bits(&q->gb, 12);
 
     /* Now we check if we have fft or mdct coeffs */
     fft = get_bits1(&q->gb);
     if (fft) {
         //fft coeffs, get how many
         coeffs = get_bits(&q->gb, 13);
     } else {
         //mdct coeffs, get how many
         coeffs = get_bits(&q->gb, 10);
     }
 
     /* Now decode the vlc coded coeffs to the sample_buffer */
     for (i=0 ; i<coeffs ; i++)
         q->sample_buffer[i] = get_vlc2(&q->gb, q->vlc_table.table, q->vlc_table.bits, 3);  //read about the arguments in bitstream.h
 
     /* Now we need to transform the coeffs to samples */
     if (fft) {
         //The fft is done inplace
         ff_fft_permute(&q->fft_ctx, (FFTComplex *) q->sample_buffer);
         ff_fft_calc(&q->fft_ctx, (FFTComplex *) q->sample_buffer);
     } else {
         //And we pretend that the mdct is also inplace
         ff_imdct_calc(&q->mdct_ctx, q->sample_buffer, q->sample_buffer, mdct_tmp);
     }
 
     /* To make it easy the stream can only be 16 bits mono, so let's convert it to that */
     for (i=0 ; i<samples ; i++)
         outbuffer[i] = (int16_t)q->sample_buffer[i];
 
     /* Report the output size in bytes (mono, 16 bits per sample) */
     *data_size = samples*sizeof(int16_t);
 
     /* Skip the dummy data bits */
     skip_bits(&q->gb, 5);
 
     /* Check if the buffer was consumed ok; get_bits() only handles short reads,
        so the 32 bit value is read with get_bits_long() */
     if (get_bits_long(&q->gb, 32) != 0x12345678) {
         av_log(avctx,AV_LOG_ERROR,"Stream error, integrity check failed!\n");
         return -1;
     }
 
     /* The decision between erroring out or not in case of unexpected data
        should be made so that the output quality is maximized.
        This means that if undamaged data is assumed then unused/reserved values
        should lead to warnings but not failure. (assumption of slightly non compliant
        file)
        OTOH if possibly damaged data is assumed and it is assumed that the original
        did contain specific values in reserved/unused fields then finding unexpected
        values should trigger error concealment code and the decoder/demuxer should
        attempt to resync.
        The decision between these 2 should be made by using
        AVCodecContext.error_recognition unless it's a clear case where only one of
        the 2 makes sense.
     */
 
     /* Return the amount of bytes consumed if everything was ok,
        buf holds exactly one frame so that is buf_size */
     return buf_size;
 }
 
 /* the uninit function, here we just do the inverse of the init */
 static int glue_decode_close(AVCodecContext *avctx)
 {
     GLUEContext *q = avctx->priv_data;
 
     /* Free allocated memory buffer */
     av_free(q->sample_buffer);
 
     /* Free the fft transform */
     ff_fft_end(&q->fft_ctx);
 
     /* Free the mdct transform */
     ff_mdct_end(&q->mdct_ctx);
 
     /* Free the vlc table */
     free_vlc(&q->vlc_table);
 
     /* Return 0 if everything is ok, -1 if not */
     return 0;
 }
 
 AVCodec glue_decoder =
 {
     .name           = "glue",
     .type           = CODEC_TYPE_AUDIO,
     .id             = CODEC_ID_GLUE,
     .priv_data_size = sizeof(GLUEContext),
     .init           = glue_decode_init,
     .close          = glue_decode_close,
     .decode         = glue_decode_frame,
 };
 
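The Glue template relies on the bitstream reader from bitstream.h without showing how such a reader works. The following is a minimal sketch of an MSB-first bit reader, written only to illustrate the idea of pulling 12, 1, 13 and 32 bit fields out of a byte buffer. SketchBitReader and the sketch_* functions are hypothetical and much simpler than the real GetBitContext code.

```c
#include <stdint.h>

/* Hypothetical reader state: a byte buffer plus a bit position. */
typedef struct {
    const uint8_t *buf;
    int size_in_bits;
    int index;              /* position of the next unread bit */
} SketchBitReader;

static void sketch_init_bits(SketchBitReader *s, const uint8_t *buf, int bit_size)
{
    s->buf = buf;
    s->size_in_bits = bit_size;
    s->index = 0;
}

/* Read n bits (1..32), most significant bit first.
 * Bits past the end of the buffer are returned as zero. */
static uint32_t sketch_get_bits(SketchBitReader *s, int n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        uint32_t bit = 0;
        if (s->index < s->size_in_bits)
            bit = (s->buf[s->index >> 3] >> (7 - (s->index & 7))) & 1;
        v = (v << 1) | bit;
        s->index++;
    }
    return v;
}

/* Advance the read position without returning anything. */
static void sketch_skip_bits(SketchBitReader *s, int n)
{
    s->index += n;
}
```

Reading one bit at a time is slow but easy to follow. A frame header in the made-up Glue layout is parsed by calling sketch_get_bits with 12, 1 and 13 (or 10), skipping 5 bits after the coefficients and then reading the 32 bit check value.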
=== troubleshooting ===
 Invalid pixel format '(null)'
 Output pad "default" with type video of the filter instance "Parsed_null_0" of null not connected to any destination
 Error opening filters!

means that the .pix_fmts array in the encoder's description needs to end with AV_PIX_FMT_NONE.

 Assertion *(const AVClass **)avctx->priv_data == codec->priv_class failed at libavcodec/utils.c:1554

means that the first member of your context struct needs to be an AVClass *.

If encoding works but the output just says "0 packets muxed", you need to set the *got_packet return value to 1, or the muxer won't mux what you're giving it.
 
== see also ==
 
* [[FFmpeg demuxer howto]]
* [[FFmpeg programming conventions]]
* [[FFmpeg_filter_HOWTO]]
 
[[Category:FFmpeg Tutorials]]
[[Category:Tutorials]]

Revision as of 16:50, 13 October 2015

This page is meant as an introduction to the internal codec API in FFmpeg. It will also show how the codecs are connected with the demuxers. This is by no means a complete guide but enough to understand how to add a codec to FFmpeg. Cook is used as an example throughout.

registering the codec

libavcodec/avcodec.h

The first thing to look at is the AVCodec struct.

typedef struct AVCodec {
    const char *name;
    enum CodecType type;
    enum CodecID id;
    int priv_data_size;
    int (*init)(AVCodecContext *);
    int (*encode)(AVCodecContext *, uint8_t *buf, int buf_size, void *data);
    int (*close)(AVCodecContext *);
    int (*decode)(AVCodecContext *, void *outdata, int *outdata_size,
                  uint8_t *buf, int buf_size);
    int capabilities;
    struct AVCodec *next;
    void (*flush)(AVCodecContext *);
    const AVRational *supported_framerates; ///array of supported framerates, or NULL if any, array is terminated by {0,0}
    const enum PixelFormat *pix_fmts;       ///array of supported pixel formats, or NULL if unknown, array is terminated by -1
} AVCodec;

Here we can see some members that name the codec and give its type (audio/video), the supported pixel formats, and function pointers for init/encode/decode and close. Now let's see how it is used.

libavcodec/cook.c

If we look in this file at the bottom we can see this code:

AVCodec cook_decoder =
{
    .name           = "cook",
    .type           = CODEC_TYPE_AUDIO,
    .id             = CODEC_ID_COOK,
    .priv_data_size = sizeof(COOKContext),
    .init           = cook_decode_init,
    .close          = cook_decode_close,
    .decode         = cook_decode_frame,
};

First we declare an AVCodec struct named cook_decoder and set its members. Note that we only set the members that are needed; there is currently no Cook encoder, so the encode pointer is left unset. Looking at the id member, we can see that CODEC_ID_COOK isn't defined in libavcodec/cook.c. It is declared in avcodec.h.
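Leaving members out works because of C's designated initializers: any member that is not named is zero-initialized, so an unset function pointer is NULL. A minimal sketch of this, using a cut-down MiniCodec struct that is purely hypothetical and not the real AVCodec:

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for AVCodec; not the real struct. */
typedef struct MiniCodec {
    const char *name;
    int priv_data_size;
    int (*init)(void *ctx);
    int (*encode)(void *ctx);
    int (*decode)(void *ctx);
    int (*close)(void *ctx);
} MiniCodec;

static int demo_init(void *ctx)   { (void)ctx; return 0; }
static int demo_decode(void *ctx) { (void)ctx; return 0; }
static int demo_close(void *ctx)  { (void)ctx; return 0; }

/* Only the members a decoder needs are named; the rest become 0/NULL. */
static const MiniCodec demo_decoder = {
    .name   = "demo",
    .init   = demo_init,
    .close  = demo_close,
    .decode = demo_decode,
};

/* Returns 1 if the codec entry provides an encode function, 0 otherwise. */
static int codec_can_encode(const MiniCodec *c)
{
    return c->encode != NULL;
}
```

Calling codec_can_encode(&demo_decoder) returns 0 here, because .encode was never named in the initializer.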

libavcodec/avcodec.h

Here we will find the CodecID enumeration.

enum CodecID {
...
CODEC_ID_GSM,
CODEC_ID_QDM2,
CODEC_ID_COOK,
CODEC_ID_TRUESPEECH,
CODEC_ID_TTA,
...
};

CODEC_ID_COOK is in the list. This is the list of all codecs supported by FFmpeg. The list is fixed and is used internally to identify every codec, so changing the order would break binary compatibility.
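To see why the order matters, consider what happens to the numeric values when an entry is inserted in the middle of an enum. The enums below are hypothetical and only borrow a few names for illustration:

```c
/* Hypothetical enums for illustration; not FFmpeg's real lists. */

/* The layout an already compiled application was built against. */
enum OldIds { OLD_ID_GSM, OLD_ID_QDM2, OLD_ID_COOK };

/* Inserting a new entry in the middle shifts every later value. */
enum BadIds { BAD_ID_GSM, BAD_ID_NEW, BAD_ID_QDM2, BAD_ID_COOK };

/* Appending at the end keeps all existing values unchanged. */
enum GoodIds { GOOD_ID_GSM, GOOD_ID_QDM2, GOOD_ID_COOK, GOOD_ID_NEW };

/* What a library built with the "bad" layout thinks a given number means. */
static const char *name_in_bad_layout(int id)
{
    switch (id) {
    case BAD_ID_GSM:  return "gsm";
    case BAD_ID_NEW:  return "new";
    case BAD_ID_QDM2: return "qdm2";
    case BAD_ID_COOK: return "cook";
    default:          return "unknown";
    }
}
```

An old binary that passes OLD_ID_COOK (the number 2) to a library using the "bad" layout would be understood as asking for qdm2, which is the kind of breakage the fixed order avoids.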

This is enough to declare a codec. Now we must also register it for internal use, which is done at runtime.

libavcodec/allcodecs.c

In this file we have the avcodec_register_all() function, which has an entry like this for every codec.

...
    REGISTER_DECODER(COOK, cook);
...

This macro expands to a register_avcodec() call, which registers a codec for internal use. Note that register_avcodec() will only be called when CONFIG_COOK_DECODER is defined, which makes it possible to leave the decoder code for a specific codec out of the build. But where is it defined? The list of names is extracted by configure with this command line:

sed -n 's/^[^#]*DEC.*, *\(.*\)).*/\1_decoder/p' libavcodec/allcodecs.c

So adding a REGISTER_DECODER(NEW, new) entry to allcodecs.c and re-running configure is enough to add the needed define. Now we have everything needed to hook up a codec.
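The idea behind conditional registration can be sketched with a tiny stand-alone registry. Everything below (MiniCodec, mini_register, MINI_REGISTER_DECODER and the 0/1 CONFIG_ values) is a hypothetical simplification and not FFmpeg's actual macro or data structures:

```c
#include <stddef.h>

/* Hypothetical miniature registry; a sketch, not FFmpeg's implementation. */
typedef struct MiniCodec {
    const char *name;
    struct MiniCodec *next;
} MiniCodec;

static MiniCodec *first_codec = NULL;

/* Prepend a codec to the global linked list. */
static void mini_register(MiniCodec *c)
{
    c->next = first_codec;
    first_codec = c;
}

/* Stand-ins for the defines that a configure step would generate. */
#define CONFIG_COOK_DECODER 1
#define CONFIG_GLUE_DECODER 0

static MiniCodec cook_decoder = { .name = "cook" };
static MiniCodec glue_decoder = { .name = "glue" };

/* Register x##_decoder only when the matching CONFIG_ value is nonzero. */
#define MINI_REGISTER_DECODER(X, x) \
    do { if (CONFIG_##X##_DECODER) mini_register(&x##_decoder); } while (0)

/* Call once at startup; calling it twice would corrupt the list. */
static void mini_register_all(void)
{
    MINI_REGISTER_DECODER(COOK, cook);
    MINI_REGISTER_DECODER(GLUE, glue);
}

/* Count the registered codecs by walking the list. */
static int mini_count(void)
{
    int n = 0;
    const MiniCodec *c;

    for (c = first_codec; c != NULL; c = c->next)
        n++;
    return n;
}
```

After mini_register_all() only the cook entry is in the list, since the glue decoder is switched off by its CONFIG_ value.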

libavcodec/Makefile

In this file we define the objects on which a codec depends. For example, cook uses fft and mdct code so it depends on the mdct.o and fft.o object files as well as the cook.o object file.

...
OBJS-$(CONFIG_COOK_DECODER)            += cook.o mdct.o fft.o
...

FFmpeg demuxer connection

FFmpeg demuxer howto

libavformat/rm.c

If we think of an imaginary rm file that ffmpeg is about to process, the first thing that happens is that it is identified as an rm file. It is then passed on to the rm demuxer (rmdec.c). The rm demuxer looks through the file and finds out that it is a cook file.

...
} else if (!strcmp(buf, "cook")) {
st->codec->codec_id = CODEC_ID_COOK;
...

Now ffmpeg knows what codec to init and where to send the payload from the container. So back to cook.c and the initialization process.
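The demuxer's job here is essentially a lookup from a tag found in the container to a codec id. A minimal table-driven sketch of that lookup follows; the enum, the table and the "othr" tag are made up for illustration, and only the "cook" string comes from the snippet above:

```c
#include <string.h>

/* Hypothetical ids; not FFmpeg's CodecID enumeration. */
enum MiniCodecID { MINI_ID_NONE, MINI_ID_COOK, MINI_ID_OTHER };

typedef struct TagEntry {
    const char *tag;
    enum MiniCodecID id;
} TagEntry;

static const TagEntry tag_table[] = {
    { "cook", MINI_ID_COOK  },
    { "othr", MINI_ID_OTHER },  /* made-up tag */
};

/* Map a container tag to a codec id, or MINI_ID_NONE if unknown. */
static enum MiniCodecID mini_id_from_tag(const char *tag)
{
    size_t i;

    for (i = 0; i < sizeof(tag_table) / sizeof(tag_table[0]); i++)
        if (!strcmp(tag, tag_table[i].tag))
            return tag_table[i].id;
    return MINI_ID_NONE;
}
```

The rm demuxer snippet above uses an if/else chain instead of a table; the effect is the same.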

codec code

libavcodec/cook.c Init

Once ffmpeg knows what codec to use, it calls the initialization function pointer declared in the codec's AVCodec struct. In cook.c it is called cook_decode_init. Here we set up as much as we can before we start decoding. The following things should be handled in the init: vlc table initialization, table generation, memory allocation and extradata parsing.

libavcodec/cook.c Close

The cook_decode_close function is the codec clean-up call. All memory, vlc tables, etc. should be freed here.

libavcodec/cook.c Decode

In cook.c the name of the decode call is cook_decode_frame.

static int cook_decode_frame(AVCodecContext *avctx,
            void *data, int *data_size,
            uint8_t *buf, int buf_size) {
...

The function has 5 arguments:

  • avctx is a pointer to an AVCodecContext
  • data is the pointer to the output buffer
  • data_size is a variable that should be set to the output buffer size in bytes (this is usually the number of samples decoded * the number of channels * the byte size of a sample)
  • buf is the pointer to the input buffer
  • buf_size is the byte size of the input buffer

The decode function shall return the number of bytes consumed from the input buffer or -1 in case of an error. If there is no error during decoding, the return value is usually buf_size as buf should only contain one 'frame' of data. Bitstream parsers to split the bitstream into 'frames' used to be part of the codec so a call to the decode function could have consumed less than buf_size bytes from buf. It is now encouraged that bitstream parsers be separate.
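The return value convention is easiest to see from the caller's side. The sketch below uses a hypothetical toy decoder (toy_decode_frame, which just copies up to 4 bytes per call) with the same kind of arguments as described above, and a loop that advances through the input by whatever the decoder reports as consumed:

```c
#include <stdint.h>

/* Hypothetical decoder: consumes up to 4 input bytes per call and
   copies them unchanged to the output. */
static int toy_decode_frame(void *outdata, int *outdata_size,
                            const uint8_t *buf, int buf_size)
{
    uint8_t *out = outdata;
    int n, i;

    if (buf_size <= 0)
        return -1;                /* error: nothing to decode */

    n = buf_size < 4 ? buf_size : 4;
    for (i = 0; i < n; i++)
        out[i] = buf[i];

    *outdata_size = n;            /* bytes written to the output buffer */
    return n;                     /* bytes consumed from the input buffer */
}

/* Feed a whole buffer to the decoder, advancing by the return value.
   Returns the total number of output bytes, or -1 on a decode error. */
static int toy_decode_all(uint8_t *out, const uint8_t *buf, int buf_size)
{
    int total = 0;

    while (buf_size > 0) {
        int out_size = 0;
        int used = toy_decode_frame(out + total, &out_size, buf, buf_size);

        if (used < 0)
            return -1;
        buf      += used;
        buf_size -= used;
        total    += out_size;
    }
    return total;
}
```

With a separate bitstream parser the loop would normally run once per frame, because each call is handed exactly one frame and the decoder returns buf_size.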


That's how it works without too much detail.

The Glue codec template

The imaginary Glue audio codec will serve as a base to exhibit bitstream reading, vlc decoding and other things. The code is purely fictional and is sometimes written purely for the sake of example. No attempt is made to prevent invalid data manipulation.

The Glue codec follows.

non-colored version

/* The following includes have the bitstream reader, various dsp functions and the various defaults */
#define ALT_BITSTREAM_READER
#include "avcodec.h"
#include "bitstream.h"
#include "dsputil.h"

/* This includes the tables needed for the Glue codec template */
#include "gluedata.h"


/* Here we declare the struct used for the codec private data */
typedef struct {
    GetBitContext       gb;
    FFTContext          fft_ctx;
    VLC                 vlc_table;
    MDCTContext         mdct_ctx;
    float*              sample_buffer;
} GLUEContext;


/* The init function */
static int glue_decode_init(AVCodecContext *avctx)
{
    GLUEContext *q = avctx->priv_data;

    /* This imaginary codec uses one fft, one mdct and one vlc table. */
    ff_mdct_init(&q->mdct_ctx, 10, 1);    // 2^10 == size of mdct, 1 == inverse mdct
    ff_fft_init(&q->fft_ctx, 9, 1);       // 2^9 == size of fft, 1 == inverse fft
    init_vlc (&q->vlc_table, 9, 24,
           vlctable_huffbits, 1, 1,
           vlctable_huffcodes, 2, 2, 0);  // look in bitstream.h for the meaning of the arguments

    /* We also need to allocate a sample buffer */
    q->sample_buffer = av_mallocz(sizeof(float)*1024);  // here we used av_mallocz instead of av_malloc
                                                        // av_mallocz memsets the whole buffer to 0

    /* Check if the allocation was successful */
    if(q->sample_buffer == NULL)
        return -1;

    /* return 0 for a successful init, -1 for failure */
    return 0;
}


/* This is the main decode function */
static int glue_decode_frame(AVCodecContext *avctx,
           void *data, int *data_size,
           uint8_t *buf, int buf_size)
{
    GLUEContext *q = avctx->priv_data;
    int16_t *outbuffer = data;

    /* We know what the arguments for this function are from above
       now we just have to decode this imaginary codec, the made up
       bitstream format is as follows:
       12 bits representing the amount of samples
       1 bit fft or mdct coded coeffs, 1 for fft/0 for mdct
         if fft:  13 bits representing the amount of vlc coded fft data coeffs
         if mdct: 10 bits representing the amount of vlc coded mdct data coeffs
       (...bits representing the coeffs...)
       5 bits of dummy data that should be ignored
       32 bits the hex value 0x12345678, used for integrity check
    */

    /* Declare the needed variables */
    int samples, coeffs, i, fft;
    float mdct_tmp[1024];

    /* Now we init the bitstream reader, we start at the beginning of the inbuffer */
    init_get_bits(&q->gb, buf, buf_size*8);  //the buf_size is in bytes but we need bits

    /* Now we take 12 bits to get the amount of samples the current frame has */
    samples = get_bits(&q->gb, 12);
    
    /* Now we check if we have fft or mdct coeffs */
    fft = get_bits1(&q->gb);
    if (fft) {
        //fft coeffs, get how many
        coeffs = get_bits(&q->gb, 13);
    } else {
        //mdct coeffs, get how many
        coeffs = get_bits(&q->gb, 10);
    }

    /* Now decode the vlc coded coeffs to the sample_buffer */
    for (i=0 ; i<coeffs ; i++)
        q->sample_buffer[i] = get_vlc2(&q->gb, q->vlc_table.table, q->vlc_table.bits, 3);  //read about the arguments in bitstream.h

    /* Now we need to transform the coeffs to samples */
    if (fft) {
        //The fft is done inplace
        ff_fft_permute(&q->fft_ctx, (FFTComplex *) q->sample_buffer);
        ff_fft_calc(&q->fft_ctx, (FFTComplex *) q->sample_buffer);
    } else {
        //And we pretend that the mdct is also inplace
        ff_imdct_calc(&q->mdct_ctx, q->sample_buffer, q->sample_buffer, mdct_tmp);
    }

    /* To make it easy the stream can only be 16 bits mono, so let's convert it to that */
    for (i=0 ; i<samples ; i++)
        outbuffer[i] = (int16_t)q->sample_buffer[i];

    /* Report the size of the decoded output in bytes */
    *data_size = samples * sizeof(int16_t);

    /* Skip the dummy data bits */
    skip_bits(&q->gb, 5);

    /* Check if the buffer was consumed ok */
    if (get_bits(&q->gb,32) != 0x12345678) {
        av_log(avctx,AV_LOG_ERROR,"Stream error, integrity check failed!\n");
        return -1;
    }

    /* The decision between erroring out or not in case of unexpected data
       should be made so that the output quality is maximized.
       This means that if undamaged data is assumed then unused/reserved values
       should lead to warnings but not failure (assumption of a slightly
       non-compliant file).
       OTOH if possibly damaged data is assumed and it is assumed that the original
       did contain specific values in reserved/unused fields then finding unexpected
       values should trigger error concealment code and the decoder/demuxer should
       attempt to resync.
       The decision between these 2 should be made by using
       AVCodecContext.error_recognition unless it's a clear case where only one of
       the 2 makes sense.
    */


    /* Return the amount of input bytes consumed if everything was ok;
       buf holds exactly one frame, so that is the whole buffer */
    return buf_size;
}


/* the uninit function, here we just do the inverse of the init */ 
static int glue_decode_close(AVCodecContext *avctx)
{
    GLUEContext *q = avctx->priv_data;

    /* Free allocated memory buffer */
    av_free(q->sample_buffer);

    /* Free the fft transform */
    ff_fft_end(&q->fft_ctx);

    /* Free the mdct transform */
    ff_mdct_end(&q->mdct_ctx);

    /* Free the vlc table */
    free_vlc(&q->vlc_table);

    /* Return 0 if everything is ok, -1 if not */
    return 0;
}


AVCodec glue_decoder =
{
    .name           = "glue",
    .type           = CODEC_TYPE_AUDIO,
    .id             = CODEC_ID_GLUE,
    .priv_data_size = sizeof(GLUEContext),
    .init           = glue_decode_init,
    .close          = glue_decode_close,
    .decode         = glue_decode_frame,
};
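The bitstream reading used by glue_decode_frame can be illustrated with a tiny stand-alone bit reader. This is a hypothetical sketch that assumes MSB-first bit order and does no bounds checking; the Mini* names are made up and FFmpeg's GetBitContext is considerably more elaborate:

```c
#include <stdint.h>

/* Hypothetical minimal bit reader; not FFmpeg's implementation. */
typedef struct MiniBits {
    const uint8_t *buf;
    int bit_pos;        /* index of the next unread bit */
    int size_in_bits;   /* total number of bits in buf */
} MiniBits;

static void mini_init_bits(MiniBits *s, const uint8_t *buf, int size_in_bits)
{
    s->buf          = buf;
    s->bit_pos      = 0;
    s->size_in_bits = size_in_bits;
}

/* Read n bits (1..32), most significant bit first. No bounds checking. */
static uint32_t mini_get_bits(MiniBits *s, int n)
{
    uint32_t v = 0;

    while (n-- > 0) {
        int byte  = s->bit_pos >> 3;
        int shift = 7 - (s->bit_pos & 7);

        v = (v << 1) | ((s->buf[byte] >> shift) & 1u);
        s->bit_pos++;
    }
    return v;
}

/* Advance the read position without returning the bits. */
static void mini_skip_bits(MiniBits *s, int n)
{
    s->bit_pos += n;
}

/* Number of bits that have not been read yet. */
static int mini_bits_left(const MiniBits *s)
{
    return s->size_in_bits - s->bit_pos;
}
```

For the bytes 0x12 0x38 0x01 0x40, reading 12 bits, then 1 bit, then 13 bits yields 0x123, 1 and 5, which corresponds to a Glue header with 0x123 samples, the fft flag set and 5 coefficients.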

troubleshooting

Invalid pixel format '(null)'
Output pad "default" with type video of the filter instance "Parsed_null_0" of null not connected to any destination
Error opening filters!

This means the .pix_fmts array in the encoder's AVCodec description needs to end with AV_PIX_FMT_NONE.
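The terminator matters because code that walks the list has no length to go by and simply reads until it finds the sentinel. A minimal sketch with hypothetical names, where MINI_FMT_NONE plays the role of AV_PIX_FMT_NONE:

```c
/* Hypothetical stand-ins; MINI_FMT_NONE plays the role of AV_PIX_FMT_NONE. */
enum MiniPixFmt { MINI_FMT_NONE = -1, MINI_FMT_A, MINI_FMT_B };

/* The list is only usable because its last element is the sentinel. */
static const enum MiniPixFmt demo_pix_fmts[] = {
    MINI_FMT_A, MINI_FMT_B, MINI_FMT_NONE
};

/* Walk the list until the sentinel; returns 1 if fmt is listed, else 0. */
static int mini_fmt_supported(const enum MiniPixFmt *list, enum MiniPixFmt fmt)
{
    for (; *list != MINI_FMT_NONE; list++)
        if (*list == fmt)
            return 1;
    return 0;
}
```

Without the trailing MINI_FMT_NONE the loop would run past the end of the array, which is how garbage such as a '(null)' format name can show up.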

Assertion *(const AVClass **)avctx->priv_data == codec->priv_class failed at libavcodec/utils.c:1554

This means the first member of your context struct needs to be an AVClass *.
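The assertion reads the first pointer-sized slot of the private data and compares it with the codec's class pointer, which only works if the class pointer really is the first member. A sketch of the same check with hypothetical MiniClass and GoodContext types:

```c
/* Hypothetical stand-in for AVClass. */
typedef struct MiniClass {
    const char *class_name;
} MiniClass;

/* Correct layout: the class pointer is the first member. */
typedef struct GoodContext {
    const MiniClass *cls;
    int bitrate;
} GoodContext;

static const MiniClass demo_class = { "demo" };

/* Generic code only has a void * to the private data, so it reads the
   first member through a cast and expects the class pointer there. */
static int mini_check_priv(const void *priv_data, const MiniClass *expected)
{
    return *(const MiniClass *const *)priv_data == expected;
}
```

If bitrate were declared before cls, the same cast would read the wrong bytes and the comparison would fail, which is what the assertion in the log reports.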

If encoding works but the output just says "0 packets muxed", you need to set *got_packet to 1 in the encode function, or the muxer won't mux what you are giving it.
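The got_packet convention can be sketched as follows. ToyPacket, toy_encode and toy_count_muxed are hypothetical and much simpler than the real encode callback; only the idea of reporting output through *got_packet is taken from the note above:

```c
/* Hypothetical toy packet; stands in for a real packet struct. */
typedef struct ToyPacket {
    int size;
} ToyPacket;

/* Toy encode callback: returns 0 on success and reports through
   *got_packet whether pkt actually holds output. */
static int toy_encode(ToyPacket *pkt, int frame_samples, int *got_packet)
{
    *got_packet = 0;
    if (frame_samples <= 0)
        return 0;                /* nothing produced, but not an error */

    pkt->size   = frame_samples * 2;
    *got_packet = 1;             /* tell the caller pkt is valid */
    return 0;
}

/* Caller side: only packets flagged through got_packet are counted. */
static int toy_count_muxed(const int *frames, int nb_frames)
{
    int i, muxed = 0;

    for (i = 0; i < nb_frames; i++) {
        ToyPacket pkt = { 0 };
        int got_packet = 0;

        if (toy_encode(&pkt, frames[i], &got_packet) < 0)
            return -1;
        if (got_packet)
            muxed++;
    }
    return muxed;
}
```

An encoder that fills the packet but forgets to set *got_packet would leave the counter at 0 in this sketch, mirroring the "0 packets muxed" symptom.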

see also