FFmpeg audio API: Difference between revisions

Revision as of 04:19, 3 January 2008

This page is for discussing a rework of the FFmpeg audio API to meet the requirements of today's audio codecs.

Features needed

Generalized channel mixing (SIMD optimized) - users should be able to set their own channel mixing coefficients.
Codec-alterable channel mixing coefficients - the codec should be able to set and update the channel mixing coefficients at runtime (DCA supports this feature, maybe AC-3 also).
Output channel request function - specify the number of output channels; by default, streams with more than 2 channels should be mixed down to 2 channels.
Channel reordering - currently there are different orders depending on the codec.
SIMD optimized interleaving
Allow planar output - don't duplicate the interleaving code in every codec
Support bit depths other than 16-bit - 8-bit/24-bit/32-bit/float
Channel selection - ability to access one channel from a multichannel stream
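
As an illustration of how generalized channel mixing, planar output, and interleaving fit together, here is a minimal scalar sketch (no SIMD). The function names mix_planar_float and interleave_float_to_s16, the row-major coefficient layout, and the float sample format are assumptions made for this example; none of them are part of FFmpeg or of the proposal on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mix planar float input into planar float output.
 * coeffs is row-major: coeffs[o * in_ch + i] is the gain applied to
 * input channel i when building output channel o (layout is an assumption). */
static void mix_planar_float(const float *const *in, size_t in_ch,
                             float *const *out, size_t out_ch,
                             const float *coeffs, size_t nb_samples)
{
    assert(in && out && coeffs);
    for (size_t o = 0; o < out_ch; o++) {
        for (size_t n = 0; n < nb_samples; n++) {
            float acc = 0.0f;
            for (size_t i = 0; i < in_ch; i++)
                acc += coeffs[o * in_ch + i] * in[i][n];
            out[o][n] = acc;
        }
    }
}

/* Hypothetical helper: interleave planar float channels into signed 16-bit
 * samples (c0 c1 ... c0 c1 ...), clipping values outside [-1.0, 1.0). */
static void interleave_float_to_s16(const float *const *in, size_t channels,
                                    int16_t *dst, size_t nb_samples)
{
    assert(in && dst);
    for (size_t n = 0; n < nb_samples; n++) {
        for (size_t c = 0; c < channels; c++) {
            float v = in[c][n] * 32768.0f;
            if (v > 32767.0f)
                v = 32767.0f;
            if (v < -32768.0f)
                v = -32768.0f;
            dst[n * channels + c] = (int16_t)v;
        }
    }
}
```

Keeping the mixer planar and doing the interleave as a separate final step is what would let a single interleaving routine be shared by every codec, which is the point of the "allow planar output" item.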

Feature wish list

Dolby Pro Logic Surround Sound decoding (Pro Logic and Pro Logic II).
Add a better FFT routine. (Would the KISS implementation be a good candidate?)
Fixed point MDCT/FFT implementations
Custom audio filter support. (Basing it on the video filter API ideas?)
Proper API for enabling SIMD optimized code.
Create (or port) additional pre-process and post-process audio filters:
- Psychoacoustic audio processing
- Artificial reverberation
- Audio re-sampler (sample rate converter) filter
  - Possible re-sampler source is SRC (Secret Rabbit Code), http://www.mega-nerd.com/SRC/
Create an SDK (Software Development Kit) with templates for the A/V filter APIs

Current ideas

Threads with previous discussions on the subject:

http://thread.gmane.org/gmane.comp.video.ffmpeg.devel/47485/focus=48097
- Several ideas from this thread have already been implemented.
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-November/038323.html
- Discussion of general ideas and requirements for the new API.

Mixing templates

struct codec_mix_struct

/** This struct holds the possible stream channel configurations and the possible output configurations.
 *  The code will have a table of these structs to define all the channel configurations it supports.
 *  This table will be passed to the ff_mix_init function; the init will search through the table
 *  for a matching configuration and load the appropriate mixing coeffs.
 */
typedef struct av_codec_mix_struct {
    unsigned int inchannels;            ///< number of channels in the input stream
    unsigned int outchannels;           ///< number of channels in the requested output stream
    unsigned int stream_channel_mask;   ///< channel mask for the input stream
    unsigned int out_channel_mask;      ///< channel mask for the output data
    int8_t* mixing_coeff_index_matrix;  ///< mixing matrix that corresponds to the mixing configuration:
                                        ///< a table with inchannels*outchannels index elements; a negative index means that the mixing coeff should be negated.
                                        ///< For example (simplified) [1,2] would mean coeff[1]+coeff[2] while [1,-2] would mean coeff[1]-coeff[2].
} av_codec_mix_struct;
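
The signed-index convention described in the comment above can be sketched as follows. The helper names resolve_coeff and expand_index_matrix are hypothetical, and the coefficient table is assumed to hold floats. One consequence of using the sign of the index is that index 0 cannot be negated; this sketch assumes slot 0 of the coefficient table is reserved for a gain of 0.0 (channel not used), which the page does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn one signed index into an actual gain.
 * idx > 0  -> +coeffs[idx]
 * idx < 0  -> -coeffs[-idx]
 * idx == 0 -> coeffs[0] (assumed to be 0.0, i.e. "channel not used") */
static float resolve_coeff(const float *coeffs, int8_t idx)
{
    assert(coeffs);
    return idx < 0 ? -coeffs[-idx] : coeffs[idx];
}

/* Hypothetical helper: expand a whole mixing_coeff_index_matrix
 * (inchannels * outchannels entries) into plain gains, so that the
 * inner mixing loop does not have to test the sign for every sample. */
static void expand_index_matrix(const int8_t *index_matrix,
                                const float *coeffs,
                                unsigned int inchannels,
                                unsigned int outchannels,
                                float *gains)
{
    assert(index_matrix && gains);
    for (unsigned int k = 0; k < inchannels * outchannels; k++)
        gains[k] = resolve_coeff(coeffs, index_matrix[k]);
}
```

With a coefficient table of {0.0, 1.0, 0.7071}, the index row [1, -2] resolves to the gains 1.0 and -0.7071, which matches the coeff[1]-coeff[2] example in the struct comment.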

struct AVMIXContext

/** Main AVMIX context
 *
 */
typedef struct AVMIXContext {
    unsigned int inchannels;            ///< number of channels in the input stream
    unsigned int outchannels;           ///< number of channels in the requested output stream
    void* inchannel[MAX_MIX_CHANNELS];  ///< pointers to the input channels in channel mask order
    void* outchannel[MAX_MIX_CHANNELS]; ///< pointers to the output channels in channel mask order
} AVMIXContext;

function ff_mix_init

/** Initialization routine for the libavcodec multichannel audio mixer
 *
 * The multichannel mixer does not know the "position" of the speakers, and it does not need to.
 * Depending on the mixing matrix, however, it will implicitly reorder channels to the native order.
 *
 * @param[in,out] mix
 * The mixing context. It will hold all the information needed to perform mixing.
 * If the passed argument is NULL, a context will be allocated; otherwise the passed
 * context will be reinitialized. The mix context is of fixed size and will be large enough to support
 * MAX_MIX_CHANNELS channels.
 *
 * @param[in] inchannels
 * Number of input channels, set by the input stream. This value will be stored in the mixing context.
 *
 * @param[in] outchannels
 * Number of output channels, set by the user. This value will be stored in the mixing context.
 *
 * @param[in] stream_channel_mask
 * Channel configuration of the input stream, one of the configurations the codec can have. This info is
 * taken from the input stream and converted to a channel mask.
 *
 * @param[in] out_channel_mask
 * The output channel configuration selected by the user.
 *
 * @param[in] mix_table
 * Table of av_codec_mix_struct entries.
 *
 * @param[in] mixing_coeffs_table
 * Table with mixing coeffs; this is the table that the mixing_coeff_index_matrix refers to. It is declared
 * as void* to allow for a future addition of fixed point mixing.
 *
 * @return
 * The init looks up a matching mixing configuration using the input and output channel masks.
 * Returns 1 if a matching configuration is found, 0 otherwise.
 */
int ff_mix_init(AVMIXContext* mix, unsigned int inchannels, unsigned int outchannels, unsigned int stream_channel_mask,
                unsigned int out_channel_mask, av_codec_mix_table* mix_table, void* mixing_coeffs_table);

select_mixing_matrix

/** Function to get the appropriate mixing_coeff_index_matrix.
 *
 * @param[in] inchannels
 * Number of input channels, set by the input stream.
 *
 * @param[in] outchannels
 * Number of output channels, set by the user.
 *
 * @param[in] stream_channel_mask
 * Channel configuration of the input stream, one of the configurations the codec can have. This info is
 * taken from the input stream and converted to a channel mask.
 *
 * @param[in] out_channel_mask
 * The output channel configuration selected by the user.
 *
 * @param[in] mix_table
 * Table of av_codec_mix_struct entries.
 *
 * @return
 * The mixing_coeff_index_matrix if the configuration is found in the mix_table, NULL otherwise.
 */
int8_t* select_mixing_matrix(unsigned int inchannels, unsigned int outchannels, unsigned int stream_channel_mask,
                             unsigned int out_channel_mask, av_codec_mix_table* mix_table);
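
The table lookup described above could look roughly like the sketch below. Several things here are assumptions, because the page leaves them open: av_codec_mix_table is not defined anywhere on this page and is treated as an alias for av_codec_mix_struct; the table is assumed to end with a sentinel entry whose inchannels is 0, since the prototype carries no table length; and the function is named select_mixing_matrix_sketch to make clear it is not an existing FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct av_codec_mix_struct {
    unsigned int inchannels;
    unsigned int outchannels;
    unsigned int stream_channel_mask;
    unsigned int out_channel_mask;
    const int8_t *mixing_coeff_index_matrix;
} av_codec_mix_struct;

/* Assumption: the table type is simply an array of the struct above. */
typedef av_codec_mix_struct av_codec_mix_table;

/* Linear search for an entry whose channel counts and masks all match.
 * Assumption: the table is terminated by an entry with inchannels == 0.
 * Returns the entry's index matrix, or NULL if nothing matches. */
static const int8_t *select_mixing_matrix_sketch(unsigned int inchannels,
                                                 unsigned int outchannels,
                                                 unsigned int stream_channel_mask,
                                                 unsigned int out_channel_mask,
                                                 const av_codec_mix_table *mix_table)
{
    if (!mix_table)
        return NULL;

    for (const av_codec_mix_table *e = mix_table; e->inchannels != 0; e++) {
        if (e->inchannels          == inchannels &&
            e->outchannels         == outchannels &&
            e->stream_channel_mask == stream_channel_mask &&
            e->out_channel_mask    == out_channel_mask) {
            /* A matching entry without a matrix is a table authoring error. */
            assert(e->mixing_coeff_index_matrix != NULL);
            return e->mixing_coeff_index_matrix;
        }
    }
    return NULL;
}
```

Under these assumptions, ff_mix_init could call this lookup and return 1 when the result is non-NULL and 0 otherwise, which matches the return values documented for it.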
