FFmpeg audio API

This page is for discussion regarding the reworking of the FFmpeg audio API to accommodate the requirements needed for today's audio codecs.

Reasons why an audio API is needed in FFmpeg

FFmpeg's already well-known libavcodec module has become the de facto standard library for video decoding and encoding in free software projects. Unfortunately, no similar standard library has surfaced for audio/video filtering, or for otherwise working with audio/video streams once they have been decoded. Various multimedia projects (such as MPlayer, Xine, GStreamer, VirtualDub, etc.) have implemented their own filter systems with varying degrees of success. What is needed is a high quality audio and video filter API: efficient, flexible enough to meet all the requirements that have led various projects to invent their own filter systems, and yet easy to use and to develop new filters with. This proposal is to implement a high quality audio API and filter library for FFmpeg, where it can easily be used by other multimedia-related software projects.

To Do

  • Decide what will be implemented as functions in the public API and what will be implemented as a filter in the Libavfilter framework

Features needed

  • Generalized channel mixing (SIMD optimized) - users should be able to set their own channel mixing coefficients (see the sketch after this list).
  • Codec alterable channel mixing coefficients - the codec should be able to set and update the channel mixing coefficients during runtime (DCA and AC-3 support this).
  • Output channel request function - specify the number of desired output channels. The decoder may or may not be able to grant the request. If not, a general mixing filter should be used.
  • Distinguish between number of coded channels, requested channels, and output channels. Demuxers and/or parsers would only need to set the number of coded channels.
  • Channel reordering - currently there are different orders depending on the codec and/or container.
  • SIMD optimized interleaving
  • Allow planar output - don't duplicate the interleaving code in every codec
  • Support bit depths other than 16-bit - 8-bit/24-bit/32-bit/float
  • Channel selection - ability to access one channel from a multichannel stream
  • S/PDIF passthrough support
  • Decide on a name for such an A/V filter API.
    • libavmunge, or simply extending the existing libavfilter, have been suggested so far.
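
As a rough illustration of the "generalized channel mixing" item above, here is a minimal, non-SIMD sketch of mixing with user-supplied coefficients. The function name, the planar float buffers and the row-major matrix layout are assumptions chosen for illustration only; they are not part of any existing FFmpeg API.

/* Hypothetical helper: mix nb_samples planar float samples from in_ch input
 * channels to out_ch output channels using a user-supplied, row-major
 * coefficient matrix of out_ch*in_ch entries. A SIMD-optimized version would
 * replace the inner loops. */
static void mix_planar_float(float **dst, float **src, const float *matrix,
                             int out_ch, int in_ch, int nb_samples)
{
    for (int o = 0; o < out_ch; o++) {
        for (int s = 0; s < nb_samples; s++) {
            float sum = 0.0f;
            for (int i = 0; i < in_ch; i++)
                sum += matrix[o * in_ch + i] * src[i][s];
            dst[o][s] = sum;
        }
    }
}

A 5.1-to-stereo downmix would then pass a 2x6 matrix, for example with 0.707 for the center and surround contributions.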

Feature wish list

Warning: This is not an official wish list. Before implementing any of these items, ask first on the ffmpeg-devel mailing list.

  • Dolby Pro Logic Surround Sound decoding (Prologic 1 and Prologic 2).
  • Add a better FFT routine. (Would the KISS implementation be a good candidate?)
  • Fixed point MDCT/FFT implementations
  • Custom audio filter support. (Basing it on the video filter API ideas?)
  • Proper API for enabling SIMD optimized code.
  • Create (or port) additional pre-process and post-process audio filters:
    • Psychoacoustic audio processing
    • Artificial reverberation
  • Create an SDK (Software Development Kit) with templates for the A/V filter APIs
  • Replace AVCODEC_MAX_AUDIO_FRAME_SIZE with a run-time calculated buffer size that is tailored to the selected encoder/decoder, as sketched below. Currently, user applications must supply FFmpeg with an input/output buffer of AVCODEC_MAX_AUDIO_FRAME_SIZE bytes, irrespective of whether the codec will read/write that many samples.
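
The following is a minimal sketch of what such a run-time calculated size could look like. The function name and the idea of deriving the size from a decoder-reported maximum frame length are assumptions for illustration, not an existing libavcodec interface.

#include <stddef.h>

/* Hypothetical replacement for the fixed AVCODEC_MAX_AUDIO_FRAME_SIZE:
 * derive the needed output buffer size from per-stream parameters.
 * max_frame_samples would have to be reported by the selected codec. */
static size_t audio_out_buffer_size(int max_frame_samples, int channels,
                                    int bytes_per_sample)
{
    return (size_t)max_frame_samples * channels * bytes_per_sample;
}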

Current ideas

Threads with previous discussions on the subject:

Proposal for internal audio mixing api/system

ff_mix usage

To set up the mixing you need an AVMIXContext to hold all the settings. After that, just initialize the context with the proper mixing coefficient tables and channel layouts.

AVMIXContext *mix = NULL;
stream_dwChannelMask = SPEAKER_FRONT_LEFT|SPEAKER_FRONT_RIGHT|SPEAKER_FRONT_CENTER|SPEAKER_LOW_FREQUENCY|SPEAKER_BACK_LEFT|SPEAKER_BACK_RIGHT;
out_dwChannelMask = SPEAKER_FRONT_LEFT|SPEAKER_FRONT_RIGHT;
result = ff_mix_init(mix, 6, 2, stream_dwChannelMask, out_dwChannelMask, codec_mixing_table, mixing_coeffs_table);

After this, and if the result from ff_mix_init is 1, set the pointers to the input and output buffers. This is also where the channels can be reordered if needed.

mix->inchannel[0]  = codec->channel[0];
mix->inchannel[1]  = codec->channel[1];
mix->inchannel[2]  = codec->channel[2];
mix->inchannel[3]  = codec->channel[3];
mix->inchannel[4]  = codec->channel[4];
mix->inchannel[5]  = codec->channel[5];
mix->outchannel[0] = codec->outchannel[0];
mix->outchannel[1] = codec->outchannel[1];

Now everything should be set up properly. To mix the buffers, just call:

result = ff_mix(mix);

If the result is 0, something was initialized incorrectly.


struct av_codec_mix_struct

/** This struct holds the possible stream channel configurations and the possible output configurations.
 *  The codec will have a table of these structs to define all the channel configurations it supports.
 *  This table is passed to the ff_mix_init function, and the init will search through it
 *  for a matching configuration and load the appropriate mixing coeffs.
 */
typedef struct av_codec_mix_struct {
    unsigned int inchannels;            ///< number of channels in the input stream
    unsigned int outchannels;           ///< number of channels in the requested output stream
    unsigned int stream_channel_mask;   ///< channel mask for the input stream
    unsigned int out_channel_mask;      ///< channel mask for the output data
    int8_t *mixing_coeff_index_matrix;  ///< mixing matrix that corresponds to this mixing configuration.
                                        ///< Table with inchannels*outchannels index elements; a negative index means that the mixing coeff should be negated.
                                        ///< For example (simplified), [1,2] would mean coeff[1]+coeff[2] while [1,-2] would mean coeff[1]-coeff[2]. A worked example follows below.
} av_codec_mix_struct;
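
As a worked example of the index-matrix encoding above, here is what a 5.1-to-stereo entry could look like. The 1-based indices, the use of 0 for "no contribution", the row-per-output-channel layout and the 0.707 gains are illustrative assumptions; the exact conventions would have to be fixed by the final API.

/* Hypothetical 5.1 (FL FR FC LFE BL BR) -> stereo table entry, assuming
 * coeff[1] = 1.0 and coeff[2] = 0.707 in the codec's mixing_coeffs_table,
 * and index 0 meaning "channel not mixed into this output":
 *   L = FL + 0.707*FC + 0.707*BL
 *   R = FR + 0.707*FC + 0.707*BR */
static int8_t downmix_5p1_to_stereo[2 * 6] = {
 /* FL  FR  FC  LFE BL  BR */
     1,  0,  2,  0,  2,  0,   /* front left  output */
     0,  1,  2,  0,  0,  2,   /* front right output */
};

static const av_codec_mix_struct example_mix_table[] = {
    { 6, 2,
      SPEAKER_FRONT_LEFT|SPEAKER_FRONT_RIGHT|SPEAKER_FRONT_CENTER|
      SPEAKER_LOW_FREQUENCY|SPEAKER_BACK_LEFT|SPEAKER_BACK_RIGHT,
      SPEAKER_FRONT_LEFT|SPEAKER_FRONT_RIGHT,
      downmix_5p1_to_stereo },
};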

struct AVMIXContext

/** Main AVMIX context
 *
 */
typedef struct AVMIXContext {
    unsigned int inchannels;            ///< number of channels in the input stream
    unsigned int outchannels;           ///< number of channels in the requested output stream
    void *inchannel[MAX_MIX_CHANNELS];  ///< pointers to the input channels, in channel-mask order
    void *outchannel[MAX_MIX_CHANNELS]; ///< pointers to the output channels, in channel-mask order
} AVMIXContext;

function ff_mix_init

/** Initialization routine for the libavcodec multichannel audio mixer
 *
 * The multichannel mixer does not know the "position" of the speakers, and it does not need to. However,
 * depending on the mixing matrix, it will implicitly reorder channels to the native order.
 *
 * @param[in,out] mix
 * This is the actual mixing context. It will hold all the information needed to perform the mixing.
 * If the passed argument is NULL a context will be allocated; if not NULL the passed context will be
 * reinitialized. The mix context is of fixed size and will be large enough to support MAX_MIX_CHANNELS
 * channels.
 *
 * @param[in] inchannels
 * Number of input channels; this is set by the input stream. This value will be stored in the mixing context.
 *
 * @param[in] outchannels
 * Number of output channels; this is set by the user. This value will be stored in the mixing context.
 *
 * @param[in] stream_channel_mask
 * This parameter describes the channel configuration of the codec. This info is taken from
 * the input stream and converted to a channel mask.
 *
 * @param[in] out_channel_mask
 * This mask contains the user-selected output channel configuration.
 *
 * @param[in] mix_table
 * Table of av_codec_mix_struct's.
 *
 * @param[in] mixing_coeffs_table
 * Table with mixing coeffs; it is this table that the mixing_coeff_index_matrix refers to. It is declared as void* to
 * make a future addition of fixed point mixing possible.
 *
 * @return
 * The init will do a lookup for a matching mixing configuration with the help of the in and out channel masks.
 * If there is no matching configuration, return 0; otherwise return 1.
 */
int ff_mix_init(AVMIXContext* mix, unsigned int inchannels, unsigned int outchannels, unsigned int stream_channel_mask,
                unsigned int out_channel_mask, av_codec_mix_struct* mix_table, void* mixing_coeffs_table);
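
For illustration, a possible body for ff_mix_init under the declarations above is sketched below. Only the case of a caller-provided (non-NULL) context is handled, and the selected index matrix and coefficient table are assumed to be stored in context fields that do not appear in the AVMIXContext sketch above.

/* Hypothetical ff_mix_init() body: look up the configuration and record the
 * basic parameters in the context. The allocation path for a NULL context
 * and the storage of the matrix/coeff pointers are omitted here. */
int ff_mix_init(AVMIXContext* mix, unsigned int inchannels, unsigned int outchannels,
                unsigned int stream_channel_mask, unsigned int out_channel_mask,
                av_codec_mix_struct* mix_table, void* mixing_coeffs_table)
{
    int8_t *matrix;

    if (!mix)
        return 0;   /* allocation of a new context is not shown in this sketch */

    matrix = select_mixing_matrix(inchannels, outchannels,
                                  stream_channel_mask, out_channel_mask, mix_table);
    if (!matrix)
        return 0;   /* no matching configuration in the codec's table */

    mix->inchannels  = inchannels;
    mix->outchannels = outchannels;
    /* mix->matrix = matrix; mix->coeffs = mixing_coeffs_table;  (assumed fields) */
    return 1;
}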


select_mixing_matrix

/** Function to get the appropriate mixing_coeff_index_matrix.
 *
 *
 * @param[in] inchannels
 * Number of input channels, as set by the input stream.
 *
 * @param[in] outchannels
 * Number of output channels, as requested by the user.
 *
 * @param[in] stream_channel_mask
 * Channel mask describing the channel configuration of the input stream.
 *
 * @param[in] out_channel_mask
 * Channel mask containing the user-selected output channel configuration.
 *
 * @param[in] mix_table
 * Table of av_codec_mix_struct's.
 *
 * @return
 * A mixing_coeff_index_matrix if the configuration could be found in the mix_table, NULL if not.
 */
int8_t* select_mixing_matrix(unsigned int inchannels, unsigned int outchannels, unsigned int stream_channel_mask,
                             unsigned int out_channel_mask, av_codec_mix_struct* mix_table);
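
A minimal sketch of how the table lookup could work; the end-of-table convention (an entry with inchannels == 0) and the exact matching rule are assumptions, not part of the proposal above.

/* Hypothetical lookup: walk the codec's table until an entry matches both
 * channel counts and channel masks. The table is assumed to be terminated
 * by an entry with inchannels == 0. */
int8_t* select_mixing_matrix(unsigned int inchannels, unsigned int outchannels,
                             unsigned int stream_channel_mask,
                             unsigned int out_channel_mask,
                             av_codec_mix_struct* mix_table)
{
    for (const av_codec_mix_struct *e = mix_table; e->inchannels; e++) {
        if (e->inchannels          == inchannels &&
            e->outchannels         == outchannels &&
            e->stream_channel_mask == stream_channel_mask &&
            e->out_channel_mask    == out_channel_mask)
            return e->mixing_coeff_index_matrix;
    }
    return NULL;
}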


/** Function to perform the mixing of the audio in the input pointers to the output pointers.
 *
 * This function should be called to initiate the mixing of the source channels to the destination channels.
 *
 * @return
 * If something went bad (NULL pointers for src or dst, etc.) return 0; if everything is OK return 1.
 */
int ff_mix(AVMIXContext* mix);
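
For illustration, a possible core loop for ff_mix is sketched below. It assumes float samples and that the context also carries the selected index matrix, the coefficient table and a per-call sample count; none of those fields appear in the AVMIXContext sketch above, so treat them as placeholders.

/* Hypothetical mixing core: for every sample, build each output channel as a
 * signed sum of coefficient-weighted input channels, following the
 * index-matrix encoding described for av_codec_mix_struct. */
static void mix_samples(AVMIXContext *mix, const int8_t *matrix,
                        const float *coeffs, int nb_samples)
{
    for (int s = 0; s < nb_samples; s++) {
        for (unsigned int o = 0; o < mix->outchannels; o++) {
            float sum = 0.0f;
            for (unsigned int i = 0; i < mix->inchannels; i++) {
                int idx = matrix[o * mix->inchannels + i];
                if (idx > 0)
                    sum += coeffs[idx] * ((const float *)mix->inchannel[i])[s];
                else if (idx < 0)
                    sum -= coeffs[-idx] * ((const float *)mix->inchannel[i])[s];
            }
            ((float *)mix->outchannel[o])[s] = sum;
        }
    }
}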

dwChannelMask

This is the channel mask used in WAV (the WAVEFORMATEXTENSIBLE dwChannelMask), which could be reused in FFmpeg with some additions.

#define SPEAKER_FRONT_LEFT             0x00000001
#define SPEAKER_FRONT_RIGHT            0x00000002
#define SPEAKER_FRONT_CENTER           0x00000004
#define SPEAKER_LOW_FREQUENCY          0x00000008
#define SPEAKER_BACK_LEFT              0x00000010
#define SPEAKER_BACK_RIGHT             0x00000020
#define SPEAKER_FRONT_LEFT_OF_CENTER   0x00000040
#define SPEAKER_FRONT_RIGHT_OF_CENTER  0x00000080
#define SPEAKER_BACK_CENTER            0x00000100
#define SPEAKER_SIDE_LEFT              0x00000200
#define SPEAKER_SIDE_RIGHT             0x00000400
#define SPEAKER_TOP_CENTER             0x00000800
#define SPEAKER_TOP_FRONT_LEFT         0x00001000
#define SPEAKER_TOP_FRONT_CENTER       0x00002000
#define SPEAKER_TOP_FRONT_RIGHT        0x00004000
#define SPEAKER_TOP_BACK_LEFT          0x00008000
#define SPEAKER_TOP_BACK_CENTER        0x00010000
#define SPEAKER_TOP_BACK_RIGHT         0x00020000
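
For example, the 5.1 stream from the ff_mix usage section above would set the mask below, and the number of coded channels then follows from the number of set bits. This is only an illustration of how the WAV mask could be reused.

unsigned int dwChannelMask = SPEAKER_FRONT_LEFT   | SPEAKER_FRONT_RIGHT |
                             SPEAKER_FRONT_CENTER | SPEAKER_LOW_FREQUENCY |
                             SPEAKER_BACK_LEFT    | SPEAKER_BACK_RIGHT;

/* Count the set bits to recover the number of coded channels (6 here). */
int channels = 0;
for (unsigned int m = dwChannelMask; m; m >>= 1)
    channels += m & 1;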
