


How Video Syncs


PTS and DTS

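Fortunately, both the audio and video streams carry information about how fast and when they should be played. Audio streams have a sample rate, and video streams have a frames-per-second value. However, if we synced the video simply by counting frames and multiplying by the frame duration, it could drift out of sync with the audio. Instead, packets read from the stream may carry a decoding timestamp (DTS) and a presentation timestamp (PTS). To understand these two values, you need to know how video is stored. Some formats, such as MPEG, use B frames (B stands for bidirectional). The other two kinds are I frames and P frames. An I frame contains a complete image. A P frame depends on earlier I and P frames and stores only the difference from them. A B frame is like a P frame, but it depends on frames displayed both before and after it. This is why a call to avcodec_decode_video2 may not produce a finished frame.
Suppose a movie's frames are displayed in the order I B B P. We need the information in the P frame before we can display either B frame, so the frames may be stored in the order I P B B. This is why each frame has both a decoding timestamp and a presentation timestamp. The DTS tells us when to decode a frame, and the PTS tells us when to display it. In this case the stream might look like this: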
PTS: 1 4 2 3
DTS: 1 2 3 4
Stream: I P B B
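Generally the PTS and DTS differ only when the stream contains B frames. When we get a packet from av_read_frame(), it contains the PTS and DTS values for the data inside that packet. But what we really want is the PTS of our newly decoded raw frame, so that we know when to display it.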
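To make the I P B B example concrete, here is a minimal sketch that takes the four frames in stored (DTS) order and sorts them into display (PTS) order. DemoFrame and display_order are hypothetical names invented for this illustration; they are not FFmpeg types or functions, and a real decoder does this reordering internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a decoded frame; not an FFmpeg type. */
typedef struct {
    char type; /* 'I', 'P' or 'B' */
    int  pts;  /* position in display order */
    int  dts;  /* position in decode (stored) order */
} DemoFrame;

/* qsort comparator: ascending PTS. */
static int by_pts(const void *a, const void *b) {
    const DemoFrame *fa = a;
    const DemoFrame *fb = b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorders frames in place from decode order to display order and
 * writes the resulting frame types into out (n chars plus a NUL). */
void display_order(DemoFrame *frames, size_t n, char *out) {
    size_t i;
    assert(frames != NULL && out != NULL);
    qsort(frames, n, sizeof *frames, by_pts);
    for (i = 0; i < n; i++) {
        out[i] = frames[i].type;
    }
    out[n] = '\0';
}
```

Feeding it the stream from the table above, { I(pts 1), P(pts 4), B(pts 2), B(pts 3) }, gives the frame types back in the order I B B P.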


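Now, it's all well and good to know when we're supposed to show a particular video frame, but how do we actually do so? Here's the idea: after we show a frame, we figure out when the next frame should be shown. Then we set a timer, and when it expires we fire a refresh event to redraw the video. As you might expect, we compare the PTS of the next frame against the system clock to decide how long the timer should be. This approach works, but there are two issues to deal with.
1. The first issue is knowing what the next PTS will be. You might think we can just add the frame duration (one over the frame rate) to the current PTS, and that is mostly right. However, some kinds of video call for frames to be repeated, which means we are supposed to show the current frame a certain number of times. Ignoring this could cause the program to display the next frame too soon, so we need to account for it.
2. The second issue is that, as the program stands now, the video and the audio each chug along on their own without syncing at all. We wouldn't have to worry if everything worked perfectly, but your computer isn't perfect, and many video files aren't either. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (such as your computer's clock). For now, we're going to sync the video to the audio.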

Coding it: getting the frame PTS

So now we've got our PTS all set. 
Now we've got to take care of the two synchronization problems we talked about above. 
We're going to define a function called  synchronize_video  that will update the PTS to be in sync with everything. 
This function will also finally deal with cases where we don't get a PTS value for our frame. 
At the same time we need to keep track of when the next frame is expected so we can set our refresh rate properly. 
We can accomplish this by using an internal video_clock value which keeps track of how much time has passed according to the video. 
We add this value to our big struct.
typedef struct VideoState {
  double          video_clock; // pts of last decoded frame / predicted pts of next decoded frame
  /* ... the other fields are unchanged ... */
} VideoState;

Here's the synchronize_video function, which is pretty self-explanatory:


double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {
  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}

You'll notice we account for repeated frames in this function, too.
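As a worked example of the repeat adjustment, here is a small sketch with the same arithmetic pulled out into a pure function. advance_video_clock is a hypothetical helper written for this illustration, and the 0.04 second frame duration used in the note below is just an assumed 25 fps value.

```c
#include <assert.h>

/* Hypothetical helper: returns the predicted PTS (in seconds) of the next
 * frame. frame_duration is the nominal time per frame; each unit of
 * repeat_pict adds half a frame duration, as in synchronize_video. */
double advance_video_clock(double video_clock, double frame_duration,
                           int repeat_pict) {
    double frame_delay;
    assert(frame_duration > 0.0 && repeat_pict >= 0);
    frame_delay = frame_duration;
    frame_delay += repeat_pict * (frame_duration * 0.5);
    return video_clock + frame_delay;
}
```

With a clock at 1.0 s and a frame duration of 0.04 s, a normal frame moves the clock to about 1.04 s, while a frame with repeat_pict equal to 1 moves it to about 1.06 s.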


Now let's get our proper PTS and queue up the frame using queue_picture, adding a new pts argument:


    // Did we get a video frame?
    if(frameFinished) {
      pts = synchronize_video(is, pFrame, pts);
      if(queue_picture(is, pFrame, pts) < 0) {
        break;
      }
    }

The only thing that changes about  queue_picture  is that we save that pts value to the VideoPicture structure that we queue up.

So we have to add a pts variable to the struct and add a line of code:
typedef struct VideoPicture {
  ...
  double pts;
} VideoPicture;

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }
}

So now we've got pictures lining up onto our picture queue with proper PTS values, so let's take a look at our video refreshing function.

You may recall from last time that we just faked it and put a refresh of 80ms. 
Well, now we're going to find out how to actually figure it out.
Our strategy is going to be to predict the time of the next PTS by simply measuring the time between the previous pts and this one. 
At the same time, we need to sync the video to the audio. 
We're going to make an audio clock: an internal value that keeps track of what position the audio we're playing is at. 
It's like the digital readout on any mp3 player. 
Since we're synching the video to the audio, the video thread uses this value to figure out if it's too far ahead or too far behind.
We'll get to the implementation later; 
for now let's assume we have a  get_audio_clock  function that will give us the time on the audio clock. 
Once we have that value, though, what do we do if the video and audio are out of sync? 
It would be silly to simply try to leap to the correct packet through seeking or something. 
Instead, we're just going to adjust the value we've calculated for the next refresh: 
if the PTS is too far behind the audio time, we simply refresh as quickly as possible; 
if the PTS is too far ahead of the audio time, we double our calculated delay. 
Now that we have our adjusted refresh time, or delay, we're going to compare that with our computer's clock by keeping a running frame_timer. 
This frame timer will sum up all of our calculated delays while playing the movie. 
In other words, this frame_timer is  what time it should be when we display the next frame.  
We simply add the new delay to the frame timer, compare it to the time on our computer's clock, and use that value to schedule the next refresh. 
This might be a bit confusing, so study the code carefully:
void video_refresh_timer(void *userdata) {
  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;

  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
        /* if incorrect delay, use previous one */
        delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account
         FFPlay still doesn't "know if this is the best guess." */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
        if(diff <= -sync_threshold) {
          delay = 0;
        } else if(diff >= sync_threshold) {
          delay = 2 * delay;
        }
      }
      is->frame_timer += delay;
      /* compute the REAL delay */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
        /* Really it should skip the picture instead */
        actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);

      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
        is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}

There are a few checks we make:

First, we make sure that the delay between the current PTS and the previous PTS makes sense. 
If it doesn't, we just guess and use the last delay. 
Next, we make sure we have a synch threshold, because things are never going to be perfectly in synch.
ffplay uses 0.01 for its value. 
We also make sure that the synch threshold is never smaller than the gap between PTS values. 
Finally, we make the minimum refresh value 10 milliseconds; as the comment in the code admits, the picture should really be skipped instead.
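The first three checks can be tried out in isolation. The sketch below copies the decision logic of video_refresh_timer into a pure function so it can be called with plain numbers; sync_delay and the DEMO_ constants are hypothetical names for this illustration, with the constants set to the same values as AV_SYNC_THRESHOLD and AV_NOSYNC_THRESHOLD in the listing.

```c
#include <assert.h>

#define DEMO_SYNC_THRESHOLD   0.01 /* same value as AV_SYNC_THRESHOLD */
#define DEMO_NOSYNC_THRESHOLD 10.0 /* same value as AV_NOSYNC_THRESHOLD */

/* Hypothetical helper: returns the delay (in seconds) to add to the frame
 * timer, given the current and previous PTS, the previous delay and the
 * audio clock, all in seconds. */
double sync_delay(double pts, double last_pts, double last_delay,
                  double audio_clock) {
    double delay, diff, abs_diff, threshold;
    assert(last_delay >= 0.0);

    delay = pts - last_pts;
    if (delay <= 0.0 || delay >= 1.0) {
        delay = last_delay;          /* implausible gap: reuse the last delay */
    }
    diff = pts - audio_clock;        /* > 0: video is ahead of the audio */
    abs_diff = diff < 0.0 ? -diff : diff;
    threshold = delay > DEMO_SYNC_THRESHOLD ? delay : DEMO_SYNC_THRESHOLD;

    if (abs_diff < DEMO_NOSYNC_THRESHOLD) {
        if (diff <= -threshold) {
            delay = 0.0;             /* video is behind: show next frame at once */
        } else if (diff >= threshold) {
            delay = 2.0 * delay;     /* video is ahead: wait twice as long */
        }
    }
    return delay;
}
```

For a frame at 1.00 s that follows one at 0.96 s, an audio clock of 1.00 s leaves the delay at about 0.04 s, an audio clock of 1.20 s (video behind) drops it to 0, and an audio clock of 0.80 s (video ahead) doubles it to about 0.08 s.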
We added a bunch of variables to the big struct so don't forget to check the code. 
Also, don't forget to initialize the frame timer and the initial previous frame delay in stream_component_open:
is->frame_timer = (double)av_gettime() / 1000000.0;
is->frame_last_delay = 40e-3;

Synching: The Audio Clock

Now it's time for us to implement the audio clock.

We can update the clock time in our audio_decode_frame function, which is where we decode the audio. 
Now, remember that we don't always process a new packet every time we call this function, so there are two places we have to update the clock at. 
The first place is where we get the new packet: we simply set the audio clock to the packet's PTS. 
Then, if a packet has multiple frames, we keep the audio clock up to date by counting the bytes we have decoded and dividing by the number of bytes played per second (bytes per sample times channels times the sample rate).
So once we have the packet: 
    /* if update, update the audio clock w/pts */
    if(pkt->pts != AV_NOPTS_VALUE) {
      is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
    }

And once we are processing the packet:

      /* Keep audio_clock up-to-date */
      pts = is->audio_clock;
      *pts_ptr = pts;
      n = 2 * is->audio_st->codec->channels;
      is->audio_clock += (double)data_size /
        (double)(n * is->audio_st->codec->sample_rate);
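The amount added to the clock is just the playing time of the decoded bytes. A small sketch of that calculation, assuming interleaved 16-bit samples as the snippet does (pcm16_duration is a hypothetical helper name):

```c
#include <assert.h>

/* Hypothetical helper: seconds of audio held in data_size bytes of
 * interleaved 16-bit PCM. */
double pcm16_duration(int data_size, int channels, int sample_rate) {
    int bytes_per_frame;
    assert(data_size >= 0 && channels > 0 && sample_rate > 0);
    bytes_per_frame = 2 * channels; /* 2 bytes per 16-bit sample */
    return (double)data_size / (double)(bytes_per_frame * sample_rate);
}
```

For stereo audio at 44100 Hz, one second is 176400 bytes, so a 4096-byte buffer advances the clock by roughly 0.023 s.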

A few fine details: the signature of the function has changed to include pts_ptr, so make sure you change that.

pts_ptr is a pointer we use to tell audio_callback the pts of the audio packet. 
This will be used next time for synchronizing the audio with the video.
Now we can finally implement our get_audio_clock function. 
It's not as simple as getting the is->audio_clock value, though. 
Notice that we set the audio PTS every time we process it, but if you look at the audio_callback function, it takes time to move all the data from our audio packet into our output buffer. 
That means that the value in our audio clock could be too far ahead. 
So we have to check how much we have left to write. 
Here's the complete code:
double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;

  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  if(is->audio_st) {
    /* read the stream only after the NULL check; 2 bytes per 16-bit sample */
    n = is->audio_st->codec->channels * 2;
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}
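Here is the same correction with plain numbers instead of the big struct, as a sketch. audio_clock_now is a hypothetical helper, and 16-bit samples are assumed as in the function above.

```c
#include <assert.h>

/* Hypothetical helper: audio_clock is the time at which the decoded buffer
 * will have finished playing; the bytes not yet copied out of the buffer
 * are converted to seconds and subtracted from it. */
double audio_clock_now(double audio_clock, unsigned buf_size,
                       unsigned buf_index, int channels, int sample_rate) {
    unsigned unplayed;
    int bytes_per_sec;
    assert(buf_index <= buf_size && channels > 0 && sample_rate > 0);

    unplayed = buf_size - buf_index;
    bytes_per_sec = sample_rate * channels * 2; /* 2 bytes per 16-bit sample */
    return audio_clock - (double)unplayed / bytes_per_sec;
}
```

With the clock at 10.0 s and half of a one-second stereo 44100 Hz buffer (88200 of 176400 bytes) still unplayed, the position actually being heard is 9.5 s.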


#include "stdafx.h"
#ifdef TUTORIAL_05
// tutorial05.c
// A pedagogical video player that really works!
// This tutorial was written by Stephen Dranger (dranger@gmail.com).
// Code based on FFplay, Copyright (c) 2003 Fabrice Bellard,
// and a tutorial by Martin Bohme (boehme@inb.uni-luebeckREMOVETHIS.de)
// Tested on Gentoo, CVS version 5/01/07 compiled with GCC 4.1.1
// Use the Makefile to build all the samples.
// Run using
// tutorial05 myvideofile.mpg
// to play the video.

extern "C" {
#include "libavutil/avstring.h"
#include "libavutil/mathematics.h"
#include "libavutil/pixdesc.h"
#include "libavutil/imgutils.h"
#include "libavutil/dict.h"
#include "libavutil/parseutils.h"
#include "libavutil/samplefmt.h"
#include "libavutil/avassert.h"
#include "libavutil/time.h"
#include "libavformat/avformat.h"
#include "libavdevice/avdevice.h"
#include "libswscale/swscale.h"
#include "libavutil/opt.h"
#include "libavcodec/avfft.h"
#include "libswresample/swresample.h"

#include "SDL1.2/SDL.h"
#include "SDL1.2/SDL_thread.h"
}

#pragma comment(lib, "avcodec.lib")
#pragma comment(lib, "avformat.lib")
#pragma comment(lib, "avutil.lib")
#pragma comment(lib, "avdevice.lib")
#pragma comment(lib, "avfilter.lib")
#pragma comment(lib, "postproc.lib")
#pragma comment(lib, "swresample.lib")
#pragma comment(lib, "swscale.lib")
#pragma comment(lib, "SDL.lib")

#ifdef __MINGW32__
#undef main /* Prevents SDL from overriding main() */
#endif

#include <stdio.h>
#include <math.h>

#define SDL_AUDIO_BUFFER_SIZE               1024
#define MAX_AUDIO_FRAME_SIZE                192000

#define MAX_AUDIOQ_SIZE                     (5 * 16 * 1024)
#define MAX_VIDEOQ_SIZE                     (5 * 256 * 1024)

#define AV_SYNC_THRESHOLD                   0.01
#define AV_NOSYNC_THRESHOLD                 10.0

#define FF_ALLOC_EVENT                      (SDL_USEREVENT)
#define FF_REFRESH_EVENT                    (SDL_USEREVENT + 1)
#define FF_QUIT_EVENT                       (SDL_USEREVENT + 2)

#define VIDEO_PICTURE_QUEUE_SIZE            1

// BD
int         g_iIndex_video_pkt = 0;
// ED

typedef struct PacketQueue {
  AVPacketList *first_pkt, *last_pkt;
  int nb_packets;
  int size;
  SDL_mutex *mutex;
  SDL_cond *cond;
} PacketQueue;

typedef struct VideoPicture {
  SDL_Overlay *bmp;
  int width, height; /* source height & width */
  int allocated;
  double pts;
  // BD
  AVPictureType type;
  int iIndex;
  // ED
} VideoPicture;

typedef struct VideoState {
  AVFormatContext *pFormatCtx;
  int             videoStream, audioStream;

  // audio
  double          audio_clock;
  AVStream        *audio_st;
  PacketQueue     audioq;
  AVFrame         audio_frame;
  uint8_t         audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
  unsigned int    audio_buf_size;
  unsigned int    audio_buf_index;
  AVPacket        audio_pkt;
  uint8_t         *audio_pkt_data;
  int             audio_pkt_size;
  int             audio_hw_buf_size;
  double          frame_timer;
  double          frame_last_pts;
  double          frame_last_delay;

  // video
  double          video_clock; ///<pts of last decoded frame / predicted pts of next decoded frame
  AVStream        *video_st;
  PacketQueue     videoq;
  VideoPicture    pictq[VIDEO_PICTURE_QUEUE_SIZE];
  int             pictq_size, pictq_rindex, pictq_windex;
  SDL_mutex       *pictq_mutex;
  SDL_cond        *pictq_cond;
  SDL_Thread      *parse_tid;
  SDL_Thread      *video_tid;

  char            filename[1024];
  int             quit;

  AVIOContext     *io_context;
  struct SwsContext *sws_ctx;
} VideoState;

SDL_Surface     *screen;

/* Since we only have one decoding thread, the Big Struct
can be global in case we need it. */
VideoState *global_video_state;

struct SwrContext *swr_ctx;
DECLARE_ALIGNED(16, uint8_t, audio_buf2)[MAX_AUDIO_FRAME_SIZE * 4];

static inline double rint(double x)
{
  return x >= 0 ? floor(x + 0.5) : ceil(x - 0.5);
}

void packet_queue_init(PacketQueue *q) {
  memset(q, 0, sizeof(PacketQueue));
  q->mutex = SDL_CreateMutex();
  q->cond = SDL_CreateCond();
}

int packet_queue_put(PacketQueue *q, AVPacket *pkt) {
  AVPacketList *pkt1;

  if( av_dup_packet(pkt) < 0 ) {
    return -1;
  }
  pkt1 = (AVPacketList *)av_malloc(sizeof(AVPacketList));
  if( !pkt1 ) {
    return -1;
  }
  pkt1->pkt = *pkt;
  pkt1->next = NULL;

  SDL_LockMutex(q->mutex);
  if( !q->last_pkt ) {
    q->first_pkt = pkt1;
  } else {
    q->last_pkt->next = pkt1;
  }
  q->last_pkt = pkt1;
  q->nb_packets ++;
  q->size += pkt1->pkt.size;
  SDL_CondSignal(q->cond);
  SDL_UnlockMutex(q->mutex);
  return 0;
}

static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)
{
  AVPacketList *pkt1;
  int ret;

  SDL_LockMutex(q->mutex);
  for( ; ; ) {
    if( global_video_state->quit ) {
      ret = -1;
      break;
    }
    pkt1 = q->first_pkt;
    if( pkt1 ) {
      q->first_pkt = pkt1->next;
      if( !q->first_pkt ) {
        q->last_pkt = NULL;
      }
      q->nb_packets --;
      q->size -= pkt1->pkt.size;
      *pkt = pkt1->pkt;
      av_free(pkt1);
      ret = 1;
      break;
    } else if( !block ) {
      ret = 0;
      break;
    } else {
      SDL_CondWait(q->cond, q->mutex);
    }
  }
  SDL_UnlockMutex(q->mutex);
  return ret;
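}

double get_audio_clock(VideoState *is)
{
  double pts;
  int hw_buf_size, bytes_per_sec, n;

  // time at which the current audio buffer will have finished playing
  pts = is->audio_clock; /* maintained in the audio thread */
  // bytes of the current audio buffer that have not been played yet
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  if( is->audio_st ) {
    // number of bytes needed for one second of audio
    // (audio_st is read only after the NULL check)
    n = is->audio_st->codec->channels * 2;
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  // (double)hw_buf_size / bytes_per_sec is the time still needed to play the rest of the buffer;
  // subtracting it from pts gives the current timestamp
  if( bytes_per_sec ) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;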
}

int audio_decode_frame(VideoState *is, double *pts_ptr)
{
  int len1, data_size = 0, n;
  AVPacket *pkt = &is->audio_pkt;
  double pts;

  for( ; ; ) {
    while( is->audio_pkt_size > 0 ) {
      int got_frame;
      len1 = avcodec_decode_audio4(is->audio_st->codec, &is->audio_frame, &got_frame, pkt);
      if( len1 < 0 ) {
        /* if error, skip frame */
        is->audio_pkt_size = 0;
        break;
      }
      if( got_frame ) {
        AVCodecContext* aCodecCtx = is->audio_st->codec;
        uint64_t dec_channel_layout =
          (aCodecCtx->channel_layout && aCodecCtx->channels == av_get_channel_layout_nb_channels(aCodecCtx->channel_layout)) ?
          aCodecCtx->channel_layout : av_get_default_channel_layout(aCodecCtx->channels);
        AVSampleFormat tgtFmt = AV_SAMPLE_FMT_S16;
        if( aCodecCtx->sample_fmt != tgtFmt ) {
          // resampling is needed
          if( swr_ctx == NULL ) {
            swr_ctx = swr_alloc();
            swr_ctx = swr_alloc_set_opts(swr_ctx,
              dec_channel_layout, tgtFmt, aCodecCtx->sample_rate,
              dec_channel_layout, aCodecCtx->sample_fmt, aCodecCtx->sample_rate, 0, NULL);
            if( !swr_ctx || swr_init(swr_ctx) < 0 ) {
              assert(false);
            }
          }
          if( swr_ctx ) {
            const uint8_t **in = (const uint8_t **)is->audio_frame.extended_data;
            uint8_t *out[] = {audio_buf2};
            int out_count = sizeof(audio_buf2) / aCodecCtx->channels / av_get_bytes_per_sample(aCodecCtx->sample_fmt);
            int len2 = swr_convert(swr_ctx, out, out_count, in, is->audio_frame.nb_samples);
            if( len2 < 0 ) {
              LogPrintfA("swr_convert() failed\n");
              break;
            }
            if( len2 == out_count ) {
              LogPrintfA("warning: audio buffer is probably too small\n");
              swr_init(swr_ctx);
            }
            data_size = len2 * aCodecCtx->channels * av_get_bytes_per_sample(tgtFmt);
            memcpy(is->audio_buf, audio_buf2, data_size);
          }
        } else {
          // no resampling needed
          data_size = av_samples_get_buffer_size(NULL,
            aCodecCtx->channels,
            is->audio_frame.nb_samples,
            aCodecCtx->sample_fmt,
            1);
          assert(data_size <= is->audio_buf_size);
          memcpy(is->audio_buf, is->audio_frame.data[0], data_size);
        }
      }
      is->audio_pkt_data += len1;
      is->audio_pkt_size -= len1;
      if( data_size <= 0 ) {
        /* No data yet, get more frames */
        continue;
      }
      pts = is->audio_clock;
      *pts_ptr = pts;
      // 2 is the number of bytes per sample for 16-bit audio; change it for other sample sizes.
      // This computes how long the current audio buffer takes to play.
      n = 2 * is->audio_st->codec->channels;
      is->audio_clock += (double)data_size /
        (double)(n * is->audio_st->codec->sample_rate);
      //LogPrintf(_T("is->audio_clock: %f, plus: %f\n"), is->audio_clock, (double)data_size / (double)(n * is->audio_st->codec->sample_rate) );

      /* We have data, return it and come back for more later */
      return data_size;
    }
    if( pkt->data ) {
      av_free_packet(pkt);
    }
    if( is->quit ) {
      return -1;
    }
    /* next packet */
    if( packet_queue_get(&is->audioq, pkt, 1) < 0 ) {
      return -1;
    }
    is->audio_pkt_data = pkt->data;
    is->audio_pkt_size = pkt->size;
    /* if update, update the audio clock w/pts */
    if( pkt->pts != AV_NOPTS_VALUE ) {
      is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
    }
  }
}

void audio_callback(void *userdata, Uint8 *stream, int len)
{
  VideoState *is = (VideoState *)userdata;
  int len1, audio_size;
  double pts;

  while( len > 0 ) {
    if(is->audio_buf_index >= is->audio_buf_size) {
      /* We have already sent all our data; get more */
      audio_size = audio_decode_frame(is, &pts);
      if( audio_size < 0 ) {
        /* If error, output silence */
        is->audio_buf_size = 1024;
        memset(is->audio_buf, 0, is->audio_buf_size);
      } else {
        is->audio_buf_size = audio_size;
      }
      is->audio_buf_index = 0;
    }
    len1 = is->audio_buf_size - is->audio_buf_index;
    if( len1 > len ) {
      len1 = len;
    }
    memcpy(stream, (uint8_t *)is->audio_buf + is->audio_buf_index, len1);
    len -= len1;
    stream += len1;
    is->audio_buf_index += len1;
  }
}

static Uint32 sdl_refresh_timer_cb(Uint32 interval, void *opaque)
{
  SDL_Event event;
  event.type = FF_REFRESH_EVENT;
  event.user.data1 = opaque;
  SDL_PushEvent(&event);
  return 0; /* 0 means stop timer */
}

/* schedule a video refresh in 'delay' ms */
static void schedule_refresh(VideoState *is, int delay)
{
  SDL_AddTimer(delay, sdl_refresh_timer_cb, is);
}

void video_display(VideoState *is)
{
  SDL_Rect rect;
  VideoPicture *vp;
  //AVPicture pict;
  float aspect_ratio;
  int w, h, x, y;
  //int i;

  vp = &is->pictq[is->pictq_rindex];
  if( vp->bmp ) {
    if(is->video_st->codec->sample_aspect_ratio.num == 0) {
      aspect_ratio = 0;
    } else {
      aspect_ratio = av_q2d(is->video_st->codec->sample_aspect_ratio) *
        is->video_st->codec->width / is->video_st->codec->height;
    }
    if( aspect_ratio <= 0.0 ) {
      aspect_ratio = (float)is->video_st->codec->width /
        (float)is->video_st->codec->height;
    }
    h = screen->h;
    w = ((int)rint(h * aspect_ratio)) & -3;
    if( w > screen->w ) {
      w = screen->w;
      h = ((int)rint(w / aspect_ratio)) & -3;
    }
    x = (screen->w - w) / 2;
    y = (screen->h - h) / 2;

    rect.x = x;
    rect.y = y;
    rect.w = w;
    rect.h = h;
    // BD
    //LogPrintfA("---------------------------------------------------------- [%05d] refresh bmp, Packet:%d, type: %s, pts: %f\n",
    //    ::GetCurrentThreadId(), vp->iIndex, GetPictureTypeString(vp->type).c_str(), vp->pts);
    // ED
    SDL_DisplayYUVOverlay(vp->bmp, &rect);
  }
}

void video_refresh_timer(void *userdata)
{
  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;

  if( is->video_st ) {
    if( is->pictq_size == 0 ) {
      schedule_refresh(is, 1);
    } else {
      // Goal: work out when the next picture should be displayed
      vp = &is->pictq[is->pictq_rindex];

      // frame_last_pts holds the pts of the previous picture; subtracting it from the
      // current pts gives an estimated delay, i.e. how long the previous picture was shown
      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      // BD
      static int iIndex = 0;
      //LogPrintfA("previous frame was shown for: %f\n", delay);
      // ED
      // delay must fall inside a valid range; otherwise reuse the previous delay
      if( delay <= 0 || delay >= 1.0 ) {
        /* if incorrect delay, use previous one */
        delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      // remember the pts of the current frame
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      // ref_clock: timestamp of the audio currently playing
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;
      // BD
      //LogPrintfA("vp->pts: %f, ref_clock: %f, diff: %f; delay: %f\n", vp->pts, ref_clock, diff, delay);
      // ED

      /* Skip or repeat the frame. Take delay into account
         FFPlay still doesn't "know if this is the best guess." */
      // take the larger of delay and AV_SYNC_THRESHOLD
      // new
      sync_threshold = FFMAX(delay, AV_SYNC_THRESHOLD);
      // the delay is recomputed when diff falls outside (-sync_threshold, sync_threshold),
      // which is at least (-0.01, 0.01)
      if( fabs(diff) < AV_NOSYNC_THRESHOLD ) {
        if( diff <= -sync_threshold ) {
          // diff is well below zero: the video frame is behind the master clock,
          // so the next picture should be shown sooner, hence delay = 0
          delay = 0;
        } else if( diff >= sync_threshold ) {
          // diff is well above zero: the video frame is ahead of the master clock,
          // so the next picture should be shown later
          delay = 2 * delay;
        } else {
          // diff is acceptable; the previous delay can be used as is
          // LogPrintfA("abcd\n");
        }
      } else {
        assert(false);
      }
      // BD
      double frame_timer_old = is->frame_timer;
      // ED
      // frame_timer accumulates the delays; after adding delay it is the time
      // at which the next picture should start to be displayed
      is->frame_timer += delay;
      /* compute the REAL delay */
      // frame_timer minus the current system clock gives actual_delay
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if( actual_delay < 0.010 ) {
        /* Really it should skip the picture instead */
        actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);

      /* update queue for next picture! */
      if( ++ is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE ) {
        is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}

void alloc_picture(void *userdata) {
  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;

  vp = &is->pictq[is->pictq_windex];
  if( vp->bmp ) {
    // we already have one make another, bigger/smaller
    SDL_FreeYUVOverlay(vp->bmp);
  }
  // Allocate a place to put our YUV image on that screen
  vp->bmp = SDL_CreateYUVOverlay(is->video_st->codec->width,
                                 is->video_st->codec->height,
                                 SDL_YV12_OVERLAY,
                                 screen);
  vp->width = is->video_st->codec->width;
  vp->height = is->video_st->codec->height;

  SDL_LockMutex(is->pictq_mutex);
  vp->allocated = 1;
  SDL_CondSignal(is->pictq_cond);
  SDL_UnlockMutex(is->pictq_mutex);
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts, int iIndex)
{
  VideoPicture *vp;
  AVPicture pict;

  /* wait until we have space for a new pic */
  SDL_LockMutex(is->pictq_mutex);
  while( is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE && !is->quit ) {
    SDL_CondWait(is->pictq_cond, is->pictq_mutex);
  }
  SDL_UnlockMutex(is->pictq_mutex);

  if( is->quit ) {
    return -1;
  }

  // windex is set to 0 initially
  vp = &is->pictq[is->pictq_windex];

  /* allocate or resize the buffer! */
  if( !vp->bmp ||
      vp->width != is->video_st->codec->width ||
      vp->height != is->video_st->codec->height ) {
    SDL_Event event;

    vp->allocated = 0;
    /* we have to do it in the main thread */
    event.type = FF_ALLOC_EVENT;
    event.user.data1 = is;
    SDL_PushEvent(&event);

    /* wait until we have a picture allocated */
    SDL_LockMutex(is->pictq_mutex);
    while( !vp->allocated && !is->quit ) {
      SDL_CondWait(is->pictq_cond, is->pictq_mutex);
    }
    SDL_UnlockMutex(is->pictq_mutex);
    if( is->quit ) {
      return -1;
    }
  }

  /* We have a place to put our picture on the queue */
  /* If we are skipping a frame, do we set this to null
     but still return vp->allocated = 1? */
  if( vp->bmp ) {
    SDL_LockYUVOverlay(vp->bmp);

    /* point pict at the queue */
    pict.data[0] = vp->bmp->pixels[0];
    pict.data[1] = vp->bmp->pixels[2];
    pict.data[2] = vp->bmp->pixels[1];

    pict.linesize[0] = vp->bmp->pitches[0];
    pict.linesize[1] = vp->bmp->pitches[2];
    pict.linesize[2] = vp->bmp->pitches[1];

    // Convert the image into YUV format that SDL uses
    sws_scale(is->sws_ctx,
              (uint8_t const * const *)pFrame->data,
              pFrame->linesize,
              0,
              is->video_st->codec->height,
              pict.data,
              pict.linesize);

    SDL_UnlockYUVOverlay(vp->bmp);
    vp->pts = pts;
    // BD
    vp->type = pFrame->pict_type;
    vp->iIndex = iIndex;
    // ED

    /* now we inform our display thread that we have a pic ready */
    if( ++ is->pictq_windex == VIDEO_PICTURE_QUEUE_SIZE ) {
      is->pictq_windex = 0;
    }
    SDL_LockMutex(is->pictq_mutex);
    is->pictq_size++;
    SDL_UnlockMutex(is->pictq_mutex);
  }
  return 0;
}

/** This simply computes the value of video_clock */
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts)
{
  double frame_delay;

  if( pts != 0 ) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  // If the video runs at 25 fps, one frame lasts 0.04 s, yet time_base here is 1/50, i.e. 0.02 s
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* These are called whenever we allocate a frame
* buffer. We use this to store the global_pts in
* a frame at the time it is allocated.
*/
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic, int flags)
{
  int ret = avcodec_default_get_buffer(c, pic);
  uint64_t *pts = (uint64_t *)av_malloc(sizeof(uint64_t));
  *pts = global_video_pkt_pts;
  pic->opaque = pts;
  return ret;
}

void our_release_buffer(struct AVCodecContext *c, AVFrame *pic)
{
  if( pic ) {
    av_freep(&pic->opaque);
  }
  avcodec_default_release_buffer(c, pic);
}

int video_thread(void *arg)
{
  VideoState *is = (VideoState *)arg;
  AVPacket pkt1, *packet = &pkt1;
  int frameFinished;
  AVFrame *pFrame;
  double pts;

  pFrame = av_frame_alloc();

  for( ; ; ) {
    if( packet_queue_get(&is->videoq, packet, 1) < 0 ) {
      // means we quit getting packets
      break;
    }
    pts = 0;

    // Save global pts to be stored in pFrame in first call
    global_video_pkt_pts = packet->pts;
    // Decode video frame
    int iRet = avcodec_decode_video2(is->video_st->codec, pFrame, &frameFinished, packet);
    if( iRet < 0 ) {
      // error
      int a=2;
      int b=a;
    } else if( iRet == 0 ) {
      // no frame could be decompressed
      int a=2;
      int b=a;
    } else {
      // ok
    }
    // BD
    LogPrintfA("[%05d] Packet:%d, type: %s, dts: %I64d, pts: %I64d\n", ::GetCurrentThreadId(),
      ++ g_iIndex_video_pkt, GetPictureTypeString(pFrame->pict_type).c_str(),
      packet->dts, packet->pts);
    // ED
    if( packet->dts == AV_NOPTS_VALUE
        && pFrame->opaque
        && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE ) {
      pts = *(uint64_t *)pFrame->opaque;
    } else if( packet->dts != AV_NOPTS_VALUE ) {
      pts = packet->dts;
    } else {
      pts = 0;
    }
    // convert pts into the frame's time position within the whole video
    pts *= av_q2d(is->video_st->time_base);

    // BD
    AVRational a1 = is->video_st->r_frame_rate;
    int64_t ptsBst = av_frame_get_best_effort_timestamp(pFrame);
    double ptsOld = pts;
    if( AV_PICTURE_TYPE_I == pFrame->pict_type ) {
      int a=2;
      int b=a;
    }
    // ED

    // Did we get a video frame?
    if( frameFinished ) {
      pts = synchronize_video(is, pFrame, pts);
      // BD
      if( ptsOld != pts ) {
        int a=2;
        int b=a;
      }
      //LogPrintfA("[%05d] Packet:%d, truely pts: %f\n", ::GetCurrentThreadId(), g_iIndex_video_pkt, pts);
      // ED
      if( queue_picture(is, pFrame, pts, g_iIndex_video_pkt) < 0 ) {
        break;
      }
    }
    av_free_packet(packet);
  }
  av_free(pFrame);
  return 0;
}

int stream_component_open(VideoState *is, int stream_index)
{
  AVFormatContext *pFormatCtx = is->pFormatCtx;
  AVCodecContext *codecCtx = NULL;
  AVCodec *codec = NULL;
  AVDictionary *optionsDict = NULL;
  SDL_AudioSpec wanted_spec, spec;

  if(stream_index < 0 || stream_index >= pFormatCtx->nb_streams) {
    return -1;
  }

  // Get a pointer to the codec context for the video stream
  codecCtx = pFormatCtx->streams[stream_index]->codec;

  if( codecCtx->codec_type == AVMEDIA_TYPE_AUDIO ) {
    // Set audio settings from codec info
    wanted_spec.freq = codecCtx->sample_rate;
    wanted_spec.format = AUDIO_S16SYS;
    wanted_spec.channels = codecCtx->channels;
    wanted_spec.silence = 0;
    wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
    wanted_spec.callback = audio_callback;
    wanted_spec.userdata = is;

    if( SDL_OpenAudio(&wanted_spec, &spec) < 0 ) {
      fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
      return -1;
    }
    is->audio_hw_buf_size = spec.size;
  }
  codec = avcodec_find_decoder(codecCtx->codec_id);
  if( !codec || (avcodec_open2(codecCtx, codec, &optionsDict) < 0) ) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
  }

  switch( codecCtx->codec_type ) {
  case AVMEDIA_TYPE_AUDIO:
    {
      is->audioStream = stream_index;
      is->audio_st = pFormatCtx->streams[stream_index];
      is->audio_buf_size = 0;
      is->audio_buf_index = 0;
      memset(&is->audio_pkt, 0, sizeof(is->audio_pkt));
      packet_queue_init(&is->audioq);
      SDL_PauseAudio(0);
    }
    break;
  case AVMEDIA_TYPE_VIDEO:
    {
      is->videoStream = stream_index;
      is->video_st = pFormatCtx->streams[stream_index];

      is->frame_timer = (double)av_gettime() / 1000000.0;
      is->frame_last_delay = 40e-3;
      // BD
      LogPrintfA("init: frame_timer: %f, frame_last_delay: %f\n", is->frame_timer, is->frame_last_delay);
      // ED

      packet_queue_init(&is->videoq);
      is->video_tid = SDL_CreateThread(video_thread, is);
      is->sws_ctx =
        sws_getContext(is->video_st->codec->width,
                       is->video_st->codec->height,
                       is->video_st->codec->pix_fmt,
                       is->video_st->codec->width,
                       is->video_st->codec->height,
                       PIX_FMT_YUV420P,
                       SWS_BILINEAR,
                       NULL,
                       NULL,
                       NULL);
      codecCtx->get_buffer2 = our_get_buffer;
      codecCtx->release_buffer = our_release_buffer;
    }
    break;
  default:
    break;
  }
  return 0;
}

int decode_interrupt_cb(void *opaque) {
  return (global_video_state && global_video_state->quit);
}

int decode_thread(void *arg)
{
  VideoState *is = (VideoState *)arg;
  AVFormatContext *pFormatCtx = NULL;
  AVPacket pkt1, *packet = &pkt1;

  AVDictionary *io_dict = NULL;
  AVIOInterruptCB callback;

  int video_index = -1;
  int audio_index = -1;
  int i;

  is->videoStream = -1;
  is->audioStream = -1;

  global_video_state = is;
  // will interrupt blocking functions if we quit!
  callback.callback = decode_interrupt_cb;
  callback.opaque = is;
  if( avio_open2(&is->io_context, is->filename, 0, &callback, &io_dict) ) {
    fprintf(stderr, "Unable to open I/O for %s\n", is->filename);
    return -1;
  }

  // Open video file
  if( avformat_open_input(&pFormatCtx, is->filename, NULL, NULL) != 0 ) {
    return -1; // Couldn't open file
  }

  is->pFormatCtx = pFormatCtx;

  // Retrieve stream information
  if( avformat_find_stream_info(pFormatCtx, NULL) < 0 ) {
    return -1; // Couldn't find stream information
  }

  // Dump information about file onto standard error
  av_dump_format(pFormatCtx, 0, is->filename, 0);

  // Find the first video stream
  for( i = 0; i < pFormatCtx->nb_streams; i++ ) {
    if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO &&
        video_index < 0 ) {
      video_index = i;
    }
    if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO &&
        audio_index < 0 ) {
      audio_index = i;
    }
  }
  if( audio_index >= 0 ) {
    stream_component_open(is, audio_index);
  }
  if( video_index >= 0 ) {
    stream_component_open(is, video_index);
  }

  if( is->videoStream < 0 || is->audioStream < 0 ) {
    fprintf(stderr, "%s: could not open codecs\n", is->filename);
    goto fail;
  }

  // Begin -- set video size by oldmtn
  // Make a screen to put our video
  int width = pFormatCtx->streams[video_index]->codec->width;
  int height = pFormatCtx->streams[video_index]->codec->height;
  screen = SDL_SetVideoMode(width, height, 0, 0);
  if( !screen ) {
    fprintf(stderr, "SDL: could not set video mode - exiting\n");
    exit(1);
  }
  // End -- set video size by oldmtn

  // main decode loop
  for( ; ; ) {
    if( is->quit ) {
      break;
    }
    // seek stuff goes here
    if( is->audioq.size > MAX_AUDIOQ_SIZE ||
        is->videoq.size > MAX_VIDEOQ_SIZE ) {
      SDL_Delay(10);
      continue;
    }
    if( av_read_frame(is->pFormatCtx, packet) < 0 ) {
      if( is->pFormatCtx->pb->error == 0 ) {
        SDL_Delay(100); /* no error; wait for user input */
        continue;
      } else {
        break;
      }
    }
    // Is this a packet from the video stream?
    if( packet->stream_index == is->videoStream ) {
      packet_queue_put(&is->videoq, packet);
    } else if( packet->stream_index == is->audioStream ) {
      packet_queue_put(&is->audioq, packet);
    } else {
      av_free_packet(packet);
    }
  }
  /* all done - wait for it */
  while( !is->quit ) {
    SDL_Delay(100);
  }

fail:
  {
    SDL_Event event;
    event.type = FF_QUIT_EVENT;
    event.user.data1 = is;
    SDL_PushEvent(&event);
  }
  return 0;
}

int _tmain() {
  SDL_Event       event;
  VideoState      *is;

  is = (VideoState *)av_mallocz(sizeof(VideoState));

  //char szFile[] = "cuc_ieschool.flv";
  char szFile[] = "edu.flv";
  //char szFile[] = "song.flv";
  //char szFile[] = "drj.mkv";
  //char szFile[] = "city.mkv";

  // Register all formats and codecs
  av_register_all();

  if(SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER)) {
    fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
    exit(1);
  }

  av_strlcpy(is->filename, szFile, 1024);

  is->pictq_mutex = SDL_CreateMutex();
  is->pictq_cond = SDL_CreateCond();

  schedule_refresh(is, 40);

  is->parse_tid = SDL_CreateThread(decode_thread, is);
  if(!is->parse_tid) {
    av_free(is);
    return -1;
  }
  for( ; ; ) {
    SDL_WaitEvent(&event);
    switch(event.type) {
    case FF_QUIT_EVENT:
    case SDL_QUIT:
      is->quit = 1;
      /*
       * If the video has finished playing, then both the picture and
       * audio queues are waiting for more data.  Make them stop
       * waiting and terminate normally.
       */
      SDL_CondSignal(is->audioq.cond);
      SDL_CondSignal(is->videoq.cond);
      SDL_Quit();
      exit(0);
      break;
    case FF_ALLOC_EVENT:
      alloc_picture(event.user.data1);
      break;
    case FF_REFRESH_EVENT:
      video_refresh_timer(event.user.data1);
      break;
    default:
      break;
    }
  }
  return 0;
}
#endif // TUTORIAL_05





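So suppose a movie's frames are displayed in the order I B B P. We need the information in the P frame before we can display either B frame, so the frames may well be stored in the order I P B B. This is why each frame carries both a decoding timestamp and a presentation timestamp. The decoding timestamp (DTS) tells us when a frame has to be decoded, and the presentation timestamp (PTS) tells us when it has to be displayed. In this case, our stream could look like this: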

PTS: 1 4 2 3

DTS: 1 2 3 4

Stream: I P B B
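
To make the table concrete, here is a minimal sketch that takes the four frames in stored (decode) order and recovers the display order by sorting on PTS. `DemoFrame`, `by_pts` and `display_order` are hypothetical names made up for this illustration; they are not FFmpeg types or functions, and a real player does not sort frames like this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a frame; not an FFmpeg type. */
typedef struct {
    char type;  /* 'I', 'P' or 'B' */
    int  pts;   /* position in display order */
    int  dts;   /* position in decode (stored) order */
} DemoFrame;

/* qsort comparator: ascending PTS. */
static int by_pts(const void *a, const void *b) {
    const DemoFrame *fa = (const DemoFrame *)a;
    const DemoFrame *fb = (const DemoFrame *)b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Writes the frame types in display order into out (n chars plus NUL). */
void display_order(const DemoFrame *in, size_t n, char *out) {
    DemoFrame tmp[16];
    size_t i;
    assert(n <= 16);
    for (i = 0; i < n; i++) tmp[i] = in[i];
    qsort(tmp, n, sizeof tmp[0], by_pts);
    for (i = 0; i < n; i++) out[i] = tmp[i].type;
    out[n] = '\0';
}
```

Feeding it the stream from the table, {I,1,1}, {P,4,2}, {B,2,3}, {B,3,4}, yields the types in the order I B B P.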




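Not to worry, because there is another way to find a frame's PTS: we can have our program reorder the packets itself. We save the PTS of the first packet of a frame, and this becomes the PTS of the whole frame. We can work out which packet is the first packet of a frame with the help of avcodec_decode_video(). How? Whenever a packet starts a frame, avcodec_decode_video() calls a function to allocate a buffer for that frame. And of course, ffmpeg lets us redefine that allocation function. So we write a new function that saves the packet's timestamp.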








double pts;

for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
        // means we quit getting packets
        break;
    }
    pts = 0;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec,
                                pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts != AV_NOPTS_VALUE) {
        pts = packet->dts;
    } else {
        pts = 0;
    }
    // convert from time_base units to seconds
    // (for this file time_base is 1/frame_rate, i.e. 1/25)
    pts *= av_q2d(is->video_st->time_base);



int get_buffer(struct AVCodecContext *c, AVFrame *pic);
void release_buffer(struct AVCodecContext *c, AVFrame *pic);


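uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* Called whenever a frame buffer is allocated. We use it to store the
 * current global pts in the frame at the moment it is allocated. */
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
    int ret = avcodec_default_get_buffer(c, pic);
    uint64_t *pts = av_malloc(sizeof(uint64_t));
    *pts = global_video_pkt_pts;
    pic->opaque = pts;
    return ret;
}

/* Frees the saved pts before handing the buffer back to the default routine. */
void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
    if(pic) av_freep(&pic->opaque);
    avcodec_default_release_buffer(c, pic);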
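}

The same idea can be tried without FFmpeg at all. The sketch below is only an illustration under stated assumptions: `DemoPicture`, `DEMO_NOPTS`, `demo_pending_pts`, `demo_get_buffer` and `demo_release_buffer` are hypothetical stand-ins for AVFrame, AV_NOPTS_VALUE, global_video_pkt_pts and the two callbacks. It shows a heap-allocated copy of the "current" timestamp being attached to a frame at allocation time and freed on release.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NOPTS INT64_MIN   /* hypothetical "no timestamp" sentinel */

/* Stand-in for a frame with a user-data pointer; not AVFrame. */
typedef struct { void *opaque; } DemoPicture;

/* Timestamp of the packet currently being fed to the decoder. */
static int64_t demo_pending_pts = DEMO_NOPTS;

/* Allocation hook: remember which timestamp was pending when the
 * frame's buffer was requested. Returns 0 on success, -1 on failure. */
int demo_get_buffer(DemoPicture *pic) {
    int64_t *pts;
    assert(pic != NULL);
    pts = (int64_t *)malloc(sizeof *pts);
    if (!pts) return -1;
    *pts = demo_pending_pts;
    pic->opaque = pts;
    return 0;
}

/* Release hook: free the saved timestamp and clear the pointer. */
void demo_release_buffer(DemoPicture *pic) {
    if (pic) {
        free(pic->opaque);
        pic->opaque = NULL;
    }
}
```

Because the value is copied when the buffer is allocated, later packets can overwrite the pending timestamp without changing what the frame remembers.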




codecCtx->get_buffer = our_get_buffer;

codecCtx->release_buffer = our_release_buffer;


for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
        // means we quit getting packets
        break;
    }
    pts = 0;

    // Save global pts to be stored in pFrame in first call
    global_video_pkt_pts = packet->pts;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts == AV_NOPTS_VALUE
       && pFrame->opaque && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE) {
        // no DTS on the packet: fall back to the PTS saved at allocation time
        pts = *(uint64_t *)pFrame->opaque;
    } else if(packet->dts != AV_NOPTS_VALUE) {
        pts = packet->dts;
    } else {
        pts = 0;
    }
    // convert from time_base units to seconds
    pts *= av_q2d(is->video_st->time_base);
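
The fallback chain in that fragment can be isolated into a small pure function, which makes it easier to reason about. This is a sketch only: `demo_pick_pts` and `DEMO_NOPTS` are hypothetical names, and the time base is passed as a plain double rather than an FFmpeg rational.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NOPTS INT64_MIN   /* hypothetical "no timestamp" sentinel */

/* Chooses a timestamp the same way the loop above does and returns it
 * in seconds. saved_pts is the value stored at buffer allocation time
 * and may be NULL. time_base is the duration of one tick in seconds. */
double demo_pick_pts(int64_t pkt_dts, const int64_t *saved_pts, double time_base) {
    int64_t ticks;
    assert(time_base > 0.0);
    if (pkt_dts == DEMO_NOPTS && saved_pts && *saved_pts != DEMO_NOPTS) {
        ticks = *saved_pts;      /* no DTS: use the saved first-packet PTS */
    } else if (pkt_dts != DEMO_NOPTS) {
        ticks = pkt_dts;         /* DTS available: use it */
    } else {
        ticks = 0;               /* nothing known: let the caller guess */
    }
    return (double)ticks * time_base;
}
```

A result of 0 means "unknown" here, which is exactly the case synchronize_video is meant to handle.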

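Technical note: you may have noticed that we use int64 for the PTS. That is because the PTS is stored as an integer. The value is a timestamp measured in units of the stream's time_base. For example, if a stream runs at 24 frames per second, a PTS of 42 means the frame belongs where the 42nd frame would be if there were one frame every 1/24 of a second (which is not necessarily true).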
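
As a worked example of the note above, the sketch below converts a tick count into seconds. `DemoRational` and `pts_to_seconds` are hypothetical names invented for the illustration; the struct only imitates the numerator/denominator layout of a time base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational number: one tick lasts num/den seconds. */
typedef struct { int num; int den; } DemoRational;

/* Converts a timestamp in time_base ticks to seconds. */
double pts_to_seconds(int64_t pts, DemoRational time_base) {
    assert(time_base.den != 0);
    return (double)pts * time_base.num / time_base.den;
}
```

With a time base of 1/24, a PTS of 42 comes out as 42 / 24 = 1.75 seconds. With a time base of 1/90000, a PTS of 180000 is 2 seconds.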



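So now we have our PTS. Next we have to take care of the two synchronization problems discussed earlier. We will define a function called synchronize_video that updates the PTS so that it stays in sync. This function also finally deals with the case where we get no PTS value for a frame. At the same time we need to keep track of when the next frame is expected so that we can set the refresh rate properly. We can do this with an internal video_clock value that records how much time has passed according to the video. We add this value to our big struct.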

typedef struct VideoState {
    double video_clock; ///< pts of last decoded frame / predicted pts of next decoded frame


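double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {
    double frame_delay;

    if(pts != 0) {
        /* if we have a pts, set the video clock to it */
        is->video_clock = pts;
    } else {
        /* if we aren't given a pts, use the clock's prediction */
        pts = is->video_clock;
    }
    /* update the video clock */
    frame_delay = av_q2d(is->video_st->codec->time_base);
    /* if we are repeating a frame, adjust the clock accordingly */
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
    is->video_clock += frame_delay;
    return pts;
}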
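
To see the clock arithmetic in isolation, here is a stripped-down sketch of the same update with the FFmpeg types removed. `DemoClock` and `demo_sync_video` are hypothetical names; the frame period and repeat count are passed in directly instead of being read from the codec context and the frame.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the video_clock field. */
typedef struct { double video_clock; } DemoClock;

/* frame_period: seconds per frame. repeat_pict: number of extra
 * half-frame repeats requested for this frame. Returns the pts that
 * should be used for the frame, in seconds. */
double demo_sync_video(DemoClock *c, double pts, double frame_period, int repeat_pict) {
    assert(frame_period > 0.0);
    if (pts != 0) {
        c->video_clock = pts;   /* trust the timestamp we were given */
    } else {
        pts = c->video_clock;   /* no timestamp: use the prediction */
    }
    /* predict when the next frame is due */
    c->video_clock += frame_period + repeat_pict * (frame_period * 0.5);
    return pts;
}
```

For example, with a period of 0.25 s, a frame stamped 1.0 s moves the clock to 1.25 s. If the next frame arrives without a timestamp and asks for two repeats, it is assigned 1.25 s and the clock advances by 0.25 + 2 * 0.125 to 1.75 s.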




// Did we get a video frame?
if(frameFinished) {
    pts = synchronize_video(is, pFrame, pts);
    if(queue_picture(is, pFrame, pts) < 0) {
        break;
    }
}





typedef struct VideoPicture {
    ...
    double pts;
} VideoPicture;

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
    ... stuff ...
    if(vp->bmp) {
        ... convert picture ...
        vp->pts = pts;
        ... alert queue ...
    }



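Our strategy is to predict the time of the next PTS simply by measuring the time between the previous PTS and the current one. At the same time, we need to sync the video to the audio. We will keep an audio clock: an internal value that tracks the position of the audio we are playing, like the digital readout on any mp3 player. Since we are syncing the video to the audio, the video thread uses this value to work out whether it is running ahead or behind.

We will get to the implementation later; for now, assume we already have a function get_audio_clock that gives us the time on the audio clock. Once we have that value, what do we do when the audio and video are out of sync? It would be simple but clumsy to try to jump to the right frame by seeking or some similar trick. Instead, we adjust the value computed for the next refresh: if the PTS is too far behind the audio time, we refresh as quickly as possible; if the PTS is too far ahead of the audio time, we double the computed delay. Now that we have the adjusted delay, we compare it against the computer's clock by keeping a running frame_timer. This frame timer sums up all of the delays computed while the movie plays. In other words, frame_timer is the time at which the next frame should be displayed. We simply add the new delay to the frame timer, compare it with the computer's system time, and use the result to schedule the next refresh. This may be a little hard to follow, so study the code carefully: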

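void video_refresh_timer(void *userdata) {
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;
    double actual_delay, delay, sync_threshold, ref_clock, diff;

    if(is->video_st) {
        if(is->pictq_size == 0) {
            schedule_refresh(is, 1);
        } else {
            vp = &is->pictq[is->pictq_rindex];

            delay = vp->pts - is->frame_last_pts; /* the pts from last time */
            if(delay <= 0 || delay >= 1.0) {
                /* if incorrect delay, use previous one */
                delay = is->frame_last_delay;
            }
            /* save for next time */
            is->frame_last_delay = delay;
            is->frame_last_pts = vp->pts;

            /* update delay to sync to audio */
            ref_clock = get_audio_clock(is);
            diff = vp->pts - ref_clock;

            /* skip or repeat the frame, taking the delay into account */
            sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
            if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
                if(diff <= -sync_threshold) {
                    delay = 0;          /* video is behind: show the frame now */
                } else if(diff >= sync_threshold) {
                    delay = 2 * delay;  /* video is ahead: wait longer */
                }
            }
            is->frame_timer += delay;
            /* compute the REAL delay */
            actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
            if(actual_delay < 0.010) {
                /* really it should skip the picture instead */
                actual_delay = 0.010;
            }
            schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
            /* show the picture! */
            video_display(is);

            /* update queue for next picture! */
            if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
                is->pictq_rindex = 0;
            }
            SDL_LockMutex(is->pictq_mutex);
            is->pictq_size--;
            SDL_CondSignal(is->pictq_cond);
            SDL_UnlockMutex(is->pictq_mutex);
        }
    } else {
        schedule_refresh(is, 100);
    }
}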
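
The heart of the function is the decision about how to stretch or shrink the delay. The sketch below pulls that decision out into a pure function so it can be tested by hand. `demo_adjust_delay` is a hypothetical name, and the two threshold values (0.01 s and 10 s) are assumptions made for this illustration; check them against the constants defined in your own code.

```c
#include <assert.h>

#define DEMO_SYNC_THRESHOLD   0.01   /* assumed value, in seconds */
#define DEMO_NOSYNC_THRESHOLD 10.0   /* assumed value, in seconds */

/* delay: predicted time until the next frame, in seconds.
 * diff:  video pts minus audio clock (negative means video is late).
 * Returns the delay adjusted toward the audio clock. */
double demo_adjust_delay(double delay, double diff) {
    double sync_threshold, magnitude;
    assert(delay >= 0.0);
    sync_threshold = (delay > DEMO_SYNC_THRESHOLD) ? delay : DEMO_SYNC_THRESHOLD;
    magnitude = (diff < 0.0) ? -diff : diff;
    if (magnitude < DEMO_NOSYNC_THRESHOLD) {
        if (diff <= -sync_threshold) {
            delay = 0.0;            /* video behind audio: catch up */
        } else if (diff >= sync_threshold) {
            delay = 2.0 * delay;    /* video ahead of audio: slow down */
        }
    }
    return delay;
}
```

With a 40 ms delay, a video frame half a second late gets a delay of 0, one half a second early gets 80 ms, and one within the threshold keeps its 40 ms. A difference of 20 s is treated as hopeless and the delay is left alone.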



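is->frame_timer = (double)av_gettime() / 1000000.0; records the moment playback starts. Before each frame is shown, we first compute the time at which that frame is due; only once we have that time can we set the timer for the refresh.
actual_delay = is->frame_timer - (av_gettime() / 1000000.0); is the delay we actually schedule. Here is->frame_timer is the time at which the frame is due and av_gettime() / 1000000.0 is the current time, so their difference is the real amount of time we have to wait.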



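is->frame_timer = (double)av_gettime() / 1000000.0; takes the system time as the starting point for the first frame. After that, the delay of every frame is added to it, so is->frame_timer always holds the display time of the current frame.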

is->frame_timer += delay;
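
The two statements discussed above can be exercised with a fake clock. In this sketch `demo_next_refresh_ms` is a hypothetical helper, and the current time is passed in as a parameter instead of being read from av_gettime(), so that the arithmetic is reproducible.

```c
#include <assert.h>
#include <stddef.h>

/* frame_timer: accumulated due time of the current frame, in seconds.
 * delay:       time between this frame and the previous one, in seconds.
 * now:         current wall-clock time, in seconds.
 * Advances the frame timer and returns the wait in milliseconds. */
int demo_next_refresh_ms(double *frame_timer, double delay, double now) {
    double actual_delay;
    assert(frame_timer != NULL);
    *frame_timer += delay;               /* when this frame is due */
    actual_delay = *frame_timer - now;   /* how long until then */
    if (actual_delay < 0.010) {
        actual_delay = 0.010;            /* never wait less than 10 ms */
    }
    return (int)(actual_delay * 1000 + 0.5);
}
```

Starting with a frame timer of 100.0 s and a delay of 0.25 s, a call at time 100.0 schedules the refresh 250 ms out. If the next call only happens at time 101.0, the frame is already overdue and the wait is clamped to 10 ms.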





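We have added a lot of variables to the big struct, so don't forget to check the code. Also, don't forget to initialize the frame timer frame_timer and the previous frame delay frame_last_delay in stream_component_open: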


is->frame_timer = (double)av_gettime() / 1000000.0;

is->frame_last_delay = 40e-3;



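/* if the packet carries a pts, reset the audio clock from it */
if(pkt->pts != AV_NOPTS_VALUE) {
    is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
}

/* keep audio_clock up to date as data is decoded */
pts = is->audio_clock;
*pts_ptr = pts;
n = 2 * is->audio_st->codec->channels; /* bytes per sample frame (16-bit samples) */
is->audio_clock += (double)data_size /
    (double)(n * is->audio_st->codec->sample_rate);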

double get_audio_clock(VideoState *is) {
    double pts;
    int hw_buf_size, bytes_per_sec, n;

    pts = is->audio_clock; /* maintained in the audio thread */
    /* decoded data that has not yet been handed to the audio device */
    hw_buf_size = is->audio_buf_size - is->audio_buf_index;
    bytes_per_sec = 0;
    if(is->audio_st) {
        /* read the stream only after the NULL check */
        n = is->audio_st->codec->channels * 2;
        bytes_per_sec = is->audio_st->codec->sample_rate * n;
    }
    if(bytes_per_sec) {
        pts -= (double)hw_buf_size / bytes_per_sec;
    }
    return pts;
}
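
The audio clock is just byte counting. The following sketch, which assumes 16-bit samples as the code above does, shows the conversion from buffered bytes to seconds on its own. `demo_bytes_to_seconds` and `demo_audio_clock` are hypothetical names for this illustration.

```c
#include <assert.h>

/* Seconds of audio represented by `bytes` of 16-bit PCM data. */
double demo_bytes_to_seconds(int bytes, int channels, int sample_rate) {
    int bytes_per_sec;
    assert(channels > 0 && sample_rate > 0);
    bytes_per_sec = sample_rate * channels * 2;   /* 2 bytes per sample */
    return (double)bytes / bytes_per_sec;
}

/* audio_clock counts everything decoded so far. Subtract the part of
 * the buffer that has been decoded but not yet played. */
double demo_audio_clock(double audio_clock, int buf_size, int buf_index,
                        int channels, int sample_rate) {
    return audio_clock
         - demo_bytes_to_seconds(buf_size - buf_index, channels, sample_rate);
}
```

For 44100 Hz stereo, one second of audio is 44100 * 2 * 2 = 176400 bytes. If the decoder's clock reads 5.0 s and 88200 bytes are still waiting in the buffer, the audio actually being heard is at 4.5 s.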

