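I have been working on DXVA2 hardware acceleration for the past few days. Documentation was hard to find, so I translated two related Microsoft documents. This post follows those two translations and records how to implement DXVA2 decoding with ffmpeg.
The first translation covers the Direct3D device manager: http://www.cnblogs.com/betterwgo/p/6124588.html
The second translation covers supporting DXVA 2.0 in DirectShow: http://www.cnblogs.com/betterwgo/p/6125351.html
While working on DXVA2 I consulted a lot of code found online, most of which is in turn based on the VLC and ffmpeg examples.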
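1. Formats for which ffmpeg supports DXVA2 hardware acceleration
The ffmpeg version I am using is 3.2. It supports DXVA2 hardware acceleration for the following codecs: AV_CODEC_ID_MPEG2VIDEO, AV_CODEC_ID_H264, AV_CODEC_ID_VC1, AV_CODEC_ID_WMV3, AV_CODEC_ID_HEVC and AV_CODEC_ID_VP9. Any file that ffmpeg identifies as one of these codecs is a candidate for DXVA2 hardware acceleration. That does not mean every such file will actually work, though: I ran into one AV_CODEC_ID_HEVC file for which initializing DXVA2 failed, and PotPlayer could not use DXVA2 hardware acceleration for that file either.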
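The point above, that being on the codec list is necessary but not sufficient, can be sketched as a two-step decision. This is a minimal sketch and not code from the project: the `CodecId` enum is a stand-in for FFmpeg's `AVCodecID` so that the example compiles on its own, and `is_dxva2_candidate`, `choose_decoder` and the `try_init` callback are hypothetical names. In the real program the callback would wrap `dxva2_init`.

```cpp
#include <cassert>
#include <functional>

// Stand-in for FFmpeg's AVCodecID so the sketch compiles without FFmpeg.
// Real code would use AV_CODEC_ID_H264 etc. from libavcodec/avcodec.h.
enum class CodecId { Mpeg2Video, H264, Vc1, Wmv3, Hevc, Vp9, Mpeg4, Other };

enum class DecodePath { Software, Dxva2 };

// True for the codecs listed in the text as DXVA2 candidates in ffmpeg 3.2.
inline bool is_dxva2_candidate(CodecId id) {
    switch (id) {
    case CodecId::Mpeg2Video:
    case CodecId::H264:
    case CodecId::Vc1:
    case CodecId::Wmv3:
    case CodecId::Hevc:
    case CodecId::Vp9:
        return true;
    default:
        return false;
    }
}

// try_init is a hypothetical wrapper around dxva2_init(); 0 means success.
inline DecodePath choose_decoder(CodecId id, const std::function<int()>& try_init) {
    if (!is_dxva2_candidate(id))
        return DecodePath::Software;   // not on the list: do not even try DXVA2
    if (try_init() != 0)
        return DecodePath::Software;   // on the list, but this file or GPU refused
    return DecodePath::Dxva2;
}
```

With this shape, the HEVC file mentioned above would simply take the software path: it is a candidate, but the initialization callback reports failure.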
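2. Things to watch out for
(1) ffmpeg only implements the DXVA2 hardware decoding part. Everything else described in the first and second translated articles has to be implemented by the user. This part is somewhat complicated, but VLC and ffmpeg both provide examples to refer to. You need some familiarity with the two translated articles to follow the logic of the code.
(2) To really see the benefit of hardware acceleration, avoid copying the decoded data back to system memory for CPU processing. At first I copied the decoded data back to memory, and as a result GPU usage showed no noticeable change while CPU usage was actually higher than without DXVA2. After I changed the code to display the decoded data directly, GPU usage rose immediately and CPU usage dropped.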
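To get a feel for why the copy-back in point (2) is costly, here is a rough arithmetic sketch. It assumes the decoded surfaces are NV12 (which, as far as I know, is what ffmpeg's DXVA2 path uses for 8-bit video) and ignores row padding; the function names are my own. It only counts the bytes moved, not the additional cost of the CPU reading from video memory.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one NV12 frame: a full-resolution Y plane plus a half-height
// interleaved UV plane, i.e. width * height * 3 / 2 (row padding ignored).
constexpr std::uint64_t nv12_frame_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes per second that must travel from video memory to system memory
// if every decoded frame is copied back.
constexpr std::uint64_t readback_bytes_per_second(std::uint64_t width,
                                                  std::uint64_t height,
                                                  std::uint64_t fps) {
    return nv12_frame_bytes(width, height) * fps;
}
```

For 1920x1080 at 30 fps this gives 3,110,400 bytes per frame and 93,312,000 bytes per second, all of which the CPU has to move before it can do anything else with the picture.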
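3. Key code
Code for configuring a DXVA2 decoder has already been extracted from the ffmpeg example and is available online, so the actual implementation is fairly simple.
(1) Header file ffmpeg_dxva2.h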
/*
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#ifndef FFMPEG_DXVA2_H
#define FFMPEG_DXVA2_H

//#include "windows.h"

extern "C"{
#include "libavcodec/avcodec.h"
#include "libavutil/pixfmt.h"
#include "libavutil/rational.h"
}

enum HWAccelID {
    HWACCEL_NONE = 0,
    HWACCEL_AUTO,
    HWACCEL_VDPAU,
    HWACCEL_DXVA2,
    HWACCEL_VDA,
    HWACCEL_VIDEOTOOLBOX,
    HWACCEL_QSV,
};

typedef struct AVStream AVStream;
typedef struct AVCodecContext AVCodecContext;
typedef struct AVCodec AVCodec;
typedef struct AVFrame AVFrame;
typedef struct AVDictionary AVDictionary;

typedef struct InputStream {
    int file_index;
    AVStream *st;
    int discard;             /* true if stream data should be discarded */
    int user_set_discard;
    int decoding_needed;     /* non zero if the packets must be decoded in 'raw_fifo', see DECODING_FOR_* */
#define DECODING_FOR_OST    1
#define DECODING_FOR_FILTER 2

    AVCodecContext *dec_ctx;
    AVCodec *dec;
    AVFrame *decoded_frame;
    AVFrame *filter_frame;   /* a ref of decoded_frame, to be sent to filters */

    int64_t start;           /* time when read started */
    /* predicted dts of the next packet read for this stream or (when there are
     * several frames in a packet) of the next frame in current packet (in AV_TIME_BASE units) */
    int64_t next_dts;
    int64_t dts;             ///< dts of the last packet read for this stream (in AV_TIME_BASE units)

    int64_t next_pts;        ///< synthetic pts for the next decode frame (in AV_TIME_BASE units)
    int64_t pts;             ///< current pts of the decoded frame (in AV_TIME_BASE units)
    int wrap_correction_done;

    int64_t filter_in_rescale_delta_last;

    int64_t min_pts;         /* pts with the smallest value in a current stream */
    int64_t max_pts;         /* pts with the higher value in a current stream */
    int64_t nb_samples;      /* number of samples in the last decoded audio frame before looping */

    double ts_scale;
    int saw_first_ts;
    int showed_multi_packet_warning;
    AVDictionary *decoder_opts;
    AVRational framerate;    /* framerate forced with -r */
    int top_field_first;
    int guess_layout_max;

    int autorotate;
    int resample_height;
    int resample_width;
    int resample_pix_fmt;

    int resample_sample_fmt;
    int resample_sample_rate;
    int resample_channels;
    uint64_t resample_channel_layout;

    int fix_sub_duration;
    struct { /* previous decoded subtitle and related variables */
        int got_output;
        int ret;
        AVSubtitle subtitle;
    } prev_sub;

    struct sub2video {
        int64_t last_pts;
        int64_t end_pts;
        AVFrame *frame;
        int w, h;
    } sub2video;

    int dr1;

    /* decoded data from this stream goes into all those filters
     * currently video and audio only */
    //InputFilter **filters;
    //int nb_filters;

    //int reinit_filters;

    /* hwaccel options */
    enum HWAccelID hwaccel_id;
    char *hwaccel_device;

    /* hwaccel context */
    enum HWAccelID active_hwaccel_id;
    void *hwaccel_ctx;
    void(*hwaccel_uninit)(AVCodecContext *s);
    int(*hwaccel_get_buffer)(AVCodecContext *s, AVFrame *frame, int flags);
    int(*hwaccel_retrieve_data)(AVCodecContext *s, AVFrame *frame);
    enum AVPixelFormat hwaccel_pix_fmt;
    enum AVPixelFormat hwaccel_retrieved_pix_fmt;

    /* stats */
    // combined size of all the packets read
    uint64_t data_size;
    /* number of packets successfully read for this stream */
    uint64_t nb_packets;
    // number of frames/samples retrieved from the decoder
    uint64_t frames_decoded;
    uint64_t samples_decoded;
} InputStream;

int dxva2_init(AVCodecContext *s, HWND hwnd);
int dxva2_retrieve_data_call(AVCodecContext *s, AVFrame *frame);

#endif /* FFMPEG_DXVA2_H */
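The code above is actually extracted from ffmpeg. HWAccelID is the ID of the hardware accelerator and is used when initializing the decoder; the one we actually use is HWACCEL_DXVA2. The InputStream struct is a deep one: it holds data used during initialization as well as several function pointers, so pay attention to how those pointers are used. What I really want to discuss are the following two functions: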
int dxva2_init(AVCodecContext *s, HWND hwnd);
int dxva2_retrieve_data_call(AVCodecContext *s, AVFrame *frame);
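dxva2_init is the entry point for initializing and configuring the DXVA2 decoder, and it does most of the configuration work. I will upload the full project source at the end of this post. Nearly everything in the two translated articles serves this function. The ffmpeg_dxva2.cpp file in my uploaded source exists mainly to do this work, and it also contains dxva2_retrieve_data_call. To follow the logic of dxva2_init, you should read the two translated articles first, and you also need some basic knowledge of D3D rendering.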
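As a reading aid for dxva2_init, the outline below lists the Direct3D/DXVA2 calls in the order I recall them from ffmpeg's ffmpeg_dxva2.c; treat the exact order, and whether the extracted code in this project matches it, as an assumption to check against the source. The sketch does not call any Windows API: `InitStep`, `run_init_steps` and `dxva2_init_outline` are hypothetical helpers that only demonstrate the stop-at-first-failure structure, with the real work injected through the `do_step` callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One initialization step: a label plus a callable that returns true on success.
struct InitStep {
    std::string name;
    std::function<bool()> run;
};

// Runs the steps in order and stops at the first failure.
// Returns the index of the failing step, or -1 if every step succeeded.
inline int run_init_steps(const std::vector<InitStep>& steps) {
    for (std::size_t i = 0; i < steps.size(); ++i) {
        if (!steps[i].run())
            return static_cast<int>(i);
    }
    return -1;
}

// Builds the outline; do_step(name) stands in for the real API call.
inline std::vector<InitStep> dxva2_init_outline(
        const std::function<bool(const std::string&)>& do_step) {
    const char* names[] = {
        "Direct3DCreate9",                    // create the IDirect3D9 object
        "CreateDevice",                       // IDirect3D9: create the D3D9 device
        "DXVA2CreateDirect3DDeviceManager9",  // create the device manager
        "ResetDevice",                        // hand the device to the manager
        "OpenDeviceHandle",                   // obtain a device handle
        "GetVideoService",                    // get IDirectXVideoDecoderService
        "GetDecoderDeviceGuids",              // find a decoder GUID for the codec
        "GetDecoderConfigurations",           // pick a decoder configuration
        "CreateSurface",                      // allocate the decoding surfaces
        "CreateVideoDecoder",                 // finally create the decoder
    };
    std::vector<InitStep> steps;
    for (const char* n : names) {
        std::string name(n);
        steps.push_back({name, [do_step, name] { return do_step(name); }});
    }
    return steps;
}
```

A file like the HEVC one mentioned earlier would show up here as a non-negative return value, which tells you which stage rejected the stream.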
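dxva2_retrieve_data_call obtains the decoded data. As I said earlier, unless it is necessary, it is best not to copy the data out; just draw it directly with D3D. Copying the data from the GPU back to system memory greatly reduces GPU utilization. In my tests, doing so completely defeated the purpose of GPU acceleration, and CPU usage went up instead. That is why the source I uploaded draws the data directly.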
static int dxva2_retrieve_data(AVCodecContext *s, AVFrame *frame)
{
    // For DXVA2 frames, data[3] holds the IDirect3DSurface9 with the decoded picture
    LPDIRECT3DSURFACE9 surface = (LPDIRECT3DSURFACE9)frame->data[3];
    InputStream *ist = (InputStream *)s->opaque;
    DXVA2Context *ctx = (DXVA2Context *)ist->hwaccel_ctx;

    // cs, m_pBackBuffer, m_rtViewport and d3dpp are defined elsewhere in the project source
    EnterCriticalSection(&cs);

    // Render directly, without copying the frame back to system memory
    ctx->d3d9device->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
    ctx->d3d9device->BeginScene();

    // Release the back buffer reference taken for the previous frame
    if (m_pBackBuffer)
    {
        m_pBackBuffer->Release();
        m_pBackBuffer = NULL;
    }
    ctx->d3d9device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &m_pBackBuffer);

    // Scale the decoded surface to the client area of the window
    GetClientRect(d3dpp.hDeviceWindow, &m_rtViewport);
    ctx->d3d9device->StretchRect(surface, NULL, m_pBackBuffer, &m_rtViewport, D3DTEXF_LINEAR);

    ctx->d3d9device->EndScene();
    ctx->d3d9device->Present(NULL, NULL, NULL, NULL);

    LeaveCriticalSection(&cs);

    return 0;
}
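The StretchRect call above scales the picture into the full client rectangle, so the image is distorted whenever the window's proportions differ from the video's. If you want to preserve the aspect ratio, the destination rectangle can be computed first. The following is only a sketch of that idea and is not part of the uploaded project: `ViewRect` is a stand-in with the same fields as the Win32 RECT, and `letterbox` is a hypothetical helper name.

```cpp
#include <cassert>

// Stand-in with the same fields as the Win32 RECT, so the sketch builds anywhere.
struct ViewRect { long left, top, right, bottom; };

// Returns the largest rectangle with the source aspect ratio that fits
// inside `view`, centered. Falls back to `view` on degenerate input.
inline ViewRect letterbox(int src_w, int src_h, const ViewRect& view) {
    const long vw = view.right - view.left;
    const long vh = view.bottom - view.top;
    if (src_w <= 0 || src_h <= 0 || vw <= 0 || vh <= 0)
        return view;

    long w = vw;
    long h = vw * src_h / src_w;      // first try: fill the width
    if (h > vh) {                     // too tall: fill the height instead
        h = vh;
        w = vh * src_w / src_h;
    }
    const long x = view.left + (vw - w) / 2;
    const long y = view.top + (vh - h) / 2;
    return ViewRect{x, y, x + w, y + h};
}
```

In the function above, the result would be copied into a RECT and passed as the destination rectangle of StretchRect in place of m_rtViewport; the Clear call already paints the remaining bars black.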
(2) Implementation
With ffmpeg_dxva2.h and ffmpeg_dxva2.cpp in place, the implementation is very simple.
The code that configures DXVA2 in the main flow:
switch (codec->id)
{
case AV_CODEC_ID_MPEG2VIDEO:
case AV_CODEC_ID_H264:
case AV_CODEC_ID_VC1:
case AV_CODEC_ID_WMV3:
case AV_CODEC_ID_HEVC:
case AV_CODEC_ID_VP9:
{
    codecctx->thread_count = 1;  // Multithreading is apparently not compatible with hardware decoding

    // The stream object carries the hwaccel state; the callbacks reach it through codecctx->opaque
    InputStream *ist = new InputStream();
    ist->hwaccel_id = HWACCEL_AUTO;
    ist->active_hwaccel_id = HWACCEL_AUTO;
    ist->hwaccel_device = (char *)"dxva2";  // field is a non-const char *, so the literal needs a cast
    ist->dec = codec;
    ist->dec_ctx = codecctx;

    codecctx->opaque = ist;
    if (dxva2_init(codecctx, hWnd) == 0)
    {
        // Initialization succeeded: hook the hardware callbacks into the decoder
        codecctx->get_buffer2 = ist->hwaccel_get_buffer;
        codecctx->get_format = GetHwFormat;
        codecctx->thread_safe_callbacks = 1;
        break;
    }

    // Initialization failed: fall back to software decoding
    bAccel = false;
    break;
}
default:
    bAccel = false;
    break;
}
As you can see, this mainly comes down to calling dxva2_init.
The code that decodes and renders:
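The setup code also installs a get_format callback (GetHwFormat), whose body is not shown in this post. The sketch below illustrates what such a callback generally does and how it reaches the stream object through the opaque pointer; it is not the project's GetHwFormat. `PixFmt`, `MiniStream`, `MiniCodecCtx` and `pick_hw_format` are stand-ins I made up so the example compiles without FFmpeg; real code uses AVPixelFormat (AV_PIX_FMT_NONE, AV_PIX_FMT_DXVA2_VLD), InputStream and AVCodecContext.

```cpp
#include <cassert>

// Stand-in for AVPixelFormat; the list passed to the callback ends with NONE.
enum PixFmt { PIX_FMT_NONE = -1, PIX_FMT_YUV420P, PIX_FMT_NV12, PIX_FMT_DXVA2_VLD };

struct MiniStream {        // trimmed stand-in for InputStream
    bool hw_active;
    PixFmt hwaccel_pix_fmt;
};

struct MiniCodecCtx {      // trimmed stand-in for AVCodecContext
    void* opaque;          // user pointer, set to the stream during setup
};

// Same shape as a get_format callback: the decoder offers a list of formats
// terminated by PIX_FMT_NONE and the callback returns the one it wants.
inline PixFmt pick_hw_format(MiniCodecCtx* ctx, const PixFmt* fmts) {
    MiniStream* ist = static_cast<MiniStream*>(ctx->opaque);
    PixFmt fallback = PIX_FMT_NONE;
    for (const PixFmt* p = fmts; *p != PIX_FMT_NONE; ++p) {
        if (*p == PIX_FMT_DXVA2_VLD) {
            ist->hw_active = true;          // decoder agreed to use DXVA2
            ist->hwaccel_pix_fmt = *p;
            return *p;
        }
        if (fallback == PIX_FMT_NONE)
            fallback = *p;                  // remember the first non-DXVA2 format
    }
    ist->hw_active = false;                 // DXVA2 not offered: software path
    return fallback;
}
```

The design point is that the decoder only knows about the codec context, so per-stream state has to travel through `opaque`, which is why the setup code assigns `codecctx->opaque = ist` before calling dxva2_init.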
if (pkt.stream_index == videoindex)
{
    int got_picture = 0;
    DWORD t_start = GetTickCount();
    int bytes_used = avcodec_decode_video2(codecctx, picture, &got_picture, &pkt);
    if (got_picture)
    {
        if (bAccel)
        {
            // DXVA2 path: fetch the decoded surface and render it in one call
            dxva2_retrieve_data_call(codecctx, picture);
            DWORD t_end = GetTickCount();
            printf("dxva2 time using: %lu\n", t_end - t_start);
        }
        else
        {
            // Non-DXVA2 path
            if (img_convert_ctx && pFrameBGR && out_buffer)
            {
                // Convert the frame on the CPU, then render it
                sws_scale(img_convert_ctx, (const uint8_t* const*)picture->data, picture->linesize,
                          0, codecctx->height, pFrameBGR->data, pFrameBGR->linesize);
                m_D3DVidRender.Render_YUV(out_buffer, picture->width, picture->height);
                DWORD t_end = GetTickCount();
                printf("normal time using: %lu\n", t_end - t_start);
            }
        }
        count++;
    }
    av_packet_unref(&pkt);
}
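dxva2_init has in fact already set up D3D rendering, so you only need to pass in the window handle and then call dxva2_retrieve_data_call to draw the data directly onto the window that the handle refers to.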
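A note on the timing printed by the decode loop: GetTickCount typically has a resolution of about 10 to 16 ms, so a single-frame measurement is coarse. If you want to compare the DXVA2 and software paths, averaging over many frames with a finer clock gives more usable numbers. The class below is a minimal sketch of that idea using std::chrono; `FrameTimer` is a hypothetical helper and not part of the uploaded project.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Accumulates per-frame durations so that an average can be reported.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Runs `work` once, measures how long it took and adds it to the total.
    template <typename Fn>
    void measure(Fn&& work) {
        const Clock::time_point start = Clock::now();
        work();
        total_ += std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - start);
        ++frames_;
    }

    std::uint64_t frames() const { return frames_; }

    // Average time per measured frame in milliseconds; 0 if nothing was measured.
    double average_ms() const {
        return frames_ == 0 ? 0.0
                            : total_.count() / 1000.0 / static_cast<double>(frames_);
    }

private:
    std::chrono::microseconds total_{0};
    std::uint64_t frames_ = 0;
};
```

In the loop above, the decode-and-render work for one packet would be wrapped in a lambda passed to `measure`, and `average_ms()` printed once at the end of playback instead of once per frame.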
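Source code: http://download.csdn.net/download/qq_33892166/9698473
The project is based on VS2013 and requires some understanding of both ffmpeg and D3D. Be sure to change the path of the video to play in the code; otherwise the console does not exit cleanly and VS will hang, a problem I only just discovered. It is best to modify the console code yourself. Commenting out the code that opens the console also works, although you will then not see the debug output.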