H.266/VVC Inter Prediction Study: Bi-directional Optical Flow (BDOF)

Bi-directional optical flow (BDOF)

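VVC uses bi-directional optical flow (BDOF) to refine the sample values of bi-prediction. BDOF was previously called BIO and was included in the JEM reference software. Compared with the JEM version, BDOF in VVC is a simpler version with lower computational complexity, especially in terms of the number of multiplications and the size of the multiplier.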

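BDOF is applied only to the luma component. As its name indicates, BDOF is based on the optical flow concept, which assumes that the motion of an object is smooth. BDOF refines the bi-prediction signal of a CU at the 4×4 subblock level. For each 4×4 subblock, a motion refinement (V_x, V_y) is calculated by minimizing the difference between the L0 and L1 prediction samples. The motion refinement is then used to adjust the bi-predicted sample values in the 4×4 subblock.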

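Conditions for applying BDOF:

The CU is coded in "true" bi-prediction mode, i.e., in display order one of the two reference pictures precedes the current picture and the other follows it
The distances (i.e., POC differences) from the two reference pictures to the current picture are the same
Both reference pictures are short-term reference pictures
The CU is not coded in affine mode or SbTMVP merge mode
The CU has more than 64 luma samples
Both the CU height and the CU width are greater than or equal to 8 luma samples
The BCW weights are equal
WP is not enabled for the current CU
CIIP mode is not used for the current CU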

For blocks to which DMVR is applied, BDOF must satisfy one more condition in addition to those above: dmvrSad > 2 * subWidth * subHeight.
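The conditions listed above can be collected into a single predicate. The following is only a minimal sketch: CuInfo, bdofEligible and bdofAfterDmvr are hypothetical names invented for illustration and are not VTM identifiers, and the sketch covers just the listed conditions plus the DMVR inequality as quoted.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical summary of the CU properties named in the condition list.
// None of these field names come from VTM.
struct CuInfo {
  int  pocCur, pocRef0, pocRef1;    // picture order counts: current, L0 ref, L1 ref
  bool ref0LongTerm, ref1LongTerm;  // long-term reference markers
  bool affine, sbTmvpMerge, ciip;   // coding modes that exclude BDOF
  bool bcwEqualWeights;             // true when BCW uses equal weights
  bool weightedPrediction;          // true when WP is enabled for the CU
  int  width, height;               // luma dimensions
};

bool bdofEligible(const CuInfo& cu) {
  assert(cu.width > 0 && cu.height > 0);
  const int d0 = cu.pocCur - cu.pocRef0;
  const int d1 = cu.pocCur - cu.pocRef1;
  // "True" bi-prediction: one reference before and one after the current picture.
  const bool oppositeSides = (d0 > 0 && d1 < 0) || (d0 < 0 && d1 > 0);
  // Same POC distance to both references.
  const bool equalDistance = std::abs(d0) == std::abs(d1);
  return oppositeSides && equalDistance
      && !cu.ref0LongTerm && !cu.ref1LongTerm
      && !cu.affine && !cu.sbTmvpMerge
      && cu.width * cu.height > 64
      && cu.width >= 8 && cu.height >= 8
      && cu.bcwEqualWeights && !cu.weightedPrediction && !cu.ciip;
}

// Extra check for DMVR-refined blocks, using the inequality quoted in the text.
bool bdofAfterDmvr(int dmvrSad, int subWidth, int subHeight) {
  return dmvrSad > 2 * subWidth * subHeight;
}
```

With both dimensions at least 8, a CU of more than 64 luma samples has at least 128 of them, which is the form the area check takes in the VTM listing.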

Code in VTM10.0 that decides whether BDOF is used:

  bool bioApplied = false;
  if (pu.cs->sps->getBDOFEnabledFlag() && (!pu.cs->picHeader->getDisBdofFlag()))
  {
    if (pu.cu->affine || m_subPuMC)
    {
      bioApplied = false;
    }
    else
    {
      const bool biocheck0 = !((WPScalingParam::isWeighted(wp0) || WPScalingParam::isWeighted(wp1)) && slice.getSliceType() == B_SLICE);
      const bool biocheck1 = !(pps.getUseWP() && slice.getSliceType() == P_SLICE);
      if (biocheck0
        && biocheck1
        && PU::isBiPredFromDifferentDirEqDistPoc(pu)
        && (pu.Y().height >= 8)
        && (pu.Y().width >= 8)
        && ((pu.Y().height * pu.Y().width) >= 128))
      {
        bioApplied = true;
      }
    }
    if (bioApplied && pu.ciipFlag)
    {
      bioApplied = false;
    }
    if (bioApplied && pu.cu->smvdMode)
    {
      bioApplied = false;
    }
    if (pu.cu->cs->sps->getUseBcw() && bioApplied && pu.cu->BcwIdx != BCW_DEFAULT)
    {
      bioApplied = false;
    }
  } // decide whether BDOF is used

Processing steps:

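1. Compute the gradients: the horizontal and vertical gradients are computed separately for the forward and backward predictions. Each gradient is obtained by directly subtracting two neighboring samples.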
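As a small illustration of this step, the sketch below computes both gradients at one interior sample of a row-major prediction block. Gradient and gradientAt are illustrative names, not VTM identifiers. The shift of 6 is taken from shift1 in the VTM 10.0 gradFilterCore listing, and the sketch assumes `>>` on negative values is an arithmetic shift.

```cpp
#include <cassert>
#include <vector>

// Horizontal and vertical gradient of one sample.
struct Gradient { int gx; int gy; };

// Central difference of the two neighbors on each axis, each pre-shifted by 6 bits.
Gradient gradientAt(const std::vector<int>& pred, int stride, int x, int y) {
  constexpr int kShift = 6;  // same value as shift1 in the VTM 10.0 listing
  const int rows = static_cast<int>(pred.size()) / stride;
  assert(x >= 1 && x < stride - 1 && y >= 1 && y < rows - 1);  // interior samples only
  const int c = y * stride + x;
  return { (pred[c + 1] >> kShift) - (pred[c - 1] >> kShift),
           (pred[c + stride] >> kShift) - (pred[c - stride] >> kShift) };
}
```

Shifting each neighbor before subtracting, rather than shifting the difference, keeps the gradient within a small bit width, in line with the low-complexity goal stated at the start of the article.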

Gradient computation code (based on VTM10.0):

template<bool PAD = true>
void gradFilterCore(Pel* pSrc, int srcStride, int width, int height, int gradStride, Pel* gradX, Pel* gradY, const int bitDepth)
{
  Pel* srcTmp = pSrc + srcStride + 1;
  Pel* gradXTmp = gradX + gradStride + 1;
  Pel* gradYTmp = gradY + gradStride + 1;
  int  shift1 = 6;

  for (int y = 0; y < (height - 2 * BIO_EXTEND_SIZE); y++)
  {
    for (int x = 0; x < (width - 2 * BIO_EXTEND_SIZE); x++)
    {
      gradYTmp[x] = ( srcTmp[x + srcStride] >> shift1 ) - ( srcTmp[x - srcStride] >> shift1 ); // vertical gradient
      gradXTmp[x] = ( srcTmp[x + 1] >> shift1 ) - ( srcTmp[x - 1] >> shift1 );                 // horizontal gradient
    }
    gradXTmp += gradStride;
    gradYTmp += gradStride;
    srcTmp += srcStride;
  }

  if (PAD)
  {
    gradXTmp = gradX + gradStride + 1;
    gradYTmp = gradY + gradStride + 1;
    for (int y = 0; y < (height - 2 * BIO_EXTEND_SIZE); y++)
    {
      // boundary gradients: repeat the first and last column
      gradXTmp[-1] = gradXTmp[0];
      gradXTmp[width - 2 * BIO_EXTEND_SIZE] = gradXTmp[width - 2 * BIO_EXTEND_SIZE - 1];
      gradXTmp += gradStride;

      gradYTmp[-1] = gradYTmp[0];
      gradYTmp[width - 2 * BIO_EXTEND_SIZE] = gradYTmp[width - 2 * BIO_EXTEND_SIZE - 1];
      gradYTmp += gradStride;
    }

    gradXTmp = gradX + gradStride;
    gradYTmp = gradY + gradStride;
    ::memcpy(gradXTmp - gradStride, gradXTmp, sizeof(Pel)*(width));
    ::memcpy(gradXTmp + (height - 2 * BIO_EXTEND_SIZE)*gradStride, gradXTmp + (height - 2 * BIO_EXTEND_SIZE - 1)*gradStride, sizeof(Pel)*(width));
    ::memcpy(gradYTmp - gradStride, gradYTmp, sizeof(Pel)*(width));
    ::memcpy(gradYTmp + (height - 2 * BIO_EXTEND_SIZE)*gradStride, gradYTmp + (height - 2 * BIO_EXTEND_SIZE - 1)*gradStride, sizeof(Pel)*(width));
  }
}

2. Compute the auto-correlation and cross-correlation terms of the gradients: S1, S2, S3, S5, S6
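The five sums can be written out as follows. This is a transcription of what the VTM 10.0 calcBIOSumsCore listing computes, not a quotation from the standard text. The symbols ψ_x, ψ_y, θ and Ω are notation introduced here, and θ follows the sign of tmpDI in that code (L1 minus L0), which may differ from the convention used elsewhere. Ω denotes the 6×6 window surrounding the 4×4 subblock.

```latex
\begin{aligned}
\psi_x(i,j) &= \Bigl(\frac{\partial I^{(0)}}{\partial x}(i,j) + \frac{\partial I^{(1)}}{\partial x}(i,j)\Bigr) \gg 1, \qquad
\psi_y(i,j) = \Bigl(\frac{\partial I^{(0)}}{\partial y}(i,j) + \frac{\partial I^{(1)}}{\partial y}(i,j)\Bigr) \gg 1 \\
\theta(i,j) &= \bigl(I^{(1)}(i,j) \gg 4\bigr) - \bigl(I^{(0)}(i,j) \gg 4\bigr) \\
S_1 &= \sum_{(i,j)\in\Omega} \lvert \psi_x(i,j) \rvert, \qquad
S_5 = \sum_{(i,j)\in\Omega} \lvert \psi_y(i,j) \rvert \\
S_3 &= \sum_{(i,j)\in\Omega} \operatorname{sign}\bigl(\psi_x(i,j)\bigr)\,\theta(i,j), \qquad
S_6 = \sum_{(i,j)\in\Omega} \operatorname{sign}\bigl(\psi_y(i,j)\bigr)\,\theta(i,j) \\
S_2 &= \sum_{(i,j)\in\Omega} \operatorname{sign}\bigl(\psi_y(i,j)\bigr)\,\psi_x(i,j)
\end{aligned}
```

Because every product involves only a sign, the sums need no real multiplications.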

The corresponding code (based on VTM10.0):

void calcBIOSumsCore(const Pel* srcY0Tmp, const Pel* srcY1Tmp, Pel* gradX0, Pel* gradX1, Pel* gradY0, Pel* gradY1, int xu, int yu, const int src0Stride, const int src1Stride, const int widthG, const int bitDepth, int* sumAbsGX, int* sumAbsGY, int* sumDIX, int* sumDIY, int* sumSignGY_GX)
{
  int shift4 = 4;
  int shift5 = 1;

  for (int y = 0; y < 6; y++)
  {
    for (int x = 0; x < 6; x++)
    {
      int tmpGX = (gradX0[x] + gradX1[x]) >> shift5;
      int tmpGY = (gradY0[x] + gradY1[x]) >> shift5;
      int tmpDI = (int)((srcY1Tmp[x] >> shift4) - (srcY0Tmp[x] >> shift4));
      *sumAbsGX += (tmpGX < 0 ? -tmpGX : tmpGX);                              // S1
      *sumAbsGY += (tmpGY < 0 ? -tmpGY : tmpGY);                              // S5
      *sumDIX += (tmpGX < 0 ? -tmpDI : (tmpGX == 0 ? 0 : tmpDI));             // S3
      *sumDIY += (tmpGY < 0 ? -tmpDI : (tmpGY == 0 ? 0 : tmpDI));             // S6
      *sumSignGY_GX += (tmpGY < 0 ? -tmpGX : (tmpGY == 0 ? 0 : tmpGX));       // S2
    }
    srcY1Tmp += src1Stride;
    srcY0Tmp += src0Stride;
    gradX0 += widthG;
    gradX1 += widthG;
    gradY0 += widthG;
    gradY1 += widthG;
  }
}

3. Derive the motion refinement from the cross-correlation and auto-correlation terms
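The derivation can be sketched in isolation, following the arithmetic in the VTM 10.0 applyBiOptFlow listing later in this article. Two things here are assumptions. First, floorLog2 stands in for VTM's rightShiftMSB, which is assumed to shift its first argument right by floor(log2) of the second. Second, the split of S2 into high and low 12-bit parts is collapsed into one product, which is equivalent only as long as nothing overflows. Refinement and deriveRefinement are illustrative names.

```cpp
#include <algorithm>
#include <cassert>

// floor(log2(v)) for v > 0.
int floorLog2(int v) {
  assert(v > 0);
  int n = 0;
  while (v > 1) { v >>= 1; ++n; }
  return n;
}

struct Refinement { int vx; int vy; };

// s1, s5 are sums of absolute values (non-negative); s2, s3, s6 are signed.
Refinement deriveRefinement(int s1, int s2, int s3, int s5, int s6) {
  constexpr int kLimit = (1 << 4) - 1;  // clipping range [-15, 15], as "limit" in the listing
  int vx = 0;
  int vy = 0;
  if (s1 != 0) {
    // Division by s1 is approximated by a shift by the position of its MSB.
    vx = std::clamp((s3 * 4) >> floorLog2(s1), -kLimit, kLimit);
  }
  if (s5 != 0) {
    const int cross = (vx * s2) >> 1;  // coupling between the x and y refinement
    vy = std::clamp(((s6 * 4) - cross) >> floorLog2(s5), -kLimit, kLimit);
  }
  return {vx, vy};
}
```

Replacing the division with a shift by the MSB position is what keeps the step cheap, and the clipping keeps the multipliers used in the next step narrow.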

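4. Based on the motion refinement and the gradients, compute an adjustment for each sample in the 4×4 subblock (the variable b in addBIOAvgCore below); it is added to the sum of the two predictions before rounding and clipping.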

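In the calculations above, the multipliers do not exceed 15 bits, and the maximum bit width of the intermediate parameters in the BDOF process is kept within 32 bits.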
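Read off from the VTM 10.0 addBIOAvgCore listing that follows, the per-sample adjustment and the final prediction can be written as below. The notation is introduced here for illustration: V_x and V_y correspond to tmpx and tmpy, o_offset and shift to the offset and shift arguments, and Clip to ClipPel with the luma clipping range.

```latex
\begin{aligned}
b(x,y) &= V_x\Bigl(\frac{\partial I^{(0)}}{\partial x}(x,y) - \frac{\partial I^{(1)}}{\partial x}(x,y)\Bigr)
        + V_y\Bigl(\frac{\partial I^{(0)}}{\partial y}(x,y) - \frac{\partial I^{(1)}}{\partial y}(x,y)\Bigr) \\
\mathrm{pred}_{\mathrm{BDOF}}(x,y) &= \operatorname{Clip}\Bigl(\bigl(I^{(0)}(x,y) + I^{(1)}(x,y) + b(x,y) + o_{\mathrm{offset}}\bigr) \gg \mathrm{shift}\Bigr)
\end{aligned}
```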

void addBIOAvgCore(const Pel* src0, int src0Stride, const Pel* src1, int src1Stride, Pel *dst, int dstStride, const Pel *gradX0, const Pel *gradX1, const Pel *gradY0, const Pel*gradY1, int gradStride, int width, int height, int tmpx, int tmpy, int shift, int offset, const ClpRng& clpRng)
{
  int b = 0;

  for (int y = 0; y < height; y++)
  {
    for (int x = 0; x < width; x += 4)
    {
      b = tmpx * (gradX0[x] - gradX1[x]) + tmpy * (gradY0[x] - gradY1[x]);
#if JVET_R0351_HIGH_BIT_DEPTH_SUPPORT
      dst[x] = ClipPel(rightShift((src0[x] + src1[x] + b + offset), shift), clpRng);
#else
      dst[x] = ClipPel((int16_t)rightShift((src0[x] + src1[x] + b + offset), shift), clpRng);
#endif

      b = tmpx * (gradX0[x + 1] - gradX1[x + 1]) + tmpy * (gradY0[x + 1] - gradY1[x + 1]);
#if JVET_R0351_HIGH_BIT_DEPTH_SUPPORT
      dst[x + 1] = ClipPel(rightShift((src0[x + 1] + src1[x + 1] + b + offset), shift), clpRng);
#else
      dst[x + 1] = ClipPel((int16_t)rightShift((src0[x + 1] + src1[x + 1] + b + offset), shift), clpRng);
#endif

      b = tmpx * (gradX0[x + 2] - gradX1[x + 2]) + tmpy * (gradY0[x + 2] - gradY1[x + 2]);
#if JVET_R0351_HIGH_BIT_DEPTH_SUPPORT
      dst[x + 2] = ClipPel(rightShift((src0[x + 2] + src1[x + 2] + b + offset), shift), clpRng);
#else
      dst[x + 2] = ClipPel((int16_t)rightShift((src0[x + 2] + src1[x + 2] + b + offset), shift), clpRng);
#endif

      b = tmpx * (gradX0[x + 3] - gradX1[x + 3]) + tmpy * (gradY0[x + 3] - gradY1[x + 3]);
#if JVET_R0351_HIGH_BIT_DEPTH_SUPPORT
      dst[x + 3] = ClipPel(rightShift((src0[x + 3] + src1[x + 3] + b + offset), shift), clpRng);
#else
      dst[x + 3] = ClipPel((int16_t)rightShift((src0[x + 3] + src1[x + 3] + b + offset), shift), clpRng);
#endif
    }
    dst += dstStride;
    src0 += src0Stride;
    src1 += src1Stride;
    gradX0 += gradStride;
    gradX1 += gradStride;
    gradY0 += gradStride;
    gradY1 += gradStride;
  }
}

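To compute the gradient values, some prediction samples in list k (k = 0, 1) outside the current CU boundary have to be generated. As the figure shows, BDOF in VVC uses one extended row/column around the CU boundary. To limit the cost of generating the out-of-boundary prediction samples, the samples in the extended area (white positions) are produced by taking the reference samples at the nearby integer positions directly, without interpolation (using a floor operation on the coordinates), while the normal 8-tap motion compensation interpolation filter is used to generate the prediction samples inside the CU (gray positions). The extended sample values are used only in the gradient calculation. For the remaining steps of the BDOF process, any sample or gradient value needed outside the CU boundary is padded (i.e., repeated) from its nearest neighbor.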
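The padding rule in the last sentence, repeating the nearest neighbor, can be sketched on its own. The function below is a hypothetical helper, not VTM code: it copies a block into a buffer that is one sample larger on every side and fills the border by clamping the source coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns a (width + 2) x (height + 2) row-major buffer whose interior is the
// input block and whose one-sample border repeats the nearest interior sample.
std::vector<int> padByRepetition(const std::vector<int>& block, int width, int height) {
  assert(width > 0 && height > 0);
  assert(static_cast<int>(block.size()) == width * height);
  const int pw = width + 2;
  const int ph = height + 2;
  std::vector<int> out(pw * ph);
  for (int y = 0; y < ph; ++y) {
    const int sy = std::clamp(y - 1, 0, height - 1);  // nearest valid source row
    for (int x = 0; x < pw; ++x) {
      const int sx = std::clamp(x - 1, 0, width - 1); // nearest valid source column
      out[y * pw + x] = block[sy * width + sx];
    }
  }
  return out;
}
```

The VTM listings reach the same result differently, by copying the first and last column and then whole rows with memcpy, which avoids a per-sample clamp.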

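When the width and/or height of a CU is larger than 16 luma samples, the CU is split into subblocks whose width and/or height equals 16 luma samples, and the subblock boundaries are treated as CU boundaries in the BDOF process. The maximum unit size for BDOF processing is limited to 16×16. The BDOF process can be skipped for each subblock: when the SAD between the initial L0 and L1 prediction samples is smaller than a threshold, BDOF is not applied to that subblock. The threshold is set to 8 * W * (H >> 1), where W is the subblock width and H is the subblock height. To avoid the extra complexity of a SAD calculation, the SAD between the initial L0 and L1 prediction samples computed in the DMVR process is reused here.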
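A minimal sketch of the splitting and of the skip test described in this paragraph is given below. SubBlock, splitForBdof and skipBdof are hypothetical names chosen for illustration; the threshold expression is the one quoted in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Position and size of one BDOF processing unit inside the CU, in luma samples.
struct SubBlock { int x; int y; int w; int h; };

// Split a CU into processing units of at most 16x16 luma samples.
std::vector<SubBlock> splitForBdof(int cuWidth, int cuHeight) {
  assert(cuWidth >= 8 && cuHeight >= 8);  // BDOF requires both dimensions >= 8
  constexpr int kMaxUnit = 16;
  std::vector<SubBlock> units;
  for (int y = 0; y < cuHeight; y += kMaxUnit) {
    for (int x = 0; x < cuWidth; x += kMaxUnit) {
      units.push_back({x, y, std::min(kMaxUnit, cuWidth - x), std::min(kMaxUnit, cuHeight - y)});
    }
  }
  return units;
}

// True when the L0/L1 SAD is below the threshold 8 * W * (H >> 1),
// in which case BDOF is not applied to the subblock.
bool skipBdof(int sad, int w, int h) {
  return sad < 8 * w * (h >> 1);
}
```

For example, a 32×16 CU yields two 16×16 units, and a 16×16 unit is skipped when its SAD is below 8 * 16 * 8 = 1024.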

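If BCW is enabled for the current block, i.e., the BCW weights are unequal, bi-directional optical flow is disabled. Similarly, if WP is enabled for the current block, i.e., luma_weight_lx_flag is 1 for either of the two reference pictures, BDOF is also disabled. BDOF is likewise disabled when a CU is coded in symmetric MVD mode or CIIP mode.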

Code that applies BDOF, with comments (based on VTM10.0):

void InterPrediction::applyBiOptFlow(const PredictionUnit &pu, const CPelUnitBuf &yuvSrc0, const CPelUnitBuf &yuvSrc1, const int &refIdx0, const int &refIdx1, PelUnitBuf &yuvDst, const BitDepths &clipBitDepths)
{
  // width and height of the current PU
  const int     height = yuvDst.Y().height;
  const int     width = yuvDst.Y().width;
  // width and height of the current PU after extension
  int           heightG = height + 2 * BIO_EXTEND_SIZE;
  int           widthG = width + 2 * BIO_EXTEND_SIZE;
  int           offsetPos = widthG*BIO_EXTEND_SIZE + BIO_EXTEND_SIZE;

  Pel*          gradX0 = m_gradX0;
  Pel*          gradX1 = m_gradX1;
  Pel*          gradY0 = m_gradY0;
  Pel*          gradY1 = m_gradY1;

  int           stridePredMC = widthG + 2;
  const Pel*    srcY0 = m_filteredBlockTmp[2][COMPONENT_Y] + stridePredMC + 1;
  const Pel*    srcY1 = m_filteredBlockTmp[3][COMPONENT_Y] + stridePredMC + 1;
  const int     src0Stride = stridePredMC;
  const int     src1Stride = stridePredMC;

  Pel*          dstY = yuvDst.Y().buf;
  const int     dstStride = yuvDst.Y().stride;
  const Pel*    srcY0Temp = srcY0;
  const Pel*    srcY1Temp = srcY1;

  // loop over the two reference lists
  for (int refList = 0; refList < NUM_REF_PIC_LIST_01; refList++)
  {
    Pel* dstTempPtr = m_filteredBlockTmp[2 + refList][COMPONENT_Y] + stridePredMC + 1;
    Pel* gradY = (refList == 0) ? m_gradY0 : m_gradY1;
    Pel* gradX = (refList == 0) ? m_gradX0 : m_gradX1;

    // compute the gradients
    xBioGradFilter(dstTempPtr, stridePredMC, widthG, heightG, widthG, gradX, gradY, clipBitDepths.recon[toChannelType(COMPONENT_Y)]);
    Pel* padStr = m_filteredBlockTmp[2 + refList][COMPONENT_Y] + 2 * stridePredMC + 2;
    for (int y = 0; y< height; y++)
    {
      padStr[-1] = padStr[0];
      padStr[width] = padStr[width - 1];
      padStr += stridePredMC;
    }

    padStr = m_filteredBlockTmp[2 + refList][COMPONENT_Y] + 2 * stridePredMC + 1;
    ::memcpy(padStr - stridePredMC, padStr, sizeof(Pel)*(widthG));
    ::memcpy(padStr + height*stridePredMC, padStr + (height - 1)*stridePredMC, sizeof(Pel)*(widthG));
  }

  const ClpRng& clpRng = pu.cu->cs->slice->clpRng(COMPONENT_Y);
  const int   bitDepth = clipBitDepths.recon[toChannelType(COMPONENT_Y)];
#if JVET_R0351_HIGH_BIT_DEPTH_SUPPORT
  const int   shiftNum = IF_INTERNAL_FRAC_BITS(bitDepth) + 1;
#else
  const int   shiftNum = IF_INTERNAL_PREC + 1 - bitDepth;
#endif
  const int   offset = (1 << (shiftNum - 1)) + 2 * IF_INTERNAL_OFFS;
  const int   limit = ( 1 << 4 ) - 1;

  int xUnit = (width >> 2);
  int yUnit = (height >> 2);

  Pel *dstY0 = dstY;
  gradX0 = m_gradX0; gradX1 = m_gradX1;
  gradY0 = m_gradY0; gradY1 = m_gradY1;

  for (int yu = 0; yu < yUnit; yu++)
  {
    for (int xu = 0; xu < xUnit; xu++)
    {
      int tmpx = 0, tmpy = 0;
      int sumAbsGX = 0, sumAbsGY = 0, sumDIX = 0, sumDIY = 0;
      int sumSignGY_GX = 0;

      Pel* pGradX0Tmp = m_gradX0 + (xu << 2) + (yu << 2) * widthG;
      Pel* pGradX1Tmp = m_gradX1 + (xu << 2) + (yu << 2) * widthG;
      Pel* pGradY0Tmp = m_gradY0 + (xu << 2) + (yu << 2) * widthG;
      Pel* pGradY1Tmp = m_gradY1 + (xu << 2) + (yu << 2) * widthG;
      const Pel* SrcY1Tmp = srcY1 + (xu << 2) + (yu << 2) * src1Stride;
      const Pel* SrcY0Tmp = srcY0 + (xu << 2) + (yu << 2) * src0Stride;

      // compute the auto-correlation and cross-correlation sums
      g_pelBufOP.calcBIOSums(SrcY0Tmp, SrcY1Tmp, pGradX0Tmp, pGradX1Tmp, pGradY0Tmp, pGradY1Tmp, xu, yu, src0Stride, src1Stride, widthG, bitDepth, &sumAbsGX, &sumAbsGY, &sumDIX, &sumDIY, &sumSignGY_GX);
      tmpx = (sumAbsGX == 0 ? 0 : rightShiftMSB(sumDIX << 2, sumAbsGX));
      tmpx = Clip3(-limit, limit, tmpx);

      int     mainsGxGy = sumSignGY_GX >> 12;
      int     secsGxGy = sumSignGY_GX & ((1 << 12) - 1);
      int     tmpData = tmpx * mainsGxGy;
      tmpData = ((tmpData << 12) + tmpx*secsGxGy) >> 1;
      tmpy = (sumAbsGY == 0 ? 0 : rightShiftMSB(((sumDIY << 2) - tmpData), sumAbsGY));
      tmpy = Clip3(-limit, limit, tmpy);

      srcY0Temp = srcY0 + (stridePredMC + 1) + ((yu*src0Stride + xu) << 2);
      srcY1Temp = srcY1 + (stridePredMC + 1) + ((yu*src0Stride + xu) << 2);
      gradX0 = m_gradX0 + offsetPos + ((yu*widthG + xu) << 2);
      gradX1 = m_gradX1 + offsetPos + ((yu*widthG + xu) << 2);
      gradY0 = m_gradY0 + offsetPos + ((yu*widthG + xu) << 2);
      gradY1 = m_gradY1 + offsetPos + ((yu*widthG + xu) << 2);

      dstY0 = dstY + ((yu*dstStride + xu) << 2);
      // apply the prediction refinement
      xAddBIOAvg4(srcY0Temp, src0Stride, srcY1Temp, src1Stride, dstY0, dstStride, gradX0, gradX1, gradY0, gradY1, widthG, (1 << 2), (1 << 2), (int)tmpx, (int)tmpy, shiftNum, offset, clpRng);
    }  // xu
  }  // yu
}
