Adaptive quarter-pel motion estimation and motion vector coding algorithm for the H.264/AVC standard

Seung-Won Jung; Chun-Su Park; Le Thanh Ha; Sung-Jea Ko

doi:10.1117/1.3257262

Published: 1 November 2009

Optical Engineering, Vol. 48, Issue 11, 110502 (November 2009). https://doi.org/10.1117/1.3257262

Abstract

We present an adaptive quarter-pel (Qpel) motion estimation (ME) method for H.264/AVC. Instead of applying Qpel ME to all macroblocks (MBs), the proposed method selectively performs Qpel ME at the MB level. In order to reduce the bit rate, we also propose a motion vector (MV) encoding technique that adaptively selects a different variable length coding (VLC) table according to the accuracy of the MV. Experimental results show that the proposed method achieves an average bit rate reduction of about 3%.

1. Introduction

In block-based video coders, motion estimation (ME) is of great significance because of its impact on compression efficiency. In order to achieve high compression efficiency, ME is performed with quarter-pel (Qpel) accuracy as well as half-pel (Hpel) and integer-pel accuracies in the H.264/AVC standard.¹ Even though ME with a high accuracy generally reduces the bits required for encoding the difference frame, it often compromises the total bit rate because the bits required for encoding the motion vector (MV) grow as the motion vector accuracy (MVA) increases.

Various methods of obtaining the optimal MVA have been introduced in the literature.²,³ In Ref. 2, an optimal MVA is derived for each macroblock (MB) and for each frame. The optimal MVA formula in Ref. 2 reveals that the MVA depends on the texture and the interframe noise of the MB. In Ref. 3, the MVA is adaptively determined for each MB by examining all possible MVAs and selecting the one with the minimum Lagrange cost. However, the coding gain of these methods is limited, since additional bits indicating the MVA need to be encoded.

In this letter, a novel MVA decision algorithm for H.264/AVC is presented. The proposed method determines the validity of Qpel ME for each MB. Since no additional bit is required to indicate the MVA, the proposed algorithm can be implemented without modifying the syntax of the H.264/AVC standard. Then, in order to achieve the coding gain, we also propose an MV encoding technique that adaptively changes the variable length coding (VLC) table according to the MVA of the MB.

2. Proposed Algorithm

The proposed algorithm consists of two techniques. We first present an adaptive MV encoding technique for H.264/AVC. Then, based on the proposed MV coding technique, we also propose an MVA decision technique.

In H.264/AVC, the original MV itself is not encoded. Instead, the difference between the original MV and the predicted motion vector (PMV),¹ called the motion vector difference (MVD), is encoded. Let $Δ V$ denote the MVD defined as follows:

Eq. 1

$$Δ V = V_{o} - V_{p},$$

where $V_{o}$ and $V_{p}$ represent the original MV and PMV, respectively. In H.264/AVC, each horizontal and vertical element of $Δ V$ is independently encoded by using a common VLC table without considering the MVA of the MB.

For notational simplicity, we first define three motion vector sets:

Eq. 2

$${MV}_{All} = \{(u, v) \mid u, v ∊ \{0, - 1/4, 1/4, - 1/2, 1/2, \dots\}\},$$

Eq. 3

$${MV}_{Qpel} = \{(u, v) \mid u, v ∊ \{0, - 1/4, 1/4, - 3/4, 3/4, \dots\}\},$$

Eq. 4

$${MV}_{Hpel} = \{(u, v) \mid u, v ∊ \{0, - 1/2, 1/2, - 1, 1, \dots\}\}.$$

Note that ${MV}_{All}$ is the union of ${MV}_{Qpel}$ and ${MV}_{Hpel}$. If ME is performed up to Hpel accuracy, the resulting MV should belong to ${MV}_{Hpel}$. If we allow Qpel ME, the additional MV set, ${MV}_{Qpel}$, is required to express the MV.

Assume that ME is performed up to Hpel accuracy for the current MB, i.e., $V_{o} ∊ {MV}_{Hpel}$. Then, $Δ V$ is an element of either ${MV}_{Hpel}$ or ${MV}_{Qpel}$, depending on $V_{p}$, as follows:

Eq. 5

$$Δ V ∊ \begin{cases} {MV}_{Hpel}, & \text{if } V_{p} ∊ {MV}_{Hpel}, \\ {MV}_{Qpel}, & \text{if } V_{p} ∊ {MV}_{Qpel}. \end{cases}$$
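Eq. 5 can be checked numerically. The sketch below is an illustration, not the authors' code. It stores each MV component as an integer number of quarter-pel steps (an assumed representation) and treats a component as Hpel when it is a multiple of 1/2 and as Qpel otherwise, so the 0 listed in Eq. 3 is counted as Hpel here. The function names are hypothetical.

```python
def accuracy_class(q: int) -> str:
    """Classify one MV component given in quarter-pel units.

    Assumption: even values (multiples of 1/2 pel) are 'Hpel',
    odd values (odd multiples of 1/4 pel) are 'Qpel'.
    """
    return "Hpel" if q % 2 == 0 else "Qpel"


def mvd(v_o: int, v_p: int) -> int:
    """Eq. 1 for one component: MVD = original MV - predicted MV."""
    return v_o - v_p


# Exhaustive check of Eq. 5 on a small range: when the original MV
# component is Hpel, the MVD falls in the same class as the PMV.
for v_o in range(-16, 17, 2):          # Hpel-only originals
    for v_p in range(-16, 17):         # any PMV
        assert accuracy_class(mvd(v_o, v_p)) == accuracy_class(v_p)

# Example: V_o = 1/2 pel (2 units), V_p = 3/4 pel (3 units)
example = mvd(2, 3)                    # -1 unit, i.e. -1/4 pel
print(example, accuracy_class(example))
```

Working in integer quarter-pel units keeps the parity test exact and avoids floating-point comparisons on fractional MVs.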

Note that the number of possible $Δ V$ values is halved if Qpel ME is not applied. Therefore, instead of using a VLC table including all possible MVD values, ${VLC}_{All}$ , a reduced-size VLC table containing either Hpel or Qpel MVD values, ${VLC}_{Hpel}$ or ${VLC}_{Qpel}$ , can be used. By adaptively changing the VLC table according to the MVA of the MB, the bits required to encode MVD in H.264/AVC can be effectively reduced.
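To make the saving concrete, the sketch below counts bits for one MVD component under the signed Exp-Golomb code se(v), which H.264/AVC uses for MVDs in CAVLC mode, and compares it with two reduced tables. The letter does not specify how ${VLC}_{Hpel}$ and ${VLC}_{Qpel}$ are built, so the index mappings here are hypothetical and chosen only to illustrate why halving the alphabet shortens the codewords. MVD values are assumed to be integers in quarter-pel units.

```python
def ue_bits(code_num: int) -> int:
    """Length in bits of the unsigned Exp-Golomb codeword for code_num >= 0."""
    return 2 * (code_num + 1).bit_length() - 1


def bits_all(d: int) -> int:
    """Full table: signed Exp-Golomb se(v) over every quarter-pel MVD value."""
    code_num = 2 * d - 1 if d > 0 else -2 * d
    return ue_bits(code_num)


def bits_hpel(d: int) -> int:
    """Hypothetical reduced table for even d: code d/2 with se(v)."""
    if d % 2 != 0:
        raise ValueError("Hpel table only holds even quarter-pel values")
    return bits_all(d // 2)


def bits_qpel(d: int) -> int:
    """Hypothetical reduced table for odd d: +1, -1, +3, -3, ... map to 0, 1, 2, 3, ..."""
    if d % 2 == 0:
        raise ValueError("Qpel table only holds odd quarter-pel values")
    rank = (abs(d) - 1) // 2            # 0 for |d| = 1, 1 for |d| = 3, ...
    return ue_bits(2 * rank + (0 if d > 0 else 1))


for d in (0, 2, 4, -6):
    print(f"MVD {d / 4:+.2f} pel: all={bits_all(d)} bits, hpel={bits_hpel(d)} bits")
for d in (1, -1, 3, -5):
    print(f"MVD {d / 4:+.2f} pel: all={bits_all(d)} bits, qpel={bits_qpel(d)} bits")
```

Under these assumed mappings a reduced table never needs more bits than the full one, and for most nonzero values it needs two bits fewer per component.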

The bits required for encoding the MV can be reduced by skipping Qpel ME. However, Qpel ME should be omitted only when it has a negligible impact on ME performance. In order to determine whether Qpel ME is necessary, we consider two cases. In the first case (C1), we skip Qpel ME for all MBs and encode the MVDs using ${VLC}_{Hpel}$. In the second case (C2), we allow Qpel ME for all MBs and encode the resulting MVDs using ${VLC}_{All}$. By comparing C1 and C2, the effect of Qpel ME can be analyzed.

Let dRDcost denote the difference between the rate-distortion costs (RDcosts) of C1 and C2, i.e., the RDcost of C1 minus that of C2. If dRDcost is positive, Qpel ME is considered necessary for the MB, because the loss caused by skipping Qpel ME is larger than the gain achieved by the proposed MVD coding technique. Otherwise, Qpel ME can be considered unnecessary. The remaining problem is to find the factors that affect dRDcost.
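The decision rule can be written out explicitly. The sketch below assumes the usual Lagrangian form of the RD cost, $J = D + λ R$, which the letter does not spell out. The function names and the numbers in the example are hypothetical and are not measurements from the paper.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian RD cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def d_rd_cost(j_c1: float, j_c2: float) -> float:
    """dRDcost = RDcost(C1: Qpel ME skipped) - RDcost(C2: Qpel ME allowed)."""
    return j_c1 - j_c2


def qpel_needed(j_c1: float, j_c2: float) -> bool:
    """Qpel ME is worthwhile only when skipping it costs more than it saves."""
    return d_rd_cost(j_c1, j_c2) > 0


# Made-up numbers: C1 spends fewer MVD bits but leaves a larger residual.
j_c1 = rd_cost(distortion=1500.0, rate_bits=90.0, lam=10.0)
j_c2 = rd_cost(distortion=1350.0, rate_bits=100.0, lam=10.0)
print(d_rd_cost(j_c1, j_c2), qpel_needed(j_c1, j_c2))
```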

Motivated by the optimal MVA formula,² we claim that the necessity of Qpel ME increases as the spatial and temporal complexity of the MB increases. In our work, we estimate the spatial and temporal complexity of the current MB from the MB at the reference frame indicated by $V_{p}$, which is also available at the decoder. First, we measure the sum of the gradients of the luminance component as a spatial complexity metric.⁴ Let $Δ Y_{h}$ and $Δ Y_{v}$ denote the horizontal and vertical gradients, defined as

Eq. 6

$$Δ Y_{h} (x, y) = | \hat{Y} (x + 1, y) - \hat{Y} (x, y) |,$$

$$Δ Y_{v} (x, y) = | \hat{Y} (x, y - 1) - \hat{Y} (x, y) |,$$

where $\hat{Y}$ is the luminance component of the reference frame, and $(x, y)$ is the pixel coordinate. Then, the spatial complexity of the MB, $D_{p}$, is obtained by averaging the gradient values inside of the MB as follows:

Eq. 7

$$D_{p} = \frac{1}{N^{2}} \sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \max [Δ Y_{h} (x_{p} + i, y_{p} + j), Δ Y_{v} (x_{p} + i, y_{p} + j)],$$

where $N$ is the size of the MB, $(x_{p}, y_{p})$ is the pixel coordinate determined by $V_{p}$, and $\max [\cdot, \cdot]$ returns the maximum value between two values. The temporal complexity of the MB is simply defined as the magnitude of $V_{p}$,

Eq. 8

$$| V_{p} | = {(V_{p, h}^{2} + V_{p, v}^{2})}^{1/2},$$

where $V_{p, h}$ and $V_{p, v}$ represent the horizontal and vertical elements of $V_{p}$, respectively.
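Eqs. 6 to 8 translate into a few lines of code. The following is a minimal sketch rather than the authors' implementation: the frame is a plain list of rows of luminance values, the block origin $(x_{p}, y_{p})$ is assumed to lie at an integer-pel position, and coordinates outside the frame are clamped to the border, a boundary rule the letter does not specify.

```python
import math


def spatial_complexity(ref, x_p, y_p, n=16):
    """D_p of Eq. 7: mean over the n x n block of max(dY_h, dY_v).

    ref is a list of rows (ref[y][x]) of reference-frame luminance.
    Out-of-frame coordinates are clamped to the border (assumption).
    """
    height, width = len(ref), len(ref[0])

    def y_hat(x, y):
        return ref[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    total = 0
    for j in range(n):
        for i in range(n):
            x, y = x_p + i, y_p + j
            grad_h = abs(y_hat(x + 1, y) - y_hat(x, y))   # Eq. 6, horizontal
            grad_v = abs(y_hat(x, y - 1) - y_hat(x, y))   # Eq. 6, vertical
            total += max(grad_h, grad_v)
    return total / (n * n)


def temporal_complexity(v_ph, v_pv):
    """|V_p| of Eq. 8: Euclidean magnitude of the PMV."""
    return math.hypot(v_ph, v_pv)


# Tiny example: a horizontal ramp has gradient 2 in x and 0 in y.
ramp = [[2 * x for x in range(8)] for _ in range(8)]
print(spatial_complexity(ramp, 0, 0, n=4))   # block kept away from the right border
print(temporal_complexity(3.0, 4.0))
```

Because both functions read only the reference frame and the PMV, a decoder can evaluate them with exactly the same inputs as the encoder.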

Based on the preceding spatial and temporal complexity metrics, we examine the relation between dRDcost and the complexity metrics. Figure 1 shows an example of the dRDcost result obtained by using the 80th frame of the Foreman test sequence. In this example, the quantization parameter (QP) is set to 36, one reference frame is used with CABAC coding, and the other experimental conditions are given in Table 1. For each MB, dRDcost is computed by performing C1 and C2, and $o$ or $x$ is assigned to indicate whether its value is positive or not. We can see that if the temporal or spatial complexity of the MB is high, Qpel ME tends to be advantageous. In the other case, where the spatial and temporal complexities are both low, skipping Qpel ME is beneficial in most cases. This tendency is consistent with the results in Ref. 2. Accordingly, we define an exponential curve to determine whether Qpel ME is advantageous:

Eq. 9

$$f (| V_{p} |) = a e^{b | V_{p} |},$$

where $a$ and $b$ are modeling parameters. These parameters are obtained by taking the logarithm of Eq. 9 and applying least-squares linear classification, so that $a$ and $b$ minimize the squared classification error.⁵ Using the test sequences in Ref. 6 and the QPs in Table 1, $a$ and $b$ are found to be 9.29 and $- 2.99 \times 10^{- 2}$, respectively. Qpel ME is then applied only when the estimated complexity point, $(| V_{p} |, D_{p})$, lies above the curve of Eq. 9, i.e., $D_{p} > f (| V_{p} |)$, where Qpel ME tends to improve the coding efficiency.
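With the reported parameters, the decision reduces to one comparison. The sketch below is illustrative: it assumes $| V_{p} |$ is expressed in the same units that were used when fitting $a$ and $b$ (the letter does not state them), and the function names are hypothetical.

```python
import math

A = 9.29        # modeling parameter a reported in the text
B = -2.99e-2    # modeling parameter b reported in the text


def threshold(v_p_mag: float, a: float = A, b: float = B) -> float:
    """Eq. 9: f(|V_p|) = a * exp(b * |V_p|)."""
    return a * math.exp(b * v_p_mag)


def use_qpel(d_p: float, v_p_mag: float) -> bool:
    """Apply Qpel ME only when (|V_p|, D_p) lies above the curve."""
    return d_p > threshold(v_p_mag)


# A static, smooth block stays below the curve; a textured one does not.
print(use_qpel(d_p=3.0, v_p_mag=0.0))    # 3.0 compared with f(0) = 9.29
print(use_qpel(d_p=12.0, v_p_mag=0.0))   # 12.0 compared with f(0) = 9.29
# Taking logs makes the boundary linear in |V_p|: ln f = ln a + b * |V_p|.
print(math.log(threshold(10.0)), math.log(A) + B * 10.0)
```

Since $b$ is negative, the threshold on $D_{p}$ falls as $| V_{p} |$ grows, so fast-moving blocks qualify for Qpel ME even when their texture is modest.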

Fig. 1

Example of the dRDcost result for the Foreman sequence.

Table 1

Experimental conditions.


Parameter	Setting
Software	JM 16.0 (Ref. 8)
Profile	High
Prediction structure	IPPP
Sequence resolution	CIF (352 × 288), 720p (1280 × 720)
Number of encoded frames	100
Number of reference frames	3
QP	32, 36, 40, 44
ME	Exhaustive (1/4-pel resolution)
ME search range	32 (CIF), 64 (720p)
Rate distortion optimization	Enabled
Entropy coding	CAVLC, CABAC

Figure 2 summarizes the procedure of the proposed algorithm at the encoder. When encoding the MB, $D_{p}$ and $f (| V_{p} |)$ are calculated by using the PMV of INTER-$16 \times 16$ mode, and Qpel ME is skipped if $D_{p} < f (| V_{p} |)$. This Qpel ME decision is shared by all subpartitioned blocks. In other words, we do not allow subpartitioned blocks of the same MB to have different MVAs. At the decoder, $D_{p}$ is similarly obtained by Eq. 7. Then, if $D_{p} < f (| V_{p} |)$, the received MVD bits are decoded using either ${VLC}_{Qpel}$ or ${VLC}_{Hpel}$, depending on the PMV. Since the same PMV must be used at both encoder and decoder, the PMV of INTER-$16 \times 16$ mode is also used at the decoder. Otherwise, the MVD bits are decoded by using ${VLC}_{All}$, as in the conventional decoder.
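The shared encoder/decoder decision can be sketched as a single function, since both sides derive it from the PMV of INTER-16×16 mode and the reference frame. This is an illustrative sketch, not JM code: the table names are plain strings, MV components are integers in quarter-pel units, and the per-component choice between ${VLC}_{Hpel}$ and ${VLC}_{Qpel}$ is one reading of Eq. 5. Following the decoder description, $D_{p} < f (| V_{p} |)$ is treated as the case in which Qpel ME is skipped.

```python
import math


def select_vlc_tables(d_p, pmv_q, a=9.29, b=-2.99e-2):
    """Return (qpel_me_enabled, (table_h, table_v)) for one MB.

    d_p   : spatial complexity from Eq. 7
    pmv_q : (horizontal, vertical) PMV of INTER-16x16 in quarter-pel units
    Assumption: |V_p| in Eq. 9 is measured in pels.
    """
    v_mag = math.hypot(pmv_q[0], pmv_q[1]) / 4.0
    if d_p < a * math.exp(b * v_mag):
        # Qpel ME skipped: the MV is Hpel, so each MVD component has
        # the same accuracy class as the matching PMV component (Eq. 5).
        tables = tuple("VLC_Hpel" if c % 2 == 0 else "VLC_Qpel" for c in pmv_q)
        return False, tables
    # Otherwise behave like the conventional codec.
    return True, ("VLC_All", "VLC_All")


# Encoder and decoder call the same function with the same inputs,
# so no extra bit is needed to signal the MVA.
print(select_vlc_tables(d_p=2.5, pmv_q=(4, 3)))
print(select_vlc_tables(d_p=20.0, pmv_q=(4, 3)))
```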

Fig. 2

Flowchart of the proposed algorithm.

3. Experimental Results and Conclusion

In order to evaluate the performance of the proposed algorithm, the proposed method is compared with the conventional algorithm in Ref. 3. A PC with a 2.67-GHz Intel Core2 Quad CPU and 8 GB of RAM is used. The detailed experimental conditions are given in Table 1. The changes in Bjontegaard Delta (BD) rate $(Δ bit)$, encoding time $(Δ T_{E})$, and decoding time $(Δ T_{D})$ are used to measure the performance.⁷

Table 2 indicates that the proposed algorithm improves the coding efficiency of the original JM 16.0 by 2.97% and 2.77% for CABAC and CAVLC, respectively.⁸ Since the proposed algorithm does not require any overhead bit to indicate the MVA, superior coding efficiency is obtained when compared to the conventional algorithm. Here, it should be noted that the Bigships and Jets sequences are encoded by using the 151st to 250th and 301st to 400th frames, respectively, where shot changes occur. Since intracoding outperforms intercoding in such cases, the performance of the proposed algorithm does not deteriorate.

Table 2

Performance of the proposed algorithm.

All entries are given as CABAC/CAVLC.

Resolution	Sequence	Ref. 3 Δbit (%)	Ref. 3 ΔT_E (%)	Ref. 3 ΔT_D (%)	Proposed Δbit (%)	Proposed ΔT_E (%)	Proposed ΔT_D (%)
CIF	Crew	-0.69/-0.28	0.75/1.24	0.15/0.04	-2.72/-2.03	-4.90/-4.54	-3.15/-4.15
CIF	Foreman	-0.07/-0.19	1.01/0.19	0.28/0.21	-3.15/-3.15	-5.01/-4.17	-5.28/-4.58
CIF	Salesman	-0.49/-0.18	-0.15/-0.36	0.17/0.09	-2.63/-2.18	-5.41/-4.45	-3.45/-1.67
CIF	Soccer	-0.23/-0.42	0.23/0.22	0.13/-0.12	-2.65/-2.80	-3.91/-4.92	-4.93/-4.24
720p	Bigships	-1.12/-0.69	1.88/1.02	0.48/0.31	-2.11/-2.66	-1.97/-1.83	-5.78/-6.12
720p	City	-1.23/-1.36	0.39/1.03	0.75/0.41	-2.46/-2.08	-2.01/-1.89	-4.57/-3.99
720p	Jets	-1.20/-1.06	0.44/1.29	1.23/0.89	-3.62/-2.51	-2.31/-3.17	-4.78/-4.45
720p	Raven	-2.03/-2.70	0.92/0.35	0.69/0.98	-4.43/-4.72	-2.02/-2.32	-5.46/-4.96
Average		-0.88/-0.86	0.68/0.71	0.48/0.34	-2.97/-2.77	-3.57/-3.41	-4.68/-4.27

In terms of computational complexity, the conventional algorithm examines all MVAs and selects the best one,³ so it maintains or slightly increases the complexity of the original JM 16.0 encoder and decoder. In the proposed algorithm, skipping unnecessary Qpel ME compensates for the additional computation of $D_{p}$ at the encoder, and a slight encoding time saving of 3.57% and 3.41% is even achieved for CABAC and CAVLC, respectively. In addition, although the decoder must compute $D_{p}$ for each MB, the decoding time is also reduced by 4.68% and 4.27% on average for CABAC and CAVLC, respectively. This is because the interpolation time for motion compensation decreases when unnecessary Qpel ME is skipped at the encoder.

In this letter, we first presented an adaptive MVD coding scheme. Then, in order to apply the adaptive MVD coding technique effectively, we also proposed an algorithm that selectively performs Qpel ME based on the spatial and temporal complexity of the MB. The experimental results demonstrated that the proposed algorithm improves coding efficiency without adding computational overhead at either the encoder or the decoder.

Acknowledgments

This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program. This work was also supported by a Korea Science and Engineering Foundation (KOSEF) Grant funded by the Korean Government (MEST) (No. 2009-0080547).

References

1. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003). https://doi.org/10.1109/TCSVT.2003.815165

2. J. Ribas-Corbera and D. L. Neuhoff, “Optimizing motion vector accuracy in block-based video coding,” IEEE Trans. Circuits Syst. Video Technol. 11(4), 497–511 (2001). https://doi.org/10.1109/76.915356

3. J. Shen and J. Ribas-Corbera, “Benefits of adaptive motion accuracy in H.26L video coding,” Proc. IEEE ICIP 1, 1012–1015 (2000).

4. A. R. Varkonyi-Koczy, A. Rovid, and T. Hashimoto, “Gradient-based synthesized multiple exposure time color HDR image,” IEEE Trans. Instrum. Meas. 57(8), 1779–1785 (2008). https://doi.org/10.1109/TIM.2008.925715

5. J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett. 9, 293–300 (1999). https://doi.org/10.1023/A:1018628609742

6. T. Tan, G. J. Sullivan, and T. Wedi, “Recommended simulation common conditions for coding efficiency experiments revision 4,” (2008).

7. G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” (2001).

8. K. Sühring, http://iphome.hhi.de/suehring/tml/download/


Keywords: Computer programming; Motion estimation; Distortion; Optical engineering; Electrical engineering; Quantization; Video
