Video Stabilization
Video stabilization is a process that aims to reduce vibration and jitter in videos.
Formulation
In video stabilization, the input and output of the problem are straightforward:
- Input: initial image sequence $\{I_i\}$.
- Output: stabilized image sequence $\{I'_i\}$.

where $I_i, I'_i: \Omega \rightarrow \mathbb{R}^3$ are RGB images on a domain $\Omega \subset \mathbb{R}^2$.
As mentioned above, the movement of a camera recording a video is not perfectly smooth, which leads to vibration in the video. Our goal is to smooth that movement to obtain stable frames.
Figure 1: Correspondence between the unstable frame and the stable frame
Because the real camera's pose is imperfect (its pose contains noise, i.e., vibration), we look for a virtual camera that is more stable than the original one. The difference between these two cameras can be represented by a transformation matrix $G$.
Expanding this to all shaky frames: the transformation matrix $H_{i, i+1}$ represents the movement of the camera from frame $I_i$ to frame $I_{i + 1}$, and $G_i$ is the stabilizing transformation matrix for frame $I_i$.
Our goal is now clearer: finding $\{G_i\}$.
Framework
Traditional video stabilization methods all share the same framework, which consists of three main steps:
- Motion estimation: estimate the camera movement between frames.
- Motion compensation: smooth the camera motion and compute the stabilizing transformation $G_i$ for each frame.
- Post-processing: apply the transformation to each frame and reduce side effects.
Motion Estimation
Feature Detection and Tracking
To stabilize frames, we must know the camera movement, but we have no direct information about the camera's movement in world space. One way of estimating it is from the motion vectors of features inside the image, so we need to find image features and track them over time.
These algorithms are commonly used (a tracking sketch follows the list):
- SIFT / SURF
- ORB / FAST
- Harris Corners + KLT tracker.
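As an illustration of the last option, here is a minimal sketch of corner detection plus KLT tracking with OpenCV; the detector parameters (`maxCorners`, `qualityLevel`, `minDistance`) are assumed tuning values, not prescribed ones.

```python
import cv2

def track_features(prev_gray, curr_gray):
    # Detect corners in the previous frame (Shi-Tomasi variant of Harris).
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=30)
    # Track them into the current frame with the pyramidal KLT tracker.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   prev_pts, None)
    # Keep only the points that were successfully tracked.
    ok = status.ravel() == 1
    return prev_pts[ok], curr_pts[ok]
```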
Motion Model
After detecting and tracking features, we have pairs of corresponding key points \(\{(p^i_{k}, p^{i+1}_{k})\}\) of two successive frames $I_i$ and $I_{i+1}$.
With the assumption that the transformation matrix $H_{i, i+1}$ between two frames is a homography with eight unknown parameters, we have the formulation:
\[\begin{aligned} w' \textbf{p}^{i+1}_k &= H_{i, i+1} \textbf{p}^i_k \\ w'\left[\begin{matrix} x_k^{i + 1} \\ y_k^{i + 1} \\ 1 \end{matrix}\right] &= \left[ \begin{matrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{matrix} \right] \left[ \begin{matrix} x_k^{i} \\ y_k^{i} \\ 1 \end{matrix}\right] \end{aligned}\]
where $\textbf{p}$ is the homogeneous form of $p$.
With $n$ pairs of corresponding key points $\{(p^i_k, p^{i+1}_k)\}$, we obtain $n$ homography equations like the one above.
To find these eight parameters, we rewrite the homography equations as the linear system \(A \cdot h = b\), where $h = [h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}]^T$ is the unknown vector of eight parameters.
Each point pair contributes two equations, so we need at least four pairs (eight equations) to solve the system; more equations are better, since earlier steps may introduce errors and noise.
The linear system above can be solved in closed form, by least squares, or with iterative solvers such as the Jacobi method.
The number of parameters also depends on our assumption about the camera movement between two successive frames. If we suppose the camera only translates, just two parameters are needed; a model with more parameters can describe the camera movement more accurately.
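As a sketch under these assumptions, the eight homography parameters can be estimated from the tracked point pairs by least squares; OpenCV's `findHomography` does this and can additionally reject outlier tracks with RANSAC (the 3.0-pixel reprojection threshold below is an assumed value):

```python
import cv2
import numpy as np

def estimate_motion(prev_pts, curr_pts, model="homography"):
    if model == "homography":
        # Full 8-parameter model, with RANSAC to discard noisy matches.
        H, _ = cv2.findHomography(prev_pts, curr_pts, cv2.RANSAC, 3.0)
        return H
    # Cheaper 6-parameter affine model, lifted to a 3x3 matrix.
    A, _ = cv2.estimateAffine2D(prev_pts, curr_pts)
    return np.vstack([A, [0.0, 0.0, 1.0]])
```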
Motion Compensation
Motion Smoothing
First, we assume that the first original frame $I_1$ is always stable.
Because transformations compose by matrix multiplication, we can find the cumulative transformation matrix $H_{1,i}$ that transforms the original frame $I_1$ into the original frame $I_i$:
\[H_{1,i} = \prod_{k=2}^{i}H_{k-1,k} = H_{i-1, i}...H_{2,3}H_{1,2}\]
Due to vibration while the camera moves, the sequence $\{H_{1,i}\}$ will contain noise, which makes the frames $\{I_i\}$ unstable. What we have to do is smooth $\{H_{1, i}\}$.
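A minimal sketch of this accumulation step, assuming the per-frame matrices $H_{k-1,k}$ are stored as 3×3 NumPy arrays:

```python
import numpy as np

def accumulate(H_steps):
    """H_steps[k] holds H_{k,k+1}; returns the cumulative H_{1,i}."""
    H_cum, H = [], np.eye(3)
    for H_step in H_steps:
        H = H_step @ H  # H_{1,i} = H_{i-1,i} ... H_{2,3} H_{1,2}
        H_cum.append(H)
    return H_cum
```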
Suppose that the smoothed matrix \(\bar{H}_{1, i}\) of \(H_{1, i}\), obtained by Gaussian convolution,
\[\bar{H}_{1, i} = \operatorname{smooth}(..., H_{1, i - 1}, H_{1, i}, H_{1, i + 1}, ...)\]
is the transformation matrix from the first stable frame $I_1$ to the stable frame $I_i'$, with $\operatorname{smooth}(\cdot)$ being an element-wise Gaussian convolution operator.
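One way to realize $\operatorname{smooth}(\cdot)$ is a Gaussian filter over time applied independently to each of the nine matrix entries; a sketch with SciPy, where the kernel width `sigma` is an assumed tuning parameter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_trajectory(H_cum, sigma=15.0):
    """H_cum: array of shape (N, 3, 3) with H_{1,i} for each frame i."""
    H = np.asarray(H_cum, dtype=np.float64)
    # Filter each matrix entry independently along the time axis.
    return gaussian_filter1d(H, sigma=sigma, axis=0)
```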
Motion Compensation
After estimating $\bar{H}_{1,i}$, we can easily find the stabilizing matrix $G_i$ by:
\[G_i = \bar{H}_{1,i} \cdot H^{-1}_{1,i}\]
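In code, this is one matrix inverse and one product per frame; a sketch:

```python
import numpy as np

def stabilizing_transforms(H_cum, H_smooth):
    # G_i = smoothed H_{1,i} times the inverse of the raw H_{1,i}.
    return [Hs @ np.linalg.inv(H) for H, Hs in zip(H_cum, H_smooth)]
```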
Post-processing
To get a stabilized frame $I_i'$, we simply warp $I_i$ with $G_i$ by applying the equation:
\[I_i'(G_i \textbf{x}) = I_i(\textbf{x})\]
with $\textbf{x} = [x, y, 1]^T$.
After stabilizing frame $I_i$, we also need to eliminate empty space in the transformed image. This step is called “crop and zoom.”
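Putting the warp and the crop-and-zoom together, a sketch with OpenCV follows; the 4% crop margin is an assumed value chosen to hide typical empty borders, not a prescribed one.

```python
import cv2

def stabilize_frame(frame, G, crop_ratio=0.04):
    h, w = frame.shape[:2]
    # Warp the original frame with the stabilizing transform G_i.
    warped = cv2.warpPerspective(frame, G, (w, h))
    # Crop a fixed border to remove empty space, then zoom back to size.
    dx, dy = int(w * crop_ratio), int(h * crop_ratio)
    cropped = warped[dy:h - dy, dx:w - dx]
    return cv2.resize(cropped, (w, h))
```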
Results
Here is an example of applying the above strategy. The scene is static (there are no moving objects in the video).
(Videos: shaky, stabilized but uncropped, and fully processed.)
This video was recorded by a man while he was walking on the street. There are some moving people in the video, but not many, and they are small. Some details, like trees, are distorted after stabilization.
(Videos: shaky, stabilized but uncropped, and fully processed.)
This video shows a dancing woman, and there is also some jitter in it. The output is quite bad: our assumption only holds for a static scene, and in this example it is no longer true.
(Videos: shaky, and stabilized but uncropped.)
You can read [1] to learn more about other assumptions as well as different smoothing strategies.
Reference
[1] Sánchez, Javier. “Comparison of motion smoothing strategies for video stabilization using parametric models.” Image Processing On Line (2017).