Prev Tutorial: Reading Geospatial Raster files with GDAL
Next Tutorial: Creating a video with OpenCV
Original author: Bernát Gábor
Compatibility: OpenCV >= 3.0
Goal
Today it is common to have a digital video recording system at your disposal. Therefore, you will eventually find yourself processing not a batch of images, but video streams. These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files stored on a hard disk drive. Luckily, OpenCV treats both in the same manner, with the same C++ class. So here's what you'll learn in this tutorial:
- How to open and read video streams
- Two ways for checking image similarity: PSNR and SSIM
The source code
As a test case in which to show these off, I've created a small program that reads in two video files and performs a similarity check between them. You could use this to check just how well a new video compression algorithm works. Let there be a reference (original) video like this small Megamind clip and a compressed version of it. You may also find the source code and these video files in the samples/data folder of the OpenCV source library.
C++
#include <iostream>
#include <string>
#include <iomanip>
#include <sstream>

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/highgui.hpp>

using namespace std;
using namespace cv;

double getPSNR(const Mat& I1, const Mat& I2);
Scalar getMSSIM(const Mat& I1, const Mat& I2);

static void help()
{
    cout
        << "------------------------------------------------------------------------------" << endl
        << "This program shows how to read a video file with OpenCV. In addition, it "
        << "tests the similarity of two input videos first with PSNR, and for the frames "
        << "below a PSNR trigger value, also with MSSIM." << endl
        << "Usage:" << endl
        << "./video-input-psnr-ssim <referenceVideo> <useCaseTestVideo> <PSNR_Trigger_Value> <Wait_Between_Frames> " << endl
        << "--------------------------------------------------------------------------" << endl
        << endl;
}

int main(int argc, char *argv[])
{
    help();

    if (argc != 5)
    {
        cout << "Not enough parameters" << endl;
        return -1;
    }

    stringstream conv;

    const string sourceReference = argv[1], sourceCompareWith = argv[2];
    int psnrTriggerValue, delay;
    conv << argv[3] << endl << argv[4];   // put in the strings
    conv >> psnrTriggerValue >> delay;    // take out the numbers

    int frameNum = -1;                    // frame counter

    VideoCapture captRefrnc(sourceReference), captUndTst(sourceCompareWith);

    if (!captRefrnc.isOpened())
    {
        cout << "Could not open reference " << sourceReference << endl;
        return -1;
    }
    if (!captUndTst.isOpened())
    {
        cout << "Could not open case test " << sourceCompareWith << endl;
        return -1;
    }

    Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                     (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
         uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),
                     (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

    if (refS != uTSi)
    {
        cout << "Inputs have different size!!! Closing." << endl;
        return -1;
    }

    const char* WIN_UT = "Under Test";
    const char* WIN_RF = "Reference";

    // Windows
    namedWindow(WIN_RF, WINDOW_AUTOSIZE);
    namedWindow(WIN_UT, WINDOW_AUTOSIZE);
    moveWindow(WIN_RF, 400, 0);
    moveWindow(WIN_UT, refS.width, 0);

    cout << "Reference frame resolution: Width=" << refS.width << " Height=" << refS.height
         << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;

    cout << "PSNR trigger value " << setiosflags(ios::fixed) << setprecision(3)
         << psnrTriggerValue << endl;

    Mat frameReference, frameUnderTest;
    double psnrV;
    Scalar mssimV;

    for (;;) // show the image captured in the window and repeat
    {
        captRefrnc >> frameReference;
        captUndTst >> frameUnderTest;

        if (frameReference.empty() || frameUnderTest.empty())
        {
            cout << " < < < Game over! > > > ";
            break;
        }

        ++frameNum;
        cout << "Frame: " << frameNum << "# ";

        psnrV = getPSNR(frameReference, frameUnderTest);
        cout << setiosflags(ios::fixed) << setprecision(3) << psnrV << "dB";

        if (psnrV < psnrTriggerValue && psnrV) // if PSNR is not high enough, check MSSIM too
        {
            mssimV = getMSSIM(frameReference, frameUnderTest);
            cout << " MSSIM:"
                 << " R " << setiosflags(ios::fixed) << setprecision(2) << mssimV.val[2] * 100 << "%"
                 << " G " << setiosflags(ios::fixed) << setprecision(2) << mssimV.val[1] * 100 << "%"
                 << " B " << setiosflags(ios::fixed) << setprecision(2) << mssimV.val[0] * 100 << "%";
        }

        cout << endl;

        imshow(WIN_RF, frameReference);
        imshow(WIN_UT, frameUnderTest);

        char c = (char)waitKey(delay);
        if (c == 27) break;
    }
    return 0;
}

double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel
    double sse = s.val[0] + s.val[1] + s.val[2]; // sum channels

    if (sse <= 1e-10)          // for small values return zero
        return 0;
    else
    {
        double mse  = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}

Scalar getMSSIM(const Mat& i1, const Mat& i2)
{
    const double C1 = 6.5025, C2 = 58.5225;
    const int d = CV_32F;

    Mat I1, I2;
    i1.convertTo(I1, d);            // cannot calculate on one byte large values
    i2.convertTo(I2, d);

    Mat I2_2  = I2.mul(I2);         // I2^2
    Mat I1_2  = I1.mul(I1);         // I1^2
    Mat I1_I2 = I1.mul(I2);         // I1 * I2

    Mat mu1, mu2;                   // PRELIMINARY COMPUTING
    GaussianBlur(I1, mu1, Size(11, 11), 1.5);
    GaussianBlur(I2, mu2, Size(11, 11), 1.5);

    Mat mu1_2   = mu1.mul(mu1);
    Mat mu2_2   = mu2.mul(mu2);
    Mat mu1_mu2 = mu1.mul(mu2);

    Mat sigma1_2, sigma2_2, sigma12;
    GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
    sigma1_2 -= mu1_2;
    GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
    sigma2_2 -= mu2_2;
    GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
    sigma12 -= mu1_mu2;

    Mat t1, t2, t3;                 // FORMULA
    t1 = 2 * mu1_mu2 + C1;
    t2 = 2 * sigma12 + C2;
    t3 = t1.mul(t2);                // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

    t1 = mu1_2 + mu2_2 + C1;
    t2 = sigma1_2 + sigma2_2 + C2;
    t1 = t1.mul(t2);                // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

    Mat ssim_map;
    divide(t3, t1, ssim_map);       // ssim_map = t3./t1;

    Scalar mssim = mean(ssim_map);  // mssim = average of ssim map
    return mssim;
}
Python
from __future__ import print_function
import numpy as np
import cv2 as cv
import argparse
import sys

def getPSNR(I1, I2):
    s1 = cv.absdiff(I1, I2)  # |I1 - I2|
    s1 = np.float32(s1)      # cannot make a square on 8 bits
    s1 = s1 * s1             # |I1 - I2|^2
    sse = s1.sum()           # sum elements per channel
    if sse <= 1e-10:         # for small values return zero
        return 0
    else:
        shape = I1.shape
        mse = 1.0 * sse / (shape[0] * shape[1] * shape[2])
        psnr = 10.0 * np.log10((255 * 255) / mse)
        return psnr

def getMSSISM(i1, i2):
    C1 = 6.5025
    C2 = 58.5225
    # INITS
    I1 = np.float32(i1)  # cannot calculate on one byte large values
    I2 = np.float32(i2)
    I2_2 = I2 * I2   # I2^2
    I1_2 = I1 * I1   # I1^2
    I1_I2 = I1 * I2  # I1 * I2
    # END INITS
    # PRELIMINARY COMPUTING
    mu1 = cv.GaussianBlur(I1, (11, 11), 1.5)
    mu2 = cv.GaussianBlur(I2, (11, 11), 1.5)
    mu1_2 = mu1 * mu1
    mu2_2 = mu2 * mu2
    mu1_mu2 = mu1 * mu2
    sigma1_2 = cv.GaussianBlur(I1_2, (11, 11), 1.5)
    sigma1_2 -= mu1_2
    sigma2_2 = cv.GaussianBlur(I2_2, (11, 11), 1.5)
    sigma2_2 -= mu2_2
    sigma12 = cv.GaussianBlur(I1_I2, (11, 11), 1.5)
    sigma12 -= mu1_mu2
    t1 = 2 * mu1_mu2 + C1
    t2 = 2 * sigma12 + C2
    t3 = t1 * t2                  # t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))
    t1 = mu1_2 + mu2_2 + C1
    t2 = sigma1_2 + sigma2_2 + C2
    t1 = t1 * t2                  # t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))
    ssim_map = cv.divide(t3, t1)  # ssim_map = t3./t1
    mssim = cv.mean(ssim_map)     # mssim = average of ssim map
    return mssim

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--delay", type=int, default=30, help="Time delay")
    parser.add_argument("-v", "--psnrtriggervalue", type=int, default=30, help="PSNR Trigger Value")
    parser.add_argument("-r", "--ref", type=str, default="Megamind.avi", help="Path to reference video")
    parser.add_argument("-t", "--undertest", type=str, default="Megamind_bugy.avi",
                        help="Path to the video to be tested")
    args = parser.parse_args()

    sourceReference = args.ref
    sourceCompareWith = args.undertest
    delay = args.delay
    psnrTriggerValue = args.psnrtriggervalue

    framenum = -1  # frame counter

    captRefrnc = cv.VideoCapture(cv.samples.findFileOrKeep(sourceReference))
    captUndTst = cv.VideoCapture(cv.samples.findFileOrKeep(sourceCompareWith))

    if not captRefrnc.isOpened():
        print("Could not open the reference " + sourceReference)
        sys.exit(-1)
    if not captUndTst.isOpened():
        print("Could not open case test " + sourceCompareWith)
        sys.exit(-1)

    refS = (int(captRefrnc.get(cv.CAP_PROP_FRAME_WIDTH)), int(captRefrnc.get(cv.CAP_PROP_FRAME_HEIGHT)))
    uTSi = (int(captUndTst.get(cv.CAP_PROP_FRAME_WIDTH)), int(captUndTst.get(cv.CAP_PROP_FRAME_HEIGHT)))

    if refS != uTSi:
        print("Inputs have different size!!! Closing.")
        sys.exit(-1)

    WIN_UT = "Under Test"
    WIN_RF = "Reference"

    cv.namedWindow(WIN_RF, cv.WINDOW_AUTOSIZE)
    cv.namedWindow(WIN_UT, cv.WINDOW_AUTOSIZE)
    cv.moveWindow(WIN_RF, 400, 0)
    cv.moveWindow(WIN_UT, refS[0], 0)

    print("Reference frame resolution: Width={} Height={} of nr#: {}".format(refS[0], refS[1],
                                                                             captRefrnc.get(cv.CAP_PROP_FRAME_COUNT)))
    print("PSNR trigger value {}".format(psnrTriggerValue))

    while True:  # show the image captured in the window and repeat
        _, frameReference = captRefrnc.read()
        _, frameUnderTest = captUndTst.read()

        if frameReference is None or frameUnderTest is None:
            print(" < < < Game over! > > > ")
            break

        framenum += 1
        psnrv = getPSNR(frameReference, frameUnderTest)
        print("Frame: {}# {}dB".format(framenum, round(psnrv, 3)), end=" ")

        if (psnrv < psnrTriggerValue and psnrv):
            mssimv = getMSSISM(frameReference, frameUnderTest)
            print("MSSISM: R {}% G {}% B {}%".format(round(mssimv[2] * 100, 2), round(mssimv[1] * 100, 2),
                                                     round(mssimv[0] * 100, 2)), end=" ")
        print()

        cv.imshow(WIN_RF, frameReference)
        cv.imshow(WIN_UT, frameUnderTest)

        k = cv.waitKey(delay)
        if k == 27:
            break

    sys.exit(0)

if __name__ == "__main__":
    main()
How to read a video stream (online-camera or offline-file)?
Essentially, all the functionality required for video manipulation is integrated into the cv::VideoCapture C++ class. This in turn builds on the FFmpeg open source library, which is a basic dependency of OpenCV, so you shouldn't need to worry about it. A video is composed of a succession of images; in the literature, these are referred to as frames. For a video file, the frame rate specifies just how much time passes between two frames. Video cameras usually have a limit on how many frames they can digitize per second, but this property is less important, since at any moment the camera sees the current snapshot of the world.
The first task is to assign a source to the cv::VideoCapture class. You can do this either via the cv::VideoCapture::VideoCapture constructor or its cv::VideoCapture::open function. If the argument is an integer, you bind the class to a camera (a device); the number passed is the ID of the device, assigned by the operating system. If you have a single camera attached to your system, its ID will probably be zero, with further ones counting up from there. If the parameter is a string, it refers to a video file: the string gives the location and name of the file. For example, a valid command line for the source code above is:
video/Megamind.avi video/Megamind_bug.avi 35 10
We do a similarity check. This requires a reference and a test case video file; the first two arguments refer to these. Here we use a relative address, which means the application will look in its current working directory, open the video folder, and try to find Megamind.avi and Megamind_bug.avi inside it.
const string sourceReference = argv[1], sourceCompareWith = argv[2];
VideoCapture captRefrnc(sourceReference);
// or
captUndTst.open(sourceCompareWith);
To check if the binding of the class to a video source was successful or not use the cv::VideoCapture::isOpened function:
if ( !captRefrnc.isOpened())
{
cout << "Could not open reference " << sourceReference << endl;
return -1;
}
Closing the video is automatic when the object's destructor is called. However, if you want to close it before that, you need to call its cv::VideoCapture::release function. The frames of the video are just simple images. Therefore, we just need to extract them from the cv::VideoCapture object and put them inside a Mat. The video streams are sequential; you may get the frames one after another with cv::VideoCapture::read or the overloaded >> operator:
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;
captUndTst.read(frameUnderTest);
The read operations above leave the Mat objects empty if no frame could be acquired (either because the video stream was closed or because you reached the end of the video file). We can check for this with a simple if:
if (frameReference.empty() || frameUnderTest.empty())
{
    // no more frames to process
}
A read operation consists of a frame grab and a decoding applied to it. You may call these two explicitly by using the cv::VideoCapture::grab and then the cv::VideoCapture::retrieve functions.
Videos carry a lot of information besides the content of their frames. These are usually numbers, though in some cases they may be short character sequences (4 bytes or less). To acquire this information, there is a general function named cv::VideoCapture::get that returns double values containing these properties. Use bitwise operations to decode the characters from the double, and conversions where the valid values are integers only. Its single argument is the ID of the queried property. For example, here we get the size of the frames in the reference and test case video files, plus the number of frames in the reference:
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
     uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),
                 (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << " Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
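An example of the bitwise decoding mentioned above: the CAP_PROP_FOURCC property packs the four characters of the codec code into the double returned by get(). A small sketch (decode_fourcc is a hypothetical helper for illustration, not part of OpenCV) to unpack it:

```python
def decode_fourcc(value):
    """Unpack the four 8-bit characters of a FOURCC codec code
    returned as a double by VideoCapture.get(CAP_PROP_FOURCC)."""
    v = int(value)
    return "".join(chr((v >> 8 * i) & 0xFF) for i in range(4))

# e.g. the code for the XVID codec round-trips correctly:
xvid = ord('X') | (ord('V') << 8) | (ord('I') << 16) | (ord('D') << 24)
print(decode_fourcc(float(xvid)))  # XVID
```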
When working with videos, you may often want to control these values yourself. For this, there is the cv::VideoCapture::set function. Its first argument is the ID of the property you want to change, and the second is a value of double type containing the value to be set. It returns true on success and false otherwise. Good examples are seeking in a video file to a given time or frame:
captRefrnc.set(CAP_PROP_POS_MSEC, 1.2);
captRefrnc.set(CAP_PROP_POS_FRAMES, 10);
For the full list of properties you can read and change, look into the documentation of the cv::VideoCapture::get and cv::VideoCapture::set functions.
Image similarity - PSNR and SSIM
We want to check just how imperceptible our video conversion operation went, therefore we need a system to check, frame by frame, the similarity or differences. The most common algorithm used for this is PSNR (Peak Signal-to-Noise Ratio). Its simplest definition starts from the mean squared error. Let there be two images, I1 and I2, with two-dimensional size i by j, composed of c channels.
\[MSE = \frac{1}{c*i*j} \sum{(I_1-I_2)^2}\]
Then the PSNR is expressed as:
\[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\]
Here \(MAX_I\) is the maximum valid value for a pixel. For a simple image with one byte per pixel per channel this is 255. When two images are the same, the MSE is zero, resulting in an invalid division by zero in the PSNR formula. In this case the PSNR is undefined, so we'll need to handle it separately. The transition to a logarithmic scale is made because pixel values have a very wide dynamic range. Translated to OpenCV, the function looks like:
C++
double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel
    double sse = s.val[0] + s.val[1] + s.val[2]; // sum channels

    if (sse <= 1e-10)          // for small values return zero
        return 0;
    else
    {
        double mse  = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}
Python
def getPSNR(I1, I2):
    s1 = cv.absdiff(I1, I2)  # |I1 - I2|
    s1 = np.float32(s1)      # cannot make a square on 8 bits
    s1 = s1 * s1             # |I1 - I2|^2
    sse = s1.sum()           # sum elements per channel
    if sse <= 1e-10:         # for small values return zero
        return 0
    else:
        shape = I1.shape
        mse = 1.0 * sse / (shape[0] * shape[1] * shape[2])
        psnr = 10.0 * np.log10((255 * 255) / mse)
        return psnr
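As a quick sanity check of the formula (not part of the sample), consider two 8-bit images that differ by exactly one at every pixel: the MSE is 1, so the PSNR is 10·log10(255²) ≈ 48.13 dB:

```python
import numpy as np

I1 = np.zeros((4, 4, 3), np.uint8)
I2 = I1 + 1                      # every pixel differs by exactly 1

diff = np.float32(I1) - np.float32(I2)
sse = (diff * diff).sum()        # 48 elements, each contributing 1
mse = sse / diff.size            # = 1.0
psnr = 10.0 * np.log10((255 * 255) / mse)
print(round(psnr, 2))            # 48.13
```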
For video compression, result values typically lie between 30 and 50 dB, where higher is better. If the images differ significantly, you'll get much lower values, like 15 dB. This similarity check is easy and fast to calculate; however, in practice it may turn out somewhat inconsistent with human perception. The structural similarity (SSIM) algorithm aims to correct this.
Describing the method goes well beyond the purpose of this tutorial. For that, I invite you to read the article introducing it. Nevertheless, you can get a good idea of it by looking at the OpenCV implementation below.
- Note
- SSIM is described in more depth in the article: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
C++
Scalar getMSSIM(const Mat& i1, const Mat& i2)
{
    const double C1 = 6.5025, C2 = 58.5225;
    const int d = CV_32F;

    Mat I1, I2;
    i1.convertTo(I1, d);            // cannot calculate on one byte large values
    i2.convertTo(I2, d);

    Mat I2_2  = I2.mul(I2);         // I2^2
    Mat I1_2  = I1.mul(I1);         // I1^2
    Mat I1_I2 = I1.mul(I2);         // I1 * I2

    Mat mu1, mu2;                   // PRELIMINARY COMPUTING
    GaussianBlur(I1, mu1, Size(11, 11), 1.5);
    GaussianBlur(I2, mu2, Size(11, 11), 1.5);

    Mat mu1_2   = mu1.mul(mu1);
    Mat mu2_2   = mu2.mul(mu2);
    Mat mu1_mu2 = mu1.mul(mu2);

    Mat sigma1_2, sigma2_2, sigma12;
    GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
    sigma1_2 -= mu1_2;
    GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
    sigma2_2 -= mu2_2;
    GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
    sigma12 -= mu1_mu2;

    Mat t1, t2, t3;                 // FORMULA
    t1 = 2 * mu1_mu2 + C1;
    t2 = 2 * sigma12 + C2;
    t3 = t1.mul(t2);                // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

    t1 = mu1_2 + mu2_2 + C1;
    t2 = sigma1_2 + sigma2_2 + C2;
    t1 = t1.mul(t2);                // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

    Mat ssim_map;
    divide(t3, t1, ssim_map);       // ssim_map = t3./t1;

    Scalar mssim = mean(ssim_map);  // mssim = average of ssim map
    return mssim;
}
Python
def getMSSISM(i1, i2):
    C1 = 6.5025
    C2 = 58.5225
    # INITS
    I1 = np.float32(i1)  # cannot calculate on one byte large values
    I2 = np.float32(i2)
    I2_2 = I2 * I2   # I2^2
    I1_2 = I1 * I1   # I1^2
    I1_I2 = I1 * I2  # I1 * I2
    # END INITS
    # PRELIMINARY COMPUTING
    mu1 = cv.GaussianBlur(I1, (11, 11), 1.5)
    mu2 = cv.GaussianBlur(I2, (11, 11), 1.5)
    mu1_2 = mu1 * mu1
    mu2_2 = mu2 * mu2
    mu1_mu2 = mu1 * mu2
    sigma1_2 = cv.GaussianBlur(I1_2, (11, 11), 1.5)
    sigma1_2 -= mu1_2
    sigma2_2 = cv.GaussianBlur(I2_2, (11, 11), 1.5)
    sigma2_2 -= mu2_2
    sigma12 = cv.GaussianBlur(I1_I2, (11, 11), 1.5)
    sigma12 -= mu1_mu2
    t1 = 2 * mu1_mu2 + C1
    t2 = 2 * sigma12 + C2
    t3 = t1 * t2                  # t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))
    t1 = mu1_2 + mu2_2 + C1
    t2 = sigma1_2 + sigma2_2 + C2
    t1 = t1 * t2                  # t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))
    ssim_map = cv.divide(t3, t1)  # ssim_map = t3./t1
    mssim = cv.mean(ssim_map)     # mssim = average of ssim map
    return mssim
This returns a similarity index for each channel of the image. The value is between zero and one, where one corresponds to a perfect fit. Unfortunately, the repeated Gaussian blurring is quite costly, so while PSNR may work in a real-time-like environment (24 frames per second), SSIM takes significantly longer to achieve similar throughput.
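To get a feel for the formula, here is a much-simplified sketch (my own illustration, not the OpenCV implementation): a single global window instead of the per-pixel 11×11 Gaussian windows used above. Identical images score exactly 1, while dissimilar ones drop toward (or even below) zero:

```python
import numpy as np

def global_ssim(a, b):
    # Single-window SSIM over the whole grayscale image; the real
    # algorithm evaluates this per pixel over Gaussian-weighted windows.
    C1, C2 = 6.5025, 58.5225  # stabilizing constants for 8-bit data
    a, b = np.float64(a), np.float64(b)
    mu1, mu2 = a.mean(), b.mean()
    sigma1_2 = ((a - mu1) ** 2).mean()           # variance of a
    sigma2_2 = ((b - mu2) ** 2).mean()           # variance of b
    sigma12 = ((a - mu1) * (b - mu2)).mean()     # covariance
    return ((2 * mu1 * mu2 + C1) * (2 * sigma12 + C2)) / \
           ((mu1 ** 2 + mu2 ** 2 + C1) * (sigma1_2 + sigma2_2 + C2))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32))
print(global_ssim(img, img))        # identical images -> 1.0
print(global_ssim(img, 255 - img))  # inverted image -> far lower
```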
Therefore, the source code presented at the start of the tutorial performs the PSNR measurement for each frame, and the SSIM only for the frames where the PSNR falls below the input trigger value. For visualization purposes, we show both images in an OpenCV window and print the PSNR and MSSIM values to the console. Expect to see something like:
You may observe a runtime instance of this on YouTube here.