Learn how to use the best and most practical computer vision algorithms using OpenCV
BIRMINGHAM - MUMBAI
Hands-On Algorithms for Computer Vision
Copyright	©	2018	Packt	Publishing
All	rights	reserved.	No	part	of	this	book	may	be	reproduced,	stored	in	a	retrieval	system,	or	transmitted	in
any	form	or	by	any	means,	without	the	prior	written	permission	of	the	publisher,	except	in	the	case	of	brief
quotations	embedded	in	critical	articles	or	reviews.
Every	effort	has	been	made	in	the	preparation	of	this	book	to	ensure	the	accuracy	of	the	information
presented.	However,	the	information	contained	in	this	book	is	sold	without	warranty,	either	express	or
implied.	Neither	the	author,	nor	Packt	Publishing	or	its	dealers	and	distributors,	will	be	held	liable	for	any
damages	caused	or	alleged	to	have	been	caused	directly	or	indirectly	by	this	book.
Packt	Publishing	has	endeavored	to	provide	trademark	information	about	all	of	the	companies	and	products
mentioned	in	this	book	by	the	appropriate	use	of	capitals.	However,	Packt	Publishing	cannot	guarantee	the
accuracy	of	this	information.
ISBN	978-1-78913-094-2
www.packtpub.com
	
    This	book	is	dedicated	to	Maman	and	Baba...
                            –	Amin	Ahmadi	Tazehkandi,	Summer	of	2018,	Vienna
	
mapt.io
Mapt	is	an	online	digital	library	that	gives	you	full	access	to	over	5,000	books
and	videos,	as	well	as	industry	leading	tools	to	help	you	plan	your	personal
development	and	advance	your	career.	For	more	information,	please	visit	our
website.
Why	subscribe?
  Spend	less	time	learning	and	more	time	coding	with	practical	eBooks	and
  Videos	from	over	4,000	industry	professionals
Improve your learning with Skill Plans built especially for you
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy.
At	www.PacktPub.com,	you	can	also	read	a	collection	of	free	technical	articles,	sign
up	for	a	range	of	free	newsletters,	and	receive	exclusive	discounts	and	offers	on
Packt	books	and	eBooks.
	
Contributors
About	the	author
Amin Ahmadi Tazehkandi is an Iranian author, developer, and computer vision expert. He completed his computer software engineering studies in Iran
and	has	worked	for	numerous	software	and	industrial	companies	around	the
world.
I	would	like	to	thank	my	wife,	Senem,	who	is	a	symbol	of	passion,	love,	and	strength	for	me.	I	would	also
like	to	thank	my	family,	a	big	pack	of	brilliant	engineers	born	to	an	Iranian	inventor	father	and	a	loving
mother,	for	all	of	their	unconditional	love	and	support.
                                                      	
About	the	reviewer
Zhuo Qingliang (KDr2 online) is presently working at paodingai, a fintech start-up that is dedicated to improving the financial industry by using artificial intelligence technologies. He has over 10 years of experience in Linux,
C,	C++,	Java,	Python,	and	Perl	development.	He	is	interested	in	programming,
doing	consulting	work,	and	contributing	to	the	open	source	community.
He maintains a personal website, KDr2, where you can find out more about him.
                                         	
Packt	is	searching	for	authors	like
you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech
professionals,	just	like	you,	to	help	them	share	their	insight	with	the	global	tech
community.	You	can	make	a	general	application,	apply	for	a	specific	hot	topic
that	we	are	recruiting	an	author	for,	or	submit	your	own	idea.
                                          	
Preface
We	live	in	quite	an	exciting	time.	Every	single	day	you	can	find	a	completely
new	application	or	digital	device	that	can	perform	a	certain	task	for	you,
entertain	you,	connect	you	to	your	family	and	friends,	record	your	memories	in
the	form	of	pictures	or	videos,	and	more.	The	list	could	probably	go	on	forever.
Most	of	these	applications	or	devices—whether	it	is	a	small	smartphone	in	your
pocket,	a	tiny	smartwatch	on	your	wrist,	or	your	connected	smart	car—exist
thanks	to	the	huge	developments	in	producing	cheaper	and	faster	processors	that
have	fantastic	frameworks	and	libraries,	which	in	turn	house	algorithms	and
technologies	that	efficiently	perform	tasks	that	were	once	only	expected	from
humans	with	their	natural	sensors,	such	as	their	eyes	and	ears.
Over	the	past	years,	OpenCV,	or	the	Open	Source	Computer	Vision	library,	has
turned	into	a	complete	set	of	tools	for	developers	involved	with	computer	vision.
It	contains	almost	everything	that	is	needed	for	implementing	most	computer
vision	problems	that	you	can	think	of,	from	the	most	basic	operations	that	are
expected	from	a	proper	computer	vision	library,	such	as	resizing	and	filtering
images,	up	to	training	models	with	machine	learning	that	can	be	used	for
accurate	and	fast	object	detection	purposes.	The	OpenCV	library	contains	almost
everything	you	need	to	develop	computer	vision	applications,	and	it	also
supports	some	of	the	most	popular	programming	languages,	such	as	C++	and
Python.	You	can	even	find	OpenCV	bindings	for	the	.NET	Framework	among
others.	OpenCV	can	run	on	almost	any	major	operating	system	that	you	can
think	of,	whether	on	mobile	or	desktop	platforms.
This	book	is	the	result	of	months	of	hard	work,	and	it	would	have	not	been
possible	without	the	invaluable	help	of	Tiksha	Sarang,	for	her	patience	and
amazing	editing;	Adhithya	Haridas,	for	his	precise	and	insightful	technical
reviews	and	comments;	Sandeep	Mishra,	for	this	wonderful	opportunity;	the
very	helpful	Technical	Reviewer,	Mr.	Zhuo	Qingliang;	and	everyone	else	at
Packt	Publishing	who	helped	me	create	and	deliver	this	book	as	it	is	and	serve
the	open	source	and	computer	vision	community.
Who	this	book	is	for
Any	developer	with	a	solid	understanding	of	the	C++	programming	language
and	knowledge	about	the	usage	of	third-party	libraries	in	the	operating	system	of
their	choice	will	find	the	topics	of	this	book	very	easy	to	follow	and	work	with.
On	the	other	hand,	developers	familiar	with	the	Python	programming	language
can also use this book to learn about the usage of the OpenCV library; however,
they'll	need	to	adapt	the	examples	from	C++	to	Python	by	themselves,	as	this
book's	main	focus	will	be	on	the	C++	versions	of	the	presented	algorithms.
                                        	
What	this	book	covers
Chapter 1, Introduction to Computer Vision, lays out the basics of the computer
vision	science—what	it	is;	where	it	is	used;	the	definition	of	an	image	and	its
basic	properties,	such	as	pixels,	depth,	and	channels;	and	so	on.	This	short
chapter	is	a	totally	introductory	chapter	meant	for	people	who	are	completely
new	to	the	computer	vision	world.
Chapter 2, Getting Started with OpenCV, introduces the OpenCV library and details its core by going through the most important building blocks of OpenCV
development.	You'll	also	be	presented	with	information	about	where	to	get	it	and
how	to	use	it.	This	chapter	will	even	briefly	go	through	the	usage	of	CMake	and
how	to	create	and	build	OpenCV	projects,	after	which	you'll	learn	about	the	Mat
class	and	its	variants,	reading	and	writing	images	and	videos,	and	accessing
cameras	(among	other	input	source	types).
Chapter 3, Array and Matrix Operations, covers the operations that are used to create or alter matrices. In this chapter, you'll learn how to perform
matrix	operations,	such	as	the	cross	product,	the	dot	product,	and	inversion.	This
chapter	will	introduce	you	to	many	of	the	so-called	per-element	matrix
operations,	along	with	mathematical	operations	such	as	mean,	sum,	and	Fourier
transformation.
Firstly,	to	get	an	idea	of	what	a	regular	computer	is	these	days,	you	can	search
online	or	ask	a	local	shop;	however,	the	one	you	already	have	is	most	probably
enough	to	get	you	started.
	
Download	the	example	code	files
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/su
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of your preferred archive-extraction tool.
The	code	bundle	for	the	book	is	also	hosted	on	GitHub	at	https://github.com/PacktPu
blishing/Hands-On-Algorithms-for-Computer-Vision.	In	case	there's	an	update	to	the	code,
it	will	be	updated	on	the	existing	GitHub	repository.
We	also	have	other	code	bundles	from	our	rich	catalog	of	books	and	videos
available	at	https://github.com/PacktPublishing/.	Check	them	out!
Download	the	color	images
We	also	provide	a	PDF	file	that	has	color	images	of	the	screenshots/diagrams
used	in	this	book.	You	can	download	it	here:	https://www.packtpub.com/sites/default/f
iles/downloads/HandsOnAlgorithmsforComputerVision_ColorImages.pdf.
Conventions	used
There	are	a	number	of	text	conventions	used	throughout	this	book.
CodeInText:	Indicates	code	words	in	text,	database	table	names,	folder	names,
filenames,	file	extensions,	pathnames,	dummy	URLs,	user	input,	and	Twitter
handles.	Here	is	an	example:	"On	the	other	hand,	to	rotate	an	image,	you	can	use
the	rotate	function."
Bold:	Indicates	a	new	term,	an	important	word,	or	words	that	you	see	onscreen.
For	example,	words	in	menus	or	dialog	boxes	appear	in	the	text	like	this.	Here	is
an	example:	"Before	we	start	with	the	shape-analysis	and	feature-analysis
algorithms,	we	are	going	to	learn	about	an	easy-to-use,	extremely	powerful
method	of	object	detection	called	template	matching."
        Warnings	or	important	notes	appear	like	this.
Get in touch
General feedback: Email [email protected] and mention the book title in the
subject	of	your	message.	If	you	have	questions	about	any	aspect	of	this	book,
please	email	us	at	[email protected].
Errata:	Although	we	have	taken	every	care	to	ensure	the	accuracy	of	our
content,	mistakes	do	happen.	If	you	have	found	a	mistake	in	this	book,	we	would
be	grateful	if	you	would	report	this	to	us.	Please	visit	www.packtpub.com/submit-errata,
selecting	your	book,	clicking	on	the	Errata	Submission	Form	link,	and	entering
the	details.
Piracy:	If	you	come	across	any	illegal	copies	of	our	works	in	any	form	on	the
Internet,	we	would	be	grateful	if	you	would	provide	us	with	the	location	address
or	website	name.	Please	contact	us	at	[email protected]	with	a	link	to	the
material.
If	you	are	interested	in	becoming	an	author:	If	there	is	a	topic	that	you	have
expertise	in	and	you	are	interested	in	either	writing	or	contributing	to	a	book,
please	visit	authors.packtpub.com.
Reviews
	
Please	leave	a	review.	Once	you	have	read	and	used	this	book,	why	not	leave	a
review	on	the	site	that	you	purchased	it	from?	Potential	readers	can	then	see	and
use	your	unbiased	opinion	to	make	purchase	decisions,	we	at	Packt	can
understand	what	you	think	about	our	products,	and	our	authors	can	see	your
feedback	on	their	book.	Thank	you!
	
Table	of	Contents
      Title	Page
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
Preface
Conventions used
Get in touch
Reviews
1. Introduction to Computer Vision
Technical requirements
Color spaces
Summary
Questions
2.	   Getting	Started	with	OpenCV
Technical requirements
Introduction to OpenCV
Accessing pixels
Accessing cameras
Mat-like classes
Summary
Questions
           Further	reading
3.	   Array	and	Matrix	Operations
Technical requirements
Cloning a matrix
Extracting a diagonal
Matrix inversion
Transposing a matrix
Basic operations
Weighted addition
Matrix inversion
Summary
Questions
4.	   Drawing,	Filtering,	and	Transformation
Technical requirements
Drawing on images
Drawing shapes
Filtering images
Blurring/smoothening filters
Morphological filters
Derivative-based filters
Arbitrary filtering
Transforming images
Thresholding algorithms
Geometric transformation
Applying colormaps
Summary
           Questions
5.	   Back-Projection	and	Histograms
Technical requirements
Understanding histograms
Displaying histograms
Back-projection of histograms
Comparing histograms
Equalizing histograms
Summary
Questions
           Further	reading
6.	   Video	Analysis	–	Motion	Detection	and	Tracking
Technical requirements
Processing videos
Summary
           Questions
7.	   Object	Detection	–	Features	and	Descriptors
Technical requirements
Edge-detection algorithms
Summary
           Questions
8.	   Machine	Learning	in	Computer	Vision
Technical requirements
Creating samples
Summary
Questions
Assessments
This	is	intended	as	a	brief	introductory	chapter	that	lays	out	the	foundation	of
concepts	that	will	be	used	in	many,	if	not	all,	of	the	computer	vision	algorithms
available.	Even	if	you	are	already	familiar	with	computer	vision	and	the	basics,
such	as	images,	pixels,	channels,	and	so	on,	it	is	still	a	good	idea	to	go	through
this	chapter	briefly	to	make	sure	that	you	understand	the	fundamental	concepts
of	computer	vision	and	to	refresh	your	memory.
In	this	chapter,	we'll	be	starting	with	a	brief	introduction	to	the	field	of	computer
vision.	We'll	go	through	some	of	the	most	important	industries	where	computer
vision	is	used,	with	examples.	After	that,	we'll	directly	dive	into	some	basic
computer	vision	concepts,	starting	with	images.	We'll	learn	what	images	are	in
terms	of	computer	vision	and	what	their	building	blocks	are,	too.	During	this
process,	we'll	cover	concepts	such	as	pixels,	depth,	and	channels,	all	of	which
are	crucial	to	understanding	and	successfully	working	hands-on	with	computer
vision	algorithms.
By the end of this chapter, you will have learned about the following:
So,	which	definition	are	we	to	accept?	The	answer	is	quite	simple—both.	To	put
it	in	just	a	few	words,	computer	vision	refers	to	the	algorithms,	methods,	and
technologies	that	deal	with	digital	visual	data	(or	any	data	that	can	be	visualized)
in	any	way	imaginable.	Note	that	visual	data	in	this	sense	does	not	mean	just
images	taken	using	conventional	cameras,	but	they	might	be,	for	instance,	a
graphical	representation	or	elevation	on	a	map,	a	heat	intensity	map,	or	any	data
that	can	be	visualized	regardless	of	its	real-world	meaning.
With	this	definition,	all	of	the	following	questions—as	well	as	many	more—can
be	solved	with	computer	vision:
In	this	day	and	age,	computer	vision	is	one	of	the	hottest	topics	in	the	computer
science	and	software	industry.	The	reason	for	this	lies	in	the	fact	that	it	is	used	in
a	variety	of	ways,	whether	it's	bringing	to	life	the	ideas	for	applications,	digital
devices,	or	industrial	machines	that	handle	or	simplify	a	wide	range	of	tasks	that
are	usually	expected	from	the	human	eye.	There	are	a	lot	of	working	examples
for	what	we	just	mentioned,	which	vary	across	a	wide	spectrum	of	industries,
including	the	automotive,	motion	picture,	biomedical	devices,	defense,	photo
editing	and	sharing	tools,	and	video	game	industries.	We	are	going	to	talk	about
just	a	couple	of	these	examples	and	leave	the	rest	of	them	for	you	to	research.
The	following	image	depicts	some	of	the	objects,	symbols,	and	areas	of	interest
for	the	automotive	industry,	as	images	that	are	commonly	seen	through	the
cameras	mounted	on	vehicles:
Another	great	example	of	an	industry	on	the	verge	of	a	technological	revolution
is	the	biomedical	industry.	Not	only	have	the	imaging	methods	of	human	organs
and	body	parts	undergone	a	great	deal	of	enhancement,	but	the	way	these	images
are	interpreted	and	visualized	has	also	been	improved	by	computer	vision
algorithms.	Computers	are	used	to	detect	cancerous	tissues	in	images	taken	by
microscopes	with	an	extremely	high	level	of	precision.	There	are	also	promising
and	emerging	results	from	robots	that	are	able	to	perform	surgery,	for	example.
The	following	image	depicts	two	zoomed-in	versions	of	the	same	area	from	the
same	image	in	both	a	grayscale	and	colored	(RGB)	format.	Notice	how	higher
values	in	the	grayscale	image	(on	the	left-hand	side)	correspond	to	brighter
values	and	vice	versa.	Similarly,	in	the	color	image	(on	the	right-hand	side),	you
can	see	that	the	value	of	the	red	channel	is	quite	high,	which	is	consistent	with
the	reddish	color	of	that	area,	as	well	as	the	white	channels:
In	addition	to	what	we	mentioned	previously,	an	image	has	a	few	more
specifications,	which	are	as	follows:
        For	now,	we'll	only	focus	on	the	basic	concepts	of	computer	vision	and	not	go	into	the
        implementation	details	of	the	preceding	image	modification	example.	We'll	learn	about	this
        and	many	more	image	processing	techniques	and	algorithms	in	the	upcoming	chapters.
The	image	properties	mentioned	in	this	section	(width,	height,	resolution,	depth,
and	channels)	are	extensively	used	in	computer	vision.	For	instance,	in	several
cases,	if	an	image	processing	algorithm	is	too	complex	and	time-consuming,
then	the	image	can	be	resized	to	make	it	smaller	so	that	less	time	is	used	to
process	it.	Once	processed,	the	results	can	then	be	mapped	back	to	the	original
image	size	and	displayed	to	the	user.	The	same	process	also	applies	to	depth	and
channels.	If	an	algorithm	only	needs	a	specific	channel	of	an	image,	you	can
either	extract	and	process	it	separately	or	use	the	grayscale-converted	version	of
an	image.	Note	that,	once	the	object	detection	algorithm	has	completed	its	job,
you'll	want	to	display	the	results	over	the	original	colored	image.	Having	a
correct	understanding	of	these	kinds	of	image	properties	will	help	you	a	lot	when
confronting	various	computer	vision	problems	and	when	working	with	computer
vision	algorithms.	Without	further	ado,	let's	move	on	to	color	spaces.
Color	spaces
Although	its	definition	can	vary,	in	general,	a	color	space	(sometimes	referred	to
as	a	color	model)	is	a	method	that	is	used	for	interpreting,	storing,	and
reproducing	a	set	of	colors.	Let's	break	this	down	with	an	example—a	grayscale
color	space.	In	a	grayscale	color	space,	each	pixel	is	represented	with	a	single	8-
bit	unsigned	integer	value	that	corresponds	to	brightness	or	gray-intensity	of	that
pixel.	This	makes	it	possible	to	store	256	different	levels	of	grayscale,	in	which
zero corresponds to absolute black and 255 corresponds to absolute white. In
other	words,	the	higher	the	value	of	a	pixel,	the	brighter	it	is,	and	vice	versa.	The
following	image	displays	all	possible	colors	that	exist	within	the	grayscale	color
space:
        An	RGB	image	that	has	the	same	R,	G,	and	B	values	in	each	of	its	individual	pixels	would
        result	in	a	grayscale	image.	In	other	words,	the	same	intensity	of	red,	green,	and	blue	would
        result	in	a	shade	of	gray.
Another	color	space	that	is	widely	used	in	computer	vision	is	the	HSV	(Hue,
Saturation,	and	Value)	color	space.	In	this	color	space,	each	pixel	is	represented
by	three	values	for	hue	(the	color),	saturation	(the	color	strength),	and	value
(how	bright	or	dark	it	is).	Hue,	as	seen	in	the	following	image,	can	be	a	value
between	0	and	360	(degrees),	which	represents	the	color	of	that	pixel.	For
instance,	zero	and	nearby	degrees	correspond	to	red	and	other	similar	colors:
This	color	space	is	especially	popular	in	computer	vision	detection	and	tracking
algorithms	that	are	based	on	the	color	of	an	object,	as	you'll	see	later	on	in	this
book.	The	reason	for	this	is	that	the	HSV	color	space	allows	us	to	work	with
colors	regardless	of	how	dark	or	bright	they	are.	This	is	not	easy	to	achieve	with
RGB	and	similar	color	spaces,	as	looking	at	an	individual	pixel	channel	value
cannot	tell	us	its	color.
The	following	image	is	another	representation	of	the	HSV	color	space,	which
shows	the	variation	of	hue	(from	left	to	right),	saturation,	and	value	in	one
image,	thus	producing	all	possible	colors:
Besides	the	color	spaces	mentioned	in	this	section,	there	are	many	other	color
spaces, each with their own use cases. For instance, the four-channel CMYK
color space (Cyan, Magenta, Yellow, and Key/Black) has proven to be most
effective	in	printing	systems.
Make	sure	to	learn	about	other	popular	color	spaces	from	the	internet	and	how
they	might	be	useful	for	any	particular	computer	vision	problem.
Input,	process,	and	output
So,	now	that	we	know	images	are	basically	matrix-like	entities	that	have
essential	properties	such	as	width,	height,	element	type,	channels,	depth,	and	so
on,	the	only	big	question	that	remains	is	where	do	they	come	from,	what	happens
to	them,	and	where	do	they	go?
Let's	break	this	down	further	with	a	simple	photo	gallery	application	as	an
example.	Odds	are,	you	have	a	smartphone	that	contains	such	an	application	by
default.	A	photo	gallery	application	usually	allows	you	to	take	new	pictures	or
videos	using	the	built-in	camera	on	your	smartphone,	use	the	previously-
recorded	files,	apply	filters	on	images,	and	even	share	them	on	social	media,	via
email,	or	with	your	friends	and	family.	This	example,	though	it	might	look	like	a
simple	application	when	you	are	using	it,	contains	all	the	crucial	parts	of	a
proper	computer	vision	application.
By	taking	this	example	as	a	pretext,	we	can	say	that	images	are	provided	by	a
wide	range	of	different	input	devices	based	on	the	use	case.	Some	of	the	most
common	image	input	devices	are	as	follows:
After	an	image	is	read	using	an	input	device,	the	actual	processing	of	it	starts.
This	is	probably	the	part	of	the	computer	vision	process	cycle	that	you	are
looking	for	in	this	book	—and	for	good	reason.	This	is	where	actual	computer
vision	algorithms	are	used	to	extract	values	from	an	image,	modify	it	in	one	way
or	another,	or	perform	any	type	of	computer	vision	task	in	general.	This	part	is
usually	done	by	software	on	a	given	device.
Now,	the	output	of	this	whole	process	needs	to	be	created.	This	part	completely
depends	on	the	computer	vision	algorithm	and	the	type	of	device	that	the
computer	vision	process	is	running	on,	but	in	general,	the	following	types	of
outputs	are	expected	from	computer	vision	algorithms:
In	general,	there	are	two	main	types	of	computer	vision	libraries	that	you	can
choose	from	when	developing	computer	vision	applications;	they	are	as	follows:
You	can	search	for	many	good	examples	of	both	proprietary	and	open	source
computer	vision	libraries	online	to	compare	them	for	yourself.
The	library	that	we'll	be	using	throughout	this	book	is	the	Open	Source
Computer	Vision	library	(OpenCV).	OpenCV	is	a	computer	vision	library	with
the	following	features:
In	the	next	chapter,	we'll	introduce	the	OpenCV	framework	and	start	with	some
hands-on	computer	vision	lessons.	We'll	learn	how	OpenCV	can	be	used	to
access	input	devices,	perform	computer	vision	algorithms,	and	access	output
devices	to	display	or	record	results.	The	next	chapter	will	be	the	first	real	hands-
on	chapter	in	the	book	and	will	set	us	up	for	later,	more	practical	chapters.
	
Questions
	
    1.	 Name	two	industries,	besides	the	ones	mentioned	in	this	chapter,	that	can
        significantly	benefit	from	computer	vision.
    2.	 What	would	be	an	example	of	a	computer	vision	application	used	for
        security	purposes?	(Think	about	an	idea	for	an	application	you	haven't
        come	across.)
    3.	 What	would	be	an	example	of	a	computer	vision	application	used	for
        productivity	reasons?	(Again,	think	about	an	idea	for	an	application	that
        you	haven't	come	across,	even	though	you	might	suspect	that	it	exists.)
    4.	 How	many	megabytes	would	be	needed	to	store	a	1920	x	1080	image	with
        four	channels	and	a	depth	of	32-bits?
    5.	 Ultra-HD	images,	also	known	as	4K	or	8K	images,	are	quite	common
        nowadays,	but	how	many	megapixels	does	an	ultra-HD	image	contain?
    6.	 Name	two	commonly-used	color	spaces	besides	the	ones	mentioned	in	this
        chapter.
    7.	 Compare	OpenCV	libraries	with	computer	vision	tools	in	MATLAB.	What
        are	the	pros	and	cons	of	each?
	
Getting	Started	with	OpenCV
In	the	previous	chapter,	we	were	presented	with	an	introduction	to	computer
vision,	with	some	examples	of	the	industries	that	use	it	extensively	to	improve
their	services	and	products.	Then,	we	learned	about	the	most	basic	concepts	used
in	this	field,	such	as	images	and	pixels.	We	learned	about	color	spaces	and
finished	the	chapter	with	a	brief	discussion	of	computer	vision	libraries	and
frameworks.	We're	going	to	continue	from	where	we	left	off,	which	is	with
introducing	you	to	one	of	the	most	powerful	and	widely	used	computer	vision
libraries,	called	OpenCV.
After	learning	about	the	initial	stages	of	using	the	OpenCV	library,	we'll	move
on	to	learn	about	the	Mat	class.	We'll	see	how	all	of	the	concepts	that	we	covered
in	the	previous	chapter	about	images	are	embedded	into	the	structure	of	the	Mat
class	in	OpenCV.	We'll	also	talk	about	the	various	other	classes	that	are
compatible	with	(or	closely	related	to)	the	Mat	class.	OpenCV's	method	of
handling	input	and	output	parameters	in	its	functions	is	a	very	important	subject
that	we'll	be	covering	in	the	later	sections	of	this	chapter.	Finally,	we'll	learn	how
OpenCV	is	used	to	apply	the	three	steps	of	input,	process,	and	output	in
computer	vision	applications.	This	will	require	learning	about	accessing	(and
writing	into)	images	and	video	files	using	OpenCV.
This	chapter,	as	a	direct	connection	to	the	previous	introductory	chapter,	will	lay
out	the	foundations	of	learning	computer	vision	algorithms	using	hands-on	and
practical	examples.
     What	OpenCV	is,	where	to	get	it,	and	how	to	use	it?
     How	to	use	CMake	to	create	OpenCV	projects?
     Understanding	the	Mat	class	and	how	it	is	used	to	access	pixels
     How	to	use	the	Mat_,	Matx,	and	UMat	classes?
     How	to	use	the	imread	and	imwrite	functions	to	read	and	write	images?
     How	to	use	the	VideoCapture	and	VideoWriter	classes	to	read	and	write	videos?
     How	to	access	cameras	and	video	feeds	from	a	network	(using	Real	Time
     Streaming	Protocol	(RTSP))?
Technical	requirements
	
     Microsoft	Visual	Studio,	Xcode,	or	any	IDE	that	can	be	used	to	develop
     C++	programs
     Visual	Studio	Code	or	any	other	code	editor	that	can	be	used	to	edit	CMake
     files,	Python	source	files,	and	so	on
     Python	3.X
     CMake	3.X
     OpenCV	3.X
        It	is	always	best	to	try	to	use	a	more	recent	version	of	the	technologies	and	software	that	you
        are	trying	to	learn.	The	topics	covered	in	this	book,	and	computer	vision	in	general,	are	no
        exception,	so	make	sure	to	download	and	install	the	latest	versions	of	the	mentioned	software.
Wherever	necessary,	a	brief	set	of	instructions	are	provided	for	the	installation
and	configuration.	You	can	use	the	following	URL	to	download	the	source	codes
and	examples	for	this	chapter:
https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/master/Cha
pter02
	
Introduction	to	OpenCV
	
	
	
The	Main	modules	in	OpenCV
	
    core :	This	contains	all	of	the	core	OpenCV	functionalities.	For	instance,	all
    basic	structures,	including	the	Mat	class	(which	we'll	learn	about	in	detail
    later)	and	matrix	operations	are	some	of	the	functionalities	embedded	into
    this	module.
    imgproc:	This	module	contains	all	image-processing	functionalities,	such	as
    filtering,	transformations,	and	histograms.
    imgcodecs:	This	module	includes	functions	that	are	used	for	reading	and
    writing	images.
    videoio:	This	is	similar	to	the	imgcodecs	module,	but	this	one	is	used	to	work
    with	videos,	as	the	name	implies.
    highgui:	This	module,	which	we'll	use	extensively	throughout	the	book,
    contains	all	of	the	functionalities	used	for	displaying	results	and	GUI
    creation	in	general.	Note	that	even	though	the	highgui	module	is	enough	for
    the	purpose	of	this	book	and	learning	about	computer	vision	algorithms
    while	visualizing	the	results	as	we	move	forward,	it's	still	not	meant	for
    full-scale	applications.	Refer	to	the	Further	reading	section	at	the	end	of
    this	chapter	for	more	references	about	proper	GUI	creation	tools	for	full-
    scale	computer	vision	applications.
    video:	Contains	video	analysis	functionalities	of	OpenCV,	such	as	motion
detection and tracking, the Kalman filter, and the well-known CAM Shift
    algorithm	(used	for	object	tracking).
    calib3d:	This	module	includes	calibration	and	3D	reconstruction
    functionalities.	A	well-known	example	of	the	capabilities	of	this	module	is
    the	estimation	of	transformation	between	two	images.
    features2d:	Supported	keypoint-detection	and	descriptor-extraction
    algorithms	are	included	in	this	module.	As	we'll	learn	in	the	upcoming
    chapters,	this	module	contains	some	of	the	most	widely	used	object
    detection	and	categorization	algorithms.
    objdetect:	As	the	name	implies,	this	module	is	used	for	object	detection
    using	OpenCV.	We'll	learn	about	the	functionalities	contained	within	this
    module	in	the	final	chapters	of	this	book.
    dnn:	Similar	to	the	objdetect	module,	this	module	is	also	used	for	object
    detection	and	classification	purposes,	among	others.	The	dnn	module	is
    relatively	new	in	the	list	of	the	Main	modules	of	OpenCV,	and	it	contains
    all	of	the	capabilities	related	to	deep	learning.
    ml:	This	machine	learning	module	contains	the	classes	and	functions	used	to
    handle	classification	and	regression.	Simply	put,	all	strictly	machine
    learning-related	capabilities	are	included	in	this	module.
    flann:	This	is	OpenCV's	interface	to	Fast	Library	for	Approximate
    Nearest	Neighbors	(FLANN).	FLANN	contains	a	wide	set	of	optimized
    algorithms	that	are	used	to	deal	with	the	nearest	neighbor	search	of	high-
    dimensional	features	in	large	datasets.	The	algorithms	mentioned	here	are
    mostly	used	in	conjunction	with	algorithms	in	other	modules,	such	as
    features2d.
    photo:	An	interesting	module	for	photography-related	computer	vision,	it
    contains	classes	and	functions	that	are	used	to	deal	with	tasks	such	as
    denoising,	HDR	imaging,	and	restoring	a	region	in	a	photo	using	its
    neighborhood.
    stitching:	This	module	contains	classes	and	functions	used	for	image
    stitching.	Note	that	stitching	in	itself	is	a	very	complex	task	and	it	requires
    functions	for	rotation	estimation	and	image	warping,	all	of	which	are	also
    part	of	this	very	interesting	OpenCV	module.
    shape:	This	module	is	used	to	deal	with	shape	transformation,	matching,	and
    distance-related	topics.
    superres:	Algorithms	that	fall	into	the	category	of	resolution	enhancement
    are	included	in	the	super-resolution	module.
    videostab:	This	module	contains	algorithms	used	for	video	stabilization.
    viz:	Otherwise	known	as	the	3D	Visualizer	module,	it	contains	classes	and
    functions	that	are	used	to	deal	with	displaying	widgets	on	the	3D
    visualization	window.	This	module	will	not	be	part	of	the	topics	discussed
    in	this	book,	but	we'll	just	mention	it.
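Each main module ships with its own header under the opencv2 directory. As a small illustrative sketch (the exact headers you need depend on your build and on the modules you use), you can include individual module headers instead of the all-in-one opencv2/opencv.hpp header used later in this chapter:

      #include <opencv2/core.hpp>     // Mat and other basic structures
      #include <opencv2/imgproc.hpp>  // filtering, transformations, histograms
      #include <opencv2/highgui.hpp>  // imshow and simple GUI functions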
Apart	from	the	modules	that	we	just	covered,	OpenCV	also	contains	a	number	of
Main	modules	that	are	based	on	CUDA	(an	API	created	by	Nvidia).	These
modules	are	easily	distinguished	by	their	names,	which	start	with	the	word	cuda.
Since	the	availability	of	these	modules	totally	depends	on	a	specific	type	of
hardware,	and	almost	all	of	the	functionalities	inside	those	modules	are	covered,
one	way	or	another,	by	other	modules,	we're	going	to	skip	them	for	now.	But	it's
worth	noting	that	using	the	OpenCV	cuda	modules	can	significantly	improve	the
performance	of	your	applications,	provided	that	the	algorithms	you	need	are
implemented	in	them,	and	that	your	hardware	meets	the	minimum	requirements
for	them.
	
Downloading	and	building/installing
OpenCV
OpenCV,	for	the	most	part,	does	not	have	a	prebuilt	and	ready-to-use	version
(there	are	some	exceptions	we'll	cover	in	this	section)	and	similar	to	most	of	the
open	source	libraries,	it	needs	to	be	configured	and	built	from	sources.	In	this
section,	we'll	quickly	describe	how	OpenCV	is	built	(and	installed)	on	a
computer.	But	first,	you	need	to	get	the	OpenCV	source	codes	in	your	computer.
You	can	use	the	following	link	for	this:
https://opencv.org/releases.html
On this page, you can find the release versions of OpenCV. The latest version, as of the time of writing this book, is 3.4.1, so you should download that version, or a more recent one if it is available.
As	seen	in	the	following	screenshot,	for	each	version	of	the	OpenCV	release,
there	are	various	downloadable	entries,	such	as	the	Win,	iOS,	and	Android
packs,	but	you	should	download	Sources	and	build	OpenCV	yourself	based	on
the	platform	you	want	to	work	with:
       OpenCV	3.4.1,	by	default,	provides	prebuilt	versions	of	the	Android,	iOS,	and	64-bit	MSVC14
       and	MSVC15	(the	same	as	Microsoft	Visual	C++	2015	and	Microsoft	Visual	C++	2017)
       libraries.	So,	if	you	want	to	build	applications	for	any	of	these	platforms,	you	can	download
       the	relevant	pack	and	skip	the	OpenCV	build	process	altogether.
To build OpenCV from sources, you need the following tools on your computer:
    C/C++	compiler	with	C++11	support:	On	Windows,	this	means	any	of	the
    more	recent	Microsoft	Visual	C++	compilers,	such	as	MSVC15	(2017)	or
    MSVC14	(2015).	On	Linux	operating	systems,	you	can	use	an	up-to-date
    GCC	and,	on	macOS,	you	can	use	the	command-line	tools	for	Xcode	that
    contain	all	the	required	tools.
    CMake:	Make	sure	you	use	the	latest	version	of	CMake,	such	as	3.10,	to	be
    on	the	safe	side	with	more	recent	versions	of	OpenCV,	although	you	can
    use	CMake	3.1	and	later.
    Python:	This	is	especially	important	if	you	are	aiming	to	use	the	Python
    programming	language.
       OpenCV	contains	a	large	number	of	tools	and	libraries	and	it's	possible	to	customize	your
       build	in	many	different	ways.	For	instance,	you	can	use	the	Qt	Framework,	Intel	Threading
       Building	Blocks	(TBB),	Intel	Integrated	Performance	Primitives	(IPP),	and	other	third-
        party	libraries	to	further	enhance	and	customize	your	OpenCV	build,	but	since	we'll	be	using
        OpenCV	with	the	default	settings	and	set	of	tools,	we're	ignoring	the	aforementioned	third-
        party	tools	in	the	list	of	requirements.
After	getting	all	of	the	prerequisites	we	just	mentioned,	you	can	configure	and
build	OpenCV	by	using	CMake	and	the	corresponding	compilers,	depending	on
your	operating	system	and	desired	platforms.
The	following	screenshot	depicts	the	CMake	tool	with	a	default	set	of
configurations	visible.	Usually,	you	don't	need	to	make	any	change	to	the
configurations	unless	you	want	to	apply	your	own	set	of	customizations	to	the
OpenCV	build:
Note	that	when	CMake	is	first	opened,	you	need	to	set	the	source	and	build
folders,	which	are	visible	in	the	preceding	screenshot,	as	Where	is	the	source
code:	and	Where	to	build	the	binaries:,	respectively.	After	hitting	the	Configure
button,	you	need	to	set	a	generator	and	apply	the	settings,	then	press	Generate.
After	generation,	you	can	simply	switch	to	the	CMake	output	folder	using	a
Terminal	or	Command	Prompt	instance	and	execute	the	following	commands:
  make
  make install
        Be	aware	that	running	each	one	of	these	commands	can	take	some	time	depending	on	your
        computer's	speed	and	configurations.	Also	note	that	the	make	commands	can	be	different
        depending	on	the	set	of	tools	you	are	aiming	to	use.	For	instance,	if	you	are	using	Microsoft
        Visual	Studio,	then	you	need	to	replace	make	with	nmake,	or	if	you	are	using	MinGW	then	you
        have	to	replace	make	with	mingw32-make.
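If you prefer working entirely from the command line instead of the CMake GUI, a roughly equivalent sequence would be the following. This is only a sketch; path_to_opencv_sources is a placeholder for wherever you extracted the OpenCV sources:

  cd path_to_opencv_sources
  mkdir build
  cd build
  cmake ..
  make
  make install

As noted previously, replace make with nmake or mingw32-make depending on the toolchain you are using.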
After	the	build	process	is	completed,	you	can	start	using	OpenCV.	The	only
thing	you	need	to	take	care	of	is	configuring	your	C++	projects	so	that	they	can
use	your	set	of	OpenCV	libraries	and	installation.
        On	Windows	operating	systems,	you	need	to	make	sure	OpenCV	DLL	files	are	accessible	by
        the	applications	you	are	building.	This	can	be	done	either	by	copying	all	of	the	required	DLLs
        to	the	same	folder	where	your	applications	are	being	built	or	simply	by	adding	the	path	of	the
        OpenCV	DLL	files	to	the	PATH	environment	variable.	Make	sure	to	take	care	of	this	before
        proceeding	further,	otherwise	your	applications	will	crash	when	they	are	executed,	even
        though	they	might	build	successfully	and	not	report	any	issues	at	compile	time.
If	you	are	going	to	use	Python	to	build	computer	vision	applications,	then	things
are	going	to	be	extremely	simple	for	you	since	you	can	use	pip	(package
manager)	to	install	OpenCV	for	Python,	using	the	following	command:
  pip	install	opencv-python
This	will	automatically	get	you	the	latest	OpenCV	version	and	all	of	its
dependencies	(such	as	numpy)	or,	if	you	have	already	installed	OpenCV,	you	can
use	the	following	command	to	make	sure	it's	upgraded	to	the	latest	version:
  pip	install	--upgrade	opencv-python
It	goes	without	saying	that	you	need	a	working	internet	connection	for	these
commands	to	work.
Using	OpenCV	with	C++	or	Python
	
In	this	section,	we'll	demonstrate	how	you	can	use	OpenCV	in	your	C++	or
Python	projects	with	a	very	simple	example	that	we'll	call	HelloOpenCV.	You	might
already	know	that	the	purpose	of	such	a	project	is	either	one	of	the	following:
       To	get	started	with	a	new	library,	such	as	OpenCV,	that	you've	never	used
       before
       To	make	sure	your	OpenCV	installation	is	functional	and	works	fine
So,	even	if	you	are	not	an	OpenCV	beginner,	it's	still	worth	going	through	the
following	instructions	and	running	the	simple	example	in	this	section	to	test	your
OpenCV	build	or	installation.
We'll start with the required steps for using OpenCV in a C++ project:
cmake_minimum_required(VERSION 3.1)  # CMake 3.1 or later, per the stated requirements
project(HelloOpenCV)
set(OpenCV_DIR "path_to_opencv")
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(${PROJECT_NAME} "main.cpp")
target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS})
      In	the	preceding	code,	you	need	to	replace	"path_to_opencv"	with	the	path	to	the	folder
      containing	the	OpenCVConfig.cmake	and	OpenCVConfig-version.cmake	files,	which	is	the	same	folder
      where	you	have	installed	your	OpenCV	libraries.	If	you	are	using	the	Linux	operating	system
      and	the	prebuilt	OpenCV	libraries,	you	might	not	need	an	exact	path	to	the	OpenCV	folder.
4.	 As	for	the	main.cpp	file,	make	sure	it	contains	the	following,	which	is	the
    actual	C++	code	we'll	be	running:
      #include <iostream>
      #include <opencv2/opencv.hpp>
      using namespace std;
      using namespace cv;
      int main()
      {
          Mat image = imread("MyImage.png");
          if(!image.empty())
          {
              imshow("image", image);
              waitKey();
          }
          else
              cout << "Empty image!" << endl;
          return 0;
      }
5.	 Our	C++	project	is	ready.	Now,	we	can	use	CMake	to	generate	Visual
    Studio	or	any	other	type	of	project	that	we	want	(depending	on	the
    platform,	compiler,	and	IDE	we're	going	to	use)	and	then	build	and	run	it.
    Note	that	CMake	is	simply	used	to	make	sure	a	cross-platform	and	IDE-
    independent	C++	project	is	created.
  By	running	this	example	project,	your	input	image	(in	this	case	MyImage.png)
   will	be	read	and	displayed	until	any	key	on	the	keyboard	is	pressed.	If	any
   problems	occur	during	the	reading	of	the	image,	then	the	Empty	image!	message
   will	be	displayed.
   We	can	create	and	run	the	same	project	in	Python	by	using	the	following
   code:
      import cv2
      image = cv2.imread("MyImage.png")
      if image is not None:
          cv2.imshow("image", image)
          cv2.waitKey()
      else:
          print("Empty image!")
The	resemblance	is	quite	unmistakable	here.	The	exact	same	imshow	and	waitKey
functions	are	also	used	in	the	Python	version	of	the	same	code.	As	was
mentioned	before,	for	now	don't	bother	with	the	exact	way	of	using	any	of	the
functions	and	just	focus	on	making	sure	that	you	are	able	to	run	these	programs,
either	in	C++	or	Python,	or	both,	and	that	you	are	able	to	see	the	image
displayed.
If	you	were	able	to	successfully	run	the	HelloOpenCV	example	project	in	this
section,	then	you	are	ready	to	take	on	the	next	sections	of	this	chapter	and	the
next	chapters	of	this	book	without	any	problems.	If	you	still	face	problems	with
the	topics	discussed	so	far,	or	you	feel	that	you	need	a	stronger	understanding	of
those	topics,	you	can	go	through	them	once	again	from	the	start	of	the	chapter,	or
even	better,	you	can	refer	to	the	additional	books	mentioned	in	the	Further
reading	section	at	the	end	of	this	chapter.
	
Understanding	the	Mat	class
Recall the description of an image that was provided in the previous chapter: in computer vision, any image is in fact a matrix with a given width, height, number of channels, and depth. With this description in mind, we
can	say	that	the	OpenCV	Mat	class	can	be	used	to	handle	image	data	and	it
supports	all	properties	required	by	an	image,	such	as	width	and	height.	In	fact,
the	Mat	class	is	an	n-dimensional	array	that	can	be	used	to	store	single	or	multiple
channels	of	data	with	any	given	data	type,	and	it	contains	a	number	of	members
and	methods	to	create,	modify,	or	manipulate	it	in	many	ways.
In	this	section,	we're	going	to	learn	about	some	of	the	most	important	members
and	methods	of	the	Mat	class	with	example	use	cases	and	code	samples.
        The	equivalent	of	the	OpenCV	C++	Mat	class	in	Python	is	not	originally	an	OpenCV	class,
        and	it	is	represented	by	the	numpy.ndarray	type.	NumPy	is	a	Python	library	that	contains	a	wide
        set	of	numerical	algorithms	and	mathematical	operations,	and	it	supports	working	with	large
        multi-dimensional	arrays	and	matrices.	The	reason	why	the	numpy.ndarray	type	in	Python	is	used
        as	Mat	is	that	it	offers	the	best	(if	not	the	same)	set	of	members	and	methods	required	by	the
        OpenCV	Mat	class	in	C++.	For	a	complete	list	of	members	and	methods	supported	by
        numpy.ndarray,	you	can	refer	to	NumPy	documentation.
Constructing	a	Mat	object
Mat	contains	about	20	different	constructors	that	can	be	used	to	create	instances
of	it,	based	on	what	kind	of	initialization	is	needed.	Let's	see	some	of	the	most
commonly	used	constructors	with	some	examples.
Creating	a	Mat	object	(or	class	instance)	that	has	a	width	of	1920	and	height	of	1080,
with	three	channels	that	contain	32-bit	floating-point	values	looks	as	follows:
  Mat	image(1080,	1920,	CV_32FC3);	
Note	that	the	type	parameter	in	the	Mat	constructors	accepts	a	special	type	of
parameter,	that	is,	a	constant	value	that	contains	the	depth,	type,	and	number	of
channels.	The	pattern	is	seen	here:
      CV_<depth><type>C<channels>	
<depth> 	can	be	replaced	by	8,	16,	32,	or	64,	which	represent	the	number	of	bits
used	to	store	each	element	in	the	pixels.	The	actual	number	of	bits	needed	for
each	pixel	can	be	calculated	by	multiplying	this	number	with	the	number	of
channels,	or	in	other	words,	<channels>.	Finally,	<type>	needs	to	be	replaced	with	U,
S,	or	F	for	the	unsigned	integer,	signed	integer,	and	floating-point	values,
respectively.	For	example,	you	can	use	the	following	to	create	standard	grayscale
and	colored	images	with	a	width	of	800	and	height	of	600	pixels.	Notice	that	only
the	number	of	channels	is	different,	and	that	the	depth	and	type	parameters
represent	8-bit	unsigned	integers:
      Mat	grayscaleImage(600,	800,	CV_8UC1);	
      Mat	colorImage(600,	800,	CV_8UC3);	
You	can	use	the	following	constructor	to	create	a	three-channel	RGB	image	with
a	width	of	W,	height	of	H,	and	8-bit	unsigned	integer	elements,	and	then	initialize
all	elements	with	the	R,	G,	and	B	color	value:
      int	W	=	800,	H	=	600,	R	=	50,	G	=	150,	B	=	200;	
      Mat	image(H,	W,	CV_8UC3,	Scalar(R,	G,	B));
          It's	important	to	note	that	the	default	order	of	colors	in	OpenCV	is	BGR	(instead	of	RGB),
          which	means	a	swapped	B	and	R	value.	This	is	especially	important	if	we're	aiming	to	display
          our	processed	images	at	some	point	during	the	application	run	time.
So,	the	correct	way	of	the	scalar	initializer	in	the	preceding	code	would	be	as
follows:
      Scalar(B,	G,	R)	
If	we	need	a	Mat	object	with	higher	dimensions,	we	can	use	the	following.	Note
that	in	the	following	example,	a	Mat	object	of	seven	dimensions	is	created.	The
size	of	each	dimension	is	provided	in	the	sizes	array	and	each	element	in	the	high
dimensional	Mat,	which	is	called	hdm,	contains	two	channels	of	32-bit	floating-
point	values:
      const	int	dimensions	=	7;	
      const	int	sizes[dimensions]	=	{800,	600,	3,	2,	1,	1,	1};	
      Mat	hdm(7,	sizes,	CV_32FC2);	
Another	way	of	achieving	the	same	thing	would	be	using	C++	vectors,	as	seen	in
the	following	example:
      vector<int>	sizes	=	{800,	600,	3,	2,	1,	1,	1};	
      Mat	hdm(sizes,	CV_32FC2);	
Similarly,	you	can	provide	an	additional	Scalar	parameter	to	initialize	all	of	the
values	in	Mat.	Note	that	the	number	of	values	in	Scalar	must	match	the	number	of
channels.	For	instance,	to	initialize	all	of	the	elements	in	the	previous	seven-
dimensional	Mat,	we	can	use	the	following	constructor:
      Mat	hdm(sizes,	CV_32FC2,	Scalar(1.25,	3.5));	
The	Mat	class	allows	us	to	use	already-stored	image	data	to	initialize	it.	Using	this
constructor,	you	can	make	your	Mat	class	contain	the	same	data	that	the	data
pointer	is	pointing	to.	Note	that	this	constructor	does	not	create	a	completely
new	copy	of	the	original	data,	and	it	only	makes	this	newly	created	Mat	object
point	to	it.	This	allows	very	efficient	initialization	and	construction	of	the	Mat
classes,	but	has	the	obvious	downside	of	not	taking	care	of	the	memory	cleanup
when	it	is	not	needed,	so	you	need	to	be	extra	careful	when	using	this
constructor:
      Mat	image(1080,	1920,	CV_8UC3,	data);
Note	that,	unlike	the	previous	constructors	and	their	initializers,	data	here	is	not	a
Scalar	but	a	pointer	to	a	chunk	of	memory	that	contains	a	1920	by	1080	pixel,	three-
channel	image	data.	This	method	of	initializing	a	Mat	object	with	a	pointer	to	a
memory	space	can	also	be	used	with	higher	dimensionalities	of	the	Mat	class.
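As a minimal sketch of this external-data constructor (the buffer here is an illustrative assumption; in practice the pointer might come from another library or a capture device), note that the Mat only wraps the memory and never frees it:

      // caller-owned buffer large enough for a 1920 x 1080, three-channel, 8-bit image
      vector<uchar> buffer(1080 * 1920 * 3, 0);
      Mat image(1080, 1920, CV_8UC3, buffer.data());
      // no copy is made; buffer must outlive image, and freeing the memory
      // (here handled automatically by the vector) remains the caller's job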
One	last	constructor	type,	which	is	also	one	of	the	most	important	constructors
of	the	Mat	class,	is	the	region	of	interest	(ROI)	constructor.	This	constructor	is
used	to	initialize	a	Mat	object	with	a	region	inside	another	Mat	object.	Let's	break
this	down	with	an	example.	Imagine	you	have	an	image	and	you	want	to	apply
some	modifications	to	a	specific	region	inside	that	image,	or	in	other	words	to
the	ROI.	You	can	use	the	following	constructor	to	create	a	Mat	class	that	has
access	to	the	ROI,	and	any	changes	applied	to	it	will	affect	the	same	region	of
the	original	image.	Here's	how	you	can	do	that:
      Mat	roi(image,	Rect(240,	140,	300,	300));	
If	the	preceding	constructor	is	used	while	image	(which	itself	is	a	Mat	object)
contains	the	picture	on	the	left	side	of	the	following	image,	then	roi	will	have
access	to	the	region	highlighted	in	that	image,	and	it	will	contain	the	image	seen
on	the	right	side:
The	Rect	class	in	OpenCV	is	used	to	represent	a	rectangle	that	has	a	top-left
point,	width,	and	height.	For	instance,	the	Rect	class	that	was	used	in	the
preceding	code	example	has	a	top-left	point	of	240	and	140,	width	of	300,	and
height	of	300	pixels,	as	seen	here:
      Rect(240,	140,	300,	300)
As	was	mentioned	earlier,	modifying	the	ROI	in	any	way	will	result	in	the
original	image	being	modified.	For	instance,	we	can	apply	something	similar	to
the	following	image-processing	algorithm	to	roi	(for	now	don't	bother	with	the
nature	of	the	following	algorithm,	as	we'll	learn	more	about	it	in	the	upcoming
chapters,	and	just	focus	on	the	concept	of	ROIs):
      dilate(roi,	roi,	Mat(),	Point(-1,-1),	5);	
If	we	attempt	to	display	the	image,	the	result	would	be	similar	to	the	following.
Notice	the	area	that	was	highlighted	in	the	previous	image	is	modified	(dilated)
in	the	following	image,	even	though	we	applied	the	change	to	roi,	and	not	the
image	itself:
Similar	to	constructing	an	ROI	Mat	object	using	a	Rect	class	that	corresponds	to	a
rectangular	region	of	an	image,	you	can	also	create	an	ROI	that	corresponds	to	a
range	of	columns	and	rows	in	the	original	Mat	object.	For	instance,	the	same
region	in	the	preceding	example	can	also	be	accessed	with	the	ranges	seen	in	the
following	constructor:
     Mat	roi(image,	Range(140,	440),	Range(240,	540));	
The Range class in OpenCV represents a range that has a start and an end. Depending on the values of start and end, a Range object can be checked to see whether it's empty or not. In the preceding constructor, the first Range corresponds to the rows
of	the	original	image,	from	row	140	to	row	440.	The	second	Range	corresponds	to
the	columns	of	the	original	image,	from	column	240	to	column	540.	The	shared
space	of	the	provided	two	ranges	is	considered	to	be	the	resulting	ROI.
Deleting	a	Mat	object
	
A	Mat	object	can	be	cleaned	up	by	using	its	release	function,	however,	since	release
is	called	in	the	destructor	of	the	Mat	class,	there	is	usually	no	need	to	call	this
function	at	all.	It's	important	to	note	that	the	Mat	class	shares	the	same	data
between	multiple	objects	that	point	to	it.	This	has	the	advantage	of	less	data
copying	and	less	memory	usage,	and	since	all	of	the	reference	counting	is	done
automatically,	you	don't	usually	need	to	take	care	of	anything.
The	only	scenario	in	which	you	need	to	take	extra	care	of	how	and	when	your
objects	and	data	are	cleaned	up	is	when	you	use	a	data	pointer	to	construct	a	Mat
object,	as	mentioned	in	the	previous	section.	In	such	cases,	calling	the	release
function	of	the	Mat	class,	or	its	destructor,	will	have	nothing	to	do	with	the
external	data	you	have	used	to	construct	it,	and	the	cleanup	will	be	completely
on	your	shoulders.
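The following short sketch illustrates the shared-data behavior described above (the file name is just the placeholder image used earlier in this chapter):

      Mat a = imread("MyImage.png");
      Mat b = a;      // no deep copy; a and b point to the same pixel data
      b.release();    // decrements the reference count; a still owns the data
      // the underlying buffer is freed only when the last Mat referring to it is released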
	
Accessing	pixels
Apart	from	using	an	ROI	to	access	the	pixels	in	a	rectangular	region	of	an	image,
as	we	did	in	the	previous	sections,	there	are	a	few	other	methods	for	achieving
the	same	goal	or	even	for	accessing	individual	pixels	of	an	image.	To	be	able	to
access	any	single	pixel	in	an	image	(in	other	words,	a	Mat	object),	you	can	use	the
at	function,	as	seen	in	the	following	example:
image.at<TYPE>(R, C)
In	the	preceding	example,	in	the	usage	of	the	at	function,	TYPE	must	be	replaced
with	a	valid	type	name	that	is	in	accordance	with	the	number	of	channels	and
depth	of	the	image.	R	must	be	replaced	with	the	row	number,	and	C	with	the
column	number	of	the	pixels	we	want	to	have	access	to.	Notice	that	this	is
slightly	different	from	the	usual	pixel-access	methods	in	many	libraries,	in	which
the	first	parameter	is	X	(or	left)	and	the	second	parameter	is	Y	(or	top).	So
basically,	the	parameters	appear	reversed	here.	Here	are	some	examples	of
accessing	individual	pixels	in	different	types	of	Mat	objects.
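The original examples are not reproduced above, so here is a minimal sketch instead, assuming the grayscaleImage and colorImage objects constructed earlier in this chapter:

      // read a pixel of a single-channel, 8-bit image (CV_8UC1)
      uchar gray = grayscaleImage.at<uchar>(0, 0);
      // read a pixel of a three-channel, 8-bit image (CV_8UC3)
      Vec3b color = colorImage.at<Vec3b>(0, 0);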
In	the	preceding	code,	the	Vec3b	(vector	of	3	bytes)	type	is	used.	This	and	various
other	similar	vector	types	are	defined	in	OpenCV	for	convenience.	Here	is	the
pattern	of	the	OpenCV	Vec	types	that	you	can	use	with	the	at	function,	or	for	any
other	purposes:
      Vec<N><Type>	
<N> corresponds to the number of channels in a Mat object. <Type>, on the other hand, can be one of the following letters, which represent the type of the data stored in each channel of each pixel: b (uchar), s (short), w (ushort), i (int), f (float), and d (double).
For	instance,	Vec4b	can	be	used	to	access	the	pixels	of	a	four-channel	Mat	object
with	the	uchar	elements,	and	Vec6f	can	be	used	to	access	the	pixels	of	a	six-
channel	Mat	object	with	the	float	elements.	It's	important	to	note	that	the	Vec	type
can	be	treated	like	an	array	to	access	individual	channels	too.	Here's	an	example
of	how	to	access	the	second	channel	of	a	three-channel	Mat	object	with	the	uchar
elements:
      image.at<Vec3b>(R,	C)[1]
It's	important	to	note	that	by	access	we	mean	both	reading	and	writing	to	a	pixel
and	its	individual	channels.	For	instance,	the	following	example	is	one	way	to
apply	a	sepia	filter	to	an	image:
      for(int	i=0;	i<image.rows;	i++)	
      {	
      				for(int	j=0;	j<image.cols;	j++)	
      				{	
      								int	inputBlue	=	image.at<Vec3b>(i,j)[0];	
      								int	inputGreen	=	image.at<Vec3b>(i,j)[1];	
      								int	inputRed	=	image.at<Vec3b>(i,j)[2];	
      	
      								int	red	=	
      																inputRed	*	0.393	+	
      																inputGreen	*	0.769	+	
      																inputBlue	*	0.189;	
      	
      								if(red	>	255	)	red	=	255;	
      	
      								int	green	=	
      																inputRed	*	0.349	+	
      																inputGreen	*	0.686	+	
      																inputBlue	*	0.168;	
      	
      								if(green	>	255)	green	=	255;	
      	
      								int	blue	=	
      																inputRed	*	0.272	+	
      																inputGreen	*	0.534	+	
      																inputBlue	*	0.131;	
      	
      								if(blue	>	255)	blue	=	255;	
      	
      								image.at<Vec3b>(i,j)[0]	=	blue;	
      								image.at<Vec3b>(i,j)[1]	=	green;	
      								image.at<Vec3b>(i,j)[2]	=	red;	
      				}	
      }	
First,	a	few	things	to	note	here	are	the	rows	and	cols	members	of	the	image,	which
basically	represent	the	number	of	rows	(or	height)	and	the	number	of	columns
(or	width)	in	it.	Also	notice	how	the	at	function	is	used	both	to	extract	the	values
of	channels	and	to	write	updated	values	into	them.	Don't	worry	about	the	values
used	in	this	example	to	multiply	and	get	the	correct	sepia	tone,	as	they	are
specific	to	the	tone	itself,	and	essentially	any	type	of	operation	can	be	applied	to
the	individual	pixels	to	change	them.
The	following	image	depicts	the	result	of	applying	the	preceding	example	code
on	a	three-channel	color	image	(left—original	image,	right—filtered	image):
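The Mat class also provides a forEach function that applies a given function (commonly a lambda) to every pixel, possibly in parallel. The example that the next paragraph refers to is not shown above, so here is a minimal sketch that divides every pixel of a single-channel, 8-bit image by 5:

      image.forEach<uchar>([](uchar &pixel, const int *)
      {
          // the second (position) parameter is not used here
          pixel /= 5;
      });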
In the preceding code, the second parameter, or the position parameter (which is not needed and is therefore left unused here), is the pointer to the position of the pixel. To achieve the same result using a regular for loop, we would need to write the following:
      for(int	i=0;	i<image.rows;	i++)	
      				for(int	j=0;	j<image.cols;	j++)	
      								image.at<uchar>(i,j)	/=	5;	
OpenCV	also	allows	the	use	of	STL-like	iterators	to	access	or	modify	individual
pixels	in	an	image.	Here's	the	same	example	but	written	using	STL-like	iterators:
      MatIterator_<uchar>	it_begin	=	image.begin<uchar>();	
      MatIterator_<uchar>	it_end	=	image.end<uchar>();	
      for(	;	it_begin	!=	it_end;	it_begin++)	
      {	
      				*it_begin	/=	5;	
      }
It's	interesting	to	note	that	the	same	operation	in	all	of	the	preceding	three
examples	can	also	be	done	by	using	the	following	simple	statement:
      image	/=	5;	
This	is	because	the	Mat	object	in	OpenCV	treats	this	statement	as	an	element-wise
divide	operation,	which	we'll	learn	more	about	it	in	the	upcoming	chapters.	The
following	image	depicts	the	result	of	applying	the	preceding	examples	to	a
grayscale	image	(left—original	image,	right—modified	image):
Obviously, forEach, the C++ for loop, and STL-like iterators can all be used to access and modify the pixels within a Mat object. We'll limit ourselves to the functions and members of the Mat class discussed in this section, but make sure to explore the huge set of functionalities it provides for working with images and their underlying properties in an efficient way.
Reading	and	writing	images
OpenCV	allows	reading	an	image	from	a	disk	into	a	Mat	object	using	the	imread
function,	which	we	briefly	used	in	this	chapter	in	a	previous	example.	The	imread
function	accepts	an	input	image	file	name	and	a	flag	parameter,	and	returns	a	Mat
object	filled	with	the	input	image.	The	input	image	file	must	have	one	of	the
image	formats	supported	by	OpenCV.	Here	are	some	of	the	most	popular
supported	formats:
Make	sure	to	always	check	the	OpenCV	documentation	for	a	complete	and
updated	list,	and	especially	for	exception	cases	and	notes	that	might	apply	to
some	formats	on	some	operating	systems.	As	for	the
flag	parameter,	it	can	be	one	or	a	combination	of	the	values	from	the	ImreadModes
enum,	which	is	defined	in	OpenCV.	Here	are	a	few	of	the	most	widely	used	and
self-explanatory	entries:
       IMREAD_UNCHANGED
       IMREAD_GRAYSCALE
       IMREAD_COLOR
       IMREAD_IGNORE_ORIENTATION
For example, the following code can be used to read an image from disk while ignoring the orientation value stored in the EXIF data of the image and also converting it to grayscale:
       Mat	image	=	imread("MyImage.png",	
       				IMREAD_GRAYSCALE	|	IMREAD_IGNORE_ORIENTATION);
          Exchangeable	Image	File	Format	(EXIF)	is	a	standard	for	adding	tags	and	additional	data
          (or	metadata)	to	images	taken	by	digital	cameras.	These	tags	might	include	the	manufacturer
          and	camera	model	and	the	orientation	of	the	camera	while	the	photo	was	taken.	OpenCV	is
          capable	of	reading	certain	tags	(such	as	orientation)	and	interpreting	them,	or	in	the	case	of
        the	preceding	sample	code,	ignoring	them.
After	the	image	is	read,	you	can	call	empty	to	see	whether	it	was	read	successfully
or	not.	You	can	also	use	channels	to	get	the	number	of	channels,	depth	to	get	the
depth,	type	to	get	the	type	of	the	image,	and	so	on.	Alternatively,	you	can	call	the
imshow	function	to	display	it,	as	we	saw	previously	in	this	chapter.
Similarly,	the	imreadmulti	function	can	be	used	to	read	a	multipage	image	into	a
vector	of	Mat	objects.	The	obvious	difference	here	is	that	imreadmulti	returns	a	bool
value	that	can	be	checked	for	successful	reading	of	the	pages	and	fills	the
vector<Mat>	object	passed	to	it	by	reference.
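As a minimal sketch of how this might look (the file name here is just an assumption
for illustration), imreadmulti can be called as follows:
      vector<Mat> pages; 
      bool success = imreadmulti("MyMultiPage.tiff", pages, IMREAD_COLOR); 
      if(success) 
          cout << pages.size() << " page(s) were read." << endl; 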
To	be	able	to	write	an	image	to	a	file	on	disk,	you	can	use	the	imwrite	function.
imwrite	takes	the	file	name	that	will	be	written	to,	a	Mat	object,	and	a	vector	of	int
values	containing	the	write	parameters,	which	can	be	ignored	in	the	case	of
default	parameters.	See	the	following	enums	in	OpenCV	for	a	complete	list	of
parameters	that	can	be	used	with	the	imwrite	function	to	alter	the	behavior	of	the
write	process:
     ImwriteFlags
     ImwriteEXRTypeFlags
     ImwritePNGFlags
     ImwritePAMFlags
The	following	is	an	example	code	that	depicts	how	to	write	a	Mat	object	into	an
image	file	on	disk	using	the	imwrite	function.	Note	that	the	format	of	the	image	is
derived	from	the	extension	provided,	in	this	case	png:
      bool	success	=	imwrite("c:/my_images/image1.png",	image);	
      cout	<<	(success	?	
      												"Image	was	saved	successfully!"	
      										:	
      												"Image	could	not	be	saved!")	
      					<<	endl;	
Besides	the	imread	and	imwrite	functions,	which	are	used	to	read	and	write	images
from	and	into	image	files	on	disk,	you	can	also	use	the	imdecode	and	imencode
functions	to	read	images	stored	in	a	buffer	in	memory	or	write	into	them.	We'll
leave	those	two	functions	for	you	to	discover,	and	move	on	to	the	next	topic,
which	is	accessing	videos	using	OpenCV.
Reading	and	writing	videos
OpenCV,	using	its	videoio	module	or,	to	be	precise,	using	the	VideoCapture	and
VideoWriter	classes,	allows	us	to	read	and	write	video	files.	The	obvious	difference
in	the	case	of	videos,	is	that	they	contain	a	consecutive	set	of	images	(or	better
yet,	frames)	as	opposed	to	just	a	single	image.	So,	they	are	usually	read	and
processed	or	written	in	a	loop	that	covers	the	whole	or	any	desired	number	of	the
frames	in	a	video.	Let's	start	with	example	code	that	shows	how	to	read	and	play
a	video	using	the	OpenCV	VideoCapture	class:
      VideoCapture	vid("MyVideo.mov");	
      //	check	if	video	file	was	opened	correctly	
      if(!vid.isOpened())	
      {	
      				cout	<<	"Can't	read	the	video	file";	
      				return	-1;	
      }	
      //	get	frame	rate	per	second	of	the	video	file	
      double	fps	=	vid.get(CAP_PROP_FPS);	
      if(fps	==	0)	
      {	
      				cout	<<	"Can't	get	video	FPS";	
      				return	-1;	
      }	
      //	required	delay	between	frames	in	milliseconds	
      int	delay_ms	=	1000.0	/	fps;	
      //	infinite	loop	
      while(true)	
      {	
      				Mat	frame;	
      				vid	>>	frame;	
      				if(frame.empty())	
      								break;	
      				//	process	the	frame	if	necessary	...	
      				//	display	the	frame	
      				imshow("Video",	frame);	
      				//	stop	playing	if	space	is	pressed	
      				if(waitKey(delay_ms)	==	'	')	
      								break;	
      }	
      //	release	the	video	file	
      vid.release();	
As	seen	in	the	preceding	code,	the	video	file	name	is	passed	to	the	VideoCapture
class	when	it	is	constructed.	This	automatically	leads	to	the	opening	of	the	video
file,	if	it	exists	and	if	the	format	is	supported	by	your	computer	(and	OpenCV).
Consequently,	you	can	check	whether	the	video	file	was	successfully	opened	or
not	using	the	isOpened	function.	Right	after	that,	the	get	function	of	the	VideoCapture
class is used to retrieve the frame rate (frames per second, or FPS) of the video file that
has	been	opened.	get	is	an	extremely	important	function	of	VideoCapture	and	it
allows	us	to	retrieve	a	wide	range	of	properties	of	an	opened	video	file.	Here	are
some example entries from the VideoCaptureProperties enum that can be provided
to the get function to retrieve the desired property:
       CAP_PROP_POS_MSEC
       CAP_PROP_POS_FRAMES
       CAP_PROP_FRAME_WIDTH
       CAP_PROP_FRAME_HEIGHT
       CAP_PROP_FPS
       CAP_PROP_FRAME_COUNT
Apart	from	the	way	we	used	the	VideoCapture	class	and	its	methods,	we	can	also
call	its	open	function	to	open	the	video	file,	if	we	do	not	want	to	pass	the	file
name	to	the	constructor,	or	if	the	video	file	is	not	present	at	the	VideoCapture
construction	time.	Another	important	function	of	VideoCapture	is	the	set	function.
Think	of	set	as	the	exact	opposite	of	the	get	function,	in	the	sense	that	it	allows
setting	the	parameters	of	VideoCapture	and	the	opened	video	file.	Try
experimenting	with	it	for	yourself	with	different	parameters,	mentioned	before	in
the	VideoCaptureProperties	enum.
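For instance, the following sketch (the frame number is an arbitrary choice made for
illustration) uses set to jump to a specific frame of the opened video before reading it:
      // seek to frame number 100 before grabbing the next frame 
      vid.set(CAP_PROP_POS_FRAMES, 100); 
      Mat frame; 
      vid >> frame; 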
To	be	able	to	write	into	a	video	file,	you	can	use	the	VideoWriter	class	in	a	very
similar	way.	Here's	an	example	that	shows	how	a	VideoWriter	object	is	created:
      VideoWriter	wrt("C:/output.avi",	
      																VideoWriter::fourcc('M','J','P','G'),	
      																30,	Size(1920,	1080));	
This will create a video file at "C:/output.avi" with a width of 1920 and a height of
1080 pixels, at 30 frames per second, ready to be filled with frames. But what about
the second parameter passed to the constructor? That is the FourCC value, which
determines the format (or the codec, to be precise) that will be used to record the
video file. In this example, we have used one of the most common FourCC values,
'M', 'J', 'P', 'G' (Motion JPEG), but you can check online for a more
comprehensive list of FourCC values and their specifications.
After	a	VideoWriter	object	is	created,	you	can	use	the	<<	operator	or	the	write
function	to	write	an	image	(of	the	exact	same	size	as	the	video)	into	the	video
file:
      wrt	<<	image;	
Finally,	you	can	call	the	release	function	to	make	sure	the	video	file	is	released,
and	all	of	your	changes	are	written	into	it.
        Apart	from	the	aforementioned	methods	of	using	the	VideoCapture	and	VideoWriter	classes,	you
        can	also	set	the	preferred	backend	used	by	them.	For	more	information	about	this,	refer	to	the
        VideoCaptureAPIs	enum	in	the	OpenCV	documentation.	When	omitted,	which	was	the	case	in	our
        examples,	the	default	backend	supported	by	your	computer	is	used.
Accessing	cameras
	
OpenCV	supports	accessing	cameras	that	are	available	on	a	system	by	using	the
same	VideoCapture	class	we	used	for	accessing	the	video	files.	The	only	difference
is	that	instead	of	passing	a	file	name	to	the	constructor	of	the	VideoCapture	class	or
its	open	function,	you	must	provide	a	0-based	index	number	that	corresponds	to
each	available	camera.	For	instance,	the	default	webcam	on	a	computer	can	be
accessed	and	displayed	by	using	the	following	example	code:
      VideoCapture cam(0); 
      if(!cam.isOpened()) 
          return -1; 
      // infinite loop 
      while(true) 
      { 
          Mat frame; 
          // grab the next frame from the camera 
          cam >> frame; 
          if(frame.empty()) 
              break; 
          // display the frame 
          imshow("Camera", frame); 
          // stop if space is pressed 
          if(waitKey(10) == ' ') 
              break; 
      } 
      // release the camera 
      cam.release(); 
As	you	can	see,	the	only	difference	is	in	the	constructor.	This	implementation	of
the	VideoCapture	class	allows	users	to	treat	any	type	of	video	source	the	same	way,
thus	writing	almost	the	exact	same	code	to	deal	with	cameras	instead	of	video
files.	This	is	also	the	case	with	network	feeds,	as	described	in	the	next	section.
	
Accessing	RTSP	and	network	feeds
	
OpenCV	allows	users	to	read	video	frames	from	a	network	feed	or,	to	be	precise,
from	an	RTSP	stream	located	on	a	network,	such	as	the	local	network	or	even	the
internet.	To	be	able	to	do	this,	you	need	to	pass	the	URL	of	the	RTSP	stream	to
the	VideoCapture	constructor	or	its	open	function,	exactly	the	same	way	as	if	it
were	a	file	on	a	local	hard	disk.	Here's	the	most	common	pattern	and	an	example
URL	that	can	be	used:
    rtsp://user:password@host_address/somevideo 
In	this	URL,	user	is	replaced	by	the	actual	username,	password	with	the	password
of	that	user,	and	so	on.	In	the	event	that	the	network	feed	does	not	require	a
username	and	password,	they	can	be	omitted.
	
Mat-like	classes
	
Besides	the	Mat	class,	OpenCV	provides	a	number	of	other	classes	that	are	quite
similar	to	Mat,	but	different	in	how	and	when	they	are	used.	Here	are	the	most
important	Mat-like	classes	that	can	be	used	instead	of,	or	together	with,	the	Mat
class:
     Mat_: This is a subclass of the Mat class, but it provides a better access
     method than the at function, that is, using the () operator (a short sketch
     follows at the end of this section). Mat_ is a template class and obviously
     needs to be provided with the type of the elements at compile time,
     something that can be avoided with the Mat class itself.
     Matx:	This	is	best	used	with	matrices	that	have	a	small	size,	or	to	be	precise,
     a	known	and	small	size	at	compile	time.
     UMat:	This	is	a	more	recent	implementation	of	the	Mat	class,	which	allows	us
     to	use	OpenCL	for	faster	matrix	operations.
        Using	UMat	can	significantly	increase	the	performance	of	your	computer	vision	applications,
        but	since	it	is	used	in	exactly	the	same	way	as	the	Mat	class,	we'll	ignore	it	throughout	the
        chapters	of	this	book;	however,	in	practice	and	especially	in	real-time	computer	vision
        applications,	you	must	always	make	sure	to	use	classes	and	functions	that	are	better	optimized
        and	more	performant,	such	as	UMat.
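Here is the short sketch of Mat_ element access mentioned earlier; the grayscale
image file name and the pixel position are assumptions made purely for illustration:
      Mat_<uchar> gray = imread("Test.png", IMREAD_GRAYSCALE); 
      // elements are accessed with the () operator instead of at<uchar>() 
      gray(10, 20) = 255; 
      uchar value = gray(10, 20); 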
	
Summary
	
We have now covered all the crucial topics needed to easily take on computer
vision algorithms using hands-on examples and real-life scenarios. We started
this	chapter	by	learning	about	OpenCV	and	its	overall	structure,	including	its
modules	and	main	building	blocks.	This	helped	us	gain	a	perspective	of	the
computer	vision	library	that	we'll	be	working	on,	but,	more	importantly,	this
gave	us	an	overview	of	what's	possible	and	what	to	expect	when	dealing	with
computer	vision	algorithms	in	general.	We	moved	on	to	learn	where	and	how	to
get	OpenCV	and	how	to	install	or	build	it	on	our	own.	We	also	learned	how	to
create,	build,	and	run	C++	and	Python	projects	that	use	the	OpenCV	library.
Then,	by	learning	all	about	the	Mat	class	and	dealing	with	pixels	in	an	image,	we
learned	how	to	alter	and	display	an	image.	The	final	sections	of	this	chapter
included	everything	we	need	to	know	about	reading	and	writing	images	from
files saved on disk, whether they are single (or multi-page) images or video
files, as well as cameras and network feeds. We finished the chapter by learning
about	a	few	other	types	from	the	OpenCV	Mat	family	that	can	help	improve	our
applications.
Now	that	we	are	aware	of	the	real	nature	of	an	image	(that	it	is,	in	essence,	a
matrix),	we	can	start	with	possible	operations	on	matrices	and	matrix-like
entities.	In	the	next	chapter,	we're	going	to	learn	all	about	the	matrix	and	array
operations	that	fall	into	the	realm	of	computer	vision.	By	the	end	of	the	next
chapter,	we'll	be	able	to	perform	tons	of	pixel-wise	and	image-wise	operations
and	transformations	using	OpenCV.
	
Questions
	
	
Further	reading
  Computer	Vision	with	OpenCV	3	and	Qt5:	https://www.packtpub.com/application-
  development/computer-vision-opencv-3-and-qt5
  Learn	Qt5:	https://www.packtpub.com/web-development/learn-qt-5
  Qt5	Projects:	https://www.packtpub.com/application-development/qt-5-projects
Array	and	Matrix	Operations
Now	that	we're	past	the	first	part	of	the	book	and	the	introductions	and
fundamental	concepts	in	computer	vision,	we	can	start	with	the	computer	vision
algorithms	and	functions	provided	by	the	OpenCV	library,	which	more	or	less
cover	any	computer	vision	topic	that	comes	to	mind,	with	optimized
implementations.	As	we	learned	in	the	previous	chapters,	OpenCV	uses	a
modular	structure	to	categorize	the	computer	vision	functionalities	contained	in
it.	We'll	move	forward	with	the	topics	in	this	book	with	a	similar	structure	in
mind,	so	that	the	skills	learned	are	related	to	each	other	in	each	chapter,	not	just
from	a	theoretical	point	of	view,	but	also	from	a	hands-on	perspective.
In	the	previous	chapter,	we	learned	about	the	relationship	between	images	and
matrices,	and	covered	the	most	crucial	functionalities	of	the	Mat	class,	such	as
constructing	it	using	a	given	width,	height,	and	type.	We	also	learned	how	to
read	images	from	disk,	a	network	feed,	video	files,	or	cameras.	During	the
process,	we	learned	how	to	access	pixels	in	an	image	using	various	methods.	We
can	now	start	with	actual	image-	and	pixel-modification	and	manipulation
functionalities.
In	this	chapter,	we're	going	to	learn	about	a	large	number	of	functions	and
algorithms	that	are	used	to	deal	with	images,	either	for	calculating	a	value	that
might	be	useful	in	another	process	or	for	directly	modifying	the	values	of	the
pixels	in	an	image.	Almost	all	of	the	algorithms	presented	in	this	chapter	are
based	on	the	fact	that	images	are	essentially	matrices,	and	also	the	fact	that
matrices	are	implemented	using	arrays	of	data,	hence	the	name	of	this	chapter!
We'll	start	this	chapter	by	covering	the	functionalities	within	the	Mat	class	itself,
which	are	few,	but	quite	important	when	it	comes	to	the	creation	of	initial
matrices	and	so	on.	Then,	we'll	move	on	to	learn	about	a	large	number	of	per-
element	(or	element-wise)	algorithms.	These	algorithms,	as	we'll	learn	by	going
through	many	hands-on	examples,	apply	a	specific	operation	on	each	individual
element	of	a	matrix	and	they	are	not	concerned	with	any	other	element	(or
pixel).	Finally,	we'll	learn	about	matrix	and	array	operations	that	are	not	element-
wise	and	the	result	may	depend	on	the	whole	image	or	groups	of	elements.	This
will	all	clear	up	as	we	move	forward	with	the	algorithms	in	this	chapter.	It's
important	to	note	that	all	of	the	algorithms	and	functions	in	this	chapter	are
contained within the core module of the OpenCV library.
By the end of this chapter, you'll have a better understanding of the following:
     Operations contained in the Mat class
     Element-wise matrix operations
     Matrix and array-wise operations
Technical requirements
Refer to Chapter 2, Getting Started with OpenCV, for more information about how
to set up a personal computer and make it ready for developing computer vision
applications using the OpenCV library.
You	can	use	the	following	URL	to	download	the	source	codes	and	examples	for
this	chapter:	https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/t
ree/master/Chapter03
	
Operations	contained	in	the	Mat	class
In	this	section,	we're	going	to	go	through	the	set	of	mathematical	and	other
operations	that	are	contained	inside	the	Mat	class	itself.	Even	though	there	is	no
general	pattern	of	usage	for	the	functions	in	the	Mat	class,	most	of	them	are
related	to	creating	a	new	matrix,	whether	it	is	using	an	existing	one	or	from
scratch.	So,	let's	start.
        During	the	course	of	the	book,	the	words	image,	matrix,	the	Mat	class,	and	so	on	will	be	used
        interchangeably,	and	they	all	mean	the	same	thing,	unless	explicitly	stated.	Take	this	chance	to
        get	used	to	the	idea	of	thinking	about	images	in	terms	of	matrices,	just	like	a	computer	vision
        expert	would	do.
                                                   	
Cloning	a	matrix
You	can	use	Mat::clone	to	create	a	completely	independent	clone	of	a	Mat	object.
Note	that	this	function	creates	a	full	copy	of	the	image	with	its	own	allocated
space	in	memory.	Here's	how	it's	used:
     Mat	clone	=	image.clone();	
You	can	also	use	the	copyTo	function	to	do	the	same,	as	seen	here:
     Mat	clone;	
     image.copyTo(clone);	
In	both	of	the	preceding	code	examples,	image	is	the	original	matrix	(or	image)
that	is	read	from	an	image,	camera,	or	produced	in	any	possible	way	prior	to
performing	the	cloning	operation.	From	now	on,	and	in	all	of	the	examples	in
this	and	upcoming	chapters,	image	is	simply	a	Mat	object	that	is	the	source	of	our
operations,	unless	otherwise	stated.
Calculating	the	cross-product
	
You can use Mat::cross to calculate the cross-product of two Mat objects that have
three floating-point elements, as seen in the following example:
      Mat A(1, 1, CV_32FC3), B(1, 1, CV_32FC3); 
      // fill A and B with values, then calculate the cross-product 
      Mat AxB = A.cross(B); 
Obviously, AxB is not the same as BxA in the cross-product of two vectors.
	
Extracting	a	diagonal
Mat::diag can be used to extract a diagonal from a Mat object, as seen in the
following example:
  int	D	=	0;	//	or	+1,	+2,	-1,	-2	and	so	on	
  Mat	dg	=	image.diag(D);	
This	function	accepts	an	index	parameter	that	can	be	used	to	extract	additional
diagonals	beside	the	main	diagonal,	as	depicted	in	the	following	diagram:
If	D=0,	then	the	extracted	diagonal	will	contain	1,	6,	11,	and	16,	which	is	the
main	diagonal.	But	depending	on	the	value	of	D,	the	extracted	diagonal	will	be
above	or	below	the	main	diagonal,	as	seen	in	the	preceding	diagram.
Calculating	the	dot	product
To	calculate	the	dot	product,	or	scalar	product,	or	inner	product	of	two	matrices,
you	can	use	the	Mat::dot	function,	as	seen	here:
     double	result	=	A.dot(B);	
The Mat::eye function can be used to create an identity matrix of a given size and
type. In case you need a value other than the ones in the identity matrix diagonal,
you can multiply the result by a scale value:
      double scale = 0.25; 
      Mat id = Mat::eye(10, 10, CV_32F) * scale; 
Similarly, the Mat::inv function can be used to calculate the inverse of a square
matrix. Note that you can supply a matrix decomposition type to the inv function,
which can be an entry in the cv::DecompTypes enum.
Element-wise	matrix	multiplication
	
This can be done using the Mat::mul function. Needless to say, the same approach
can also be used for element-wise division. Here's an example:
      Mat	result	=	A.mul(B);	
You can also supply an additional scale parameter that will be used to scale the
result. Here's another example:
      double scale = 0.75; 
      Mat result = A.mul(B, scale); 
	
The	ones	and	zeroes	matrix
	
Mat::ones and Mat::zeros can be used to create a matrix of a given size with all of
its elements set to one or zero, respectively. These matrices are usually used to
create initializer matrices. Here are some examples:
      Mat m1 = Mat::zeros(240, 320, CV_8UC1); 
      Mat m2 = Mat::ones(240, 320, CV_8UC1); 
In	case	you	need	to	create	a	matrix	that	is	filled	with	a	value	other	than	ones,	you
can	use	something	like	the	following:
      Mat	white	=	Mat::ones(240,	320,	CV_8UC1)	*	255;	
	
Transposing	a	matrix
You can use Mat::t to transpose a matrix. Here's an example:
      Mat transpose = image.t(); 
The	image	on	the	left	is	the	original	image	and	the	image	on	the	right	is	the
transpose	of	the	original	image.	As	you	can	see,	the	result	is	a	rotated
(clockwise)	and	flipped	(vertically)	version	of	the	original	image.	As	you'll	learn
later	in	this	chapter,	transpose,	or	the	Mat::t	function,	can	be	used	in	conjunction
with	the	flip	function	to	rotate	or	flip/mirror	an	image	in	all	possible	directions.
        The transpose of a transposed matrix is the same as the original matrix, so running the following
        code will result in the original image itself:
        Mat org = image.t().t();
The Mat::reshape function can be used to change the shape of a matrix by providing
a new number of channels and a new number of rows, without copying the underlying
data. Note that passing a value of zero as the number of channels means that the
number of channels will stay the same as the source. Similarly, passing a value
of zero as the number of rows means the number of rows in the image will stay
the same.
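Here's a minimal sketch of Mat::reshape, assuming image is a three-channel color
image; it reinterprets the image as a single-channel matrix with one row:
      Mat flat = image.reshape(1, 1); 
      // flat now has 1 channel, 1 row, and rows * cols * 3 columns 
      cout << flat.channels() << " " << flat.rows << " " << flat.cols << endl; 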
Note	that	Mat::resize	is	another	useful	function	for	reshaping	a	matrix,	but	it	only
allows	changing	the	number	of	rows	in	an	image.	Another	function	that	can
come	in	handy	when	reshaping	matrices	or	dealing	with	the	number	of	elements
in	a	matrix,	is	the	Mat::total	function,	which	returns	the	total	number	of	elements
in	an	image.
That's	about	it	for	the	functionalities	embedded	into	the	Mat	class	itself.	Make
sure	to	go	through	the	Mat	class	documentation	and	familiarize	yourself	with
possible	variants	of	the	methods	you	just	learned	in	this	section.
Element-wise	matrix	operations
Per-element	or	element-wise	matrix	operations	are	mathematical	functions	and
algorithms	in	computer	vision	that	work	on	individual	elements	of	a	matrix	or,	in
other	words,	pixels	of	an	image.	It's	important	to	note	that	element-wise
operations	can	be	parallelized,	which	fundamentally	means	that	the	order	in
which the elements of a matrix are processed is not important. This property
is the most important feature that separates the functions and algorithms in this
section from those in the upcoming sections of this chapter.
                                        	
Basic	operations
OpenCV	provides	all	the	necessary	functions	and	overloaded	operators	that	you
need	to	perform	all	four	basic	operations	of	addition,	subtraction,	multiplication,
and	division	between	two	matrices	or	a	matrix	and	a	scalar.
The	addition	operation
The	add	function	and	the	+	operator	can	be	used	to	add	the	elements	of	two
matrices, or a matrix and a scalar, as seen in the following examples:
      Mat image = imread("Test.png"); 
      Mat overlay = imread("Overlay.png"); 
      Mat result; 
      add(image, overlay, result); 
You	can	replace	the	last	line	in	the	preceding	code	with	the	following:
     result	=	image	+	overlay;	
In case you want to add a single scalar value to all elements of a Mat object, you
can simply use something similar to the following:
      result = image + 80; 
If	the	preceding	code	is	executed	on	a	grayscale	image,	the	result	will	be	brighter
than	the	source	image.	Note	that	if	the	image	has	three	channels,	you	must	use	a
three-item vector instead of a single value. For instance, to be able to make an
RGB image brighter, you can use the following:
      result = image + Vec3b(80, 80, 80); 
Here's	an	image	depicting	the	brighter	result	image,	when	the	preceding	code	is
executed	on	it:
In	the	preceding	example	codes,	simply	increase	the	added	value	to	get	an	even
brighter	image.
Weighted	addition
Besides	the	simple	addition	of	two	images,	you	can	also	use	the	weighted
addition	function	to	consider	a	weight	for	each	of	the	two	images	that	are	being
added.	Think	of	it	as	setting	an	opacity	level	for	each	of	the	participants	in	an	add
operation. To perform a weighted addition, you can use the addWeighted function:
      double alpha = 1.0;  // First image weight 
      double beta = 0.30;  // Second image weight 
      double gamma = 0.0;  // Added to the sum 
      addWeighted(image, alpha, overlay, beta, gamma, result); 
If	executed	on	the	sample	pictures	from	the	previous	section	with	the	add
example,	the	result	would	be	similar	to	the	following:
Notice	the	transparent	text	that	is	similar	to	the	watermark	usually	applied	by
photo-editing	applications.	Notice	the	comments	in	the	code	regarding	the	alpha,
beta,	and	gamma	values?	Obviously,	providing	a	beta	value	of	1.0	would	have	made
this	example	exactly	the	same	as	a	regular	add	function	with	no	transparency	for
the	overlay	text.
The	subtraction	operation
Similar	to	adding	two	Mat	objects	to	each	other,	you	can	also	subtract	all	elements
of	one	image	from	another	using	the	subtract	function	or	the	-	operator.	Here's	an
example:
      Mat	image	=	imread("Test.png");	
      Mat	overlay	=	imread("Overlay.png");	
      Mat	result;	
      subtract(image,	overlay,	result);	
The	last	line	in	the	preceding	code	can	also	be	replaced	with	this:
      result	=	image	-	overlay;	
Here's	the	result	of	the	subtraction	operation	if	we	used	the	same	two	images
from	the	previous	examples:
Notice	how	the	subtraction	of	higher	pixel	values	(brighter	pixels)	from	the
source	image	results	in	the	dark	color	of	the	overlay	text.	Also	note	that	the
subtraction	operation	depends	on	the	order	of	its	operands,	unlike	addition.	Try
swapping	the	operands	and	see	what	happens	for	yourself.
Just like addition, it's also possible to subtract a constant number from all of the
pixels of an image. You can guess that subtraction of a constant value from all
pixels will result in a darker image (depending on the subtracted value), which is
the opposite of the addition operation. Here's an example of making an image
darker with a simple subtraction operation:
     result	=	image	-	80;
In	case	the	source	image	is	a	three-channel	RGB	image,	you	need	to	use	a	vector
as	the	second	operand:
     result	=	image	-	Vec3b(80,	80,	80);	
The	multiplication	and	division
operations
Similar	to	addition	and	subtraction,	you	can	also	multiply	all	elements	of	a	Mat
object	with	all	elements	of	another	Mat	object.	The	same	can	be	done	with	the
division	operation.	Again,	both	operations	can	be	performed	with	a	matrix	and	a
scalar.	Multiplication	can	be	done	using	OpenCV's	multiply	function	(similar	to
the	Mat::mul	function),	while	division	can	be	performed	using	the	divide	function.
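Here is a minimal sketch of both functions; the input matrices image1 and image2
and the scale value are assumptions used only for illustration:
      double scale = 1.0; 
      multiply(image1, image2, result, scale); 
      divide(image1, image2, result, scale); 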
scale	in	the	preceding	code	is	an	additional	parameter	that	can	be	supplied	to	the
multiply	and	divide	functions	to	scale	all	of	the	elements	in	the	result	Mat	object.
You can also perform multiplication or division with a scalar, as seen in the
following examples:
      resultBrighter = image * 5; 
      resultDarker = image / 5; 
Obviously,	the	preceding	code	will	produce	two	images,	one	that	is	five	times
brighter	and	one	that	is	five	times	darker	than	the	original	image.	The	important
thing	to	note	here	is	that,	unlike	addition	and	subtraction,	the	resulting	image
will	not	be	homogeneously	brighter	or	darker,	and	you'll	notice	that	brighter
areas	become	much	brighter	and	vice	versa.	The	reason	for	this	is	obviously	the
effect	of	multiplication	and	division	operations,	in	which	the	value	of	brighter
pixels	grows	or	drops	much	faster	than	smaller	values	after	the	operation.	It's
interesting	to	note	that	this	same	technique	is	used	in	most	photo-editing
applications	to	brighten	or	darken	the	bright	areas	of	an	image.
Bitwise	logical	operations
Just	like	basic	operations,	you	can	also	perform	bitwise	logical	operations	on	all
of	the	elements	of	two	matrices	or	a	matrix	and	a	scalar.	For	this	reason,	you	can
use	the	following	functions:
     bitwise_not
     bitwise_and
     bitwise_or
     bitwise_xor
It's immediately recognizable from their names that these functions can perform
Not, And, Or, and Exclusive Or operations, but let's see how they're used in detail with
some examples.
First things first, the bitwise_not function is used to invert all the bits of all pixels
in	an	image.	This	function	has	the	same	effect	as	the	inversion	operation	that	can
be	found	in	most	photo	editing	applications.	Here's	how	it's	used:
      bitwise_not(image,	result);	
The	preceding	code	can	be	replaced	with	the	following	too,	which	uses	an
overloaded	bitwise	not	operator	(~)	in	C++:
      result	=	~image;	
If	the	image	is	a	monochrome	black	and	white	image,	the	result	will	contain	an
image	with	all	white	pixels	replaced	with	black	and	vice	versa.	In	case	the	image
is	an	RGB	color	image,	the	result	will	be	inverted	(in	the	sense	of	its	binary	pixel
values),	which	is	depicted	in	the	following	example	image:
The	bitwise_and	function,	or	the	&	operator,	is	used	to	perform	a	bitwise	And
operation	on	pixels	from	two	images	or	pixels	from	an	image	and	a	scalar.	Here
is	an	example:
     bitwise_and(image,	mask,	result);	
You	can	simply	use	the	&	operator	and	write	the	following	instead:
     result	=	image	&	mask;	
The	bitwise_and	function	can	be	easily	used	to	mask	and	extract	certain	areas	in
images.	For	instance,	the	following	image	is	a	demonstration	of	how	bitwise_and
results	in	an	image	that	passes	the	white	pixels	and	removes	the	black	pixels:
Besides	masking	certain	areas	of	an	image,	the	bitwise	And	operation	can	be	used
to	filter	out	a	channel	altogether.	To	be	able	to	do	this,	you	need	to	use	the
second	form	of	the	&	operator,	which	takes	a	matrix	and	a	scalar	and	performs	the
And	operation	between	all	pixels	and	that	value.	Here	is	an	example	code	that	can
be	used	to	mask	(zero	out)	the	green	color	channel	in	an	RGB	color	image:
     result	=	image	&	Vec3b(0xFF,	0x00,	0xFF);	
Now,	let's	move	on	to	the	next	bitwise	operation,	the	Or	operation.	The	bitwise_or
and	|	operators	can	both	be	used	to	perform	a	bitwise	Or	operation	on	two
images,	or	an	image	and	a	scalar.	Here	is	an	example:
      bitwise_or(image,	mask,	result);
Similar	to	the	bitwise	And	operation,	you	can	use	the	|	operator	in	the	Or	operation
and	simply	write	the	following	instead	of	the	preceding	code:
      result	=	image	|	mask;	
If	the	And	operation	was	used	to	pass	through	the	non-zero	pixels	(or	non-black
pixels),	then	it	can	be	said	that	the	Or	operation	is	used	to	pass	through	the	pixel
with	the	higher	value	(or	brighter)	in	any	of	its	input	images.	Here's	the	result	of
performing	the	bitwise	Or	operation	on	the	previous	example	images:
Similar	to	bitwise	And	operation,	you	can	also	use	bitwise	Or	operation	to	update
an	individual	channel	or	all	the	pixels	of	an	image.	Here	is	an	example	code	that
shows	how	you	can	update	only	the	green	channel	in	an	RGB	image	to	have	the
maximum	possible	value	(which	is	255,	or	hexadecimal	FF)	in	all	of	its	pixels
and	leave	the	other	channels	as	they	are:
      result	=	image	|	Vec3b(0x00,	0xFF,	0x00);	
Finally, the bitwise_xor function, or the ^ operator, can be used to perform an
Exclusive Or operation on two images, or an image and a scalar. If the Exclusive Or
operation is performed between our example image and the same mask used before,
the pixels in the masked (white) area of the result will be inverted. Think about the
reason behind this by writing down the pixel values on a piece of paper and trying to
calculate the result by yourself. Exclusive Or, and all bitwise operations, can be
used for many other computer vision tasks if their behavior is clearly understood.
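Here's a minimal sketch of the Exclusive Or operation in both of its forms, using the
same image and mask as in the previous examples:
      bitwise_xor(image, mask, result); 
      // or, equivalently, using the overloaded ^ operator 
      result = image ^ mask; 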
The	comparison	operations
Comparing	two	images	with	each	other	(or	given	values)	can	be	quite	useful,
especially	for	producing	masks	that	can	be	used	in	various	other	algorithms,
whether	it	is	for	tracking	some	object	of	interest	in	an	image	or	performing	an
operation	on	an	isolated	(masked)	region	of	an	image.	OpenCV	provides	a
handful	of	functions	to	perform	element-wise	comparisons.	For	instance,	the
compare	function	can	be	used	to	compare	two	images	with	each	other.	Here's	how:
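A minimal call might look like the following; the equality comparison type used here
is just one possible choice:
      compare(image1, image2, result, CMP_EQ); 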
The	first	two	parameters	are	the	first	and	second	image	that	will	participate	in
the	comparison.	The	result	will	be	saved	into	a	third	Mat	object,	and	the	last
parameter,	which	must	be	an	entry	from	the	CmpTypes	enum,	is	used	to	select	the
type	of	comparison,	which	can	be	any	of	the	following:
     CMP_EQ:	Means	that	the	first	image	equals	the	second	image
     CMP_GT:	Means	that	the	first	image	is	greater	than	the	second	image
     CMP_GE:	Means	that	the	first	image	is	greater	than	or	equal	to	the	second
     image
     CMP_LT:	Means	that	the	first	image	is	less	than	the	second	image
     CMP_LE:	Means	that	the	first	image	is	less	than	or	equal	to	the	second	image
     CMP_NE:	Means	that	the	first	image	is	not	equal	to	the	second	image
        Note	that	we	are	still	talking	about	element-wise	operations,	so	when	we	say	first	image	is	less
        than	or	equal	to	the	second	image,	what	we	actually	mean	is	each	individual	pixel	in	the	first
        image	contains	a	value	less	than	or	equal	to	its	exact	corresponding	pixel	in	the	second
        image,	and	so	on.
Note	that	you	can	also	use	the	overloaded	C++	operators	to	achieve	the	same
goal	as	the	compare	function.	Here	is	how,	for	each	individual	comparison	type:
     result	=	image1	==	image2;	//	CMP_EQ	
     result	=	image1	>	image2;	//	CMP_GT	
     result	=	image1	>=	image2;	//	CMP_GE	
     result	=	image1	<	image2;	//	CMP_LT	
     result	=	image1	<=	image2;	//	CMP_LE	
     result	=	image1	!=	image2;	//	CMP_NE	
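Another useful comparison function is inRange, which checks whether each pixel lies
between a lower and an upper boundary. Here's a minimal sketch, assuming a
grayscale source image and the boundaries described next:
      Mat lb = Mat::zeros(image.rows, image.cols, image.type()); 
      Mat hb = Mat::ones(image.rows, image.cols, image.type()) * 50; 
      Mat result; 
      inRange(image, lb, hb, result); 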
Note that lb and hb are both Mat objects of the same size and type as the source image,
except lb is filled with zeroes and hb is filled with the value of 50. This way, when
inRange	is	called,	it	checks	each	individual	pixel	in	the	source	image	with	their
corresponding	pixels	in	lb	and	hb	and	sets	the	corresponding	pixel	in	the	result	to
white	if	the	value	fits	between	the	provided	boundaries.
The	following	image	depicts	the	result	of	the	inRange	function	when	it	is	executed
on	our	example	image:
The	min	and	max	functions,	as	it	can	be	easily	guessed	from	their	names,	are	two
other	comparison	functions	that	can	be	used	to	compare	two	images	(element-
wise)	and	find	out	the	minimum	or	maximum	pixel	values.	Here's	an	example:
      min(image1,	image2,	result);	
Simply	put,	these	two	functions	compare	the	pixels	of	two	images	of	the	same
size	and	type	and	set	the	corresponding	pixel	in	the	result	matrix	to	the	minimum
or	maximum	pixel	value	from	the	input	images.
The	mathematical	operations
	
Besides	the	functions	we've	learned	about	so	far,	OpenCV	also	provides	a
number	of	functions	to	deal	with	element-wise	mathematical	operations.	In	this
section,	we're	going	to	go	through	them	briefly,	but	you	can,	and	you	should
experiment	with	them	by	yourself	to	make	sure	that	you're	familiar	with	them,
and	you	can	comfortably	use	them	in	your	projects.
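    The absdiff function can be used to calculate the element-wise absolute
    difference of two matrices; here's a minimal sketch:
          absdiff(image1, image2, result); 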
       In	the	preceding	code,	image1,	image2,	and	result	are	all	Mat	objects,	and	each
       element	in	the	result	represents	the	absolute	difference	between
       corresponding	pixels	in	image1	and	image2.
    The	exp	function	can	be	used	to	calculate	the	exponential	of	all	elements	in	a
    matrix:
          exp(mat,	result);	
    The	log	function	can	be	used	to	calculate	the	natural	logarithm	of	every
    element	in	a	matrix:
          log(mat,	result);	
    The	sqrt	function	is	used	to	calculate	the	square	root	of	all	elements	in	a
    matrix,	and	it's	used	as	follows:
         sqrt(mat,	result);	
       Functions	such	as	log	and	pow	should	not	be	confused	with	the	functions	of	the	same	name	in
       standard	C++	libraries.	For	better	readability	of	your	code,	consider	using	the	cv	namespace
       before	the	function	name	in	your	C++	code.	For	instance,	you	can	call	the	pow	function	as
       follows:
       cv::pow(image1,	3.0,	result);
	
Matrix	and	array-wise	operations
Unlike	the	functions	and	algorithms	that	we've	seen	so	far	in	this	chapter,	the
algorithms	in	this	section	perform	an	atomic	and	complete	operation	on	the
image	(or	matrix)	itself	and	are	not	considered	element-wise	operations	in	the
sense	that	has	been	described	so	far.	If	you	recall,	the	rule	of	thumb	for	element-
wise	operations	was	that	they	can	be	easily	parallelized	since	the	resulting	matrix
is dependent only on corresponding pixels of the input images. The functions and
algorithms we'll learn about in this section, on the other hand, are not easily
parallelizable: the resulting pixels and values might have little or nothing at all to do
with their corresponding source pixels, or, quite the opposite, a resulting pixel might
depend on some or all of the input pixels at the same time.
                                         	
Making	borders	for	extrapolation
As	you'll	see	in	this	and	upcoming	chapters,	one	of	the	most	important	issues	to
handle	when	dealing	with	many	computer	vision	algorithms	is	extrapolation,	or
to	put	it	simply,	the	assumption	of	the	non-existing	pixels	outside	of	an	image.
You	might	be	wondering,	why	would	I	need	to	think	about	the	non-existing
pixels,	and	the	simplest	answer	is	that	there	are	many	computer	vision
algorithms	that	work	with	not	just	a	single	pixel,	but	also	with	their	surrounding
pixels.	In	such	cases,	when	the	pixel	is	in	the	middle	of	an	image,	there	are	no
issues.	But	for	the	pixels	at	the	borders	of	the	image	(for	instance,	in	the	topmost
row),	some	of	the	surrounding	pixels	will	fall	outside	of	the	image.	That	is
exactly	where	you	need	to	think	about	extrapolation	and	the	assumption	of	non-
existing	pixels.	Are	you	going	to	simply	assume	those	pixels	have	zero	values?
Maybe	it's	better	to	assume	they	have	the	same	value	as	the	border	pixels?	These
questions are all taken care of by an OpenCV function called copyMakeBorder, which
provides enough customization to deal with all possible scenarios. Let's see how
copyMakeBorder is used with a couple of simple examples:
      int	top	=	50;	
      int	bottom	=	50;	
      int	left	=	50;	
      int	right	=	50;	
      BorderTypes	border	=	BORDER_REPLICATE;	
      copyMakeBorder(image,	
      															result,	
      															top,	
      															bottom,	
      															left,	
      															right,	
      															border);	
The border type parameter can be any of the following entries from the BorderTypes
enum:
     BORDER_CONSTANT
     BORDER_REPLICATE
     BORDER_REFLECT
     BORDER_WRAP
Note	that	when	BORDER_CONSTANT	is	used	as	the	border	type	parameter,	an	additional
scalar	parameter	must	be	provided	to	the	copyMakeBorder	function,	which	represents
the	constant	color	value	of	the	created	border.	If	this	value	is	omitted,	zero	(or
black)	is	assumed.	The	following	image	demonstrates	the	output	of	the
copyMakeBorder	function	when	it	is	executed	on	our	example	image:
OpenCV uses the borderInterpolate function, internally, to calculate the position of
the donor pixel used for the extrapolation and the creation of the non-existing pixels.
You won't need to call this function directly yourself, so we'll leave it for you to
explore and discover on your own.
Flipping	(mirroring)	and	rotating
images
You	can	use	the	flip	function	to	flip	or	mirror	an	image.	This	function	can	be
used	to	flip	an	image	around	the	x	or	y	axes,	or	both,	depending	on	the	provided
flip code. Here's how this function is used:
      int code = +1; 
      flip(image, result, code); 
If	code	is	zero,	the	input	image	will	be	flipped/mirrored	vertically	(around	the	x
axis),	if	code	is	a	positive	value,	the	input	image	will	be	flipped/mirrored
horizontally	(around	the	y	axis),	and	if	code	is	a	negative	value,	the	input	image
will	be	flipped/mirrored	around	both	the	x	and	y	axes	at	the	same	time.
On	the	other	hand,	to	rotate	an	image,	you	can	use	the	rotate	function.	What	you
need	to	take	care	of,	when	calling	the	rotate	function,	is	providing	a	correct
rotation flag, as seen in the following example:
      RotateFlags rt = ROTATE_90_CLOCKWISE; 
      rotate(image, result, rt); 
The RotateFlags enum can be one of the following self-explanatory constant values:
     ROTATE_90_CLOCKWISE
     ROTATE_180
     ROTATE_90_COUNTERCLOCKWISE
Here's	an	image	that	depicts	all	possible	results	of	the	flip	and	rotate	functions.
Notice	that	the	result	of	flip	around	both	axes	is	the	same	as	a	180-degree
rotation	in	the	following	resulting	images:	
As	mentioned	previously	in	this	chapter,	Mat::t,	or	the	transpose	of	a	matrix,	can	also	be	used
to	rotate	an	image	when	used	in	conjunction	with	the	flip	function.
Working	with	channels
OpenCV	provides	a	handful	of	functions	to	deal	with	channels,	whether	we	need
to	merge,	split,	or	perform	various	actions	on	them.	In	this	section,	we're	going
to	learn	how	to	split	an	image	to	its	consisting	channels	or	make	a	multi-channel
image	using	multiple	single-channel	images.	So,	let's	start.
You	can	use	the	merge	function	to	merge	a	number	of	single-channel	Mat	objects
and	create	a	new	multi-channel	Mat	object,	as	seen	in	the	following	sample	code:
     Mat	channels[3]	=	{ch1,	ch2,	ch3};	
     merge(channels,	3,	result);	
In	the	preceding	code,	ch1,	ch2,	and	ch3	are	all	single-channel	images	of	the	same
size.	The	result	will	be	a	three-channel	Mat	object.
You	can	also	use	the	insertChannel	function	to	insert	a	new	channel	into	an	image.
Here's	how:
     int	idx	=	2;	
     insertChannel(ch,	image,	idx);	
In	the	preceding	code,	ch	is	a	single	channel	Mat	object,	image	is	the	matrix	we
want	to	add	an	additional	channel	into	it,	and	idx	refers	to	the	zero-based	index
number	of	the	position	where	the	channel	will	be	inserted.
The	split	function	can	be	used	to	perform	exactly	the	opposite	of	the	merge
function,	which	is	splitting	a	multi-channel	image	to	create	a	number	of	single-
channel	images.	Here's	an	example:
     Mat	channels[3];	
     split(image,	channels);	
The	channels	array	in	the	preceding	code	will	contain	three	single-channel	images
of	the	same	size	that	correspond	to	each	individual	channel	in	the	image.
To	be	able	to	extract	a	single	channel	from	an	image,	you	can	use	the
extractChannel	function:
     int	idx	=	2;	
     Mat	ch;	
     extractChannel(image,	ch,	idx);	
In	the	preceding	code,	quite	obviously,	the	channel	at	position	idx	will	be
extracted	from	the	image	and	saved	in	ch.
Even	though	merge,	split,	insertChannel,	and	extractChannel	are	enough	for	most	use
cases,	you	still	might	need	a	more	complex	shuffling,	extraction,	or	manipulation
of	the	channels	in	an	image.	For	this	reason,	OpenCV	provides	a	function	called
mixChannels,	which	allows	a	much	more	advanced	way	of	handling	channels.	Let's
see	how	mixChannels	is	used	with	an	example	case.	Let's	say	we	want	to	shift	all	of
the	channels	of	an	image	to	the	right.	To	be	able	to	perform	such	a	task,	we	can
use	the	following	sample	code:
     Mat	image	=	imread("Test.png");	
     Mat	result(image.rows,	image.cols,	image.type());	
     	
     vector<int>	fromTo	=	{0,1,	
     																						1,2,	
     																						2,0};	
     mixChannels(image,	result,	fromTo);	
The	only	important	piece	of	code	in	the	preceding	example	is	the	fromTo	vector,
which	must	contain	pairs	of	values	that	correspond	to	the	channel	number	in	the
source	and	destination.	The	channels	are,	as	always,	0-index	based,	so	0,	1
means	the	first	channel	in	the	source	image	will	be	copied	to	the	second	channel
in	the	result	and	so	on.
        It's	worth	noting	that	all	previous	functions	in	this	section	(merge,	split,	and	so	on),	are	a
        partial	case	of	the	mixChannels	function.
Mathematical	functions
The	functions	and	algorithms	that	we'll	learn	about	in	this	section	are	solely	used
for	mathematical	calculations	of	a	non-element-wise	nature,	as	opposed	to	what
we	saw	earlier	on	in	this	chapter.	They	include	simple	functions,	such	as	mean	or
sum	or	more	complex	operations,	such	as	the	discrete	Fourier	transform.	Let's
browse through some of the most important ones with hands-on examples.
                                         	
Matrix	inversion
	
The	invert	function	can	be	used	to	calculate	the	inverse	of	a	matrix.	This	should
not	to	be	confused	with	the	bitwise_not	function,	which	inverts	each	and	every	bit
in	the	pixels	of	an	image.	The	invert	function	is	not	an	element-wise	function,
and	the	inversion	method	needs	to	be	provided	as	a	parameter	to	this	function.
Here's	an	example:
      DecompTypes dt = DECOMP_LU; 
      invert(mat, result, dt);  // mat must be a floating-point matrix 
The	DecompTypes	enum	contains	all	possible	entries	that	can	be	used	in	the	invert
function	as	the	decomposition	type.	Here	they	are:
     DECOMP_LU
     DECOMP_SVD
     DECOMP_EIG
     DECOMP_CHOLESKY
     DECOMP_QR
Refer	to	the	OpenCV	documentation	for	the	DecompTypes	enum	if	you're	interested
in	a	detailed	description	of	each	and	every	decomposition	method.
	
Mean	and	sum	of	elements
You	can	calculate	the	mean,	or	the	average	value	of	the	elements	in	a	matrix,	by
using the mean function. Here is an example that shows how to read an image and
calculate and display the mean value of each of its individual channels:
      Mat image = imread("Test.png"); 
      Scalar m = mean(image); 
      cout << m[0] << endl; 
      cout << m[1] << endl; 
      cout << m[2] << endl; 
You	can	use	the	sum	function	exactly	the	same	way,	to	calculate	the	sum	of
elements	in	a	matrix:
      Scalar	s	=	sum(image);	
OpenCV	also	includes	a	meanStdDev	function	that	can	be	used	to	calculate	the	mean
and	standard	deviation	of	all	elements	in	the	matrix	at	the	same	time.	Here	is	an
example:
      Scalar	m;	
      Scalar	stdDev;	
      meanStdDev(image,	m,	stdDev);	
Similar	to	the	mean	function,	the	meanStdDev	function	also	calculates	the	results	for
each	individual	channel	separately.
Discrete	Fourier	transformation
A	discrete	Fourier	transformation	of	a	1D	or	2D	array,	or	in	other	words—an
image,	is	one	of	the	many	ways	to	analyze	an	image	in	computer	vision.	The
interpretation	of	the	result	depends	completely	on	the	field	it	is	being	used	for
and	that	is	not	what	we	are	concerned	with	in	this	book,	however,	how	to
perform	a	discrete	Fourier	transformation	is	what	we're	going	to	learn	in	this
section.
Simply	put,	you	can	use	the	dft	function	to	calculate	the	Fourier	transformation
of	an	image.	However,	there	are	some	preparations	needed	before	the	dft
function	can	be	safely	called.	The	same	applies	to	the	result	of	the	Fourier
transformation.	Let's	break	this	down	with	an	example	code,	and	by	calculating
and	displaying	the	Fourier	transformation	of	our	example	image	used	in	the
previous	sections.
The	dft	function	can	process	matrices	of	certain	sizes	(powers	of	2,	such	as	2,	4,
and	8)	a	lot	more	efficiently,	that	is	why	it	is	always	best	to	increase	the	size	of
our	matrix	to	the	nearest	optimum	size	and	pad	it	with	zeroes	before	calling	the
dft	function.	This	can	be	done	by	using	the	getOptimalDFTSize	function.	Assuming
that	image	is	the	input	image	that	we	want	to	calculate	its	discrete	Fourier
transformation,	we	can	write	the	following	code	to	calculate	and	resize	it	to	the
optimum	size	for	the	dft	function:
      int	optRows	=	getOptimalDFTSize(	image.rows	);	
      int	optCols	=	getOptimalDFTSize(	image.cols	);	
      	
      Mat	resizedImg;	
      copyMakeBorder(image,	
      															resizedImg,	
      															0,	
      															optRows	-	image.rows,	
      															0,	
      															optCols	-	image.cols,	
      															BORDER_CONSTANT,	
      															Scalar::all(0));	
As	you	can	see,	the	getOptimalDFTSize	function	must	be	called	twice,	for	rows	and
columns	separately.	You	are	already	familiar	with	the	copyMakeBorder	function.
Resizing	an	image	and	padding	the	new	pixels	with	zeroes	(or	any	other	desired
value)	is	one	of	countless	example	use	cases	for	the	copyMakeBorder	function.
The	rest	is	quite	easy,	we	need	to	form	a	two-channel	image	and	pass	it	to	the	dft
function	and	get	the	complex	(real	and	imaginary)	result	in	the	same	matrix.	This
will	later	simplify	the	process	of	displaying	the	result.	Here's	how	it's	done:
      vector<Mat>	channels	=	{Mat_<float>(resizedImg),	
      																			Mat::zeros(resizedImg.size(),	CV_32F)};	
      	
      Mat	complexImage;	
      merge(channels,	complexImage);	
      	
      dft(complexImage,	complexImage);	
We	have	already	learned	how	to	use	the	merge	function.	The	only	important	thing
to	note	in	the	preceding	code	is	the	fact	that	the	result	is	saved	into	the	same
image	as	the	input.	complexImage	now	contains	two	channels,	one	for	the	real	and
the	other	for	the	imaginary	part	of	the	discrete	Fourier	transformation.	That's	it!
We	now	have	our	result,	however,	to	be	able	to	display	it,	we	must	calculate	the
magnitude	of	the	result.	Here's	how	it's	done:
      split(complexImage,	channels);	
      	
      Mat	mag;	
      magnitude(channels[0],	channels[1],	mag);	
In	the	preceding	code,	we	have	split	the	complex	result	into	its	consisting
channels	and	then	calculated	the	magnitude	using	the	magnitude	function.
Theoretically,	mag	is	a	displayable	result,	but	in	reality,	it	contains	values	much
higher	than	what	is	displayable	using	OpenCV,	so	we	need	to	perform	a	couple
of	conversions	before	being	able	to	display	it.	First,	we	need	to	make	sure	the
result	is	in	the	logarithmic	scale,	by	performing	the	following	conversion:
      mag	+=	Scalar::all(1);	
      log(mag,	mag);	
Next,	we	must	make	sure	the	result	values	are	scaled	and	normalized	to	fit
between	0.0	and	1.0,	to	be	displayable	by	the	imshow	function	in	OpenCV.	You
need	to	use	the	normalize	function	for	this	reason:
      normalize(mag, mag, 0.0, 1.0, NORM_MINMAX);
You	can	now	try	to	display	the	result	using	the	imshow	function.	Here's	an	example
that	displays	the	result	of	the	discrete	Fourier	transformation:
The	problem	with	this	result	is	that	the	quadrants	need	to	be	swapped	before	the
origin	of	the	result	is	at	the	center	of	the	image.	The	following	image	depicts
how	the	four	quadrants	of	the	result	must	be	swapped	before	the	result	has	an
origin	point	at	the	center:
The	following	is	the	code	needed	to	swap	the	four	quadrants	of	the	Fourier
transform	result.	Notice	how	we	first	find	the	center	of	the	result,	then	create
four	region	of	interest	(ROI)	matrices,	and	then	swap	them:
     int	cx	=	mag.cols/2;	
     int	cy	=	mag.rows/2;	
     	
     Mat	Q1(mag,	Rect(0,	0,	cx,	cy));	
     Mat	Q2(mag,	Rect(cx,	0,	cx,	cy));	
     Mat	Q3(mag,	Rect(0,	cy,	cx,	cy));	
     Mat	Q4(mag,	Rect(cx,	cy,	cx,	cy));	
     	
     Mat	tmp;	
     Q1.copyTo(tmp);	
     Q4.copyTo(Q1);	
     tmp.copyTo(Q4);	
     	
     Q2.copyTo(tmp);	
     Q3.copyTo(Q2);	
     tmp.copyTo(Q3);	
The dft function accepts an additional parameter that can be used to further
customize its behavior. This parameter can be a combination of values from the
DftFlags enum. For instance, to perform an inverse Fourier transformation, you
can use the DFT_INVERSE flag.
Make	sure	to	check	out	the	DftFlags	enum	and	the	dft	function	documentation	for
more	information	about	how	the	discrete	Fourier	transform	is	implemented	in
OpenCV.
Generating	random	numbers
	
    The	randn	function	can	be	used	to	fill	a	matrix	or	an	array	with	random
    numbers	with	a	given	mean	value	and	standard	deviation.	Here's	how	this
    function	is	used:
         randn(rmat,	mean,	stddev);	
    The	randu	function,	similar	to	the	randn	function,	is	used	to	fill	an	array	with
    random	values,	however,	instead	of	mean	and	standard	deviation,	this
    function	uses	a	lower	band	and	higher	band	(both	inclusive)	for	the
    produced	random	values.	Here's	an	example:
         randu(rmat,	lowBand,	highBand);
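Here's a minimal, self-contained sketch of both functions; the matrix size and the
distribution parameters are arbitrary values chosen only for illustration:
      Mat rmat(100, 100, CV_8UC1); 
      randn(rmat, 128, 30);  // normally distributed, mean 128, standard deviation 30 
      randu(rmat, 0, 256);   // uniformly distributed values between 0 and 255 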
	
The	search	and	locate	functions
When	working	on	your	computer	vision	projects,	you'll	be	faced	with	countless
scenarios	and	cases	in	which	you'll	need	to	look	for	certain	pixels,	or	the
maximum	values	(brightest	point),	and	so	on.	The	OpenCV	library	contains	a
number	of	functions	that	can	be	used	for	this	same	purpose,	and	they	are	the
subject	of	this	section.
                                       	
Locating	non-zero	elements
	
Here is an example that depicts how you can use the findNonZero function to find
the first non-black pixel in a grayscale image and print its position:
      Mat image = imread("Test.png", IMREAD_GRAYSCALE); 
      vector<Point> idx; 
      findNonZero(image, idx); 
      if(idx.size() > 0) 
          cout << "First non-zero pixel at " << idx[0] << endl; 
Here is another example that shows one way to find the percentage of black pixels
in a grayscale image, using findNonZero together with the Mat::total function:
      Mat image = imread("Test.png", IMREAD_GRAYSCALE); 
      vector<Point> nonZeros; 
      findNonZero(image, nonZeros); 
      cout << 100.0 * (image.total() - nonZeros.size()) / image.total() 
           << "% of the pixels are black" << endl; 
	
Locating	minimum	and	maximum
elements
	
OpenCV provides the following two functions for locating the global minimum and
maximum elements of a matrix:
     minMaxIdx
     minMaxLoc
The	minMaxLoc	function	searches	the	whole	image	(only	single-channel	images)	for
the	brightest	and	darkest	points	and	returns	the	value	of	the	brightest	and	darkest
pixel,	along	with	their	positions,	whereas	the	minMaxIdx	function	returns	a	pointer
to	the	found	minimum	and	maximum	position,	instead	of	a	position	(the	Point
object with x and y). Here's how the minMaxLoc function is used:
      double minVal, maxVal; 
      Point minLoc, maxLoc; 
      minMaxLoc(image, &minVal, &maxVal, &minLoc, &maxLoc); 
This is an example of using the minMaxIdx function:
      double minVal, maxVal; 
      int minIdx[2], maxIdx[2]; 
      minMaxIdx(image, &minVal, &maxVal, minIdx, maxIdx); 
	
Lookup	table	transformation
In	computer	vision,	replacing	pixels	with	new	values	based	on	their	value	from	a
given	table	that	contains	the	required	replacement	is	called	lookup	table
transformation.	This	might	sound	a	bit	confusing	at	first	but	it's	a	very	powerful
and	simple	method	for	modifying	images	using	lookup	tables.	Let's	see	how	it's
done	with	a	hands-on	example.
Let's	assume	we	have	an	example	image	and	we	need	to	replace	the	bright	pixels
(with	a	value	greater	than	175)	with	absolute	white,	dark	pixels	(with	a	value	less
than	125)	with	absolute	black,	and	leave	the	rest	of	the	pixels	as	they	are.	To
perform	such	a	task,	we	can	simply	use	a	lookup	table	Mat	object	along	with	the
LUT	function	in	OpenCV:
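Here's a minimal sketch of how such a lookup table can be built and applied; the
threshold values match the ones just described:
      Mat lut(1, 256, CV_8UC1); 
      for(int i = 0; i < 256; i++) 
      { 
          if(i < 125) 
              lut.at<uchar>(i) = 0;    // dark pixels become absolute black 
          else if(i > 175) 
              lut.at<uchar>(i) = 255;  // bright pixels become absolute white 
          else 
              lut.at<uchar>(i) = i;    // the rest are left unchanged 
      } 
      LUT(image, lut, result); 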
The	following	image	depicts	the	result	of	this	lookup	table	transformation	when
it	is	executed	on	our	example	image.	As	you	can	see,	the	lookup	table
transformation	can	be	executed	on	both	color	(on	the	right)	and	grayscale	(on	the
left)	images:
Using	the	LUT	function	wisely	can	lead	to	many	creative	solutions	for	computer
vision	problems	where	replacing	pixels	with	a	given	value	is	desired.
Summary
	
What	we	saw	in	this	chapter	was	a	taste	of	what	can	be	done	using	matrix	and
array	operations	in	computer	vision	with	the	OpenCV	library.	We	started	the
chapter	with	nothing	but	a	firm	background	in	the	fundamental	concepts	of
computer	vision	from	the	previous	chapters,	and	we	ended	up	learning	about	lots
of	algorithms	and	functions	using	hands-on	examples.	We	learned	about	ones,
zeroes,	identity	matrix,	transpose,	and	other	functions	embedded	into	the	heart	of
the	Mat	class.	Then,	we	continued	on	to	learn	about	many	element-wise
algorithms.	We	are	now	totally	capable	of	performing	element-wise	matrix
operations,	such	as	basic	operations,	comparisons,	and	bitwise	operations.
Finally,	we	learned	about	the	non-element-wise	operations	in	the	OpenCV	core
module.	Making	borders,	modifying	channels,	and	discrete	Fourier
transformation	were	among	the	many	algorithms	we	learned	to	use	in	this
chapter.
In	the	next	chapter,	we're	going	to	learn	about	computer	vision	algorithms	used
for	filtering,	drawing,	and	other	functions.	Topics	covered	in	the	upcoming
chapter	are	essential	before	you	step	into	the	world	of	advanced	computer	vision
development	and	algorithms	used	for	highly	complex	tasks,	such	as	face	or
object	detection	and	tracking.
	
Questions
	
	
Drawing,	Filtering,	and
Transformation
	
We	started	the	previous	chapter	with	nothing	but	the	basics	and	fundamental
concepts	used	in	computer	vision,	and	we	ended	up	learning	about	many
algorithms	and	functions	used	to	perform	a	wide	range	of	operations	on	matrices
and	images.	First,	we	learned	all	about	the	functions	that	are	embedded	into	the
Mat	class	for	convenience,	such	as	cloning	(or	getting	a	full	and	independent
copy)	a	matrix,	calculating	the	cross	and	the	dot	product	of	two	matrices,	getting
the	transpose	or	inverse	of	a	matrix,	and	producing	an	identity	matrix.	Then	we
moved	on	to	learn	about	various	element-wise	operations	in	OpenCV.	Element-
wise	operations,	as	we	already	know,	are	parallelizable	algorithms	that	perform
the	same	process	on	all	individual	pixels	(or	elements)	of	an	image.	During	the
process,	we	also	experimented	with	the	effect	of	such	operations	on	actual	image
files.	We	completed	the	previous	chapter	by	learning	about	operations	and
functions	that	treat	images	as	a	whole,	unlike	element-wise	operations,	but	they
are	still	considered	matrix	operations	in	terms	of	computer	vision.
Now,	we're	ready	to	dig	even	deeper	and	learn	about	numerous	powerful
algorithms	that	are	used	in	computer	vision	applications	for	tasks	such	as
drawing,	filtering,	and	the	transformation	of	images.	As	mentioned	in	the
previous	chapters,	these	categories	of	algorithms	as	a	whole	are	considered
image-processing	algorithms.	In	this	chapter,	we're	going	to	start	by	learning
about	drawing	shapes	and	text	on	empty	images	(similar	to	canvases)	or	existing
images	and	video	frames.	The	examples	in	the	first	section	of	this	chapter	will
also	include	a	tutorial	about	adding	trackbars	to	OpenCV	windows,	for	easily
adjusting	required	parameters.	After	that,	we'll	move	on	to	learn	about	image
filtering	techniques,	such	as	blurring	an	image,	dilation,	and	erosion.	The
filtering	algorithms	and	functions	that	we'll	learn	about	in	this	chapter	include
many	popular	and	widely	used	algorithms,	especially	by	professional	photo-
editing	applications.	This	chapter	will	include	a	comprehensive	section	about
image-transformation	algorithms,	including	algorithms	such	as	the	simple
resizing	of	photos,	or	complex	algorithms	such	as	remapping	pixels.	We'll	end
this	chapter	by	learning	how	to	apply	colormaps	to	images	to	transform	their
colors.
	
Technical	requirements
	
Refer	to	Chapter	2,	Getting	Started	with	OpenCV,	for	more	information	about	how
to	set	up	a	personal	computer	and	make	it	ready	for	developing	computer	vision
applications	using	the	OpenCV	library.
You	can	use	the	following	URL	to	download	the	source	codes	and	examples	for
this	chapter:
https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/master/Cha
pter04
	
Drawing	on	images
Without	a	doubt,	one	of	the	most	important	tasks	when	developing	computer
vision	applications	is	drawing	on	images.	Imagine	you	want	to	print	the
timestamp	on	pictures	or	draw	a	rectangle	or	ellipse	around	some	areas	in	an
image,	and	many	similar	examples	that	would	require	you	to	draw	text	and	digits
on	images	or	shapes	(rectangles	and	so	on).	As	you	can	see,	the	examples	that
can	be	pointed	out	are	quite	obvious	and	countless,	so	without	further	ado,	let's
start	with	the	functions	and	algorithms	in	OpenCV	that	can	be	used	for	drawing.
                                        	
Printing	text	on	images
OpenCV	contains	a	very	easy-to-use	function	named	putText	to	draw,	or	print,
text	on	images.	This	function	requires	an	image	as	the	input/output	parameter,
which	means	the	source	image	itself	will	be	updated.	So,	be	sure	to	make	a	copy
of	your	original	image	in	memory	before	calling	this	function.	You	also	need	to
provide	this	function	with	an	origin	point,	which	is	simply	the	point	where	the
text	will	be	printed.	The	font	of	the	text	must	be	one	of	the	entries	in	the
HersheyFonts	enum,	which	can	take	one	(or	a	combination)	of	the	following	values:
     FONT_HERSHEY_SIMPLEX
     FONT_HERSHEY_PLAIN
     FONT_HERSHEY_DUPLEX
     FONT_HERSHEY_COMPLEX
     FONT_HERSHEY_TRIPLEX
     FONT_HERSHEY_COMPLEX_SMALL
     FONT_HERSHEY_SCRIPT_SIMPLEX
     FONT_HERSHEY_SCRIPT_COMPLEX
     FONT_ITALIC
For details of how each entry looks when printed, you can check out the OpenCV
documentation or simply search online for more information about Hershey fonts.
Apart	from	the	parameters	we	just	mentioned,	you	also	need	a	few	additional
parameters,	such	as	the	scale,	color,	thickness,	and	line	type	of	the	text.	Let's
break	them	all	down	with	a	simple	example.
Here's	an	example	code	that	demonstrates	the	usage	of	the	putText	function:
      string	text	=	"www.amin-ahmadi.com";	
      int	offset	=	25;	
      Point	origin(offset,	image.rows	-	offset);	
      HersheyFonts	fontFace	=	FONT_HERSHEY_COMPLEX;	
      double	fontScale	=	1.5;	
      Scalar	color(0,	242,	255);	
      int	thickness	=	2;	
      LineTypes	lineType	=	LINE_AA;	
      bool	bottomLeftOrigin	=	false;	
      	
      putText(image,	
      								text,	
     								origin,	
     								fontFace,	
     								fontScale,	
     								color,	
     								thickness,	
     								lineType,	
     								bottomLeftOrigin);	
When	executed	on	our	example	picture	from	the	previous	chapters,	the	following
result	will	be	created:
LINE_4	stands	for	four-connected	line	types,	and	LINE_8	stands	for	eight-
connected	line	types.	LINE_AA	though,	which	is	the	anti-aliased	line	type,	is
slower	to	draw	than	the	other	two,	but,	as	can	be	seen	in	the	preceding	diagram,
it	also	provides	a	much	better	quality.
        The	LineTypes	enum	also	includes	a	FILLED	entry,	which	is	used	for	filling	the	shapes	drawn	on
        an	image	with	the	given	color.	It's	important	to	note	that	almost	all	drawing	functions	in
        OpenCV	(not	just	putText)	require	a	line	type	parameter.
OpenCV	provides	two	more	functions	that	are	related	to	handling	texts,	but	not
exactly	for	drawing	them.	The	first	one	is	called	getFontScaleFromHeight,	which	is
used	to	get	the	required	scale	value	based	on	a	font	type,	height	(in	pixels),	and
thickness.	Here's	an	example:
     double	fontScale	=	getFontScaleFromHeight(fontFace,	
     																											50,	//	pixels	for	height	
     																											thickness);	
We	could	have	used	the	preceding	code	instead	of	providing	a	constant	value	for
scale	in	our	previous	example	usage	of	the	putText	function.	Obviously,	we	need
to	replace	50	with	any	desired	pixel	height	value	for	our	text.
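The second of these functions is getTextSize, which calculates the width and height required to print a given text, along with its baseline offset. A minimal sketch of calling it with the variables from the putText example would be:
      int baseLine = 0;
      Size textSize = getTextSize(text,
                                  fontFace,
                                  fontScale,
                                  thickness,
                                  &baseLine);
      // textSize.width and textSize.height hold the required space in pixels, and
      // baseLine is the distance from the bottom of the text to its baseline.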
For our example text and font settings, this reports that the text will need 216 by 30 pixels of space to be printed and that the baseline will be 17 pixels farther from the bottom of the text.
Drawing	shapes
You	can	use	a	set	of	very	simple	OpenCV	functions	to	draw	various	types	of
shapes	on	images.	These	functions	are	all	included	in	the	imgproc	module,	just	like
the	putText	function,	and	they	can	be	used	to	draw	markers,	lines,	arrowed	lines,
rectangles,	ellipses	and	circles,	polylines,	and	so	on	in	given	points.
Let's	start	with	the	drawMarker	function,	which	is	used	to	draw	a	marker	with	a
given	type	on	an	image.	Here's	how	this	function	is	used	to	print	a	marker	at	the
center	of	a	given	image:
      Point	position(image.cols/2,	
      															image.rows/2);	
      Scalar	color	=	Scalar::all(0);	
      MarkerTypes	markerType	=	MARKER_CROSS;	
      int	markerSize	=	25;	
      int	thickness	=	2;	
      int	lineType	=	LINE_AA;	
      drawMarker(image,	
      											position,	
      											color,	
      											markerType,	
      											markerSize,	
      											thickness,	
      											lineType);	
position is the central point of the marker, and the rest of the parameters are
almost	exactly	what	we	saw	in	the	putText	function	previously.	This	is	the	same
pattern	of	parameters	for	most	(if	not	all)	of	the	drawing	functions	in	OpenCV.
The	only	parameters	that	are	specific	to	the	drawMarker	function	are	markerSize,
which	is	simply	the	size	of	the	marker,	and	markerType,	which	can	take	one	of	the
following	values	from	the	MarkerTypes	enum:
     MARKER_CROSS
     MARKER_TILTED_CROSS
     MARKER_STAR
     MARKER_DIAMOND
     MARKER_SQUARE
     MARKER_TRIANGLE_UP
     MARKER_TRIANGLE_DOWN
The	following	diagram	depicts	all	possible	marker	types	mentioned	in	the
previous	list	when	they	are	printed	on	a	white	background,	from	left	to	right:
Drawing	lines	in	OpenCV	is	possible	using	the	line	function.	This	function
requires	two	points	and	it	will	draw	a	line	connecting	the	given	points.	Here's	an
example:
     Point	pt1(25,	image.rows/2);	
     Point	pt2(image.cols/2	-	25,	image.rows/2);	
     Scalar	color	=	Scalar(0,255,0);	
     int	thickness	=	5;	
     int	lineType	=	LINE_AA;	
     int	shift	=	0;	
     	
     line(image,	
     					pt1,	
     					pt2,	
     					color,	
     					thickness,	
     					lineType,	
     					shift);	
The	shift	parameter	corresponds	to	the	number	of	fractional	bits	in	the	given
points. You can omit it or simply pass zero to make sure it has no effect on your
results.
Similar	to	the	line	function,	arrowedLine	can	be	used	to	draw	an	arrowed	line.
Obviously,	the	order	of	the	given	points	determines	the	direction	of	the	arrow.
The only additional parameter that this function needs is the tipLength parameter, which
corresponds	to	the	percentage	of	the	line	length	that	will	be	used	to	create	the	tip
of	the	arrow.	Here	is	an	example:
     double	tipLength	=	0.2;	
     	
     arrowedLine(image,	
     												pt1,	
     												pt2,	
     												color,	
     												thickness,	
     												lineType,	
     												shift,	
     												tipLength);	
To	be	able	to	draw	a	circle	on	an	image,	we	can	use	the	circle	function.	Here's
how	this	function	is	used	to	draw	a	circle	right	at	the	center	of	an	image:
     Point	center(image.cols/2,	
     													image.rows/2);	
      int	radius	=	200;	
      circle(image,	
      							center,	
      							radius,	
      							color,	
      							thickness,	
      							lineType,	
      							shift);	
Apart	from	center	and	radius,	which	are	obviously	the	center	point	and	radius	of
the	circle,	the	rest	of	the	parameters	are	the	same	as	the	functions	and	examples
we	learned	about	in	this	section.
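Rectangles are drawn with the rectangle function, which, in its basic form, takes two opposite corner points. A minimal sketch reusing pt1 and pt2 from the earlier line example would be:
      rectangle(image,
                pt1,
                pt2,
                color,
                thickness,
                lineType,
                shift);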
Instead	of	two	separate	Point	objects,	this	function	can	also	be	provided	with	a
single	Rect	object.	Here's	how:
      Rect rect(pt1, pt2);
      rectangle(image,
                rect,
                color,
                thickness,
                lineType,
                shift);
Similarly,	an	ellipse	can	be	drawn	by	using	the	ellipse	function.	This	function
requires	the	size	of	the	axes	along	with	the	angle	of	the	ellipse	to	be	provided.
Additionally,	you	can	use	start	and	ending	angles	to	draw	all	or	part	of	an	ellipse,
or, in other words, an arc instead of an ellipse. You can guess that passing 0 and 360 as
the	start	and	ending	angles	will	result	in	a	full	ellipse	being	drawn.	Here's	an
example:
      Size	axes(200,	100);	
      double	angle	=	20.0;	
      double	startAngle	=	0.0;	
      double	endAngle	=	360.0;	
      ellipse(image,	
      								center,	
      								axes,	
      								angle,	
      								startAngle,	
      								endAngle,	
      								color,	
      								thickness,	
      								lineType,	
      								shift);
Another	way	of	calling	the	ellipse	function	is	by	using	a	RotatedRect	object.	In	this
version	of	the	same	function,	you	must	first	create	a	RotatedRect	with	a	given
width	and	height	(or,	in	other	words,	size)	and	an	angle,	and	then	call	the	ellipse
function	as	seen	here:
      Size	size(150,	300);	
      double	angle	=	45.0;	
      RotatedRect rotRect(center,
                          size,
                          angle);
      ellipse(image,	
      								rotRect,	
      								color,	
      								thickness,	
      								lineType);	
Note	that	with	this	method,	you	can't	draw	an	arc,	and	this	is	only	used	for
drawing	full	ellipses.
We're down to the last type of shape that can be drawn using OpenCV drawing
functions,	and	that	is	polyline	shapes.	You	can	draw	polyline	shapes	by	using	the
polylines	function.	You	must	make	sure	to	create	a	vector	of	points	that
corresponds	to	the	vertices	required	for	drawing	a	polyline.	Here's	an	example:
      vector<Point>	pts;	
      pts.push_back(Point(100,	100));	
      pts.push_back(Point(50,	150));	
      pts.push_back(Point(50,	200));	
      pts.push_back(Point(150,	200));	
      pts.push_back(Point(150,	150));	
      bool	isClosed	=	true;	
      polylines(image,	
      										pts,	
      										isClosed,	
      										color,	
      										thickness,	
      										lineType,	
      										shift);	
The isClosed parameter determines whether or not the polyline should be closed by connecting the last vertex to the first one.
The following image depicts the result of the arrowedLine, circle, rectangle, and polylines functions that we used in the preceding code snippets, when they are executed on our example image.
Before proceeding with the next section and learning about algorithms used for
image	filtering,	we're	going	to	learn	about	adjusting	parameters	at	runtime	by
using	trackbars	added	to	OpenCV	display	windows.	This	is	a	very	useful	method
in	order	to	try	out	a	wide	range	of	different	parameters	at	runtime	when	you	are
experimenting	with	different	values	to	see	their	effect,	and	it	allows	the	value	of
a	variable	to	be	changed	by	simply	readjusting	a	trackbar	(or	slider)	position.
Let's	first	see	an	example	and	then	further	break	down	the	code	and	learn	how
trackbars	are	handled	using	OpenCV	functions.	The	following	complete	example
shows	how	we	can	use	a	trackbar	to	adjust	the	radius	of	a	circle	drawn	on	an
image	at	runtime:
  string	window	=	"Image";	//	Title	of	the	image	output	window
  string	trackbar	=	"Radius";	//	Label	of	the	trackbar
  Mat	image	=	imread("Test.png");	
  Point center(image.cols/2, image.rows/2); // A Point object that points to the center of the image
  int	radius	=	25;
  Scalar	color	=	Scalar(0,	255,	0);	//	Green	color	in	BGR	(OpenCV	default)	color	space
  int	thickness	=	2;	LineTypes	lineType	=	LINE_AA;	int	shift	=	0;	
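The drawCircle callback and the main function described in the following paragraphs can be sketched roughly as follows, assuming the variables above are declared at global scope (the initial draw call and the trackbar maximum used here are assumptions):
  void drawCircle(int, void*)
  {
      // Clone the original image so that the source is never modified
      Mat temp = image.clone();
      circle(temp, center, radius, color, thickness, lineType, shift);
      imshow(window, temp);
  }

  int main()
  {
      namedWindow(window);
      int maxRadius = std::min(image.cols, image.rows) / 2;
      // Bind the trackbar to the radius variable; drawCircle is called on every change
      createTrackbar(trackbar, window, &radius, maxRadius, drawCircle);
      setTrackbarMin(trackbar, window, 25);
      setTrackbarMax(trackbar, window, maxRadius);
      drawCircle(0, 0); // draw once with the initial radius value
      waitKey();
      return 0;
  }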
In	the	preceding	code,	window	and	trackbar	are	string	objects	that	are	used	to
identify	and	access	a	specific	trackbar	on	a	specific	window.	image	is	a	Mat	object
containing	the	source	image.	center,	radius,	color,	thickness,	lineType,	and	shift	are
parameters	required	for	drawing	a	circle,	as	we	learned	previously	in	this
chapter.	drawCircle	is	the	name	of	the	function	(the	callback	function,	to	be
precise)	that	will	be	called	back	when	the	trackbar	is	used	to	update	the	radius
value	for	the	circle	we	want	to	draw.	This	function	must	have	the	signature	that
is	used	in	this	example,	which	has	an	int	and	a	void	pointer	as	its	parameters.	This
function	is	quite	simple;	it	just	clones	the	original	image,	draws	a	circle	on	it,
and	then	displays	it.
The	main	function	is	where	we	actually	create	the	window	and	the	trackbar.	First,
the	namedWindow	function	must	be	called	to	create	a	window	with	the	window	name
that	we	want.	The	createTrackbar	function	can	then	be	called,	as	seen	in	the
example,	to	create	a	trackbar	on	that	window.	Note	that	the	trackbar	itself	has	a
name	that	is	used	to	access	it.	This	name	will	also	be	printed	next	to	the	trackbar
to	display	its	purpose	to	the	user	when	the	application	is	running.	setTrackbarMin
and	setTrackbarMax	are	called	to	make	sure	our	trackbar	doesn't	allow	radius	values
less	than	25	or	greater	than	the	width	or	height	of	the	image	(whichever	is
smaller),	divided	by	2	(since	we're	talking	radius,	not	diameter).
The	following	is	a	screenshot	that	demonstrates	the	output	of	our	window	along
with	trackbar	on	it	that	can	be	used	to	adjust	the	radius	of	the	circle:
Try adjusting it to see for yourself how the radius of the circle changes according
to	the	position	of	the	trackbar.	Make	sure	to	use	this	method	when	you	want	to
experiment	with	the	parameters	of	a	function	or	algorithms	that	you	learn	in	this
book.	Note	that	you	can	add	as	many	trackbars	as	you	need.	However,	adding
more	trackbars	will	use	more	space	on	your	window,	which	might	lead	to	a	poor
user	interface	and	experience	and,	consequently,	a	hard-to-use	program	instead
of	a	simplified	one,	so	try	to	make	use	of	trackbars	wisely.
Filtering	images
	
It	really	doesn't	matter	whether	you	are	trying	to	build	a	computer	vision
application	to	perform	highly	complex	tasks,	such	as	object	detection	in	real-
time,	or	simply	modifying	an	input	image	in	one	way	or	another.	Almost
inevitably,	you'll	have	to	apply	a	certain	type	of	filter	to	your	input	or	output
images.	The	reason	for	this	is	quite	simple—not	all	pictures	are	ready	to	be
processed	out	of	the	box,	and	most	of	the	time,	applying	filters	to	make	images
smoother	is	one	of	the	ways	to	make	sure	they	can	be	fed	into	our	algorithms.
The	variety	of	filters	that	you	can	apply	to	your	images	in	computer	vision	is
huge,	but	in	this	section,	we're	going	to	be	learning	about	some	of	the	most
important	filters	and	especially	the	ones	with	an	implementation	in	the	OpenCV
library	that	we	can	use.
	
Blurring/smoothening	filters
Blurring	an	image	is	one	of	the	most	important	image	filtering	tasks,	and	there
are	many	algorithms	that	can	perform	it,	each	one	with	their	own	pros	and	cons,
which	we'll	be	talking	about	in	this	section.
Let's	start	with	the	simplest	filter	used	for	smoothening	images,	which	is	called
the	median	filter,	and	it	can	be	done	by	using	the	medianBlur	function,	as	seen	in
the	following	example:
     int	ksize	=	5;	//	must	be	odd	
     medianBlur(image,	result,	ksize);	
This	filter	simply	finds	the	median	value	of	the	neighboring	pixels	of	each	pixel
in	an	image.	ksize,	or	the	kernel	size	parameter,	decides	how	big	the	kernel	used
for	blurring	is,	or,	in	other	words,	how	far	the	neighboring	pixels	will	be
considered	in	the	blurring	algorithm.	The	following	image	depicts	the	result	of
increasing	the	kernel	size	from	1	to	7:
Note	that	the	kernel	size	must	be	an	odd	value,	and	a	kernel	size	of	1	will
produce	the	exact	same	image	as	the	input.	So,	in	the	previous	images,	you	can
see	the	increase	in	blurring	level	from	the	original	image	(the	leftmost)	until	the
kernel	size	of	7.
Note that you can use extremely high kernel sizes if you need to, but this is rarely necessary and is usually only useful when you need to de-noise an extremely noisy image. Here's an example image depicting the result of a kernel
size	of	21:
Another	method	of	blurring	an	image	is	by	using	the	boxFilter	function.	Let's	see
how	it's	done	with	an	example	code	and	then	break	it	down	further	to	better
understand	its	behavior:
     int	ddepth	=	-1;	
     Size	ksize(7,7);	
     Point	anchor(-1,	-1);	
     bool	normalize	=	true;	
     BorderTypes	borderType	=	BORDER_DEFAULT;	
     	
     boxFilter(image,	
     										result,	
     										ddepth,	
     										ksize,	
     										anchor,	
     										normalize,	
     										borderType);	
We	can	perform	the	exact	same	task,	in	other	words,	the	normalized	box	filter,
by	using	the	blur	function,	as	seen	in	the	following	example	code:
  Size	ksize(7,7);	
  Point	anchor(-1,	-1);	
  BorderTypes	borderType	=	BORDER_DEFAULT;	
  	
  blur(image,	
  					result,	
  					ksize,	
  					anchor,	
  					borderType);	
The	result	of	this	sample	code	is	exactly	the	same	as	calling	boxFilter,	as	we	saw
previously.	The	obvious	difference	here	is	that	this	function	does	not	allow	us	to
change	the	depth	of	the	result	and	applies	normalization	by	default.
In	addition	to	the	standard	box	filter,	you	can	also	apply	a	square	box	filter	with
the	sqrBoxFilter	function.	In	this	method,	instead	of	calculating	the	sum	of	the
neighboring	pixels,	the	sum	of	their	square	values	is	calculated.	Here	is	an
example,	which	is	incredibly	similar	to	calling	the	boxFilter	function:
     int	ddepth	=	-1;	
     Size	ksize(7,7);	
     Point	anchor(-1,	-1);	
     bool	normalize	=	true;	
     BorderTypes	borderType	=	BORDER_DEFAULT;	
     	
     sqrBoxFilter(image,	
     													result,	
     													ddepth,	
     													ksize,	
     													anchor,	
     													normalize,	
     													borderType);	
        The	unnormalized	versions	of	the	boxFilter	and	sqrBoxFilter	functions	can	also	be	used	to	find
        statistical	information	about	the	neighboring	areas	of	all	pixels	in	an	image,	and	their	use
        case	is	not	limited	to	just	blurring	an	image.
One	of	the	most	popular	blurring	methods	in	computer	vision	is	the	Gaussian
blur	algorithm,	and	it	can	be	performed	in	OpenCV	by	using	the	GaussianBlur
function.	This	function,	similar	to	the	previous	blur	functions	that	we	learned
about,	requires	a	kernel	size,	along	with	standard	deviation	values	in	the	X	and	Y
direction,	called	sigmaX	and	sigmaY,	respectively.	Here's	an	example	of	how	this
function	is	used:
     Size	ksize(7,7);	
     double	sigmaX	=	1.25;	
     double	sigmaY	=	0.0;	
     BorderTypes	borderType	=	BORDER_DEFAULT;	
     	
     GaussianBlur(image,	
     													result,	
     													ksize,	
     													sigmaX,	
     													sigmaY,	
     													borderType);	
Note	that	a	value	of	zero	for	sigmaY	means	the	value	of	sigmaX	will	also	be	used	in
the	Y	direction.	For	more	information	about	Gaussian	blur,	you	can	read	about
the	Gaussian	function	in	general	and	the	GaussianBlur	function	in	the	OpenCV
documentation	pages.
The	last	smoothening	filter	that	we'll	learn	about	in	this	section	is	called	the
bilateral	filter,	and	it	can	be	achieved	by	using	the	bilateralFilter	function.
Bilateral	filtering	is	a	powerful	method	of	de-noising	and	smoothening	an	image,
while	preserving	edges.	This	function	is	also	much	slower	and	more	CPU-
intensive	in	comparison	with	the	blur	algorithms	that	we	saw	previously.	Let's
see	how	bilateralFilter	is	used	with	an	example	and	then	break	down	the	required
parameters:
     int	d	=	9;	
     double	sigmaColor	=	250.0;	
     double	sigmaSpace	=	200.0;	
     BorderTypes	borderType	=	BORDER_DEFAULT;	
     	
     bilateralFilter(image,	
     																result,	
     																d,	
     																sigmaColor,	
     																sigmaSpace,	
     																borderType);	
d, or the filter size, is the diameter of the pixel neighborhood that will participate in the filter. The sigmaColor and sigmaSpace values are both used to define the effect of the color difference and coordinate distance of the pixels nearer to or farther from the pixel whose
filtered	value	is	being	calculated.	Here's	a	screenshot	that	demonstrates	the	effect
of	the	bilateralFilter	function	when	executed	on	our	example	image:
Morphological	filters
Similar	to	smoothening	filters,	morphology	filters	are	the	algorithms	that	change
the	value	of	each	pixel	based	on	the	value	of	the	neighboring	pixels,	although	the
obvious	difference	is	that	they	do	not	have	a	blurring	effect	and	are	mostly	used
to	produce	some	form	of	erosion	or	dilation	effect	on	images.	This	will	be
clarified	with	a	few	hands-on	examples	further	in	this	section,	but	for	now,	let's
see	how	morphological	operations	(also	called	transformations)	are	performed
using	OpenCV	functions.
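A minimal sketch of performing an erosion with the morphologyEx function, assuming a 5 x 5 rectangular structuring element and 3 iterations, would be:
     MorphTypes op = MORPH_ERODE;            // the morphological operation to perform
     MorphShapes shape = MORPH_RECT;         // shape of the structuring element
     Size ksize(5, 5);                       // size of the structuring element
     Point anchor(-1, -1);                   // default anchor (element center)
     int iterations = 3;
     BorderTypes borderType = BORDER_CONSTANT;

     Mat kernel = getStructuringElement(shape, ksize, anchor);
     morphologyEx(image,
                  result,
                  op,
                  kernel,
                  anchor,
                  iterations,
                  borderType,
                  morphologyDefaultBorderValue());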
op, or the operation, is an entry from the MorphTypes enum, such as MORPH_ERODE, MORPH_DILATE, MORPH_OPEN, or MORPH_CLOSE. kernel, or the structuring element, is the kernel matrix used for the morphological operation; it can be created with the getStructuringElement function, which takes a morph shape (shape), a kernel size (ksize), and an anchor. shape can be a rectangle, cross, or ellipse, and is simply an entry from the MorphShapes enum. The iterations variable
refers	to	the	number	of	times	the	morphological	operation	is	performed	on	an
image.	borderType	is	interpreted	exactly	the	same	way	as	with	all	the	functions
we've	seen	so	far	for	the	extrapolation	of	pixels.	In	case	a	constant	border	type
value	is	used,	the	morphologyEx	function	must	also	be	provided	with	a	border	value,
which	can	be	retrieved	by	using	the	morphologyDefaultBorderValue	function,	or
specified	manually.
Here	is	the	result	of	our	preceding	code	(for	erosion),	when	executed	on	our
sample	image:
Dilation,	on	the	other	hand,	is	performed	by	simply	replacing	the	op	value	with
MORPH_DILATE	in	the	previous	example.	Here's	the	result	of	the	dilation	operation:
A	highly	simplified	description	of	what	erosion	and	dilation	operations	do	is	that
they	cause	the	neighboring	pixels	of	darker	pixels	to	become	darker	(in	case	of
erosion)	or	the	neighboring	pixels	of	brighter	pixels	to	become	brighter	(in	case
of	dilations),	which,	after	more	iterations,	will	cause	a	stronger	and	easily	visible
effect.
As	was	mentioned	before,	all	of	the	other	morphological	operations	are	simply	a
combination	of	erosion	and	dilation.	Here	are	the	results	of	opening	and	closing
operations	when	they	are	executed	on	our	example	images:
Make	sure	you	try	the	rest	of	the	morphological	operations	by	yourself	to	see
their	effects.	Create	user	interfaces	that	have	trackbars	on	them	and	try	changing
the	value	of	iteration	and	other	parameters	to	see	how	they	affect	the	result	of
morphological	operations.	If	used	carefully	and	wisely,	morphological
operations	can	have	very	interesting	results	and	further	simplify	an	operation
that	you	want	to	perform	on	an	image.
Before proceeding to the next section, it is worth noting that you can also use the erode and dilate functions to perform exactly the same operations. Although these two functions are dedicated to erosion and dilation only, the morphologyEx function gives you access to all of the morphological operation types in one place.
Derivative-based filters
The next group of filters we'll learn about are based on image derivatives, which are commonly used for detecting edges in images. Let's start with the Sobel function, which is used to calculate the derivative of an image using a Sobel operator. Here's how this function is used in practice:
  int	ddepth	=	-1;	
  int	dx	=	1;	
  int	dy	=	1;	
  int	ksize	=	5;	
  double	scale	=	0.3;	
  double	delta	=	0.0;	
  	
  BorderTypes	borderType	=	BORDER_DEFAULT;	
  Sobel(image,	
  						result,	
  						ddepth,	
  						dx,	
  						dy,	
  						ksize,	
  						scale,	
  						delta,	
  						borderType);	
The	following	images	depict	the	result	of	the	Sobel	function	when	called	with	the
parameter	values	from	the	preceding	example	code,	but	for	a	delta	value	of	zero
(on	the	left)	and	255	(on	the	right):
Try	setting	different	delta	and	scale	values	and	experiment	with	the	results.	Also
try	different	derivative	orders	and	see	the	effects	for	yourself.	As	you	can	see
from	the	preceding	output	images,	calculating	the	derivative	of	an	image	is	a
method	for	calculating	the	edges	in	an	image.
You	can	also	use	the	spatialGradient	function	to	calculate	the	first-order	derivative
of	an	image	in	both	the	X	and	Y	directions	using	the	Sobel	operator,	at	the	same
time.	In	other	words,	calling	spatialGradient	once	is	like	calling	the	Sobel	function
twice,	for	the	first-order	derivative	in	both	directions.	Here's	an	example:
      Mat	resultDX,	resultDY;	
      int	ksize	=	3;	
      BorderTypes	borderType	=	BORDER_DEFAULT;	
      spatialGradient(image,	
      																resultDX,	
      																resultDY,	
      																ksize,	
      																borderType);	
Note	that	the	ksize	parameter	must	be	3	and	the	input	image	type	must	be
grayscale,	otherwise	this	function	will	fail	to	execute,	although	this	might
change	in	an	upcoming	OpenCV	version.
Similar	to	the	way	we	used	the	Sobel	function,	you	can	use	the	Laplacian	function
to	calculate	the	Laplacian	of	an	image.	It's	important	to	note	that	this	function
essentially	sums	up	the	second	derivatives	in	the	X	and	Y	directions	calculated
using	the	Sobel	operator.	Here's	an	example	that	demonstrates	the	usage	of	the
Laplacian	function:
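A minimal sketch, assuming a kernel size of 3 and reusing the remaining parameter values from the Sobel example, would be:
      int ddepth = -1;
      int ksize = 3;
      double scale = 1.0;
      double delta = 0.0;
      BorderTypes borderType = BORDER_DEFAULT;

      Laplacian(image,
                result,
                ddepth,
                ksize,
                scale,
                delta,
                borderType);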
All	of	the	parameters	used	in	the	Laplacian	function	are	already	described	in
previous	examples,	and	especially	the	Sobel	function.
Arbitrary	filtering
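The filter2D function convolves an image with an arbitrary, user-defined kernel. The variables it needs can be set up as in the following minimal sketch; the 3 x 3 sharpening kernel used here is only an assumption and not necessarily the kernel that produced the result described later:
     int ddepth = -1;                        // keep the same depth as the source image
     Mat kernel = (Mat_<float>(3, 3) <<
                    0, -1,  0,
                   -1,  5, -1,
                    0, -1,  0);              // an example sharpening kernel
     Point anchor(-1, -1);                   // default anchor (kernel center)
     double delta = 0.0;                     // value added to each filtered pixel
     BorderTypes borderType = BORDER_DEFAULT;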
	
     filter2D(image,
              result,
              ddepth,
              kernel,
              anchor,
              delta,
              borderType);
Here	is	the	result	of	this	arbitrary	filter.	You	can	see	the	original	image	on	the
left,	and	the	result	of	the	filtering	operation	on	the	right-hand	side:
There is absolutely no limit to the number of possible filters you can create and apply using the filter2D function. Make sure you try different kernel matrices and
experiment	with	the	filter2D	function.	You	can	also	search	online	for	popular
filter	kernel	matrices	and	apply	them	using	the	filter2D	function.
	
Transforming	images
In this section, we'll learn about computer vision algorithms that are used to transform images in one way or another. The algorithms covered here either change the content of an image or the way its content is interpreted.
Thresholding	algorithms
Thresholding	algorithms	are	used	to	apply	a	threshold	value	to	the	pixels	of	an
image.	These	algorithms	can	be	used	to	effectively	create	masks	from	images
that	have	possible	regions	or	pixels	of	interest	in	them	that	pass	a	certain
threshold	value.
You	can	use	the	threshold	function	to	apply	a	threshold	value	on	all	pixels	of	an
image.	The	threshold	function	must	be	provided	with	the	type	of	threshold	that	is
desired,	and	it	can	be	an	entry	from	the	ThresholdTypes	enum.	The	following	is	an
example	that	can	be	used	to	find	the	brightest	areas	in	an	image	using	the
threshold	function:
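A minimal sketch of such a call, assuming a grayscale copy of our image and a binary threshold type (the thresh value of 175 here is an assumption), would be:
     Mat grayImg;
     cvtColor(image, grayImg, COLOR_BGR2GRAY); // work on a grayscale copy of the image

     double thresh = 175.0;                    // pixels brighter than this pass
     double maxval = 255.0;                    // value assigned to passing pixels
     ThresholdTypes type = THRESH_BINARY;

     threshold(grayImg,
               result,
               thresh,
               maxval,
               type);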
thresh is the threshold value, and maxval is the maximum allowed value in the result. Simply put, pixels whose values pass thresh are allowed through (and, with the binary threshold types, are set to maxval), so the result is created. The following is the result of the preceding threshold
operation	demonstrated	with	an	example	image:
Increase	the	thresh	parameter	(or	the	threshold	value)	and	you'll	notice	that	fewer
pixels	are	allowed	to	pass.	Setting	a	correct	threshold	value	requires	experience
and	knowledge	about	the	scene.	In	some	cases,	however,	you	can	develop
programs	that	set	the	threshold	automatically	for	you,	or	adaptively.	Note	that
the	threshold	type	completely	affects	the	result	of	the	threshold	function.	For
instance,	THRESH_BINARY_INV	will	produce	the	inverted	result	of	THRESH_BINARY	and	so
on.	Make	sure	to	try	different	threshold	types	and	experiment	for	yourself	with
this	interesting	and	powerful	function.
In addition to the threshold function, you can use the adaptiveThreshold function to calculate a threshold value for each pixel based on its neighborhood (see the sketch after this list). Its adaptiveMethod parameter must be one of the following entries from the AdaptiveThresholdTypes enum:
     ADAPTIVE_THRESH_MEAN_C
     ADAPTIVE_THRESH_GAUSSIAN_C
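A minimal sketch of calling adaptiveThreshold on a grayscale image, with assumed blockSize and C values, would be:
     double maxValue = 255.0;
     AdaptiveThresholdTypes adaptiveMethod = ADAPTIVE_THRESH_GAUSSIAN_C;
     ThresholdTypes thresholdType = THRESH_BINARY;
     int blockSize = 5;    // size of the pixel neighborhood used per pixel (must be odd)
     double C = 0.0;       // constant subtracted from the (weighted) mean

     adaptiveThreshold(grayImg,
                       result,
                       maxValue,
                       adaptiveMethod,
                       thresholdType,
                       blockSize,
                       C);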
The	higher	the	blockSize	parameter	value,	the	more	pixels	are	used	in	the	adaptive
threshold	method.	The	following	is	an	example	image	that	depicts	the	result	of	the
adaptiveThreshold	method	called	with	the	values	in	the	preceding	example	code:
Color	space	and	type	conversion
Converting	various	color	spaces	and	types	to	each	other	is	extremely	important,
especially	when	it	comes	to	dealing	with	images	from	different	device	types	or
intending	to	display	them	on	different	devices	and	formats.	Let's	see	what	this
means	with	a	very	simple	example.	You'll	need	grayscale	images	for	various
OpenCV	functions	and	computer	vision	algorithms,	while	some	will	require
RGB	color	images.	In	such	cases,	you	can	use	the	cvtColor	function	to	convert
between	various	color	spaces	and	formats.
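A minimal sketch of such conversions looks like this; the destination names (grayImg and hsvImg) are just placeholders:
     Mat grayImg, hsvImg;
     cvtColor(image, grayImg, COLOR_BGR2GRAY); // BGR color image to grayscale
     cvtColor(image, hsvImg, COLOR_BGR2HSV);   // BGR color image to the HSV color space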
The code parameter takes a conversion code, which must be an entry from the ColorConversionCodes enum. The following are some examples of the most popular conversion codes:
       COLOR_BGR2RGB
       COLOR_RGB2GRAY
       COLOR_BGR2HSV
The	list	goes	on	and	on.	Make	sure	to	check	the	ColorConversionCodes	enum	for	all
possible	color-conversion	codes.
Geometric	transformation
This	section	is	dedicated	to	geometric	transformation	algorithms	and	OpenCV
functions. It's important to note that the name, geometric transformation, reflects the fact that the algorithms in this category do not change the content of an image; they simply deform the existing pixels, using extrapolation to calculate pixels that fall outside the existing image area and interpolation to calculate pixels that map onto (or between) existing ones.
Let's	start	with	the	simplest	geometric	transformation	algorithm,	which	is	used
for	resizing	an	image.	You	can	use	the	resize	function	to	resize	an	image.	Here	is
how	this	function	is	used:
  Size	dsize(0,	0);	
  double	fx	=	1.8;	
  double	fy	=	0.3;	
  InterpolationFlags	interpolation	=	INTER_CUBIC;	
  resize(image,	
  							result,	
  							dsize,	
  							fx,	
  							fy,	
  							interpolation);	
If the dsize parameter is set to zero (as in the preceding code), the fx and fy scale factors are used to resize the input image. Otherwise, the input image is resized to the given dsize and the fx and fy values are ignored. The interpolation parameter, on the other hand, is used
to	set	the	interpolation	method	used	for	the	resize	algorithm,	and	it	must	be	one	of
the	entries	in	the	InterpolationFlags	enum.	Here	are	some	of	the	possible	values	for
the	interpolation	parameter:
     INTER_NEAREST
     INTER_LINEAR
     INTER_CUBIC
     INTER_AREA
     INTER_LANCZOS4
Make	sure	to	check	out	the	OpenCV	documentation	pages	for	InterpolationFlags	to
find	out	about	the	details	of	each	possible	method.
The	following	image	depicts	the	result	of	the	previous	example	code	that	was
used	for	resizing	an	image:
The	remap	function	must	be	provided	with	two	mapping	matrices,	one	for	X	and
another	one	for	the	Y	direction.	Besides	that,	an	interpolation	and	extrapolation
(border	type)	method,	along	with	a	border	value	in	case	of	constant	border	type,
must	be	provided	to	the	remap	function.	Let's	first	see	how	this	function	is	called
and	then	try	a	couple	of	different	remaps.	Here's	an	example	that	shows	how	the
remap	function	is	called:
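A minimal sketch of such a call, assuming linear interpolation and a constant border type, would be:
     Mat mapX(image.size(), CV_32FC1);
     Mat mapY(image.size(), CV_32FC1);
     // ... fill mapX and mapY with the desired mapping (see the loops that follow) ...

     InterpolationFlags interpolation = INTER_LINEAR;
     BorderTypes borderMode = BORDER_CONSTANT;
     Scalar borderValue = Scalar(0, 0, 0);    // used only with the constant border type

     remap(image,
           result,
           mapX,
           mapY,
           interpolation,
           borderMode,
           borderValue);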
You	can	create	an	infinite	number	of	different	mappings	and	use	them	with	the
remap	function	to	resize,	flip,	warp,	and	do	many	other	transformations	on	images.
For	instance,	the	following	code	can	be	used	to	create	a	remapping	that	will
cause	a	vertical	flip	of	the	resulting	image:
  for(int	i=0;	i<image.rows;	i++)	
  				for(int	j=0;	j<image.cols;	j++)	
  				{	
  								mapX.at<float>(i,j)	=	j;	
  								mapY.at<float>(i,j)	=	image.rows-i;	
  				}	
Replace	the	code	in	the	preceding	for	loop	with	the	following	and	the	result	of
the	remap	function	call	will	be	a	horizontally	flipped	image:
  mapX.at<float>(i,j)	=	image.cols	-	j;	
  mapY.at<float>(i,j)	=	i;	
Besides	simple	flipping,	you	can	use	the	remap	function	to	perform	many
interesting	pixel	deformations.	Here's	an	example:
  Point2f	center(image.cols/2,image.rows/2);	
  for(int	i=0;	i<image.rows;	i++)	
  				for(int	j=0;	j<image.cols;	j++)	
  				{	
  								//	find	i,j	in	the	standard	coordinates	
  								double	x	=	j	-	center.x;	
  								double	y	=	i	-	center.y;	
  	
  								//	Perform	any	mapping	for	X	and	Y	
  								x	=	x*x/750;	
  								y	=	y;	
  	
  								//	convert	back	to	image	coordinates	
  								mapX.at<float>(i,j)	=	x	+	center.x;	
  								mapY.at<float>(i,j)	=	y	+	center.y;	
  				}	
As	can	be	seen	from	the	inline	comments	of	the	preceding	example	code,	it	is
common	to	convert	the	OpenCV	i	and	j	values	(row	and	column	number)	to	a
standard	coordinate	system	and	use	X	and	Y	with	known	mathematical	and
geometrical	functions,	and	then	convert	them	back	to	the	OpenCV	image
coordinates.	The	following	image	demonstrates	the	result	of	the	preceding
example	code:
The remap function is incredibly powerful and efficient, as long as the calculation of mapX and mapY is handled efficiently. Make sure to experiment with this function to learn about more remapping possibilities.
OpenCV	contains	a	function	named	applyColorMap	that	can	be	used	to	either	apply
predefined	colormaps	or	custom	ones	created	by	the	user.	In	case	predefined
colormaps	are	used,	applyColorMap	must	be	provided	with	a	colormap	type,	which
must	be	an	entry	from	the	ColormapTypes	enum.	Here's	an	example:
  ColormapTypes	colormap	=	COLORMAP_JET;	
  applyColorMap(image,	
  														result,	
  														colormap);	
The	following	image	depicts	the	result	of	various	predefined	colormaps	that	can
be	applied	using	the	applyColorMap	function:
As	mentioned	earlier,	you	can	also	create	your	own	custom	colormaps.	You	just
need	to	make	sure	you	follow	the	instructions	for	creating	a	colormap.	Your
colormap	must	have	a	size	of	256	elements	(a	Mat	object	with	256	rows	and	1
column)	and	it	must	contain	color	or	grayscale	values,	depending	on	the	type	of
image	you	are	planning	to	apply	the	colormap	to.	The	following	is	an	example
that	shows	how	to	create	a	custom	colormap	by	simply	inverting	the	green
channel	color:
  Mat	userColor(256,	1,	CV_8UC3);	
  for(int	i=0;	i<=255;	i++)	
  				userColor.at<Vec3b>(i,0)	=	Vec3b(i,	255-i,	i);	
  applyColorMap(image,	
  														result,	
  														userColor);	
Summary
We started this chapter by learning about drawing functions that can be used to
draw	shapes	and	text	on	images.	Then	we	moved	on	to	learn	about	one	of	the
most	important	computer	vision	topics:	image	filtering.	We	learned	how	to	use
smoothening	algorithms,	we	experimented	with	morphological	filters,	and
learned	about	erosion,	dilation,	opening,	and	closing.	We	also	got	to	try	some
simple	edge	detection	algorithms,	or,	in	other	words,	derivative-based	filters.	We
learned	about	thresholding	images	and	changing	their	color	spaces	and	types.
The	final	sections	of	this	chapter	introduced	us	to	geometric	transformations	and
applying	colormaps	on	images.	We	even	created	a	custom	colormap	of	our	own.
You	can	easily	find	the	trace	of	the	algorithms	that	we	learned	in	this	chapter,	in
many	professional	photo-editing	or	social-networking	and	photo-sharing
applications.
In	the	next	chapter,	we'll	learn	all	about	histograms,	how	they	are	calculated,	and
how	they	are	used	in	computer	vision.	We'll	cover	the	algorithms	that	are	crucial
when	working	on	certain	object	detection	and	tracking	algorithms	that	we'll	be
exploring	in	the	upcoming	chapters.
Questions
	
    1.	 Write	a	program	that	draws	a	cross	mark	over	the	whole	image,	with	a
        thickness	of	3	pixels	and	in	the	color	red.
    2.	 Create	a	window	with	a	trackbar	to	change	the	ksize	of	a	medianBlur	function.
        The possible range for the ksize value should be between 3 and 99.
    3.	 Perform	a	gradient	morphological	operation	on	an	image,	considering	a
        kernel	size	of	7	and	a	rectangular	morphological	shape	for	the	structuring
        element.
    4.	 Using	cvtColor,	convert	a	color	image	to	grayscale	and	make	sure	only	the
        darkest	100	shades	of	gray	are	filtered	out	using	the	threshold	function.
        Make	sure	that	filtered	pixels	are	set	to	white	in	the	result	image	and	the
        rest	of	the	pixels	are	set	to	black.
    5.	 Use	the	remap	function	to	resize	an	image	to	half	of	its	original	width	and
        height,	thus	preserving	the	aspect	ratio	of	the	original	image.	Use	a	default
        border	type	for	the	extrapolation.
    6.	 a)	Use	colormaps	to	convert	an	image	to	grayscale.	b)	Convert	an	image	to
        grayscale	and	invert	its	pixels	at	the	same	time.
    7.	 Did	you	read	about	perspective	transformation	functions?	Which	OpenCV
        function	covers	all	similar	transformations	in	one	single	function?
	
Back-Projection	and	Histograms
In	the	previous	chapter,	we	learned	about	many	computer	vision	algorithms	and
OpenCV	functions	that	can	be	used	to	prepare	images	for	further	processing	or
modify	them	in	one	way	or	another.	We	learned	how	to	draw	text	and	shapes	on
images,	filter	them	using	smoothening	algorithms,	perform	morphological
transformations	on	them,	and	calculate	their	derivatives.	We	also	learned	about
geometric	and	miscellaneous	transformations	of	images	and	applied	colormaps
to	alter	the	tone	of	our	images.
In	this	chapter,	we'll	be	learning	about	a	few	more	algorithms	and	functions	that
are	used	mostly	to	prepare	images	for	further	processing,	inference,	and
modification.	This	will	be	further	clarified	later	on	in	this	chapter,	after	we	learn
about	histograms	in	computer	vision.	We'll	be	introduced	to	the	concept	of
histograms,	and	then	we'll	learn	how	they	are	calculated	and	utilized	with	hands-
on	example	codes.	The	other	extremely	important	concept	we'll	be	learning
about	in	this	chapter	is	called	back-projection.	We'll	learn	how	back-projection
of	a	histogram	can	be	used	to	create	an	altered	version	of	the	original	image.
Besides	their	standard	usage,	the	concepts	and	algorithms	that	we'll	learn	in	this
chapter	are	also	essential	when	it	comes	to	dealing	with	some	of	the	most	widely
used	algorithms	for	object	detection	and	tracking,	which	we'll	be	learning	in	the
upcoming	chapters.
In this chapter, we will be covering the following topics:
     Understanding histograms
     Back	projection	of	histograms
     Histogram	comparison
     Equalizing	histograms
Technical	requirements
	
Refer	to	Chapter	2,	Getting	Started	with	OpenCV,	for	more	information	about	how
to	set	up	a	personal	computer	and	make	it	ready	to	develop	computer	vision
applications	using	the	OpenCV	library.
You	can	use	the	following	URL	to	download	the	source	codes	and	examples	for
this	chapter:
https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/master/Cha
pter05
	
Understanding	histograms
In	computer	vision,	histograms	are	simply	graphs	that	represent	the	distribution
of	pixel	values	over	the	possible	range	of	the	accepted	values	for	those	pixels,	or,
in	other	words,	the	probability	distribution	of	pixels.	Well,	this	might	not	be	as
crystal	clear	as	you	would	expect,	so	let's	take	single-channel	grayscale	images
as	a	simple	example	to	describe	what	histograms	are,	and	then	expand	it	to
multi-channel	colored	images,	and	so	on.	We	already	know	that	the	pixels	in	a
standard	grayscale	image	can	contain	values	between	0	and	255.	Considering
this	fact,	a	graph	similar	to	the	following,	which	depicts	the	ratio	of	the	number
of	pixels	containing	each	and	every	possible	grayscale	pixel	value	of	an	arbitrary
image,	is	simply	the	histogram	of	that	given	image:
Keeping	in	mind	what	we	just	learned,	it	can	be	easily	guessed	that	the
histogram	of	a	three-channel	image,	for	example,	would	be	three	graphs
representing	the	distribution	of	values	for	each	channel,	similar	to	what	we	just
saw	with	the	histogram	of	a	single-channel	grayscale	image.
You	can	use	the	calcHist	function	in	the	OpenCV	library	to	calculate	the
histogram	of	one	or	multiple	images	that	can	be	single-channel	or	multi-channel
themselves.	This	function	requires	a	number	of	parameters	that	must	be	provided
carefully	for	it	to	produce	the	desired	results.	Let's	see	how	this	function	is	used
with	a	few	examples.
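A minimal sketch of such a call, for a single-channel grayscale image named grayImg and an assumed bins value of 256, would be:
    Mat grayImg = imread("Test.png", IMREAD_GRAYSCALE);

    int bins = 256;            // number of bins (an assumption; see the description below)
    int histSize[] = {bins};
    int nimages = 1;
    int dims = 1;
    int channels[] = {0};
    float rangeGS[] = {0, 256};
    const float* ranges[] = {rangeGS};
    bool uniform = true;
    bool accumulate = false;
    Mat histogram, mask;       // an empty mask means all pixels are used

    calcHist(&grayImg,
             nimages,
             channels,
             mask,
             histogram,
             dims,
             histSize,
             ranges,
             uniform,
             accumulate);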
    grayImg is the input grayscale image whose histogram we want to calculate, and histogram will contain the result.
    nimages	must	contain	the	number	of	images	for	which	we	want	histograms
    calculated,	which,	in	this	case,	is	just	one	image.
    channels	is	an	array	that	is	supposed	to	contain	the	zero-based	index	number
    of	the	channels	in	each	image	for	which	we	want	their	histogram	calculated.
    For	instance,	if	we	want	to	calculate	the	histogram	of	the	first,	second,	and
    fourth	channels	in	a	multi-channel	image,	the	channels	array	must	contain	the
    values	of	0,	1,	and	3.	In	our	example,	channels	only	contained	0,	since	we're
    calculating	the	histogram	of	the	only	channel	in	a	grayscale	image.
    mask,	which	is	common	to	many	other	OpenCV	functions,	is	a	parameter	that
    is	used	to	mask	(or	ignore)	certain	pixels,	or,	in	other	words,	prevent	them
    from	participating	in	the	calculated	result.	In	our	case,	and	as	long	as	we	are
    not	working	on	a	certain	portion	of	an	image,	mask	must	contain	an	empty
    matrix.
dims, or the dimensionality parameter, corresponds to the dimensionality of the resulting histogram that we are calculating. It must not be greater than
CV_MAX_DIM,	which	is	32	in	current	OpenCV	versions.	We'll	be	using	1	in	most
cases,	since	we	expect	our	histogram	to	be	a	simple	array-shaped	matrix.
Consequently,	the	index	number	of	each	element	in	the	resulting	histogram
will	correspond	to	the	bin	number.
histSize	is	an	array	that	must	contain	the	size	of	the	histogram	in	each
dimension.	In	our	example,	since	the	dimensionality	was	1,	histSize	must
contain	a	single	value.	The	size	of	the	histogram,	in	this	case,	is	the	same	as
the	number	of	bins	in	a	histogram.	In	the	preceding	example	code,	bins	is
used	to	define	the	number	of	bins	in	the	histogram,	and	it	is	also	used	as	the
single	histSize	value.	Think	of	bins	as	the	number	of	groups	of	pixels	in	a
histogram.	This	will	be	further	clarified	with	examples	later	on,	but	for	now,
it	is	important	to	note	that	a	value	of	256	for	bins	will	result	in	a	histogram
containing	the	count	of	all	individual	possible	pixel	values.
ranges	must	contain	pairs	of	values	corresponding	to	the	lower	and	higher
bounds	of	each	range	of	possible	values	when	calculating	the	histogram	of
an	image.	In	our	example,	this	means	a	value	in	the	single	range	of	(0,	256),
which	is	what	we	have	provided	to	this	parameter.
The	uniform	parameter	is	used	to	define	the	uniformity	of	the	histogram.
Note	that	if	the	histogram	is	non-uniform,	as	opposed	to	what	is
demonstrated	in	our	example,	the	ranges	parameter	must	contain	the	lower
and	higher	bounds	of	all	dimensions,	respectively.
The	accumulate	parameter	is	used	to	decide	whether	the	histogram	should	be
cleared	before	it	is	calculated,	or	the	calculated	values	should	be	added	to
an	existing	histogram.	This	can	be	quite	useful	when	you	need	to	calculate	a
single	histogram	using	multiple	images.
   We'll	cover	the	parameters	mentioned	here	as	much	as	possible	in	the	examples	provided	in
   this	chapter.	However,	you	can	also	refer	to	the	online	documentation	of	the	calcHist	function
   for	more	information.
Displaying	histograms
Quite	obviously,	trying	to	display	the	resulting	histogram	using	a	function,	such
as	imshow,	is	futile	since	the	raw	format	of	the	stored	histogram	is	similar	to	a
single	column	matrix	that	has	bins	number	of	rows	in	it.	Each	row,	or,	in	other
words,	each	element,	of	the	histogram	corresponds	to	the	number	of	pixels	that
fall	into	that	specific	bin.	Considering	this,	we	can	draw	the	calculated	histogram
by	using	drawing	functions	from	Chapter	4,	Drawing,	Filtering,	and
Transformation.
Here's	an	example	that	shows	how	we	can	display	the	histogram	that	we
calculated	in	the	previous	code	sample	as	a	graph	with	custom	size	and
properties:
  int	gRows	=	200;	//	height	
  int	gCol	=	500;	//	width	
  Scalar	backGroundColor	=	Scalar(0,	255,	255);	//	yellow	
  Scalar	graphColor	=	Scalar(0,	0,	0);	//	black	
  int	thickness	=	2;	
  LineTypes	lineType	=	LINE_AA;	
  	
  Mat	theGraph(gRows,	gCol,	CV_8UC(3),	backGroundColor);	
  	
  Point	p1(0,0),	p2(0,0);	
  for(int	i=0;	i<bins;	i++)	
  {	
  				float	value	=	histogram.at<float>(i,0);	
  				value	=	maxVal	-	value;	//	invert	
  				value	=	value	/	maxVal	*	theGraph.rows;	//	scale	
  				line(theGraph,	
  									p1,	
  									Point(p1.x,value),	
  									graphColor,	
  									thickness,	
  									lineType);	
  				p1.y	=	p2.y	=	value;	
  				p2.x	=	float(i+1)	*	float(theGraph.cols)	/	float(bins);	
  				line(theGraph,	
  									p1,	p2,	
  									Scalar(0,0,0));	
  				p1.x	=	p2.x;	
  }	
In the preceding code, gRows and gCol refer to the height and width of the resulting graph, respectively. The rest of the parameters are either self-explanatory (backGroundColor and so on), or you have already learned about them in the
previous	chapters.	Notice	how	each	value	in	histogram	is	used	to	calculate	the
position	of	the	line	that	needs	to	be	drawn.	In	the	preceding	code,	maxVal	is	simply
used	to	scale	the	results	to	the	visible	range.	Here's	how	maxVal	itself	is	calculated:
  double	maxVal	=	0;	
  minMaxLoc(histogram,	
  										0,	
  										&maxVal,	
  										0,	
  										0);
Refer	to	Chapter	3,	Array	and	Matrix	Operations,	if	you	need	to	refresh	your
memory	about	how	the	minMaxLoc	function	is	used.	In	our	example,	we	only	need
the	value	of	the	biggest	element	in	the	histogram,	so	we	ignore	the	rest	of	the
parameters	by	passing	zero	to	them.
You can easily change the background or the graph color using the provided backGroundColor or graphColor parameters, or make the graph thinner or thicker by adjusting the thickness value.
The	following	are	the	resulting	histograms	of	the	same	image	as	the	previous
example,	with	150,	80,	and	25	bins,	from	left	to	right,	drawn	using	a	bar	chart
visualization	method:
	
You	can	easily	notice	that	the	lower	the	bins	value	is,	the	more	grouped	together
the	pixels	are.	Even	though	this	might	seem	more	like	a	lower	resolution	of	the
same	data	(from	left	to	right),	it	is	actually	better	to	use	a	lower	number	of	bins
to	group	similar	pixels	together.	Note	that	the	bar-chart	visualization	in	the
preceding	example	is	produced	by	replacing	the	for	loop	from	the	previous
example	code	with	the	following:
    Point	p1(0,0),	p2(0,	theGraph.rows-1);	
    for(int	i=0;	i<bins;	i++)	
    {	
    				float	value	=	histogram.at<float>(i,0);	
    				value	*=	0.95f;	//	5%	empty	at	top	
    				value	=	maxVal	-	value;	//	invert	
    				value	=	value	/	(maxVal)	*	theGraph.rows;	//	scale	
    				p1.y	=	value;	
    				p2.x	=	float(i+1)	*	float(theGraph.cols)	/	float(bins);	
    				rectangle(theGraph,	
    														p1,	
    														p2,	
    														graphColor,	
    														CV_FILLED,	
    														lineType);	
    				p1.x	=	p2.x;	
    }	
Both	of	these	visualizations	(graph	or	bar-chart)	have	their	own	pros	and	cons,
which	will	be	more	obvious	as	you	try	calculating	histograms	of	different	types
of	images.	Let's	try	calculating	the	histogram	of	a	color	image.	We'll	need	to
calculate	the	histogram	of	individual	channels,	as	was	mentioned	previously.
Here's	an	example	code	that	demonstrates	how	it's	done:
    Mat	image	=	imread("Test.png");	
    if(image.empty())	
    {	
    				cout	<<	"Empty	input	image!";	
    				return	-1;	
    }	
    	
    Mat	imgChannels[3];	
    Mat	histograms[3];	
  split(image,	imgChannels);	
  	
  //	each	imgChannels	element	is	an	individual	1-channel	image
You	can	download	the	complete	source	code	for	the	preceding	example	from	the
online	source	code	repository	for	this	chapter.	The	example	project	containing
the	full	source	code	of	the	preceding	code	snippet	is	called	CvHistGraphColor,	and
running	it	would	produce	a	result	similar	to	what	is	seen	in	the	following
diagram:
As	you	can	see	in	the	preceding	example	code,	the	split	function	is	used	to	create
three	individual	images,	each	containing	a	single	channel,	out	of	our	source	color
image	(BGR	by	default).	The	part	of	the	code	that	is	mentioned	with	the
commented	line	in	the	preceding	code	is	simply	a	for	loop	that	iterates	over	the
elements	of	imgChannels	and	draws	each	graph	using	the	exact	same	code	as	you
saw	before,	but	with	each	graph	having	its	own	unique	color	that	is	calculated
using	the	following	code	in	the	loop:
  Scalar	graphColor	=	Scalar(i	==	0	?	255	:	0,	
  																											i	==	1	?	255	:	0,	
  																											i	==	2	?	255	:	0);	
Depending on the value of i, graphColor is set to blue, green, or red, hence the colored histogram graphs depicted in the previous picture.
Besides helping to interpret the content of an image, or to see how pixel values are distributed in it, histograms have many other use cases. In the following
sections,	we'll	be	learning	about	back-projection	and	other	algorithms	that	are
used	for	utilizing	histograms	in	our	applications.
Back-projection	of	histograms
Considering	the	definition	of	histograms	from	the	start	of	the	previous	section,	it
can	be	said	that	back-projection	of	a	histogram	on	an	image	means	replacing
each	of	its	pixels	with	their	probability	distribution	value.	This	is,	in	a	way	(not
exactly),	the	reverse	operation	of	calculating	the	histogram	of	an	image.	When
we	back-project	a	histogram	on	an	image,	we	actually	use	a	histogram	to	modify
an	image.	Let's	first	see	how	back-projection	is	performed	using	OpenCV	and,
afterwards,	dive	into	how	it	is	actually	used.
To do this, you can use the calcBackProject function, which must be provided with the following parameters:
  calcBackProject(&image,	
  																nimages,	
  																channels,	
  																histogram,	
  																backProj,	
  																ranges,	
  																scale,	
  																uniform);	
The	nimages,	channels,	ranges,	and	uniform	parameters	in	the	calcBackProject	function
are	used	exactly	the	way	they	were	with	the	calcHist	function.	image	must	contain
the	input	image	and	histogram	needs	to	be	calculated	with	a	prior	call	to	the
calcHist	function	or	with	any	other	method	(or	even	manually).	The	result	will	be
scaled	by	using	the	scale	parameter	and	finally,	it	will	be	saved	in	backProj.	It's
important	to	note	that	the	values	in	histogram	can	be	over	the	correctly	displayable
range,	so	after	performing	the	back-projection,	the	resulting	backProj	object	will
not	be	displayable	correctly.	To	fix	this	issue,	we	need	to	first	make	sure	histogram
is	normalized	to	the	displayable	range	by	OpenCV.	The	following	code	must	be
executed	before	the	preceding	call	to	calcBackProject	in	order	for	the	resulting
backProj	to	be	displayable:
  normalize(histogram,	
  										histogram,	
  										0,	
  										255,	
  										NORM_MINMAX);
The	following	image	depicts	the	result	of	the	back-projection	of	the	image	with
its	original	histogram	(unaltered	histogram).	The	image	on	the	right-hand	side	is
the	result	of	the	back-projection	algorithm:
Here	is	an	example	that	demonstrates	how	you	can	use	the	concept	of	histograms
and	back-projection	to	detect	the	pixels	in	an	image	that	are	in	the	range	of	the
brightest	10%	of	possible	pixel	values:
  int	bins	=	10;	//	we	need	10	slices	
  float	rangeGS[]	=	{0,	256};	
  const	float*	ranges[]	=	{	rangeGS	};	
  int	channels[]	=	{0};	
  Mat	histogram(bins,	1,	CV_32FC1,	Scalar(0.0));	
  histogram.at<float>(9,	0)	=	255.0;	
  calcBackProject(&imageGray,	
  																1,	
  																channels,	
  																histogram,	
  																backProj,	
  																ranges);
Notice	that	the	histogram	is	formed	manually,	with	10	bins,	instead	of	being
calculated	from	the	original	image.	Then,	the	last	bin,	or,	in	other	words,	the	last
element	in	the	histogram,	is	set	to	255,	which	means	absolute	white.	Obviously,	if
this	wasn't	done,	we'd	need	to	perform	a	normalization	to	make	sure	the	result	of
back-projection	is	in	the	displayable	range	of	colors.
The	following	image	depicts	the	result	of	the	preceding	code	snippet	when	it	is
executed	on	the	same	sample	image	from	the	previous	examples:
The	extracted	mask	image	can	be	used	to	further	modify	an	image,	or,	in	the	case
of	a	uniquely-colored	object,	it	can	be	used	to	detect	and	track	the	object.	The
detection	and	tracking	algorithms	will	be	covered	thoroughly	in	the	upcoming
chapters,	but	how	exactly	we	can	use	the	color	of	an	object	is	what	we're	going
to	learn	next.
Learning	more	about	back-
projections
First	off,	let's	recall	that	the	HSV	color	space	is	far	better	suited	to	dealing	with
the	actual	color	value	of	pixels	in	an	image	than	the	standard	RGB	(or	BGR	and
so	on)	color	space.	You	might	want	to	revisit	Chapter	1,	Introduction	to	Computer
Vision,	for	more	information	about	this	phenomenon.	We're	going	to	use	this
simple	fact	to	find	regions	in	an	image	that	have	a	special	color,	regardless	of
their color intensity, brightness, and so on. For this reason, we need to first convert an image to the HSV color space, which can be done using the cvtColor function with the COLOR_BGR2HSV conversion code.
Let's	simplify	this	with	an	example	case.	Imagine	we	want	to	replace	a	specific
color	in	an	image,	preserving	the	highlights,	brightness,	and	so	on.	To	be	able	to
perform	such	a	task,	we	need	to	be	able	to	accurately	detect	a	given	color	and
then	make	sure	we	only	change	the	color	in	the	detected	pixels	and	not	their
brightness	and	similar	properties.	The	following	example	code	demonstrates
how	we	can	use	a	manually-formed	histogram	of	the	hue	channel	and	its	back-
projection	to	extract	pixels	that	have	a	specific	color,	which,	in	this	example,	is
assumed	to	be	blue:
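The first step of the example converts the image to the HSV color space and copies its hue channel into a separate hue object; a minimal sketch of that step (the mixChannels-based extraction here is an assumption) would be:
        // 1. Convert to HSV and extract the hue channel into its own single-channel Mat
        Mat imgHsv;
        cvtColor(image, imgHsv, COLOR_BGR2HSV);

        Mat hue(imgHsv.size(), CV_8UC1);
        int fromTo[] = {0, 0};   // copy channel 0 (hue) of imgHsv into channel 0 of hue
        mixChannels(&imgHsv, 1, &hue, 1, fromTo, 1);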
 2.	 Now	that	we	have	the	hue	channel	inside	the	hue	object,	we	need	to	form	a
     proper	histogram	of	the	hue	channel,	which	contains	only	the	pixels	with
     the	color	blue.	The	hue	value	can	be	a	value	between	0	and	360	(in	degrees)
   and	the	hue	value	of	blue	is	240.	So,	we	can	create	a	histogram	using	the
    following code, which can be used to extract the blue-colored pixels, with an offset (or threshold) of 50 around that hue value:
        int	bins	=	360;	
        int	blueHue	=	240;	
        int	hueOffset	=	50;	
        Mat	histogram(bins,	1,	CV_32FC1);	
        for(int	i=0;	i<bins;	i++)	
        {	
        				histogram.at<float>(i,	0)	=		
        								(i	>	blueHue	-	hueOffset)	
        												&&	
        								(i	<	blueHue	+	hueOffset)	
        												?	
        								255.0	:	0.0;	
        }
      The	preceding	code	acts	like	a	simple	threshold,	in	which	all	elements	in	the	histogram	that
      have	an	index	of	240	(plus/minus	50)	are	set	to	255	and	the	rest	are	set	to	zero.
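The visualization that the next paragraph breaks down builds a colors vector spanning the hue spectrum and then reuses the earlier bar-chart drawing code, with colors[i] in place of graphColor for each bin. A rough sketch of forming colors and converting the finished graph for display (the scaling of the hue values here is an assumption) might be:
      // One color per bin across the hue spectrum; S and V are set to their maximum, 255
      vector<Scalar> colors(bins);
      for(int i=0; i<bins; i++)
          colors[i] = Scalar(saturate_cast<uchar>(i * 180.0 / bins), 255, 255);

      // ... draw the bar chart exactly as before, using colors[i] for each bin ...

      // OpenCV displays BGR images, so convert the finished HSV graph before imshow
      cvtColor(theGraph, theGraph, COLOR_HSV2BGR);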
      Before	proceeding	with	the	next	steps,	let's	break	down	the	preceding
      example	code.	It	is	almost	exactly	the	same	as	visualizing	a	grayscale
    histogram	or	a	single	red-,	green-,	or	blue-channel	histogram.	However,
    the	interesting	fact	to	note	about	the	preceding	code	is	where	we	form	the
    colors	object.	The	colors	object	is	going	to	be	a	simple	vector	that	contains
    all	possible	colors	across	the	hue	spectrum,	but	according	to	the	number
    of	bins	we	have.	Notice	how	we	have	used	the	saturate_cast	function	in
    OpenCV	to	make	sure	the	hue	values	are	saturated	into	the	acceptable
    range.	The	S	and	V	channels	are	simply	set	to	their	highest	possible
    value,	which	is	255.	After	the	colors	object	is	correctly	created,	we	have
    used	the	same	visualization	function	as	before.	However,	since	OpenCV
    does	not	display	images	in	the	HSV	color	space	by	default	(and	you	can
    expect	the	same	behavior	in	most	image	display	functions	and	libraries),
    we	need	to	convert	the	HSV	color	space	to	BGR	in	order	to	display	the
    colors	correctly.
    Even	though	hue	can	take	a	value	in	the	range	of	(0,	360),	it	is	not	possible	to	store	it	in
    single-byte	C++	types	(such	as	uchar),	which	are	capable	of	storing	values	in	the	range	of	(0,
    255).	That	is	why	hue	values	are	considered	to	be	in	the	range	of	(0,	180)	in	OpenCV,	or,	in
    other	words,	they	are	simply	divided	by	two.
    The	following	image	depicts	the	result	of	the	preceding	example	code,	if
    we	try	to	display	theGraph	using	the	imshow	function:
    These	are	the	colors	we're	going	to	extract	in	a	mask,	if	we	use	its
    corresponding	histogram	to	calculate	the	back-project	of	an	image.	This
    range	of	colors	is	created	using	the	simple	threshold	(in	a	loop)	that	we
    did	when	we	formed	the	histogram	manually.	Obviously,	if	you	set	all	the
    values	of	the	histogram	to	255.0	instead	of	just	the	blue	range,	you'd	have
    the	whole	spectrum	of	colors.	Here's	a	simple	example:
      Mat	histogram(bins,	1,	CV_32FC1);	
      for(int	i=0;	i<bins;	i++)	
      {	
      				histogram.at<float>(i,	0)	=	255.0;	
      }
         Now	let's	go	back	to	our	original	histogram	of	only	the	blue	colors	and
         continue	with	the	remaining	steps.
 5.	 Shift	and	change	the	hue	in	the	pixels	we	extracted	using	the	calcBackProject
     function.	We	simply	need	to	loop	through	the	pixels	and	shift	their	first
     channel	with	any	desired	value.	The	result	obviously	must	be	converted	to
     BGR	before	it	is	suitable	for	being	displayed.	Here's	how:
         int	shift	=	-50;	
         for(int	i=0;	i<imgHsv.rows;	i++)	
         {	
         				for(int	j=0;	j<imgHsv.cols;	j++)	
         				{	
         								if(backProject.at<uchar>(i,	j))	
         								{	
         												imgHsv.at<Vec3b>(i,j)[0]	+=	shift;	
         								}	
         				}	
         }	
         Mat	imgHueShift;	
         cvtColor(imgHsv,	imgHueShift,	CV_HSV2BGR);
We	used	a	shift	value	of	-50	in	the	preceding	example,	which	will	cause	the	blue
pixels	to	turn	to	green,	preserving	their	brightness,	and	so	on.	Using	various	shift
values	would	result	in	different	colors	replacing	the	blue	pixels.	Here	are	two
examples:
What	we	learned	in	the	preceding	example	is	the	basis	of	many	color-based
detection	and	tracking	algorithms,	as	we'll	learn	in	the	upcoming	chapters.	Being
able	to	correctly	extract	pixels	of	a	certain	color,	regardless	of	their	brightness
shift, is extremely handy. A brightness shift in a color is what happens when the lighting over an object of a certain color changes, for instance between day and night; this is accounted for when hue is used instead of the red, green, or blue channels of an RGB image.
Before	proceeding	to	the	final	section	of	this	chapter,	it's	worth	noting	that	the
exact	same	visualization	method	that	we	used	for	displaying	the	manually-made
histogram	of	an	imaginary	hue	channel	can	also	be	used	to	visualize	the	color
histograms	calculated	from	an	image.	Let's	see	how	it's	done	with	an	example.
In	the	preceding	example,	right	after	the	initial	step,	instead	of	forming	the
histogram	manually,	simply	calculate	it	using	the	calcHist	algorithm,	as	seen	here:
  int	bins	=	36;	
  int	histSize[]	=	{bins};	
  int	nimages	=	1;	
  int	dims	=	1;	
  int	channels[]	=	{0};	
  float	rangeHue[]	=	{0,	180};	
  const	float*	ranges[]	=	{rangeHue};	
  bool	uniform	=	true;	
  bool	accumulate	=	false;	
  Mat	histogram,	mask;	
  	
  calcHist(&hue,	
  									nimages,	
  									channels,	
  									mask,	
  									histogram,	
  									dims,	
  									histSize,	
  									ranges,	
  									uniform,	
  									accumulate);	
The effect of changing the bins value is similar to what we saw with grayscale and single-channel histograms, in the sense that it groups nearby values together. However, in the case of the hue channel, nearby hue values are grouped together, which results in a hue histogram that better represents similar colors in
an	image.	The	following	example	images	depict	the	result	of	the	preceding
visualization,	but	with	different	bins	values.	From	top	to	bottom,	the	bins	value
used	to	calculate	each	histogram	is	360,	100,	36,	and	7.	Notice	how	the
resolution	of	the	histogram	decreases	as	the	bins	value	is	decreased:
Choosing the right bins value depends entirely on what type of objects you are dealing with and your definition of similar colors. What can be seen from the preceding image is that choosing a very high bins value (such as 360) is not useful when we need at least some level of grouping of similar colors. On the other hand, choosing a very low bins value can result in an extreme grouping of colors, in which case calculating a back-projection would not produce an accurate result. Make sure you choose the bins value wisely, varying it for different subjects.
Comparing	histograms
Histograms can be compared with each other in order to get some insight into the content of an image. OpenCV allows histogram comparison using a function called compareHist, which requires the comparison method to be set first. The following example code depicts how this function can be used to calculate the result of a comparison between two histograms calculated using previous calls to the calcHist function:
    HistCompMethods	method	=	HISTCMP_CORREL;	
    double	result	=	compareHist(histogram1,	histogram2,	method);	
histogram1 and histogram2, in the preceding example, are simply the histograms of two different images, or of different channels of an image. method, on the other hand, must contain a valid entry from the HistCompMethods enum, which defines the comparison algorithm used by the compareHist function and can be any one of the following:
     HISTCMP_CORREL
     HISTCMP_CHISQR
     HISTCMP_INTERSECT
     HISTCMP_BHATTACHARYYA
     HISTCMP_HELLINGER
     HISTCMP_CHISQR_ALT
     HISTCMP_KL_DIV
You	can	refer	to	the	latest	OpenCV	documentation	to	get	more	information	about
the	mathematical	details	of	each	method,	and	how	and	what	properties	of
histograms	are	used	by	them.	The	same	can	be	said	about	the	interpretation	of
the	results	of	any	method.	Let's	see	what	this	means	with	an	example.	Using	the
following	sample	code,	we	can	output	the	result	of	all	histogram-comparison
methods:
    cout	<<	"HISTCMP_CORREL:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_CORREL)	
    				<<	endl;	
    cout	<<	"HISTCMP_CHISQR:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_CHISQR)	
    				<<	endl;	
    cout	<<	"HISTCMP_INTERSECT:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_INTERSECT)	
    				<<	endl;	
    cout	<<	"HISTCMP_BHATTACHARYYA:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_BHATTACHARYYA)	
    				<<	endl;	
    cout	<<	"HISTCMP_HELLINGER:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_HELLINGER)	
    				<<	endl;	
    cout	<<	"HISTCMP_CHISQR_ALT:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_CHISQR_ALT)	
    				<<	endl;	
    cout	<<	"HISTCMP_KL_DIV:	"	<<	
    		compareHist(histogram1,	histogram2,	HISTCMP_KL_DIV)	
    				<<	endl;	
If we use the same example image we used throughout this chapter to calculate both histogram1 and histogram2, or, in other words, if we compare one histogram with an equal histogram, here's what we would get:
    HISTCMP_CORREL:	1	
    HISTCMP_CHISQR:	0	
    HISTCMP_INTERSECT:	426400	
    HISTCMP_BHATTACHARYYA:	0	
    HISTCMP_HELLINGER:	0	
    HISTCMP_CHISQR_ALT:	0	
    HISTCMP_KL_DIV:	0
Different results would of course be created if the image on the left were used to create histogram1 and the image on the right were used to create histogram2, or, in other words, if two completely different images were compared.
It's	important	to	note	that	the	order	of	the	histograms	being	passed	to	the
compareHist	function	matters	in	some	cases,	such	as	when	HISTCMP_CHISQR	is	used	as
the	method.	Here	are	the	results	with	histogram1	and	histogram2	passed	in	reverse
order	to	the	compareHist	function:
  HISTCMP_CORREL:	-0.0449654	
  HISTCMP_CHISQR:	3.26926e+06	
  HISTCMP_INTERSECT:	64149	
  HISTCMP_BHATTACHARYYA:	0.825928	
  HISTCMP_HELLINGER:	0.825928	
  HISTCMP_CHISQR_ALT:	1.32827e+06	
  HISTCMP_KL_DIV:	1.15856e+07	
Equalizing histograms
Except for the integral part, which is simply calculating the sum of the values in the bins, the rest is what we have already performed in this chapter, in one way or another. OpenCV includes a function called equalizeHist that performs all of the mentioned operations and produces an image with an equalized histogram. Let's first see how this function is used and then try an example to see the effect for ourselves. The following example code depicts how the equalizeHist function is used; it is extremely easy to use and requires no special parameters whatsoever:
  Mat	equalized;	
  equalizeHist(gray,	equalized);	
Let's	consider	for	instance	that	we	have	the	following	image,	which	is	extremely
overexposed	(or	bright),	and	its	histogram,	which	is	depicted	on	the	right-hand
side:
	
Using	the	equalizeHist	function,	we	can	get	an	image	with	better	contrast	and
brightness.	Here	are	the	resulting	image	and	histogram	of	the	preceding	example
image	when	its	histogram	is	equalized:
Histogram equalization is quite helpful when we have to deal with images that have the potential to be overexposed (too bright) or underexposed (too dark). X-ray scan images, in which the details are only visible when the contrast and brightness are increased using a powerful backlight, or video frames from an environment with intense lighting changes, are examples of conditions in which histogram equalization can be used to make sure the rest of the algorithms always deal with the same, or only slightly different, brightness and contrast levels.
Summary
We	started	this	chapter	by	learning	about	histograms,	what	they	are,	and	how
they	are	calculated	using	the	OpenCV	library.	We	learned	about	the	bin	size	of	a
histogram	and	how	it	can	affect	the	accuracy	or	grouping	of	the	values	in	a
histogram.	We	continued	to	learn	about	visualizing	histograms	using	the
functions	and	algorithms	we	learned	in	Chapter	4,	Drawing,	Filtering	and
Transformation.	After	going	through	various	visualization	types,	we	learned
about	back-projection	and	how	we	can	update	an	image	using	a	histogram.	We
learned	about	detecting	pixels	with	a	certain	color	and	how	to	shift	the	hue
value,	and	consequently	the	color	of	only	those	specific	pixels.	In	the	final
sections	of	this	chapter,	we	learned	about	comparing	histograms	and	histogram-
equalization	algorithms.	We	performed	hands-on	examples	for	possible
histogram	comparison	scenarios	and	enhanced	the	contrast	and	brightness	of	an
overexposed	image.
Histograms	and	how	they	are	used	to	enhance	and	modify	images	using	back-
projection	is	one	of	the	computer	vision	subjects	that	cannot	be	easily	skipped
over	or	missed,	since	it	is	the	foundation	of	many	image	enhancement
algorithms	and	techniques	in	photo-editing	applications,	or,	as	we'll	see	later	on
in	the	upcoming	chapters,	the	basis	of	some	of	the	most	important	real-time
detection	and	tracking	algorithms.	What	we	learned	in	this	chapter	were	a	few	of
the	most	practical	use	cases	of	histograms	and	back-projection,	but	there	is
certainly	much	more	to	these	algorithms	if	you	start	building	real-life	projects
that	make	use	of	histograms.
In the next chapter, we'll use all of the concepts we learned in this and the previous chapters to work with videos and video frames to detect objects of a certain color, track them in real time, or detect motion in a video.
Questions
1.	 Calculate	the	histogram	of	the	second	channel	in	a	three-channel	image.
    Use	an	optional	bin	size	and	a	range	of	0	to	100	for	possible	values	of	the
    second	channel.
2.	 Create	a	histogram	that	can	be	used	with	the	calcBackProject	function	to
    extract	the	darkest	pixels	from	a	grayscale	image.	Consider	the	darkest	25%
    possible	pixel	values	as	the	grayscale	intensities	we	are	looking	to	extract.
3.	 In	the	previous	question,	what	if	we	needed	the	darkest	and	brightest	25%
    to	be	excluded,	instead	of	extracted,	in	a	mask?
4.	 What	is	the	hue	value	of	the	color	red?	How	much	should	it	be	shifted	to
    get	the	color	blue?
5.	 Create	a	hue	histogram	that	can	be	used	to	extract	red-colored	pixels	from
    an	image.	Consider	an	offset	of	50	for	pixels	that	are	considered	reddish.
    Finally,	visualize	the	hue	histogram	calculated.
6.	 Calculate	the	integral	of	a	histogram.
7.	 Perform	histogram	equalization	on	a	color	image.	Note	that	the	equalizeHist
    function	only	supports	histogram	equalization	of	single-channel	8-bit
    grayscale	images.
                                       	
Further	reading
  OpenCV	3.x	with	Python	By	Example	–	Second	Edition	(https://www.packtpub.
  com/application-development/opencv-3x-python-example-second-edition)
  Computer	Vision	with	OpenCV	3	and	Qt5	(https://www.packtpub.com/application
  -development/computer-vision-opencv-3-and-qt5)
Video	Analysis	–	Motion
Detection	and	Tracking
	
	
Technical	requirements
	
Refer	to	Chapter	2,	Getting	Started	with	OpenCV,	for	more	information	about	how
to	set	up	a	personal	computer	and	make	it	ready	for	developing	computer	vision
applications	using	the	OpenCV	library.
You can use the following URL to download the source codes and examples for this chapter: https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/master/Chapter06.
	
Processing	videos
To	be	able	to	use	any	of	the	algorithms	that	we've	learned	so	far	on	videos,	we
need	to	be	able	to	read	video	frames	and	store	them	in	Mat	objects.	We	have
already	learned	about	how	to	deal	with	video	files,	cameras,	and	RTSP	feeds	in
the	initial	chapters	of	this	book.	So,	extending	that,	using	what	we	learned	in	the
previous	chapters,	we	can	use	a	code	similar	to	the	following,	in	order	to	apply
colormaps	to	the	video	feed	from	the	default	camera	on	a	computer:
  VideoCapture	cam(0);	
  //	check	if	camera	was	opened	correctly	
  if(!cam.isOpened())	
  				return	-1;	
  //	infinite	loop	
  while(true)	
  {	
  				Mat	frame;	
  				cam	>>	frame;	
  				if(frame.empty())	
  								break;	
  	
  				applyColorMap(frame,	frame,	COLORMAP_JET);	
  	
  				//	display	the	frame	
  				imshow("Camera",	frame);	
  	
  				//	stop	camera	if	space	is	pressed	
  				if(waitKey(10)	==	'	')	
  								break;	
  }	
  	
  cam.release();	
We can take this example one step further and let the user switch between colormaps at runtime using the keyboard. The original output video of the camera will be displayed unless one of the colormap keys is pressed: pressing J will trigger COLORMAP_JET, while pressing H will trigger the COLORMAP_HOT colormap being applied to the camera frames. Similar to the previous example, pressing the spacebar key will stop the process, and pressing any key other than space, J, or H will result in the original video being displayed.
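The key-handling code itself is not repeated here, but a minimal sketch of the idea, built on top of the preceding example, might look like the following (the lowercase key codes are an assumption; the complete project for this chapter may differ in its details):

  VideoCapture cam(0);
  if(!cam.isOpened())
      return -1;

  int key = -1; // the last pressed key
  while(key != ' ')
  {
      Mat frame;
      cam >> frame;
      if(frame.empty())
          break;

      if(key == 'j')      // apply the jet colormap
          applyColorMap(frame, frame, COLORMAP_JET);
      else if(key == 'h') // apply the hot colormap
          applyColorMap(frame, frame, COLORMAP_HOT);
      // any other key (or no key at all): display the original frame

      imshow("Camera", frame);

      int k = waitKey(10);
      if(k > 0)
          key = k;
  }

  cam.release();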
        The applyColorMap function in the preceding examples is just an arbitrary example algorithm used to demonstrate the technique of processing videos in real time. You can use any of the algorithms you have learned about in this book that operate on a single image. You can, for instance, write a
        program that applies a smoothing filter to the video, or a Fourier transformation, or even a
        program	that	displays	the	Hue	channel	histogram	in	real-time.	The	use	cases	are	infinite,
        however	the	method	used	is	almost	identical	for	all	algorithms	that	perform	a	single	complete
        operation	on	any	individual	image.
Let's	assume	we	want	to	find	the	average	brightness	of	the	last	60	frames	read
from	a	camera	at	any	given	moment.	Such	a	value	is	quite	useful	when	we	want
to	automatically	adjust	the	brightness	of	the	video	when	the	content	of	the
frames	is	extremely	dark	or	extremely	bright.	In	fact,	a	similar	operation	is
usually	performed	by	the	internal	processor	of	most	digital	cameras,	and	even
the	smartphone	in	your	pocket.	You	can	give	it	a	try	by	turning	on	the	camera	on
your	smartphone	and	pointing	it	toward	a	light,	or	the	sun,	or	by	entering	a	very
dark	environment.	The	following	code	demonstrates	how	the	average	of	the
brightness	of	the	last	60	frames	is	calculated	and	displayed	in	the	corner	of	the
video:
  VideoCapture	cam(0);	
  if(!cam.isOpened())	
  				return	-1;	
  vector<Scalar>	avgs;	
  int	key	=	-1;	
  while(key	!=	'	')	
  {	
  				Mat	frame;	
  				cam	>>	frame;	
  				if(frame.empty())	
  								break;	
    if(avgs.size() >= 60) // keep only the averages of the last 60 frames
        avgs.erase(avgs.begin());
  				Mat	frameGray;	
  				cvtColor(frame,	frameGray,	CV_BGR2GRAY);	
  				avgs.push_back(	mean(frameGray)	);	
  				Scalar	allAvg	=	mean(avgs);	
  				putText(frame,	
  												to_string(allAvg[0]),	
  												Point(0,frame.rows-1),	
  												FONT_HERSHEY_PLAIN,	
  												1.0,	
  												Scalar(0,255,0));	
  				imshow("Camera",	frame);	
  				int	k	=	waitKey(10);	
  				if(k	>	0)	
  								key	=	k;	
  }	
  cam.release();	
For	the	most	part,	this	example	code	is	quite	similar	to	the	examples	we	saw
earlier	in	this	chapter.	The	main	difference	here	is	that	we	are	storing	the	average
of	the	last	60	frames	that	are	calculated	using	the	OpenCV	mean	function	in	a
vector	of	Scalar	objects,	and	then	we	calculate	the	average	of	all	averages.	The
calculated	value	is	then	drawn	on	the	input	frame	using	the	putText	function.	The
following	image	depicts	a	single	frame	that	is	displayed	when	the	preceding
example	code	is	executed:
Notice	the	value	displayed	in	the	bottom-left	corner	of	the	image,	which	will
start	to	decrease	when	the	content	of	the	video	frames	becomes	darker	and
increase	when	they	become	brighter.	Based	on	this	result,	you	can,	for	instance,
change	the	brightness	value	or	warn	the	user	of	your	application	that	the	content
is	too	dark	or	bright,	and	so	on.
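For instance, a deliberately simple way of warning the user, placed inside the loop right before the imshow call, might look like the following sketch (the 50 and 200 thresholds are arbitrary example values, not values taken from the book's project):

  // warn the user if the average brightness is outside an acceptable range
  if(allAvg[0] < 50)
      putText(frame, "Too dark!", Point(0, 20),
              FONT_HERSHEY_PLAIN, 1.0, Scalar(0, 0, 255));
  else if(allAvg[0] > 200)
      putText(frame, "Too bright!", Point(0, 20),
              FONT_HERSHEY_PLAIN, 1.0, Scalar(0, 0, 255));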
The	examples	in	this	initial	section	of	the	chapter	were	meant	to	teach	you	the
idea	of	processing	individual	frames	using	the	algorithms	you've	learned	in	the
previous	chapters,	and	a	few	simple	programming	techniques	used	to	calculate	a
value	based	on	consecutive	frames.	In	the	following	sections	of	this	chapter,
we'll	be	learning	about	some	of	the	most	important	video-processing	algorithms,
namely	object	detection	and	tracking	algorithms,	which	depend	on	the	concepts
and	techniques	we	learned	in	this	section	and	the	previous	chapters	of	this	book.
Understanding	the	Mean	Shift
algorithm
The	Mean	Shift	algorithm	is	an	iterative	algorithm	that	can	be	used	to	find	the
maxima	of	a	density	function.	A	very	rough	translation	of	the	preceding	sentence
to	computer	vision	terminology	would	be	the	following—the	Mean	Shift
algorithm	can	be	used	to	find	an	object	in	an	image	using	a	back-projection
image.	But	how	is	it	achieved	in	practice?	Let's	walk	through	this	step	by	step.
Here	are	the	individual	operations	that	are	performed	to	find	an	object	using	the
Mean	Shift	algorithm,	in	order:
 1. Calculate the back-projection image of the input image, using the hue histogram of the object (or color) of interest.
 2. Place a search window at its initial position over the back-projection image, calculate the mass center of the pixels inside the window, and shift the window so that its center lies on that mass center. Repeating this shift moves the window towards the region with the highest density of extracted pixels.
 3. Just like any iterative algorithm, some termination criteria are required by the Mean Shift algorithm to stop it when the results are as expected, or when reaching an accepted result does not happen as fast as needed. So, the number of iterations and an epsilon value are used as the termination criteria. The algorithm stops either by reaching the given number of iterations or by finding a shift distance that is smaller than the given epsilon value (convergence).
Now,	let's	see	a	hands-on	example	of	how	this	algorithm	is	used	in	practice	by
using	the	OpenCV	library.	The	meanShift	function	in	OpenCV	implements	the
Mean	Shift	algorithm	almost	exactly	as	it	was	described	in	the	preceding	steps.
This	function	requires	a	back-projection	image,	a	search	window,	and	the
termination	criteria,	and	it	is	used	as	seen	in	the	following	example:
  Rect	srchWnd(0,	0,	100,	100);	
  TermCriteria	criteria(TermCriteria::MAX_ITER	
  																						+	TermCriteria::EPS,	
  																						20,	//	number	of	iterations	
  																						1.0	//	epsilon	value	
  																						);	
  // backProject must already contain the calculated back-projection image
  meanShift(backProject,	
  										srchWnd,	
  										criteria);
srchWnd	is	a	Rect	object,	which	is	simply	a	rectangle	that	must	contain	an	initial
value that is used and then updated by the meanShift function. backProject must
contain	a	proper	back-projection	image	that	is	calculated	with	any	of	the
methods	that	we	learned	in	Chapter	5,	Back-Projection	and	Histograms.	The
TermCriteria	class	is	an	OpenCV	class	that	is	used	by	iterative	algorithms	that
require	similar	termination	criteria.	The	first	parameter	defines	the	type	of	the
termination	criteria,	which	can	be	MAX_ITER	(same	as	COUNT),	EPS,	or	both.	In	the
preceding	example,	we	have	used	the	termination	criteria	of	20	iterations	and	an
epsilon	value	of	1.0,	which	of	course	can	be	changed	depending	on	the
environment	and	application.	The	most	important	thing	to	note	here	is	that	a
higher	number	of	iterations	and	a	lower	epsilon	can	yield	more	accurate	results,
but	it	can	also	lead	to	slower	performance,	and	vice	versa.
 1.	 The	structure	of	the	tracking	example	we'll	create	is	quite	similar	to	the
     previous	examples	in	this	chapter.	We	need	to	open	a	video,	or	a	camera,	on
     the	computer	using	the	VideoCapture	class	and	then	start	reading	the	frames,
     as	seen	here:
         VideoCapture	cam(0);	
         if(!cam.isOpened())	
         				return	-1;	
         	
         int	key	=	-1;	
         while(key	!=	'	')	
         {	
         				Mat	frame;	
         				cam	>>	frame;	
         				if(frame.empty())	
         								break;	
         	
         				int	k	=	waitKey(10);	
         				if(k	>	0)	
         								key	=	k;	
         }	
         	
         cam.release();	
       Again,	we	have	used	the	waitKey	function	to	stop	the	loop	if	the	spacebar
       key	is	pressed.
 2.	 We're	going	to	assume	that	our	object	of	interest	has	a	green	color.	So,	we're
     going	to	form	a	hue	histogram	that	contains	only	the	green	colors,	as	seen
     here:
         int bins = 360;     // one bin per degree of hue
         int grnHue = 120;   // green hue in degrees (OpenCV stores hue as 0-179, but
                             // with 360 bins over the 0-180 range, each bin maps to one degree)
         int hueOffset = 50; // the accepted threshold around the green hue
         Mat	histogram(bins,	1,	CV_32FC1);	
         for(int	i=0;	i<bins;	i++)	
         {	
         				histogram.at<float>(i,	0)	=	
         												(i	>	grnHue	-	hueOffset)	
         												&&	
         												(i	<	grnHue	+	hueOffset)	
         												?	
         																255.0	:	0.0;	
         }	
       This	needs	to	happen	before	entering	the	process	loop,	since	our
       histogram	is	going	to	stay	constant	throughout	the	whole	process.
3.	 One	last	thing	to	take	care	of	before	entering	the	actual	process	loop	and	the
    tracking	code	is	the	termination	criteria,	which	will	stay	constant
    throughout	the	whole	process.	Here's	how	we'll	create	the	required
    termination	criteria:
        Rect	srchWnd(0,0,	100,	100);	
        TermCriteria	criteria(TermCriteria::MAX_ITER	
        																						+	TermCriteria::EPS,	
        																						20,	
        																						1.0);	
      The	initial	value	of	the	search	window	is	quite	important	when	using	the	Mean	Shift	algorithm
      to	track	objects,	since	this	algorithm	always	makes	an	assumption	about	the	initial	position	of
      the	object	to	be	tracked.	This	is	an	obvious	downside	of	the	Mean	Shift	algorithm,	which	we'll
      learn	how	to	deal	with	later	on	in	this	chapter	when	we	discuss	the	CAM	Shift	algorithm	and
      its	implementation	in	the	OpenCV	library.
4.	 After	each	frame	is	read	in	the	while	loop	we're	using	for	the	tracking	code,
    we	must	calculate	the	back-projection	image	of	the	input	frame	using	the
    green	hue	histogram	that	we	created.	Here's	how	it's	done:
        Mat	frmHsv,	hue;	
        vector<Mat>	hsvChannels;	
        cvtColor(frame,	frmHsv,	COLOR_BGR2HSV);	
        split(frmHsv,	hsvChannels);	
        hue	=	hsvChannels[0];	
        	
        int	nimages	=	1;	
        int	channels[]	=	{0};	
        Mat	backProject;	
        float	rangeHue[]	=	{0,	180};	
        const	float*	ranges[]	=	{rangeHue};	
        double	scale	=	1.0;	
        bool	uniform	=	true;	
        calcBackProject(&hue,	
        																nimages,	
        																channels,	
        																histogram,	
        																backProject,	
        																ranges,	
        																scale,	
        																uniform);	
5.	 Call	the	meanShift	function	to	update	the	search	window	using	the	back-
    projection	image	and	provided	termination	criteria,	as	seen	here:
        meanShift(backProject,	
        										srchWnd,	
        										criteria);	
 6.	 To	visualize	the	search	window,	or	in	other	words	the	tracked	object,	we
     need	to	draw	the	search-window	rectangle	on	the	input	frame.	Here's	how
     you	can	do	this	by	using	the	rectangle	function:
         rectangle(frame,	
         										srchWnd,	//	search	window	rectangle	
         										Scalar(0,0,255),	//	red	color	
         										2	//	thickness	
         										);
 7. Add a means of switching between the back-projection image and the original video frame using the B and V keys. Here's how it's done:
         switch(key)	
         {	
         case	'b':	imshow("Camera",	backProject);	
         				break;	
         case	'v':	default:	imshow("Camera",	frame);	
         				break;	
         }	
Let's	give	our	program	a	try	and	see	how	it	performs	when	executed	in	a	slightly
controlled	environment.	The	following	picture	demonstrates	the	initial	position
of	the	search	window	and	our	green	object	of	interest,	both	in	the	original	frame
view	and	the	back-projection	view:
Moving	the	object	around	will	cause	the	meanShift	function	to	update	the	search
window	and	consequently	track	the	object.	Here's	another	result,	depicting	the
object	tracked	to	the	bottom-right	corner	of	the	view:
Notice	the	small	amount	of	noise	that	can	be	seen	in	the	corner,	which	would	be
taken	care	of	by	the	meanShift	function	since	the	mass	center	is	not	affected	too
much	by	it.	However,	as	mentioned	previously,	it	is	a	good	idea	to	perform	some
sort	of	filtering	on	the	back-projection	image	to	get	rid	of	noise.	For	instance,
in the case of noise similar to what we have in the back-projection image, we
can	use	the	GaussianBlur	function,	or	even	better,	the	erode	function,	to	get	rid	of
unwanted	pixels	in	the	back-projection	image.	For	more	information	on	how	to
use	filtering	functions,	you	can	refer	to	Chapter	4,	Drawing,	Filtering,	and
Transformation.
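We can also visualize the route that the tracked object has taken over the last frames. A minimal sketch of how such a route might be recorded (the variable name route and the 60-frame limit are assumptions that match the drawing code shown next) is the following:

    vector<Point> route; // defined once, before the processing loop

    // inside the loop, right after the meanShift call:
    route.push_back(Point(srchWnd.x + srchWnd.width / 2,
                          srchWnd.y + srchWnd.height / 2)); // window center
    if(route.size() > 60) // keep only the positions of the last 60 frames
        route.erase(route.begin());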
Obviously,	the	route	is	a	vector	of	Point	objects.	route	needs	to	be	updated	after	the
meanShift	function	call,	and	then	we	can	use	the	following	call	to	the	polylines
function	in	order	to	draw	the	route	over	the	original	video	frame:
    polylines(frame,	
    										route,	//	the	vector	of	Point	objects	
    										false,	//	not	a	closed	polyline	
    										Scalar(0,255,0),	//	green	color	
    										2	//	thickness	
    										);	
The	following	picture	depicts	the	result	of	displaying	the	tracking	route	(for	the
last	60	frames)	on	the	original	video	frames	read	from	the	camera:
Now,	let's	address	some	issues	that	we	observed	while	working	with	the	meanShift
function.	First	of	all,	it	is	not	convenient	to	create	the	hue	histogram	manually.	A
flexible	program	should	allow	the	user	to	choose	the	object	they	want	to	track,	or
at	least	allow	the	user	to	choose	the	color	of	the	object	of	interest	conveniently.
The	same	can	be	said	about	the	search	window	size	and	its	initial	position.	There
are	a	number	of	ways	to	deal	with	such	issues	and	we're	going	to	address	them
with	a	hands-on	example.
When	using	the	OpenCV	library,	you	can	use	the	setMouseCallback	function	to
customize	the	behavior	of	mouse	clicks	on	an	output	window.	This	can	be	used
in combination with a few simple methods, such as bitwise_not, to provide an easy-to-use object-selection experience for the users. setMouseCallback, as can be guessed from its
name,	sets	a	callback	function	to	handle	the	mouse	clicks	on	a	given	window.
The	following	callback	function	in	conjunction	with	the	variables	defined	here
can	be	used	to	create	a	convenient	object	selector:
  bool	selecting	=	false;	
  Rect	selection;	
  Point	spo;	//	selection	point	origin	
  	
  void	onMouse(int	event,	int	x,	int	y,	int	flags,	void*)	
  {	
  				switch(event)	
  				{	
  				case	EVENT_LBUTTONDOWN:	
  				{	
  								spo.x	=	x;	
  								spo.y	=	y;	
  								selection.x	=	spo.x;	
  								selection.y	=	spo.y;	
  								selection.width	=	0;	
  								selection.height	=	0;	
  								selecting	=	true;	
  	
  				}	break;	
  				case	EVENT_LBUTTONUP:	
  				{	
  								selecting	=	false;	
  				}	break;	
  				default:	
  				{	
  								selection.x	=	min(x,	spo.x);	
  								selection.y	=	min(y,	spo.y);	
  								selection.width	=	abs(x	-	spo.x);	
  								selection.height	=	abs(y	-	spo.y);	
  				}	break;	
  				}	
  }	
The event parameter contains an entry from the MouseEventTypes enum, which describes
whether	a	mouse	button	was	pressed	or	released.	Based	on	such	a	simple	event,
we	can	decide	when	the	user	is	actually	selecting	an	object	that's	visible	on	the
screen.	This	is	depicted	as	follows:
  if(selecting)	
  {	
  				Mat	sel(frame,	selection);	
  				bitwise_not(sel,	sel);	//	invert	the	selected	area	
  	
  				srchWnd	=	selection;	//	set	the	search	window	
  	
  				//	create	the	histogram	using	the	hue	of	the	selection	
  }	
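The commented line at the end of the preceding snippet is where the hue histogram of the selection would be calculated. A rough sketch of it, reusing the calcHist parameters from the back-projection step and assuming the hue channel of the current frame is already available in hue, could be the following:

  // calculate the hue histogram of the selected region only;
  // this histogram replaces the manually formed green histogram
  Mat selHue(hue, selection);

  int histSize[] = {bins};
  calcHist(&selHue, nimages, channels, Mat(),
           histogram, 1, histSize, ranges, true, false);
  normalize(histogram, histogram, 0, 255, NORM_MINMAX);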
This	allows	a	huge	amount	of	flexibility	for	our	applications,	and	the	code	is
bound	to	work	with	objects	of	any	color.	Make	sure	to	check	out	the	example
codes	for	this	chapter	from	the	online	Git	repository	for	a	complete	example
project	that	uses	all	the	topics	we've	learned	so	far	in	this	chapter.
Another	method	of	selecting	an	object	or	a	region	on	an	image	is	by	using	the
selectROI	and	selectROIs	functions	in	the	OpenCV	library.	These	functions	allow
the	user	to	select	a	rectangle	(or	rectangles)	on	an	image	using	simple	mouse
clicks	and	drags.	Note	that	the	selectROI	and	selectROIs	functions	are	easier	to	use
than	handling	mouse	clicks	using	callback	functions,	however	they	do	not	offer
the	same	amount	of	power,	flexibility,	and	customization.
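For completeness, here is a minimal sketch of how selectROI might be used for the same purpose; the window title is an arbitrary choice:

  // let the user drag a rectangle over the displayed frame;
  // the selection is confirmed with the Enter or Space key
  Rect selection = selectROI("Select Object", frame);
  if(selection.area() > 0)
      srchWnd = selection; // use the selection as the initial search window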
Before	moving	on	to	the	next	section,	let's	recall	that	meanShift	does	not	handle	an
increase	or	decrease	in	the	size	of	the	object	that	is	being	tracked,	nor	does	it
take	care	of	the	orientation	of	the	object.	These	are	probably	the	main	issues	that
have	led	to	the	development	of	a	more	sophisticated	version	of	the	Mean	Shift
algorithm,	which	is	the	next	topic	we're	going	to	learn	about	in	this	chapter.
Using	the	Continuously	Adaptive
Mean	(CAM)	Shift
To	overcome	the	limitations	of	the	Mean	Shift	algorithm,	we	can	use	an
improved	version	of	it,	which	is	called	the	Continuously	Adaptive	Mean	Shift,
or	simply	the	CAM	Shift	algorithm.	OpenCV	contains	the	implementation	for
the	CAM	Shift	algorithm	in	a	function	named	CamShift,	which	is	used	almost	in	an
identical	manner	to	the	meanShift	function.	The	input	parameters	of	the	CamShift
function	are	the	same	as	meanShift,	since	it	also	uses	a	back-projection	image	to
update	a	search	window	using	a	given	set	of	termination	criteria.	In	addition,
CamShift	also	returns	a	RotatedRect	object,	which	contains	both	the	search	window
and	its	angle.
Without	using	the	returned	RotatedRect	object,	you	can	simply	replace	any	call	to
the	meanShift	function	with	CamShift,	and	the	only	difference	would	be	that	the
results	will	be	scale-invariant,	meaning	the	search	window	will	become	bigger	if
the	object	is	nearer	(or	bigger)	and	vice	versa.	For	instance,	we	can	replace	the
call	to	the	meanShift	function	in	the	preceding	example	code	for	the	Mean	Shift
algorithm	with	the	following:
  CamShift(backProject,	
  									srchWnd,	
  									criteria);	
The	following	images	depict	the	result	of	replacing	the	meanShift	function	with
CamShift	in	the	example	from	the	previous	section:
Notice that the results are now scale-invariant, even though we didn't change anything except replacing the mentioned function. As the object moves farther away from the camera, or becomes smaller, the same Mean Shift algorithm is used to calculate its position; however, this time the search window is resized to fit the size of the object, and its rotation is also calculated, although we haven't used it yet.
To	be	able	to	use	the	rotation	value	of	the	object,	we	need	to	store	the	result	of
the	CamShift	function	in	a	RotatedRect	object	first,	as	seen	in	the	following	example:
  RotatedRect	rotRect	=	CamShift(backProject,	
  																															srchWnd,	
  																															criteria);	
To draw a RotatedRect object, or in other words a rotated rectangle, you must first use the points method of RotatedRect to extract the four corner points of the rotated rectangle, and then draw them all using the line function, as seen in the following example:
  Point2f	rps[4];	
  rotRect.points(rps);	
  for(int	i=0;	i<4;	i++)	
  				line(frame,	
  									rps[i],	
  									rps[(i+1)%4],	
  									Scalar(255,0,0),//	blue	color	
  									2);	
You	can	also	use	a	RotatedRect	object	to	draw	a	rotated	ellipse	that	is	covered	by
the	rotated	rectangle.	Here's	how:
  ellipse(frame,	
  								rotRect,	
  								Scalar(255,0,0),	
  								2);	
The	following	image	displays	the	result	of	using	the	RotatedRect	object	to	draw	a
rotated	rectangle	and	ellipse	at	the	same	time,	over	the	tracked	object:
In	the	preceding	image,	the	red	rectangle	is	the	search	window,	the	blue
rectangle	is	the	resulting	rotated	rectangle,	and	the	green	ellipse	is	drawn	by
using	the	resulting	rotated	rectangle.
To summarize, we can say that CamShift is far better suited than meanShift to dealing with objects of varying size and rotation; however, there are still a couple of possible enhancements that can be made when using the CamShift algorithm. First things first, the initial window size still needs to be set, but since CamShift takes care of size changes, we can simply set the initial window size to be the same as the whole image. This helps us avoid having to deal with the initial position and size of the search window. If we can also create the histogram of the object of interest using a previously saved file on disk, or any similar method, then we will have an object detector and tracker that works out of the box, at least for all the cases where our object of interest has a visibly different color than the environment.
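The complete project for this chapter builds the hue histogram of the object of interest from a previously saved object image, using the inRange function to exclude pixels that are too dark or too pale. The exact code is available in the online repository; the following is only a rough sketch of the idea, with assumed file names and bound values:

  // load the object (or an ROI of it) and convert it to HSV
  Mat objImg = imread("Object.png");
  Mat objImgHsv;
  cvtColor(objImg, objImgHsv, COLOR_BGR2HSV);

  // lower and higher bounds of the allowed hue, saturation, and value
  Scalar lb(0, 30, 30);     // lb = lower bound (example values)
  Scalar hb(180, 255, 255); // hb = higher bound (example values)

  // mask out pixels that fall outside the allowed range
  Mat mask;
  inRange(objImgHsv, lb, hb, mask);

  // extract the hue channel of the object image
  vector<Mat> objChannels;
  split(objImgHsv, objChannels);
  Mat objImgHue = objChannels[0];

  // calculate the hue histogram of the object, using the mask
  int bins = 360;
  int histSize[] = {bins};
  int nimages = 1;
  int channels[] = {0};
  float rangeHue[] = {0, 180};
  const float* ranges[] = {rangeHue};
  Mat histogram;
  calcHist(&objImgHue, nimages, channels, mask,
           histogram, 1, histSize, ranges, true, false);
  normalize(histogram, histogram, 0, 255, NORM_MINMAX);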
In	the	preceding	example	code,	the	variables	starting	with	lb	and	hb	refer	to	the
lower	bound	and	higher	bound	of	the	values	that	are	allowed	to	pass	the	inRange
function.	objImgHsv	is	obviously	a	Mat	object	containing	our	object	of	interest,	or	a
ROI	that	contains	our	object	of	interest.	objImgHue	is	the	first	channel	of	objImgHsv,
which	is	extracted	using	a	previous	call	to	the	split	function.	The	rest	of	the
parameters	are	nothing	new,	and	you've	already	used	them	in	previous	calls	to
the	functions	used	in	this	example.
Combining	all	of	the	algorithms	and	techniques	described	in	this	section	can
help	you	create	an	object-detector,	or	even	a	face-detector	and	tracker	that	can
work in real time and with stunning speed. However, you might still need to
account	for	the	noise	that	will	interfere,	especially	with	the	tracking,	which	is
almost	inevitable	because	of	the	nature	of	color-based	or	histogram-based
trackers.	One	of	the	most	widely	used	solutions	to	these	issues	is	the	subject	of
the	next	section	in	this	chapter.
Using	the	Kalman	filter	for	tracking
and	noise	reduction
The	Kalman	filter	is	a	popular	algorithm	that	is	used	for	reducing	the	noise	of	a
signal,	such	as	the	result	of	the	tracking	algorithm	that	we	used	in	the	preceding
section.	To	be	precise,	the	Kalman	filter	is	an	estimation	algorithm	that	is	used	to
predict	the	next	state	of	a	signal	based	on	previous	observations.	Digging	deep
into	the	definition	and	details	of	the	Kalman	filter	would	require	a	chapter	of	its
own,	but	we'll	try	to	walk	through	this	simple,	yet	extremely	powerful,	algorithm
with	a	couple	of	hands-on	examples	to	learn	how	it	is	used	in	practice.
For	the	first	example,	we're	going	to	write	a	program	that	tracks	the	mouse
cursor	while	it	is	moved	on	a	canvas,	or	the	OpenCV	window.	The	Kalman	filter
is	implemented	using	the	KalmanFilter	class	in	OpenCV	and	it	includes	all	(and
many	more)	of	the	Kalman	filter	implementation	details,	which	we'll	discuss	in
this	section.
     X,	or	position	in	x	direction
     Y,	or	position	in	y	direction
     X',	or	velocity	in	x	direction
     Y',	or	velocity	in	y	direction
        A	higher	dimensionality	of	the	parameters	can	also	be	used,	which	would	then	cause	the
        preceding	list	to	be	followed	by	X''	(acceleration	in	x	direction)	and	so	on.
Note that, in this example, the control and type parameters of the KalmanFilter constructor are simply omitted and left at their default values; providing them explicitly would have resulted in the same code, as seen here:
  KalmanFilter	kalman(4,	2,	0,	CV_32F);	
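Since the original declaration and the initialization of the transition matrix are not repeated here, the following sketch shows what they would look like for the constant-velocity model implied by the x, y, x', and y' parameters listed above; the comments mirror the next x and next y markers referred to in the following paragraph:

  // 4 dynamic parameters (x, y, x', y') and 2 measured parameters (x, y)
  KalmanFilter kalman(4, 2);

  // transition matrix of a simple constant-velocity model
  kalman.transitionMatrix = (Mat_<float>(4, 4) <<
      1, 0, 1, 0,   // next x  = x + x'
      0, 1, 0, 1,   // next y  = y + y'
      0, 0, 1, 0,   // next x' = x'
      0, 0, 0, 1);  // next y' = y'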
After	completing	the	steps	required	for	this	example,	it	would	be	wise	to	return
here	and	update	the	values	in	the	transition	matrix	and	observe	the	behavior	of
the	Kalman	filter.	For	instance,	try	to	update	the	matrix	row	that	corresponds	to
the	estimated	Y	(marked	as	next	y	in	the	comments)	and	you'll	notice	that	the
tracked	position	Y	value	is	affected	by	it.	Try	experimenting	with	all	of	the
values	in	the	transition	matrix	for	a	better	understanding	of	its	effect.
Besides	the	transition	matrix,	we	also	need	to	take	care	of	the	initialization	of	the
dynamic	parameters'	state	and	measurements,	which	are	the	initial	mouse
positions	in	our	example.	Here's	how	we	initialize	the	mentioned	values:
  Mat_<float>	pos(2,1);	
  pos.at<float>(0)	=	0;	
  pos.at<float>(1)	=	0;	
  	
  kalman.statePre.at<float>(0)	=	0;	//	init	x	
  kalman.statePre.at<float>(1)	=	0;	//	init	y	
  kalman.statePre.at<float>(2)	=	0;	//	init	x'	
  kalman.statePre.at<float>(3)	=	0;	//	init	y'
As	you'll	see	later	on,	the	KalmanFilter	class	requires	a	vector	instead	of	a	Point
object,	since	it	is	designed	to	work	with	higher	dimensionalities	too.	For	this
reason,	we'll	update	the	pos	vector	in	the	preceding	code	snippet	with	the	last
mouse	positions	before	performing	any	calculations.	Other	than	the
initializations	that	we	just	mentioned,	we	also	need	to	initialize	the	measurement
matrix	of	the	Kalman	filter.	This	is	done	as	seen	here:
  setIdentity(kalman.measurementMatrix);	
The setIdentity function initializes the given matrix as an identity matrix. If a Scalar value is provided in addition, then all elements of the identity matrix will be multiplied (or scaled) by the given Scalar value.
One	last	initialization	is	the	process	noise	covariance.	We'll	use	a	very	small
value	for	this,	which	causes	a	tracking	with	a	natural-movement	feeling,
although	with	a	little	bit	of	overshoot	when	tracking.	Here's	how	we	initialize	the
process-noise	covariance	matrix:
  setIdentity(kalman.processNoiseCov,	
  												Scalar::all(0.000001));	
It is also common to initialize the following matrices before using the KalmanFilter class:
     measurementNoiseCov, or the measurement-noise covariance matrix
     errorCovPre and errorCovPost, or the a priori and a posteriori error covariance matrices
     controlMatrix, which is only relevant when control parameters are used
Using	all	of	the	matrices	mentioned	in	the	preceding	list	will	provide	a	huge
amount	of	customization	to	the	KalmanFilter	class,	but	it	also	requires	a	great
amount	of	knowledge	about	the	type	of	noise-filtering	and	tracking	that	is
needed	and	the	environment	that	the	filter	will	be	implemented	in.	These
matrices	and	their	usage	have	deep	roots	in	control	theory,	and	control	science	in
general,	which	is	a	topic	for	another	book.	Note	that	in	our	example,	we'll	simply
use	the	default	values	of	the	mentioned	matrices,	thus	we	have	ignored	them
altogether.
The	next	thing	we	need	in	our	tracking	example	using	the	Kalman	filter	is	to	set
up	a	window	on	which	we	can	track	mouse	movements.	We	are	assuming	that
the	position	of	the	mouse	on	the	window	is	the	position	of	a	detected	and	tracked
object	and	we'll	use	our	KalmanFilter	object	to	predict	and	de-noise	these
detections,	or	in	Kalman-filter-algorithm	terminology,	we	are	going	to	correct
these	measurements.	We	can	use	the	namedWindow	function	to	create	a	window
using	OpenCV.	Consequently,	the	setMouseCallback	function	can	be	used	to	assign	a
callback	function	for	mouse	interactions	with	that	specific	window.	Here's	how
we	can	do	it:
  string	window	=	"Canvas";	
  namedWindow(window);	
  setMouseCallback(window,	onMouse);	
We've	used	the	word	Canvas	for	the	window,	but	obviously	you	can	use	any	other
name	you	like.	onMouse	is	the	callback	function	that	will	be	assigned	to	react	on
mouse	interactions	with	this	window.	It	is	defined	like	this:
  void	onMouse(int,	int	x,	int	y,	int,	void*)	
  {	
  				objectPos.x	=	x;	
  				objectPos.y	=	y;	
  }	
objectPos	in	the	preceding	code	snippet	is	the	Point	object	that	is	used	to	store	the
last	position	of	the	mouse	on	the	window.	It	needs	to	be	defined	globally	in	order
to	be	accessible	by	both	onMouse	and	the	main	function	in	which	we'll	use	the
KalmanFilter	class.	Here's	the	definition:
Point objectPos;
Now,	for	the	actual	tracking,	or	to	use	the	more	correct	terminology,	correcting
the	measurements	that	contain	noise,	we	need	to	use	the	following	code,	which
is	followed	by	the	required	explanations:
  vector<Point>	trackRoute;	
  while(waitKey(10)	<	0)	
  {	
  				//	empty	canvas	
  				Mat	canvas(500,	1000,	CV_8UC3,	Scalar(255,	255,	255));	
  	
  				pos(0)	=	objectPos.x;	
  				pos(1)	=	objectPos.y;	
  	
  				Mat	estimation	=	kalman.correct(pos);	
  	
  				Point	estPt(estimation.at<float>(0),	
  																estimation.at<float>(1));	
  	
  				trackRoute.push_back(estPt);	
  				if(trackRoute.size()	>	100)	
  								trackRoute.erase(trackRoute.begin());	
  	
  				polylines(canvas,	
  														trackRoute,	
  														false,	
  														Scalar(0,0,255),	
  														5);	
  	
  				imshow(window,	canvas);	
  	
  				kalman.predict();	
  }	
In	the	preceding	code,	we	are	using	the	trackRoute	vector	to	record	the	estimations
over the last 100 frames. Pressing any key will cause the while loop to exit and, consequently, the program to return. Inside the loop, where we actually use
the	KalmanFilter	class,	we	simply	perform	the	following	operations	in	order:
 1.	 Create	an	empty	Mat	object	to	be	used	as	a	canvas	to	draw,	and	for	the
     content	of	the	window	on	which	the	tracking	will	happen
 2.	 Read	the	objectPos,	which	contains	the	last	position	of	the	mouse	on	the
     window	and	store	it	in	the	pos	vector,	which	is	usable	with	the	KalmanFilter
     class
 3.	 Read	an	estimation	using	the	correct	method	of	the	KalmanFilter	class
 4.	 Convert	the	result	of	the	estimation	back	to	a	Point	object	that	can	be	used
     for	drawing
 5.	 Store	the	estimated	point	(or	the	tracked	point)	in	the	trackRoute	vector,	and
     make	sure	the	number	of	items	in	the	trackRoute	vector	doesn't	exceed	100,
     since	that	is	the	number	of	frames	for	which	we	want	to	keep	a	record	of
     estimated	points
 6.	 Use	the	polylines	function	to	draw	the	route,	stored	as	Point	objects	in
     trackRoute
 7.	 Display	the	results	using	the	imshow	function
 8.	 Update	the	internal	matrices	of	the	KalmanFilter	class	using	the	predict
     function
Try	executing	the	tracking	program	and	move	your	mouse	cursor	around	the
window	that	is	shown.	You'll	notice	a	smooth	tracking	result,	which	is	drawn
using	the	thick	red	line	that's	visible	in	the	following	screenshot:
Note	that	the	position	of	the	mouse	cursor	is	ahead	of	the	tracking,	and	the	noise
in	mouse	movement	is	almost	completely	removed.	It	is	a	good	idea	to	try	to
visualize	the	mouse	movement	for	a	better	comparison	between	the	KalmanFilter
results	and	actual	measurements.	Simply	add	the	following	code	to	the	loop	after
the	point	where	trackRoute	was	drawn	in	the	preceding	code:
  mouseRoute.push_back(objectPos);	
  if(mouseRoute.size()	>	100)	
  				mouseRoute.erase(mouseRoute.begin());	
  polylines(canvas,	
  										mouseRoute,	
  										false,	
  										Scalar(0,0,0),	
  										2);	
And	obviously,	you	need	to	define	the	mouseRoute	vector	before	entering	the	while
loop,	as	seen	here:
  vector<Point>	mouseRoute;	
Let's	try	the	same	application	with	this	minor	update	and	see	how	the	results
compare	to	each	other.	Here's	another	screenshot	depicting	the	results	of	actual
mouse	movements	and	corrected	movements	(or	tracked,	or	filtered,	depending
on	the	terminology)	drawn	in	the	same	window:
In	the	preceding	result,	the	arrow	is	simply	used	to	depict	the	overall	direction	of
an	extremely	noisy	measurement	(mouse	movement	or	detected	object	position,
depending	on	the	application),	which	is	drawn	using	a	thin	black	color,	and	the
corrected	results	using	the	Kalman	filter	algorithm,	which	are	drawn	with	a	thick
red line in the image. Try moving your mouse around and comparing the results
visually.	Remember	what	we	mentioned	about	the	KalmanFilter	internal	matrices
and	how	you	need	to	set	their	values	according	to	the	use	case	and	application?
For	instance,	a	bigger	process-noise	covariance	would	have	resulted	in	less	de-
noising	and	consequently	less	filtering.	Let's	try	setting	the	process	noise
covariance	value	to	0.001,	instead	of	the	previous	0.000001	value,	try	the	same
program	once	again,	and	compare	the	results.	Here's	how	you	can	set	the
process-noise	covariance:
  setIdentity(kalman.processNoiseCov,	
  												Scalar::all(0.001));	
Now,	try	running	the	program	once	again	and	you	can	easily	notice	that	less	de-
noising	is	happening	as	you	move	the	mouse	cursor	around	the	window:
By	now,	you	can	probably	guess	that	the	result	of	setting	an	extremely	high
value	for	process-noise	covariance	would	be	almost	the	same	as	using	no	filter	at
all.	This	is	the	reason	why	setting	correct	values	for	the	Kalman	filter	algorithm
is	extremely	important	and	also	the	reason	why	it	depends	so	much	on	the
application.	However,	there	are	methods	for	setting	most	of	those	parameters
programmatically,	and	even	dynamically	while	the	tracking	is	being	done	to
achieve	the	best	results.	For	instance,	using	a	function	that	is	able	to	determine
the	possible	amount	of	noise	at	any	given	moment,	we	can	dynamically	set	the
process	noise	covariance	to	high	or	low	values	for	less	and	more	de-noising	of
the	measurements.
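As a sketch of that idea, and assuming a hypothetical estimateNoiseLevel helper function (it is not part of OpenCV), the adjustment could look like this:

  // estimateNoiseLevel is a hypothetical helper returning a value between
  // 0.0 (clean measurements) and 1.0 (very noisy measurements)
  double noise = estimateNoiseLevel(backProject);

  // noisier measurements -> smaller process-noise covariance -> stronger de-noising
  setIdentity(kalman.processNoiseCov,
              Scalar::all(noise > 0.5 ? 0.000001 : 0.001));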
You	can	refer	to	the	online	source	code	repository	of	this	chapter	for	the
complete	example	project	containing	the	preceding	code,	which	is	almost
identical	to	what	you	saw	in	the	previous	sections,	except	for	the	simple	fact	that
KalmanFilter	is	used	to	correct	the	detected	and	tracked	object	position.	As	you	can
see,	objectPos,	which	was	previously	read	from	the	mouse	movement	position,	is
now	set	to	the	central	point	of	the	search	window.	After	that,	the	correct	function
is	called	to	perform	an	estimation	and	the	results	are	displayed	by	drawing	a
green	cross	mark	for	the	corrected	tracking	result.	Besides	the	main	advantage	of
using	the	Kalman	filter	algorithm,	which	is	useful	for	getting	rid	of	noise	in
detection	and	tracking	results,	it	can	also	help	with	cases	where	detection	is
momentarily	lost	or	impossible.	Although	losing	detection	is,	technically
speaking,	an	extreme	case	of	noise,	we're	trying	to	point	out	the	difference	in
terms	of	computer	vision.
By	going	through	a	few	examples	and	also	experimenting	with	your	own
projects	where	the	Kalman	filter	can	be	of	help	and	trying	different	set	of
parameters	for	it,	you'll	instantly	understand	its	long-standing	popularity	for
whenever	a	practical	algorithm	for	correcting	a	measurement	(that	contains
noise)	is	required.	What	we	learned	in	this	section	was	a	fairly	simple	case	of
how	the	Kalman	filter	is	used	(which	was	enough	for	our	use	case),	but	it	is
important	to	note	that	the	same	algorithm	can	be	used	to	de-noise	measurements
of	higher	dimensionalities	and	with	much	more	complexity.
How	to	extract	the
background/foreground
The	segmentation	of	background	and	foreground	content	in	images	is	one	of	the
most	important	video	and	motion	analysis	topics,	and	there	has	been	a	huge
amount	of	research	done	in	this	area	to	provide	some	very	practical	and	easy-to-
use	algorithms,	which	we're	going	to	learn	in	the	final	section	of	this	chapter.
Current	versions	of	OpenCV	include	the	implementation	of	two	background
segmentation	algorithms.
        To	use	a	terminology	that	is	shorter,	clearer,	and	more	compatible	with	OpenCV	functions	and
        classes,	we'll	refer	to	background/foreground	extraction	and	background/foreground
        segmentation	simply	as	background	segmentation.
The	following	two	algorithms	are	available	by	default	to	be	used	for	background
segmentation	using	OpenCV:
     BackgroundSubtractorKNN
     BackgroundSubtractorMOG2
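Both of these can be created using their corresponding factory functions. For instance, the bgs object used in the following code (an instance of the KNN-based subtractor, which is the one referred to later in this section) can be created with its default parameters as seen here:

  // create a KNN-based background subtractor with its default parameters
  Ptr<BackgroundSubtractorKNN> bgs = createBackgroundSubtractorKNN();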
Now,	we	can	simply	use	bgs	to	extract	foreground	masks	from	a	video	and	use	it
to	detect	movements	or	an	object	entering	the	scene.	Here's	how:
  VideoCapture	cam(0);	
  if(!cam.isOpened())	
  				return	-1;	
  	
  while(true)	
  {	
  				Mat	frame;	
  				cam	>>	frame;	
  				if(frame.empty())	
  								break;	
  	
  				Mat	fgMask;	//	foreground	mask	
  				bgs->apply(frame,	
  															fgMask);	
  	
  				Mat	fg;	//	foreground	image	
  				bitwise_and(frame,	frame,	fg,	fgMask);	
  	
  				Mat	bg;	//	background	image	
  				bgs->getBackgroundImage(bg);	
  	
  				imshow("Input	Image",	frame);	
  				imshow("Background	Image",	bg);	
  				imshow("Foreground	Mask",	fgMask);	
  				imshow("Foreground	Image",	fg);	
  	
  				int	key	=	waitKey(10);	
  				if(key	==	27)	//	escape	key	
  								break;	
  }	
  	
  cam.release();
Let's	quickly	review	the	parts	of	the	previous	code	that	are	new	and	maybe	not
that	obvious.	First	things	first,	we	use	the	apply	function	of	the
BackgroundSubtractorKNN	class	to	perform	a	background/foreground	segmentation
operation.	This	function	also	updates	the	internal	sampled	background	image	for
us.	After	that,	we	use	the	bitwise_and	function	with	the	foreground	mask	to	extract
the	foreground	image's	content.	To	retrieve	the	sampled	background	image	itself,
we	simply	use	the	getBackgroundImage	function.	Finally,	we	display	all	of	the	results.
Here	are	some	example	results	that	depict	a	scene	(top-left),	the	extracted
background	image	(top-right),	the	foreground	mask	(bottom-left),	and	the
foreground	image	(bottom-right):
Notice	that	the	shadow	of	the	hand	that	moved	into	the	scene	is	also	captured	by
the	background	segmentation	algorithm.	In	our	example,	we	omitted	the
learningRate	parameter	when	using	the	apply	function.	This	parameter	can	be	used
to	set	the	rate	at	which	the	learned	background	model	is	updated.	A	value	of	0
means	the	model	will	not	be	updated	at	all,	which	can	be	quite	useful	if	you	are
sure that the background will stay the same for a known period. A value of 1.0 means an extremely quick update of the model. In our example, we skipped this parameter, which causes the default value of -1.0 to be used, meaning that the algorithm itself will decide on the learning rate. Another important thing to note
is	that	the	result	of	the	apply	function	can	yield	an	extremely	noisy	mask,	which
can	be	smoothed	out	by	using	a	simple	blur	function,	such	as	medianBlur,	as	seen
here:
  medianBlur(fgMask,fgMask,3);	
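As for the learningRate parameter mentioned above, passing it explicitly is simply a matter of adding a third argument to the apply call; the value used here is just an arbitrary example:

  // update the background model slowly; values range from 0.0 (never update)
  // to 1.0 (re-learn on every frame), and -1.0 (the default) lets the
  // algorithm choose the rate itself
  bgs->apply(frame, fgMask, 0.005);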
In	the	next	chapter,	we'll	be	learning	about	many	feature	extraction	algorithms,
functions,	and	classes,	and	how	to	use	features	to	detect	objects	or	extract	useful
information	from	images	based	on	their	key	points	and	descriptors.
	
Questions
	
    1.	 All	the	examples	in	this	chapter	that	deal	with	cameras	return	when	there	is
        a	single	failed	or	corrupted	frame	that	leads	to	the	detection	of	an	empty
        frame.	What	type	of	modification	is	needed	to	allow	a	predefined	number
        of	retries	before	stopping	the	process?
    2.	 How	can	we	call	the	meanShift	function	to	perform	the	Mean	Shift	algorithm
        with	10	iterations	and	an	epsilon	value	of	0.5?
    3.	 How	can	we	visualize	the	hue	histogram	of	the	tracked	object?	Assume
        CamShift	is	used	for	tracking.
    4.	 Set	the	process-noise	covariance	in	the	KalmanFilter	class	so	that	the	filtered
        and	measured	values	overlap.	Assume	only	the	process-noise	covariance	is
        set,	of	all	the	available	matrices	for	KalmanFilter	class-behavior	control.
	
Object	Detection	–	Features	and
Descriptors
In	the	previous	chapter,	we	learned	about	processing	videos	and	how	to	perform
the	operations	and	algorithms	from	all	of	the	previous	chapters	on	frames	read
from	cameras	or	video	files.	We	learned	that	each	video	frame	can	be	treated	as
an	individual	image,	so	we	can	easily	use	algorithms,	such	as	filtering,	on	videos
in	almost	the	same	way	as	we	did	with	images.	After	learning	how	to	process
videos	using	algorithms	that	work	on	single	individual	frames,	we	moved	on	to
learn	about	video	processing	algorithms	that	require	a	set	of	consecutive	video
frames	to	perform	object	detection,	tracking,	and	so	on.	We	learned	about	how	to
use	the	magic	of	the	Kalman	filter	to	improve	object-tracking	results,	and	ended
the	chapter	by	learning	about	background	and	foreground	extraction.
The	object	detection	(and	tracking)	algorithms	that	we	learned	about	in	the
previous	chapter	rely	heavily	on	the	color	of	an	object,	which	has	proven	not	to
be	too	reliable,	especially	if	the	object	and	the	environment	we	are	working	with
are	not	controlled	in	terms	of	lighting.	We	all	know	that	the	brightness	and	color
of	an	object	can	easily	(and	sometimes	extremely)	change	under	sunlight	and
moonlight,	or	if	a	light	of	a	different	color	is	near	the	object,	such	as	a	red	traffic
light.	These	difficulties	are	the	reason	why	the	detection	of	objects	is	more
reliable	when	their	physical	shape	and	features	are	used	as	a	basis	for	object
detection algorithms. Obviously, the shape of an object is independent of its color. A circular object will remain circular during the day or night, so an
algorithm	that	is	capable	of	extracting	the	shape	of	such	an	object	would	be	more
reliable	to	be	used	for	detecting	that	object.
In	this	chapter,	we're	going	to	learn	about	computer	vision	algorithms,	functions,
and	classes	that	can	be	used	to	detect	and	recognize	objects	using	their	features.
We'll	learn	about	a	number	of	algorithms	that	can	be	used	for	shape	extraction
and	analysis,	and	then	we'll	proceed	to	learning	about	key-point	detection	and
descriptor-extraction	algorithms.	We'll	also	learn	how	to	match	descriptors	from
two	images	to	detect	objects	of	known	shapes	in	an	image.	In	addition	to	the
topics	that	we	just	mentioned,	this	chapter	will	also	include	the	required
functions	for	proper	visualization	of	key	points	and	matching	results.
Technical requirements
Refer to Chapter 2, Getting Started with OpenCV, for more information about how to set up a personal computer and make it ready for developing computer vision applications using the OpenCV library.
You can use the following URL to download the source code and examples for this chapter: https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/master/Chapter07.
	
Template	matching	for	object
detection
Before	we	start	with	the	shape-analysis	and	feature-analysis	algorithms,	we	are
going	to	learn	about	an	easy-to-use,	extremely	powerful	method	of	object
detection called template matching. Strictly speaking, this algorithm does not fall into the category of algorithms that use any knowledge about the shape of an object; instead, it uses a previously acquired template image of an object, which can be used to extract a template-matching result and consequently detect objects of a known look, size, and orientation. You can use the matchTemplate function in OpenCV to perform a template-matching operation. Here's an example that demonstrates
the	complete	usage	of	the	matchTemplate	function:
    Mat	object	=	imread("Object.png");	
    Mat	objectGr;	
    cvtColor(object,	objectGr,	COLOR_BGR2GRAY);	
    Mat	scene	=	imread("Scene.png");	
    Mat	sceneGr;	
    cvtColor(scene,	sceneGr,	COLOR_BGR2GRAY);	
    	
    TemplateMatchModes	method	=	TM_CCOEFF_NORMED;	
    	
    Mat	result;	
    matchTemplate(sceneGr,	objectGr,	result,	method);	
method must be an entry from the TemplateMatchModes enum, which can be any of the following values:
      TM_SQDIFF
      TM_SQDIFF_NORMED
      TM_CCORR
      TM_CCORR_NORMED
      TM_CCOEFF
      TM_CCOEFF_NORMED
For	detailed	information	about	each	template-matching	method,	you	can	refer	to
the	OpenCV	documentation.	For	our	practical	examples,	and	to	learn	how	the
matchTemplate	function	is	used	in	practice,	it	is	important	to	note	that	each	method
will	result	in	a	different	type	of	result,	and	consequently	a	different	interpretation
of	the	result	is	required,	which	we'll	learn	about	in	this	section.	In	the	preceding
example,	we	are	trying	to	detect	an	object	in	a	scene	by	using	an	object	image
and	a	scene	image.	Let's	assume	the	following	images	are	the	object	(left-hand
side)	and	the	scene	(right-hand	side)	that	we'll	be	using:
The	very	simple	idea	in	template	matching	is	that	we	are	searching	for	a	point	in
the	scene	image	on	the	right-hand	side	that	has	the	highest	possibility	of
containing	the	image	on	the	left-hand	side,	or	in	other	words,	the	template
image.	The	matchTemplate	function,	depending	on	the	method	that	is	used,	will
provide	a	probability	distribution.	Let's	visualize	the	result	of	the	matchTemplate
function	to	better	understand	this	concept.	Another	important	thing	to	note	is	that
we	can	only	properly	visualize	the	result	of	the	matchTemplate	function	if	we	use
any	of	the	methods	ending	with	_NORMED,	which	means	they	contain	a	normalized
result; otherwise, we have to use the normalize function to create a result that
contains	values	in	the	displayable	range	of	the	OpenCV	imshow	function.	Here	is
how	it	can	be	done:
  normalize(result,	result,	0.0,	1.0,	NORM_MINMAX,	-1);	
This	function	call	will	translate	all	the	values	in	result	to	the	range	of	0.0	and	1.0,
which	can	then	be	properly	displayed.	Here	is	how	the	resulting	image	will	look
if	it	is	displayed	using	the	imshow	function:
As	mentioned	previously,	the	result	of	the	matchTemplate	function	and	how	it
should	be	interpreted	depends	completely	on	the	template	matching	method	that
is	used.	In	the	case	that	we	use	the	TM_SQDIFF	or	TM_SQDIFF_NORMED	methods	for
template	matching,	we	need	to	look	for	the	global	minimum	point	in	the	result	(it
is	shown	using	an	arrow	in	the	preceding	image),	which	has	the	highest
possibility	of	containing	the	template	image.	Here's	how	we	can	find	the	global
minimum	point	(along	with	global	maximum,	and	so	on)	in	the	template
matching	result:
  double	minVal,	maxVal;	
  Point	minLoc,	maxLoc;	
  minMaxLoc(result,	&minVal,	&maxVal,	&minLoc,	&maxLoc);	
Since	the	template-matching	algorithm	works	only	with	objects	of	a	fixed	size
and	orientation,	we	can	assume	that	a	rectangle	that	has	an	upper-left	point	that
is	equal	to	the	minLoc	point	and	has	a	size	that	equals	the	template	image	is	the
best	possible	bounding	rectangle	for	our	object.	We	can	draw	the	result	on	the
scene	image,	for	better	comparison,	using	the	following	sample	code:
  Rect	rect(minLoc.x,	
  										minLoc.y,	
  										object.cols,	
  										object.rows);	
  	
  Scalar	color(0,	0,	255);	
  int	thickness	=	2;	
  rectangle(scene,	
  										rect,	
  										color,	
  										thickness);
The	following	image	depicts	the	result	of	the	object	detection	operation	that	was
performed	using	the	matchTemplate	function:
If	we	use	TM_CCORR,	TM_CCOEFF,	or	their	normalized	versions,	we	must	use	the	global
maximum	point	as	the	point	with	the	highest	possibility	of	containing	our
template	image.	The	following	image	depicts	the	result	of	the	TM_CCOEFF_NORMED
method	used	with	the	matchTemplate	function:
As	you	can	see,	the	brightest	point	in	the	resultant	image	corresponds	to	the
upper-left	point	of	the	template	image	in	the	scene	image.
Before ending our template-matching lesson, let's also note that the width and height of the template-matching result image are smaller than those of the scene image. This is because the result image can only contain the possible upper-left points of the template image inside the scene image, so its width and height are the scene image's width and height minus the template image's width and height, plus one pixel in each direction.
	
Detecting	corners	and	edges
	
It	is	not	always	possible	to	just	compare	images	pixel-wise	and	decide	whether
an	object	is	present	in	an	image	or	not,	or	whether	an	object	has	the	expected
shape	or	not,	and	many	more	similar	scenarios	that	we	can't	even	begin	to	list
here.	That	is	why	the	smarter	way	of	interpreting	the	contents	of	an	image	is	to
look	for	meaningful	features	in	it,	and	then	base	our	interpretation	on	the
properties	of	those	features.	In	computer	vision,	a	feature	is	synonymous	with	a
keypoint,	so	don't	be	surprised	if	we	use	them	interchangeably	in	this	book.	In
fact,	the	word	keypoint	is	better	suited	to	describe	the	concept,	since	the	most
commonly	used	features	in	an	image	are	usually	key	points	in	that	image	where
there	is	a	sudden	change	in	color	intensity,	which	can	happen	in	corners	and
edges	of	shapes	and	objects	in	an	image.
In	this	section,	we'll	learn	about	some	of	the	most	important	and	widely	used
keypoint-detection	algorithms,	namely	the	corner-	and	edge-detection	algorithms
that	are	the	basis	of	almost	all	of	the	feature-based	object	detection	algorithms
we'll	be	learning	about	in	this	chapter.
	
Learning	the	Harris	corner-detection
algorithm
One	of	the	most	well-known	corner-	and	edge-detection	algorithms	is	the	Harris
corner-detection	algorithm,	which	is	implemented	in	the	cornerHarris	function	in
OpenCV.	Here	is	how	this	function	is	used:
    Mat	image	=	imread("Test.png");	
    cvtColor(image,	image,	COLOR_BGR2GRAY);	
    	
    Mat	result;	
    int	blockSize	=	2;	
    int	ksize	=	3;	
    double k = 0.04; // Harris free parameter, commonly 0.04 to 0.06
    cornerHarris(image,	
    													result,	
    													blockSize,	
    													ksize,	
    													k);
blockSize	determines	the	width	and	height	of	the	square	block	over	which	the
Harris	corner-detection	algorithm	will	calculate	a	2	x	2	gradient-covariance
matrix.	ksize	is	the	kernel	size	of	the	Sobel	operator	internally	used	by	the	Harris
algorithm.	The	preceding	example	demonstrates	one	of	the	most	commonly	used
sets	of	Harris	algorithm	parameters,	but	for	more	detailed	information	about	the
Harris corner-detection algorithm and its internal mathematics, you can refer to
the	OpenCV	documentation.	It's	important	to	note	that	the	result	object	from	the
preceding	example	code	is	not	displayable	unless	it	is	normalized	using	the
following	example	code:
    normalize(result,	result,	0.0,	1.0,	NORM_MINMAX,	-1);	
Here	is	the	result	of	the	Harris	corner-detection	algorithm	from	the	preceding
example,	when	normalized	and	displayed	using	the	OpenCV	imshow	function:
The	OpenCV	library	includes	another	famous	corner-detection	algorithm,	called
Good	Features	to	Track	(GFTT).	You	can	use	the	goodFeaturesToTrack	function	in
OpenCV	to	use	the	GFTT	algorithm	to	detect	corners,	as	seen	in	the	following
example:
  Mat	image	=	imread("Test.png");	
  Mat	imgGray;	
  cvtColor(image,	imgGray,	COLOR_BGR2GRAY);	
  	
  vector<Point2f>	corners;	
  int	maxCorners	=	500;	
  double	qualityLevel	=	0.01;	
  double	minDistance	=	10;	
  Mat	mask;	
  int	blockSize	=	3;	
  int	gradientSize	=	3;	
  bool	useHarrisDetector	=	false;	
  double	k	=	0.04;	
  goodFeaturesToTrack(imgGray,	
  																				corners,	
  																				maxCorners,	
  																				qualityLevel,	
  																				minDistance,	
  																				mask,	
  																				blockSize,	
  																				gradientSize,	
  																				useHarrisDetector,	
  																				k);	
As you can see, this function requires a single-channel image, so, before doing
anything else, we have converted our BGR image to grayscale. This function also
uses the maxCorners value to limit the number of detected corners based on how
strong they are as candidates. Setting maxCorners to zero or a negative value
means that all detected corners are returned, which is not a good idea if you are
looking for the best corners in an image, so make sure you set a reasonable value
based on the environment in which the algorithm will be used.
qualityLevel	is	the	internal	threshold	value	for	accepting	detected	corners.
minDistance	is	the	minimum	allowed	distance	between	returned	corners.	This	is
another	parameter	that	is	completely	dependent	on	the	environment	this
algorithm	will	be	used	in.	You	have	already	seen	the	remaining	parameters	in	the
previous	algorithms	from	this	chapter	and	the	preceding	one.	It's	important	to
note	that	this	function	also	incorporates	the	Harris	corner-detection	algorithm,
so,	by	setting	useHarrisDetector	to	true,	the	resultant	features	will	be	calculated
using	the	Harris	corner-detection	algorithm.
You	might	have	already	noticed	that	the	goodFeaturesToTrack	function	returns	a	set
of	Point	objects	(Point2f	to	be	precise)	instead	of	a	Mat	object.	The	returned	corners
vector	simply	contains	the	best	possible	corners	detected	in	the	image	using	the
GFTT	algorithm,	so	we	can	use	the	drawMarker	function	to	visualize	the	results
properly,	as	seen	in	the	following	example:
  Scalar	color(0,	0,	255);	
  MarkerTypes	markerType	=	MARKER_TILTED_CROSS;	
  int	markerSize	=	8;	
  int	thickness	=	2;	
  for(int	i=0;	i<corners.size();	i++)	
  {	
  				drawMarker(image,	
  															corners[i],	
  															color,	
  															markerType,	
  															markerSize,	
  															thickness);	
  }
Here	is	the	result	of	the	preceding	example	and	detecting	corners	using	the
goodFeaturesToTrack	function:
You	can	also	use	the	GFTTDetector	class	to	detect	corners	in	a	similar	way	as	you
did	with	the	goodFeaturesToTrack	function.	The	difference	here	is	that	the	returned
type	is	a	vector	of	KeyPoint	objects.	Many	OpenCV	functions	and	classes	use	the
KeyPoint	class	to	return	various	properties	of	detected	keypoints,	instead	of	just	a
Point object that corresponds to the location of the keypoint. Let's see how this class is used in practice.
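The original example is not reproduced here; a minimal sketch of how GFTTDetector can be used with its default parameters (producing a vector of KeyPoint objects from the grayscale image we prepared earlier) might look like this:
  Ptr<GFTTDetector> detector = GFTTDetector::create();
  vector<KeyPoint> keypoints;
  detector->detect(imgGray, keypoints);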
But	let's	get	back	to	the	KeyPoint	class	and	the	result	of	the	detect	function	from
the	previous	example.	Recall	that	we	used	a	loop	to	go	through	all	of	the
detected	points	and	draw	them	on	the	image.	There	is	no	need	for	this	if	we	use
the	GFTTDetector	class,	since	we	can	use	an	existing	OpenCV	function	called
drawKeypoints	to	properly	visualize	all	of	the	detected	keypoints.	Here's	how	this
function	is	used:
    Mat	outImg;	
    drawKeypoints(image,	
    														keypoints,	
    														outImg);	
The	drawKeypoints	function	goes	through	all	KeyPoint	objects	in	the	keypoints	vector
and	draws	them	using	random	colors	on	image	and	saves	the	result	in	the	outImg
object,	which	we	can	then	display	by	calling	the	imshow	function.	The	following
image	is	the	result	of	the	drawKeypoints	function	when	it	is	called	using	the
preceding	example	code:
The	drawKeypoints	function	can	be	provided	with	an	additional	(optional)	color
parameter	in	case	we	want	to	use	a	specific	color	instead	of	random	colors.	In
addition,	we	can	also	provide	a	flag	parameter	that	can	be	used	to	further
enhance	the	visualized	result	of	the	detected	keypoints.	For	instance,	if	the	flag
is	set	to	DRAW_RICH_KEYPOINTS,	the	drawKeypoints	function	will	also	use	the	size	and
orientation	values	in	each	detected	keypoint	to	visualize	more	properties	of
keypoints.
        Each	KeyPoint	object	may	contain	the	following	properties,	depending	on	the	algorithm	used
        for	calculating	it:
        -	pt:	A	Point2f	object	containing	the	coordinates	of	the	keypoint.
        -	size:	The	diameter	of	the	meaningful	keypoint	neighborhood.
        -	angle:	The	orientation	of	the	keypoint	in	degrees,	or	-1	if	not	applicable.
        -	response:	The	strength	of	the	keypoint	determined	by	the	algorithm.
        -	octave:	The	octave	or	pyramid	layer	from	which	the	keypoint	was	extracted.	Using	octaves
        allows	us	to	deal	with	keypoints	detected	from	the	same	image	but	in	different	scales.
        Algorithms	that	set	this	value	usually	require	an	input	octave	parameter,	which	is	used	to
        define	the	number	of	octaves	(or	scales)	of	an	image	that	is	used	to	extract	keypoints.
        -	class_id:	This	integer	parameter	can	be	used	to	group	keypoints,	for	instance,	when	keypoints
        belong	to	a	single	object,	they	can	have	the	same	optional	class_id	value.
In	addition	to	Harris	and	GFTT	algorithms,	you	can	also	use	the	FAST	corner-
detection	algorithm	using	the	FastFeatureDetector	class,	and	the	AGAST	corner-
detection	algorithm	(Adaptive	and	Generic	Corner	Detection	Based	on	the
Accelerated	Segment	Test)	using	the	AgastFeatureDetector	class,	quite	similar	to
how	we	used	the	GFTTDetector	class.	It's	important	to	note	that	all	of	these	classes
belong	to	the	features2d	module	in	the	OpenCV	library	and	they	are	all	subclasses
of	the	Feature2D	class,	therefore	all	of	them	contain	a	static	create	function	that
creates	an	instance	of	their	corresponding	classes	and	a	detect	function	that	can
be	used	to	extract	the	keypoints	from	an	image.
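For instance, a minimal sketch of detecting corners with the FAST algorithm (the parameter values shown here are only illustrative defaults, not prescribed ones) could look like the following:
  int threshold = 10;
  bool nonmaxSuppression = true;
  Ptr<FastFeatureDetector> fast =
          FastFeatureDetector::create(threshold,
                                      nonmaxSuppression,
                                      FastFeatureDetector::TYPE_9_16);
  vector<KeyPoint> keypoints;
  fast->detect(imgGray, keypoints);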
Try	increasing	the	threshold	value	if	too	many	corners	are	detected.	Also,	make
sure	to	check	out	the	OpenCV	documentation	for	more	information	about	the
type	parameter	used	in	the	FastFeatureDetector	class.	As	mentioned	previously,	you
can	simply	omit	all	of	the	parameters	in	the	preceding	example	code	to	use	the
default	values	for	all	parameters.
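The example that the next paragraph refers to is not shown here; a hypothetical reconstruction, in which an integer algorithm value selects the corner-detection algorithm at run-time, might look like this:
  int algorithm = 1; // hypothetical value, set at run-time
  Ptr<Feature2D> detector;
  switch(algorithm)
  {
  case 1:
      detector = FastFeatureDetector::create();
      break;
  case 2:
      detector = AgastFeatureDetector::create();
      break;
  default:
      detector = GFTTDetector::create();
      break;
  }
  vector<KeyPoint> keypoints;
  detector->detect(imgGray, keypoints);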
The algorithm value, in the preceding example, is an integer that can be set at
run-time and will change the type of the corner-detection algorithm assigned to
the detector object, which has the Feature2D type, or, in other words, the base
class of all corner-detection algorithms.
Edge-detection	algorithms
Now	that	we've	gone	through	corner-detection	algorithms,	let's	take	a	look	at
edge-detection	algorithms,	which	are	crucial	when	it	comes	to	shape	analysis	in
computer	vision.	OpenCV	contains	a	number	of	algorithms	that	can	be	used	to
extract	edges	from	images.	The	first	edge-detection	algorithm	that	we're	going	to
learn	about	is	called	the	line-segment-detection	algorithm,	and	it	can	be
performed	by	using	the	LineSegmentDetector	class,	as	seen	in	the	following	example:
  Mat	image	=	imread("Test.png");	
  Mat	imgGray;	
  cvtColor(image,	imgGray,	COLOR_BGR2GRAY);	
  	
  Ptr<LineSegmentDetector>	detector	=	createLineSegmentDetector();	
  	
  vector<Vec4f>	lines;	
  detector->detect(imgGray,	
  																	lines);	
To	have	more	control	over	how	the	resultant	lines	are	visualized,	you	might	want
to	manually	draw	the	lines	vector,	as	seen	in	the	following	example:
  Mat	result(image.size(),	
  											CV_8UC3,	
  											Scalar(0,	0,	0));	
  	
  Scalar	color(0,0,255);	
  int	thickness	=	2;	
  for(int	i=0;	i<lines.size();	i++)	
  {	
  				line(result,	
  									Point(lines.at(i)[0],	
  												lines.at(i)[1]),	
  									Point(lines.at(i)[2],	
  												lines.at(i)[3]),	
  									color,	
  									thickness);	
  }	
For	more	details	about	how	to	customize	the	behavior	of	the	LineSegmentDetector
class,	make	sure	to	view	the	documentation	of	createLineSegmentDetector	and	its
parameters.	In	our	example,	we	simply	omitted	all	of	its	input	parameters	and
used	the	LineSegmentDetector	class	with	the	default	values	set	for	its	parameters.
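The code that the next paragraph refers to is not included here; a minimal sketch of drawing the detected lines and comparing two sets of line segments (lines1 and lines2 are assumed to be two previously detected sets) might be:
  detector->drawSegments(image, lines);

  Size imageSize = imgGray.size();
  int result = detector->compareSegments(imageSize, lines1, lines2, image);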
In the preceding code, imageSize is a Size object that contains the size of the
input image from which the lines were extracted. result is an integer that holds
the return value of the comparison, that is, the compareSegments function, and it
will be zero in the case of completely overlapping pixels.
The	next	edge-detection	algorithm	is	probably	one	of	the	most	widely	used	and
cited	edge-detection	algorithms	in	computer	vision,	called	the	Canny
algorithm,	which	has	a	function	of	the	same	name	in	OpenCV.	The	biggest
advantage	of	the	Canny	function	is	the	simplicity	of	its	input	parameters.	Let's	first
see	an	example	of	how	it's	used,	and	then	walk	through	its	details:
  Mat	image	=	imread("Test.png");	
  	
  double	threshold1	=	100.0;	
  double	threshold2	=	200.0;	
  int	apertureSize	=	3;	
  bool	L2gradient	=	false;	
  Mat	edges;	
  Canny(image,	
  						edges,	
  						threshold1,	
  						threshold2,	
  						apertureSize,	
  						L2gradient);	
The	threshold	values	(threshold1	and	threshold2)	are	the	lower	and	higher	bound
values	for	thresholding	the	input	image.	apertureSize	is	the	internal	Sobel	operator
aperture	size,	and	L2gradient	is	used	to	enable	or	disable	the	more	accurate	L2
norm	when	calculating	the	gradient	image.	The	result	of	the	Canny	function	is	a
grayscale	image	that	contains	white	pixels	where	edges	are	detected	and	black
pixels	for	the	rest	of	the	pixels.	This	makes	the	result	of	the	Canny	function	a
suitable	mask	wherever	such	a	mask	is	needed,	or,	as	you'll	see	later	on,	a
suitable	set	of	points	to	extract	contours	from.
The	following	image	depicts	the	result	of	the	Canny	function	used	in	the	previous
example:
As	we	mentioned	before,	the	result	of	the	Canny	function	is	suitable	to	use	as	the
input	to	algorithms	that	require	a	binary	image,	or	in	other	words,	a	grayscale
image	containing	only	absolute	black	and	absolute	white	pixel	values.	The	next
algorithm	that	we'll	learn	about	is	one	where	the	result	of	a	previous	Canny
function	must	be	used	as	the	input,	and	it	is	called	the	Hough	transformation.
The	Hough	transformation	can	be	used	to	extract	lines	from	an	image,	and	it	is
implemented	in	a	function	called	HoughLines	in	the	OpenCV	library.
Here	is	a	complete	example	that	demonstrates	how	the	HoughLines	function	is	used
in	practice:
 1.	 Call	the	Canny	function	to	detect	edges	in	the	input	image,	as	seen	here:
         Mat	image	=	imread("Test.png");	
         	
         double	threshold1	=	100.0;	
         double	threshold2	=	200.0;	
         int	apertureSize	=	3;	
         bool	L2gradient	=	false;	
         Mat	edges;	
         Canny(image,	
         						edges,	
         						threshold1,	
         						threshold2,	
         						apertureSize,	
         						L2gradient);	
 2.	 Call	the	HoughLines	function	to	extract	lines	from	the	detected	edges:
         vector<Vec2f>	lines;	
         double	rho	=	1.0;	//	1	pixel,	r	resolution	
         double	theta	=	CV_PI	/	180.0;	//	1	degree,	theta	resolution	
         int	threshold	=	100;	//	minimum	number	of	intersections	to	"detect"	a	line	
         HoughLines(edges,	
         											lines,	
         											rho,	
         											theta,	
         											threshold);	
 3.	 Use	the	following	code	to	extract	points	in	the	standard	coordinate	system,
     and	draw	them	over	the	input	image:
         Scalar	color(0,0,255);	
         int	thickness	=	2;	
         for(int	i=0;	i<lines.size();	i++)	
         {	
         				float	rho	=	lines.at(i)[0];	
         				float	theta	=	lines.at(i)[1];	
         				Point	pt1,	pt2;	
         				double	a	=	cos(theta);	
         				double	b	=	sin(theta);	
         				double	x0	=	a*rho;	
         				double	y0	=	b*rho;	
         				pt1.x	=	int(x0	+	1000*(-b));	
         				pt1.y	=	int(y0	+	1000*(a));	
         				pt2.x	=	int(x0	-	1000*(-b));	
         				pt2.y	=	int(y0	-	1000*(a));	
         				line(	image,	pt1,	pt2,	color,	thickness);	
         }	
The	following	images	depict	the	result	of	the	preceding	example	from	left	to
right,	starting	with	the	original	image,	edges	detected	using	the	Canny	function,
lines	detected	using	the	HoughLines	function,	and	finally	the	output	image:
To	avoid	having	to	deal	with	the	coordinate-system	change,	you	can	use	the
HoughLinesP	function	to	directly	extract	the	points	forming	each	detected	line.
Here's	an	example:
  vector<Vec4i> lines;
  double	rho	=	1.0;	//	1	pixel,	r	resolution	
  double	theta	=	CV_PI	/	180.0;	//	1	degree,	theta	resolution	
  int	threshold	=	100;	//	minimum	number	of	intersections	to	"detect"	a	line	
  HoughLinesP(edges,	
  												lines,	
  												rho,	
  												theta,	
  												threshold);	
  	
  Scalar	color(0,0,255);	
  int	thickness	=	2;	
  for(int	i=0;	i<lines.size();	i++)	
  {	
  				line(image,	
  									Point(lines.at(i)[0],	
  												lines.at(i)[1]),	
  									Point(lines.at(i)[2],	
  												lines.at(i)[3]),	
  									color,	
  									thickness);	
  }	
The	following	image	depicts	two	contours	that	are	extracted	from	two	3D
objects:
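 1.  The first step of this example is not shown here; it would be the familiar Canny call that produces the edges image used in the next step, for example:
           Mat image = imread("Test.png");
           Mat edges;
           Canny(image, edges, 100.0, 200.0);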
 2.	 Use	the	findContours	function	to	calculate	the	contours	using	the	detected
     edges.	It's	worth	noting	that	each	contour	is	a	vector	of	Point	objects,	making
       all	contours	a	vector	of	vector	of	Point	objects,	as	seen	here:
           vector<vector<Point>	>	contours;	
           int mode = RETR_TREE;
           int method = CHAIN_APPROX_TC89_KCOS;
           findContours(edges,	
           													contours,	
           													mode,	
           													method);
The	following	image	demonstrates	the	result	of	the	Canny	and	findContours
functions:
Note	the	different	colors	in	the	image	on	the	right-hand	side,	which	correspond
to	one	complete	contour	detected	by	using	the	findContours	function.
You	can	use	area	as	a	threshold	for	ignoring	the	detected	contours	that	do	not
pass	certain	criteria.	For	example,	in	the	preceding	example	code	where	we	used
the	drawContours	function,	we	could	get	rid	of	contours	with	smaller	areas	than
some	predefined	threshold	value.	Here's	an	example:
    for(	int	i	=	0;	i<	contours.size();	i++	)	
    {	
    				if(contourArea(contours[i])	>	thresholdArea)	
    				{	
    								drawContours(result,	
    																					contours,	
    																					i,	
    																					color,	
    																					thickness);	
    				}	
    }	
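The call that the following paragraph interprets is not shown here; it describes the pointPolygonTest function, which, as a minimal sketch (testPoint is a hypothetical point to test), can be used like this:
    Point2f testPoint(50, 50); // hypothetical test point
    double result = pointPolygonTest(contours[0], testPoint, true); // true: measure the signed distance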
If	the	result	is	zero,	it	means	the	test	point	is	right	on	the	edge	of	the	contour.	A
negative	result	would	mean	the	test	point	is	outside,	and	a	positive	result	would
mean	the	test	point	is	inside	the	contour.	The	value	itself	is	the	distance	between
the	test	point	and	the	nearest	contour	edge.
To	be	able	to	check	whether	a	contour	is	convex	or	not,	you	can	use	the
isContourConvex	function,	as	seen	in	the	following	example:
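The example itself is missing here; a one-line sketch would be:
    bool convex = isContourConvex(contours[0]);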
Being	able	to	compare	two	contours	with	each	other	is	probably	the	most
essential	algorithm	that	you'll	need	when	dealing	with	contour	and	shape
analysis.	You	can	use	the	matchShapes	function	in	OpenCV	to	compare	and	try	to
match	two	contours.	Here	is	how	this	function	is	used:
    ShapeMatchModes	method	=	CONTOURS_MATCH_I1;	
    double	result	=	matchShapes(cntr1,	cntr2,	method,	0);	
The method parameter can take any of the following values, while the last parameter must always be set to zero, unless specified by the method used:
      CONTOURS_MATCH_I1
      CONTOURS_MATCH_I2
      CONTOURS_MATCH_I3
For	details	about	the	mathematical	difference	between	the	preceding	list	of
contour-matching	methods,	you	can	refer	to	the	OpenCV	documentation.
Being	able	to	find	the	boundaries	of	a	contour	is	the	same	as	being	able	to
correctly	localize	it,	for	instance	to	find	a	region	that	can	be	used	for	tracking	or
performing	any	other	computer	vision	algorithm.	Let's	assume	we	have	the
following	image	and	its	single	contour	detected	using	the	findContours	function:
Having	this	contour,	we	can	perform	any	of	the	contour-	and	shape-analysis
algorithms	that	we've	learned	about.	In	addition,	we	can	use	a	number	of
OpenCV	functions	to	localize	the	extracted	contour.	Let's	start	with	the
boundingRect	function,	which	is	used	to	find	the	minimal	upright	rectangle	(Rect
object)	that	contains	a	given	point	set	or	contour.	Here's	how	this	function	is
used:
    Rect	br	=	boundingRect(contour);
The	following	is	the	result	of	drawing	the	upright	rectangle	acquired	by	using
boundingRect	in	the	preceding	sample	code:
Similarly,	you	can	use	the	minAreaRect	function	to	find	the	minimal	rotated
rectangle	that	contains	a	given	set	of	points	or	a	contour.	Here's	an	example:
  RotatedRect	br	=	minAreaRect(contour);	
You	can	use	the	following	code	to	visualize	the	resultant	rotated	rectangle:
  Point2f	points[4];	
  br.points(points);	
  for	(int	i=0;	i<4;	i++)	
  				line(image,	
  									points[i],	
  									points[(i+1)%4],	
  									Scalar(0,0,255),	
  									2);	
You	can	draw	an	ellipse	instead,	using	the	ellipse	function,	or	you	can	do	both,
which	would	result	in	something	similar	to	the	following:
In	addition	to	algorithms	for	finding	the	minimal	upright	and	rotated	bounding
rectangles	of	contours,	you	can	also	use	the	minEnclosingCircle	and
minEnclosingTriangle functions to find the minimal bounding circle and triangle of
a	given	set	of	points	or	a	contour.	Here's	an	example	of	how	these	functions	can
be	used:
  //	to	detect	the	minimal	bounding	circle	
  Point2f	center;	
  float	radius;	
  minEnclosingCircle(contour,	center,	radius);	
  	
  //	to	detect	the	minimal	bounding	triangle	
  vector<Point2f>	triangle;	
  minEnclosingTriangle(contour,	triangle);	
There	is	no	end	to	the	list	of	possible	use	cases	of	contours,	but	we	will	name
just	a	few	of	them	before	moving	on	to	the	next	section.	You	can	try	using
contour-detection	and	shape-analysis	algorithms	in	conjunction	with
thresholding	algorithms	or	back-projection	images,	for	instance,	to	make	sure
your	tracking	algorithm	uses	the	shape	information	in	addition	to	the	color	and
intensity	values	of	pixels.	You	can	also	use	contours	to	count	and	analyze	shapes
of	objects	on	a	production	line,	where	the	background	and	the	visual
environment	is	more	controlled.
The	final	section	of	this	chapter	will	teach	you	how	to	use	feature	detection,
descriptor	extraction,	and	descriptor-matching	algorithms	to	detect	known
objects,	but	with	rotation,	scale,	and	even	perspective	invariance.
Detecting, describing, and matching
features
As	we	learned	earlier	in	this	chapter,	features	or	keypoints	can	be	extracted	from
images	using	various	feature-extraction	(detection)	algorithms,	most	of	which
rely	on	detecting	points	with	a	significant	change	in	intensity,	such	as	corners.
Detecting	the	right	keypoints	is	the	same	as	being	able	to	correctly	determine
which	parts	of	an	image	are	helpful	in	identifying	it.	But	just	a	keypoint,	or	in
other	words	the	location	of	a	significant	point	in	an	image,	by	itself	is	not	useful.
One	might	argue	that	the	collection	of	keypoint	locations	in	an	image	is	enough,
but	even	then,	another	object	with	a	totally	different	look	can	have	keypoints	in
the	exact	same	locations	in	an	image,	say,	by	chance.
This is where descriptors come in; they describe the region around each keypoint
so that keypoints can be reliably compared and matched. OpenCV implements a
number of feature-detector and descriptor-extractor algorithms in its features2d
module, and additional ones in the xfeatures2d (extra modules) module. You should
use the latter algorithms with care and always refer to the OpenCV documentation,
since some of them are actually patented and require permission from their owners
to be used in commercial projects. The following is a list of some of the main
feature-detector and descriptor-extractor algorithms that are included in OpenCV
by default:
     AKAZE
     BRISK
     KAZE
     ORB
All of these algorithms are implemented in OpenCV classes of exactly the same
name, and, to repeat once more, they are all subclasses of the Feature2D class.
They	are	extremely	simple	to	use,	especially	when	no	parameters	are	modified.
In	all	of	them,	you	can	simply	use	the	static	create	method	to	create	an	instance
of them, call the detect method to detect the keypoints, and finally call the compute method to
extract	descriptors	of	the	detected	keypoints.
The extracted descriptors can then be matched using one of the following matcher types:
     FLANNBASED
     BRUTEFORCE
     BRUTEFORCE_L1
     BRUTEFORCE_HAMMING
     BRUTEFORCE_HAMMINGLUT
     BRUTEFORCE_SL2
You	can	use	the	DescriptorMatcher	class,	or	its	subclasses,	namely	BFMatcher	and
FlannBasedMatcher,	to	perform	various	matching	algorithms.	You	simply	need	to	use
the	static	create	method	of	these	classes	to	create	an	instance	of	them,	and	then
use	the	match	method	to	match	two	sets	of	descriptors.
Let's	walk	through	all	of	what	we've	discussed	in	this	section	with	a	complete
example,	since	breaking	apart	the	feature	detection,	descriptor	extraction,	and
matching	is	impossible,	and	they	are	all	parts	of	a	chain	of	processes	that	lead	to
the	detection	of	an	object	in	a	scene	using	its	features:
 1.	 Read	the	image	of	the	object,	and	the	scene	that	will	be	searched	for	the
     object,	using	the	following	code:
       Mat	object	=	imread("object.png");	
       Mat	scene	=	imread("Scene.png");	
     In	the	following	picture,	let's	assume	the	image	on	the	left	is	the	object
     we	are	looking	for,	and	the	image	on	the	right	is	the	scene	that	contains
     the	object:
2.	 Extract	the	keypoints	from	both	of	these	images,	which	are	now	stored	in
    object	and	scene.	We	can	use	any	of	the	aforementioned	algorithms	for
    feature	detection,	but	let's	assume	we're	using	KAZE	for	our	example,	as
    seen	here:
       Ptr<KAZE>	detector	=	KAZE::create();	
       vector<KeyPoint>	objKPs,	scnKPs;	
       detector->detect(object,	objKPs);	
       detector->detect(scene,	scnKPs);	
3.	 We	have	the	keypoints	of	both	the	object	image	and	the	scene	image.	We
    can	go	ahead	and	view	them	using	the	drawKeypoints	function,	as	we	learned
    previously	in	this	chapter.	Try	that	on	your	own,	and	then	use	the	same	KAZE
    class	to	extract	descriptors	from	the	keypoints.	Here's	how	it's	done:
       Mat	objDesc,	scnDesc;	
       detector->compute(object,	objKPs,	objDesc);	
       detector->compute(scene,	scnKPs,	scnDesc);	
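4.  The matching step itself is not shown here; a minimal sketch, assuming a brute-force matcher (this produces the matches vector used in the following steps), could be:
       Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce");
       vector<DMatch> matches;
       matcher->match(objDesc, scnDesc, matches);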
5.	 You	can	now	try	to	visualize	the	result	of	matching	by	using	the	drawMatches
    function,	as	seen	here:
       Mat	result;	
       drawMatches(object,	
       												objKPs,	
       												scene,	
       												scnKPs,	
       												matches,	
       												result,	
       												Scalar(0,	255,	0),	//	green	for	matched	
       												Scalar::all(-1),	//	unmatched	color	(not	used)	
       												vector<char>(),	//	empty	mask	
       												DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS);	
     As	you	can	see,	some	of	the	matched	features	are	obviously	incorrect,
     some	at	the	top	of	the	scene	image	and	a	few	at	the	bottom:
6.	 The	bad	matches	can	be	filtered	out	by	using	a	threshold	on	the	distance
    value	of	the	DMatch	objects.	The	threshold	value	depends	on	the	algorithm
    and	the	type	of	image	content,	but	in	our	example	case,	and	with	the	KAZE
   algorithm,	a	value	of	0.1	seems	to	be	enough	for	us.	Here's	how	the
   thresholding	is	done	to	get	good	matches	out	of	all	the	matches:
       vector<DMatch>	goodMatches;	
       double	thresh	=	0.1;	
       for(int	i=0;	i<objDesc.rows;	i++)	
       {	
       				if(matches[i].distance	<	thresh)	
       								goodMatches.push_back(matches[i]);	
       }	
       	
       if(goodMatches.size()	>	0)	
       {	
       				cout	<<	"Found	"	<<	goodMatches.size()	<<	"	good	matches.";	
       }	
       else	
       {	
       				cout	<<	"Didn't	find	a	single	good	match.	Quitting!";	
       				return	-1;	
       }	
     The	following	image	depicts	the	result	of	the	drawMatches	function	on	the
     goodMatches	vector:
7.	 Obviously,	the	result	of	the	filtered	matches	is	much	better	now.	We	can	use
    the	findHomography	function	to	find	the	transformation	between	good	matched
    keypoints	from	the	object	image	to	the	scene	image.	Here's	how:
       vector<Point2f>	goodP1,	goodP2;	
       for(int	i=0;	i<goodMatches.size();	i++)	
       {	
       				goodP1.push_back(objKPs[goodMatches[i].queryIdx].pt);	
       				goodP2.push_back(scnKPs[goodMatches[i].trainIdx].pt);	
       }	
       Mat	homoChange	=	findHomography(goodP1,	goodP2);	
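    8.  The step that transforms the corners of the object image into the scene is not shown here; a minimal sketch (producing the corners2 points used below) might be:
           vector<Point2f> corners1(4), corners2(4);
           corners1[0] = Point2f(0, 0);
           corners1[1] = Point2f(object.cols - 1, 0);
           corners1[2] = Point2f(object.cols - 1, object.rows - 1);
           corners1[3] = Point2f(0, object.rows - 1);
           perspectiveTransform(corners1, corners2, homoChange);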
    9.	 The	transformed	points	can	be	used	to	draw	four	lines	that	localize	the
        detected	object	in	the	scene	image,	as	seen	here:
            line(result,	corners2[0],	corners2[1],	Scalar::all(255),	2);	
            line(result,	corners2[1],	corners2[2],	Scalar::all(255),	2);	
            line(result,	corners2[2],	corners2[3],	Scalar::all(255),	2);	
            line(result,	corners2[3],	corners2[0],	Scalar::all(255),	2);	
It's	important	to	also	change	the	x	values	of	the	resultant	points	to	account	for	the
width	of	the	object	image,	if	you	are	going	to	draw	the	four	lines	that	localize	the
object,	over	the	drawMatches	image	result.	Here's	an	example:
    for(int	i=0;	i<4;	i++)	
    				corners2[i].x	+=	object.cols;	
The	following	image	depicts	the	final	result	of	our	detection	operation:
Make	sure	to	try	the	rest	of	the	feature-detection,	descriptor-extraction,	and
matching	algorithms	by	yourself	and	compare	their	results.	Also,	try	to	measure
the	time	of	the	calculations	for	each	one.	For	instance,	you	might	notice	that
AKAZE	is	much	faster	than	KAZE,	or	that	BRISK	is	better	suited	to	some
images,	while	KAZE	or	ORB	is	better	with	others.	As	mentioned	before,	the
feature-based	methods	of	object	detection	are	much	more	reliable	with	scale,
rotation,	and	even	perspective	change.	Try	different	views	of	the	same	objects	to
figure	out	the	best	parameters	and	algorithms	for	your	own	project	and	use	case.
For	instance,	here's	another	example	that	demonstrates	the	rotation	and	scale
invariance	of	the	AKAZE	algorithm	and	brute-force	matching:
Note	that	the	source	code	used	for	producing	the	preceding	output	is	created	by
using	exactly	the	same	set	of	instructions	we	went	through	in	this	section.
Summary
	
The	final	chapter	of	this	book	will	take	us	through	computer	vision	and	machine
learning	algorithms	in	OpenCV	and	how	they	are	employed	to	detect	objects
using	a	previously	existing	set	of	their	images,	among	many	other	interesting
artificial-intelligence-related	topics.
	
Questions
	
	
Machine	Learning	in	Computer
Vision
In	the	previous	chapters,	we	learned	about	a	number	of	algorithms	for	object
detection	and	tracking.	We	learned	how	to	use	color-based	algorithms,	such	as
Mean	Shift	and	CAM	Shift,	in	conjunction	with	histograms	and	back-projection
images	to	locate	an	object	in	an	image	with	incredible	speed.	We	also	learned
about	template	matching	and	how	it	can	be	used	to	find	objects	with	a	known
template	of	pixels	in	an	image.	All	of	these	algorithms	rely	in	one	way	or
another	on	image	properties,	such	as	brightness	or	color,	that	are	easily	affected
by	a	change	in	lighting	of	the	environment.	Based	on	these	facts,	we	moved	on
to	learn	about	algorithms	that	are	based	on	knowledge	about	significant	areas	in
an	image,	called	keypoints	or	features.	We	learned	about	many	edge-	and
keypoint-detection	algorithms	and	how	to	extract	descriptors	for	those	keypoints.
We	also	learned	about	descriptor	matchers	and	how	to	detect	an	object	in	an
image	using	good	matches	of	descriptors	extracted	from	an	image	of	the	object
of	interest	and	the	scene	where	we're	looking	for	that	object.
In	this	chapter,	we're	going	to	take	one	big	step	forward	and	learn	about
algorithms	that	can	be	used	to	extract	a	model	from	a	large	number	of	images	of
an	object,	and	later	use	that	model	to	detect	an	object	in	an	image	or	simply
classify	an	image.	Such	algorithms	are	the	meeting	point	of	machine	learning
algorithms	and	computer	vision	algorithms.	Anyone	familiar	with	artificial
intelligence	and	machine	learning	algorithms	in	general	will	have	an	easy	time
proceeding	with	this	chapter,	even	if	they	are	not	fluent	in	the	exact	algorithms
and	examples	presented	in	this	chapter.	However,	those	who	are	totally	new	to
such	concepts	will	probably	need	to	grab	another	book,	preferably	about
machine	learning,	to	familiarize	themselves	with	algorithms,	such	as	support
vector	machines	(SVM),	artificial	neural	networks	(ANN),	cascade
classification,	and	deep	learning,	which	we'll	be	learning	about	in	this	chapter.
Refer	to	Chapter	2,	Getting	Started	with	OpenCV	for	more	information	about	how
to	set	up	a	personal	computer	and	make	it	ready	for	developing	computer	vision
applications	using	the	OpenCV	library.
You	can	use	this	URL	to	download	the	source	codes	and	examples	for	this
chapter:	https://github.com/PacktPublishing/Hands-On-Algorithms-for-Computer-Vision/tree/
master/Chapter08.
	
Support	vector	machines
To	put	it	as	simply	as	possible,	SVMs	are	used	for	creating	a	model	from	a
labeled	set	of	training	samples	that	can	be	used	to	predict	the	label	of	new
samples.	For	instance,	assume	we	have	a	set	of	sample	data	belonging	to	two
different	groups.	Each	sample	in	our	training	dataset	is	a	vector	of	floating-point
numbers	that	can	correspond	to	anything,	such	as	a	simple	point	in	2D	or	3D
space,	and	each	sample	is	labeled	with	a	number,	such	as	1,	2,	or	3.	Having	such
data,	we	can	train	an	SVM	model	that	can	be	used	to	predict	the	label	of	new	2D
or	3D	points.	Let's	think	about	another	problem.	Imagine	we	have	the	data	of
temperatures	for	365	days	in	cities	from	all	the	continents	in	the	world,	and	each
vector	of	the	365	temperature	values	is	labeled	with	1	for	Asia,	2	for	Europe,	3
for	Africa,	and	so	on.	We	can	use	this	data	to	train	an	SVM	model	that	can	be
used	to	predict	the	continent	of	new	vectors	of	temperature	values	(for	365	days)
and	associate	them	with	a	label.	Even	though	these	examples	might	not	be	useful
in	practice,	they	describe	the	concept	of	SVMs.
We	can	use	the	SVM	class	in	OpenCV	to	train	and	use	SVM	models.	Let's	go
through	the	usage	of	the	SVM	class	in	detail	with	a	complete	example:
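The first steps, which define the training samples, are not reproduced here; a minimal sketch (the point values below are illustrative, not the original ones) could be:
           const int SAMPLE_COUNT = 8;
           float samplesA[SAMPLE_COUNT][2] = {{ 40,  55}, { 60,  80},
                                              { 85,  45}, {120, 100},
                                              {150, 210}, {200, 180},
                                              {220, 240}, {250, 140}};
           Mat samples(SAMPLE_COUNT, 2, CV_32F, samplesA);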
         In this example, each sample in our dataset of eight samples contains two
         floating-point values that can be represented as a point on an image,
         with an x and a y value.
    3.	 We	also	need	to	create	the	label	(or	response)	data,	which	obviously	must
        be	the	same	length	as	the	samples.	Here	it	is:
           int	responsesA[SAMPLE_COUNT]	
           								=	{2,	2,	2,	2,	1,	2,	2,	1};	
           Mat	responses(SAMPLE_COUNT,	1,	CV_32S,	responsesA);	
         As	you	can	see,	our	samples	are	labeled	with	the	1	and	2	values,	so	we're
         expecting	our	model	to	be	able	to	differentiate	new	samples	between	the
         given	two	groups	of	samples.
    4.	 OpenCV	uses	the	TrainData	class	to	simplify	the	preparation	and	usage	of	the
        training	dataset.	Here's	how	it's	used:
           Ptr<TrainData>	data;	
           SampleTypes	layout	=	ROW_SAMPLE;	
           data	=	TrainData::create(samples,	
           																									layout,	
           																									responses);	
    5.	 Create	the	actual	SVM	class	instance.	This	class	in	OpenCV	implements
        various	types	of	SVM	classification	algorithms,	and	they	can	be	used	by
        setting	the	correct	parameters.	In	this	example,	we're	going	to	use	the	most
        basic	(and	common)	set	of	parameters	for	the	SVM	class,	but	to	be	able	to	use
        all	possible	features	of	this	algorithm,	make	sure	to	go	through	the	OpenCV
        SVM	class	documentation	pages.	Here's	an	example	that	shows	how	we	can
       use	SVM	to	perform	a	linear	n-class	classification:
            Ptr<SVM>	svm	=	SVM::create();	
            svm->setType(SVM::C_SVC);	
            svm->setKernel(SVM::LINEAR);	
            svm->setTermCriteria(	
            												TermCriteria(TermCriteria::MAX_ITER	+	
            																									TermCriteria::EPS,	
            																									100,	
            																									1e-6));	
    6.	 Train	the	SVM	model	using	the	train	(or	trainAuto)	method,	as	seen	here:
            if(!svm->train(data))	
            {	
            				cout	<<	"training	failed"	<<	endl;	
            				return	-1;	
            }	
          Based	on	the	amount	of	data	in	our	training	samples	dataset,	the	training
          process	might	take	some	time.	In	our	case,	it	should	be	fast	enough
          though,	since	we	just	used	a	handful	of	samples	to	train	the	model.
    7.	 We're	going	to	use	the	SVM	model	to	actually	predict	the	label	of	new
        samples.	Remember	that	each	sample	in	our	training	set	was	a	2D	point	in
        an	image.	We're	going	to	find	the	label	of	each	2D	point	in	an	image	with	a
        width	and	height	of	300	pixels,	and	then	color	each	pixel	as	green	or	blue,
        based	on	whether	its	predicted	label	is	1	or	2.	Here's	how:
            Mat	image	=	Mat::zeros(300,	
            																							300,	
            																							CV_8UC3);	
            Vec3b	blue(255,0,0),	green(0,255,0);	
            for	(int	i=0;	i<image.rows;	++i)	
            {	
            				for	(int	j=0;	j<image.cols;	++j)	
            				{	
            								Mat_<float>	sampleMat(1,2);	
            								sampleMat	<<	j,	i;	
            								float	response	=	svm->predict(sampleMat);	
            	
            								if	(response	==	1)	
            												image.at<Vec3b>(i,	j)		=	green;	
            								else	if	(response	==	2)	
            												image.at<Vec3b>(i,	j)		=	blue;	
            				}	
            }	
The	two	types	of	samples	(1	and	2)	are	drawn	as	black	and	white	circles	over	the
resultant	image.	The	following	diagram	depicts	the	result	of	the	complete	SVM
classification	we	just	performed:
This demonstration is quite simple and, in reality, SVM can be used for much
more complex classification problems; however, it clearly shows the most
essential aspect of SVM, which is the separation of various groups of data that
are	labeled	the	same.	As	you	can	see	in	the	preceding	image,	the	line	separating
the	blue	region	from	the	green	region	is	the	best	single	line	that	can	most
efficiently	separate	the	black	dots	and	white	dots	on	the	image.
You	can	experiment	with	this	phenomenon	by	updating	the	labels,	or,	in	other
words,	the	responses	in	the	preceding	example,	as	seen	here:
  int	responsesA[SAMPLE_COUNT]	
  								=	{2,	2,	2,	2,	1,	1,	2,	1};	
Trying	to	visualize	the	results	now	will	produce	something	similar	to	the
following,	which	again	depicts	the	most	efficient	line	for	separating	the	two
groups	of	dots:
You	can	very	easily	add	more	classes	to	your	data,	or,	in	other	words,	have	more
labels	for	your	training	sample	set.	Here's	an	example:
  int	responsesA[SAMPLE_COUNT]	
  								=	{2,	2,	3,	2,	1,	1,	2,	1};
We	can	try	visualizing	the	results	again	by	adding	a	yellow	color,	for	instance,
for	the	third	class	region,	and	a	gray	dot	for	training	samples	that	belong	to	that
class.	Here's	the	result	of	the	same	SVM	example	when	used	with	three	classes
instead	of	two:
If	you	recall	the	example	of	365	days	from	before,	it	is	quite	obvious	that	we	can
also	add	more	dimensionality	to	the	SVM	model	and	not	just	classes,	but	it
wouldn't	be	visually	possible	to	display	the	results	with	a	simple	image	such	as
the	one	in	the	preceding	example.
Before	continuing	with	the	usage	of	the	SVM	algorithm	for	actual	object
detection	and	image	classification,	it's	worth	noting	that,	just	like	any	other
machine	learning	algorithm,	having	more	samples	in	your	dataset	will	result	in	a
much	better	classification	and	higher	accuracy,	but	it	will	also	take	more	time	to
train	the	model.
Classifying	images	using	SVM	and
HOG
Histogram	of	Oriented	Gradients	(HOG)	is	an	algorithm	that	can	be	used	to
describe	an	image	using	a	vector	of	floating-point	descriptors	that	correspond	to
the	oriented	gradient	values	extracted	from	that	image.	The	HOG	algorithm	is
very	popular	and	certainly	worth	reading	about	in	detail	to	understand	how	it	is
implemented	in	OpenCV,	but,	for	the	purposes	of	this	book	and	especially	this
section,	we'll	just	mention	that	the	number	of	the	floating-point	descriptors	will
always	be	the	same	when	they	are	extracted	from	images	that	have	exactly	the
same	size	with	the	same	HOG	parameters.	To	better	understand	this,	recall	that
descriptors	extracted	from	an	image	using	the	feature	detection	algorithms	we
learned	about	in	the	previous	chapter	can	have	different	numbers	of	elements	in
them.	The	HOG	algorithm,	though,	will	always	produce	a	vector	of	the	same
length	if	the	parameters	are	unchanged	across	a	set	of	images	of	the	same	size.
This	makes	the	HOG	algorithm	ideal	for	being	used	in	conjunction	with	SVM,	to
train	a	model	that	can	be	used	to	classify	images.	Let's	see	how	it's	done	with	an
example. Imagine we have a set of images of a traffic sign in one folder, and
images of anything but that specific traffic sign in another folder. The
following picture depicts the images in our samples dataset, separated by a
black line in between:
Using	images	similar	to	the	preceding	samples,	we're	going	to	train	the	SVM
model	to	detect	whether	an	image	is	the	traffic	sign	we're	looking	for	or	not.
Let's	start:
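The first steps of this example, which create the HOGDescriptor object used throughout the rest of the code, are not reproduced here; a minimal sketch, assuming the 128 x 128 window size mentioned later in this section, could be:
          HOGDescriptor hog;
          hog.winSize = Size(128, 128); // assumed window size
          int descriptorSize = (int)hog.getDescriptorSize();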
          Sample	images	should	have	the	same	size	as	the	window	size,	otherwise
          we	need	to	use	the	resize	function	to	make	sure	they	are	resized	to	the
          HOG	window	size	later	on.	This	guarantees	the	same	descriptor	size
          every	time	the	HOG	algorithm	is	used.
3.	 Assuming	the	images	of	the	traffic	sign	are	inside	a	folder	called	pos	(for
    positive)	and	the	rest	inside	a	folder	called	neg	(for	negative),	we	can	use	the
    glob	function	to	get	the	list	of	image	files	in	those	folders,	as	seen	here:
        vector<String>	posFiles;	
        glob("/pos",	posFiles);	
        	
        vector<String>	negFiles;	
        glob("/neg",	negFiles);	
4.	 Create	buffers	to	store	the	HOG	descriptors	for	negative	and	positive
    sample	images	(from	pos	and	neg	folders).	We	also	need	an	additional	buffer
    for	the	labels	(or	responses),	as	seen	in	the	following	example:
        int	scount	=	posFiles.size()	+	negFiles.size();	
        	
        Mat	samples(scount,	
        												descriptorSize,	
        												CV_32F);	
        	
        Mat	responses(scount,	
        														1,	
        														CV_32S);	
5.	 We	need	to	use	the	HOGDescriptor	class	to	extract	the	HOG	descriptors	from
    positive	images	and	store	them	in	samples,	as	seen	here:
        for(int	i=0;	i<posFiles.size();	i++)	
        {	
        				Mat	image	=	imread(posFiles.at(i));	
        				if(image.empty())	
        								continue;	
        				vector<float>	descriptors;	
        				if((image.cols	!=	hog.winSize.width)	
        												||	
        												(image.rows	!=	hog.winSize.height))	
        				{	
        								resize(image,	image,	hog.winSize);	
        				}	
        				hog.compute(image,	descriptors);	
        				Mat(1,	descriptorSize,	CV_32F,	descriptors.data())	
        												.copyTo(samples.row(i));	
        				responses.at<int>(i)	=	+1;	//	positive	
        }	
    6.	 After	the	positive	samples,	we	add	the	negative	samples	and	their	responses
        to	the	designated	buffers:
            for(int	i=0;	i<negFiles.size();	i++)	
            {	
            				Mat	image	=	imread(negFiles.at(i));	
            				if(image.empty())	
            								continue;	
            				vector<float>	descriptors;	
            				if((image.cols	!=	hog.winSize.width)	
            												||	
            												(image.rows	!=	hog.winSize.height))	
            				{	
            								resize(image,	image,	hog.winSize);	
            				}	
            				hog.compute(image,	descriptors);	
            				Mat(1,	descriptorSize,	CV_32F,	descriptors.data())	
            												.copyTo(samples.row(i	+	posFiles.size()));	
            				responses.at<int>(i	+	posFiles.size())	=	-1;
            }	
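    7.  The creation of the training data object is not shown here; following the earlier SVM example, it would look something like this:
            Ptr<TrainData> tdata = TrainData::create(samples,
                                                     ROW_SAMPLE,
                                                     responses);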
    8.	 Now,	we	need	to	train	the	SVM	model	as	seen	in	the	following	example
        code:
            Ptr<SVM>	svm	=	SVM::create();	
            svm->setType(SVM::C_SVC);	
            svm->setKernel(SVM::LINEAR);	
            svm->setTermCriteria(	
            												TermCriteria(TermCriteria::MAX_ITER	+	
            																									TermCriteria::EPS,	
            																									10000,	
            																									1e-6));	
            	
            svm->train(tdata);	
After	the	training	is	completed,	the	SVM	model	is	ready	to	be	used	for
classifying	images	with	the	same	size	as	the	HOG	window	size	(in	this	case,	128
by	128	pixels)	using	the	predict	method	of	the	SVM	class.	Here	is	how:
    Mat	image	=	imread("image.jpg");	
    	
    if((image.cols	!=	hog.winSize.width)	
    								||	
    								(image.rows	!=	hog.winSize.height))	
    {	
    				resize(image,	image,	hog.winSize);	
    }	
    	
    vector<float> descs;
    hog.compute(image, descs);
    // lay the descriptors out as a single-row sample before prediction
    int result = (int)svm->predict(Mat(descs).reshape(1, 1));
    if(result	==	+1)	
    {	
    				cout	<<	"Image	contains	a	traffic	sign."	<<	endl;	
    }	
    else	if(result	==	-1)	
    {	
    				cout	<<	"Image	does	not	contain	a	traffic	sign."	<<	endl;	
    }	
In	the	preceding	code,	we	simply	read	an	image	and	resize	it	to	the	HOG
window	size.	Then	we	use	the	compute	method	of	the	HOGDescriptor	class,	just	like
when	we	were	training	the	model.	Except,	this	time,	we	use	the	predict	method	to
find	the	label	of	this	new	image.	If	the	result	equals	+1,	which	was	the	label	we
assigned	for	traffic	sign	images	when	we	trained	the	SVM	model,	then	we	know
that	the	image	is	the	image	of	a	traffic	sign,	otherwise	it's	not.
The	accuracy	of	the	result	completely	depends	on	the	quantity	and	quality	of	the
data	you	have	used	to	train	your	SVM	model.	This,	in	fact,	is	the	case	for	each
and	every	machine	learning	algorithm.	The	more	you	train	your	model,	the	more
accurate	it	becomes.
          This method of classification assumes that the input image has the same characteristics as
          the training images, meaning that if the image contains a traffic sign, it is cropped similarly to
          the images we used to train the model. For instance, if you use an image that contains the
          traffic sign we're looking for but also contains much more, then the result will probably be
          incorrect.
As	the	amount	of	data	in	your	training	set	increases,	it	will	take	more	time	to
train	your	model.	So,	it's	important	to	avoid	retraining	your	model	every	time
you	want	to	use	it.	The	SVM	class	allows	you	to	save	and	load	SVM	models	using
the	save	and	load	methods.	Here	is	how	you	can	save	a	trained	SVM	model	for
later	use	and	to	avoid	retraining	it:
  svm->save("trained_svm_model.xml");	
The	file	will	be	saved	using	the	provided	filename	and	extension	(XML	or	any
other	file	type	supported	by	OpenCV).	Later,	using	the	static	load	function,	you
can	create	an	SVM	object	that	contains	the	exact	parameters	and	trained	model.
Here's	an	example:
  Ptr<SVM> svm = SVM::load("trained_svm_model.xml");
Try	using	the	SVM	class	along	with	HOGDescriptor	to	train	models	that	can	detect	and
classify	more	types	using	images	of	various	objects	stored	in	different	folders.
Training	models	with	artificial	neural
networks
ANN	can	be	used	to	train	a	model	using	a	set	of	sample	input	and	output	vectors.
ANN	is	a	highly	popular	machine	learning	algorithm	and	the	basis	of	many
modern	artificial	intelligence	algorithms	that	are	used	to	train	models	for
classification	and	correlation.	Especially	in	computer	vision,	the	ANN	algorithm
can	be	used	along	with	a	wide	range	of	feature-description	algorithms	to	learn
about	images	of	objects,	or	even	faces	of	different	people,	and	then	used	to
detect	them	in	images.
You	can	use	the	ANN_MLP	class	(which	stands	for	artificial	neural	networks
—multi-layer	perceptron)	in	OpenCV	to	implement	ANN	in	your	applications.
The	usage	of	this	class	is	quite	similar	to	that	of	the	SVM	class,	so	we're	going	to
give	a	simple	example	to	learn	the	differences	and	how	it's	used	in	practice,	and
we'll	leave	the	rest	for	you	to	discover	by	yourself.
Creating	the	training	samples	dataset	is	exactly	the	same	for	all	machine	learning
algorithms	in	OpenCV,	or,	to	be	precise,	for	all	subclasses	of	the	StatsModel	class.
The	ANN_MLP	class	is	no	exception	to	this,	so,	just	like	with	the	SVM	class,	first	we
need	to	create	a	TrainData	object	that	contains	all	the	sample	and	response	data
that	we	need	to	use	when	training	our	ANN	model,	as	seen	here:
    SampleTypes	layout	=	ROW_SAMPLE;	
    data	=	TrainData::create(samples,	
    																									layout,	
    																									responses);	
samples	and	responses,	in	the	preceding	code,	are	both	Mat	objects	that	contain	a
number	of	rows	that	equals	the	number	of	all	the	training	data	we	have	in	our
dataset.	As	for	the	number	of	columns	in	them,	let's	recall	that	the	ANN
algorithm	can	be	used	to	learn	the	relationship	between	vectors	of	input	and
output	data.	This	means	that	the	number	of	columns	in	the	training	input	data,	or
samples,	can	be	different	from	the	number	of	columns	in	the	training	output	data,
or	responses.	We'll	refer	to	the	number	of	columns	in	samples	as	the	number	of
features,	and	to	the	number	of	columns	in	responses	as	the	number	of	classes.
Simply	put,	we're	going	to	learn	the	relationship	of	features	to	classes	using	a
training	dataset.
After	taking	care	of	the	training	dataset,	we	need	to	create	an	ANN_MLP	object	using
the	following	code:
    Ptr<ANN_MLP>	ann	=	ANN_MLP::create();	
We	have	skipped	all	the	customizations	and	used	the	default	set	of	parameters.	In
the	case	that	you	need	to	use	a	fully	customized	ANN_MLP	object,	you	need	to	set
the	activation	function,	termination	criteria,	and	various	other	parameters	in	the
ANN_MLP class. To learn more about this, make sure to refer to the OpenCV documentation.
Setting	the	correct	layer	sizes	in	the	ANN	algorithm	requires	experience	and
depends	on	the	use	case,	but	it	can	also	be	set	using	a	few	trial-and-error
sessions.	Here's	how	you	can	set	the	number	and	size	of	each	layer	in	the	ANN
algorithm,	and	the	ANN_MLP	class	to	be	specific:
    Mat_<int>	layers(4,1);	
    layers(0)	=	featureCount;				//	input	layer	
    layers(1)	=	classCount	*	4;		//	hidden	layer	1	
    layers(2)	=	classCount	*	2;		//	hidden	layer	2	
    layers(3)	=	classCount;						//	output	layer	
    ann->setLayerSizes(layers);	
In	the	preceding	code,	the	number	of	rows	in	the	layers	object	refers	to	the
number	of	layers	we	want	to	have	in	our	ANN.	The	first	element	in	the	layers
object	should	contain	the	number	of	features	in	our	dataset,	and	the	last	element
in	the	layers	object	should	contain	the	number	of	classes.	Recall	that	the	number
of	features	equals	the	column	count	of	samples,	and	the	number	of	classes	equals
the	column	count	of	responses.	The	rest	of	the	elements	in	the	layers	object	contain
the	sizes	of	hidden	layers.
Training	the	ANN	model	is	done	by	using	the	train	method,	as	seen	in	the
following	example:
    if(!ann->train(data))	
    {	
    				cout	<<	"training	failed"	<<	endl;	
    				return	-1;	
    }	
After	the	training	is	completed,	we	can	use	the	save	and	load	methods	in	exactly
the	same	way	as	we	saw	before,	to	save	the	model	for	later	use,	or	reload	it	from
a	saved	file.
Using	the	model	with	the	ANN_MLP	class	is	also	quite	similar	to	the	SVM	class.	Here's
an	example:
    Mat_<float>	input(1,	featureCount);	
    Mat_<float>	output(1,	classCount);	
    //	fill	the	input	Mat	
    ann->predict(input,	output);	
Choosing	the	right	machine	learning	algorithm	for	each	problem	requires
experience	and	knowledge	about	where	the	project	is	going	to	be	used.	SVM	is
quite	simple,	and	suitable	when	we	need	to	work	with	the	classification	of	data
and	in	the	segmentation	of	groups	of	similar	data,	whereas	ANN	can	be	easily
used	to	approximate	a	function	between	sets	of	input	and	output	vectors
(regression).	Make	sure	to	try	out	different	machine	learning	problems	to	better
understand	where	and	when	to	use	a	specific	algorithm.
	
The	cascading	classification
algorithm
	
The	most	important	aspect	of	a	cascade	classifier,	as	can	be	guessed	from	its
name,	is	its	cascading	nature	of	learning	and	detecting	an	object	using	the
extracted	features.	The	most	widely	used	features	in	cascade	classifiers,	and
consequently	cascade	classifier	types,	are	Haar	and	local	binary	pattern	(LBP).
In	this	section,	we're	going	to	learn	how	to	use	existing	OpenCV	Haar	and	LBP
cascade	classifiers	to	detect	faces,	eyes,	and	more	in	real-time,	and	then	learn
how	to	train	our	own	cascade	classifiers	to	detect	any	other	objects.
	
Object	detection	using	cascade
classifiers
To	be	able	to	use	previously	trained	cascade	classifiers	in	OpenCV,	you	can	use
the	CascadeClassifier	class	and	the	simple	methods	it	provides	for	loading	a
classifier	from	file	or	performing	scale-invariant	detection	in	images.	OpenCV
contains	a	number	of	trained	classifiers	to	detect	faces,	eyes,	and	so	on	in	real-
time.	If	we	browse	to	the	OpenCV	installation	(or	build)	folder,	it	usually
contains	a	folder	called	etc,	which	contains	the	following	subfolders:
     haarcascades
     lbpcascades
Under	each	one	of	the	classifier	folders	we	just	mentioned,	you	can	find	a
number	of	pre-trained	cascade	classifiers.	You	can	load	these	classifiers	and
prepare	them	for	object	detection	in	real-time	using	the	load	method	of	the
CascadeClassifier	class,	as	seen	in	the	following	example:
  CascadeClassifier	detector;	
  if(!detector.load("classifier.xml"))	
  {	
  				cout	<<	"Can't	load	the	provided	cascade	classifier."	<<	endl;	
  				return	-1;	
  }	
After	a	cascade	classifier	is	successfully	loaded,	you	can	use	the	detectMultiScale
method	to	detect	objects	in	an	image	and	return	a	vector	containing	the	bounding
rectangles	of	the	detected	objects,	as	seen	in	the	following	example:
  vector<Rect>	objects;	
  detector.detectMultiScale(frame,	
  																										objects);	
  	
  	
  for(int	i=0;	i<	objects.size();	i++)	
  {	
  				rectangle(frame,	
  														objects	[i],	
  														color,	
  														thickness);	
  }	
color and thickness are previously defined to affect the rectangle drawn for each detected object, as seen here:
  Scalar	color	=	Scalar(0,0,255);	
  int	thickness	=	2;	
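The timing measurement that the next sentence refers to is not included here; a minimal sketch using OpenCV's tick counters, wrapped around the detectMultiScale call shown earlier, might look like this:
  double t = (double)getTickCount();
  detector.detectMultiScale(frame, objects);
  t = ((double)getTickCount() - t) / getTickFrequency();
  t *= 1000.0; // convert the measured time from seconds to milliseconds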
The	last	line	in	the	preceding	code	is	used	to	convert	the	unit	of	the	time
measurement	from	seconds	to	milliseconds.	You	can	use	the	following	code	to
print	out	the	result	over	the	output	image,	in	the	lower-left	corner	for	example:
  Scalar	green	=	Scalar(0,255,0);	
  int	thickness	=	2;	
  double	scale	=	0.75;	
  putText(frame,	
  								"Took	"	+	to_string(int(t))	+	"ms	to	detect",	
  								Point(0,	frame.rows-1),	
  								FONT_HERSHEY_SIMPLEX,	
  								scale,	
  								green,	
  								thickness);
This	will	produce	an	output	image	that	contains	text	similar	to	what	is	seen	in
the	following	example:
Try	different	pre-trained	cascade	classifiers	shipped	with	OpenCV	and	check
their	performance	against	each	other.	One	very	obvious	observation	will	be	the
significantly	faster	detection	speed	of	LBP	cascade	classifiers.
In	the	previous	examples,	we	used	only	the	default	set	of	parameters	needed	for
the	detectMultiScale	method	of	the	CascadeClassifier	class,	however,	to	modify	its
behavior	and,	in	some	cases,	to	significantly	improve	its	performance,	you'll
need	to	adjust	a	few	more	parameters,	as	seen	in	the	following	example:
    double	scaleFactor	=	1.1;	
    int	minNeighbors	=	3;	
    int	flags	=	0;	//	not	used	
    Size	minSize(50,50);	
    Size	maxSize(500,	500);	
    vector<Rect>	objects;	
    detector.detectMultiScale(image,	
    																										objects,	
    																										scaleFactor,	
    																										minNeighbors,	
    																										flags,	
    																										minSize,	
    																										maxSize);	
The scaleFactor parameter is used to specify the scaling of the image after each
detection pass, meaning that the image is internally resized and the detection is
performed again; this is, in fact, how multi-scale detection algorithms work. The
image is searched for the object, then its size is reduced by the given scaleFactor
and the search is performed again. This size reduction is repeated until the image
is smaller than the classifier size. The results from all detections at all scales are
then	returned.	The	scaleFactor	parameter	must	always	contain	a	value	greater	than
1.0	(not	equal	to	and	not	lower	than).	For	higher	sensitivity	in	multi-scale
detection,	you	can	set	a	value	such	as	1.01	or	1.05,	which	will	lead	to	much
longer detection times, and vice versa. The minNeighbors parameter specifies how
many neighboring (overlapping) detections a candidate rectangle must have before
it is retained as a detected object.
	
The flags parameter is simply ignored in recent versions of OpenCV. As for the
minSize and maxSize parameters, they are used to specify the minimum and maximum
possible sizes of the objects to detect. A few other parameters have been omitted
here for the sake of simplifying the examples, but you should check them out for yourself to learn
more	about	cascade	classifiers	and	multi-scale	detection	in	general.	Make	sure	to
also	search	online	for	pre-trained	classifiers	by	fellow	computer	vision
developers	and	try	using	them	in	your	applications.
Training	cascade	classifiers
As	we	mentioned	previously,	you	can	also	create	your	own	cascade	classifiers	to
detect	any	other	object	if	you	have	enough	positive	and	negative	sample	images.
Training	a	classifier	using	OpenCV	involves	taking	a	number	of	steps	and	using
a	number	of	special	OpenCV	applications,	which	we'll	go	through	in	this	section.
                                       	
Creating	samples
First	things	first,	you	need	a	tool	called	opencv_createsamples	to	prepare	the	positive
image	sample	set.	The	negative	image	samples,	on	the	other	hand,	are	extracted
automatically	during	the	training	from	a	provided	folder	containing	arbitrary
images	that	do	NOT	include	the	object	of	interest.	The	opencv_createsamples
application	can	be	found	inside	the	bin	folder	of	the	OpenCV	installation.	It	can
be	used	to	create	the	positive	samples	dataset,	either	by	using	a	single	image	of
the	object	of	interest	and	applying	distortions	and	transformations	to	it,	or	by
using	previously	cropped	or	annotated	images	of	the	object	of	interest.	Let's
learn	about	the	former	case	first.
Imagine	you	have	the	following	image	of	a	traffic	sign	(or	any	other	object,	for
that	matter)	and	you	want	to	create	a	positive	sample	dataset	using	it:
You	should	also	have	a	folder	containing	the	source	of	the	negative	samples.	As
we	mentioned	previously,	you	need	to	have	a	folder	containing	arbitrary	images
that	do	not	contain	the	object	of	interest.	Let's	assume	we	have	some	images
similar	to	the	following	that	we'll	be	using	to	create	negative	samples	from:
Note that the size and aspect ratio of the negative images, or, to use the correct
terminology, the background images, are not at all important. However, they must
be at least as big as the minimum detectable object (the classifier size) and they
must never contain images of the object of interest.
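To generate the positive samples from the single input image and the background
images, a command similar to the following can be used (the file names and
values here match our example, and each parameter is described in the list that
follows):
  opencv_createsamples -vec samples.vec -img sign.png -bg bg.txt 
      -num 250 -bgcolor 0 -bgthresh 10 -maxidev 50 
      -maxxangle 0.7 -maxyangle 0.7 -maxzangle 0.5 -w 32 -h 32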
      vec	is	used	to	specify	the	positive	samples	file	that	will	be	created.	In	this
      case,	it	is	the	samples.vec	file.
      img	is	used	to	specify	the	input	image	that	will	be	used	to	generate	the
    samples.	In	our	case,	it's	sign.png.
    bg	is	used	to	specify	the	background's	description	file.	A	background's
    description	file	is	a	simple	text	file	that	contains	the	paths	to	all	background
    images	(each	line	in	the	background's	description	file	contains	the	path	to
    one	background	image).	We	have	created	a	file	named	bg.txt	and	provided	it
    to	the	bg	parameter.
    The	num	parameter	determines	the	number	of	positive	samples	you	want	to
    generate	using	the	given	input	image	and	backgrounds;	250,	in	our	case.
    You can, of course, use a higher or lower number, depending on the
    accuracy and duration of training that you require.
    bgcolor	can	be	used	to	define	the	background	color	in	terms	of	its	grayscale
    intensity.	As	you	can	see	in	our	input	image	(the	traffic	sign	image),	the
    background	color	is	black,	thus	the	value	of	this	parameter	is	zero.
    The	bgthresh	parameter	specifies	the	threshold	of	the	accepted	bgcolor
    parameter.	This	is	especially	useful	in	the	case	of	compression	artifacts	that
    are	common	to	some	image	formats	and	might	cause	slightly	different	pixel
    values	for	the	same	color.	We	have	used	10	for	the	value	of	this	parameter
    to	allow	a	slight	level	of	tolerance	for	background	pixels.
    maxidev	can	be	used	to	set	the	maximum	intensity	deviation	of	the	foreground
    pixel	values	while	generating	the	samples.	A	value	of	50	means	the
    intensity	of	the	foreground	pixels	can	vary	between	their	original	values	+/-
    50.
    maxxangle,	maxyangle,	and	maxzangle	correspond	to	the	maximum	possible
    rotation	allowed	in	the	x,	y,	and	z	directions	when	creating	new	samples.
    These	values	are	in	radians,	for	which	we	have	provided	0.7,	0.7,	and	0.5.
    The	w	and	h	parameters	define	the	width	and	height	of	the	samples.	We	have
    used	32	for	both	of	them	since	the	object	we're	looking	to	train	a	classifier
    for	fits	in	a	square	shape.	These	same	values	will	be	used	later	on,	when
    training	the	classifier.	Also	note	that	this	will	be	the	minimum	detectable
    size	in	your	trained	classifier	later	on.
       Besides	the	parameters	in	the	preceding	list,	the	opencv_createsamples	application	also	accepts	a
       show	parameter	that	can	be	used	to	display	the	created	samples,	an	inv	parameter	that	can	be
       used	to	invert	the	colors	of	samples,	and	a	randinv	parameter	that	can	be	used	to	set	or	unset
       the	random	inversion	of	pixels	in	samples.
Running	the	preceding	command	will	produce	the	given	number	of	samples	by
performing	rotations	and	intensity	changes	to	the	foreground	pixels.	Here	are
some	of	the	resultant	samples:
Now	that	we	have	a	positive	samples	vector	file,	produced	by	opencv_createsamples,
and	a	folder	that	contains	the	background	images	along	with	a	background's
description	file	(bg.txt	from	the	previous	example),	we	can	start	the	training	of
our	cascade	classifier,	but	before	that,	let's	also	learn	about	the	second	method	of
creating	our	positive	samples	vector,	which	is	by	extracting	them	from	various
annotated	images	that	contain	our	object	of	interest.
This second method involves using another official OpenCV tool that is used
for annotating positive samples in images. This tool is called opencv_annotation,
and it can be used to conveniently mark the areas in a number of images that
contain our positive samples, or, in other words, the objects we're going to train
a cascade classifier for. The opencv_annotation tool produces an annotation text
file (after a manual annotation of the objects) that can be used with the
opencv_createsamples tool to produce a positive samples vector suitable for use with
the OpenCV cascade training tool that we'll learn about in the next section.
All	of	these	images	are	located	in	a	single	folder	and	all	of	them	contain	one	or
more	samples	of	the	traffic	sign	(the	object	of	interest)	that	we're	looking	for.	We
can	use	the	following	command	to	start	the	opencv_annotation	tool	and	manually
annotate	the	samples:
  opencv_annotation	--images=imgpath	--annotations=anno.txt	
In the preceding command, imgpath must be replaced with the path (preferably the
absolute path and with forward slashes) to the folder containing the images.
anno.txt, or any other file name provided instead, will be filled with the
annotation results. Immediately after starting the tool, a window similar to the
following will be displayed:
You can highlight an object using your mouse's left button, which will cause a
red rectangle to be drawn. Pressing the C key will confirm the annotation, and
the rectangle's color will change to show that it has been accepted. Continue
this process for the rest of the samples (if any) in the same image and press N
to go to the next image. After all images are annotated, you can exit the
application by pressing the Esc key.
        In addition to the --images and --annotations parameters, the opencv_annotation tool also includes
        an optional parameter, called --maxWindowHeight, that can be used to resize images that are bigger
        than a given height for display purposes. The resize factor in this case can be specified with
        another optional parameter called --resizeFactor.
The	annotation	file	created	by	the	opencv_annotation	tool	will	look	like	the
following:
    signs01.jpg	2	145	439	105	125	1469	335	185	180	
    signs02.jpg	1	862	468	906	818	
    signs03.jpg	1	1450	680	530	626	
    signs04.jpg	1	426	326	302	298	
    signs05.jpg	0	
    signs06.jpg	1	1074	401	127	147	
    signs07.jpg	1	1190	540	182	194	
    signs08.jpg	1	794	460	470	488
Each line in the annotations file contains the path to an image, followed by the
number of objects of interest in that image, followed by the x, y, width, and
height values of the bounding rectangles of those objects. You can use the
following command to produce a samples vector using this annotation text file:
    opencv_createsamples	-info	anno.txt	-vec	samples.vec	-w	32	-h	32	
Note	that	this	time	we	used	the	opencv_createsamples	tool	with	the	-info	parameter,
which	wasn't	present	when	we	used	this	tool	to	generate	samples	from	an	image
and	arbitrary	backgrounds.	We	are	now	ready	to	train	a	cascade	classifier	that	is
capable	of	detecting	the	traffic	sign	we	created	the	samples	for.
Creating	the	classifier
The	last	tool	we're	going	to	learn	about	is	called	opencv_traincascade,	which,	as	you
can	guess,	is	used	to	train	cascade	classifiers.	If	you	have	enough	samples	and
background	images,	and	if	you	have	already	taken	care	of	the	samples	vector	as
it	was	described	in	the	preceding	section,	then	the	only	thing	you	need	to	do	is	to
run	the	opencv_traincascade	tool	and	wait	for	the	training	to	be	completed.	Let's	see
an	example	training	command	and	then	go	through	the	parameters	in	detail:
  opencv_traincascade	-data	classifier	-vec	samples.vec	
  				-bg	bg.txt	-numPos	200	-numNeg	200	-w	32	-h	32	
This is the simplest way of starting the training process, and it only uses the
mandatory parameters. All parameters used in this command are self-explanatory,
except the -data parameter, which must be an existing folder that will be used to
store the files required during the training process; the final trained classifier
(called cascade.xml) will also be created in this folder.
        numPos cannot contain a number higher than the number of positive samples in your samples.vec
        file. numNeg, however, can contain basically any number, since the training process will simply
        create random negative samples by extracting portions of the provided background images.
The	opencv_traincascade	tool	will	create	a	number	of	XML	files	in	the	folder	set	as
the	-data	parameter,	which	must	not	be	modified	until	the	training	process	is
completed.	Here	is	a	short	description	for	each	one	of	them:
     The	params.xml	file	will	contain	the	parameters	used	for	training	the	classifier.
     stage#.xml files are checkpoints that are created after each training stage is
     completed. They can then be used to resume the training later on if the
     training process was terminated for an unexpected reason.
     The	cascade.xml	file	is	the	trained	classifier	and	the	last	file	that	will	be
     created	by	the	training	tool.	You	can	copy	this	file,	rename	it	to	something
     convenient	(such	as	trsign_classifier.xml	or	something	like	that),	and	use	it
     with	the	CascadeClassifier	class,	as	we	learned	in	the	previous	sections,	to
     perform	multi-scale	object	detection.
opencv_traincascade is an extremely customizable and flexible tool, and you can
easily modify its many optional parameters to make sure the trained classifier
fits your needs. Here is a description of some of its most used parameters:
     numStages	can	be	used	to	set	the	number	of	stages	used	to	train	the	cascade
     classifier.	By	default,	numStages	equals	20,	but	you	can	decrease	this	value	to
     shorten	the	training	time	while	sacrificing	the	accuracy	or	vice	versa.
     The	precalcValBufSize	and	precalcIdxBufSize	parameters	can	be	used	to	increase
     or	decrease	the	amount	of	memory	used	for	various	calculations	during	the
     training	of	the	cascade	classifier.	You	can	modify	these	parameters	to	make
     sure	the	training	process	is	performed	with	more	efficiency.
     featureType	is	one	of	the	most	important	parameters	of	the	training	tool,	and
     it	can	be	used	to	set	the	type	of	the	trained	classifier	to	HAAR	(default	if
     ignored)	or	LBP.	As	mentioned	before,	LBP	classifiers	are	trained	much
     faster	than	Haar	classifiers	and	their	detection	is	also	significantly	faster,
     but	they	lack	the	accuracy	of	the	Haar	cascade	classifiers.	With	a	proper
     amount	of	training	samples,	you	might	be	able	to	train	an	LBP	classifier
     that	can	compete	with	a	Haar	classifier	in	terms	of	accuracy.
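For instance, to train an LBP classifier with fewer stages and larger calculation
buffers, a command along the following lines could be used (the values here are
only illustrative):
  opencv_traincascade -data classifier -vec samples.vec -bg bg.txt 
      -numPos 200 -numNeg 200 -w 32 -h 32 
      -numStages 15 -featureType LBP 
      -precalcValBufSize 2048 -precalcIdxBufSize 2048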
For	a	complete	list	of	parameters	and	their	descriptions,	make	sure	to	refer	to	the
OpenCV	online	documentation.
                                          	
Using	deep	learning	models
In	recent	years,	there	has	been	a	huge	improvement	in	the	field	of	deep	learning,
or,	to	be	precise,	deep	neural	networks	(DNN),	and	more	and	more	libraries
and	frameworks	are	being	introduced	that	use	deep	learning	algorithms	and
models,	especially	for	computer	vision	purposes	such	as	object	detection	in	real-
time.	You	can	use	the	most	recent	versions	of	the	OpenCV	library	to	read	pre-
trained	models	for	the	most	popular	DNN	frameworks,	such	as	Caffe,	Torch,	and
TensorFlow,	and	use	them	for	object	detection	and	prediction	tasks.
DNN-related	algorithms	and	classes	in	OpenCV	are	all	located	under	the	dnn
namespace,	so,	to	be	able	to	use	them,	you	need	to	make	sure	to	include	the
following	in	your	code:
  #include <opencv2/dnn.hpp> // header for the DNN module
  using namespace cv;
  using namespace dnn;
We're going to walk through the loading and use of a pre-trained model from the
TensorFlow library in OpenCV for real-time object detection. This example
demonstrates the basics of how to use deep neural network models trained by a
third-party library (TensorFlow, in this case). So, let's start:
       Note	that	this	link	can	possibly	change	in	the	future	(maybe	not	soon,	but
       it's	worth	mentioning),	so,	in	case	this	happens,	you	need	to	simply
       search	online	for	the	TensorFlow	model	zoo,	which,	in	TensorFlow	terms,	is	a
       zoo	containing	the	pre-trained	object	detection	models.
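Loading such a model with OpenCV's dnn module typically looks like the
following sketch, in which the file names are only placeholders for the frozen
graph and its text description file downloaded from the model zoo:
  // hypothetical file names; use the files downloaded from the model zoo
  Net network = readNetFromTensorflow("frozen_inference_graph.pb",
                                      "graph.pbtxt");
  if(network.empty())
  {
      cout << "Failed to load the network!" << endl;
      return -1;
  }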
4.	 After	making	sure	the	model	is	correctly	loaded,	you	can	use	the	following
    code	to	perform	a	real-time	object	detection	in	a	frame	read	from	the
    camera,	an	image,	or	a	video	file:
        const	int	inWidth	=	300;	
        const	int	inHeight	=	300;	
        const	float	meanVal	=	127.5;	//	255	divided	by	2	
        const	float	inScaleFactor	=	1.0f	/	meanVal;	
        bool	swapRB	=	true;	
        bool	crop	=	false;	
        Mat	inputBlob	=	blobFromImage(frame,	
        																						inScaleFactor,	
        																						Size(inWidth,	inHeight),	
        																						Scalar(meanVal,	meanVal,	meanVal),	
        																						swapRB,	
        																						crop);	
        	
        network.setInput(inputBlob);	
        	
        Mat	result	=	network.forward();
                                                 	
     It's worth noting that the values passed to the blobFromImage function
     completely depend on the model, and you should use the exact same
     values if you're using the same model from this example. The
     blobFromImage function will create a BLOB that is suitable for use as the
     input of the loaded network.
5.	 After	the	detection	is	complete,	you	can	use	the	following	code	to	extract
    the	detected	objects	and	their	bounding	rectangles,	all	into	a	single	Mat
    object:
    Mat	detections(result.size[2],	
    															result.size[3],	
    															CV_32F,	
    															result.ptr<float>());	
     The confidence threshold used to filter the detections (the confidence level
     itself is the third element in each row of the detections object) can be
     adjusted to get more accurate results, but 0.5 should be a reasonable value
     for most cases, or at least for a start.
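As a rough sketch, the detections can be examined one by one and filtered with
an assumed threshold of 0.5, as seen here:
        const float confidenceThreshold = 0.5f; // assumed threshold
        for(int i = 0; i < detections.rows; i++)
        {
            float confidence = detections.at<float>(i, 2); // third element of each row
            if(confidence > confidenceThreshold)
            {
                // extract the object ID and bounding rectangle,
                // as shown in the next step
            }
        }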
7.	 After	a	detection	passes	the	confidence	criteria,	we	can	extract	the	detected
    object	ID	and	bounding	rectangle	and	draw	it	on	the	input	image,	as	seen
    here:
        int objectClass = (int)(detections.at<float>(i, 1)) - 1;
        int left = static_cast<int>(
                        detections.at<float>(i, 3) * frame.cols);
        int top = static_cast<int>(
                        detections.at<float>(i, 4) * frame.rows);
        int right = static_cast<int>(
                        detections.at<float>(i, 5) * frame.cols);
        int bottom = static_cast<int>(
                        detections.at<float>(i, 6) * frame.rows);
        rectangle(frame, Point(left, top),
                  Point(right, bottom), Scalar(0, 255, 0));

        String label = "ID = " + to_string(objectClass);
        if(objectClass >= 0 && objectClass < (int)labels.size())
            label = labels[objectClass];
        int baseLine = 0;
        Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX,
                                     0.5, 2, &baseLine);
        top = max(top, labelSize.height);

        // draw a white, filled background rectangle for the label
        rectangle(frame,
                  Point(left, top - labelSize.height),
                  Point(left + labelSize.width, top + baseLine),
                  Scalar(255, 255, 255), // white
                  FILLED);

        // draw the label text in red
        putText(frame, label, Point(left, top),
                FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255)); // red
We have converted the model's list of class names into the following labels vector,
which is used in the preceding example:
  const	vector<string>	labels	=	{	"person",	"bicycle"	...};	
The	following	image	demonstrates	the	result	of	the	object	detection	using	a	pre-
trained	TensorFlow	model	in	OpenCV:
Using	deep	learning	has	proven	to	be	highly	efficient,	especially	when	we	need
to	train	and	detect	multiple	objects	in	real-time.	Make	sure	to	refer	to	the
TensorFlow	and	OpenCV	documentation	for	more	about	how	to	use	pre-trained
models,	or	how	to	train	and	retrain	DNN	models	for	an	object	that	doesn't	have
an	already	trained	model.
Summary
We	started	the	final	chapter	of	this	book	by	learning	about	SVM	models	and	how
to	train	them	to	classify	groups	of	similar	data.	We	learned	how	SVM	can	be
used	in	conjunction	with	the	HOG	descriptor	to	learn	about	one	or	more	specific
objects	and	then	detect	and	classify	them	in	new	images.	After	learning	about
SVM	models,	we	moved	on	to	using	ANN	models,	which	offer	much	more
power	in	cases	where	we	have	multiple	columns	in	both	the	input	and	output	of
the	training	samples.	This	chapter	also	included	a	complete	guide	on	how	to	train
and	use	Haar	and	LBP	cascade	classifiers.	We	are	now	familiar	with	the	usage	of
official	OpenCV	tools	that	can	be	used	to	prepare	a	training	dataset	from	scratch
and	then	train	a	cascade	classifier	using	that	dataset.	Finally,	we	ended	this
chapter	and	this	book	by	learning	about	the	usage	of	pre-trained	deep	learning
object	detection	models	in	OpenCV.
                                        	
Questions
	
    1.	 What	is	the	difference	between	the	train	and	trainAuto	methods	in	the	SVM
        class?
    2. Demonstrate the difference between the linear and histogram intersection
        SVM kernels.
    3.	 How	do	you	calculate	the	HOG	descriptor	size	for	a	HOG	window	size	of
        128	x	96	pixels	(the	rest	of	the	HOG	parameters	are	untouched)?
    4.	 How	do	you	update	an	existing	trained	ANN_MLP,	instead	of	training	from
        scratch?
    5.	 What	is	the	required	command	(by	using	opencv_createsamples)	to	create	a
        positive	samples	vector	from	a	single	image	of	a	company	logo?	Assume
        we	want	to	have	1,000	samples	with	a	width	of	24	and	a	height	of	32,	and
        by	using	default	parameters	for	rotations	and	inversions.
    6.	 What	is	the	required	command	to	train	an	LBP	cascade	classifier	for	the
        company	logo	from	the	previous	question?
    7.	 What	is	the	default	number	of	stages	for	training	a	cascade	classifier	in
        opencv_traincascade?	How	can	we	change	it?	What	is	the	downside	of
        increasing	and	decreasing	the	number	of	stages	far	beyond	its	default
        value?
	
Assessments
Chapter	1,	Introduction	to	Computer
Vision
1.	Name	two	industries	besides	the	ones	mentioned	in	this	chapter	that	can
significantly	benefit	from	computer	vision.
   The	sports	industry	can	use	computer	vision	for	better	analysis	of	matches.
   The	food	industry	can	use	computer	vision	for	quality	control	of	products.
4.	How	many	megabytes	would	be	needed	to	store	a	1920	x	1080	image	with
four	channels	and	a	depth	of	32-bits?
    Each pixel needs 4 channels x 4 bytes (a 32-bit depth per channel), that is,
    16 bytes per pixel, so the image requires 1920 x 1080 x 16 = 33,177,600 bytes,
    which is roughly 31.6 megabytes.
6.	Name	two	commonly	used	color	spaces	besides	the	ones	mentioned	in	this
chapter.
7.	Compare	OpenCV	libraries	with	computer	vision	tools	in	MATLAB.	What	are
the	pros	and	cons	of	each	one,	when	compared?
2.	What	is	the	effect	of	building	OpenCV	3	with	the	BUILD_opencv_world	flag	turned
on?
   Building OpenCV 3 with the BUILD_opencv_world flag will combine all binary
   library files, such as core, imgcodecs, and highgui, into a single world library.
3.	Using	the	ROI	pixel	access	method	described	in	this	chapter,	how	can	we
construct	a	Mat	class	that	can	access	the	middle	pixel,	plus	all	of	its	neighboring
pixels	(the	middle	nine	pixels)	in	another	image?
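   Here is a minimal sketch of one possible answer, using the Mat constructor that
   accepts a region of interest (the image file name is only an example):
     Mat image = imread("Test.png");
     // a 3 x 3 region centered on the middle pixel of the image
     Rect middleRegion(image.cols / 2 - 1, image.rows / 2 - 1, 3, 3);
     Mat middle(image, middleRegion);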
4.	Name	another	pixel	access	method	of	the	Mat	class,	besides	the	ones	mentioned
in	this	chapter.
      Mat::row can be used to access a single row
      Mat::col can be used to access a single column
	
5.	Write	a	program,	only	using	the	Mat	method	and	a	for	loop,	that	creates	three
separate	color	images,	each	one	containing	only	one	channel	of	an	RGB	image
read	from	disk.
     Mat image = imread("Test.png");
     if(image.empty())
     {
         cout << "image empty";
         return 0;
     }
     Mat red(image.rows, image.cols, CV_8UC3, Scalar::all(0));
     Mat green(image.rows, image.cols, CV_8UC3, Scalar::all(0));
     Mat blue(image.rows, image.cols, CV_8UC3, Scalar::all(0));
     for(int i=0; i<image.rows; i++)
     {
         for(int j=0; j<image.cols; j++)
         {
             blue.at<Vec3b>(i, j)[0] = image.at<Vec3b>(i, j)[0];
             green.at<Vec3b>(i, j)[1] = image.at<Vec3b>(i, j)[1];
             red.at<Vec3b>(i, j)[2] = image.at<Vec3b>(i, j)[2];
         }
     }
     imshow("Blue", blue);
     imshow("Green", green);
     imshow("Red", red);
     waitKey();
7.	Write	a	program	using	VideoCapture,	waitKey,	and	imwrite,	that	displays	your
webcam	and	saves	the	visible	image	when	the	S	key	is	pressed.	This	program
will	stop	the	webcam	and	exit	if	the	spacebar	is	pressed.
     VideoCapture cam(0);
     if(!cam.isOpened())
         return -1;
     while(true)
     {
         Mat frame;
         cam >> frame;
         if(frame.empty())
             break;
         imshow("Camera", frame);
         // stop camera if space is pressed
         char key = waitKey(10);
         if(key == ' ')
             break;
         if(key == 's')
             imwrite("d:/snapshot.png", frame);
     }
     cam.release();
Chapter	3,	Array	and	Matrix
Operations
1.	Which	of	the	element-wise	mathematical	operations	and	bitwise	operations
would	produce	the	exact	same	results?
The bitwise_xor and absdiff functions produce the same result when they are applied
to binary masks containing only 0 and 255 values, as demonstrated in the following sketch.
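Here is a small demonstration on two hand-made masks (the values are assumptions
chosen for illustration):
     Mat a = (Mat_<uchar>(1, 4) << 0, 255, 0, 255);
     Mat b = (Mat_<uchar>(1, 4) << 0, 0, 255, 255);
     Mat x, d;
     bitwise_xor(a, b, x); // 0, 255, 255, 0
     absdiff(a, b, d);     // 0, 255, 255, 0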
2.	What	is	the	purpose	of	the	gemm	function	in	OpenCV?	What	is	the	equivalent	of
AxB	with	the	gemm	function?
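The gemm function performs generalized matrix multiplication, that is,
D = alpha * A * B + beta * C. A sketch of the equivalent of AxB, assuming A and B
are compatible floating-point matrices, would be:
     Mat A = Mat::eye(3, 3, CV_32F);  // example inputs (assumed)
     Mat B = Mat::ones(3, 3, CV_32F);
     Mat D;
     gemm(A, B, 1.0, noArray(), 0.0, D); // D = A * B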
4.	Create	the	same	identity	matrix	in	the	Identity	matrix	section	of	this	chapter,
but	use	the	setIdentity	function	instead	of	the	Mat::eye	function.
      Mat	m(10,	10,	CV_32F);
      setIdentity(m,	Scalar(0.25));
5.	Write	a	program	using	the	LUT	function	(a	look-up	table	transformation)	that
performs	the	same	task	as	bitwise_not	(invert	colors)	when	executed	on	grayscale
and	color	(RGB)	images.
     Mat image = imread("Test.png");
     Mat lut(1, 256, CV_8UC1); // look-up table; filled so that lut[i] = 255 - i
     for(int i=0; i<256; i++)
         lut.at<uchar>(0, i) = 255 - i;
     Mat result;
     LUT(image, lut, result);
6.	Besides	normalizing	the	values	of	a	matrix,	the	normalize	function	can	be	used
to	brighten	or	darken	images.	Write	the	required	function	call	to	darken	and
brighten	a	grayscale	image	using	the	normalize	function.
	
     normalize(image,	result,	200,	255,	CV_MINMAX);	//	brighten
     normalize(image,	result,	0,	50,	CV_MINMAX);	//	darken
7.	Remove	the	blue	channel	(the	first	channel)	from	an	image	(a	BGR	image
created	using	the	imread	function)	using	the	merge	and	split	functions.
     vector<Mat>	channels;
     split(image,	channels);
     channels[0]	=	Scalar::all(0);
     merge(channels,	result);
Chapter	4,	Drawing,	Filtering,	and
Transformation
1.	Write	a	program	that	draws	a	cross	over	the	whole	image,	with	a	thickness	of
3	pixels	and	a	red	color.
     line(image,
     	Point(0,0),
     	Point(image.cols-1,image.rows-1),
     	Scalar(0,0,255),
     	3);
     line(image,
     	Point(0,image.rows-1),
     	Point(image.cols-1,0),
     	Scalar(0,0,255),
     	3);
Here's an example:
4.	Using	cvtColor,	convert	a	color	image	to	grayscale	and	make	sure	only	the
darkest	100	shades	of	gray	are	filtered	out	using	the	threshold	function.	Make	sure
the	filtered	pixels	are	set	to	white	in	the	resulting	image,	and	the	rest	of	the
pixels	are	set	to	black.
      Mat	imageGray;
      	cvtColor(image,
      	imageGray,
      	COLOR_BGR2GRAY);
      threshold(imageGray,
      	result,
      	100,
      	255,
      	THRESH_BINARY_INV);
Here's an example:
5.	Use	the	remap	function	to	resize	an	image	to	half	of	its	original	width	and
height,	thus	preserving	the	aspect	ratio	of	the	original	image.	Use	a	default
border	type	for	the	extrapolation.
      // the maps (and therefore the output) must be half the size of the input image
      Mat mapX(image.rows / 2, image.cols / 2, CV_32FC1);
      Mat mapY(image.rows / 2, image.cols / 2, CV_32FC1);
      for(int i=0; i<mapX.rows; i++)
          for(int j=0; j<mapX.cols; j++)
          {
              mapX.at<float>(i, j) = j * 2.0;
              mapY.at<float>(i, j) = i * 2.0;
          }
      InterpolationFlags interpolation = INTER_LANCZOS4;
      BorderTypes borderMode = BORDER_DEFAULT;
      remap(image,
            result,
            mapX,
            mapY,
            interpolation,
            borderMode);
Here's an example:
   a)
        Mat	userColor(256,	1,	CV_8UC3);
        	for(int	i=0;	i<=255;	i++)
        	userColor.at<Vec3b>(i,0)	=	Vec3b(i,	i,	i);
        	applyColorMap(image,
        	result,
        	userColor);
   b)
        Mat	userColor(256,	1,	CV_8UC3);
        	for(int	i=0;	i<=255;	i++)
        	userColor.at<Vec3b>(i,0)	=	Vec3b(255-i,	255-i,	255-i);
        	applyColorMap(image,
        	result,
        	userColor);
2.	Create	a	histogram	that	can	be	used	with	calcBackProject	function	to	extract	the
darkest	pixels	from	a	grayscale	image.	Consider	the	darkest	25%	possible	pixel
values	as	the	grayscale	intensities	we	are	looking	to	extract.
     int	bins	=	4;
     	float	rangeGS[]	=	{0,	256};
     	const	float*	ranges[]	=	{	rangeGS	};
     	int	channels[]	=	{0};
     	Mat	histogram(bins,	1,	CV_32FC1,	Scalar(0.0));
     	histogram.at<float>(0,	0)	=	255.0;
     	calcBackProject(&imageGray,
     	1,
     	channels,
     	histogram,
     	backProj,
     	ranges);
3.	In	the	previous	question,	what	if	we	needed	the	darkest	and	brightest	25%	to
be	excluded,	instead	of	extracted	in	a	mask?
     int	bins	=	4;
     	float	rangeGS[]	=	{0,	256};
     	const	float*	ranges[]	=	{	rangeGS	};
     	int	channels[]	=	{0};
     	Mat	histogram(bins,	1,	CV_32FC1,	Scalar(0.0));
     	histogram.at<float>(1,	0)	=	255.0;
     	histogram.at<float>(2,	0)	=	255.0;
     	calcBackProject(&imageGray,
     	1,
     	channels,
     	histogram,
     	backProj,
     	ranges);
4.	What	is	the	hue	value	of	the	red	color?	How	much	should	it	be	shifted	to	get
the	blue	color?
   0	and	360	are	the	hue	values	for	the	red	color.	Shifting	it	by	240	would	result
   in	the	blue	color.
5.	Create	a	hue	histogram	that	can	be	used	to	extract	red	colored	pixels	from	an
image.	Consider	an	offset	of	50	for	pixels	that	are	considered	red.	Finally,
visualize	the	created	hue	histogram.
     const	int	bins	=	360;
      int hueOffset = 50; // the offset of 50 given in the question
     	Mat	histogram(bins,	1,	CV_32FC1);
     	for(int	i=0;	i<bins;	i++)
     	{
     	histogram.at<float>(i,	0)	=
     	(i	<	hueOffset)	||	(i	>	bins	-	hueOffset)	?	255.0	:	0.0;
     	}
     double	maxVal	=	255.0;
     int	gW	=	800,	gH	=	100;
     	Mat	theGraph(gH,	gW,	CV_8UC3,	Scalar::all(0));
     Mat	colors(1,	bins,	CV_8UC3);
     	for(int	i=0;	i<bins;	i++)
     	{
     	colors.at<Vec3b>(i)	=
     	Vec3b(saturate_cast<uchar>(
     	(i+1)*180.0/bins),	255,	255);
     	}
     	cvtColor(colors,	colors,	COLOR_HSV2BGR);
     	Point	p1(0,0),	p2(0,theGraph.rows-1);
     	for(int	i=0;	i<bins;	i++)
     	{
     	float	value	=	histogram.at<float>(i,0);
     	value	=	maxVal	-	value;	//	invert
     	value	=	value	/	maxVal	*	theGraph.rows;	//	scale
     	p1.y	=	value;
     	p2.x	=	float(i+1)	*	float(theGraph.cols)	/	float(bins);
     	rectangle(theGraph,
     	p1,
     	p2,
     	Scalar(colors.at<Vec3b>(i)),
     	CV_FILLED);
     	p1.x	=	p2.x;
     	}
6.	Calculate	the	integral	of	a	histogram.
     float	integral	=	0.0;
     	for(int	i=0;	i<bins;	i++)
     	{
     	integral	+=	histogram.at<float>(i,	0);
     	}
2.	How	can	we	call	the	meanShift	function	to	perform	the	Mean	Shift	algorithm
with	10	iterations	and	an	epsilon	value	of	0.5?
     TermCriteria	criteria(TermCriteria::MAX_ITER
     	+	TermCriteria::EPS,
     	10,
     	0.5);
     meanShift(backProject,
     	srchWnd,
     	criteria);
3.	How	do	you	visualize	the	hue	histogram	of	the	tracked	object?	Assume	that
CamShift	is	used	for	tracking.
We	can	call	the	following	right	after	the	CamShift	function	call	to	visualize	the	hue
of	the	detected	object:
  CamShift(backProject,
  	srchWnd,
  	criteria);
  visualizeHue(Mat(hue,	srchWnd));
4.	Set	the	process	noise	covariance	in	the	KalmanFilter	class	so	that	the	filtered	and
measured	values	overlap.	Assume	that	only	process	noise	covariance	is	set	of	all
the	available	matrices	for	the	KalmanFilter	class's	behavior	control.
     setIdentity(kalman.processNoiseCov,
     	Scalar::all(1.0));
5.	Let's	assume	that	the	Y	position	of	the	mouse	on	a	window	is	used	to	describe
the	height	of	a	filled	rectangle	that	starts	from	the	top-left	corner	of	the	window
and	has	a	width	that	equals	the	window's	width.	Write	a	Kalman	filter	that	can	be
used	to	correct	the	height	of	the	rectangle	(single	value)	and	remove	noise	in	the
mouse's	movement,	resulting	in	a	visually	smooth	resizing	of	the	filled	rectangle.
     int	fillHeight	=	0;
     void	onMouse(int,	int,	int	y,	int,	void*)
     	{
     	fillHeight	=	y;
     	}
     int	main()
     	{
     	KalmanFilter	kalman(2,1);
     Mat_<float>	tm(2,	2);	//	transition	matrix
     	tm	<<	1,0,
     	0,1;
     kalman.transitionMatrix	=	tm;
     	Mat_<float>	h(1,1);
     	h.at<float>(0)	=	0;
     kalman.statePre.at<float>(0)	=	0;	//	init	x
     	kalman.statePre.at<float>(1)	=	0;	//	init	x'
     setIdentity(kalman.measurementMatrix);
     setIdentity(kalman.processNoiseCov,
     	Scalar::all(0.001));
     string	window	=	"Canvas";
     	namedWindow(window);
     	setMouseCallback(window,	onMouse);
     while(waitKey(10)	<	0)
     	{
     	//	empty	canvas
     	Mat	canvas(500,	500,	CV_8UC3,	Scalar(255,	255,	255));
     h(0)	=	fillHeight;
     Mat	estimation	=	kalman.correct(h);
     float	estH	=	estimation.at<float>(0);
     rectangle(canvas,
     	Rect(0,0,canvas.cols,	estH),
     	Scalar(0),
     	FILLED);
     imshow(window,	canvas);
      kalman.predict();
      	}
      	return	0;
      }
    a)	Use	the	resize	function	to	scale	the	template	image,	and	then	call	the
    matchTemplate	function:
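    Here is one possible sketch for this, assuming the template is scaled to half
    of its size:
      Mat result;
      resize(templ, templ, Size(), 0.5, 0.5);
      matchTemplate(image, templ, result, TM_CCOEFF_NORMED);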
    b)	Rotate	the	template	by	90	degrees	and	then	call	the	matchTemplate	function:
      rotate(templ, templ, ROTATE_90_CLOCKWISE);
      matchTemplate(image, templ, result, TM_CCOEFF_NORMED);
2.	Use	the	GFTTDetector	class	to	detect	the	keypoints	with	the	Harris	corner-
detection	algorithm.	You	can	set	any	values	for	the	corner-detection	algorithm.
      Mat	image	=	imread("Test.png");
      	Ptr<GFTTDetector>	detector	=
      	GFTTDetector::create(500,
      	0.01,
      	1,
      	3,
      	true);
      vector<KeyPoint>	keypoints;
      	detector->detect(image,	keypoints);
      drawKeypoints(image,
      	keypoints,
      	image);
3.	The	Hough	transformation	can	also	be	used	to	detect	circles	in	an	image	using
the	HoughCircles	function.	Search	for	it	in	the	OpenCV	documentation	and	write	a
program	to	detect	circles	in	an	image.
	
      Mat	image	=	imread("Test.png");
      	cvtColor(image,	image,	COLOR_BGR2GRAY);
      	vector<Vec3f>	circles;
      	HoughCircles(image,
      	circles,
      	HOUGH_GRADIENT,
      	2,
      	image.rows/4);
      	for(int	i=0;	i<circles.size();	i++)
      	{
      	Point	center(cvRound(circles[i][0]),
      	cvRound(circles[i][1]));
      	int	radius	=	cvRound(circles[i][2]);
      circle(	image,	center,	radius,	Scalar(0,0,255));
      	}
5.	Use	the	ORB	class	to	detect	keypoints	in	two	images,	extract	their	descriptors,
and	match	them.
      Mat	object	=	imread("Object.png");
      	Mat	scene	=	imread("Scene.png");
      Ptr<ORB>	orb	=	ORB::create();
      	vector<KeyPoint>	objKPs,	scnKPs;
      	Mat	objDesc,	scnDesc;
      	orb->detectAndCompute(object,
      	Mat(),
      	objKPs,
      	objDesc);
     	orb->detectAndCompute(scene,
     	Mat(),
     	scnKPs,
     	scnDesc);
     Ptr<BFMatcher> matcher = BFMatcher::create(NORM_HAMMING); // Hamming norm for ORB's binary descriptors
     vector<DMatch>	matches;
     	matcher->match(objDesc,	scnDesc,	matches);
     Mat	result;
     	drawMatches(object,
     	objKPs,
     	scene,
     	scnKPs,
     	matches,
     	result);
     	imshow("image",	result);
   You	cannot	use	the	FLANN-based	matching	algorithm	with	descriptors	that
   have	a	bit	string	type,	such	as	ORB.
7.	You	can	use	the	following	OpenCV	functions	and	the	sample	to	calculate	the
time	required	to	run	any	number	of	lines	of	code.	Use	it	to	calculate	the	time	it
takes	for	the	matching	algorithms	on	your	computer.
     double	freq	=	getTickFrequency();
     	double	countBefore	=	getTickCount();
     //	your	code	goes	here	..
     double	countAfter	=	getTickCount();
     	cout	<<	"Duration:	"	<<
     	(countAfter	-	countBefore)	/	freq	<<	"	seconds";
Chapter	8,	Machine	Learning	in
Computer	Vision
1.	What	is	the	difference	between	the	train	and	trainAuto	methods	in	the	SVM	class?
    The trainAuto method chooses the optimal values for the SVM parameters,
    such as C, Gamma, and so on, and trains the model, while the train method simply
    uses any given parameters. (Read the SVM class documentation for more
    details about the trainAuto function and how exactly the optimization happens.)
2. Demonstrate the difference between the linear and histogram intersection SVM
kernels.
    We can set the kernel type to LINEAR using the following code:
      svm->setKernel(SVM::LINEAR);
    Similarly,	we	can	use	the	following	code	to	set	the	kernel	type	to	histogram
    intersection:
      svm->setKernel(SVM::INTER);
	
   Here	is	the	result	of	the	same	data	segmented	with	the	histogram	intersection
   kernel:
3. How do you calculate the HOG descriptor size for a HOG window size of 128 by
96 pixels? (The rest of the HOG parameters are untouched.)
     HOGDescriptor hog;
     hog.winSize = Size(128, 96); // width 128, height 96, as given in the question
     vector<float> tempDesc;
     hog.compute(Mat(hog.winSize, CV_8UC3),
                 tempDesc);
     int descriptorSize = tempDesc.size();
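   As a cross-check, with the default HOG parameters (a 16 x 16 block size, an
   8 x 8 block stride, 8 x 8 cells, and 9 histogram bins), the same value can be
   computed by hand: there are (128 - 16) / 8 + 1 = 15 block positions horizontally
   and (96 - 16) / 8 + 1 = 11 vertically, and each block contains 2 x 2 cells with
   9 bins each (36 values), so the descriptor size is 15 x 11 x 36 = 5,940.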
4. How do you update an existing trained ANN_MLP, instead of training from
scratch?
   You do this by setting the UPDATE_WEIGHTS flag during the training. Here's an
   example:
     ann->train(trainData, ANN_MLP::UPDATE_WEIGHTS);
5.	What	is	the	required	command	(by	using	opencv_createsamples)	you	need	to	use
to	create	a	positive	samples	vector	from	a	single	image	of	a	company	logo?
Assume	you	want	to	have	1,000	samples	with	a	width	of	24	and	a	height	of	32,
and	by	using	default	parameters	for	rotations	and	inversions.
     opencv_createsamples	-vec	samples.vec	-img	sign.png	-bg	bg.txt				
     		-num	1000	-w	24	-h	32
6.	What	is	the	command	required	to	train	an	LBP	cascade	classifier	for	the
company	logo	from	the	previous	question?
     opencv_traincascade	-data	classifier	-vec	samples.vec
     		-bg	bg.txt	-numPos	1000	-numNeg	1000	-w	24	-h	32
     			-featureType	LBP
7. What is the default number of stages for training a cascade classifier in
opencv_traincascade? How can we change it? What is the downside of increasing
and decreasing the number of stages far beyond its default value?
   The default value is 20 stages, and it can be changed with the -numStages
   parameter of the opencv_traincascade tool. Increasing it far beyond the default
   leads to a much longer training time, while decreasing it too much shortens the
   training at the cost of the accuracy of the resulting classifier.