Arduino Computer Vision Programming - Sample Chapter
Preface
Computer vision is the next level of sensing the environment, especially for modern
machines. Many present-day devices try to mimic human capabilities with a limited
set of resources. Moreover, most of these imitations can be seen as insufficient
because generally they are too indirect to reflect real human capabilities,
especially in terms of vision.
Even though conventional sensors come in a huge variety, they are incapable of matching the power of the human vision system, one of the most complex perceptual capabilities of human beings. So we surely need visual information to make our electronic systems more intelligent. This is where computer vision starts.
A camera can be seen as the ultimate vision sensor, very close to the human vision sensing system. The problem, however, is that using a camera as a vision sensor has traditionally been complex and difficult in practice. The purpose of this book is to make computer vision easy by dividing a complex problem into basic, realizable substeps. The best part is that we can do this for real-life applications!
When we deal with real-life applications, there is no doubt that we need a way to interact with real life, and embedded systems provide exactly this physical interaction. Arduino is one of the most popular embedded system platforms; it offers an easy way of prototyping, backed by a huge community and plenty of learning resources. Along with its key properties, which will be discussed in detail later, this makes Arduino a perfect candidate for the physical-life interaction of any vision system.
General Overview of
Computer Vision Systems
In this chapter, you will learn about the fundamentals and the general scheme of
a computer vision system. The chapter will enable you to take a wide perspective
when approaching computer vision problems.
An image is a computed representation of the real world, and a timed series of images is called a video. Any vision-enabled device recreates real scenes via images. Because extracting interpretations and hidden knowledge from images is complex, computers are generally used for this purpose. The term computer vision comes from the modern approach of enabling machines to understand the real world in a human-like way. Since computer vision is necessary to automate daily tasks with devices or machines, it is growing quickly, and lots of frameworks, tools, and libraries have already been developed.
The Open Source Computer Vision Library (OpenCV) changed the game in computer vision, and lots of people have contributed to make it even better. It is now a mature library that provides state-of-the-art design blocks, which are handled in subsequent sections of this book. Because it is an easy-to-use library, you don't need to know the complex calculations under the hood to achieve vision tasks. This simplicity makes sophisticated tasks easy, but even so you should know how to approach problems and how to use the design tools in harmony.
Any computer vision system consists of well-defined design blocks, ordered as data acquisition, preprocessing, image processing, post-filtering, recognition (or detection), and actuation. This book will handle all of these steps in detail with a practical approach. By mapping the steps to the related implementation platforms, we can draw a generic diagram that gives a process view of a computer vision system.
Data acquisition
As can be seen, the first step is data acquisition, which normally collects the sensory information from the environment. From the perspective of the vision controller, there are two main data sources: the camera and the Arduino system.
The camera is the ultimate sensor to mimic the human vision system and it is directly
connected to the vision controller in our scheme. By using OpenCV's data acquisition
capabilities, the vision controller reads the vision data from the camera. This data is
either an image snapshot or a video created from the timed series of image frames.
The camera can be of various types and categories.
In the most basic categorization, a camera can give out analog or digital data. All
of the cameras used in the examples in this book are digital because the processing
environment and processing operation itself are also digital. Each element of the
picture is referred to as a pixel. In digital imaging, a pixel, pel, or picture element
is a physical point in a raster image or the smallest addressable element in an
all-points-addressable display device; so it is the smallest controllable element
of a picture represented on the screen. You can find more information on this at
http://en.wikipedia.org/wiki/Pixel.
Cameras can also be classified by their color-sensing capabilities. RGB cameras are able to sense the three main color components (red, green, and blue) and a huge number of combinations of these colors. Grayscale cameras are able to detect the scene only in terms of shades of gray; hence, rather than color information, these cameras provide shape information. Lastly, binary cameras sense the scene only in black and white; a pixel in a binary camera can have only two values: black and white.
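To make these pixel categories concrete, here is a small illustrative sketch in Python (not from the book): it converts one RGB pixel to grayscale using the common ITU-R BT.601 luminance weights, then to binary. The threshold of 128 is an arbitrary choice for illustration.

```python
# Convert an RGB pixel to grayscale, then to binary.

def rgb_to_gray(r, g, b):
    """Weighted average of the three color components (0-255 each)."""
    return int(0.299 * r + 0.587 * g + 0.114 * b)

def gray_to_binary(gray, threshold=128):
    """A binary pixel has only two values: 0 (black) or 255 (white)."""
    return 255 if gray >= threshold else 0

pixel = (200, 30, 60)            # a reddish RGB pixel
gray = rgb_to_gray(*pixel)       # a single shade of gray (84)
binary = gray_to_binary(gray)    # pure black or white (0 here)
print(gray, binary)
```

A grayscale camera hands you only the middle value; a binary camera hands you only the last one.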
Another classification for cameras is their communication interface; some examples are USB cameras, IP cameras, wireless cameras, and so on. The communication interface of the camera also directly affects its usability and capability. At home, we generally have web cameras with USB interfaces. When using USB webcams, you generally don't need external power sources or extra hardware that would make the camera harder to use, so a USB webcam is really easy to use for image processing tasks. Cameras also have properties such as resolution, but we'll handle camera properties in the forthcoming chapters.
Regular USB cameras, most often deployed as webcams, offer a 2D image. In addition to 2D camera systems, we now have 3D camera systems that can detect the depth of each element in the scene. The best-known example of a 3D camera system is probably Microsoft's Kinect.
OpenCV supports various types of cameras, and it is possible to read the vision information from all of them through simple interfaces, as the examples in the forthcoming chapters will show. Please keep in mind that image acquisition is the fundamental step of the vision process, and we have lots of options.
Generally, we need information in addition to that from the camera to analyze the
environment around us. Some of this information is related to our other four senses.
Moreover, sometimes we need additional information beyond human capabilities.
We can capture this information by using the Arduino sensors.
Imagine that you want to build a face-recognizing automatic door lock project. The
system will probably be triggered by a door knock or a bell. You need a sound sensor
to react when the door is knocked or the bell is rung. All of this information can be
easily collected by Arduino. Let's add a fingerprint sensor to make it doubly safe!
In this way, you can combine the data from the Arduino and the camera to reach a
conclusion about the scene by running the vision system.
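The fusion step described above can be sketched as plain decision logic. In this hypothetical Python snippet (not from the book), the sensor and vision inputs are simple booleans; in a real system, they would come from the Arduino's sound and fingerprint sensors and from the vision controller's face recognizer.

```python
# Hypothetical decision logic for the face-recognizing door lock.
# All three inputs are stand-ins for real Arduino/vision outputs.

def should_unlock(knock_detected, fingerprint_ok, face_authorized):
    """Unlock only when the knock triggered the system AND both
    the fingerprint and the recognized face check out."""
    return knock_detected and fingerprint_ok and face_authorized

print(should_unlock(True, True, True))    # all checks pass -> True
print(should_unlock(True, False, True))   # fingerprint rejected -> False
```

The interesting engineering is, of course, in producing those booleans reliably; that is what the rest of the pipeline is for.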
In conclusion, both the camera and the Arduino system (with sensors) can be used
by the vision controller to capture the environment in detail!
Preprocessing
Preprocessing means getting something ready for processing. It can include
various types of substeps but the principle is always the same. We will now
explain preprocessing and why it is important in a vision system.
Firstly, let's make something clear. This step aims to make the collected vision data
ready for processing. Preprocessing is required in computer vision systems since
raw data is generally noisy. In the image data we get from the camera, we have lots
of unneeded regions and sometimes we have a blurry image because of vibration,
movement, and so on. In any case, it is better to filter the image to make it more useful for our system. For example, if you want to detect a big red ball in the image, you can remove small dots, or you can even remove the parts that are not red. All of these kinds of filtering operations will make our lives easier.
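The "keep only red regions" idea can be sketched in a few lines. This toy Python example (an illustration, not the book's code) works on a tiny 3x3 image stored as nested lists of (R, G, B) tuples; real code would use something like OpenCV's `inRange()` on a NumPy array, and the "predominantly red" rule here is an arbitrary choice.

```python
# A toy "keep only red regions" filter: mask out pixels that are
# not predominantly red, producing a 0/1 mask of the same shape.

def red_mask(image):
    mask = []
    for row in image:
        mask_row = []
        for (r, g, b) in row:
            # "red" here means: red is strong and dominates both others
            is_red = r > 150 and r > 2 * g and r > 2 * b
            mask_row.append(1 if is_red else 0)
        mask.append(mask_row)
    return mask

image = [
    [(200, 20, 30), (10, 10, 10), (90, 90, 90)],
    [(220, 40, 50), (255, 0, 0), (30, 200, 30)],
    [(5, 5, 5),     (180, 60, 60), (210, 30, 20)],
]
print(red_mask(image))   # 1s mark the red pixels
```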
Generally, some filtering is also done during data acquisition by the camera itself, but every camera has different preprocessing capabilities, and some even have vibration isolation. However, as built-in capabilities increase, cost increases in parallel, so we'll handle the filtering inside our own design via OpenCV. This way, it is possible to design robust vision systems even with cheap equipment such as a webcam.
The same is valid for sensor data. We always get noisy data in real-life cases, so the noise should be filtered out to get the actual information from the sensor. Some of this noise comes from the environment and some from the internal structure of the sensor. In any case, the data should be made ready for processing; this book will give practical ways to achieve that end.
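As a taste of what such sensor filtering looks like, here is a minimal sliding-window median filter in pure Python (an illustrative sketch, not the book's code). Median filtering is a common way to suppress spike noise without smearing the signal the way a plain moving average would.

```python
# A minimal 1-D median filter for noisy sensor readings.

def median_filter(samples, window=3):
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)               # clamp the window at the edges
        hi = min(len(samples), i + half + 1)
        neighborhood = sorted(samples[lo:hi])
        filtered.append(neighborhood[len(neighborhood) // 2])
    return filtered

readings = [20, 21, 95, 22, 21, 20, 3, 21]  # two spike outliers (95 and 3)
print(median_filter(readings))              # spikes are suppressed
```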
It should be understood that the complexity of image data is generally much greater than that of any regular sensor, such as a temperature or humidity sensor. The dimensions of the data that represent the information are also different. RGB images include three color components per pixel: red, green, and blue. To represent a scene with a resolution of 640x480, an RGB camera needs 640x480x3 = 921600 bytes; the multiplication by three comes from the dimension of each pixel, which holds 3 bytes of data in total, 1 byte for each color. To represent the temperature of a room, by contrast, we generally need only 4 bytes of data. This also explains why we need highly capable devices to work on images. Moreover, the complexity of image filters is different from that of simple sensor filters.
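The size comparison above is worth doing once by hand. This small Python sketch repeats the arithmetic from the text:

```python
# Back-of-the-envelope memory comparison: one 640x480 RGB frame
# versus a single 4-byte temperature reading.

width, height, channels = 640, 480, 3    # 1 byte per color channel
frame_bytes = width * height * channels
temperature_bytes = 4                    # e.g. a 32-bit float

print(frame_bytes)                       # 921600 bytes per frame
print(frame_bytes // temperature_bytes)  # 230400x more data per sample
```

And that is for a single frame; at 30 frames per second, the gap widens by another factor of 30.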
But this doesn't mean that we cannot use complex filters in a simple way. If we know the purpose of a filter and the meanings of its parameters, we can use it easily. This book aims to make you aware of the filtering process and how to apply advanced filtering techniques in an easy way.
So, filtering is about extracting the real information from the data, and it is an integral step in the computer vision process. Many computer vision projects fail in the development phase because the filtering layer is missing; even the best recognition algorithms fail with noisy and inaccurate data. So, please be aware of the importance of data filtering and preprocessing.
Image processing
The information which was extracted from the image will be used in the next step of
the computer vision system. Because this processing step will summarize the image,
it is very important to do this correctly to make the whole vision system work.
Again, you don't need to know the complex calculations under the hood. Instead,
you should know where and how to use image processing techniques to get valuable
small information sets from the scene. That is exactly what this book deals with in
the forthcoming chapters.
Recognition or detection
The main purpose of a vision system is to reach a conclusion by interpreting the scene via images or image arrays. The way to the conclusion is recognition or detection.
Detection can be counted as a basic form of recognition. The aim is to detect an object or an event, and there are only two possible conclusions: the object or event either exists or it doesn't. Because of this binary nature, detection is a special classification process with two classes. The first class is existence and the second is nonexistence. "To be or not to be, that is the question."
Recognition, also called classification, is a more complex term that describes the process of identifying one or more pre-specified or learned objects or object classes. Face recognition is a good example of such an operation: a vision system should identify you by recognizing your face. This is generally a complex classification process with multiple classes; in this case, each face is a class, so it is a hard problem. But, thanks to OpenCV, we have lots of easy-to-use mechanisms for recognition, even for complex problems.
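To see what multi-class recognition means in code, here is a toy nearest-neighbor classifier over made-up 2-D "face feature" vectors (purely illustrative; real face recognition, for example with OpenCV's face module, uses far richer features, but the classification idea is the same):

```python
# Recognition as multi-class classification: each known person is a
# class, and we pick the class whose stored features are closest.

KNOWN_FACES = {
    "alice": (0.9, 0.1),   # invented 2-D feature vectors
    "bob":   (0.2, 0.8),
    "carol": (0.5, 0.5),
}

def recognize(features):
    """Return the name whose features are nearest (squared Euclidean)."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(KNOWN_FACES, key=lambda name: dist2(KNOWN_FACES[name], features))

print(recognize((0.85, 0.15)))   # closest to alice's stored features
```

Note how detection's two classes generalize naturally: here there are three classes, one per face, and the decision rule picks among them.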
Sometimes, complex algorithms take a long time to finish, while in some cases very fast behavior is needed, especially to meet real-time performance requirements. In such cases, we can use simple but effective decision algorithms. As Leonardo da Vinci said, "Simplicity is the ultimate sophistication." This book will also show you how to build robust recognition systems by using simple design blocks.
Again, you should be aware of the aim of recognition or classification. This awareness will show you the path you should follow to succeed.
As the next step, we need a hand detector. By applying skeleton analysis to the detected hand and comparing the positions of the fingers, we can count the fingers and classify the hand gesture.
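Once the skeleton analysis has produced a finger count, the final classification step can be very small. This hypothetical Python sketch (the gesture mapping and the "secret gesture" are invented for illustration) shows that last step:

```python
# Toy gesture classifier: map a finger count (assumed to come from
# skeleton analysis, not shown) to a gesture label.

GESTURES = {0: "fist", 2: "peace", 5: "open_hand"}

def classify_gesture(finger_count):
    return GESTURES.get(finger_count, "unknown")

def door_should_unlock(finger_count, secret_gesture="peace"):
    """Unlock only when the classified gesture matches the secret one."""
    return classify_gesture(finger_count) == secret_gesture

print(classify_gesture(2))      # "peace"
print(door_should_unlock(2))    # True  -> send unlock to Arduino
print(door_should_unlock(5))    # False -> keep the door locked
```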
If it is the correct hand gesture, we can send this information to the Arduino door-unlock controller, and it will unlock the door for a limited time to welcome the authorized visitor!
You can apply all these principles to any problem to become familiar with it. Do not focus on the algorithmic details for now; just try to divide the problem into pieces and decide which properties you can use to solve each one.
Once you get used to this approach, this book will show you how to realize each step and how to find the right algorithm to achieve it. So, go on and try to repeat the approach for a garage door open/close system that recognizes your car's number plate!
Summary
We now know how to approach vision projects and how to divide them into isolated pieces, which makes realizing the projects much easier. We also have some idea of how the complex tasks of vision systems can be achieved in a systematic way.
We also talked about the reasons for, and the importance of, each substep in the approach. We are now aware of the key points of the approach and have solid knowledge of how to define a solution frame for any computer vision problem.
Now, it is time to get your hands dirty!