Master’s Thesis
2023
© 2023
This work is licensed under a Creative Commons "Attribution-NonCommercial-ShareAlike 4.0 International" license.
Author Rik van den Boogaard
Title The development of a Hardware-in-the-Loop test setup for event-based vision near space objects
Degree programme Space Science and Technology
Major Space Robotics and Automation
Supervisor Prof. Esa Kallio
Advisor MSc. Olli Knuuttila
Date 29 May 2023 Number of pages 103 Language English
Abstract
The purpose of this thesis work was to develop a Hardware-in-the-Loop imaging
setup that enables experimenting with an event-based and frame-based camera under
simulated space conditions. The generated data sets were used to compare visual navigation algorithms, namely an event-based and a frame-based feature detection and tracking algorithm. The comparative analysis of the feature detection and tracking algorithms provided insights into the feasibility of event-based vision near space objects. Event-based cameras differ from frame-based cameras in that they produce an asynchronous and independent stream of events caused by brightness changes at each pixel instead of capturing images at a fixed rate.
I would also like to thank Maria Winnebäck, Anette Snällfot-Brändström, and Alli
Palojärvi for their support during my time in Kiruna and Espoo, particularly amidst
the Covid-19 pandemic.
This thesis has been done within the framework of the Joint International Program in
Space Science and Technology − SpaceMaster.
Contents

Abstract
Preface
Contents
1 Introduction
  1.1 Visual navigation in space
  1.2 Thesis objective and scope
    1.2.1 Success criteria for the HiL test-bench
  1.3 Methodology
  1.4 Document overview
2 Literature review
  2.1 Hardware-in-the-Loop review
    2.1.1 HiL description
    2.1.2 HiL facilities and test-benches
  2.2 Event-based sensors
    2.2.1 Event-based sensors historic timeline
    2.2.2 Working principles
    2.2.3 Types
    2.2.4 DVS application field
  2.3 Frame-based sensors
  2.4 Event-based sensor versus frame-based sensor
  2.5 Conclusion of the literature study
3 Theoretical framework
  3.1 Event-based camera
    3.1.1 Event-based camera lens
  3.2 Frame-based camera
    3.2.1 Frame-based camera lens
  3.3 Comparison of camera parameters
  3.4 Image formation
  3.5 Space-object type
  3.6 Simulated space environment calculations
  3.7 Camera calibration
    3.7.1 Data format
4 Setup development
  4.1 System constraints
  4.2 Electronic design
    4.2.1 Development board
    4.2.2 Motor
    4.2.3 Motor driver
  4.3 Mechanical design
    4.3.1 Camera module
    4.3.2 Space object module
    4.3.3 Iterations
  4.4 Light source selection
  4.5 Algorithm description
    4.5.1 Computer vision task
    4.5.2 Frame-based feature detection and tracking
    4.5.3 Event-based feature detection and tracking
    4.5.4 Feature tracking performance metric
  4.6 Environment overview
6 Results
  6.1 Experiment 1: side setup
    6.1.1 Experiment 1.1: 30 rpm
    6.1.2 Experiment 1.2: 55 rpm
    6.1.3 Experiment 1.3: 100 rpm
  6.2 Experiment 2: frontal setup
  6.3 Experiment 3: behind setup
  6.4 Discussion of results and closing remarks
    6.4.1 Closing remarks
7 Conclusions
  7.1 Answers to the research questions
  7.2 Proposed future work
Bibliography
E Appendix: Overview of quantitative tracking results
Abbreviations
AER Address-Event Representation
APS Active Pixel Sensor
AR0135 ArduCam 0135
ATIS Asynchronous Time Based Image Sensor
BMP Bitmap
CCD Charged Coupled Device
CMOS Complementary Metal Oxide Semiconductor
CSV Comma Separated Values
DAVIS Dynamic and Active Pixel Sensor
DC Direct Current
DRM Design Requirement Methodology
DVS Dynamic Vision Sensor
DoF Depth of Field
EPOS European Proximity Operations Simulator
FF Far Field
FOV Field of View
FPS Frames Per Second
HASTE multi-Hypothesis Asynchronous Speeded-Up Tracking of Events
HiL Hardware-in-the-Loop
IDE Integrated Development Environment
IMU Inertial Measurement Unit
KLT Kanade-Lucas-Tomasi
LED Light Emitting Diode
LiDAR Light Detection and Ranging
NF Near Field
PLA Polylactic acid
RMS Root Mean Square
ROI Region of Interest
RPE Re-Projection Error
RPM Revolutions Per Minute
SLAM Simultaneous Localization and Mapping
SPI Serial Peripheral Interface
SSD Sum of Squared Differences
SiL Software-in-the-Loop
TRON Testbed for Robotic Optical Navigation
AU Astronomical Unit
USB Universal Serial Bus
VGA Video Graphics Array
VO Visual Odometry
WD Working Distance
1 Introduction
1.1 Visual navigation in space
During the last decade, the space industry has seen a significant rise in missions using
interplanetary spacecraft. Current and future space missions are expected to perform precise, safe, and autonomous landings in addition to rendezvous and docking maneuvers. Such missions often take place at distances ranging from Earth orbit up to several astronomical units. Therefore, the onboard navigation algorithms must be tested and verified against both real-world and synthetically simulated sensor behavior.
For space exploration missions, a camera is a navigation sensor that can provide
navigation information with only a few constraints at the spacecraft system level. The
conventional frame-based camera is frequently used in space exploration missions.
Frame-based cameras capture images at a fixed frame rate by recording the entire
scene at once. However, under low-light conditions, the frame-based camera is limited
in its dynamic range to capture these images.
In recent years, event-based cameras have attracted significant attention due to their
ability to only record changes in a scene when a change in brightness or contrast
occurs. However, little work has focused on applying the event-based camera to a
space context [1].
In order to increase the technology readiness level, testing and verifying such sensors in
simulated conditions will play a significant role in preparing for these space missions.
To do this, it will be necessary to verify image processing and vision-based navigation
algorithms. These algorithms interpret and extract information from the external
environment provided by the cameras, such as the spacecraft position. Therefore, it
is crucial to generate realistic image data sets in order to design, validate, and test
autonomous navigation algorithms [2].
Currently, space navigation algorithms are mostly developed using synthetic data.
However, synthetic events are currently unable to recreate noise comparable to the noise encountered in real-world applications [3].
1.2 Thesis objective and scope
Since future space missions will rely more on precise methods for ensuring safe
autonomous landings, as well as rendezvous and docking maneuvers in a space
environment, we need to verify both cameras and visual navigation algorithms under
simulated space conditions. To increase the technology readiness level of these
maneuvers, it is important to test and verify camera sensors, including algorithms to
interpret and extract information from the external environment in simulated conditions.
The goal of this thesis is to design and manufacture a Hardware-in-the-Loop (HiL)
test bench, as well as to generate a data set and analyze this data set using an image
processing algorithm. The test bench should enable testing of both event-based and
frame-based cameras in a space-simulated environment. In addition, the setup must
be able to recreate the conditions during an exploration mission to an asteroid for near
object observation. The generated data output of these cameras during testing will be
compared quantitatively to evaluate the feasibility of an event-based camera for use in
space.
Thus, the thesis will aim to answer the following research question:
2. Which factors influence the generation of a data set from a frame-based and event-
based camera during Hardware-in-the-Loop experiments under space-simulated
conditions?
This thesis will focus on the design, manufacturing and implementation of the HiL
hardware, which includes interfaces, sensors and drivers. The camera systems are
provided by Aalto University and are used by the author. The development of software
algorithms is excluded from the scope of this thesis because of time constraints.
Instead, pre-existing algorithms need to be modified; these algorithms are research grade and therefore often still under development and difficult to interpret. In addition, the project is operationally bounded. It takes place in a darkroom at Aalto, which constrains the dimensions of the setup. The budget is another operational constraint and is set to a few hundred euros. There is also a time constraint: because this study covers a broad range of topics, the decision was made to obtain a basic proficiency in each subject instead of mastering just a few. Finally, the setup is required to be operationally flexible; in other words, it must be possible to move it to other darkrooms and deploy it in a reasonable time.
1.2.1 Success criteria for the HiL test-bench
For this HiL test bench project, the following criteria have been chosen for evaluating the success of the designed HiL test bench:
• A quantitative comparison between the camera data sets assessing the feasibility
of an event-based camera for navigation in the proximity of an asteroid.
1.3 Methodology
To comply with the success criteria, the project type needs to be defined. Since
the developed HiL test bench requires testing cameras under space conditions, the
thesis can be called a combined exploratory and descriptive case study. The reason
for using an exploratory case study is that little is known about the behavior of
the event-based camera under real-world simulated space conditions. Meanwhile, a descriptive case study is used to specify and describe the system. Combining
these methodologies gives a preliminary understanding of the development of an HiL
imaging system, including the behavior of the tested camera using data analysis.
Following this methodology results in the following steps, which are visualized in
Figure 1:
3. Collection of data: the generated data from the developed HiL imaging test-bench
is collected using available research grade computer vision algorithms.
5. Conclusions and recommendations: These are based on the analysis of the data
and overall design process of the HiL imaging system.
Figure 1: Summarized project methodology used in this work.
2 Literature review
In this chapter, a literature review will be conducted covering the main topics of this
thesis. It starts with a general description of existing HiL facilities (2.1). Subsequently,
attention will be given to the working principles of the event-based sensors (2.2) and
frame-based imaging sensors (2.3), since these sensors are tested in this thesis. Finally,
a comparison will be made based on the theoretical description of the event-based and
frame-based camera (2.4).
2.1 Hardware-in-the-Loop review

HiL facilities are a means to verify and validate advanced sensors such as cameras as well as complex algorithms [4]. These facilities typically utilize image processing and vision-based navigation algorithms and are classified into two types of HiL facilities [5]: (1) robotic and (2) optical.
A robotic HiL facility typically comprises one or more robotic components, sensors, 3D
models, and a bright light source. The sensor and 3D model can be either stationary or
move dynamically in relation to each other. Typically, the robotic arm with a mounted
camera moves while the object is static. In HiL facilities for space applications, a bright
light source is used and pointed toward the object to simulate the Sun. The phase angle
of the light source to the object model is essential since this determines the reflection
and shadow regions on the object. Furthermore, the position and orientation of the
sensor in relation to the object are important as it is used in vision-based navigation
algorithms to detect local features. These features can, in turn, be used to navigate
near celestial bodies, for example, to locate an area to land. The 3D model is the next
crucial component of the robotic HiL facility. It is a scaled physical representation
of the studied object. For the sensor algorithm to detect features, a scale model must
have sufficient resolution to show details. The disadvantage of the robotic facility type is that it is costly due to the required robotic components. In addition, a robotic facility often cannot emulate deep-space scenarios, since many deep-space objects still lack the detailed physical description needed to build a scale model.
The optical HiL facility type consists of a camera observing a scene on a digital
screen via a lens system. The digital screen provides a high-fidelity rendering of a 3D
model to stimulate the camera. The camera produces real-world images of the digital
scene, which can be used to evaluate the performance of the hardware in the simulated
environment. The acquired image is limited by the resolution of the digital screen that
stimulates the camera.
While each facility type has its advantages and disadvantages, they can complement
each other to create an optimal facility to test hardware. An example of combining the
HiL facilities is using a camera to observe a scale 3D model in a blackout environment.
Utilizing either HiL version in a space context can reduce the risk of mission failure
due to a camera or algorithm failure since both the camera and algorithms can be
intensively tested in a HiL environment.
2.1.2 HiL facilities and test-benches

The first example is a HiL facility developed by the German Aerospace Center (DLR) called the Testbed for Robotic Optical Navigation (TRON). This robotic facility has been designed to simulate lunar landings under different geometric, dynamic, and optical conditions [4]. Its main purpose is therefore to increase the technological readiness level of optical navigation technology. The hardware that can be tested at TRON consists of active and passive optical sensors. The main features of this facility are the three
different surface models, including one of the Moon with a 1:10000 scale factor, which is used to confirm the conclusions obtained during Software-in-the-Loop (SiL) testing.
Besides the sensors, robots, and terrain models, it simulates the space environment
by utilizing a blackout system, an anti-reflection system and a lighting system. The
blackout system comprises moving curtains, which can be closed to exclude light
coming from outside the lab. At the same time, the anti-reflection system aims to avoid
secondary lighting originating from a light inside the test environment. In addition,
all walls are painted black and the floor is covered with black curtains. Figure 2 shows
an overview of this setup.
Performing tests with this HiL test bench starts in the simulated world. The physical and
geometrical properties of the celestial bodies are simulated in a virtual environment.
The output from this virtual environment is a render containing a sample of a synthetic
environment. This render is shown on the digital screen, which stimulates the camera.
In this way, the camera can capture real-world images of the synthetic environment to
verify and validate image processing and vision-based navigation algorithms [6].
Figure 2: A photograph of the Testbed for Robotic Optical Navigation (TRON) facility
[4].
In addition to HiL facilities that observe and navigate to asteroids, facilities that
focus on docking orbital satellites exist. One such facility is the European Proximity
Operations Simulator (EPOS). The facility consists of two robotic manipulators, each having six degrees of freedom, as well as a 25-meter-long linear slide on which one of the robots can move. A computer-based monitoring and control system enables the robot's movement, as seen in Figure 4.
Compared to the previously described HiL facilities, EPOS is a HiL facility that uses
satellite models mounted on the robotic arm to simulate docking and rendezvous using
non-cooperative or cooperative sensors instead of celestial models.
2.2 Event-based sensors
In this sub-chapter, the history of event-based sensors (2.2.1), their working principles (2.2.2), camera types (2.2.3), and application fields (2.2.4) are discussed.
2.2.1 Event-based sensors historic timeline

In 1991, Misha Mahowald and Carver Mead developed the first concept of an event-based vision sensor [8]. Their concept went by the name silicon retina. The silicon retina includes adaptive photo-receptors, self-timed communication and a spatial
network simulating the neural architecture of a human eye. The human eye-based
model used in the silicon retina added a lot of complexity resulting in performance
issues. The complexity was caused by each wire-wrapped retina board requiring a
precise adjustment of bias setting, which resulted in a significant mismatch in pixel
response. Due to the complexity, their concept device was not applicable in real-world
scenarios.
Fast forward to 2004, when Zaghloul and Boahen made significant progress in event-
based sensors by creating a device that could capture the adaptive features of the
biological retinas [9]. However, despite this improvement, the devices experienced
issues such as poor latency characteristics, poor brightness sensitivity, and poor dynamic
range. Until 2004, event-based sensors were driven by imitating the biological eye as
accurately as possible.
Following the early 2000s, research continued the development of event-based sensors
that could be used in practical applications like computer vision. This research trend
led to the development of multiple new sensors, some in cooperation with conventional
frame-based sensors. The first commercial event-based camera, the DVS128 [10],
was created in 2008 and had a resolution of 128x128 pixels. In contrast to previous
event-based concepts, this camera focused on performing the task of the human eye
system instead of replicating the human eye itself. From this point forward, interest in applying the event camera rose. Nowadays, commercial companies such as Sony and Samsung have taken notice of the advantages of event-based sensors, resulting in a shift from the academic world into the commercial world. These companies are heavily investing in event-based camera research and mass-producing event-based cameras.
At the same time, the academic world has seen a significant rise in published papers
on the event-based vision topic, as shown in Figure 5.
Figure 5: Rise of research papers related to event-based cameras [11].

2.2.2 Working principles

Event-based cameras, also known as Dynamic Vision Sensors (DVS), respond asynchronously and independently to brightness changes in the scene at each pixel. The DVS produces a stream of digital spikes called events, whose rate varies with the scene activity. Each event is triggered by a change in brightness exceeding a user-defined threshold. In uniformly lit scenes, an event sensor perceives brightness as a measure of log intensity. Each DVS pixel remembers the brightness magnitude each time it detects an event and continuously monitors for any variation in magnitude relative to the previously stored brightness value. If the magnitude change surpasses the threshold, the camera produces an event. The camera transmits each event off the chip with a location (x, y), a time (t), and the polarity (p) of the brightness change. The polarity is a 1-bit variable that encodes the event as (1, 'ON') when the brightness increases and as (-1, 'OFF') when the brightness decreases. After an event occurs, the ON and OFF thresholds return to the reset level, awaiting another brightness change, as shown in Figure 6.
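To illustrate this per-pixel principle, the following Python sketch simulates how a single DVS pixel converts a sampled log-intensity signal into ON and OFF events against a user-defined contrast threshold. It is a minimal illustration of the mechanism described above, not the circuitry of any particular sensor; the threshold value and intensity samples are arbitrary.

```python
import numpy as np

def simulate_dvs_pixel(intensity, timestamps, threshold=0.2):
    """Generate (t, polarity) events from one pixel's intensity samples.

    An event is emitted whenever the log-intensity has changed by more than
    `threshold` since the last event; the reference level is then reset.
    """
    log_i = np.log(np.asarray(intensity, dtype=float))
    reference = log_i[0]          # brightness stored at the last event
    events = []
    for t, value in zip(timestamps[1:], log_i[1:]):
        delta = value - reference
        if abs(delta) > threshold:
            polarity = 1 if delta > 0 else -1   # 1 = ON, -1 = OFF
            events.append((t, polarity))
            reference = value                   # reset to the new level
    return events

# Example: a pixel watching a brightening-then-dimming scene point.
t = np.linspace(0.0, 1.0, 11)
intensity = [1.0, 1.1, 1.3, 1.8, 2.5, 3.0, 2.4, 1.7, 1.2, 1.0, 0.9]
print(simulate_dvs_pixel(intensity, t))
```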
2.2.3 Types
The working principles stated above resulted in three major commercially available categories of event-based sensors:
The DVS was developed based on a frame-based silicon retina design. A DVS aimed
to couple the continuous-time photo-receptor to a readout circuit that could reset every
time a pixel was sampled. The generated stream of events by each pixel of the DVS
camera indicates a brightness change [13]. However, the ATIS event camera has been
developed to expand the application cases of event-based cameras by including the
generation of absolute brightness information. The ATIS has pixels that include a
DVS subpixel which triggers another subpixel to read out the absolute intensity. Figure
7 shows the event streams of the DVS and an ATIS camera, where the colored dots
represent the events generated by observing a rotating dot.
The ATIS event camera has a disadvantage since each pixel is at least twice the pixel
size of a DVS pixel. However, the continued development of both DVS and ATIS
resulted in the creation of the now widely-used DAVIS camera [15].
The DAVIS combines the DVS and an active pixel sensor (APS) in the same pixel,
resulting in a significant pixel size reduction compared to the ATIS. The advantage
of the DAVIS designed by Brandli et al. [16] is that the sensor shares a photocurrent
between the synchronous readout of APS frames and the asynchronous detection of
brightness changes (DVS events). In this way, it uses the same photodiode to generate
both frame and event data, which means that the DAVIS can trigger an APS frame at
a constant frame rate while analyzing the DVS events. The DAVIS camera is most
frequently used in the event-based scientific community due to the combination of
events and frames. Figure 8 represents the data output of such a DAVIS camera. It
shows the principle of capturing frames at a specific time interval while the DAVIS
sensors continuously capture events indicated by the blue dots between the frames.
However, the limiting factor of the DAVIS is that the APS readout has a limited dynamic range, similar to other frame-based cameras. Thus, the APS is limited to a maximum dynamic range of 55 dB, whereas the DVS outperforms it ten-fold. A detailed description of the event camera used during this project is included later, in Section 3.1.
Since all three of the above designs include DVS pixels, i.e., brightness-change detectors, they all refer to their binary-polarity event output as DVS output.
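As a loose illustration of this combined output, the sketch below groups an asynchronous event stream by the fixed-rate frame timestamps, which is how frames and events are typically associated before joint processing. The data is synthetic and the field names are assumptions rather than the output format of a specific driver.

```python
import numpy as np

# Synthetic DAVIS-like data: frames at a fixed rate, events in between.
frame_times_us = np.arange(0, 100_000, 25_000)          # 40 fps frame timestamps
events = np.array([(12, 34, 1, 3_000), (40, 7, -1, 27_500),
                   (12, 35, 1, 51_000), (90, 60, -1, 74_000)],
                  dtype=[('x', 'u2'), ('y', 'u2'), ('p', 'i1'), ('t', 'u8')])

def events_between_frames(events, frame_times_us):
    """Yield (frame_index, events) for the events recorded between
    consecutive frame timestamps."""
    for i in range(len(frame_times_us) - 1):
        t0, t1 = frame_times_us[i], frame_times_us[i + 1]
        mask = (events['t'] >= t0) & (events['t'] < t1)
        yield i, events[mask]

for idx, chunk in events_between_frames(events, frame_times_us):
    print(f"frame {idx}: {len(chunk)} events")
```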
2.2.4 DVS application field

The current space-related event-based vision research focuses on using simulators that emulate the behavior of an event-based camera in space. The simulator generates an event-based output by rendering synthetic frames of a certain space environment and
comparing the corresponding pixels’ intensity. One example can be found in the study
by Sikorski et al. [1], where simulated event-based sensors are used for autonomous
vision-based landings. The vision-based landing techniques rely on optical flow
measurements generated by the PANGU image generator. These generated synthetic
images are converted into events by the event-based simulator. The research shows
that Time-to-Contact can be estimated from the output of an event-based camera with
sufficient precision, thus paving the way for a purely event-based landing system.
Another space-related navigation study using a DAVIS camera, titled: "Exploring the
application of event-based odometry for a planetary robot" has shown that event-based
odometry outperforms a frame-based visual-inertial odometry algorithm by 32 percent
[18] based on challenging benchmarks. Unlike the previously mentioned study that
used simulation data, this research captures DAVIS data from different environments.
The data was processed using the Asynchronous, Photometric Feature Tracking using
Events and Frames (EKLT) feature tracker with an Extended Kalman filter (EKF) to
fuse feature tracks with Inertial Measurement Unit (IMU) measurements to provide
pose estimates. The EKLT uses the simultaneously captured events and frames together for feature tracking. In this way, the EKLT manages to track features
between the frames captured at a fixed frame rate. Both research papers focused on
using event-based cameras on planetary robots but have different approaches.
2.3 Frame-based sensors
Digital image sensors have advanced significantly since the first image was captured in 1920. Modern frame-based cameras use one of two main types of sensors: the
charged coupled device (CCD) or the active pixel sensor (APS), also known as the
complementary metal oxide semiconductor (CMOS) APS sensor. The fundamental
working principle is the same, both convert light (photons) into electrical signals
(electrons) [19].
However, the charge in a CCD sensor is transported across the chip and read at one
corner of an array, where an analog-to-digital (A/D) converter converts each pixel
value into a digital value. In other words, the A/D conversion in a CCD sensor happens
outside the sensor. In contrast, the CMOS uses an A/D converter for every column of
pixels (see Figure 9). As a result, the bottleneck of the CCD, namely its limited achievable frame rate, is eliminated. The total frame rate of a CCD sensor slows as the number of pixels to be transferred increases.
In contrast to the CCD sensor, the CMOS sensor uses three transistors per pixel to transfer and amplify the charge using traditional wires. As described above, the
CMOS sensor uses an A/D converter for each column of pixels, resulting in many
parallel A/D converters sharing the workload. This parallelization leads to a slight lag
between each row readout. Therefore the CMOS method is more flexible since each
pixel can be read out individually [20].
Figure 9: Frame-based camera working principles (left) CCD architecture, and (right)
CMOS architecture [21].
Whereas in the past mechanical shutters were used to let light onto the sensor, nowadays the sensors are equipped with electronic shutters. The electronic shutter can
be divided into two main read-out categories, the global shutter and the rolling shutter.
Both types expose light on the sensor by activating and deactivating pixels at certain
times. These read-out modes impact exposure time, illumination synchronization,
spatial distortion, and noise levels.
In short, the rolling shutter sensor does not expose the image as a whole but with a time delay following the row and column structure discussed earlier. Therefore, the image is not read out for all pixels simultaneously but sequentially. In comparison, the global shutter exposes the full image at the same time, as shown in Figure 10.
Figure 10: An illustration of the frame-based camera exposure methods: (left) rolling
shutter read-out, and (right) global shutter read-out [22].
As a result of the different readout methods, the global shutter does not suffer from the so-called rolling shutter effect [22]. The effect results from the rolling shutter still reading out the previous position of an object while the object has already moved to a new location. When the columns are combined into an image, the result appears distorted.
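The effect can be illustrated with a short Python toy model: each image row is sampled at a slightly later time, so a horizontally moving vertical bar appears slanted under a rolling shutter while it stays vertical under a global shutter. The resolution, object speed, and line delay below are arbitrary values chosen only for the illustration.

```python
import numpy as np

HEIGHT, WIDTH = 48, 64
BAR_SPEED = 400.0        # object speed in pixels per second (arbitrary)
LINE_DELAY = 1e-3        # readout delay between consecutive rows (arbitrary)

def bar_position(t):
    """Horizontal position of a moving vertical bar at time t."""
    return int(BAR_SPEED * t) % WIDTH

def capture(rolling):
    """Render one frame; with rolling=True each row samples a later time."""
    image = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    for row in range(HEIGHT):
        t = row * LINE_DELAY if rolling else 0.0
        image[row, bar_position(t)] = 255
    return image

global_frame = capture(rolling=False)   # bar stays a straight vertical line
rolling_frame = capture(rolling=True)   # bar is skewed across the rows
print("columns hit (global): ", np.unique(np.argmax(global_frame, axis=1)))
print("columns hit (rolling):", np.unique(np.argmax(rolling_frame, axis=1)))
```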
2.4 Event-based sensor versus frame-based sensor
As discussed earlier in this chapter, event-based and frame-based cameras have different
methods of generating data from a scene. A theoretical comparison between the
cameras can be made based on these differences. Figure 11 shows the data output
difference. The top part of the figure shows data generation of the frame-based camera.
In contrast, the bottom part shows the event-based camera output. When the object is
not moving, the event-based camera will not generate any output since no brightness
change is detected.
According to the survey by Gallego et al. [3], event-based cameras have multiple potential advantages over standard cameras. First, an event-based camera monitors brightness changes quickly in analog circuitry and reads the events out digitally, resulting in microsecond resolution. Therefore, these cameras can capture swift motion without suffering from motion blur. In other words, an event-based camera has a high temporal resolution.
In addition, the event-based camera has low latency since each pixel works on its own and there is no need to wait for a global exposure of the frame. This means that as soon as a brightness change is detected, an event is transmitted.
The third potential benefit of event-based cameras over frame-based cameras is power consumption: the event-based camera only transmits brightness changes and therefore does not transmit redundant data of the captured scene, using power only to process the changing pixels.
Lastly, the dynamic range of event cameras exceeds that of high-quality frame-based cameras. This difference comes from the fact that the photoreceptors of each pixel operate on a logarithmic scale and work independently, whereas frame-based pixels have to wait for a global shutter. The event-based camera can therefore generate information in conditions ranging from moonlight to daylight.
On the other hand, since event-based cameras are based on an entirely new way of acquiring visual information, they also have challenges to overcome. Conventional, mature computer vision algorithms are not directly applicable to the event-based camera output, since the events are asynchronous and sparse instead of synchronous and dense. In addition, the brightness change depends not only on the scene brightness, as in frame-based cameras, but also on the current and past relative motion between the scene and the camera. Another major challenge is dealing with noise, since limiting noise involves a trade-off between camera characteristics.
3 Theoretical framework
This chapter describes the event-based (3.1) and frame-based (3.2) camera provided
by Aalto University, including a camera characteristics comparison (3.3). After both
cameras are introduced, the image formation is discussed (3.4), followed by the model
chosen to represent the space-object (3.5). The chapter concludes by presenting the
space-environment scaling calculations (3.6).
3.1 Event-based camera

Figure 12: An image of the Prophesee Gen 3.1 event-based vision sensor (PPS3MVCD) with an M0814-MP2 lens [24].
The sensor has a diagonal image size of 12 mm with a video graphics array (VGA) of
640 pixels horizontally and 480 pixels vertically. Each pixel detects temporal contrast
changes of light exceeding a user-defined threshold and has a size of 15 micrometers by
15 micrometers. The detected brightness changes are encoded into so-called "events"
that asynchronously transmit the detected change via the Address-Event Representation (AER) protocol. In short, according to Shih et al. [25], AER circuits are used to multiplex the communication of a cluster of neurons into an individual communication channel.
To theoretically understand the AER readout procedure, Figure 13 (left) is included to
visualize it. When a pixel detects a change of light exceeding a predefined polarity
threshold (1), it triggers a request in the y-dimension readout circuitry (2). The
y-readout latches the y-address (3). The y-readout will send an acknowledgment to the
pixel (4), which confirms the y-address of the detected event. Similarly, the x-address
is generated (5-6). After the x-readout sends an acknowledgment back to the pixel,
the pixel resets and is ready to detect another event (7). In order to achieve high data
throughput, the readout circuitry of the event-based sensor aims to lock the x and y
addresses as well as their polarity and transfers them to the parallel output data bus.
In addition to the AER readout circuitry, the sensor comprises circuitry for pixel
biasing, region of interest (ROI) selection, and serial peripheral interface (SPI).
Figure 13: Visualization of the Address-Event Representation procedure (left) and the Prophesee event-based sensor circuitry (right) [26].
Figure 13 (right) shows the relation between the circuitry. The bias generator block
is responsible for ensuring the functioning of the event-driven pixel. Its biasing
parameters, such as contrast threshold and pixel bandwidth, can be tuned to set the
performance levels. Next up is the pixel ROI selection circuitry. This circuitry enables
the user to select multiple rectangular ROIs in the pixel array. Only pixels within this
ROI will send data. The SPI is embedded within the vision sensor. It enables reading from and writing to registers, which control most of the other blocks. For example, it controls the clock frequency, clock duty cycle, and global luminance features.
Based on the sensor description, Prophesee [26] indicates that the main features of this sensor are its independent pixels, low-latency response, low power consumption, and data throughput dedicated to relevant data.
3.1.1 Event-based camera lens

The camera lens mounted on the event-based vision sensor is the M0814-MP2. This lens has a focal length of 8 mm and a back focal length of 10.1 mm. The lens has a manual aperture range from F1.4 up to F16.0. The working distance ranges from 100 mm to infinity, with an angle of view of 67.1 degrees diagonally. Each parameter is used to derive the environment scaling parameters; see Section 3.6.
3.2 Frame-based camera
The provided conventional frame-based camera is the ArduCam 0135 (AR0135), as
shown in Figure 14.
This camera has a sensor size of 6 mm and a 1.2 Megapixel CMOS digital image sensor
with an active-pixel array of 1280 horizontal and 960 vertical [28]. The AR0135 is a
progressive-scan sensor that generates pixel data at a constant frame rate. The user
can interact with the sensor via a two-wire serial bus. This bus communicates with the
control register, which includes array control, the digital signal chain, and the analog
signal chain. The signal chains represent interconnected components that process,
analyze, or transmit digital and analog signals.
At the core of the AR0135 is the APS array, which uses a global shutter to capture moving scenes accurately. The exposure of the array is controlled by the integration
time register setting, which can be programmed. With the global shutter technology,
all rows are simultaneously integrated before readout, as Section 2.3 explains. Figure
15 illustrates the relationship between the functional blocks.
Figure 15: Diagram of the frame-based Arducam 0135 working principles [28].
The sensor can operate either in default mode or be programmed by adjusting the
frame size, gain, and exposure [27]. In default mode, the output is a full-resolution
image at 54 frames per second (fps), either in video or frame trigger mode. However,
to enhance the functionality of the AR0135 camera in a space-simulated environment,
multiple parameters can be adjusted, including (1) windowing, (2) exposure control,
(3) gain, and (4) readout modes. The features of each parameter are described below:
1. Windowing. Using window control allows the user to configure the window
size and blanking times, which allows for choice within a range of frame rates
and resolutions.
2. Exposure control. The integrated automatic exposure control ensures that optimal exposure and gain settings are computed and updated every other frame. The exposure duration can also be set manually by adjusting the coarse integration time.
3. Gain, i.e., the amplification of the signal from the AR0135 sensor.
Digitally amplifying the signal also means amplifying the background noise
since the signal has already passed the A/D conversion. In contrast, the analog
gain increases the signal before the A/D conversion, reducing the background
noise.
4. Readout modes. The two modes are row skipping and digital binning. In digital binning, all pixels in the field-of-view (FOV) contribute to the image output, which can reduce artifacts and also improve low-light performance. Row skipping, in contrast, uses only selected rows from the FOV: complete rows of pixels in the image are not sampled, which results in a lower-resolution output image.
Achieving the desired frame rate of at least 36 fps and keeping it constant means trading off the features mentioned above. This trade-off is made during the initial tests of the setup; see Chapter 4.
3.2.1 Frame-based camera lens

The camera lens used on the AR0135 sensor is the BL-03618MP13IR, which has a focal length of 3.6 mm and a back focal length of 6.59 mm. The lens has a fixed aperture of F1.8. The minimum focus distance is 200 mm, and the lens has a field of view of 79.6 degrees diagonally [29].
3.3 Comparison of camera parameters
It is necessary to know the camera and lens parameters to calculate the scaling
parameters for a scaled space environment. For that reason, the main camera parameters
of both cameras are stated side-by-side in the figure below (see Figure 16). These
parameters are used in the next section to calibrate the cameras and calculate the
scaled dimensions of the space environment and the object.
Figure 16: An overview of the main event-based and frame-based camera parameters
used in this work.
3.4 Image formation
The process of creating an image involves the projection of the 3D world onto the
2D surface. This transformation is called perspective projection and can be achieved
using camera models. Two main models can be distinguished: the pinhole and thin
lens models.
The lens model, shown in Figure 18, uses a lens instead of a pinhole. The lens refracts
all light rays from a scene and converges them to a single point on the image plane.
This means every lens has a specific distance at which an object appears to be in focus. Thus, if two scene points are imaged from different distances, one of them will likely be blurry. Both the event-based and frame-based cameras, as described in Sections 3.1 and 3.2, use a lens to capture a scene. Each lens has a specified focal point, also known as the convergence point: the point at which all light rays traveling parallel to the optical axis converge. The distance between the focal point and the center of the lens is referred to as the focal length (f). The focal length is determined by the design of the lens and is therefore a constant parameter.
In addition to the focal point and focal length, the concepts of focusing and depth of
field are crucial to forming an image. In an ideal case, a group of light rays from a point in the scene meets at a single point in the image. When the light rays are focused incorrectly, blurred rings, called the Airy disk, develop around the object in focus. The Airy disk indicates the theoretical resolution limit of a camera [31]. The Airy disk in pixels can be calculated as

Airy disk = (2.44 · (f/#) · λ) / pixel size,

where f/# is the f-stop and λ is the wavelength of light. The f-stop is an essential camera parameter since it determines how much light enters the lens and reaches the image sensor. The aperture is measured in f-stops, which is the ratio of the lens's focal length to the aperture size. A low f-stop indicates a large aperture and a large f-stop means a small aperture. By adjusting the aperture size, the Airy disk can be minimized.
In general, when the Airy disk increases beyond the target resolution, adjustments to the distance between the lens and the image plane are required to ensure proper focus on the object. In the context of this thesis, a lenient target resolution of three pixels is
used. The range of distances between the near and far focus distances is called the
Depth of Field (DoF). The DoF is affected by the aperture size, focal length, and
working distance. If the DoF is large, the lens uses a small aperture setting. This way,
the lens recreates the pinhole model. However, this results in less light and noisier
images. On the other hand, if the DoF is small, the background will appear blurred.
This makes it more difficult to keep a subject in focus. In the context of this thesis,
the DoF is essential since the object model needs to fit within the DoF and remain in
focus when capturing it under low light conditions. Figure 19 shows the fundamental
parameters to capture images.
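As a quick sanity check of the Airy disk formula introduced above, the short sketch below evaluates it in Python. The f-stop, wavelength, and pixel size are illustrative values only, roughly in the range of the lenses and sensors described in this chapter, and not the exact settings used in the experiments.

```python
def airy_disk_pixels(f_number, wavelength_um, pixel_size_um):
    """Airy disk diameter in pixels: 2.44 * (f/#) * lambda / pixel size."""
    return 2.44 * f_number * wavelength_um / pixel_size_um

# Illustrative values: F5.6 aperture, 0.55 um (green) light, 15 um pixels.
print(round(airy_disk_pixels(5.6, 0.55, 15.0), 2))   # ~0.5 pixel, below a 3-pixel target
```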
The working distance is another crucial parameter to remember when calculating the
DoF. The working distance is the distance between the camera lens and the captured
object. This parameter determines how close the camera can be to the object without
losing focus. The upcoming section introduces the space object type that will be
observed in this thesis.
3.5 Space-object type
A small solar system body is chosen for this space observation simulation since such bodies have irregular surfaces and often rotate and move through space, making them challenging to track. A small solar system body is an object in the solar system that is not a planet, a dwarf planet, or a natural satellite. The category includes asteroids, comets, and icy objects in the Asteroid and Kuiper Belts [33]. Asteroids are chosen for this imaging simulation. Asteroids, also called minor planets, are rocky remnants from the solar system's formation. Most asteroids are irregularly shaped [34]; some are nearly spherical, but all have craters, pits and boulders. Asteroids rotate, sometimes quite erratically; in other words, they tumble as they go.
The selected asteroid for the visual navigation simulation using an event-based and
frame-based camera is Bennu. Bennu is selected for this simulation since it is one of
the better-researched and imaged asteroids together with asteroids Itokawa and Ryugu
(s/c Hayabusa 1 & 2). The research on Bennu has been carried out by the OSIRIS-REx
mission [35]. OSIRIS-REx studied Bennu at close range for three months. From the data gathered with the camera suite, a high-resolution 3D model has been created, revealing features on Bennu that are smaller than one meter [35]. These features
can be detected and tracked using a computer vision algorithm. In addition, Bennu is
one of the nearly spherical asteroids known and has a relatively slow rotation period
of 4.3 hours. It does include significant craters that can be used to track features. A
photograph taken during the OSIRIS-REx mission is shown in Figure 20.
Figure 20: Photograph of Bennu captured during the OSIRIS-REx mission [35].
One of the reasons to choose Bennu is that it is a potentially hazardous near-Earth
B-type asteroid with an approximate equatorial diameter of 525 meters. Its volume is
20 to 40 percent empty, and it has an increasing rotation period of 4.3 hours. The latter
means that the asteroid spins on its axis and completes one rotation in that period. A
B-type asteroid is a primitive, carbon-rich asteroid that is expected to have organic
compounds and water-bearing minerals. It completes a full orbit around the Sun every
1.2 years, and once every six years it comes within 0.002 Astronomical Units (AU)
from Earth. This close encounter with Earth gives it a high probability (1-in-2700) of
impacting Earth in the late 22nd century (year: 2175-2199) [33].
3.6 Simulated space environment calculations

$$NF = \frac{\dfrac{L^2 \times D}{d \times f}}{\dfrac{L^2}{d \times f} + (D - L)}, \qquad (1)$$

$$FF = \frac{\dfrac{L^2 \times D}{d \times f}}{\dfrac{L^2}{d \times f} - (D - L)}, \qquad (2)$$
where L is the focal length, d is the diameter of the circle of confusion, f is the f-stop,
and D is the distance to the subject. The difference between Equation 1 and Equation
2 is in the denominator. The near-field focus is defined by subtracting the camera
hyper-focal distance parameter from the object distance, whereas the far-field focus
is determined by adding the hyper-focal distance to the object distance. Using the
near-field and far-field focus points, it is possible to calculate the DoF by Equation 3
$$DoF_{camera} = FF - NF. \qquad (3)$$
Based on these three Equations (1, 2, 3), the following observation can be made. The
DoF increases when the distance to the object or the f-stop increases. On the other
hand, when the sensor size and focal length increase, the DoF decreases.
Knowing these three camera parameters (NF, FF, DoF), the first scaling variable
can be computed using Equation 4. This scaling factor (K) indicates which factor is
required for the asteroid DoF to fit into the camera DoF.
$$K_{DoF} = \frac{DoF_{Target}}{DoF_{Camera}}. \qquad (4)$$
The $DoF_{Target}$ is defined as the required range of distances within which the asteroid fits. The equatorial diameter of Bennu and the camera's DoF are essential to define the scaled-down space environment. The equatorial diameter of Bennu is 525 meters. Regarding DoF, only half of the sphere is required to be visible in the scene. Therefore, the target $DoF_{Target}$ is defined as 262.5 meters. To determine if the camera DoF is sufficient, the
scaling factor of the closest environment distance needs to exceed the scaling factor
that fits the asteroid in the camera DoF.
Knowing the scaled minimum distance from the camera to the object, it is possible to
use Equation 6 to calculate the ratio of the real-world minimum distance to the scaled
minimum distance
$$K_{d_{min}} = \frac{RD_{min}}{SCD_{min}}. \qquad (6)$$

Here $RD_{min}$ is the assumed real-world minimum distance to the object, while $SCD_{min}$ is the scaled minimum distance to the scaled object. By applying the
inverse minimum-distance scaling factor to the real diameter, the scaled size of Bennu can be derived through Equation 7.
$$ScaledObjectSize = \frac{D_{object}}{K_{d_{min}}}. \qquad (7)$$
Here $D_{object}$ is the equatorial diameter of Bennu, and $K_{d_{min}}$ represents the scaling factor of the minimum distance to Bennu during this observation mission. Applying these imaging calculations to the test bench creates the simulated space environment.
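A compact Python sketch of this scaling chain (Equations 3, 4, 6, and 7) is given below. The near-field and far-field values are treated as given inputs, and the numbers are illustrative round figures based on the distances quoted in this section rather than the exact values of Figure 21.

```python
def depth_of_field(near_field_m, far_field_m):
    """Equation 3: DoF = FF - NF."""
    return far_field_m - near_field_m

def dof_scaling_factor(dof_target_m, dof_camera_m):
    """Equation 4: K_DoF = DoF_target / DoF_camera."""
    return dof_target_m / dof_camera_m

def distance_scaling_factor(real_min_distance_m, scaled_min_distance_m):
    """Equation 6: K_dmin = RD_min / SCD_min."""
    return real_min_distance_m / scaled_min_distance_m

def scaled_object_size(object_diameter_m, k_dmin):
    """Equation 7: scaled size = D_object / K_dmin."""
    return object_diameter_m / k_dmin

# Illustrative inputs roughly matching the mission scenario in this section.
dof_camera = depth_of_field(near_field_m=0.66, far_field_m=0.75)   # ~0.09 m
k_dof = dof_scaling_factor(dof_target_m=262.5, dof_camera_m=dof_camera)
k_dmin = distance_scaling_factor(real_min_distance_m=2400.0,
                                 scaled_min_distance_m=0.75)
print(f"camera DoF : {dof_camera:.2f} m")
print(f"K_DoF      : {k_dof:.0f}")
print(f"K_dmin     : {k_dmin:.0f}")
print(f"model size : {scaled_object_size(525.0, k_dmin):.3f} m")   # ~0.164 m
```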
Figure 21 presents the complete calculation results, including the above-mentioned
depth of field, working distance, distance scaling factors, and scaled object size. The
main parameters to capture the scene are the working distance of the cameras to the
object and the object size.
Thus, the optimal parameters are found by trial and error on the non-fixed variables and are displayed in Figure 21. It can be seen that the scaled environment is created based on an assumed real-world minimum distance of 2400 m, a maximum distance of 5000 m, and a camera working distance of 0.70 m. The scaled asteroid is made to fit within the camera depth of field by adjusting these three parameters.
According to the calculation above, the object size also increases when the working distance increases beyond 70 cm. It would be ideal to produce the largest object a 3D printer could print at the smallest working distance. Because of the wide field of view of both the frame-based and event-based cameras, the asteroid appears smaller when farther away. Therefore, the largest model size together with the smallest usable working distance is found by trial and error, resulting in a working distance of 70 cm and a 3D model of 16.44 cm.
When using a working distance of 70 cm, adjusting the minimum and maximum
distance to the object directly impacts the object size. The object size will decrease
when the minimum distance increases. In contrast, decreasing the minimum distance
results in a lack of camera depth of field. Conversely, when the maximum distance
to the object is increased, the object will not fit in the camera’s depth of field, while
decreasing it leads to reducing the object size.
To summarize the findings, it can be concluded that the size of the asteroid as well as
the minimum and maximum distance to the object, are determined by the working
distance. The values used are (1) minimum distance to the object: 2400 m, (2)
maximum distance to the object: 5000 m, (3) working distance: 70 cm, (4) asteroid
scaling factor: 3192.66, (5) asteroid size: 16.44 cm.
Although these distance parameters do not necessarily create a realistic observation mission, they are chosen because they produce the best possible images and events to analyze. Due to the variation in camera parameters between the frame-based and event-based sensors, the scaled asteroid model has two distinct values. However, since this project mainly focuses on the performance of the event-based camera, the sizing follows that camera's parameters. The depth of field for the event-based camera is 0.09 m, which means that reducing the working distance further would cause the depth of field to become too shallow; as a result, the required half diameter of the asteroid would not fit within the depth of field. In addition, it should be noted that using a minimum distance to the object of 3100 m instead of 2400 m for the frame-based camera results in a scaled object diameter of 16.44 cm. Since these values are close, a minimum distance of 2400 m for the event-based camera and 3100 m for the frame-based camera is chosen for this observation mission.
Figure 21: An overview of the imaging calculations performed in this work to scale down the space environment.
3.7 Camera calibration

The camera's extrinsic parameters describe its position and orientation in the real world. On the other hand, the intrinsic parameters are related to internal camera characteristics such as focal length, optical centers, distortion, and skew. These
parameters can be used to estimate the scene’s structure and to eliminate distortion
caused by the lens, thus generating an undistorted image. The focal length and the
optical centers, i.e., principal points, form the so-called intrinsic matrix. This matrix is
unique for each camera and does not depend on the environment. Further explanation
of these matrices is provided later.
For the calibration of the frame-based camera, OpenCV is used [36], while for
the event-based camera, a Metavision extension is used [37]. However, both these
calibrations follow the same calibration methodology described below.
The OpenCV calibration method uses a pinhole model, including an extension,
to find the distortion parameters for real lenses. The pinhole model describes the
transformation of real-world objects into pixels within camera images, as shown in
Figure 22. The figure displays three coordinate systems and their relation to each
other.
The first coordinate system is used to describe the environment and is referred to as
the world coordinate frame ($X_w$, $Y_w$, $Z_w$). Any environmental 3D point can be found
by measuring the distance of this point to the origin along the three axes.
Figure 22: Visualization of the projection of 3D point onto the 2D image plane [38].
The second coordinate system utilized in the calibration is the camera coordinate
system ($X_c$, $Y_c$, $Z_c$), which is attached to the camera itself. The location of the camera
coordinate system is translated and rotated with respect to the world coordinate system.
The third coordinate system is called the pixel coordinate system (x,y). The location of
the 3D point on the 2D image plane is obtained by applying the extrinsic matrix on the
3D points in the world coordinate frame. The extrinsic matrix is given by [R|t], where
R indicates the rotation matrix and t indicates the translation vector. The relations
between the coordinate systems are summarized in the figure below, Figure 23.
$O_c$ and the image plane is located at the focal length distance ($f_x$, $f_y$) from the principal point ($c_x$, $c_y$). The intrinsic matrix can be estimated by utilizing these two parameters:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}. \qquad (8)$$
The equation that relates the 3D point in the world coordinate frame to the 2D
projection in the image coordinate frame is given by Equation 9:
$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = \mathbf{P} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \qquad (9)$$
where P is the 3x4 camera matrix containing the intrinsic K and extrinsic matrix [R|t],
the latter combines the 3x3 rotation matrix and the 3x1 translation vector:
$$\mathbf{P} = \mathbf{K} \times [\mathbf{R} \,|\, \mathbf{t}]. \qquad (10)$$
In summary, the camera calibration process involves capturing images of a checkerboard
pattern with known 2D image coordinates and corresponding 3D world coordinates.
After analyzing these images, the algorithms can determine the intrinsic and extrinsic
matrix.
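The projection pipeline of Equations 8-10 can be written out in a few lines of NumPy, as sketched below. The intrinsic values and the pose are made-up numbers chosen only to show the matrix shapes and the homogeneous-coordinate bookkeeping, not the calibrated parameters of either camera.

```python
import numpy as np

# Assumed intrinsic parameters (fx, fy in pixels; cx, cy principal point).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Assumed extrinsic parameters: identity rotation, 1 m translation along Z.
R = np.eye(3)
t = np.array([[0.0], [0.0], [1.0]])

P = K @ np.hstack((R, t))                # Equation 10: P = K [R|t], a 3x4 matrix

X_w = np.array([0.1, -0.05, 0.7, 1.0])   # 3D world point in homogeneous form
u, v, w = P @ X_w                        # Equation 9: project into the image
print(u / w, v / w)                      # pixel coordinates after normalization
```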
The camera matrix of the pinhole model does not include lens distortion since it does
not include lenses. In order to represent a real camera, the pinhole model is expanded
to find the lens distortion parameters. This allows for a more accurate representation
of a camera system. The extended pinhole model includes radial and tangential lens
distortion. The radial distortion causes straight lines in the image to appear curved and is amplified further away from the center of the image, as shown in Figure 25. The distortions are mathematically expressed using Equations 11 and 12:

$$x_{distorted} = x \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad (11)$$

$$y_{distorted} = y \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad (12)$$

where $r^2 = x^2 + y^2$ for normalized image coordinates $(x, y)$.
Conversely, the tangential distortion is due to manufacturing defects. It results from the
misalignment of the camera lens with the image plane. Because of the misalignment,
the image may look nearer than expected. The tangential distortion is characterized by
$p_1$ and $p_2$ in the following Equations 13 and 14:

$$x_{distorted} = x + \left( 2 p_1 x y + p_2 (r^2 + 2 x^2) \right), \qquad (13)$$

$$y_{distorted} = y + \left( p_1 (r^2 + 2 y^2) + 2 p_2 x y \right). \qquad (14)$$
Together, these five coefficients form the distortion vector

$$distortion_{coeff} = (k_1, k_2, p_1, p_2, k_3). \qquad (15)$$
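The sketch below applies this standard OpenCV-style distortion model to normalized image coordinates in NumPy. The coefficient values are placeholders for illustration; the calibrated values for the actual cameras are reported in Chapter 6.

```python
import numpy as np

def distort(points_xy, k1, k2, k3, p1, p2):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to
    normalized image coordinates (x, y), following the standard model."""
    x, y = points_xy[:, 0], points_xy[:, 1]
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    y_d = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    return np.stack([x_d, y_d], axis=1)

# Placeholder coefficients; the calibrated values are given in Chapter 6.
pts = np.array([[0.1, 0.2], [-0.3, 0.05]])
print(distort(pts, k1=-0.2, k2=0.03, k3=0.0, p1=0.001, p2=-0.0005))
```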
The root mean squared reprojection error ($RPE_{RMS}$) is used to check the calibration's accuracy. It provides a qualitative evaluation both of the accuracy of feature detection and of how accurately the 3D coordinates are projected. The reprojection error
indicates the distance between a pattern keypoint detected in a calibrated image and a
corresponding world point that has been projected onto the same image. The extracted
key points are always very close to the ideal ones, usually below half a pixel [40]. In
mathematical terms:
$$RPE_{RMS} = \sqrt{\frac{1}{N} \sum_i \sum_j \left\lVert \vec{p}_{ij} - \vec{q}_{ij} \right\rVert^2}. \qquad (16)$$
To minimize the reprojection error, it is important to minimize the objective function, which is the sum of squares of the normalized lengths, i.e., $\sum_i \sum_j \lVert \vec{p}_{ij} - \vec{q}_{ij} \rVert^2$, where $\vec{p}_{ij}$ represents the measured coordinate point, $\vec{q}_{ij}$ is the projected point, N is the total number of frames, and the sums run over the individual pose errors in each frame.
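For reference, the metric of Equation 16 can be computed directly from the detected and reprojected points, as in the short NumPy sketch below; the point arrays are dummy data standing in for the calibration output, and N is taken as the number of frames, as defined above.

```python
import numpy as np

def rms_reprojection_error(detected, reprojected):
    """Equation 16 with N taken as the number of frames: the square root of the
    summed squared keypoint distances divided by N."""
    detected = np.asarray(detected, dtype=float)      # shape (frames, points, 2)
    reprojected = np.asarray(reprojected, dtype=float)
    n_frames = detected.shape[0]
    sq_dists = np.sum((detected - reprojected) ** 2, axis=-1)
    return np.sqrt(sq_dists.sum() / n_frames)

# Dummy data: 2 frames with 3 keypoints each, reprojections off by ~0.3 px.
p = np.zeros((2, 3, 2))
q = p + 0.3
print(rms_reprojection_error(p, q))   # about 0.73
```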
As mentioned before, a checkerboard is used since it is distinct and easy to detect in an image. In order to calibrate the frame-based camera, the checkerboard is placed at multiple different distances and angles, as illustrated in Figure 26 (left). The camera will detect
each corner position on the checkerboard and estimate the calibration parameters as
shown in Figure 26 (right). The corners of each checkerboard square are ideal for
localizing since they have sharp gradients in two directions. In addition, these corners are situated at the intersection of lines, which makes them easier to distinguish. These checkerboard characteristics are used to locate the corners of the squares robustly.
Figure 26: Calibration process using a checkerboard. The image’s left part indicates
the checkerboard’s different angles and distances to the camera. The right part indicates
the captured angles of the checkerboard by the camera [36].
The detected checkerboard corners are used by the Metavision calibration module to estimate the calibration parameters of the event-based camera.
It can be challenging to accurately estimate the key points defining the geometry of a calibration pattern when the pattern is not parallel to the input event images. The Metavision calibration module therefore provides an extra refinement step that uses the first camera calibration estimate to undistort the input images, which are then used to localize the key points again and re-estimate the camera parameters more precisely. The Metavision calibration module captures a 2D calibration pattern and outputs the camera matrix [37], the distortion coefficients, and the RMS reprojection error.
Chapter 6 specifies the intrinsic, extrinsic, and distortion parameters for both types of
cameras.
The image formats of the captured data are different for each camera. The event-based
camera saves the Metavision data as a RAW file, while the frame-based camera saves
it as a Bitmap (BMP) file. The event-based uncompressed event output stream (time,
location, polarity) is captured in RAW format without decoding or processing the
events. This allows for a more precise and accurate data representation and maximum
data integrity. To make the captured events compatible with the codebase, the captured RAW file has been converted to a Comma Separated Values (CSV) file, a format that stores tabular data such as numbers and text in plain text. On the other hand, BMP files store images pixel by pixel, where each pixel is stored as one byte, since the panchromatic AR0135 sensor images the world in grayscale. The disadvantage of BMP files is their large size, since they are uncompressed.
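A sketch of this conversion step is shown below. It assumes the Metavision SDK Python bindings expose an EventsIterator with the interface used here; the import path, function signature, and event field names should be verified against the installed SDK version.

```python
import csv

from metavision_core.event_io import EventsIterator  # assumed Metavision SDK Python interface

def raw_to_csv(raw_path, csv_path, delta_t_us=10000):
    """Decode a Metavision RAW recording and write (t, x, y, polarity) rows to a CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t_us", "x", "y", "polarity"])
        # EventsIterator yields structured numpy arrays with fields 'x', 'y', 'p', 't'
        for events in EventsIterator(input_path=raw_path, delta_t=delta_t_us):
            for e in events:
                writer.writerow([int(e["t"]), int(e["x"]), int(e["y"]), int(e["p"])])
```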
4 Setup development
This chapter explains the development and integration of each component in the
setup. At first, the design and development constraints of the space object system are
formulated (4.1). After that, the electronic and mechanical developments are described (4.2, 4.3), including the iterations made over time (4.3.3). The selection of the light source (4.4) and the software algorithm implemented with the cameras (4.5) are then discussed. At the end of the chapter, an overview of the simulated space environment is given in Section 4.6.
Several different Arduino development board types are available, each with a different size, processing power, and power consumption. In this project, an Arduino Uno is selected due to its low cost, simplicity, and compatibility with a wide range of motor shields and drivers. This board's compatibility makes it possible to keep developing and upgrading the proposed setup in future projects, for example by adding Wi-Fi, Ethernet, or Bluetooth extensions.
The Arduino Uno cannot directly drive a motor through its input and output pins, since the output pins are limited to 50 mA per pin [41]. Therefore, an interface between the Arduino Uno and the motor is required to avoid damaging the board; this interface is referred to as the motor driver. A specific motor must first be chosen to narrow the search for a compatible motor driver, as described in the following section.
4.2.2 Motor
In order to rotate the space object, a motor is required. The chosen motor needs to
be compact in size but provide enough torque and rotation speed to continuously
rotate the space object. Three main types of motors are commonly used in small-scale
setups: direct current (DC), servo, and stepper motors. The low-priced servo motors
are unsuitable for continuously rotating an object since the rotation angle is limited to
180 degrees.
The other options are the DC and the stepper motors. The DC motor is a two-wired,
continuous rotation motor. The motor spins when power is applied until the power
cuts off. The rotation speed is controlled by varying the voltage or current supplied
to the motor, which results in less accuracy, but high torque at high speeds. Due
to its inaccuracy, the DC motor would not be an ideal option for rotating the space
object. On the other hand, the stepper motor uses multiple electromagnets around a central gear to define its position. Compared to the DC motor, the stepper motor is relatively slow but rotates precisely, since the motor energizes each electromagnet in turn to rotate the motor shaft. Due to its holding torque, its ability for precise positioning, and its relatively small size, the stepper motor is chosen to rotate the space object. Specifically, the NEMA 17 bi-polar stepper motor is chosen; it was made available by the "Markershop" at Aalto University and is therefore free of charge. The motor has a step angle of 1.8 degrees, four wires, draws 0.4 Ampere at 12 Volt, and has a holding torque of 0.4 Nm.
Once the development board and the motor have been selected, the motor driver can
be chosen. As mentioned above, the Arduino Uno development board and the Nema
17 stepper motor have been chosen. By choosing a stepper motor, the available motor
drives narrow down to motor drivers suitable for the NEMA 17 motor. These drivers
are (1) TB6612 and (2) A4988.
Both motor drivers have a continuous output current of one Ampere per phase [42], [43]. The TB6612 has a higher maximum current per phase of 3 Ampere, compared to 2 Ampere per phase for the A4988. On the other hand, the TB6612 has a finer microstep resolution of 1/32, compared to 1/16 for the A4988. This finer microstepping means the TB6612 motor driver achieves a smoother motion with higher positioning accuracy than the A4988. However, the A4988 has a higher motor output voltage than the TB6612 driver, and a higher output voltage results in a higher maximum rotation speed. Therefore, the A4988 supports a higher rotation speed than the TB6612 driver.
Based on these parameters, the TB6612 motor driver has been chosen for the NEMA 17 bi-polar motor. It has a higher output current, which results in a smoother movement of the motor, because the higher current more easily overcomes the detent torque of the electromagnets. The smoother movement has been given priority over a faster rotation speed (higher voltage), since capturing a continuously appearing motion is important for the feature detection and tracking algorithms. The pin connections of the TB6612 motor driver with the Arduino Uno and the NEMA 17 bi-polar motor are shown in Figure 28.
Figure 28: Pin diagram connecting the TB6612 with the Arduino Uno and NEMA 17
[44].
In summary, the selected electronic components are the Arduino Uno microcontroller,
the NEMA 17 bipolar stepper motor, and the TB6612 motor driver. The physical
assembling of these components is shown in Figure 29.
Before developing the hardware around the electronic design, the electronic design
is tested. In order to test the motor, the USB power is supplied to the Arduino Uno.
Arduino IDE, especially the Arduino stepper library, is used to program the rotation
of the motor. Initially, the motor rotated very slowly (10 rpm) when applying the USB
power supply. To resolve this issue, a 10 K Ohm resistor is applied to the standby
pin, which is connected to the voltage for the logic levels (Vcc), to prevent the motor
from going into a lower power state. In addition, a 1000 microfarad capacitor is added
to the output voltage to stabilize it. As the project progressed, it was found that the
amount of power supplied via the USB was insufficient. With the USB power supply,
the motor shaft could only be rotated at a maximum speed of 30 rpm. A 13 Volt
power supply adapter is used to achieve a higher rotation speed, which falls within the voltage range of the motor driver (13.5 Volt). With this voltage, the motor shaft can rotate continuously at a maximum speed of 100 rpm.

Figure 29: Physical overview of the connection between the Arduino Uno, TB6612 motor driver and NEMA 17 stepper motor.

With the electronic components
functioning properly, the next step is to design and develop a mechanical system
supporting the electronics to rotate the object model. The design and development are
discussed in the following section.
4.3 Mechanical design
The mechanical design must comply with the previously mentioned constraints (Section
4.1), which indicate using a rapid prototyping manufacturing method in the form of
3D printing. Firstly, due to the rapid prototyping manufacturing method, the design
dimensions are limited by the allowable print volume of the 3D printer. In addition,
the outer casing should be manufactured using black Polylactic Acid (PLA) material
to create a black-out environment. The available 3D printer for this project is the
Ultimaker S3, which has a maximum build volume of 200x230x190 mm. Multiple
test prints were conducted throughout the design process to find potential errors with
the 3D printer. The main finding was the need for a 0.2 mm tolerance to ensure proper
component fitting on and within one another. Now that the build volume is known, the design of the components can start. In the following sections, each of the components is described.
The cameras are fixed in a casing used in a previous drone project, as shown in Figure 30. As can be seen, the frame-based camera is located under the event-based camera. This camera module has been reused directly in this thesis work. The cameras have the freedom to tilt upward by several degrees. No additions to the camera casing have been made.
Figure 30: Camera module enclosure, the blue casing fits the event-based camera
and the white fits the frame-based camera.
4.3.2 Space object module
The primary purpose of the mechanical design is to fit the electronics design and
ensure a stable rotation of the asteroid model. The following section describes each
component of the mechanical object setup, including the interface of the object model
and the motor. The technical drawings of each component designed for the space
object module are included in Appendix C. As stated in Section 3.5, the selected 3D
asteroid model is Bennu. According to the scaling calculations (Section 3.6), the
scaled asteroid model size is 16.44 cm. The scale model is manufactured using a 3D
printer and given a gray coating. The reason for coating the 3D-printed model is
to enhance the visibility of its features during capture, as shown in Figure 31.
Figure 31: A photograph of the scaled down model of asteroid Bennu, 3D-printed
during this thesis work.
An object assembly has been created to facilitate the rotation of the scale model, as
shown in Figure 32. The assembly consists of four components: a base plate, a middle
part, a holding cup for the motor, and a cover with a lid. The base plate component
is designed to fit the Arduino Uno board and the breadboard with the motor driver.
This base plate is interfaced to the middle part using a bolt and nut at each corner.
Since the base plate is neither structurally loaded nor subject to vibration, its design is based on engineering judgment.
The main component of the motor assembly is the white part in Figure 32; for the sake of clarity, this component is referred to as the middle part. The component consists of a chair-like design with a slit for the electronic cables and an adaptable section that allows the motor cup to be placed at various angles, including 0, 30, 60, and 90 degrees.
The three components are connected with hex head bolts. However, as the motor cup is narrower than the middle part of the assembly, a gap is created between them. Initially, this gap was meant to give the cup space to move in. However, it allowed too much lateral movement and has been eliminated by integrating multiple spacing rings, as shown in Figure 32 (left).
The final component to be discussed is the enclosure box, see Figure 32 (right). The
design and manufacturing of the enclosure casing aim to ensure that the camera only
captures the moving object, which is essential since the event-based camera records
the brightness changes of moving objects; minimizing the visible moving parts is thus required. For example, as the motor axis rotates, it would be constantly recorded despite being unnecessary data. For this reason, the design aims to cover the axis; in addition, the axis is painted black so that it absorbs light instead of reflecting it.
The case has a slit for the motor axis to move in. As the design allows for angling the
motor, the back of the casing is kept open as it will not be within the imaging field
of view. In addition, the open back is necessary for the motor to be able to angle at
60 degrees or higher. Furthermore, black PLA is chosen for the enclosure component to support the space-simulated environment; the aim is to make the whole assembly fade into its dark surroundings.
Figure 32: Overview of the object system (left) including casing (right) developed in
this work.
4.3.3 Iterations
Figure 33: An image of the motor interface iteration. The motor connection on the
left beam is replaced by the blue interface.
The main problem occurred when 3D-printing the middle "chair-like" component. The error was caused by the shrinkage of the PLA filament in combination with unnecessary holes in the design. The 3D printer uses support structures to continue printing above such holes. However, the print kept stopping, since the printer could not find the support layers calculated by the CURA slicing software to print on. After finding this problem, the holes were removed from the design. Even after removing the holes, the printer struggled to form the component. The problem was due to warping of the 3D print, which happens when the extruded filament layers on the build plate cool too quickly and shrink. The result is a contraction of the PLA filament that pulls away from the build plate. When this problem was detected, the fan speed and printing settings were adjusted for this specific application, in combination with glue on the build plate.
The plan was to manufacture several asteroid models and experiment with them in the setup. However, due to a significant number of printing problems and time constraints, only Bennu has been manufactured. A 16.44 cm Bennu model has been printed and tested using the setup.
4.4 Light source selection
One crucial component is missing to create a space-simulated environment: the light
source. In order to simulate the Sun, a suitable light source is needed.
In order to find the optimal light source, multiple factors are considered before choosing one. The factors are [45]: (1) color temperature, (2) intensity, (3) consistency and directionality, and (4) flickering. In addition, (5) ease of use and positioning is taken into account as a trade-off factor.
The color temperature of the light source affects the color accuracy of the camera
image. The Sun’s color temperature is approximately 5900 Kelvin. Secondly, the
intensity of the light source is essential since it determines how much light is available
for the camera to capture. In addition, consistency and directionality are crucial: consistency determines the reproducibility of the captured images and events, while directionality determines how the light reflects off the model. Lastly, a light source can flicker at a high frequency. Flickering refers to rapid changes in brightness. It is a phenomenon to account for, since each flicker causes all pixels of the event-based camera to fire at once. In addition, the frame-based camera will capture alternately darker and lighter images due to the flicker, which might throw off the visual navigation algorithm and impact its performance.
In general, light sources are expensive and immediately exceed this project’s budget.
Therefore, to stay within budget, the light source will be rented from Aalto Take-out,
which is the rental department of Aalto University. The available light sources at Aalto Take-out were analyzed in a trade-off study based on the factors mentioned above, in order to find the optimal light source for this simulation.
From the trade-off analysis shown in Figure 34, it can be seen that two light sources
based on their factsheets would be suitable to simulate the Sun ([46], [47]). These
light sources are Ledgo LG-E268C and Eurolite PAR 56 pro. However, in this setup,
two of the five factors have more influence on the system. These are the intensity of
the light source and flickering, which could cause problems in computer vision tasks
such as feature detection and tracking.
According to Figure 34, the halogen light source, referred to as the Eurolite PAR 56, has been given a score of two instead of three for its flickering characteristics. In general, halogen lighting is designed to be flicker-free when the light bulb operates at the same frequency as the AC power supply. However, if this is not the case, or if a dimming system is used, it can result in light flickering captured by the camera. In addition, the Eurolite has a total power of 500 Watts, resulting in a high light intensity, with a color temperature of 3000 Kelvin.
On the other hand, the Ledgo light source is a LED panel designed to be flicker-free
using a dimming system, allowing it to use color temperatures ranging from 3200 to
5600 Kelvin. The disadvantage of this light source is its low power of 26.8 Watts and correspondingly low light intensity. In addition, it generates a diffuse light that scatters in all directions. This is unwanted, since diffuse light typically does not create the shadows or highlighted features that are necessary for this project.
Therefore, based on the low intensity and diffuse light of the Ledgo panel, the Eurolite PAR 56 pro has been chosen to simulate the Sun, see Figure 35. The location of the light source has been chosen while performing the initial tests and analyzing the generated images and events, discussed in the following section. During the initial testing, it was found that the only factors to keep in mind were the height of the light stand and locating the light source outside the field of view of the cameras while still pointing it at the asteroid.
Figure 35: The Eurolite PAR 56 pro used in this work, mounted on a stand and fitted with a grid hole.
4.5 Algorithm description
The software selection focuses on the computer vision algorithms that make it possible to compare the event-based and frame-based cameras. Computer vision algorithms are generally divided into three vision levels: low, intermediate, and high [48]. Low-level vision algorithms refer to the initial processing of image features such as edges and corners on the image plane, for example optical flow and feature detection and tracking. The intermediate vision level includes techniques concerning the 3D structure of a
scene, such as object recognition and 3D reconstruction. The generation of high-level
features involves the combination of low-level features with more complicated details
about an image scene. It includes a conceptual description of a scene, such as an
activity, intention, or behavior.
In this thesis, the low-level vision category is selected to evaluate the feature detection
and tracking capabilities of both cameras. This is because intermediate and high-
level vision algorithms require low-level vision. However, since the frame-based
feature detection and tracking algorithms do not directly apply to event-based data, an
event-based algorithm for feature detection and tracking must be selected.
The selected feature detection and tracking algorithm is called "Feature tracking
statistics" [49]. The main objective of this codebase is to compute the feature-tracking
performance metrics for both event-based and frame-based cameras, such as the
tracking length, re-projection error, and the rate of successfully triangulating tracked
features. The codebase implements Visual Odometry (VO) using bundle adjustment.
Visual odometry is an intermediate-level vision application that estimates the scene’s
motion and the 3D structure. It can do so without relying on external measurements
or ground truth. Given that generating an event-based ground truth stream is costly
[3], visual odometry using bundle adjustment is a cost-efficient and practical solution
to obtain accurate scene measurements [50]. The visual odometry optimized using
bundle adjustment gathers information on a sequence of images to estimate the camera
pose and the 3D structure of the scene by minimizing the difference between the
observed image features and their corresponding projection in the 3D space. The
bundle adjustment minimizes the re-projection error by adjusting the camera poses
and the 3D structure. Bundle adjustment aims to improve the estimated 3D structure and camera motion using non-linear optimization. The algorithm is run frame-by-frame so that features which cannot be triangulated, or which have too high a re-projection error, are discarded. This makes it possible to measure the geometrically valid tracking length used within the metric calculations.
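To make the idea concrete, the following simplified Python sketch refines camera poses and 3D points by minimizing the reprojection error with non-linear least squares. It assumes every feature is observed in every keyframe and omits the robust losses and sparsity handling of a full bundle adjustment; it is not the implementation used in the Feature Tracking Statistics codebase.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points, rvec, tvec, K):
    """Pinhole projection of Nx3 world points into one camera (rvec is a rotation vector)."""
    cam = Rotation.from_rotvec(rvec).apply(points) + tvec      # world -> camera frame
    uvw = cam @ K.T                                            # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]                            # perspective division

def residuals(params, n_cams, n_pts, K, observations):
    """observations: (n_cams, n_pts, 2) array of tracked feature positions."""
    rvecs = params[:3 * n_cams].reshape(n_cams, 3)
    tvecs = params[3 * n_cams:6 * n_cams].reshape(n_cams, 3)
    points = params[6 * n_cams:].reshape(n_pts, 3)
    predicted = np.stack([project(points, r, t, K) for r, t in zip(rvecs, tvecs)])
    return (predicted - observations).ravel()                  # reprojection error to minimize

def bundle_adjust(rvecs0, tvecs0, points0, K, observations):
    """Jointly refine camera poses and 3D structure from initial estimates."""
    x0 = np.hstack([rvecs0.ravel(), tvecs0.ravel(), points0.ravel()])
    return least_squares(residuals, x0,
                         args=(rvecs0.shape[0], points0.shape[0], K, observations))
```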
The frame-based Harris corner detector is a widely used technique for detecting corner
features in frame-based vision. It is reliable, has low numerical complexity, and is invariant to image shifts [51]. A Harris corner is defined as a point whose local neighborhood contains two dominant and distinct edge directions, as seen in Figure 36.
To define whether a pixel is a corner, the Harris corner detector method utilizes a
window function 𝑤(𝑥, 𝑦) that is shifted by a small amount in different directions
(Δ𝑥, Δ𝑦). This is done to determine the average change in image intensity indicated
by the Error function, Equation 17 [52]. This change is computed by calculating the
sum of squared differences (𝑆𝑆𝐷) of the intensity (𝐼).
E(\Delta x, \Delta y) = \sum_{x,y} \underbrace{w(x,y)}_{\text{window function}} \, \big[ \underbrace{I(x+\Delta x,\, y+\Delta y)}_{\text{shifted intensity}} - \underbrace{I(x,y)}_{\text{intensity}} \big]^2 . \qquad (17)
The shifted intensity is denoted by 𝐼 (𝑥 + Δ𝑥, 𝑦 + Δ𝑦) and can be approximated using
the Taylor expansion, as shown in Equation 18. Here the intensities I_x and I_y are the partial derivatives of I in the x and y directions, respectively:
f(\Delta x, \Delta y) \approx \begin{pmatrix} \Delta x & \Delta y \end{pmatrix} \underbrace{\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}}_{M} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}. \qquad (20)
If the matrix M in Equation 20, which contains the first-order intensity derivatives, has two large eigenvalues, then a corner is detected. This means that the major intensity gradient centered around the pixel occurs in two directions. In mathematical terms, the corner identification is given by the Harris scoring function R, where \lambda_1 and \lambda_2 indicate the eigenvalues of matrix M and k is a constant:

R = \det(M) - k \,(\operatorname{trace} M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2 . \qquad (21)
Using the calculated eigenvalues 𝜆, the Harris detector algorithm can determine if the
feature is an edge, flat region, or a corner, as shown in Figure 37.
At this point, the "good features to track" function developed by Jianbo Shi and Tomasi
[53] can be implemented. This function finds the most prominent corners in a specific
image region that can be tracked based on the calculated eigenvalues from the Harris
scoring function. Corners below a predefined minimum eigenvalue are rejected, and the remaining corners are sorted by their eigenvalue score in descending order. Eventually, only the best Harris corners are kept and utilized by the Kanade-Lucas-Tomasi (KLT) tracker [54].
Figure 37: Visualisation of eigenvalues representing corner, edges, flat regions [53].
The KLT tracker is a method to track features based on the optical flow defined by
Lucas and Kanade [54]. The tracker detects a set of Harris or Shi Tomasi features in
the first keyframe of the sequence. The Harris detector is chosen so that the output of this frame-based detector matches that of the event-based detector discussed later. Then, for
each subsequent frame in the sequence, the KLT algorithm tracks the motion of these
features by computing the optical flow between them. The optical flow is estimated by
minimizing the sum of squared differences between the intensities of corresponding
pixels in the two frames. This is done using the Lucas-Kanade algorithm, which
estimates the optical flow using a local linear approximation of the image motion.
The KLT algorithm updates the position of each feature point in each frame based
on its estimated optical flow. The algorithm also updates the corresponding feature
descriptors, which match the feature points across frames. The feature descriptors are
typically computed using the image gradient at each feature point.
In summary, a combination of the KLT feature tracker and the Harris corner detection
is implemented. This approach is relatively simple, efficient, and reliable in detecting
and tracking corners. The frame-based code implements an image down-scaling by a
factor of two. In this way, both cameras have similar resolutions, which makes a fair
comparison possible.
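A compact Python/OpenCV sketch of this frame-based pipeline is given below. The detector parameters follow the values reported in Section 5.3, while the tracker window size, pyramid depth, and maximum corner count are illustrative assumptions; it is a sketch of the approach, not the actual codebase.

```python
import cv2
import numpy as np

def detect_and_track(prev_frame, next_frame):
    """Detect Harris corners in one grayscale frame and track them into the next with pyramidal KLT."""
    # Down-scale by a factor of two so the frame resolution is comparable to the event camera
    prev_gray = cv2.resize(prev_frame, None, fx=0.5, fy=0.5)
    next_gray = cv2.resize(next_frame, None, fx=0.5, fy=0.5)

    # "Good features to track" using the Harris response (qualityLevel, blockSize, k as in Section 5.3)
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.02,
                                      minDistance=7, blockSize=7,
                                      useHarrisDetector=True, k=0.02)

    # Pyramidal Lucas-Kanade optical flow moves each corner into the next frame
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, corners, None,
                                                   winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    return corners[ok], next_pts[ok]
```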
The Harris algorithm adapted for event-driven processing can detect moving corners in
the scene. The event-based algorithm processes a fixed number of events, which must
be tuned according to the scene’s appearance. As discussed earlier, the DVS sensor
does not provide intensity measurements to which frame-based corner algorithms can
directly be applied. The stream of events, however, characterizes the pixel position
and timestamp at which it occurs. The event-based Harris algorithm creates a polarity
contrast map for each event depicting an edge, corner, or flat region, as shown in Figure
38. The black pattern in Figure 38 indicates the occurrence of an event. Figure 38 (left)
depicts an edge where events are generated along a one-pixel axis direction, while
Figure 38 (right) shows a corner with brightness changes occurring along two-pixel
axes. Using the Harris score function (Section 4.5.2) on these contrast maps, the detector compares the information from the active pixel to that of the surrounding pixels in order to find the best features to track with the HASTE tracker described later.
Figure 38: A visualization of the local contrast map including an edge (left) and a
corner (right) [55].
As described in the frame-based feature detection and tracking section and shown in Figure 37, the score function results from the calculated eigenvalues. For the event-based Harris detector, there are again three cases. In the first case, the contrast variation is negligible, so both eigenvalues are small and so is the score. In the second case, the patch contains an edge, with high contrast variation in one direction, which leads to one large and one small eigenvalue; the Harris score then becomes negative. In the third case, the patch contains two edges with high contrast variation along the two major axes. This results in two large eigenvalues, a high Harris score, and the detection of a corner.
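The following numpy sketch illustrates the principle on a binary local contrast map. It is a simplified illustration of the event-based Harris score, not the Metavision implementation, and it assumes the event lies far enough from the sensor border for a full patch to exist.

```python
import numpy as np

def event_harris_score(binary_map, x, y, patch=4, k=0.04):
    """Harris score of the local binary event map centred on the newest event at (x, y)."""
    window = binary_map[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(float)
    gy, gx = np.gradient(window)                       # contrast gradients over the patch
    M = np.array([[np.sum(gx * gx), np.sum(gx * gy)],  # structure matrix built from the gradients
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    lam = np.linalg.eigvalsh(M)                        # eigenvalues lambda_1, lambda_2
    return lam.prod() - k * lam.sum() ** 2             # det(M) - k * trace(M)^2

# The binary map is updated event by event: for each incoming event of the chosen polarity,
# set binary_map[y, x] = 1 (optionally clearing or decaying old entries), then score the patch.
```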
After low-level feature detection, feature tracking comes into play. The feature tracker used in this project is HASTE: multi-Hypothesis Asynchronous Speeded-up Tracking of Events [56]. HASTE is a feature tracker that operates solely on events and processes each event asynchronously and individually as soon as it is generated, without losing information due to fast scene dynamics. The link with the event-based Harris corner detector is made through this event-by-event processing method. However, in generic cases, establishing a correspondence between features detected by an event-by-event strategy is not trivial.
4.5.4 Feature tracking performance metric
The codebase calculates the tracking lengths and the re-projection error. These two parameters are later used to compare the cameras quantitatively.
• The tracking length indicates the number of keyframes over which a feature is tracked.
The calculated reprojection error percentiles, in pixels, are described by a normal distribution. The reprojection error indicates the distance between a feature observed in an image and the corresponding point predicted by the algorithm. The normal probability distribution is defined by its median and standard deviation, as shown in Figure 39. The median is the value below which 50% of the reprojection errors fall, and the standard deviation (sigma) measures the spread of the data around the median. The three reported reprojection errors are indicated by the median (50%), sigma (84.1%), and 2*sigma (97.7%) percentiles, which represent the relative position of reprojection errors within the distribution.
Figure 39: Normal distribution, highlighted median, sigma, and 2*sigma [57].
Both the tracking length and reprojection error are quantitative metrics. However,
their interpretation gives qualitative insight into the feasibility of the event-based
camera in space. The reprojection error percentiles compare the cameras’ accuracy
and reliability during the feature tracking process. In other words, the percentiles
indicate how well the algorithm can estimate the correct location of tracked features
in the subsequent keyframes. In addition, the tracking length indicates the ability to
track features over longer periods.
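As an illustrative sketch, these percentile values can be computed from a list of per-feature reprojection errors as follows; this is plain numpy, not the codebase's own reporting routine.

```python
import numpy as np

def rpe_summary(reprojection_errors_px):
    """Median, ~1-sigma (84.1st) and ~2-sigma (97.7th) percentiles of the reprojection errors."""
    errors = np.asarray(reprojection_errors_px, dtype=float)
    return {"median_px": np.percentile(errors, 50.0),
            "sigma_px": np.percentile(errors, 84.1),
            "two_sigma_px": np.percentile(errors, 97.7)}
```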
4.6 Environment overview
Figure 40 shows the complete imaging setup. It consists of the camera module,
object module, light source including grid hole, and the main computer in a blackout
environment. The main computer records the images and events captured by the
cameras and processes the recorded data using the Feature Tracking Statistics codebase.
The schematics with the distances and angles related to each other are included in
Appendix D.
Figure 40: An image of the space simulated imaging setup developed in this work.
In order to simulate the pitch-black space environment, the secondary lighting and reflections in the imaging facility need to be kept to a minimum. This is done using black light-absorbing fabrics that cover the table and the camera-facing wall. The room used for the experiments has no windows, which prevents sunlight or artificial light from coming in. The darkroom (Odeion) has double doors to the hallway, which is ideal for keeping secondary lighting from the hallway out of the room. In addition, the wall is black and the floor is gray. Only the emergency exit sign provides a small amount of secondary lighting, which is negligible.
Now that the theoretical and physical frameworks are finished, it is possible to start testing the setup to identify flaws and to conduct experiments with the cameras. This is achieved by positioning the light source at different locations and adjusting the object module's rotation speed.
5 Initial testing and planning
The purpose of the initial test is to identify and resolve camera-related and light-source-related issues. During the test, several unforeseen issues were found, and both the camera calibration and the fine-tuning of the camera parameters were carried out.
The two light source setup issues are the position of the light source and the distance of
the cameras to the wall curtains. It is essential to position the light source outside the
field of view of both cameras since only the object has to be captured. In addition, the
distance of the cameras to the wall is important since the object can cause a shadow on
the wall curtain within the camera’s frame. Shadows can introduce unwanted artifacts
and distortions in the captured images and events, potentially affecting the analysis
and interpretation of the data. The setup should be positioned as far from the wall as
possible to minimize the shadow. This way, the light can spread out more, reducing
shadow on the dark wall curtains.
During the initial test, three positions were chosen to simulate the asteroid orbiting
the Sun. To ensure the Eurolite PAR 56 did not flicker, it underwent testing before its location was defined. These tests confirmed the absence of any flickering and thus validated its suitability for use. The light source locations are otherwise chosen freely, guided by the requirements to keep the light source outside the field of view and to minimize the object shadow. Fulfilling these requirements enhances the quality and
reliability of the experimental results obtained during the initial test phase. The
upcoming subsections will discuss camera calibration and camera parameter tuning
for the frame-based and event-based cameras (5.1, 5.2). The tuning of parameters
within the feature tracking statistic algorithm is described in Section 5.3, and the last
section of this chapter describes the experiment plan (5.4) used to conduct a thorough
experiment.
Figure 41: Chessboard-like pattern noise visible on the captured frame. Here a part of the chessboard is displayed with the pattern noise covering the captured frame.

It can also be seen that the camera image is affected by barrel distortion, as explained in Section 3.7.
Now the captured images with the frame-based camera can be used to extract metric
information:
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 983.93 & 0 & 661.44 \\ 0 & 979.54 & 501.67 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (22)
where K is the intrinsic matrix within the 3x4 camera matrix (see Equation 10). The distortion coefficients of the frame-based camera are likewise estimated, according to Equation 15.
Figure 42: An illustration of the camera calibration performed in this work. An image
(left) with camera distortion and (right) without camera distortion representing the
calibrated image.
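As a sketch of how an undistorted image such as Figure 42 (right) is obtained with OpenCV, using the intrinsic matrix from Equation 22; the distortion coefficients and the file name below are placeholders, since the coefficients actually estimated for this camera are not reproduced here.

```python
import cv2
import numpy as np

# Intrinsic matrix from Equation 22; the distortion coefficients below are placeholders
# (zeros), not the values estimated for the frame-based camera.
K = np.array([[983.93, 0.0, 661.44],
              [0.0, 979.54, 501.67],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                      # (k1, k2, p1, p2, k3)

img = cv2.imread("frame_0001.bmp", cv2.IMREAD_GRAYSCALE)   # file name is illustrative
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("frame_0001_undistorted.bmp", undistorted)
```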
Other parameters that have influenced the initial testing are the integration time and
frame skipping. The integration time refers to the time the camera sensor is exposed
to light for each frame. Longer exposure times result in brighter images, while shorter
ones give darker images. Additionally, the exposure time affects the frames per second
captured by the camera. Different exposure times are visible in Figure 43. The
integration time is a product of the coarse integration time in rows and the row time of
22.22 microseconds for the AR0135. Therefore, a coarse integration time of 60 rows
corresponds to 1.33 ms [27]. Based on visual evaluation, Figure 43b is chosen since it
includes shadow-defined features while not being overexposed.
Figure 43: An overview of different camera exposure times (ms) when the frame-
based camera observes the asteroid model Bennu.
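As a small worked example of the relation above, using the AR0135 row time quoted in the text:

```python
ROW_TIME_US = 22.22              # AR0135 row time in microseconds, as quoted above
coarse_integration_rows = 60     # coarse integration time expressed in sensor rows
integration_time_ms = coarse_integration_rows * ROW_TIME_US / 1000.0
print(f"Integration time: {integration_time_ms:.2f} ms")   # -> 1.33 ms
```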
An additional parameter that can be adjusted is frame skipping. This technique only
captures a subset of the available frames while skipping the remaining ones. The
Arducam USB2 interface module for the AR0135 only supports frame rates up to
about 36 Hz. The shortest integration time would lead to a frame rate of 54 Hz. Thus,
the frame rate had to be lowered using a frame skipping value of 2/3.
The first adjustable bias category is the contrast sensitivity threshold biases. As mentioned in Section 2.2, the pixels of an event-based sensor are triggered by a change in illumination exceeding a defined contrast threshold. The minimum measured contrast change to trigger an event
is called the contrast threshold. The sensitivity of the vision sensor to the contrast
change can be adjusted using bias_diff_on and bias_diff_off.
The bias_diff_on adjusts the contrast threshold for positive brightness change (ON, +1) events. It determines the factor by which a pixel must get brighter before an ON event occurs for that pixel. Therefore, it is used to change how many events are produced during a significant change in illumination, or to change the sensitivity to small positive illumination changes.
In contrast, the bias_diff_off adjusts the contrast threshold for negative brightness change (OFF, -1) events. It determines the factor by which a pixel must get darker before an OFF event occurs for that pixel. This variable is likewise used to change the event throughput or the sensitivity to small negative illumination changes.
5.2.2 Bandwidth biases
The second adjustable bias category is the bandwidth biases. These biases control the high-pass and low-pass filters of the pixel; the variable called bias_fo adjusts the low-pass filter, and bias_hpf adjusts the high-pass filter.
The other parameter, bias_refr (the dead time bias), is left unchanged since the sensor manufacturer recommends not changing it [58]. The bias_pr drives the photoreceptor bandwidth, and bias_refr sets the refractory period that controls the output data rate of the sensor.
The steps used to tune the bias variables during this project were based on trial and error. The steps include:
• Adjust the bias_hpf for background noise reduction, targeting noise that can result from secondary light.
The selected bias settings for the experiments performed under space-simulated
conditions are shown in Figure 44. The bias settings are expressed in mV and are used
to tune the sensor’s performance for different applications.
Figure 44: An overview of the bias values, including the default values of the event-
based camera manufacturer and the selected bias values for this work after tuning.
The bias_fo has been set to 1451 mV. In general, increasing the bias_fo value decreases the speed (bandwidth) and the background noise of the events, while decreasing the value increases the bandwidth and the background noise. The value of 1451 mV is chosen because values under 1350 mV result in an event sensor cut-off, values near 1420 mV produce a significant amount of noise, and values beyond 1500 mV suppress both the background noise and the feature-defining events.
After adjusting the low-pass filter of the sensor, the high-pass filter (bias_hpf) is adjusted to reduce the background activity noise. In general, low bias_hpf values filter out high frequencies; however, when the value is too low, it affects the quality of the signal. By trial and error, it was found that increasing the bias_hpf value over 1450 mV increased the noise significantly. A value above 1500 mV increased the sputtering of events, while a value below 1350 mV resulted in features disappearing due to a lack of events. Therefore, a bias_hpf value of 1413 mV is chosen.
The next step is to adjust the contrast sensitivity threshold biases, bias_diff, bias_diff_on, and bias_diff_off. These biases control the sensor's light sensitivity so that the individual pixels are triggered when a difference in brightness occurs. The bias_diff is the threshold reference value and is therefore not adjustable. Increasing bias_diff_on with respect to bias_diff decreases the sensitivity to positive light changes and vice versa, while decreasing the difference of bias_diff_off with respect to bias_diff increases the sensitivity to negative light changes. Under the space-simulated environment, bias_diff_on values under 400 mV result in a significant amount of noise, while above 430 mV the sensitivity to positive brightness changes is optimal. Weighing noise generation against event generation, a value of 445 mV is chosen. On the other hand, a bias_diff_off value between 185 and 200 mV is optimal, since higher values increase the sputtering of events and lower values reduce the generation of events.
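For clarity, the selected values can be collected in a small configuration sketch. The HAL calls shown in the comments are an assumption about the Metavision Python bindings and should be verified against the SDK documentation; the bias_diff_off entry is only constrained to the range reported above.

```python
# Selected bias values in mV, as tuned above; bias_diff_off is given only as the reported range.
selected_biases = {
    "bias_fo": 1451,        # low-pass filter: trade-off between bandwidth and background noise
    "bias_hpf": 1413,       # high-pass filter: background activity reduction
    "bias_diff_on": 445,    # sensitivity to positive (ON) brightness changes
    # "bias_diff_off": chosen within the 185-200 mV range (exact value per Figure 44)
}

# Applying the values through the Metavision HAL Python bindings (API names are assumptions):
# from metavision_hal import DeviceDiscovery
# device = DeviceDiscovery.open("")            # first connected event-based camera
# ll_biases = device.get_i_ll_biases()
# for name, value in selected_biases.items():
#     ll_biases.set(name, value)
```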
These bias settings resulted in the event generation shown in Figure 45.
Figure 45: Bias test images of the Bennu model. An image (left) without bias tuning
and (right) with bias tuning.
5.3 Feature tracking statistics implementation
The frame-based and event-based feature detection and tracking algorithms, discussed
in Section 4.5.3, are tuned for optimal results. The tracking parameters are left
unchanged since these are run via the HASTE tracker for the detected event-based
features (Section 4.5.3) and via the KLT-tracker for the frame-based features (Section
4.5.2). The Harris corner detector has been fine-tuned for both cameras. The quality
level of the corners is set to 0.02 for the frame-based camera and 0.04 for the event-
based camera. The Harris quality level ranges from 0 to 1, where 0 is a low-quality
level and 1 indicates a high-quality level. Therefore it can be said that the quality of
the detected corners is low. However, to detect a sufficient amount of corners for the
tracker to compute on, it is necessary to set the Harris quality level low.
In general, the corner quality of a Harris corner is the product of the quality level and
the quality measure defined by the minimal eigenvalues of the Harris function (Section
4.5.2). For example, if the quality measure is 1400, and the quality level is 0.02, then
all corners with the quality measure lower than 28 are rejected. Therefore, the higher
the quality level, the more robust and distinct corners are used in the algorithm [51].
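The worked example above corresponds to this small computation:

```python
quality_measure = 1400                      # best corner's quality measure (example value from the text)
quality_level = 0.02                        # Harris quality level used for the frame-based camera
rejection_threshold = quality_level * quality_measure   # corners scoring below 28 are rejected
```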
Another parameter that can be tuned is the blockSize. The blockSize specifies the
region used to calculate the quality of a corner. Adjusting this parameter controls the
scale and level of detail at which corners are detected. By empirical experimentation,
the parameter for the event-based script is set to 5 by 5 pixels, and for the frame-based
script, it is set to 7 by 7 pixels. The Harris detector free parameter k, applied in Equation 21 (Section 4.5.2), is set to 0.02 for both cameras instead of the initial value of 0.01. The parameter k controls the sensitivity of corner detection: a smaller k allows more line-like features to be accepted, while a higher k favors sharp corners.
5.4 Experiment plan
The experiments are carried out with the setups shown in Figures 46, 47, and 48 below. The location of the light source defines each experiment: three different light source locations are used with respect to the camera and the object model. Figure 46 shows the setup referred to as the side setup, since the light source is placed sideways from the object and the cameras, while Figure 47 is referred to as the frontal setup and Figure 48 as the behind setup.
Figure 46: Setup experiment 1 (side).
Figure 47: Setup experiment 2 (frontal).
Each light source location (frontal, side, behind) defines an experimental setup. During each experiment, the object is rotated at three different rotation speeds: 30, 55, and 100 revolutions per minute (rpm). The designed object module can additionally be placed at an angle (0, 30, 60, or 90 degrees) to simulate the asteroid tumbling. However, due to time constraints and a capturing error at 30 degrees, only data with the object at 0 degrees is used. Each experiment gathers 60 seconds of images and events for the feature tracking statistics algorithm to process.
6 Results
In this chapter, the feature detection and tracking results are discussed for the event-based and frame-based cameras. The findings are interpreted and compared to indicate the feasibility of the event-based camera under space conditions. Three experiments are carried out. Experiment one uses the side setup (see Figure 46) with the 3D model rotating at 30, 55, and 100 rpm (6.1). Experiment two uses the frontal setup (6.2), see Figure 47. Experiment three uses the setup with the light source sideways behind the 3D model (6.3). A complete overview of the quantitative results is included in Appendix E.
Figure 49: Experiment 1 (the side setup, 30 rpm): An event-based camera image.
Figure 50: Experiment 1 (the side setup, 30 rpm): A frame-based camera image.
6.1.1 Experiment 1.1: 30 rpm
The gathered data is processed using the feature tracking statistics algorithms (Section
4.5.4). The algorithm shows the event surface of the asynchronous event stream,
including the tracked feature in blue in Figure 51, while the tracked features on the
frames are shown in Figure 52.
Figure 51: Experiment 1.1 (30 rpm): Event-based feature tracks.
Figure 52: Experiment 1.1 (30 rpm): Frame-based feature tracks.
The event-based camera tracked a total of 1396 features, while the frame-based camera tracked 2749 features. Thus, the frame-based camera detects 1.97 times more features than the event-based camera with the used feature detection algorithms. However, the number of keyframes over which the frame-based features are tracked is small (see Figure 53). Around 73% of the total detected features cannot be tracked for even one keyframe; therefore 27%, i.e., 749 features, of the total detected features are tracked for at least one keyframe. Of these 749 features, 486 are tracked for at least four keyframes. Even though the percentage of features tracked for at least four keyframes is low (17%), the absolute number is sufficient (n = 486).
On the other hand, fewer event-based features are tracked than frame-based ones, but they are tracked over more keyframes, as shown in Figure 54. Around 43% of the total features were not tracked for even one keyframe, which means that over half of the features are tracked for at least one keyframe. As in the frame-based tracking length distribution, around 18% of the features are tracked for at least four keyframes.
To compare the accuracy of both tracking algorithms, the reprojection error (RPE) is
used. The RPE indicates how the 3D points correspond to the 2D projections in the image.

Figure 53: Experiment 1.1 (30 rpm): Track length distribution for the frame-based camera.

Figure 54: Experiment 1.1 (30 rpm): Track length distribution for the event-based camera.

Both RPE Gaussian distribution plots show a right skew indicating that the
data is concentrated toward the lower values, and the tail of the distribution extends
toward the higher values. The distribution median for the frame-based camera tracking
is 1.78 times larger than the event-based RPE. The median of the event-based camera
is 0.684 pixels, while for the frame-based camera, it is 1.190 pixels. As can be seen
in Figure 55, the event-based RPE curve is narrower than the frame-based one. This
indicates that the event-based RPE has less variability. Therefore the error is more
centered around the mean. At one sigma from the median, the event-based RPE is
1.478 pixels, while the frame-based error is 3.001 pixels.
Additionally, the computation load shows significant differences. The file size of
the captured frames is 7.7 times greater than the event-based counterpart. However,
the frame-based algorithm computes the feature tracking 3.72 times faster than the
event-based algorithm. The frame-based algorithm takes 107 seconds, while the
event-based algorithm requires 399 seconds. Thus, the frame-based algorithm is more
efficient than the event-based algorithm.
Figure 55: Experiment 1.1 (30 rpm): (top) RPE density curve for the event-based camera, (bottom) RPE density curve for the frame-based camera.
6.1.2 Experiment 1.2: 55 rpm
The gathered events and frames are processed using the feature tracking statistics
algorithms. The algorithms show the event surface, including the tracked feature in
blue in Figure 56, while the tracked features on the frames are shown in Figure 57. The number of frame-based features detected and tracked is somewhat misrepresented, since an unknown number of features are tracked on the motor interface and on the connection between the rotating axis and the 3D model.
Figure 56: Experiment 1.2 (55 rpm): Event-based feature tracks.
Figure 57: Experiment 1.2 (55 rpm): Frame-based feature tracks.
In total, 2920 features are detected using the captured frames, while the event-based algorithm detects significantly fewer features, i.e., 1361. However, 75% of the detected frame-based features are not tracked for even one keyframe (see Figure 58), while 55% of the 1361 detected event-based features cannot be tracked (see Figure 59). Percentage-wise, the event-based algorithm can thus track 45% of the detected features for at least one keyframe, and the absolute number of tracked features is comparable: 605 event-based features are tracked for more than one keyframe, compared to 724 frame-based features. Most event-based features are tracked for one or two keyframes, 34% combined, whereas most frame-based features are tracked for four or more keyframes. However, as mentioned before, a certain number of frame-based features are tracked on the motor interface and on the connection between the rotating axis and the 3D model. Therefore, non-object-related features could influence the total number of features tracked for more than four keyframes and mislead the performance indicator of the feature tracker.
Figure 58: Experiment 1.2 (55 rpm): Track length distribution for the event-based
camera features.
Figure 59: Experiment 1.2 (55 rpm): Track length distribution for the frame-based
camera features.
The accuracy of this experiment is analyzed using the RPE density curves shown in Figure 60. It can be seen that the event-based curve has a wider peak, indicating that less of the data is concentrated around its median RPE of approximately 0.8 pixels, compared to a frame-based median RPE of approximately 1.5 pixels. In addition, the higher initial start of the curve and the wider peak indicate that the event-based tracking data contains a significant number of extreme RPE values, which can be seen from the 2*sigma of 14.5 pixels for the event-based RPE compared to 6.4 pixels for the frame-based tracker.
In addition to the tracking lengths and reprojection errors, the frame-based computation time is 3.75 times faster than the event-based computation time, even though the captured frames' file size is 3.9 times larger than that of the captured event stream (see Appendix D). It can therefore be stated that processing events for feature detection and tracking is inefficient in the used algorithm.

Figure 60: Experiment 1.2 (55 rpm): (top) RPE density curve for the event-based camera, (bottom) RPE density curve for the frame-based camera.
6.1.3 Experiment 1.3: 100 rpm
During experiment 1.3, the side setup is used with the object rotating at 100 rpm.
Figure 61 shows that the feature tracks using detected event corners are mainly located
on the right half of the asteroid, where the negative change in brightness generates
events. In comparison, the frame-based camera features are found over the whole
asteroid.
Figure 61: Experiment 1.3 (100 rpm): (left) Event-based feature tracks, (right)
frame-based feature tracks.
The tracking length distribution for both cameras shows that a majority of the detected
features are not tracked for even one keyframe. The total number of features detected using the frame-based algorithm is 2602, while the event-based algorithm detects significantly fewer features, 695 in total. Of these detected features, 29% of the event-based features and 21% of the frame-based features are tracked for more than one keyframe (see Figures 62 and 63). Therefore, both cameras show similar tracking results percentage-wise, but the absolute number of features tracked for at least one keyframe is 2.8 times higher for the frame-based camera. In addition, the computation time of the frame-based algorithm (116 s) is 5.44 times faster than that of the event-based algorithm (748 s), despite the frame-based file size being 2.21 times larger than the event-based one.
Figure 62: Experiment 1.3 (100 rpm): Track length distribution for the frame-based
camera.
Figure 63: Experiment 1.3 (100 rpm): Track length distribution for the event-based
camera.
The RPE median indicates that the event-based detection and tracking is more accurate
than the frame-based one. The RPE median of the events is 1.35 pixels, while the
frame-based RPE median is 2.38 pixels. However, the standard deviation for both
cameras is similar and high, approximately six pixels. The RPE and the number of features tracked for at least one keyframe indicate that both cameras are unsuitable for tracking features accurately while the 3D model spins at 100 rpm.
6.2 Experiment 2: frontal setup
In this subsection, the results of experiment two will be shown and discussed.
Experiment two uses the imaging setup with the light source frontally pointed at
the object. Figure 64 shows the processed data using the feature tracking statistic
algorithms. The left image shows the event surface made out of the event stream, and
the right image shows the frame taken by the frame-based camera. The algorithms
detect and track the features on top of this data.
Figure 64: Experiment 2 (front light, 30 rpm): (left) event-based camera and (right)
frame-based camera images.
During this experiment, the details on the 3D model were sufficient for the frame-based camera to detect and track features (see Figure 64, right). However, the event-based camera was not able to track features over longer periods of time. Figure 64 (left) shows that the detected event-based features are all generated at the outer edges of the 3D model. This could indicate that the event-based camera is saturated and unable to detect features on the asteroid's body. The saturation could be caused by the aperture and/or bias settings used. Reducing the aperture size would decrease the amount of light entering the event-based camera, thereby reducing the saturation; alternatively, adjusting the contrast sensitivity threshold might also mitigate the saturation.
In contrast, the frame-based feature tracking algorithm does detect and track features. However, approximately 80% of the total detected features cannot be tracked for even one keyframe. When the 3D model rotates at a speed of 30 rpm, 4199 features are detected and 786 features are tracked for longer than one keyframe. The results when the object rotates at 55 and 100 rpm are similar. In the case of 55 rpm, a total of 4573 features are detected and 912 features are tracked for more than one keyframe. Experiment 2.3, in which the 3D model rotates at 100 rpm, results in 4233 detected features, with the corresponding track length distribution shown in Figure 65.
Figure 65: Experiment 2: Frame-based track length results for the frontal setup for
three rotations; 30, 55 and 100 rpm.
6.3 Experiment 3: behind setup
The feature tracking algorithms used during experiment 3, with the light source sideways behind the 3D model (see Figure 48), have been unable to detect and track any features. This is caused by the lack of illuminated surface area for the algorithms to work on, as shown in Figure 66. Additionally, a lack of feature contrast, resulting from the limited detail visible on the 3D model, can contribute to the algorithms failing to detect and track features.
Figure 66: Experiment 3 (behind light, 30 rpm): (left) event-based camera and (right)
frame-based camera images.
In Figure 66, it can be seen that the event-based camera collects its events at the edge where the light source hits the model before it turns dark, while the frame-based camera captures the spinning asteroid where it turns from dark to bright. When combining the gathered information, a larger area could be captured.
6.4 Discussion of results and closing remarks
The experiments discussed in this chapter show that the frame-based algorithm can
detect and track features under more different circumstances than the event-based
algorithm. Both algorithms work on the data gathered with the side setup, while only
the frame-based algorithm worked with the frontal setup. None of the algorithms
work with the data collected with the behind setup. Analyzing the results when
using side setup data shows that the frame-based algorithm detects more features and
has a higher absolute number of tracked features for at least one keyframe than the
event-based algorithm. However, in terms of the percentage of tracked features and the reprojection error (accuracy), the event-based algorithm shows better results than the frame-based feature tracking algorithm. In addition, both camera algorithms show that increasing the object's rotation speed results in fewer features being tracked for longer than one keyframe and in lower tracking accuracy.
With the generated event stream of the frontal setup, the algorithm could compute
the event surface. However, it could not detect features on the model other than the
outer edge of the 3D model. The event-based algorithm could not track the detected
features on the asteroid’s body since they disappeared on the model surface. This lack
of features detected on the asteroid’s body resulted from the saturation of pixels in the
event-based camera. Thus, due to saturation, the event-based algorithm was unable to
track the captured features over the entire 3D model.
In contrast, the frame-based camera algorithm could detect and track features in the
frontal setup frames. The algorithm detected a significant number of features in this
experiment but was, for approximately 80% of them, unable to track the feature for at
least one keyframe. Additionally, the fact that features were also detected outside the
object area makes the quantitative data unreliable to base conclusions on.
Both the frame-based and event-based algorithms failed to detect and track features
with the behind setup because of the lack of visible surface area and the limited time
available for the algorithms to compute tracks.
In addition to the tracking length, the number of features detected, and the reprojection
error, the computation time and file size are analyzed. The computation time is
significantly shorter for the frame-based algorithm than for the event-based algorithm,
whereas the event-based file size is significantly smaller than the frame-based file size.
The processed side setup data shows that the computation time of the frame-based
algorithm is, for each experiment, at least 3.7 times shorter than that of the event-based
algorithm. It can therefore be concluded that the event-based algorithm is computationally
less efficient than the traditional frame-based algorithm and requires further development.
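
As a hypothetical illustration of how these quantities relate, the sketch below reduces a set of feature tracks to the metrics compared in this chapter. The Track container, the reprojection errors, the runtime, and the file size are placeholder assumptions; only the detected and tracked counts are borrowed from the 30 rpm frontal-setup figures quoted earlier.

from dataclasses import dataclass
from statistics import mean

@dataclass
class Track:
    n_keyframes: int          # number of keyframes the feature survived
    reproj_error_px: float    # mean reprojection error of the track, in pixels

def summarize(tracks, runtime_s, file_size_mb):
    tracked = [t for t in tracks if t.n_keyframes > 1]
    return {
        "detected": len(tracks),
        "tracked": len(tracked),
        "tracked_pct": 100.0 * len(tracked) / len(tracks) if tracks else 0.0,
        "mean_reproj_px": mean(t.reproj_error_px for t in tracked) if tracked else float("nan"),
        "runtime_s": runtime_s,
        "file_size_mb": file_size_mb,
    }

# 4199 detected / 786 tracked as in the 30 rpm frontal case; other values are placeholders.
frame_tracks = [Track(1, 0.0)] * 3413 + [Track(3, 1.2)] * 786
print(summarize(frame_tracks, runtime_s=120.0, file_size_mb=850.0))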
An additional observation was made during the experiments. The detected and tracked
event features are concentrated around the dark region as the 3D model turns towards
the shadow side. In comparison, the features detected and tracked on the captured
frames were located in the bright region, not in the dark region, before the asteroid
turned into the shadow. Combining the two cameras could therefore increase the total
feature detection and tracking area, since features could be tracked on both the frames
and the events.
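
In practice, such a combination could be as simple as mapping the event-camera features into the frame camera's image plane and merging the two feature sets. The sketch below is purely illustrative: the homography between the two sensors, the coordinate containers, and the distance threshold are assumptions and not part of the setup built in this thesis.

import numpy as np

def merge_features(frame_feats, event_feats, H_event_to_frame, min_dist_px=5.0):
    """Merge event-camera features into the frame camera's pixel coordinates.

    frame_feats, event_feats: (N, 2) arrays of (x, y) pixel positions.
    H_event_to_frame: 3x3 homography mapping event-camera pixels to frame
                      pixels (assumed known from an extrinsic calibration).
    """
    ones = np.ones((event_feats.shape[0], 1))
    mapped = (H_event_to_frame @ np.hstack([event_feats, ones]).T).T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    # Keep only event features that do not duplicate an existing frame feature.
    keep = [p for p in mapped
            if frame_feats.size == 0
            or np.min(np.linalg.norm(frame_feats - p, axis=1)) > min_dist_px]
    return np.vstack([frame_feats, np.array(keep)]) if keep else frame_feats

# Placeholder data: bright-region frame features and dark-region event features.
frame_feats = np.array([[300.0, 240.0], [310.0, 250.0]])
event_feats = np.array([[120.0, 200.0], [125.0, 205.0]])
merged = merge_features(frame_feats, event_feats, np.eye(3))
print(merged.shape)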
Based on the absolute number of tracked features, the computation time, and the
robustness in various scenarios, it can be concluded that the frame-based algorithm
used in this work surpasses the event-based algorithm. However, based on the percentage
of tracked features and the file size, the event-based camera appears more efficient than
the frame-based camera. Tracking features solely on events is therefore not superior to
frame-based feature tracking, and the frame-based camera appears more advantageous in
the space-simulated conditions investigated in this work. Thus, the Prophesee Gen 3.1
event-based camera would not be able to replace the ArduCam 0135 frame-based camera
in these space-simulated conditions. It does, however, show potential to work alongside
the frame-based camera to capture and track features over a larger area and more
efficiently.
7 Conclusions
In this thesis, a Hardware-in-the-Loop (HiL) imaging testbench was designed and
manufactured. The developed imaging testbench generated an event-based and frame-
based camera data set to conduct a comparative analysis using a visual navigation
algorithm. The results from the comparative analysis were used to determine the
feasibility of the event-based camera in a simulated near-space space object observation
mission.
The project consisted of two main parts. The first part comprised the development of
the HiL imaging setup in space-simulated conditions. These conditions were created by
observing a spinning scale model of asteroid Bennu inside a dark room under bright
illumination. The second part consisted of testing the imaging systems and generating
data sets to be processed by the feature detection and tracking algorithms.
The main objective of the HiL development was to design and manufacture a setup
that enables using both a frame-based and an event-based camera under space-simulated
conditions. The design is based on optics calculations. Using these calculations, three
modules were designed: a camera module, a space object system module, and a light
source module. The camera module contains the camera hardware under test in an
encapsulated 3D-printed component. The space object module consists of an electronic
and hardware design that enables the space object to rotate at different speeds within a
designed 3D-printed mechanical system. A five-factor trade-off analysis was conducted
to select a light source to simulate the Sun.
The second part of the thesis consisted of testing the HiL imaging setup and generating
data sets for three different experiments. The generated data sets were used to compare
the frame-based and event-based feature detection and tracking algorithms. Both
algorithms used the Harris corner detector to detect features. However, the frame-based
algorithm used the Kanade-Lucas-Tomasi (KLT) tracker for tracking, while the
event-based algorithm used HASTE.
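
As a rough, generic illustration of the frame-based half of this pipeline (not the thesis implementation; the synthetic frames and all parameter values are placeholders), OpenCV's Harris-based corner detection can be chained with its pyramidal KLT tracker as follows.

import cv2
import numpy as np

# Synthetic frame pair: a few bright rectangles, shifted 3 px to mimic motion.
prev = np.zeros((480, 640), dtype=np.uint8)
for x, y in [(100, 120), (300, 200), (450, 320)]:
    cv2.rectangle(prev, (x, y), (x + 60, y + 40), 255, -1)
curr = np.roll(prev, shift=3, axis=1)

# Harris-scored corner detection (goodFeaturesToTrack with useHarrisDetector=True).
corners = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True, k=0.04)

# Pyramidal Kanade-Lucas-Tomasi (KLT) tracking into the next frame.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, corners, None,
                                                 winSize=(21, 21), maxLevel=3)

tracked = int(status.sum())
print(f"{len(corners)} corners detected, {tracked} tracked "
      f"({100.0 * tracked / len(corners):.1f}% for this frame pair)")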
By analyzing the outputs of the algorithms, considering factors such as the absolute
number of tracked features, computation time, and robustness in various scenarios, the
comparative results show that the frame-based camera algorithm outperforms the
event-based camera algorithm. This indication is based on the experiment that used a
light source sideways in front of the object, since the event-based algorithm did not
provide results for the other two experiments. However, when considering the percentage
of tracked features relative to the total number of detected features, the event-based
algorithm tracked a significantly higher percentage of features for at least one keyframe
than the frame-based camera algorithm.
Additionally, the size of the event-based data sets was significantly smaller than that of
the frame-based data sets, which means that the event-based camera captures a dynamic
scene more efficiently. Furthermore, upon close observation of the asteroid turning so
that the bright regions move into the shadow side, it was noticed that the location of
feature detection and tracking differed between the two cameras. The frame-based
camera better captured the bright region, resulting in features being detected and
tracked in this area. In contrast, the event-based camera better captured the dark region
before it turned to the shadow side, resulting in most event-based features being
detected and tracked in the dark region.
The second part of the thesis consisted of testing and experimenting with the HiL
imaging setup. This part was used to answer the second and third research questions,
which are addressed below.
The factors that influence the generation of the data sets are the calibration and distortion
parameters, the frame-based camera parameters, and the event-based camera biases. First,
calibration of the camera hardware is an essential step for extracting metric information
from a 2D image. The calibration and lens distortion parameters are required to undistort
the images and events so that the data set can be analyzed accurately. The camera-specific
factors that influence the generation of the data set are the frame-based camera parameters
and the event-based camera biases. The frame-based parameters that influenced the
captured frames were the analog and digital gains, used to resolve the pattern noise, as
well as the exposure time and frame skipping, used to balance the captured frames per
second against the brightness levels. Another factor that influences the generation of
data sets in space-simulated conditions is flickering of the light source. This must be
avoided, since a flicker causes all event-based sensor pixels to fire and the frame-based
camera to capture alternating bright and dark frames, throwing off any visual navigation
algorithm. The steps taken to generate and process a valid data set were: calibrating
both cameras using an extended pinhole model that includes lens distortions, and
adjusting the analog and digital gain, to minimize pattern noise, as well as the coarse
integration time for the frame-based camera. The event-based camera is calibrated, and
its three bias categories, the contrast sensitivity threshold, the bandwidth, and the dead
time, are tuned through empirical experimentation.
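
As an illustration of the undistortion step only, the snippet below applies a pinhole camera matrix with lens-distortion coefficients to a full frame and to sparse event coordinates using OpenCV; all numerical values and the synthetic frame are placeholders, not the calibration results obtained in this work.

import cv2
import numpy as np

# Placeholder intrinsics and lens distortion coefficients (k1, k2, p1, p2, k3).
K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])

# Synthetic stand-in for a captured frame from the frame-based camera.
frame = np.zeros((720, 1280), dtype=np.uint8)
cv2.circle(frame, (640, 360), 200, 255, -1)
undistorted_frame = cv2.undistort(frame, K, dist)

# Event coordinates are undistorted point-wise with the same camera model.
event_xy = np.array([[[120.0, 200.0]], [[500.0, 310.0]]], dtype=np.float32)
undistorted_xy = cv2.undistortPoints(event_xy, K, dist, P=K)
print(undistorted_frame.shape, undistorted_xy.reshape(-1, 2))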
The feasibility of the event-based camera, using only events with the visual navigation
algorithms employed in this work, is low compared to the frame-based camera. However,
the event-based camera shows potential to work alongside the frame-based camera to
capture and track features over a larger area. This conclusion results from the fact that
the frame-based feature detection and tracking algorithm has a higher absolute number
of tracked features, a shorter computation time, and greater robustness in various
scenarios than the event-based algorithm. However, based on the percentage of tracked
features and the file size, the event-based camera appears more efficient than the
frame-based camera. Tracking features solely on events is therefore not superior to
frame-based feature tracking, and the frame-based camera is more advantageous in
space-simulated conditions. Therefore, the Prophesee Gen 3.1 event-based camera would
not be able to replace the ArduCam 0135 frame-based camera in the tested
space-simulated conditions.
7.2 Proposed future work
Given the theoretical potential of event-based vision sensors in space and the still
immature state of this research field, extending this project is recommended. Four main
focus areas were identified for future work on this topic:
Exploring two recently released feature detection and tracking algorithms could be the
starting point of this research. The first is the feature detection and tracking
implementation released by Prophesee Metavision [59]. The second is a data-driven
feature tracker for event-based cameras [60], which demonstrates significantly longer
event-based feature tracks than the HASTE tracker used in this project.
Consider including the Prophesee Gen 4.1 event-based camera or a DAVIS camera, which
would allow for increased resolution and reduced noise in the event-based sensor. As
discussed in the literature review (Section 2.2.3), the DAVIS camera combines the APS
frame with the event stream, enabling the benefits of both frames and events to be
combined.
Include more textured models so that the event-based camera captures more brightness
changes on the model, resulting in more detected and possibly more tracked features. In
addition, manufacturing a variety of objects could provide insight into the adaptability
of event-based technology to simulated space conditions.
References
[1] O. Sikorski, D. Izzo, and G. Meoni, “Event-based spacecraft landing us-
ing time-to-contact,” in Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pp. 1941–1950, 2021. DOI:
10.1109/CVPRW53098.2021.00222.
[2] F. Mahlknecht, D. Gehrig, J. Nash, F. M. Rockenbauer, B. Morrell, J. Delaune,
and D. Scaramuzza, “Exploring event camera-based odometry for planetary
robots,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8651–8658,
2022. DOI:10.48550/arXiv.2204.05880.
[3] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi,
S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, et al., “Event-based vi-
sion: A survey,” IEEE transactions on pattern analysis and machine intelligence,
vol. 44, no. 1, pp. 154–180, 2020. DOI: 10.1109/TPAMI.2020.3008413.
[4] H. Krüger and S. Theil, “TRON - hardware-in-the-loop test facility for lunar descent
and landing optical navigation,” IFAC Proceedings Volumes, vol. 43, no. 15,
pp. 265–270, 2010. DOI: 10.3182/20100906-5-JP-2022.00046.
[5] P. Panicucci and F. Topputo, “The TinyV3RSE hardware-in-the-loop vision-
based navigation facility,” Sensors, vol. 22, no. 23, p. 9333, 2022. DOI:
10.3390/s22239333.
[6] F. Piccolo, M. Pugliatti, P. Panicucci, F. Topputo, et al., “Toward verification and
validation of the Milani image processing pipeline in the hardware-in-the-loop
testbench TinyV3RSE,” in 44th AAS Guidance, Navigation and Control Conference,
pp. 1–21, 2022. DOI: 10.3390/s22239333.
[7] H. Benninghoff, F. Rems, E.-A. Risse, and C. Mietner, “European Proximity
Operations Simulator 2.0 (EPOS) - a robotic-based rendezvous and docking
simulator,” Journal of large-scale research facilities JLSRF, 2017. DOI:
10.17815/jlsrf-3-155.
[8] M. Mahowald, An analog VLSI system for stereoscopic vision, vol. 265. Springer
Science & Business Media, 1994. ISBN: 978-1-4615-2724-4.
[9] K. A. Zaghloul and K. Boahen, “Optic nerve signals in a neuromorphic chip i:
Outer and inner retina models,” IEEE Transactions on biomedical engineering,
vol. 51, no. 4, pp. 657–666, 2004. DOI: 10.1109/tbme.2003.821039.
[10] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128 × 128 120 dB 15 µs
latency asynchronous temporal contrast vision sensor,” IEEE Journal of Solid-State
Circuits, vol. 43, no. 2, pp. 566–576, 2008. DOI: 10.1109/JSSC.2007.914337.
[11] G. Gallego, “Event-based vision course.” https://sites.google.com/view/guillermogallego/research/event-based-vision?authuser=0.
Date accessed: 27th October 2022.
[12] T. Delbruck and M. Lang, “Robotic goalie with 3 ms reaction time at 4% cpu
load using event-based dynamic vision sensor,” Frontiers in neuroscience, vol. 7,
p. 223, 2013. DOI: 10.3389/fnins.2013.00223.
[13] T. Delbruck and C. A. Mead, “Time-derivative adaptive silicon photoreceptor
array,” in Infrared Sensors: Detectors, Electronics, and Signal Processing,
vol. 1541, pp. 92–99, SPIE, 1991. DOI: 10.1117/12.49323.
[14] B. Kueng, E. Mueggler, G. Gallego, and D. Scaramuzza, “Low-latency visual
odometry using event-based feature tracks,” in 2016 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), pp. 16–23, IEEE, 2016.
DOI: 10.1109/IROS.2016.7758089.
[15] R. Berner, C. Brandli, M. Yang, S.-C. Liu, and T. Delbruck, “A 240 × 180
10 mW 12 µs latency sparse-output vision sensor for mobile applications,” in
2013 Symposium on VLSI Circuits, pp. C186–C187, IEEE, 2013. ISBN:
978-1-4673-5531-5.
[16] C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, “A 240 × 180 130 dB
3 µs latency global shutter spatiotemporal vision sensor,” IEEE Journal of Solid-
State Circuits, vol. 49, pp. 2333–2341, 2014. DOI: 10.1109/JSSC.2014.2342715.
[17] T.-J. Chin, S. Bagchi, A. Eriksson, and A. Van Schaik, “Star tracking us-
ing an event camera,” in Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition Workshops, pp. 0–0, 2019. DOI:
10.48550/arXiv.1812.02895.
[18] F. Mahlknecht, D. Gehrig, J. Nash, F. M. Rockenbauer, B. Morrell, J. Delaune,
and D. Scaramuzza, “Exploring event camera-based odometry for planetary
robots,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8651–8658,
2022. DOI: 10.48550/arXiv.2204.05880.
[19] H. Titus, “Imaging sensors,” The journal of applied sensing technology, 2001.
[20] N. Waltham, “CCD and CMOS sensors,” Observing Photons in Space: A Guide to
Experimental Space Astronomy, pp. 423–442, 2013.
[21] American Laboratory, “How to choose between a CCD and sCMOS scientific-grade
camera.” https://www.americanlaboratory.com/914-Application-Notes/.
Date accessed: 10th November 2022.
[22] D. Litwiller, “CCD vs. CMOS: facts and fiction,” Photonics Spectra, vol. 35,
pp. 154–158, 01 2001.
[23] H. Kim, S. Leutenegger, and A. J. Davison, “Real-time 3d reconstruction and
6-dof tracking with an event camera,” in Computer Vision–ECCV 2016: 14th
European Conference, Amsterdam, The Netherlands, October 11-14, 2016,
Proceedings, Part VI 14, pp. 349–364, Springer, 2016. ISBN: 978-3-319-46466-
4.
[24] M. Imaging, “SilkyEvCam HD.” https://www.mavis-imaging.com/media/pdf/13/c4/d3/Leaflet_SilkyEvCam_V1-0SJnVLt0r4PUYd.pdf, 2019.
Date accessed: 28th April 2023.
[27] “1/3-inch CMOS digital image sensor AR0134 developer guide, rev. C.”
https://cdn.hackaday.io/files/21966939793344/AR0134_DG_C.PDF, Jun
2014. Date accessed: 19th November 2022.
[30] K. Hata and S. Savarese, “CS231A course notes 1: Camera models.”
https://web.stanford.edu/class/cs231a/course_notes/01-camera-models.pdf,
2017. Date accessed: 16th November 2022.
[31] G. Hollows and N. James, “The airy disk and diffraction limit.”
https://www.edmundoptics.eu/knowledge-center/application-notes/imaging/limitations-on-resolution-and-contrast-the-airy-disk/.
Date accessed: 11th December 2021.
[34] M. K. Shepard, Asteroids: Relics of Ancient Time, pp. i–iv. Cambridge University
Press, 2015. ISBN: 978-1107061446.
[36] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV
library. O’Reilly Media Inc., 2008. ISBN: 978-0596516130.
[38] P. I. Corke and O. Khatib, Robotics, vision and control: fundamental algorithms
in MATLAB, vol. 73. Springer, 2011. ISBN: 978-3-319-54413-7.
[40] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions
on pattern analysis and machine intelligence, vol. 22, no. 11, pp. 1330–1334,
2000. DOI: 10.1109/34.888718.
[45] D. Martin, “A practical guide to machine vision lighting,” Midwest Sales and
Support Manager, Adv Illum2007, pp. 1–3, 2007.
[50] M. O. Aqel, M. H. Marhaban, M. I. Saripan, and N. B. Ismail, “Review of
visual odometry: types, approaches, challenges, and applications,” SpringerPlus,
vol. 5, pp. 1–26, 2016. DOI: 10.1186/s40064-016-3573-7.
[53] J. Shi et al., “Good features to track,” in 1994 Proceedings of IEEE conference
on computer vision and pattern recognition, pp. 593–600, IEEE, 1994. DOI:
10.1109/CVPR.1994.323794.
[55] V. Vasco, A. Glover, and C. Bartolozzi, “Fast event-based Harris corner detection
exploiting the advantages of event-driven cameras,” in 2016 IEEE/RSJ inter-
national conference on intelligent robots and systems (IROS), pp. 4144–4149,
IEEE, 2016. DOI: 10.1109/IROS.2016.7759610.
[61] A. Lamprou and D. Vagiona, “Success criteria and critical success factors in
project success: a literature review,” RELAND: International Journal of Real
Estate & Land Planning, vol. 1, pp. 276–284, 2018. DOI:
10.26262/reland.v1i0.6483.
[62] B. N. Baker, D. C. Murphy, and D. Fisher, Factors affecting project success.
Wiley Online Library, 1997. DOI: 10.1002/9780470172353.ch35.
[64] Z. Zainal, “Case study as a research method,” Jurnal kemanusiaan, vol. 5, no. 1,
2007. DOI: 10.1186/1471-2288-11-100.
A Appendix: A complete overview of the optical scaling calculations
Figure 67: A complete overview of the imaging calculations performed to scale the
space environment.
B Appendix: Arduino IDE developed code
Figure 68: The developed Arduino IDE code to rotate the NEMA 17 stepper motor.
C Appendix: Technical drawings of the object module components
Figure 69: Visualization of the SolidWorks space object module assembly designed
in this work.
Figure 70: Technical drawing of the designed middle part.
Figure 71: Technical drawings of the designed enclosure top part.
Figure 72: Technical drawings of the designed enclosure box.
Figure 73: Technical drawings of the designed ground component for the Arduino
Uno and breadboard.
Figure 74: Technical drawings of the designed NEMA 17 enclosure cup.
D Appendix: Experiment setup schematics
This appendix shows the light source locations from a top-view perspective, including
the height of each module. The angles are calculated using the cosine rule.
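
For completeness, with placeholder symbols a, b, and c for the three measured distances between the modules (the actual values are those shown in Figure 75), the cosine rule gives the angle γ opposite side c as

\gamma = \arccos\left(\frac{a^{2} + b^{2} - c^{2}}{2ab}\right)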
Figure 75: Top-view schematic of the light source locations used in this work.
E Appendix: Overview of quantitative tracking results
Figure 76: Quantitative tracking results of Experiment 1 (the side light setup).
Figure 77: Quantitative tracking results of Experiment 2 (the front light setup).
Figure 78: Quantitative tracking results of Experiment 3 (the behind light setup).