COMP 4010 Lecture 4: AR Tracking and Interaction

This document covers augmented reality tracking and interaction technologies. It begins with an overview of augmented reality definitions and components, then examines example AR displays such as the Magic Leap ML-1 and Varjo XR-1 head-mounted displays. Different display types and optical designs for see-through displays are described. Key aspects of AR such as tracking, registration, and reducing dynamic errors are analyzed. Finally, optical tracking technologies, including marker-based tracking with ARToolKit and markerless tracking, are explored in more detail.


AR TRACKING AND INTERACTION

COMP 4010 Lecture Four

Mark Billinghurst
August 17th 2021
[email protected]
REVIEW
Augmented Reality Definition
• Combines Real and Virtual Images
• Both can be seen at the same time
• Interactive in real-time
• The virtual content can be interacted with
• Registered in 3D
• Virtual objects appear fixed in space
Augmented Reality Technology
• Combines Real and Virtual Images
• Needs: Display technology
• Interactive in real-time
• Needs: Input and interaction technology
• Registered in 3D
• Needs: Viewpoint tracking technology
Example: Magic Leap ML-1 AR Display
• Display
• Multi-layered Waveguide display
• Tracking
• Inside-out SLAM tracking
• Input
• 6DOF wand, gesture input
AR Display Technologies
• Classification (Bimber/Raskar 2005)
• Head attached
• Head mounted display/projector
• Body attached
• Handheld display/projector
• Spatial
• Spatially aligned projector/monitor
Display Taxonomy

Bimber, O., & Raskar, R. (2005). Spatial augmented reality: merging real and virtual worlds. CRC press.
Types of Head Mounted Displays
• Occluded
• See-through
• Multiplexed
Optical see-through Head-Mounted Display
[Diagram: virtual images from monitors are merged with the user's view of the real world by optical combiners]
Optical Design – Curved Mirror

• Reflect off free-space curved mirror


Video see-through HMD
[Diagram: video cameras capture the real world; a combiner merges the video with graphics, which are shown on monitors in front of the eyes]
Example: Varjo XR-1
• Wide field of view
• 87 degrees

• High resolution
• 1920 x 1080 pixel/eye
• 1440 x 1600 pixel insert

• Low latency stereo cameras


• 2 x 12 megapixel
• < 20 ms delay

• Integrated Eye Tracking


Varjo XR-1 Image Quality
Multiplexed Display

Virtual Image ‘inset’ into Real World


Example: Google Glass
Spatial Augmented Reality

• Project onto irregular surfaces


• Geometric Registration
• Projector blending, High dynamic range
• Book: Bimber, Raskar, “Spatial Augmented Reality”
Video Monitor AR
[Diagram: video cameras capture the real world; a combiner merges video and graphics on a monitor, viewed with stereo glasses]
Magic Mirror AR Experience

• See AR overlay of an image of yourself


AR Requires Tracking and Registration

• Registration
• Positioning virtual objects with respect to the real world
• Fixing virtual object on real object when view is fixed

• Calibration
• Offline measurements
• Measure camera relative to head mounted display

• Tracking
• Continually locating the user’s viewpoint when view moving
• Position (x,y,z), Orientation (r,p,y)
Sources of Registration Errors
• Static errors
• Optical distortions (in HMD)
• Mechanical misalignments
• Tracker errors
• Incorrect viewing parameters

• Dynamic errors
• System delays (largest source of error)
• 1 ms delay = 1/3 mm registration error
Dynamic errors
Application Loop:
Tracking (x, y, z; r, p, y) → Calculate Viewpoint → Simulation / Render Scene → Draw to Display
20 Hz = 50 ms → 500 Hz = 2 ms → 30 Hz = 33 ms → 60 Hz = 17 ms

• Total Delay = 50 + 2 + 33 + 17 = 102 ms
• At 1/3 mm per 1 ms of delay, 102 ms ≈ 33 mm registration error
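The latency budget above can be checked with a minimal Python sketch. The 1/3 mm-per-ms rule of thumb is the slide's (for typical head motion speeds); the stage labels are just for readability:

```python
# Registration error from end-to-end system delay.
# Rule of thumb from the slide: 1 ms of delay ~ 1/3 mm of error.
stage_delay_ms = {
    "tracking (20 Hz)": 50,
    "viewpoint calculation (500 Hz)": 2,
    "simulation/render (30 Hz)": 33,
    "display scan-out (60 Hz)": 17,
}

total_ms = sum(stage_delay_ms.values())   # 102 ms
error_mm = total_ms / 3.0                 # ~34 mm (the slide rounds to 33 mm)
print(f"total delay {total_ms} ms -> ~{error_mm:.0f} mm registration error")
```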
Reducing dynamic errors (1)

• Reduce system lag


• Faster components/system modules
• Reduce apparent lag
• Image deflection
• Image warping
Reducing dynamic errors (2)
• Match video + graphics input streams (video AR)
• Delay video of real world to match system lag
• User doesn’t notice
• Predictive Tracking
• Inertial sensors helpful

Azuma / Bishop 1994


Tracking Technologies
§ Active
• Mechanical, Magnetic, Ultrasonic
• GPS, Wifi, cell location
§ Passive
• Inertial sensors (compass, accelerometer, gyro)
• Computer Vision
• Marker based, Natural feature tracking
§ Hybrid Tracking
• Combined sensors (e.g. Vision + Inertial)
Tracking Types

• Magnetic Tracker
• Inertial Tracker
• Ultrasonic Tracker
• Mechanical Tracker
• Optical Tracker
  • Specialized Tracking
  • Marker-Based Tracking
  • Markerless Tracking
    • Edge-Based Tracking
    • Template-Based Tracking
    • Interest Point Tracking
OPTICAL TRACKING
https://www.youtube.com/watch?v=OtG-FNYhDv0
Why Optical Tracking for AR?

• Many AR devices have cameras


• Mobile phone/tablet, Video see-through display

• Provides precise alignment between video and AR overlay


• Using features in video to generate pixel perfect alignment
• The real world has many visual features that can be tracked

• Computer Vision is a well established discipline


• Over 40 years of research to draw on
• Old non-real-time algorithms can run in real time on today's devices
Common AR Optical Tracking Types
• Marker Tracking
• Tracking known artificial markers/images
• e.g. ARToolKit square markers

• Markerless Tracking
• Tracking from known features in real world
• e.g. Vuforia image tracking

• Unprepared Tracking
• Tracking in unknown environment
• e.g. SLAM tracking
Visual Tracking Approaches
• Marker based tracking with artificial features
• Make a model before tracking

• Model based tracking with natural features


• Acquire a model before tracking

• Simultaneous localization and mapping


• Build a model while tracking it
Marker Tracking
• Available for more than 20 years
• Several open-source solutions exist
• ARToolKit, ARTag, ATK+, etc
• Fairly simple to implement
• Standard computer vision methods
• A rectangle provides 4 corner points
• Enough for pose estimation!
Demo: ARToolKit
Key Problem: Finding Camera Position

[Figure: known image → image in camera view → overlay AR content]

• Need camera pose relative to marker to render AR graphics


Goal: Find Camera Pose

• Knowing:
• Position of key points in on-screen video image
• Camera properties (focal length, image distortion)
Coordinates for Marker Tracking

The transformation chain passes through four coordinate systems:
1. Marker coordinates → Camera coordinates
  • Rotation & translation: the final goal of tracking
2. Camera coordinates → Ideal screen coordinates
  • Perspective model, obtained from camera calibration
3. Ideal screen coordinates → Observed screen coordinates
  • Nonlinear function (barrel-shape distortion), obtained from camera calibration
• The correspondence of the marker's 4 vertices is obtained from real-time image processing

Marker Tracking – General Principle


1. Capture image with known camera
2. Search for quadrilaterals
3. Pose estimation from homography
4. Pose refinement: minimize nonlinear projection error
5. Use final pose

Image: Daniel Wagner


Marker Based Tracking: ARToolKit

https://github.com/artoolkit
Marker Tracking – Fiducial Detection

• Threshold the whole image to black and white


• Search scanline by scanline for edges (white to black)
• Follow edge until either
• Back to starting pixel
• Image border
• Check for size
• Reject fiducials early that are too small (or too large)
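A minimal sketch of this stage with OpenCV, using cv2.findContours as a stand-in for ARToolKit's scanline edge follower (the function name, threshold, and area limits are illustrative, not ARToolKit's API):

```python
import cv2

def find_candidate_fiducials(gray, min_area=100, max_area=100_000):
    # Threshold the whole image to black and white (marker borders are dark)
    _, binary = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
    # Trace closed edges; each contour is a candidate marker border
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    # Reject fiducials early that are too small (or too large)
    return [c for c in contours if min_area < cv2.contourArea(c) < max_area]
```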
Marker Tracking – Rectangle Fitting
• Start with an arbitrary point “x” on the contour
• The point with maximum distance must be a corner c0
• Create a diagonal through the center
• Find points c1 & c2 with maximum distance left and right of diag.
• New diagonal from c1 to c2
• Find point c3 right of diagonal with maximum distance
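A NumPy sketch of the corner search described above, on a contour given as an (N, 2) point array (e.g. a squeezed cv2.findContours result); the function name is illustrative:

```python
import numpy as np

def fit_rectangle(contour):
    x = contour[0]                                 # arbitrary start point "x"
    c0 = contour[np.linalg.norm(contour - x, axis=1).argmax()]  # first corner
    # diagonal through c0 and the contour centroid; n is its normal
    d = contour.mean(axis=0) - c0
    n = np.array([-d[1], d[0]])
    s = (contour - c0) @ n                         # signed distance to diagonal
    c1, c2 = contour[s.argmax()], contour[s.argmin()]   # farthest left/right
    # farthest point from the new diagonal c1-c2, on the side opposite c0
    n2 = np.array([-(c2 - c1)[1], (c2 - c1)[0]])
    s2 = (contour - c1) @ n2
    c3 = contour[(-np.sign((c0 - c1) @ n2) * s2).argmax()]
    return np.stack([c0, c1, c2, c3])
```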
Marker Tracking – Pattern checking
• Calculate homography using the 4 corner points
• “Direct Linear Transform” algorithm
• Maps normalized coordinates to marker coordinates
(simple perspective projection, no camera model)
• Extract pattern by sampling and check
• Id (implicit encoding)
• Template (normalized cross correlation)
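A hedged OpenCV sketch of this step: getPerspectiveTransform stands in for the DLT homography from normalized marker coordinates to the detected corners, and the marker interior is sampled on a grid (the grid size and function name are illustrative):

```python
import cv2
import numpy as np

def extract_pattern(gray, corners, size=16):
    # Homography mapping normalized marker coordinates to image coordinates
    unit_square = np.float32([[0, 0], [1, 0], [1, 1], [0, 1]])
    H = cv2.getPerspectiveTransform(unit_square, np.float32(corners))
    # Sample a size x size grid of cell centres inside the marker
    grid = np.float32([[(u + 0.5) / size, (v + 0.5) / size]
                       for v in range(size) for u in range(size)])
    pts = cv2.perspectiveTransform(grid[None], H)[0]
    # The caller compares this patch against stored templates with
    # normalized cross-correlation (or decodes an id pattern).
    samples = [gray[int(y), int(x)] for x, y in pts]
    return np.array(samples, np.uint8).reshape(size, size)
```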
Marker tracking – Pose estimation
• Calculates marker pose relative to the camera
• Initial estimation directly from homography
• Very fast, but coarse with error
• Jitters a lot…
• Iterative Refinement using Gauss-Newton method
• 6 parameters (3 for position, 3 for rotation) to refine
• At each iteration we optimize on the error
• Iterate
Outcome: Camera Transform
• Transformation from Marker to Camera
• Rotation and Translation

• T_CM: 4×4 transformation matrix from marker coordinates to camera coordinates
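A minimal sketch of pose estimation and the resulting T_CM with OpenCV. solvePnP's iterative solver stands in for the refinement described above (OpenCV actually uses Levenberg-Marquardt rather than plain Gauss-Newton), and K/dist are assumed to come from camera calibration:

```python
import cv2
import numpy as np

def marker_pose(corners_px, marker_size, K, dist):
    s = marker_size / 2.0
    # Marker corners in marker coordinates (z = 0 plane), matching corner order
    object_pts = np.float32([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]])
    ok, rvec, tvec = cv2.solvePnP(object_pts, np.float32(corners_px),
                                  K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 matrix
    T_CM = np.eye(4)                           # 4x4 marker-to-camera transform
    T_CM[:3, :3], T_CM[:3, 3] = R, tvec.ravel()
    return T_CM
```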
Tracking challenges in ARToolKit

• Occlusion (image by M. Fiala)
• Unfocused camera, motion blur
• Dark or unevenly lit scene, vignetting
• Jittering (Photoshop illustration)
• False positives and inter-marker confusion (image by M. Fiala)
• Image noise (e.g. poor lens, block coding/compression, neon tube)
Other Marker Tracking Libraries
But you can't cover the world with ARToolKit markers!
Markerless Tracking
• No more markers! → Markerless Tracking
• In the tracking taxonomy, markerless optical tracking divides into:
  • Edge-Based Tracking
  • Template-Based Tracking
  • Interest Point Tracking
https://www.youtube.com/watch?v=ANEB-DhuTSA
Visual Tracking Approaches
• Marker based tracking with artificial features
• Make a model before tracking
• Model based tracking with natural features
• Acquire a model before tracking
• Simultaneous localization and mapping
• Build a model while tracking it
Natural Feature Tracking
• Use natural cues of real elements
  • Edges
  • Surface texture
  • Interest points
• Model-based or model-free
• No visual pollution

[Figure: natural feature types (points, contours, surfaces). Image: Martin Hirzer]

Natural Features

• Detect salient interest points in image


• Must be easily found
• Location in image should remain stable
when viewpoint changes
• Requires textured surfaces
• Alternative: can use edge features (less discriminative)

• Match interest points to tracking model database


• Database filled with results of 3D reconstruction
• Matching entire (sub-)images is too costly
• Typically interest points are compiled into “descriptors”

Image: Gerhard Reitmayr


Texture Tracking
Demo: Vuforia Texture Tracking

https://www.youtube.com/watch?v=1Qf5Qew5zSU
Tracking by Keypoint Detection

• This is what most trackers do…
• Targets are detected every frame
• Popular because tracking and detection are solved simultaneously

Recognition pipeline (per camera image):
1. Keypoint detection
2. Descriptor creation and matching
3. Outlier removal
4. Pose estimation and refinement → Pose
Detection and Tracking

[State diagram: Start → Detection; when the tracking target is detected → Incremental tracking; while tracking is ok it loops; when the target is lost → back to Detection]

• Detection
  + Recognize target type
  + Initialize camera pose
• Incremental tracking
  + Fast
  + Robust to blur, lighting changes
  + Robust to tilt

• Tracking and detection are complementary approaches.
• After successful detection, the target is tracked incrementally.
• If the target is lost, detection is activated again.
What is a Keypoint?

• It depends on the detector you use!


• For high performance use the FAST corner detector
• Apply FAST to all pixels of your image
• Obtain a set of keypoints for your image
• Describe the keypoints

Rosten, E., & Drummond, T. (2006, May). Machine learning for high-speed corner detection.
In European conference on computer vision (pp. 430-443). Springer Berlin Heidelberg.
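A minimal OpenCV sketch of FAST detection; the threshold value and image filename are illustrative:

```python
import cv2

img = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"{len(keypoints)} FAST corners detected")
```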
FAST Corner Keypoint Detection
Example: FAST Corner Detection

https://www.youtube.com/watch?v=fevfxfHnpeY
Descriptors
• Describe the Keypoint features
• Can use SIFT
• Estimate the dominant keypoint
orientation using gradients
• Compensate for detected
orientation
• Describe the keypoints in terms
of the gradients surrounding it

Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June, 2010
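Continuing the sketch above, SIFT descriptors can be computed at the FAST keypoints. Note this is just the textbook choice; the cited mobile-phone system uses its own, lighter descriptor:

```python
import cv2

sift = cv2.SIFT_create()
# SIFT estimates a dominant gradient orientation per keypoint, compensates
# for it, then describes the surrounding gradients as a 128-D vector.
keypoints, descriptors = sift.compute(img, keypoints)
```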
Database Creation
• Offline step – create database of known features
• Searching for corners in a static image
• For robustness look at corners on multiple scales
• Some corners are more descriptive at larger or smaller scales
• We don’t know how far users will be from our image
• Build a database file with all descriptors and their
position on the original image
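A sketch of this offline step under stated assumptions: ORB as a stand-in descriptor, a hypothetical target.jpg, and hand-picked scales to cover different viewing distances:

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)
target = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image

database = []                     # (position on original image, descriptor)
for scale in (1.0, 0.5, 0.25):    # we don't know how far users will stand
    scaled = cv2.resize(target, None, fx=scale, fy=scale)
    kps, descs = orb.detectAndCompute(scaled, None)
    if descs is None:
        continue
    for kp, d in zip(kps, descs):
        database.append(((kp.pt[0] / scale, kp.pt[1] / scale), d))

np.savez("target_db.npz",
         positions=np.float32([p for p, _ in database]),
         descriptors=np.uint8([d for _, d in database]))
```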
Real-time Tracking

• Search for known keypoints in the video image
• Create the descriptors
• Match the descriptors from the live video against those in the database
  • Brute force is not an option
  • Need the speed-up of special data structures

Recognition pipeline (per camera image): keypoint detection → descriptor creation and matching → outlier removal → pose estimation and refinement → pose
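A matching sketch using FLANN's locality-sensitive hashing index (one of the "special data structures"; suited to binary descriptors like ORB) plus Lowe's ratio test. Parameter values are illustrative:

```python
import cv2

FLANN_INDEX_LSH = 6   # LSH index instead of brute-force search
flann = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_LSH, table_number=6, key_size=12,
         multi_probe_level=1),
    dict(checks=32))

def match_against_database(frame_descs, db_descs, ratio=0.8):
    matches = flann.knnMatch(frame_descs, db_descs, k=2)
    # Keep a match only if it is clearly better than the runner-up
    return [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
```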
NFT – Outlier Removal

• Removing outlier features
• Several removal techniques
  • Simple geometric tests
    • Is the keypoint rotation invariant?
    • Do keypoints remain in position relative to each other?
  • Homography-based tests
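A sketch of the homography-based test with RANSAC; the reprojection threshold (in pixels) and function name are illustrative:

```python
import cv2
import numpy as np

def remove_outliers(db_pts, frame_pts, reproj_thresh_px=3.0):
    # RANSAC keeps only correspondences consistent with a single homography
    H, mask = cv2.findHomography(np.float32(db_pts), np.float32(frame_pts),
                                 cv2.RANSAC, reproj_thresh_px)
    inliers = mask.ravel().astype(bool)
    return H, np.float32(db_pts)[inliers], np.float32(frame_pts)[inliers]
```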
NFT – Pose refinement
• Pose from homography makes good
starting point
• Use Gauss-Newton iteration
• Try to minimize the re-projection error
of the keypoints
• Typically, 2-4 iterations are enough
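A refinement sketch with OpenCV; solvePnPRefineLM uses Levenberg-Marquardt, a damped variant of the Gauss-Newton iteration described above, and the residual computation is just for inspection:

```python
import cv2

def refine_pose(object_pts, image_pts, K, dist, rvec, tvec):
    # Start from the coarse homography-derived pose and minimize
    # the reprojection error of the keypoints
    rvec, tvec = cv2.solvePnPRefineLM(object_pts, image_pts, K, dist,
                                      rvec, tvec)
    proj, _ = cv2.projectPoints(object_pts, rvec, tvec, K, dist)
    residual = cv2.norm(image_pts, proj.reshape(-1, 2), cv2.NORM_L2)
    return rvec, tvec, residual
```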
NFT – Real-time Tracking

Putting the pipeline together, per camera image:
1. Search for keypoints in the video image (keypoint detection)
2. Create the descriptors
3. Match the descriptors from the live video against those in the database
4. Remove the keypoints that are outliers
5. Use the remaining keypoints to calculate the pose of the camera (pose estimation and refinement)
Example

[Figure: target image → feature detection → AR overlay]


https://www.youtube.com/watch?v=O8XH6ORpBls
Edge Based Tracking
• Example: RAPiD [Drummond et al. 02]
• Initialization, Control Points, Pose Prediction (Global Method)
Demo: Edge Based Tracking
Line Based Tracking
• Visual Servoing [Comport et al. 2004]
3D Model Based Tracking
• Tracking from 3D object shape
• Align detected features to 3D object model
• Examples
• SnapChat Face tracking
• Mechanical part tracking
• Vehicle tracking
• Etc..
Typical Model Based Tracking Algorithm
Example: Vuforia Model Tracker

• Uses pre-captured 3D model for tracking


• On-screen guide to line up model
Model Tracking Demo

https://www.youtube.com/watch?v=6W7_ZssUTDQ
Taxonomy of Model Based Tracking

Lowney, M., & Raj, A. S. (2016). Model based tracking for augmented reality on mobile devices.
Marker vs. Natural Feature Tracking
• Marker tracking
• Usually requires no database to be stored
• Markers can be an eye-catcher
• Tracking is less demanding
• The environment must be instrumented
• Markers usually work only when fully in view
• Natural feature tracking
• A database of keypoints must be stored/downloaded
• Natural feature targets might catch the attention less
• Natural feature targets are potentially everywhere
• Natural feature targets work also if partially in view
Visual Tracking Approaches
• Marker based tracking with artificial features
• Make a model before tracking
• Model based tracking with natural features
• Acquire a model before tracking
• Simultaneous localization and mapping
• Build a model while tracking it
https://www.youtube.com/watch?v=uQeOYi3Be5Y
Tracking from an Unknown Environment
• What to do when you don’t know any features?
• Very important problem in mobile robotics - Where am I?

• SLAM
• Simultaneously Localize And Map the environment
• Goal: to recover both camera pose and map structure
while initially knowing neither.
• Mapping:
• Building a map of the environment which the robot is in
• Localisation:
• Navigating this environment using the map while keeping
track of the robot’s relative position and orientation
Parallel Tracking and Mapping

Tracking thread:
+ Estimate camera pose
+ For every frame

Mapping thread:
+ Extend map
+ Improve map
+ Slow update rate

• Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds
• The tracker sends new keyframes to the mapper; the mapper sends map updates back to the tracker
Parallel Tracking and Mapping

[Diagram: the video stream feeds new frames to a fast tracking thread, which exchanges map updates with a slow mapping thread and outputs the tracked local pose]

Klein/Drummond, U. Cambridge: simultaneous localization and mapping (SLAM) in small workspaces
Visual SLAM

• Early SLAM systems (1986 - )


• Uses computer vision and sensors (e.g. IMU, laser, etc.)
• One of the most important algorithms in Robotics

• Visual SLAM
• Using cameras only, such as stereo view
• MonoSLAM (single camera) developed in 2007 (Davison)
Example: Kudan MonoSLAM
How SLAM Works

• Three main steps


1. Tracking a set of points through successive camera frames
2. Using these tracks to triangulate their 3D position
3. Simultaneously use the estimated point locations to calculate
the camera pose which could have observed them
• By observing a sufficient number of points can solve for both
structure and motion (camera path and scene structure).
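A sketch of step 2 (triangulation) with OpenCV, assuming known intrinsics K and two 3×4 [R|t] camera poses for the frames in which the points were tracked:

```python
import cv2
import numpy as np

def triangulate_tracks(K, pose1, pose2, pts1, pts2):
    # pose1, pose2: 3x4 [R|t]; pts1, pts2: (N, 2) matched pixel coordinates
    P1, P2 = K @ pose1, K @ pose2            # projection matrices
    X = cv2.triangulatePoints(P1, P2,
                              pts1.T.astype(np.float64),
                              pts2.T.astype(np.float64))
    return (X[:3] / X[3]).T                  # homogeneous -> (N, 3) points
```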
Evolution of SLAM Systems
• MonoSLAM (Davison, 2007)
• Real time SLAM from single camera
• PTAM (Klein, 2009)
• First SLAM implementation on mobile phone
• FAB-MAP (Cummins, 2008)
• Probabilistic Localization and Mapping
• DTAM (Newcombe, 2011)
• 3D surface reconstruction from every pixel in image
• KinectFusion (Izadi, 2011)
• Realtime dense surface mapping and tracking using RGB-D
Demo: MonoSLAM
LSD-SLAM (Engel 2014)

• A novel, direct monocular SLAM technique


• Uses image intensities both for tracking and mapping.
• The camera is tracked using direct image alignment, while
• Geometry is estimated as semi-dense depth maps
• Supports very large-scale tracking
• Runs in real time on CPU and smartphone
Demo: LSD-SLAM
Direct Method vs. Feature Based

• Direct methods use all the information in the image, unlike feature-based approaches that only use small patches around corners and edges
Applications of SLAM Systems
• Many possible applications
• Augmented Reality camera tracking
• Mobile robot localisation
• Real world navigation aid
• 3D scene reconstruction
• 3D Object reconstruction
• Etc..

• Assumptions
• Camera moves through an unchanging scene
• So not suitable for person tracking, gesture recognition
• Both involve non-rigidly deforming objects and a non-static map
Hybrid Tracking Interfaces

• Combine multiple tracking technologies together


• Active-Passive: Magnetic, Vision
• Active-Inertial: Vision, Inertial
• Passive-Inertial: Compass, Inertial
Combining Sensors and Vision
• Sensors
• Produce noisy output (= jittering augmentations)
• Are not sufficiently accurate (= wrongly placed augmentations)
• Give us a first estimate of where we are in the world, and what we are looking at
• Vision
• Is more accurate (= stable and correct augmentations)
• Requires choosing the correct keypoint database to track from
• Requires registering our local coordinate frame (online-
generated model) to the global one (world)
Outdoor AR Tracking System

You, Neumann, Azuma outdoor AR system (1999)


www.augmentedrealitybook.org

Types of Sensor Fusion


• Complementary
• Combining sensors with different degrees of freedom
• Sensors must be synchronized (or requires inter-/extrapolation)
• E.g., combine position-only and orientation-only sensor
• E.g., orthogonal 1D sensors in gyro or magnetometer are complementary

• Competitive
• Different sensor types measure the same degree of freedom
• Redundant sensor fusion
• Use worse sensor only if better sensor is unavailable
• E.g., GPS + pedometer
• Statistical sensor fusion
Example: Outdoor Hybrid Tracking
• Combines
• computer vision
• inertial gyroscope sensors
• Both correct for each other
• Inertial gyro
• provides frame to frame prediction of camera
orientation, fast sensing
• drifts over time
• Computer vision
• Natural feature tracking, corrects for gyro drift
• Slower, less accurate
Robust Outdoor Tracking

• Hybrid Tracking
• Computer Vision, GPS, inertial
• Going Out
• Reitmayr & Drummond (Univ. Cambridge)

Reitmayr, G., & Drummond, T. W. (2006). Going out: robust model-based tracking for outdoor augmented reality. In Mixed and
Augmented Reality, 2006. ISMAR 2006. IEEE/ACM International Symposium on (pp. 109-118). IEEE.
Handheld Display
Demo: Going Out Hybrid Tracking
ARKit – Visual Inertial Odometry
• Uses both computer vision + inertial sensing
• Tracking position twice
• Computer Vision – feature tracking, 2D plane tracking
• Inertial sensing – using the phone IMU
• Output combined via Kalman filter
• Determine which output is most accurate
• Pass pose to ARKit SDK

• Each system complements the other


• Computer vision – needs visual features
• IMU - drifts over time, doesn’t need features
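ARKit's actual filter is proprietary; a minimal complementary-filter sketch illustrates the fusion idea for one orientation axis (the blend factor alpha and function name are illustrative):

```python
def fuse_yaw(prev_yaw, gyro_rate, dt, vision_yaw=None, alpha=0.98):
    imu_yaw = prev_yaw + gyro_rate * dt    # fast IMU integration, but drifts
    if vision_yaw is None:                 # camera dropped out: IMU takes over
        return imu_yaw
    # Vision is slower but drift-free: pull the IMU estimate toward it
    return alpha * imu_yaw + (1 - alpha) * vision_yaw
```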
ARKit – Visual Inertial Odometry

• Slow camera
• Fast IMU
• If camera drops out IMU takes over
• Camera corrects IMU errors
ARKit Demo

• https://www.youtube.com/watch?v=dMEWp45WAUg
Conclusions
• Tracking and Registration are key problems
• Registration error
• Measures against static error
• Measures against dynamic error
• AR typically requires multiple tracking technologies
• Computer vision most popular
• Research Areas:
• SLAM systems, Deformable models, Mobile outdoor tracking
More Information
Fua, P., & Lepetit, V. (2007). Vision based 3D tracking
and pose estimation for mixed reality. In Emerging
technologies of augmented reality: Interfaces and
design (pp. 1-22). IGI Global.
3: AR INTERACTION
Augmented Reality technology
• Combines Real and Virtual Images
• Needs: Display technology
• Interactive in real-time
• Needs: Input and interaction technology
• Registered in 3D
• Needs: Viewpoint tracking technology
How Do You Design an Interface for This?
AR Interaction
• Designing AR Systems = Interface Design
• Using different input and output technologies

• Objective is a high quality of user experience


• Ease of use and learning
• Performance and satisfaction
Typical Interface Design Path
1/ Prototype Demonstration
2/ Adoption of interaction techniques from other interface metaphors (Augmented Reality is at this stage)
3/ Development of new interface metaphors appropriate to the medium (Virtual Reality is at this stage)
4/ Development of formal theoretical models for predicting and modeling user actions (Desktop WIMP is at this stage)
Interacting with AR Content
• You can see spatially registered AR..
how can you interact with it?
Different Types of AR Interaction
• Browsing Interfaces
• simple (conceptually!), unobtrusive
• 3D AR Interfaces
• expressive, creative, require attention
• Tangible Interfaces
• Embedded into conventional environments
• Tangible AR
• Combines TUI input + AR display
AR Interfaces as Data Browsers
• 2D/3D virtual objects are
registered in 3D
• “VR in Real World”
• Interaction
• 2D/3D virtual viewpoint control
• Applications
• Visualization, training
AR Information Browsers

• Information is registered to real-world context
• Hand-held AR displays
• Interaction
  • Manipulation of a window into information space
• Applications
  • Context-aware information displays (Rekimoto, et al. 1997)
NaviCam Demo (1997)
Navicam Architecture
Current AR Information Browsers
• Mobile AR
• GPS + compass
• Many Applications
• Wikitude
• Yelp
• Google maps
•…
Example: Google Maps AR Mode

• AR Navigation Aid
• GPS + compass, 2D/3D object placement
Advantages and Disadvantages

• Important class of AR interfaces


• Wearable computers
• AR simulation, training
• Limited interactivity
• Modification of virtual
content is difficult
Rekimoto, et al. 1997
3D AR Interfaces

• Virtual objects displayed in 3D physical space and manipulated
• HMDs and 6DOF head-tracking
• 6DOF hand trackers for input
• Interaction
  • Viewpoint control
  • Traditional 3D user interface interaction: manipulation, selection, etc. (Kiyokawa, et al. 2000)
AR 3D Interaction (2000)
Example: AR Graffiti

www.nextwall.net
Advantages and Disadvantages
• Important class of AR interfaces
• Entertainment, design, training
• Advantages
• User can interact with 3D virtual
object everywhere in space
• Natural, familiar interaction
• Disadvantages
• Usually no tactile feedback
• User has to use different devices for
virtual and physical objects
Oshima, et al. 2000
www.empathiccomputing.org

[email protected]

@marknb00
