Comp4010 Lecture4 AR Tracking and Interaction
Mark Billinghurst
August 17th 2021
[email protected]
REVIEW
Augmented Reality Definition
• Combines Real and Virtual Images
• Both can be seen at the same time
• Interactive in real-time
• The virtual content can be interacted with
• Registered in 3D
• Virtual objects appear fixed in space
Augmented Reality Technology
• Combines Real and Virtual Images
• Needs: Display technology
• Interactive in real-time
• Needs: Input and interaction technology
• Registered in 3D
• Needs: Viewpoint tracking technology
Example: MagicLeap ML-1 AR Display
• Display
• Multi-layered Waveguide display
• Tracking
• Inside out SLAM tracking
• Input
• 6DOF wand, gesture input
AR Display Technologies
• Classification (Bimber/Raskar 2005)
• Head attached
• Head mounted display/projector
• Body attached
• Handheld display/projector
• Spatial
• Spatially aligned projector/monitor
Display Taxonomy
Bimber, O., & Raskar, R. (2005). Spatial augmented reality: merging real and virtual worlds. CRC press.
Types of Head Mounted Displays
Occluded
See-thru
Multiplexed
Optical see-through Head-Mounted Display
(Diagram: virtual images from monitors are merged with the real world through optical combiners)
Optical Design – Curved Mirror
(Diagram: monitors viewed through a curved-mirror combiner)
Example: Varjo XR-1
• Wide field of view
• 87 degrees
• High resolution
• 1920 x 1080 pixel/eye
• 1440 x 1600 pixel insert
(Diagram: camera video and graphics merged in a combiner before display)
Magic Mirror AR Experience
• Registration
• Positioning the virtual object with respect to the real world
• Fixing the virtual object to a real object while the view is fixed
• Calibration
• Offline measurements
• Measure camera relative to head mounted display
• Tracking
• Continually locating the user’s viewpoint while the view is moving
• Position (x,y,z), Orientation (r,p,y)
Sources of Registration Errors
• Static errors
• Optical distortions (in HMD)
• Mechanical misalignments
• Tracker errors
• Incorrect viewing parameters
• Dynamic errors
• System delays (largest source of error)
• 1 ms delay = 1/3 mm registration error
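The 1 ms ≈ 1/3 mm rule of thumb follows from error ≈ viewpoint speed × delay. A minimal sketch, assuming the figure refers to a viewpoint translating at roughly 1/3 m/s (the speed and the function name are illustrative, not from the slides):

```python
def registration_error_mm(head_speed_m_per_s, delay_ms):
    """Lag error: distance the viewpoint moves before the frame is drawn."""
    delay_s = delay_ms / 1000.0
    return head_speed_m_per_s * delay_s * 1000.0  # metres -> millimetres

# At ~1/3 m/s, each millisecond of system delay costs ~1/3 mm of registration error.
```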
Dynamic errors
• Application loop: Tracking (position x,y,z; orientation r,p,y) → Calculate Viewpoint → Simulation → Render Scene → Draw to Display
• Markerless Tracking
• Tracking from known features in real world
• e.g. Vuforia image tracking
• Unprepared Tracking
• Tracking in unknown environment
• e.g. SLAM tracking
Visual Tracking Approaches
• Marker based tracking with artificial features
• Make a model before tracking
• Knowing:
• Position of key points in on-screen video image
• Camera properties (focal length, image distortion)
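The geometry behind this can be sketched with the pinhole camera model: given a focal length and principal point, a 3D point in camera coordinates maps to a 2D image point. A minimal sketch (the focal length and principal point values are illustrative):

```python
def project(point_cam, f=800.0, cx=320.0, cy=240.0):
    """Pinhole projection: camera-space (X, Y, Z) -> pixel (u, v)."""
    X, Y, Z = point_cam
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return (u, v)

# Marker pose estimation runs this model in reverse: find the rotation and
# translation that make the projected marker corners match the observed pixels.
```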
Coordinates for Marker Tracking
• 1: Marker → Camera coordinates
• Rotation & Translation: the final goal, obtained in real time
• 2: Camera → Ideal Screen coordinates
• Perspective model, obtained from camera calibration
• 3: Ideal Screen → Observed Screen coordinates
• Nonlinear function (barrel-shape distortion), obtained from camera calibration
• 4: Correspondence of 4 marker vertices
• Obtained from image processing in real time
Camera Calibration
https://github.com/artoolkit
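The nonlinear step between ideal and observed screen coordinates can be sketched with a single radial distortion term (the coefficient k1 is an illustrative value; real calibration estimates it, usually with more terms):

```python
def distort(x, y, k1=-0.2):
    """Map ideal (normalised) image coords to observed coords with radial distortion."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return (x * scale, y * scale)

# With k1 < 0, points are pulled toward the image centre: barrel distortion.
```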
Marker Tracking – Fiducial Detection
Surface Tracking
Image: Martin Hirzer
Natural Features
https://www.youtube.com/watch?v=1Qf5Qew5zSU
Tracking by Keypoint Detection
• Targets are detected every frame
• Popular because tracking and detection are handled in the same step
• Pipeline: Camera Image → Recognition → Keypoint detection → Descriptor creation → Outlier Removal → Pose estimation and refinement → Pose
Detection and Tracking
• (State diagram: Start → detection; once the tracking target is detected, switch to frame-to-frame tracking)
FAST Corner Keypoint Detection
Rosten, E., & Drummond, T. (2006, May). Machine learning for high-speed corner detection. In European Conference on Computer Vision (pp. 430-443). Springer Berlin Heidelberg.
Example: FAST Corner Detection
https://www.youtube.com/watch?v=fevfxfHnpeY
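The FAST segment test can be sketched in a few lines: a pixel is a corner if enough contiguous pixels on a radius-3 Bresenham circle are all brighter or all darker than it by a threshold. A minimal sketch (threshold and arc length match the common defaults, but are illustrative here):

```python
# 16 offsets on a radius-3 Bresenham circle, in clockwise order
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, n=12):
    """Segment test: n contiguous ring pixels all brighter or all darker than centre +/- t."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for flags in ([v > centre + t for v in ring], [v < centre - t for v in ring]):
        run = 0
        for f in flags + flags:  # doubled to catch runs that wrap around the ring
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```

The paper's actual contribution is speed: it learns a decision tree that orders these pixel tests so most non-corners are rejected after only a few comparisons.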
Descriptors
• Describe the keypoint features
• Can use SIFT
• Estimate the dominant keypoint orientation using gradients
• Compensate for the detected orientation
• Describe the keypoints in terms of the gradients surrounding them
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June, 2010
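The orientation step can be sketched as a gradient-orientation histogram over the patch around a keypoint, SIFT-style, with the peak bin giving the dominant orientation (bin count and border handling are simplified):

```python
import math

def dominant_orientation(patch):
    """Return the dominant gradient orientation (degrees) of a 2D intensity patch."""
    bins = [0.0] * 36  # 10-degree bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 360.0
            bins[int(ang // 10) % 36] += mag         # magnitude-weighted vote
    return bins.index(max(bins)) * 10  # lower edge of the winning bin

# Rotating the descriptor grid by this angle is what makes the final
# descriptor invariant to in-plane rotation.
```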
Database Creation
• Offline step – create database of known features
• Searching for corners in a static image
• For robustness look at corners on multiple scales
• Some corners are more descriptive at larger or smaller scales
• We don’t know how far users will be from our image
• Build a database file with all descriptors and their position on the original image
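The multiple-scales point can be sketched as an image pyramid: detect corners at each level so the database covers viewers at different distances (2x2 box-filter downsampling is one simple choice):

```python
def build_pyramid(img, levels=3):
    """Repeatedly halve an image (2x2 averaging) so corners can be found at multiple scales."""
    out = [img]
    for _ in range(levels - 1):
        prev = out[-1]
        half = [[(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1] +
                  prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                 for x in range(len(prev[0]) // 2)]
                for y in range(len(prev) // 2)]
        out.append(half)
    return out
```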
Real-time Tracking
• Pipeline: Camera Image → Recognition → Keypoint detection → Pose estimation and refinement → Pose
NFT – Outlier Removal
• Search for keypoints in the video image
• Remove mismatched keypoints before pose estimation
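Outlier removal is commonly done RANSAC-style: repeatedly fit a motion hypothesis from a minimal sample of matches and keep the hypothesis most matches agree with. A minimal sketch using a pure-translation model for clarity (real systems fit a homography or full 6DOF pose):

```python
import random

def ransac_inliers(matches, iters=100, tol=2.0, seed=0):
    """matches: [((x1, y1), (x2, y2)), ...] keypoint pairs; returns the largest consensus set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)   # minimal sample: one pair
        dx, dy = x2 - x1, y2 - y1                  # translation hypothesis
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in matches
                   if abs(c - a - dx) < tol and abs(d - b - dy) < tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```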
Example
https://www.youtube.com/watch?v=6W7_ZssUTDQ
Taxonomy of Model Based Tracking
Lowney, M., & Raj, A. S. (2016). Model based tracking for augmented reality on mobile devices.
Marker vs. Natural Feature Tracking
• Marker tracking
• Usually requires no database to be stored
• Markers can be an eye-catcher
• Tracking is less demanding
• The environment must be instrumented
• Markers usually work only when fully in view
• Natural feature tracking
• A database of keypoints must be stored/downloaded
• Natural feature targets might catch the attention less
• Natural feature targets are potentially everywhere
• Natural feature targets also work when only partially in view
Visual Tracking Approaches
• Marker based tracking with artificial features
• Make a model before tracking
• Model based tracking with natural features
• Acquire a model before tracking
• Simultaneous localization and mapping
• Build a model while tracking it
https://www.youtube.com/watch?v=uQeOYi3Be5Y
Tracking from an Unknown Environment
• What to do when you don’t know any features?
• Very important problem in mobile robotics - Where am I?
• SLAM
• Simultaneously Localize And Map the environment
• Goal: to recover both camera pose and map structure while initially knowing neither
• Mapping:
• Building a map of the environment which the robot is in
• Localisation:
• Navigating this environment using the map while keeping track of the robot’s relative position and orientation
Parallel Tracking and Mapping
• Simultaneous localization and mapping (SLAM) in small workspaces
• Two parallel threads: Tracking (FAST) and Mapping (SLOW)
• The video stream sends new frames to the tracker; new keyframes are passed to the mapper
• Visual SLAM
• Using cameras only, such as stereo view
• MonoSLAM (single camera) developed in 2007 (Davison)
Example: Kudan MonoSLAM
How SLAM Works
• Assumptions
• Camera moves through an unchanging scene
• So not suitable for person tracking, gesture recognition
• Both involve non-rigidly deforming objects and a non-static map
Hybrid Tracking Interfaces
• Competitive
• Different sensor types measure the same degree of freedom
• Redundant sensor fusion
• Use worse sensor only if better sensor is unavailable
• E.g., GPS + pedometer
• Statistical sensor fusion
Example: Outdoor Hybrid Tracking
• Combines
• computer vision
• inertial gyroscope sensors
• Both correct for each other
• Inertial gyro
• provides frame-to-frame prediction of camera orientation, fast sensing
• drifts over time
• Computer vision
• Natural feature tracking, corrects for gyro drift
• Slower, less accurate
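This gyro-plus-vision arrangement is often realised as a complementary filter: integrate the fast gyro every frame, and nudge the estimate toward the slower vision measurement so gyro drift cannot accumulate. A minimal 1D sketch (the blend factor alpha is an illustrative value):

```python
def complementary_filter(gyro_rates, vision_angles, dt=0.01, alpha=0.98):
    """Fuse gyro rate (deg/s) with vision angle (deg): mostly gyro, slowly pulled to vision."""
    angle = vision_angles[0]
    for rate, vis in zip(gyro_rates, vision_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * vis
    return angle
```

A biased gyro alone would drift without bound; the small (1 - alpha) vision term keeps the error bounded, which is exactly the correction role the slide assigns to computer vision.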
Robust Outdoor Tracking
• Hybrid Tracking
• Computer Vision, GPS, inertial
• Going Out
• Reitmayr & Drummond (Univ. Cambridge)
Reitmayr, G., & Drummond, T. W. (2006). Going out: robust model-based tracking for outdoor augmented reality. In Mixed and Augmented Reality, 2006. ISMAR 2006. IEEE/ACM International Symposium on (pp. 109-118). IEEE.
Handheld Display
Demo: Going Out Hybrid Tracking
ARKit – Visual Inertial Odometry
• Uses both computer vision + inertial sensing
• Tracking position twice
• Computer Vision – feature tracking, 2D plane tracking
• Inertial sensing – using the phone IMU
• Output combined via Kalman filter
• Determine which output is most accurate
• Pass pose to ARKit SDK
• Slow camera
• Fast IMU
• If camera drops out IMU takes over
• Camera corrects IMU errors
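A minimal one-dimensional sketch of the fusion idea, not the actual ARKit internals: the IMU drives the prediction step (fast, but drifting), and each camera pose acts as a measurement update that reins the drift in. Noise values are illustrative:

```python
def kalman_step(x, p, imu_delta, z=None, q=0.05, r=0.1):
    """One predict/update cycle: x = state, p = variance, imu_delta = IMU motion,
    z = camera measurement (None when the camera drops out)."""
    # Predict: dead-reckon on the IMU; uncertainty grows by process noise q.
    x = x + imu_delta
    p = p + q
    if z is not None:
        # Update: blend in the camera pose according to the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
    return x, p

# With z=None the filter keeps running on the IMU alone - the
# "if the camera drops out the IMU takes over" behaviour described above.
```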
ARKit Demo
• https://www.youtube.com/watch?v=dMEWp45WAUg
Conclusions
• Tracking and Registration are key problems
• Registration error
• Measures against static error
• Measures against dynamic error
• AR typically requires multiple tracking technologies
• Computer vision most popular
• Research Areas:
• SLAM systems, Deformable models, Mobile outdoor tracking
More Information
Fua, P., & Lepetit, V. (2007). Vision based 3D tracking and pose estimation for mixed reality. In Emerging technologies of augmented reality: Interfaces and design (pp. 1-22). IGI Global.
3: AR INTERACTION
Augmented Reality technology
• Combines Real and Virtual Images
• Needs: Display technology
• Interactive in real-time
• Needs: Input and interaction technology
• Registered in 3D
• Needs: Viewpoint tracking technology
How Do You Design an Interface for This?
AR Interaction
• Designing AR Systems = Interface Design
• Using different input and output technologies
• AR Navigation Aid
• GPS + compass, 2D/3D object placement
Advantages and Disadvantages
• Important class of AR interfaces
• Entertainment, design, training
• Advantages
• User can interact with 3D virtual objects anywhere in space
• Natural, familiar interaction
• Disadvantages
• Usually no tactile feedback
• User has to use different devices for virtual and physical objects
Oshima, et al. 2000
www.empathiccomputing.org
@marknb00