Exploring Augmented Reality With Python

Augmented reality requires tracking the user's position and orientation in real-time using keypoints, or distinctive points in the environment, and SLAM algorithms. Keypoint detection finds points, while description gives each a unique fingerprint. Together they allow pose estimation as the viewpoint changes. Tracking a database of keypoints enables error correction and pose refinement. Occlusion, or hiding virtual objects properly behind real ones, remains challenging due to limitations of current depth sensors and difficulty reconstructing 3D geometry in real-time from 2D images alone.

AR: Defining Characteristics
• Blending of the real with the imaginary
• Interacting in real time with virtual content
• Virtual objects are either fixed or have predictable movements
Doing proper AR in the real world is chaotic.
A lot can change while you're using an AR app:
• Camera angle/perspective
• Rotation
• Scale
• Lighting
• Blur from motion or focusing
• General image noise
So what makes AR possible?
Update understanding

AR requires tracking to really work well.

Continually locating/updating the user's viewpoint when in motion involves:
• Positional tracking: (x, y, z) coordinates
• Rotational tracking: roll, pitch & yaw - (r, p, y)

Known as Pose Estimation in XR speak.

A common coordinate system is important!
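
To make this concrete, here is a minimal numpy sketch (not from the slides) of a 6-DoF pose: a position vector plus a rotation matrix built from roll, pitch and yaw. The Z-Y-X rotation order is an assumption; conventions vary between frameworks.

import numpy as np

def rpy_to_matrix(roll, pitch, yaw):
    """Build a 3x3 rotation matrix from roll, pitch and yaw (radians),
    applied in Z-Y-X order (yaw, then pitch, then roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])  # yaw
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])  # pitch
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])  # roll
    return Rz @ Ry @ Rx

# A pose = position + rotation, expressed in the shared world frame.
pose = {
    "position": np.array([0.5, 1.2, -3.0]),          # (x, y, z) in metres
    "rotation": rpy_to_matrix(0.0, 0.1, np.pi / 2),  # (r, p, y)
}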
Update understanding

Shouldn't there be some reference points to do all this?

YES!

Keypoints are the key here: distinctive locations in images - corners, blobs or T-junctions. Together, they describe features of the environment.

Properties:
• Reliability - a feature should be detectable reliably, every time it is in view.
• Invariance - the same point should be found across different views.
Update understanding

Some algorithms used in SLAM pipelines to identify keypoints that can be tracked reliably:
• SIFT - Scale-Invariant Feature Transform
• SURF - Speeded-Up Robust Features
• BRISK - Binary Robust Invariant Scalable Keypoints
Update understanding

Two parts to any such algorithm:
• Keypoint detection: detect sufficient keypoints to understand the environment well
• Keypoint description: give a unique fingerprint to each keypoint detected

Keypoints == spatial anchors
This should happen every frame!
Update understanding

For BRISK:
• Keypoint detection: at least 9 contiguous pixels on a circle around a candidate pixel p should be brighter or darker than p.
• Keypoint description: create a 512-bit binary string from pairwise brightness-comparison results around the keypoint.
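
A minimal sketch of this in Python using OpenCV's BRISK implementation (the synthetic test image is an assumption; a real app would feed in camera frames):

import cv2
import numpy as np

# Synthetic test frame: a bright square on a dark background gives
# BRISK clear corners to find; swap in a real camera frame in practice.
frame = np.zeros((480, 640), dtype=np.uint8)
cv2.rectangle(frame, (200, 150), (440, 330), 255, -1)

brisk = cv2.BRISK_create()

# Detection: corner test around each candidate pixel p, as above.
# Description: pairwise brightness comparisons around the keypoint,
# packed into a 512-bit (64-byte) binary string.
keypoints, descriptors = brisk.detectAndCompute(frame, None)

print(len(keypoints), "keypoints;",
      descriptors.shape[1] * 8, "bits per descriptor")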
Update understanding
For tracking, create a database of all keypoints.
Look at corners at multiple scales: varied scale => more descriptive!
Keypoints: the more the better - but they get expensive to track too!
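
A hedged sketch of matching one frame's descriptors against the database using brute-force Hamming matching (the descriptor arrays are random stand-ins for real BRISK output):

import cv2
import numpy as np

# Stand-in binary descriptors; in practice these come from
# brisk.detectAndCompute on database images and on the live frame.
db_descriptors = np.random.randint(0, 256, (500, 64), dtype=np.uint8)
frame_descriptors = np.random.randint(0, 256, (200, 64), dtype=np.uint8)

# Hamming distance is the natural metric for binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(frame_descriptors, db_descriptors)

# Lower distance = better match; sort so the strongest come first.
matches = sorted(matches, key=lambda m: m.distance)
print(len(matches), "tentative matches")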
Update understanding
• Error correction is an important step in maintaining the pose.
• Removal of outliers helps in pose refinement: simple geometry-based, or maybe homography-based (2D-3D relationships).
• Use the remaining keypoints to calculate the pose.
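
A hedged sketch of homography-based outlier removal with RANSAC in OpenCV (the matched point arrays are synthetic stand-ins; in practice they come from the descriptor matches above):

import cv2
import numpy as np

# Matched keypoint locations in the previous and current frames.
pts_prev = (np.random.rand(50, 1, 2) * 640).astype(np.float32)
# Simulate a mostly consistent camera motion, plus a few outliers.
pts_curr = pts_prev + np.float32([5.0, -3.0])
pts_curr[:5] += (np.random.rand(5, 1, 2) * 100).astype(np.float32)

# RANSAC fits a homography to the consensus set and flags outliers.
H, inlier_mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, 5.0)

inliers = pts_curr[inlier_mask.ravel() == 1]
print(f"{len(inliers)}/{len(pts_curr)} keypoints kept for pose refinement")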
So what makes AR possible?
Update understanding - other factors
There are some other important parts to enabling a good AR experience:
• Lighting of the surrounding environment
• User interactions (hit testing, raycasts etc.) with the virtual objects - see the sketch after this list
• Oriented points - to place objects on angled surfaces
• Occlusion
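
As an illustration of hit testing (a hypothetical helper, not an actual AR-framework API): cast a ray from the camera through the user's tap and intersect it with a detected plane.

import numpy as np

def hit_test(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect a camera ray with a detected plane; return the 3D hit
    point, or None if the ray is parallel to or points away from it."""
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-6:  # ray parallel to the plane
        return None
    t = np.dot(plane_normal, plane_point - ray_origin) / denom
    if t < 0:              # plane is behind the camera
        return None
    return ray_origin + t * ray_dir

# Place a virtual object where the user tapped: ray from the camera
# through the tap, against a horizontal floor plane at y = 0.
hit = hit_test(np.array([0.0, 1.5, 0.0]),   # camera position
               np.array([0.0, -0.7, 0.7]),  # ray through the tap
               np.array([0.0, 0.0, 0.0]),   # a point on the floor
               np.array([0.0, 1.0, 0.0]))   # floor normal (up)
print("anchor object at", hit)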
Exclusive behind the scenes from your AR app...
On launching an AR app:
• Check for pre-downloaded keypoint maps and initialize a new map if needed
• Update the map with movement through the scene
• Use this data to create the experience
Bigger maps = more computations to manage! A bit of a problem child.
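
A loose sketch of that lifecycle (the KeypointMap class and the file format are hypothetical, purely illustrative):

import os
import pickle

class KeypointMap:
    """Hypothetical container for the keypoint map described above."""

    def __init__(self):
        self.landmarks = []  # (3D position, descriptor) pairs

    @classmethod
    def load_or_create(cls, path):
        # Check for a pre-downloaded map; initialize a new one if needed.
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        return cls()

    def update(self, new_landmarks):
        # Grow the map as the user moves through the scene.
        # Bigger maps = more descriptors to match every frame!
        self.landmarks.extend(new_landmarks)

world_map = KeypointMap.load_or_create("scene_map.pkl")
world_map.update([((0.5, 0.0, 2.0), b"\x12" * 64)])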
Current SOTA
Now to the tricky part - problems!
Some major issues in current AR technologies:
• Improper occlusion - objects are not hidden when they are supposed to be
• Depth distortions - unreliable depth-sensor data
• Inaccurate tracking data - drifting objects
• Performance drops - too much data to process
Potential workarounds exist, but they rarely work in real scenarios.
An example of improper occlusion
(Attempting to) Solve occlusion
To handle occlusion, do either of these:
• Use depth cameras to get a 3D location for each pixel (trivial?)
• Aggregate the depth info across frames to generate a 3D geometry of the scene
The catch: most phones don't have a depth sensor to begin with.
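
Where per-pixel depth is available, occlusion reduces to a depth test. A minimal numpy sketch with synthetic depth data:

import numpy as np

h, w = 480, 640

# Depth of the real scene per pixel (from a depth sensor or estimated);
# stand-in: a wall ~2 m away with a nearby object at 1 m.
scene_depth = np.full((h, w), 2.0, dtype=np.float32)
scene_depth[:, 300:400] = 1.0

# Depth buffer of the rendered virtual object, placed at 1.5 m.
virtual_depth = np.full((h, w), np.inf, dtype=np.float32)
virtual_depth[200:280, 250:450] = 1.5

# Draw the virtual object only where it is closer than the real scene;
# the part overlapping the 1 m object is correctly hidden behind it.
visible = virtual_depth < scene_depth
print(visible.sum(), "virtual pixels drawn")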
(Attempting to) Solve occlusion
3D reconstruction can be split into two sub-problems:
• Generate 3D depth information from 2D images (dense point clouds/depth images)
• Integrate this over several frames to generate meshes in real time
Point cloud -> 3D mesh -> Occlude!
Still a hard task, with more performance drops!
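
A hedged sketch of the first sub-problem's output: back-projecting a depth image into a dense point cloud with the pinhole camera model (the intrinsics fx, fy, cx, cy and the flat depth image are assumed values):

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (H*W, 3) point cloud:
    X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Assumed intrinsics for a 640x480 camera; flat synthetic depth image.
depth = np.full((480, 640), 2.0, dtype=np.float32)
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)  # (307200, 3) points, ready for mesh integration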
Thank you
