
Aragog 2025 Spider ML

The document provides an overview of computer vision (CV) and machine learning (ML) techniques, including edge detection, shape detection, object detection, and motion detection. It discusses various challenges in these areas and introduces tools like Mediapipe and YOLO for implementing CV solutions. Additionally, it includes a code demonstration link and instructions for running the code in a Python environment.

Aragog 2025

Spider ML
Intro to ML and CV
Types of AI
Computer Vision?¿
What is (Computer) Vision?
Human vs Computer Vision
Applications of CV
What do we do in CV?
Techniques in CV

Edge Detection
Shape Detection
Object Detection
Motion Detection
Scene Understanding
Edge detection

Edges are basically boundaries between different regions in an image.
Detecting them helps computers understand the shapes, objects, and
structure within an image. It's just like how we recognize outlines of
things in a photo: our brains are wired for it, and edge detection lets
machines do the same.
Sobel Filters

Sobel filters use simple convolution operations to find edges in two directions:

● Horizontal (X-direction)
● Vertical (Y-direction)

The Sobel operator computes an approximation of the gradient of an image.
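As a rough illustration of the gradient approximation described above, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows; the function names (`apply_kernel`, `sobel_magnitude`) and the toy image are made up for this example. In practice a library call such as OpenCV's `cv2.Sobel` would normally be used instead.

```python
import math

# Standard 3x3 Sobel kernels: KX responds to intensity changes along x,
# KY responds to intensity changes along y.
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    total = 0
    for dr in range(-1, 2):
        for dc in range(-1, 2):
            total += kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
    return total


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, KX, r, c)
            gy = apply_kernel(image, KY, r, c)
            result[r][c] = math.hypot(gx, gy)
    return result


# Toy image (hypothetical data): dark left half, bright right half.
toy = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_magnitude(toy)
```

The interior pixels next to the dark/bright boundary get a large magnitude, while a flat image would give 0 everywhere.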


Canny Edge detection
Shape Detection Using CV

WHAT IS IT?
Identifying and classifying
geometric shapes in an image by
detecting closed contours.
Typically done using edge
detection and contour
approximation.

WHY IS IT NEEDED?
It is widely used in applications
such as object recognition,
industrial inspection, autonomous
navigation, and gesture-based
interfaces.
Basic Steps In Shape Detection
Load the image → Preprocessing → Edge detection → Contour approximation → Shape classification → Classify Shapes (Accurate)
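The contour approximation and classification steps can be sketched as follows. This is not the code from the demonstration notebook. It assumes a contour is already available as a list of (x, y) points, and it swaps in a simple "drop near-collinear points" filter rather than the Douglas-Peucker approximation that OpenCV's `cv2.approxPolyDP` provides. All function names and the `epsilon` default are illustrative.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def simplify_contour(points, epsilon):
    """Drop points lying within epsilon of the line joining their neighbours."""
    kept = list(points)
    changed = True
    while changed and len(kept) > 3:
        changed = False
        for i in range(len(kept)):
            prev_pt = kept[i - 1]
            next_pt = kept[(i + 1) % len(kept)]
            if point_line_distance(kept[i], prev_pt, next_pt) < epsilon:
                del kept[i]
                changed = True
                break
    return kept


def classify_shape(points, epsilon=1.0):
    """Name a closed contour by the number of corners left after simplifying."""
    corners = len(simplify_contour(points, epsilon))
    if corners == 3:
        return "triangle"
    if corners == 4:
        return "quadrilateral"
    if corners == 5:
        return "pentagon"
    return "circle-like"


# Hypothetical contour: a square traced with extra points on each side.
square = [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10), (5, 10), (0, 10), (0, 5)]
square_label = classify_shape(square)
```

Counting corners alone cannot tell a square from a rectangle or a diamond; side lengths or orientation would need to be checked as well.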


Code Demonstration
Link to access the code: https://colab.research.google.com/drive/1679e1OZjra-bGyGZsno8lXJmbHELoCSp?usp=sharing

Follow the instructions in the Google Colab notebook and run the code.
Challenges In Shape Detection

Noisy Images
Random variations or grainy textures in images can create false edges, leading to incorrect or excessive shape detection.

Similar Shapes (e.g., Square vs. Diamond)
Shapes like squares and diamonds have the same number of sides and angles, but different orientations.

Distorted Shapes
Shapes may appear warped due to perspective angles or lens distortion, making it hard for algorithms to match them with ideal geometric templates.

Poor Lighting
Low or uneven lighting causes shadows and highlights that alter the appearance of edges, resulting in incomplete or inaccurate contour detection.

Detecting Insignificant Shapes
When all contours are considered, even tiny imperfections or noise can be detected as shapes, cluttering the result with irrelevant outlines.
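One common way to handle the "insignificant shapes" problem is to ignore contours below a minimum area. The sketch below assumes each contour is a list of (x, y) points; the function names and the `min_area` value are chosen only for illustration. With OpenCV, `cv2.contourArea` plays the role of `polygon_area` here.

```python
def polygon_area(points):
    """Area of a closed polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_small_contours(contours, min_area):
    """Keep only the contours whose area is at least min_area."""
    return [c for c in contours if polygon_area(c) >= min_area]


# Hypothetical contours: one real shape and one speck of noise.
big_square = [(0, 0), (10, 0), (10, 10), (0, 10)]
noise_speck = [(20, 20), (21, 20), (21, 21), (20, 21)]
kept = drop_small_contours([big_square, noise_speck], min_area=50)
```

The right threshold depends on image resolution and on how large the shapes of interest are, so it usually needs tuning per application.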
Object Detection

According to the image, object detection can recognize:

● Objects: e.g., tree, bench, deck, carousel

● Scenes/Locations: amusement park, Cedar Point, Lake Erie

● Activities: people waiting in line, people sitting on ride

● Structures/Rides: Ferris wheel, The Wicked Twister, maxair

● Text & Signs: 12 E

● Motions/Emotions: pedestrians walking, waiting


Object Detection

Approach #1: Feature-based
● This is a more traditional method.
● It works by:
○ Finding specific features in an image (like edges, corners).
○ Modeling objects based on these features.
○ Searching for matching patterns in the image.

Think of it like finding pieces of a puzzle and matching them.

Approach #2: Network-based (Deep Learning)
● This is the modern method using neural networks.
● Steps:
○ Train a deep network with many labeled images (with objects marked).
○ The model learns to recognize and locate objects on its own.

Think of it like teaching a child with lots of examples until they learn to recognize things themselves.
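To make the "searching for matching patterns" idea from Approach #1 concrete, here is a deliberately simplified stand-in: sliding a small template over the image and scoring each position by the sum of squared differences. This is plain template matching, not a keypoint-based feature pipeline, and the function name and toy data are assumptions for this example.

```python
def best_match(image, template):
    """Return ((row, col), score) of the window with the lowest
    sum of squared differences against the template."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Hypothetical 4x5 image containing the 2x2 pattern at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
pattern = [[9, 1],
           [1, 9]]
position, score = best_match(image, pattern)
```

A score of 0 means an exact match. Real feature-based detectors are more robust to scale, rotation, and lighting than this pixel-by-pixel comparison.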
Motion Detection with Computer Vision
Motion detection is a crucial area in computer vision. It involves identifying
movement in a scene over time.
Core CV Models for Motion Detection

Several CV models excel at motion detection.

Deep Learning
Object detection plus tracking across frames.

Optical Flow
Estimates motion vectors between frames.

Background Subtraction
Models static backgrounds to detect deviations.
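Background subtraction, as listed above, can be sketched with a running-average background model. This is a minimal version assuming grayscale frames stored as lists of rows; the function names, `alpha`, and `threshold` are illustrative choices. OpenCV ships more robust models, for example `cv2.createBackgroundSubtractorMOG2`.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def motion_mask(background, frame, threshold=30):
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Hypothetical data: a flat background and a frame with one bright pixel.
background = [[10.0] * 3 for _ in range(3)]
frame = [[10, 10, 10],
         [10, 200, 10],
         [10, 10, 10]]
mask = motion_mask(background, frame)
background = update_background(background, frame)
```

A small `alpha` makes the background adapt slowly, which helps with gradual illumination changes but means a stopped object takes longer to be absorbed into the background.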
Key Challenges in Motion Detection
Several factors complicate motion detection.

Illumination Changes

Occlusion

Dynamic Backgrounds
Depth Estimation and Motion Detection
Depth data enhances motion understanding.

Stereo Vision
Uses triangulation and disparity maps.

RGB-D Sensors
Provides motion analysis using depth.

Deep Learning
Monocular depth prediction.
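For the stereo vision case, triangulation with a rectified camera pair reduces to the standard relation depth = focal length × baseline / disparity. The sketch below assumes the focal length is given in pixels and the baseline in metres; the function name and the example numbers are made up for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen with the given pixel disparity.

    Assumes a rectified stereo pair. A zero or negative disparity is
    treated as a point at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 10 cm apart.
near = depth_from_disparity(70, focal_length_px=700, baseline_m=0.1)
far = depth_from_disparity(35, focal_length_px=700, baseline_m=0.1)
```

Halving the disparity doubles the estimated depth, which is why distant objects are harder to range accurately.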
Intro to MediaPipe:

MediaPipe offers cross-platform, customizable ML solutions for live and streaming media.

● Face detection
● Object detection
● Pose detection
MediaPipe Python Framework:

MediaPipe key features:

● Modularity
● Pre-built Solutions
● Cross-Platform
● Real-Time Performance

To install the module:

pip install mediapipe
MediaPipe Syntax:

● mp.solutions.hands: This is the MediaPipe module that contains all hand-related processing solutions.
● The Hands class is the main hand detection and tracking model.
MediaPipe Syntax:
● hands.process() is a core method that analyzes images/frames to detect and track hands. It performs:

○ Hand detection in the image
○ Landmark detection (21 points per hand)
○ Hand type classification (left/right)
○ 3D pose estimation

It returns all this information in a structured format.


MediaPipe Syntax:
To access the landmarks
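A minimal sketch of accessing the landmarks, assuming the `mp.solutions.hands` interface described above and an OpenCV BGR frame as input. The helper names (`landmarks_to_pixels`, `detect_hand_pixels`) and the stand-in `fake_hand` object are hypothetical, added only so the coordinate conversion can be seen without a camera.

```python
from types import SimpleNamespace


def landmarks_to_pixels(hand_landmarks, width, height):
    """Convert normalized landmark coordinates (0..1) to pixel coordinates."""
    return [
        (int(lm.x * width), int(lm.y * height))
        for lm in hand_landmarks.landmark
    ]


def detect_hand_pixels(bgr_frame):
    """Run MediaPipe Hands on one OpenCV frame; return pixel landmarks per hand."""
    # Imported here so the helper above works without these packages installed.
    import cv2
    import mediapipe as mp

    height, width = bgr_frame.shape[:2]
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        results = hands.process(rgb)
    if not results.multi_hand_landmarks:  # None when no hand is detected
        return []
    return [
        landmarks_to_pixels(hand, width, height)
        for hand in results.multi_hand_landmarks
    ]


# Stand-in with the same .landmark / .x / .y layout as a MediaPipe result.
fake_hand = SimpleNamespace(landmark=[SimpleNamespace(x=0.5, y=0.25, z=0.0)])
pixels = landmarks_to_pixels(fake_hand, width=640, height=480)
```

With a real image, `detect_hand_pixels(cv2.imread("hand.jpg"))` (the file name is a placeholder) would return one list of 21 pixel coordinates per detected hand.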
Why do we need both MediaPipe and OpenCV?

● MediaPipe handles the core ML-based hand detection and landmark estimation, while OpenCV
provides the essential utilities for getting the video input, performing basic image manipulations,
drawing custom annotations (like gesture names), and displaying the final output video stream.

● MediaPipe tells you what the gesture is (or gives you the landmarks to figure it out), but OpenCV
helps you display that information visually on the screen.
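As an example of "figuring out" a gesture from landmarks, the sketch below counts raised fingers using MediaPipe's hand landmark numbering (fingertips at indices 8, 12, 16, 20 and their PIP joints at 6, 10, 14, 18). It assumes an upright hand and a list of 21 (x, y) points with y growing downward; the thumb is ignored, and the gesture names are invented for illustration.

```python
FINGER_TIPS = (8, 12, 16, 20)  # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)  # the PIP joint of each of those fingers


def count_raised_fingers(points):
    """Count non-thumb fingers whose tip is above its PIP joint.

    `points` is a list of 21 (x, y) tuples; y grows downward as in images.
    """
    return sum(
        1
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
        if points[tip][1] < points[pip][1]
    )


def gesture_name(points):
    """Map the raised-finger count to a simple, made-up gesture label."""
    raised = count_raised_fingers(points)
    labels = {0: "fist", 2: "two fingers", 4: "open hand"}
    return labels.get(raised, f"{raised} fingers")


# Toy hand: every landmark at y=0.8, then lift the index and middle fingertips.
toy_hand = [(0.5, 0.8)] * 21
toy_hand[8] = (0.4, 0.3)
toy_hand[12] = (0.5, 0.3)
label = gesture_name(toy_hand)
```

The resulting label is the kind of text that OpenCV would then draw on the frame, for instance with `cv2.putText`, before showing it with `cv2.imshow`.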
Intro to YOLO:

YOLO (You Only Look Once), a popular object detection and image segmentation model, was developed
by Joseph Redmon and Ali Farhadi at the University of Washington.

● Object detection
● Image classification
● Pose detection
Mediapipe vs YOLO :
Model flow:
What to do next?

● Click the link here to access the code, or use the one sent in the WhatsApp group.

● Follow these steps: install Python 3.10.11, unzip the code, open the folder in VS Code, install the dependencies using pip, and run the code.

● In case of any queries, feel free to contact any one of the ML seniors.
