Inspiration
Parking garages rely heavily on hardware sensors, gated systems, or manual review to track vehicles entering and exiting. In practice, many garages already have cameras, but turning raw camera images into reliable vehicle tracking is far from trivial.
This project was inspired by a deceptively simple challenge: given a shuffled list of parking garage images in which each vehicle appears exactly twice, once at entry and once at exit, can we correctly identify which two images belong to the same vehicle?
At first glance, this sounds straightforward. In reality, the problem quickly becomes difficult when you see how similar vehicles look, how noisy the images are, and how unreliable any single signal can be. This challenge pushed me to think carefully about robust computer vision system design, not just model choice.
What I Built
I built an AI-powered vehicle re-identification pipeline that matches entry and exit images of the same car using a combination of computer vision techniques rather than relying on a single model or heuristic.
The core idea was to treat this as a global matching problem, not a subset or sequential one. Any image could match with any other image in the dataset, so the system needed to reason across the full image set.
At a high level, the pipeline works as follows:
Vehicle Detection and Cropping: I first used YOLO-based object detection to crop vehicles from each image. This step was critical because full-frame embeddings were dominated by background features like garage walls and lighting rather than the vehicle itself.
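The sketch below shows roughly what this step looked like, assuming an Ultralytics YOLOv8 model pretrained on COCO; the specific weights file, confidence threshold, and largest-box heuristic are illustrative rather than my exact settings.

```python
# Cropping sketch, assuming Ultralytics YOLOv8 pretrained on COCO
# (class 2 = car, 5 = bus, 7 = truck). Paths and thresholds are illustrative.
import cv2
from ultralytics import YOLO

VEHICLE_CLASSES = {2, 5, 7}  # COCO ids for car, bus, truck

model = YOLO("yolov8n.pt")

def crop_vehicle(image_path: str, conf: float = 0.4):
    """Return the largest detected vehicle crop, or None if nothing is found."""
    image = cv2.imread(image_path)
    result = model(image, conf=conf, verbose=False)[0]
    best_crop, best_area = None, 0
    for box in result.boxes:
        if int(box.cls) not in VEHICLE_CLASSES:
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        area = (x2 - x1) * (y2 - y1)
        if area > best_area:  # keep the most prominent vehicle in the frame
            best_crop, best_area = image[y1:y2, x1:x2], area
    return best_crop
```

Keeping only the largest vehicle box is a simple way to ignore partially visible cars in the background of a frame.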
Vehicle-Level Visual Embeddings: For each cropped vehicle, I computed deep visual embeddings using a pretrained CNN. These embeddings captured high-level features such as shape, color distribution, and structural cues. This helped group visually similar cars together but was not sufficient on its own, especially for common colors like black or white.
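Concretely, this step can be sketched as follows, assuming a torchvision ResNet-50 with its classification head replaced by an identity layer; the backbone choice and input size here are illustrative, not necessarily what I deployed.

```python
# Embedding sketch: map a vehicle crop to a unit-length feature vector.
# Assumes a torchvision ResNet-50 backbone; the crop comes from the YOLO step.
import torch
import torch.nn.functional as F
from torchvision import models, transforms

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled features
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(crop) -> torch.Tensor:
    """Return a normalized embedding for a BGR vehicle crop."""
    rgb = crop[:, :, ::-1].copy()  # OpenCV is BGR; the model expects RGB
    x = preprocess(rgb).unsqueeze(0)
    return F.normalize(backbone(x), dim=1).squeeze(0)

# With unit vectors, cosine similarity is just a dot product:
# sim = float(embed(crop_a) @ embed(crop_b))
```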
License Plate Localization and OCR: I then attempted to detect license plates and extract text using OCR. When plate text was successfully extracted, it became a strong signal for high-confidence matching. Exact plate matches were paired conservatively, while noisy or ambiguous plate readings were handled carefully.
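A minimal sketch of the conservative filtering, assuming EasyOCR for text extraction; the plate-format regex and confidence cutoff below are hypothetical stand-ins for the actual rules, which would depend on the plate formats in the dataset.

```python
# OCR sketch, assuming EasyOCR. The regex and cutoff are illustrative:
# anything that is not confidently read and well-formed is discarded.
import re

import easyocr

reader = easyocr.Reader(["en"], gpu=False)
PLATE_PATTERN = re.compile(r"^[A-Z0-9]{5,8}$")  # hypothetical plate format

def read_plate(plate_crop, min_conf: float = 0.6):
    """Return normalized plate text only when OCR is confident and well-formed."""
    for _bbox, text, conf in reader.readtext(plate_crop):
        cleaned = re.sub(r"[^A-Z0-9]", "", text.upper())
        if conf >= min_conf and PLATE_PATTERN.match(cleaned):
            return cleaned
    return None  # treat anything ambiguous as "no plate read"
```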
Constraint-Based Pairing: Instead of greedy matching, I applied strict constraints: one-to-one pairing, similarity thresholds, and mutual agreement rules. This reduced false positives and ensured that most accepted pairs were correct, even if it meant leaving some images unmatched.
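The pairing logic can be sketched as mutual nearest neighbours over cosine similarity with a hard threshold and one-to-one enforcement. The threshold value is illustrative, and this sketch uses embedding similarity alone, whereas the full system also folded in plate evidence.

```python
# Pairing sketch: accept (i, j) only when each is the other's best match
# AND the similarity clears a threshold; each image joins at most one pair.
# `embeddings` is assumed to be an (N, D) array of unit vectors.
import numpy as np

def pair_vehicles(embeddings: np.ndarray, threshold: float = 0.85):
    """Return index pairs accepted under the mutual-agreement constraints."""
    sim = embeddings @ embeddings.T
    np.fill_diagonal(sim, -np.inf)  # an image cannot match itself
    best = sim.argmax(axis=1)

    pairs, used = [], set()
    for i, j in enumerate(best):
        if i in used or int(j) in used:
            continue  # one-to-one: each image appears in at most one pair
        # mutual agreement: i's best match is j AND j's best match is i
        if best[j] == i and sim[i, j] >= threshold:
            pairs.append((i, int(j)))
            used.update((i, int(j)))
    return pairs  # images failing the constraints stay unmatched
```

Refusing to pair below the threshold is exactly the precision-over-recall tradeoff described next: unmatched images are cheaper to review manually than confidently wrong pairs.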
The final system prioritized precision over recall, which is often the correct tradeoff in real-world operational systems.
Challenges Faced
This project surfaced several real-world challenges that are easy to underestimate:
Visual ambiguity: Many vehicles were nearly indistinguishable. Similar models, colors, and angles made pure visual similarity unreliable.
Noisy OCR: License plates were often blurry, partially occluded, or poorly lit. OCR results were inconsistent and required conservative filtering.
Background bias: Early experiments showed that embeddings were unintentionally learning the garage environment instead of the car. Cropping proved essential.
Scalability: Pairing thousands of images globally requires careful pruning and constraint design to avoid combinatorial explosion.
Each failure mode forced iterative refinement of the pipeline and reinforced the importance of system-level thinking over single-model accuracy.
What I Learned
This project taught me that robust vision systems are built, not trained.
Strong results came not from one powerful model, but from:
Careful preprocessing
Multiple weak-to-strong signals
Conservative decision rules
Visual validation at every stage
I also learned how to reason about uncertainty: knowing when not to make a match is just as important as making one. This mindset is crucial for production-grade ML systems.