I was genuinely curious about the AI hype and wanted hands-on experience with cutting-edge computer vision. When I tested pre-trained YOLOv8 models on the objects on my desk, the results were laughably bad - monitors classified as refrigerators, mechanical pencils as toothbrushes! That's when I decided to build something from the ground up.

The project became a fascinating journey: recording 5 training videos from different angles (bird's-eye, front, horizontal, left, right), hand-labeling 351 images, and implementing a complete pipeline from JSON annotations to YOLO format. Training for 84 epochs over 4.1 hours felt like watching the model learn in real time - seeing the loss curves converge and the fitness score reach 0.7944 was incredibly satisfying. The most exciting moment came when all 9 objects finally showed up in my test video after I lowered the confidence threshold from 0.3 to 0.1.

It's amazing how much you can accomplish with just 351 well-curated images and a systematic approach. This project taught me that domain-specific training isn't just a buzzword - it's the difference between a model that thinks your monitor is a refrigerator and one that can actually find your lost AirPods!
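For anyone who wants to reproduce the annotation step, the JSON-to-YOLO conversion boils down to something like the sketch below. The field names and class list are illustrative placeholders (the exact keys depend on how your labeling tool exports its JSON); what matters is turning pixel-space boxes into YOLO's normalized `class_id x_center y_center width height` lines, one `.txt` file per image:

```python
import json
from pathlib import Path

# Illustrative class list -- the real project defines its own 9 desk objects.
CLASSES = ["monitor", "keyboard", "mouse", "airpods", "mechanical_pencil"]

def convert_annotation(json_path: Path, out_dir: Path) -> None:
    """Convert one JSON annotation file to a YOLO-format label file.

    Assumes the JSON stores image width/height plus a list of objects with a
    class name and a pixel-space [x_min, y_min, x_max, y_max] box; adjust the
    keys to match your labeling tool's export.
    """
    data = json.loads(json_path.read_text())
    img_w, img_h = data["width"], data["height"]

    lines = []
    for obj in data["objects"]:
        class_id = CLASSES.index(obj["label"])
        x_min, y_min, x_max, y_max = obj["bbox"]

        # YOLO format: class_id x_center y_center width height, normalized to [0, 1]
        x_center = (x_min + x_max) / 2 / img_w
        y_center = (y_min + y_max) / 2 / img_h
        width = (x_max - x_min) / img_w
        height = (y_max - y_min) / img_h
        lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")

    (out_dir / f"{json_path.stem}.txt").write_text("\n".join(lines))

if __name__ == "__main__":
    out_dir = Path("labels/train")
    out_dir.mkdir(parents=True, exist_ok=True)
    for json_file in Path("annotations").glob("*.json"):
        convert_annotation(json_file, out_dir)
```

And the confidence tweak that finally surfaced all 9 objects is a one-parameter change at inference time with the Ultralytics API (the weights path below is just the default training output location):

```python
from ultralytics import YOLO

# Load the fine-tuned weights and run on the test video.
model = YOLO("runs/detect/train/weights/best.pt")

# Lowering conf from 0.3 to 0.1 keeps lower-scoring boxes, which is how the
# previously missed objects showed up in the test video.
results = model.predict(source="test_video.mp4", conf=0.1, save=True)
```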