Object Detection using YOLOv3

YOLOv3 is a variant of the popular object detection algorithm YOLO (You Only Look Once). The published model recognizes 80 different object classes in images and videos, and, most importantly, it is very fast and nearly as accurate as the Single Shot MultiBox Detector (SSD).

With OpenCV's DNN module, you can easily use YOLOv3 models in your own OpenCV application.
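As a rough illustration of what using YOLOv3 from OpenCV can look like, here is a minimal sketch built on the DNN module's Darknet loader. The helper names load_yolov3 and make_blob are hypothetical, and the file names in the usage comment (yolov3.cfg, yolov3.weights) are assumptions about where you saved the published Darknet model files. The 1/255 scaling and the 416×416 input size are common choices for this model, not the only valid ones.

```python
from pathlib import Path


def load_yolov3(cfg_path, weights_path):
    """Load a Darknet YOLOv3 model with OpenCV's DNN module (hypothetical helper)."""
    # Fail early with a clear message if either model file is missing.
    for p in (cfg_path, weights_path):
        if not Path(p).is_file():
            raise FileNotFoundError(f"model file not found: {p}")
    import cv2  # imported lazily so the path check above works without OpenCV
    return cv2.dnn.readNetFromDarknet(str(cfg_path), str(weights_path))


def make_blob(image, size=416):
    """Convert a BGR image into the 4D blob the network expects (hypothetical helper)."""
    import cv2
    # Scale pixels to [0, 1], resize to size x size, and swap BGR -> RGB.
    return cv2.dnn.blobFromImage(image, 1 / 255.0, (size, size),
                                 (0, 0, 0), swapRB=True, crop=False)


# Usage sketch (assumes the files below exist on disk):
#   import cv2
#   net = load_yolov3("yolov3.cfg", "yolov3.weights")
#   net.setInput(make_blob(cv2.imread("image.jpg")))
#   outputs = net.forward(net.getUnconnectedOutLayersNames())
```

Each array in outputs then holds one row per predicted box, which still has to be filtered as described later in this article.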

How does YOLO work?

We can think of an object detector as a combination of an object locator and an object recognizer.

In traditional computer vision approaches, a sliding window was used to look for objects at different locations and scales. Because this was such an expensive operation, the aspect ratio of the object was usually assumed to be fixed.
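To get a feel for why the sliding-window search is expensive, the sketch below simply counts candidate windows. The image size, window sizes, and stride are made-up values chosen for illustration. It compares a search with a fixed square aspect ratio against one that also tries tall and wide windows.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield (x, y, w, h) for every window position at every window size."""
    for w, h in window_sizes:
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)


# Fixed 1:1 aspect ratio at three scales (illustrative values).
square = [(64, 64), (128, 128), (256, 256)]
fixed = list(sliding_windows(416, 416, square, stride=32))

# Same search, but also trying 1:2 and 2:1 windows at the two smaller scales.
extra = [(64, 128), (128, 64), (128, 256), (256, 128)]
varied = list(sliding_windows(416, 416, square + extra, stride=32))

print("windows with fixed aspect ratio:", len(fixed))
print("windows with extra aspect ratios:", len(varied))
```

Every one of these windows would have to be passed through a classifier, so each additional aspect ratio or scale multiplies the work.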

Early deep learning-based object detection algorithms such as R-CNN and Fast R-CNN used a method called Selective Search to narrow down the number of bounding boxes that the algorithm had to test.

Another approach, called OverFeat, scanned the image at multiple scales using a sliding-window-like mechanism implemented convolutionally.

This was followed by Faster R-CNN, which used a Region Proposal Network (RPN) to identify the bounding boxes that needed to be tested. By clever design, the features extracted for recognizing objects were also used by the RPN to propose potential bounding boxes, saving a lot of computation.

YOLO, on the other hand, approaches the object detection problem in a completely different way. It forwards the whole image through the network only once. SSD is another object detection algorithm that forwards the image once through a deep learning network, but YOLOv3 is much faster than SSD while achieving comparable accuracy. YOLOv3 gives faster-than-real-time results on an M40, Titan X, or 1080 Ti GPU.

Let's see how YOLO detects the objects in a given image.

First, it divides the image into a 13×13 grid of cells. The size of these 169 cells varies depending on the size of the input. For the 416×416 input size that we used in our experiments, the cell size was 32×32. Each cell is then responsible for predicting a number of boxes in the image.
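The grid arithmetic above can be sketched in a few lines. The function name grid_cell_for_point is hypothetical. The idea that an object is assigned to the cell containing its center comes from the YOLO papers rather than from this article, so treat that part as an assumption.

```python
def grid_cell_for_point(cx, cy, input_size=416, grid_size=13):
    """Return (row, col, cell_size) of the grid cell containing point (cx, cy)."""
    cell_size = input_size / grid_size  # 416 / 13 = 32 pixels per cell
    # Clamp so a point on the right or bottom edge still maps to the last cell.
    col = min(int(cx // cell_size), grid_size - 1)
    row = min(int(cy // cell_size), grid_size - 1)
    return row, col, cell_size


# An object centered at pixel (200, 100) in a 416x416 input:
row, col, cell_size = grid_cell_for_point(200, 100)
print(f"cell size {cell_size} px, row {row}, column {col}")
```

With a different input size, only cell_size changes; the grid stays 13×13 as long as the network downsamples by the same factor.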

For each bounding box, the network also predicts the confidence that the bounding box actually encloses an object, and the probability of the enclosed object being a particular class.
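One common way to combine these two predictions, described in the original YOLO paper, is to multiply the box confidence by each class probability and keep the best class. The sketch below does exactly that. The function name and the three-class example values are made up for illustration.

```python
def best_class(box_confidence, class_probs):
    """Return (class_id, score) where score = box confidence * class probability."""
    scores = [box_confidence * p for p in class_probs]
    class_id = max(range(len(scores)), key=scores.__getitem__)
    return class_id, scores[class_id]


# A box the network is 90% sure contains an object, with three class probabilities.
class_id, score = best_class(0.9, [0.1, 0.7, 0.2])
print(class_id, round(score, 2))
```

A box with high confidence but no dominant class, or a dominant class but low confidence, ends up with a low score either way.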

Most of these bounding boxes are eliminated, either because their confidence is low or because they enclose the same object as another bounding box with a higher confidence score. The second filtering step is called non-maximum suppression.
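Here is a minimal pure-Python sketch of this filtering, assuming boxes are given as (x1, y1, x2, y2) corners. The threshold values are illustrative defaults, not values taken from the YOLOv3 authors. In practice you would more likely call a library routine, but the logic is the same greedy procedure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, conf_threshold=0.5, iou_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Step 1: drop low-confidence boxes and sort the rest by score.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    # Step 2: keep a box only if it does not overlap a better box too much.
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112),
         (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.2]
print(non_max_suppression(boxes, scores))
```

In this example the second box is dropped because it almost coincides with the first, and the last box is dropped for low confidence.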

The authors of YOLOv3, Joseph Redmon and Ali Farhadi, have made YOLOv3 faster and more accurate than their previous work, YOLOv2. YOLOv3 handles multiple scales better. They have also improved the network by making it bigger and by adding shortcut connections, which moves it toward a residual-network design.
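To make the multi-scale point a little more concrete: according to the YOLOv3 paper, the network makes predictions at three scales, with 3 boxes per grid cell at each scale. The strides of 32, 16, and 8 used below are an assumption drawn from the paper and the published configuration, not from this article. Under that assumption, the number of candidate boxes for a 416×416 input can be worked out as follows.

```python
def yolo_v3_box_count(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Return per-scale grid sizes and the total number of predicted boxes."""
    grids = [input_size // s for s in strides]
    total = sum(g * g * boxes_per_cell for g in grids)
    return grids, total


grids, total = yolo_v3_box_count()
print("grid sizes:", grids)
print("total boxes before filtering:", total)
```

The coarsest of these grids is the 13×13 grid described earlier; the finer grids help with smaller objects.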

Nima dorostkar

BackEnd Dev, Django