Hello everyone. This is a tutorial on flag detection using a CNN. The article is titled “Fifa worldcup 2018 Round of 16 flag detection using CNN”, and in it we will learn how to prepare a dataset and train a CNN to classify images. With the FIFA World Cup being the buzzword everywhere right now, we are going to apply a CNN to a rather interesting problem: classifying the flags of the countries that qualified for the Round of 16 of the FIFA World Cup 2018. The countries, in alphabetical order, are Argentina, Belgium, Brazil, Colombia, Croatia, Denmark, England, France, Japan, Mexico, Portugal, Russia, Spain, Sweden, Switzerland, and Uruguay.
The key thing to understand here is that the model we are building can be trained on any kind of classes and any number of labels you want. We are using flags only to keep things interesting. For example, if there are any doctors reading this, after completing this article they will be able to build and train a neural network that takes a brain scan as input and predicts which type of tumor, if any, the scan contains. Or if there are any botanists reading this, they will be able to build and train a neural network that takes an image of a leaf as input and predicts which type of plant it comes from. The possibilities are endless, limited only by your imagination. So, let’s begin.
We are going to use the Keras library on top of TensorFlow to build our CNN model. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. For our purpose, we will use the TensorFlow backend on Python 3.6. We already have a number of tutorials on TensorFlow, so if you want to check those, you can follow this link.
First of all, we need to gather a training image dataset, i.e., images of the flags of these countries. For this, we will use a Chrome extension called Bulk image downloader, which lets us download multiple images at once. Add the extension to Chrome. Afterward, simply search for the image you need (e.g., Spain National Flag) on Google image search and click on the Bulk image downloader icon at the top right-hand corner. Click on “Current tab” and select the images that you want to download. Repeat this for all 16 teams. A sample image dataset can be downloaded from this google drive link. It contains roughly 35-40 flag images for each team. A larger training set is generally recommended, but as we will see later, even this amount is enough for our purpose.
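Before training, it can help to check how many images were actually collected for each team. The following is a minimal sketch, assuming the downloaded images are stored in one sub-folder per country under a folder named `dataset` (both the folder name and the layout are assumptions for illustration, not something prescribed by the extension):

```python
import os

# Assumed layout (hypothetical, for illustration only): one sub-folder per country,
#   dataset/Spain/001.jpg, dataset/Brazil/001.png, ...
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def count_images_per_class(root):
    """Return {class_name: number_of_image_files} for each sub-folder of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files lying next to the class folders
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


if __name__ == "__main__":
    root = "dataset"  # hypothetical folder name
    if os.path.isdir(root):
        for label, n in count_images_per_class(root).items():
            print("{:<15} {} images".format(label, n))
```

A class with far fewer images than the others is a hint to go back and download a few more for that team.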
With the images in place, let’s create a virtual environment and install all the necessary dependencies.
Create Virtualenv and install necessary dependencies
Why a Python virtual environment is needed has already been discussed in another post here, so I’m not going to repeat it. Let’s look at how to set up virtualenv and install the necessary dependencies for Python 3.6. The easiest way is through the pip package manager. To install virtualenv through pip, simply type:
pip3 install --upgrade virtualenv
Once virtualenv is installed, you can create a separate virtual environment for each of your projects. Simply go to the project directory and run the virtualenv command, passing it the name of the environment to create.
You will see a message in your terminal like:
Installing setuptools, pip, wheel…done.
A newly created virtualenv contains an activate shell script. It resides in /bin/, so you can activate the environment by sourcing that script.
Now, we are ready to install the necessary dependencies. The dependencies we need for our project are as follows:
- tensorflow (1.5.0)
- Keras (2.1.4)
- OpenCV (3.4.1)
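- scikit-learn (0.19.1), imported as sklearn
You can install all of these at once using the command below; add version pins such as tensorflow==1.5.0 and keras==2.1.4 if you want to match the versions listed above:
pip3 install tensorflow keras opencv-python scikit-learn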
Computation is much faster if you have a GPU, but you’ll need to use the GPU version of TensorFlow. If you plan on using tensorflow-gpu instead, you can follow our other article here to learn how to install it.
Other required dependencies such as SciPy and NumPy should be installed automatically along with these packages.
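To confirm that the environment is active and that the packages can be imported, a small check like the one below may be useful. It is only a sketch: the helper names `in_virtualenv` and `installed_versions` are made up for this example, and the import names it tries (tensorflow, keras, cv2, sklearn) are the usual import names for the packages listed above.

```python
import importlib
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and the
    # stdlib venv module) make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def installed_versions(module_names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Usually ImportError; a broken install can fail in other ways too.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    # pip names differ from import names: opencv-python -> cv2, scikit-learn -> sklearn
    report = installed_versions(["tensorflow", "keras", "cv2", "sklearn"])
    for name, version in report.items():
        print(name, version or "NOT INSTALLED")
```

If any package is reported as not installed, check that the virtual environment was activated before running pip3.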
Introduction to Convolutional Neural Network (CNN)
Now, we are ready to build a Convolutional Neural Network (CNN) to classify the flags. But first, we must understand what a CNN is. We will only cover the basic theory of CNNs in this article. I highly recommend the materials of the course CS231n if you want a deeper understanding of how CNNs work.
In machine learning, a Convolutional Neural Network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that have successfully been applied to analyzing visual imagery. Convolutional Neural Networks are a type of neural network that makes the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. There are three main types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer. We will stack these layers to form a full ConvNet architecture.
Image source: Wikipedia
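To get a feel for how stacking these layers transforms the input volume, the sketch below traces the shape of the data through a small stack using the standard output-size formula (W - F + 2P) / S + 1. The 64x64x3 input size, the filter counts, and the layer order are assumptions chosen for illustration; they are not the architecture of the model built in this tutorial.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling window: (W - F + 2P) / S + 1."""
    span = size - kernel + 2 * padding
    if span < 0 or span % stride != 0:
        # Strict check for clarity; libraries typically round down instead.
        raise ValueError("window does not tile the input evenly")
    return span // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (height, width, depth) volume after each layer, input first."""
    h, w, d = input_shape
    shapes = [(h, w, d)]
    for layer in layers:
        kind = layer[0]
        if kind == "conv":    # ("conv", kernel, n_filters, padding)
            _, k, n_filters, pad = layer
            h = conv_output_size(h, k, padding=pad)
            w = conv_output_size(w, k, padding=pad)
            d = n_filters     # one output channel per filter
        elif kind == "pool":  # ("pool", window), stride equal to the window
            _, k = layer
            h = conv_output_size(h, k, stride=k)
            w = conv_output_size(w, k, stride=k)
        elif kind == "fc":    # ("fc", n_units): flattens to 1 x 1 x n_units
            h, w, d = 1, 1, layer[1]
        else:
            raise ValueError("unknown layer type: {}".format(kind))
        shapes.append((h, w, d))
    return shapes


# Hypothetical stack: two conv/pool stages followed by a 16-way FC layer.
example_layers = [
    ("conv", 3, 32, 1),
    ("pool", 2),
    ("conv", 3, 64, 1),
    ("pool", 2),
    ("fc", 16),
]
for shape in trace_shapes((64, 64, 3), example_layers):
    print(shape)
```

Under these assumptions, each pooling stage halves the width and height while the depth follows the number of filters, and the final fully-connected layer leaves one number per flag class.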
The CONV layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region of the input volume it is connected to. The POOL layer performs a downsampling operation along the spatial dimensions (width, height). The FC (i.e. fully-connected) layer computes the class scores, resulting in a volume of size [1x1x16] in our case, where each of the 16 numbers corresponds to the score for one of the 16 flag labels. All this may seem confusing right now, and the CS231n course materials are the place to go for a deeper understanding. For now, all we need to understand is that CNNs are one of the best available tools for machine vision, and we will be using one to classify the flags of the Fifa Worldcup 2018 Round of 16.
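The three operations can be written out by hand in a few lines of plain Python. This is a deliberately simplified sketch: it works on a single-channel image stored as a list of lists, uses one filter, and leaves out the activation function, whereas a real CNN library handles many channels and filters at once. The function names are made up for this example.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding, stride 1).

    Each output value is the dot product between the kernel weights and
    the small image region currently under the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


def fully_connected(inputs, weights, biases):
    """One score per class: dot product of the inputs with each row of weights."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


# Toy 5x5 "flag" with a bright band on the left, and a vertical-edge filter.
toy_image = [[1, 1, 1, 0, 0] for _ in range(5)]
edge_filter = [[1, -1], [1, -1]]

feature_map = conv2d_valid(toy_image, edge_filter)  # 4x4 map, responds at the edge
pooled = max_pool2d(feature_map)                    # 2x2 map after downsampling
flat = [value for row in pooled for value in row]   # flatten before the FC layer
scores = fully_connected(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
print(feature_map, pooled, scores)
```

In the toy example the filter responds only where the bright band ends, pooling keeps that response while shrinking the map, and the fully-connected step turns the remaining values into one score per (here, two hypothetical) class.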