
Deep Learning

Deep Learning Notes

Asmaa Mirkhan's notes (and codes) on deep learning

🎤 About

🕸 My notes about Artificial Neural Networks, Convolutional Neural Networks and Recurrent Neural Networks with theoretical details
🦋 I will share new details as I learn new concepts in this context

📑 Table of Contents

💉 Extensions

🚀 Other Version

Turkish version of this project is

🙌 Quote

"Your learning algorithm has two main sources of knowledge; one is the data and other is whatever you hand design" 🤔🚀

⭐ Please..

✨ Help me to improve and to increase the content by opening a pull request
👓 Tell me your suggestions by sending me an email or opening an issue

👜 Contact & Support

Find me on and feel free to mail me,

Practical Tools

💼 Useful tools in the context of Deep Learning

👷‍♀️ Network Visualization Tool

Visualize the graph of the network
Netron ✨✨

💫 CNN Input / Output Visualization Tool

Watch the inputs and outputs of each layer in your CNN
Tensorspace 🎉

🖼️ OpenImages Downloading Tool

🚀 Download images by class
🔗 OID

🔗 Bulk Link Downloading Tool

💁‍♀️ Download bulk links by one click
👩‍💻 Google Chrome extension
⚓ Tab Save

Concepts of Neural Networks

👩‍🏫 Concepts of neural networks with theoretical details

Introduction


🔎 Definition

A neural network is a type of machine learning model that is modeled after the human brain. Such an artificial neural network uses an algorithm that allows the computer to learn by incorporating new data.

Neural networks are able to perform what has been termed deep learning. While the basic unit of the brain is the neuron, the essential building block of an artificial neural network is a perceptron which accomplishes simple signal processing, and these are then connected into a large mesh network.

📑 Types of NNs

There are many types of neural networks, and the choice of type depends on the problem we are trying to solve, for example:

| Type | Description | Application |
|---|---|---|
| 👼 Standard NN | We input some features and estimate the output | Online Advertising, Real Estate |
| 🎨 CNN | We add convolutions for feature extraction | Photo Tagging |
| 🔃 RNN | Suitable for sequence data | Machine Translation, Speech Recognition |
| 🤨 Custom NN / Hybrid | For complex problems | Autonomous Driving |

🎨 Types of Data in Supervised Learning

🚧 Structured Data
- Such as tables
- We have input fields and an output field
🤹‍♂️ Unstructured Data
- Such as images, audio and text
- We need to use feature extraction algorithms to build our model

🧐 References

Introduction to Artificial Neural Networks (ANN)

The Problem in General

Given a dataset like:

$[(x^{(1)},y^{(1)}), (x^{(2)},y^{(2)}), \ldots, (x^{(m)},y^{(m)})]$

We want:

$\hat{y}^{(i)} \approx y^{(i)}$

📚 Basic Concepts and Notations

| Concept | Description |
|---|---|
| m | Number of examples in the dataset |
| $(x^{(i)},y^{(i)})$ | The ith example in the dataset |
| ŷ | Predicted output |
| Loss Function 𝓛(ŷ, y) | A function that computes the error for a single training example |
| Cost Function 𝙹(w, b) | The average of the loss function over the entire training set |
| Convex Function | A function with a single local minimum, which is therefore also the global minimum |
| Non-Convex Function | A function with many different local minima |
| Gradient Descent | An iterative optimization method that we use to converge to the global optimum of the cost function |

In other words: the cost function measures how well our parameters w and b are doing on the training set, so the best w and b are the values that make 𝙹(w, b) as small as possible

📉 Gradient Descent

General Formula:

$w:=w-\alpha\frac{\partial J(w,b)}{\partial w}$

$b:=b-\alpha\frac{\partial J(w,b)}{\partial b}$

α (alpha) is the Learning Rate
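A minimal NumPy sketch of the two update rules, assuming a one-feature linear model ŷ = w·x + b and a mean squared error cost J(w, b) = (1/2m)·Σ(ŷ - y)². The data is the same y = 2x - 1 sample used in the Keras "Hello World" example; the learning rate and number of iterations are arbitrary choices for illustration.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Fit y ≈ w*x + b by batch gradient descent on J = (1/2m) * sum((w*x + b - y)**2)."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iterations):
        y_hat = w * x + b
        error = y_hat - y
        dJ_dw = (error * x).sum() / m  # partial derivative of J with respect to w
        dJ_db = error.sum() / m        # partial derivative of J with respect to b
        # update both parameters using gradients computed from the same (old) w and b
        w = w - alpha * dJ_dw
        b = b - alpha * dJ_db
    return w, b

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # should approach 2.0 and -1.0
```

Both gradients are computed before either parameter changes, which is what the simultaneous `:=` updates above mean.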

🥽 Learning Rate

It is a positive scalar that determines the size of the step taken at each iteration of gradient descent, i.e. how much the model weights change in response to the estimated error each time they are updated. So, it controls how quickly or slowly a neural network model learns a problem.
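To see the effect of α, here is a toy sketch (not from the original notes) that runs gradient descent on the one-parameter cost J(w) = w², whose derivative is 2w. A small α shrinks w steadily toward the minimum at 0, while a too-large α overshoots further on every step and diverges.

```python
def run_gd(alpha, w0=1.0, steps=20):
    """Gradient descent on J(w) = w**2 (dJ/dw = 2w); returns the trajectory of w."""
    ws = [w0]
    for _ in range(steps):
        w = ws[-1]
        ws.append(w - alpha * 2 * w)  # w := w - alpha * dJ/dw
    return ws

good = run_gd(alpha=0.1)  # each step multiplies w by (1 - 0.2) = 0.8, so w shrinks
bad = run_gd(alpha=1.1)   # each step multiplies w by (1 - 2.2) = -1.2, so |w| grows
print(abs(good[-1]), abs(bad[-1]))
```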

🎀 Good Learning Rate

💢 Bad Learning Rate

🧐 References

Activation Functions

The main purpose of activation functions is to convert the input signal of a node in an ANN to an output signal by applying a transformation. That output signal is then used as an input to the next layer in the stack.

📑 Types of Activation Functions

📈 Linear Activation Function (Identity Function)

Formula: $g(z)=z$

Graph:

It can be used in the output layer for regression problems

🎩 Sigmoid Function

Formula: $\sigma(z)=\frac{1}{1+e^{-z}}$

Graph:

🎩 Hyperbolic Tangent (tanh) Function

Almost always strictly superior to the sigmoid function (for hidden layers)

Formula: $\tanh(z)=\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$

A shifted and rescaled version of the sigmoid function: $\tanh(z)=2\sigma(2z)-1$ 🤔

Graph:

Activation functions can be different for different layers, for example, we may use tanh for a hidden layer and sigmoid for the output layer

🙄 Downsides on Tanh and Sigmoid

If z is very large or very small, then the derivative (or the slope) of these functions becomes very small (close to 0), and this can slow down gradient descent 🐢

🎩 Rectified Linear Unit (ReLU ✨)

Another, very popular choice

Formula: $g(z)=\max(0,z)$

Graph:

So the derivative is 1 when z is positive and 0 when z is negative

Disadvantage: derivative=0 when z is negative 😐

🎩 Leaky Relu

Formula: $g(z)=\max(0.01z,z)$

Graph:

Or: 😛

🎀 Advantages of Relu's

For much of the range of z, the derivative of the activation function is very different from 0
So the NN will learn much faster than when using tanh or sigmoid
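The activation functions of this section can be written in a few lines of NumPy: sigmoid 1/(1+e^(-z)), tanh, ReLU max(0, z) and leaky ReLU max(0.01z, z). This is a sketch assuming the common leaky slope of 0.01. It also checks numerically that tanh(z) = 2σ(2z) - 1, and that the sigmoid's slope nearly vanishes for large |z| while ReLU's slope stays at 1 for positive z.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    # slope=0.01 is a common default, assumed here
    return np.where(z > 0, z, slope * z)

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

def relu_derivative(z):
    # 1 for z > 0 and 0 for z < 0 (we use 0 at z == 0 by convention)
    return (z > 0).astype(float)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.allclose(tanh(z), 2 * sigmoid(2 * z) - 1))  # tanh is a shifted, rescaled sigmoid
print(sigmoid_derivative(z))  # tiny at z = -10 and z = 10: the gradient "vanishes"
print(relu_derivative(z))     # stays 1 for every positive z
```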

🤔 Why Do NNs Need Non-Linear Activation Functions?

Well, if we use a linear activation function, then the NN is just outputting a linear function of the input, so no matter how many layers our NN has 🙄, all it is doing is computing a linear function 😕

❗ Remember that the composition of two linear functions is itself a linear function
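A quick NumPy check of that remark: two stacked layers with identity activations, W2(W1·x + b1) + b2, are exactly one linear layer with W = W2·W1 and b = W2·b1 + b2. The layer sizes and random weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))  # layer 1: 3 -> 4 units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))  # layer 2: 4 -> 2 units
x = rng.normal(size=(3, 5))  # 5 examples with 3 features each (one per column)

# Two "layers" with a linear (identity) activation
two_layers = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # the extra layer added no expressive power
```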

👩‍🏫 Rules For Choosing Activation Function

If the output is 0 or 1 (binary classification) ➡ sigmoid is good for output layer
For all other units ➡ Relu ✨

We can say that ReLU is the default choice for the activation function

Note:

If you are not sure which of these functions works best 😵, try them all 🤕, evaluate them on a validation set, and go with the one that works better 🤓😇

🧐 Read More

👩‍🔧 NN Regularization

Preventing overfitting

Briefly: a technique to prevent overfitting (and reduce variance)

🙄 Problem

In an over-fitting situation, our model learns the details and the noise in the training data too well, which ultimately results in poor performance on unseen data (the test set).

The following graph illustrates this better:

👩‍🏫 Better Definition for Regularization

It is a technique which makes slight modifications to the learning algorithm such that the model generalizes better. This in turn improves the model’s performance on the unseen data as well.

🔨 Regularization Techniques

🔩 L2 Regularization (Weight decay)

The most common type of regularization, given by the following formula:

$J=Loss+\frac{\lambda}{2m}\sum ||w||^{2}$

Here, lambda is the regularization parameter. It is the hyperparameter whose value is optimized for better results. L2 regularization is also known as weight decay as it forces the weights to decay towards zero (but not exactly zero)
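A minimal sketch of the L2-regularized cost J = Loss + (λ/2m)·Σ||w||², assuming the unregularized loss has already been computed and the weights are given as a list of per-layer NumPy matrices (the values here are made up). ||w||² is the squared Frobenius norm, i.e. the sum of every squared weight; biases are conventionally left out.

```python
import numpy as np

def l2_regularized_cost(loss, weights, lambd, m):
    """J = loss + (lambda / (2m)) * sum over layers of ||W||^2.
    'lambd' is used because 'lambda' is a reserved word in Python."""
    penalty = sum(np.sum(np.square(W)) for W in weights)
    return loss + lambd / (2 * m) * penalty

weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([[3.0, 1.0]])]
# sum of squared weights = (1 + 4 + 0.25 + 0) + (9 + 1) = 15.25
cost = l2_regularized_cost(loss=0.4, weights=weights, lambd=0.1, m=10)
print(cost)  # about 0.4 + (0.1 / 20) * 15.25 = 0.47625
```

A larger λ makes the penalty term dominate, which pushes the optimizer toward smaller weights.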

🔩 Dropout

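Another regularization method, which randomly eliminates a certain ratio of the neurons

Simply: at each training iteration, each node is dropped with probability p, i.e. it is temporarily removed together with its input and output weights, so those weights are not used in that forward pass and are not updated during backpropagation (just drop it 😅)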

Better visualization:

An NN before and after dropout

It is commonly used in computer vision, but its downside is that the cost function J is no longer well defined (a different set of nodes is dropped at each iteration)
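A sketch of dropout on one layer's activations during training. It uses the common "inverted dropout" variant (an assumption, the notes above do not name a variant): surviving activations are divided by keep_prob so their expected value is unchanged and nothing needs rescaling at test time. Here keep_prob = 1 - p, where p is the drop probability.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, then rescale."""
    mask = rng.random(a.shape) < keep_prob  # True = keep this unit for this iteration
    return a * mask / keep_prob, mask        # the same mask must be reused in backprop

rng = np.random.default_rng(42)
a = np.ones((1000, 50))  # hypothetical activations: 1000 units x 50 examples
a_dropped, mask = dropout_forward(a, keep_prob=0.8, rng=rng)

print(mask.mean())       # fraction of kept units, close to 0.8
print(a_dropped.mean())  # close to 1.0 thanks to the 1 / keep_prob rescaling
```

Because a new mask is drawn at every iteration, the network effectively trains a different thinned sub-network each time, which is also why J is not well defined.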

🤡 Data Augmentation

The simplest way to reduce overfitting is to increase the size of the training data. This is not always possible, since getting more data can be too costly, but sometimes we can generate more data from the data we already have, for example:

Applying transformations to images (such as flipping, rotating or cropping) can enlarge our data set
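A minimal NumPy sketch of the idea, assuming images are stored as (height, width, channels) arrays: a horizontal flip and a small shift each produce a new training example that keeps the original label. Real pipelines usually rely on a library for this; these two transformations are just illustrative choices.

```python
import numpy as np

def augment(image):
    """Return simple variants of one (H, W, C) image; all keep the original label."""
    flipped = image[:, ::-1, :]                # mirror left <-> right
    shifted = np.roll(image, shift=2, axis=1)  # shift 2 px to the right (wrapping around)
    return [flipped, shifted]

image = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # a tiny fake RGB image
augmented = augment(image)
print(len(augmented), augmented[0].shape)  # 2 new examples, same shape as the original
```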

🛑 Early Stopping

It is a kind of cross-validation strategy where we keep one part of the training set as the validation set. When we see that the performance on the validation set is getting worse, we immediately stop the training on the model. This is known as early stopping.
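The stopping rule can be sketched in plain Python. The `patience` parameter (how many epochs without improvement we tolerate before stopping) is an assumption that is commonly added in practice because validation loss is noisy; the loss values below are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch): training stops once the validation loss
    has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving, then getting worse (overfitting)
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.65, 0.70]
print(early_stopping_epoch(losses))  # (5, 3): stop at epoch 5, best weights were at epoch 3
```

In practice we would also keep (or restore) the weights from the best epoch rather than the last one.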

🧐 Read More

Long Story Short 😅: Overfitting and Regularization in Neural Networks

Softmax Regression

Multi class problems

We can learn it by likening it to logistic regression: 😋

Recall that logistic regression produces a decimal between 0 and 1.0. For example, a logistic regression output of 0.8 from an email classifier suggests an 80% chance of an email being spam and a 20% chance of it being not spam. Clearly, the sum of the probabilities of an email being either spam or not spam is 1.0.

Softmax extends this idea into the MULTI-CLASS world. That is, Softmax assigns decimal probabilities to each class in a multi-class problem. Those decimal probabilities must add up to 1.0.

Its other name is Maximum Entropy (MaxEnt) Classifier

We can say that softmax regression generalizes logistic regression

Logistic regression is a special case of softmax where C = 2 🤔

📚 Notation

C = number of classes = number of units in the output layer. So, $\hat{y}$ is a (C, 1) dimensional vector.

🎨 Softmax Layer

Softmax is implemented through a neural network layer just before the output layer. The Softmax layer must have the same number of nodes as the output layer.

💥 Softmax Activation Function

$\text{Softmax}(x_i)=\frac{\exp(x_i)}{\sum_{j}\exp(x_j)}$
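A NumPy sketch of the softmax formula. Subtracting the maximum score before exponentiating is a standard numerical-stability trick (not part of the formula itself): it leaves the result unchanged because the common factor exp(-max) cancels between numerator and denominator.

```python
import numpy as np

def softmax(x):
    """Softmax over a 1-D vector of scores (logits)."""
    shifted = x - np.max(x)  # avoid overflow in exp for large scores
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())  # probabilities that add up to 1.0
```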

🔨 Hard Max function

Takes the output of the softmax layer and converts it into a 1 vs 0 vector (as I call it 🤭), which will be our ŷ

For example:

t = 0.13  ==>  ŷ = 0
    0.75           1
    0.01           0
    0.11           0

And so on 🐾
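Hard max just puts a 1 at the position of the largest value and 0 everywhere else. A NumPy sketch using the same numbers as the example above:

```python
import numpy as np

def hardmax(t):
    """Convert a vector of softmax outputs into a one-hot (1 vs 0) vector."""
    y_hat = np.zeros_like(t)
    y_hat[np.argmax(t)] = 1
    return y_hat

t = np.array([0.13, 0.75, 0.01, 0.11])
print(hardmax(t))  # [0. 1. 0. 0.]
```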

🔎 Loss Function

$L(\hat{y},y)=-\sum_{j=1}^{C}y_j\log(\hat{y}_j)$

When vectorized over all m examples, Y and Ŷ are (C, m) dimensional matrices 👩‍🔧
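A vectorized NumPy sketch of this loss with the (C, m) layout: each column is one example, y is one-hot, and the cost averages the per-example losses. Since y is one-hot, only the log-probability of the true class survives in each column. The small epsilon guarding against log(0) is an added assumption.

```python
import numpy as np

def cross_entropy_cost(Y_hat, Y, eps=1e-12):
    """Y_hat, Y: (C, m) arrays. Loss per example = -sum_j y_j * log(y_hat_j); cost = mean over m."""
    losses = -np.sum(Y * np.log(Y_hat + eps), axis=0)  # shape (m,)
    return losses.mean()

# C = 3 classes, m = 2 examples (one column each)
Y = np.array([[1, 0],
              [0, 1],
              [0, 0]])
Y_hat = np.array([[0.7, 0.2],
                  [0.2, 0.5],
                  [0.1, 0.3]])
print(cross_entropy_cost(Y_hat, Y))  # (-log 0.7 - log 0.5) / 2, about 0.525
```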

🧐 Read More

Long story short from Google documentation

🏃‍♀️ Introduction to Tensorflow

Brief Introduction to Tensorflow

🚩 Main flow of programs in Tensorflow

Create Tensors (variables) that are not yet executed/evaluated.
Write operations between those Tensors.
Initialize your Tensors.
Create a Session.
Run the Session. This will run the operations you wrote above.

To summarize, remember to initialize your variables, create a session and run the operations inside the session. 👩‍🏫

👩‍💻 Code Example

To calculate the following formula:

$loss=L(\hat{y},y)=(\hat{y}^{(i)}-y^{(i)})^2$

import tensorflow.compat.v1 as tf  # TF1-style graph API (also available in TensorFlow 2)
tf.disable_v2_behavior()

# Creating tensors and writing operations between them
y_hat = tf.constant(36, name='y_hat')
y = tf.constant(39, name='y')
loss = tf.Variable((y - y_hat)**2, name='loss')

# Initializing tensors
init = tf.global_variables_initializer()

# Creating session
with tf.Session() as session:
    # Running the operations
    session.run(init)

    # printing results
    print(session.run(loss))

When we created a variable for the loss, we simply defined the loss as a function of other quantities, but did not evaluate its value. To evaluate it, we had to run the initializer (tf.global_variables_initializer()) and then run the loss inside a session.

❗ Note About Variable Initialization

For the following code:

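import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # graph mode: operations are not evaluated immediately

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)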

🤸‍♀️ The output is

Tensor("Mul:0", shape=(), dtype=int32)

As expected, we will not see 20 🤓! We got a tensor saying that the result is a tensor with an empty shape, shape=(), meaning it is a scalar, and of type "int32". All we did was add the operation to the 'computation graph'; we have not run this computation yet.

📦 Placeholders in TF

A placeholder is an object whose value you can specify only later. To specify values for a placeholder, we can pass in values by using a feed dictionary.
Below, a placeholder has been created for x. This allows us to pass in a number later when we run the session.

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

x = tf.placeholder(tf.int64, name = 'x')
sess = tf.Session()
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

🎀 More examples

Computing sigmoid function with TF

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    result -- the sigmoid of z
    """

    # Creating a placeholder for x. Naming it 'x'.
    x = tf.placeholder(tf.float32, name = 'x')

    # computing sigmoid(x)
    sigmoid_op = tf.sigmoid(x)

    # Creating a session, and running it.
    with tf.Session() as sess:
        # Running session and call the output "result"
        result = sess.run(sigmoid_op, feed_dict = {x: z})

    return result

Computing cost function with TF

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Returns:
    cost -- the sigmoid cross-entropy loss for each element, computed by running the session
    """

    # Creating the placeholders for "logits" (z) and "labels" (y)
    z = tf.placeholder(tf.float32, name = 'z')
    y = tf.placeholder(tf.float32, name = 'y')

    # Using the loss function
    cost_op = tf.nn.sigmoid_cross_entropy_with_logits(logits = z,  labels = y)

    # Creating a session
    sess = tf.Session()

    # Running the session
    cost = sess.run(cost_op, feed_dict = {z: logits, y: labels})

    # Closing the session
    sess.close()

    return cost

🙋‍♀️ Hello World of Deep Learning with Neural Networks

Introduction

👩‍💻 Intro to Neural Networks Coding

Like every first app we should start with something super simple that gives us an idea about the whole methodology.

✨ What is Keras?

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.

📚 Important Terms

| Term | Description |
|---|---|
| Dense | A layer of neurons in a neural network |
| Loss Function | A mathematical way of measuring how wrong your predictions are |
| Optimizer | An algorithm that finds the parameter values corresponding to the minimum value of the loss function |

👩‍🔬 The Simplest Neural Network

It contains one layer with one neuron.

👩‍💻 Code Example

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# initialize the model
model = Sequential()

# add a layer with one unit and set the dimension of input
model.add(Dense(units=1, input_shape=[1]))

# set functional properties and compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

After building our neural network, we can feed it our sample data 😋

👩‍💻 Code Example

xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
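These six points lie exactly on the line y = 2x - 1, which is the relationship the one-neuron network (ŷ = w·x + b) is expected to learn. A NumPy least-squares fit, used here only as a sanity check and not part of the original code, recovers it directly:

```python
import numpy as np

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# Degree-1 least-squares fit returns [slope, intercept]
w, b = np.polyfit(xs, ys, 1)
print(round(w, 6), round(b, 6))  # close to 2.0 and -1.0
```

So after training, the network's prediction for x = 10 should be close to 2·10 - 1 = 19 (but not exactly 19, since gradient descent only approximates the optimum).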

Then we start the training process 🚀

👩‍💻 Code Example

model.fit(xs, ys, epochs=500)

Everything is done 😎! Now we can test our neural network with new data 🎉

👩‍💻 Code Example

print(model.predict(np.array([[10.0]])))

👩‍💻 My Code

Full source code is here 🐾
Tensorflow.js in browser here 🐾

🔃 Traditional Programming vs Machine Learning

🧐 References

CNNs In Browser

Notes on Implementing CNNs In The Browser

To implement our CNN-based work in the browser, we need to use Tensorflow.js 🚀

👷‍♀️ Workflow

🚙 Import Tensorflow.js
👷‍♀️ Create models
👩‍🏫 Train
👩‍⚖️ Do inference

🚙 Importing Tensorflow.js

We can import Tensorflow.js as shown below

    <script 
        src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest">
    </script>

👷‍♀️ Creating The Model

😎 Same as we did in Python:

🐣 Declare a Sequential object
👩‍🔧 Add layers
🚀 Compile the model
👩‍🎓 Train (fit)
🐥 Use the model to predict

// create sequential 
const model = tf.sequential();

// add layer(s)
model.add(tf.layers.dense({units: 1, inputShape: [1]}));

// set compiling parameters and compile the model
model.compile({loss:'meanSquaredError', 
                optimizer:'sgd'});

// get summary of the model
model.summary();

// create sample data set (y = 2x - 1, same as the Python example)
const xs = tf.tensor2d([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], [6, 1]);
const ys = tf.tensor2d([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], [6, 1]);

// train
doTraining(model).then(() => {
    // after training
    const prediction = model.predict(tf.tensor2d([10], [1,1]));
    prediction.print();
});

tf.tensor2d([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], [6, 1])
- [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]: data set values
- [6, 1]: shape of the tensor (6 examples, 1 value each)

👁‍🗨 Attention

🐢 Training is a long process, so we do it in an asynchronous function (model.fit returns a Promise that we await)

async function doTraining(model){
  const history = 
  await model.fit(xs, ys, 
      { epochs: 500,
          callbacks:{
              onEpochEnd: async(epoch, logs) =>{
                  console.log("Epoch:" 
                      + epoch 
                      + " Loss:" 
                      + logs.loss);

              }
          }
      });
}

👩‍💻 Full Code

🐾 Here

Introduction to Computer Vision

🚪 Beginning to solve problems of computer vision with Tensorflow and Keras

Introduction


👗 What is MNIST?

The MNIST database: (Modified National Institute of Standards and Technology database)

🔎 Fashion-MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples
🎨 Types:
- 🔢 MNIST: for handwritten digits
- 👗 Fashion-MNIST: for fashion
📃 Properties:
- 🌚 Grayscale
- 28x28 px
- 10 different categories
- Repo

📚 Important Terms

| Term | Description |
|---|---|
| ➰ Sequential | Defines a SEQUENCE of layers in the neural network |
| ⛓ Flatten | Takes the square (2-D) image and turns it into a 1-dimensional array (used for the input layer) |
| 🔷 Dense | Adds a layer of neurons |
| 💥 Activation Function | A formula that introduces non-linear properties to our network |
| ✨ Relu | An activation function with the rule: if x > 0 return x, else return 0 |
| 🎨 Softmax | An activation function that takes a set of values and effectively picks the biggest one |

The main purpose of an activation function is to convert the input signal of a node in an NN to an output signal. That output signal is then used as an input to the next layer in the stack 💥

💫 Notes on performance

Values in MNIST are between 0 and 255, but neural networks work better with normalized data, so we can divide every value by 255 to get values between 0 and 1.
There are multiple criteria for stopping the training process: we can specify a number of epochs, a threshold, or both
- Epochs: number of iterations
- Threshold: a threshold for accuracy or loss after each iteration
- Threshold with maximum number of epochs

We can check the accuracy at the end of each epoch by Callbacks 💥
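The normalization note above as a NumPy sketch. A random uint8 array stands in for the MNIST pixels here; with the real data, the same division would be applied to both the training and the test images.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)  # fake 28x28 grayscale images

normalized = images / 255.0  # float values in [0, 1]

print(images.dtype, images.min(), images.max())
print(normalized.dtype, normalized.min(), normalized.max())
```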

👩‍💻 My Codes

🧐 References

Concepts of Convolutional Neural Networks

Introduction

✨ Improving Neural Networks used in Computer Vision problems

This folder contains theoretical details about CNNs

📚 Important Terms

💫 Notes on performance

Training a CNN can be much slower than training a plain NN because of its computational complexity 🐢

🧐 References

Common Concepts

📚 Important Terms

🎀 Convolution Example

🤔 How did we find -7?

We take the element-wise product of the filter and the image region under it, then sum all the entries of the resulting matrix; so:

And so on for other elements 🙃
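The same "multiply element-wise, then sum" rule, slid over every position, can be sketched in NumPy. Like deep learning libraries, this computes what is technically cross-correlation (the filter is not flipped). The example is the classic vertical edge case: a 6x6 image that is bright (10) on the left and dark (0) on the right, convolved with a 3x3 vertical edge filter. These values are chosen for illustration and are not the ones from the example above.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (no padding, stride 1), as used in CNNs."""
    n_h, n_w = image.shape
    f = kernel.shape[0]
    out = np.zeros((n_h - f + 1, n_w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + f, j:j + f]
            out[i, j] = np.sum(region * kernel)  # element-wise product, then sum
    return out

image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

print(conv2d(image, vertical_edge))
# Each row of the 4x4 output is [0, 30, 30, 0]: the bright band marks the vertical edge
```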

👼 Visualization of Calculation

🔎 Edge Detection

An application of convolution operation

🔎 Edge Detection Examples

Result: horizontal lines pop out

Result: vertical lines pop out

🙄 What About The Other Numbers

There are many ways to choose the numbers inside the filter.

For example Sobel filter is like:

Scharr filter is like:

Prewitt filter is like:

So the point here is to pay attention to the middle row

And Roberts filter is like:

✨ Another Approach

We can tune these numbers with an ML approach: we can treat the numbers in the filter as weights that the network learns from the data.

This way we can get -learned- horizontal, vertical, angled, or any other edge detectors automatically rather than designing them by hand.

🤸‍♀️ Computational Details

If we have an n x n image and we convolve it with an f x f filter, the output image will be (n-f+1) x (n-f+1)

😐 Downsides

🌀 If we apply many filters then our image shrinks.
🤨 Pixels at the corners and edges are covered by fewer filter positions than pixels in the middle, so we throw away a lot of information from the edges of the image.

💡 Solution

We can pad the image 💪

🧐 References

Advanced Concepts

Important Terms

🙌 Padding

Adding one or more extra borders of pixels around the image. For example, with one border (p = 1) an n x n image becomes (n+2) x (n+2), so after convolving it with a 3x3 filter we end up with an n x n image, which is the original size of the image

p = number of added borders

By convention, the border is filled with 0s

🤔 How much to pad?

To better understand this, let's look at two common choices:

🕵️‍♀️ Valid Convolutions

It means no padding so:

n x n * f x f ➡ (n-f+1) x (n-f+1)

🥽 Same Convolutions

Pad so that output size is the same as the input size.

So we want that 🧐:

n+2p-f+1 = n

Hence:

p = (f-1)/2

By convention, f is usually chosen to be odd, so that p is a whole number and the filter has a central pixel 👩‍🚀

👀 Visualization

🔢 Strided Convolution

Another variant of convolution: instead of moving the filter one pixel at a time, we move it by s pixels (the stride) each time we compute an output value.

👀 Visualization

🤗 To Generalize

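For an n x n image and f x f filter, with padding p and stride s, the output image size (for both height and width) can be calculated by the following formula, rounding down when the division is not exact:

$\left\lfloor \frac{n+2p-f}{s} \right\rfloor + 1$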
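The output size for padding p and stride s is floor((n + 2p - f) / s) + 1 in each dimension, which is easy to turn into a helper. A Python sketch (the function name is illustrative): with p = 0 and s = 1 it reduces to the valid-convolution size n - f + 1, and with p = (f - 1)/2 and s = 1 it gives a same convolution.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output height/width for an n x n input, f x f filter, padding p and stride s."""
    return (n + 2 * p - f) // s + 1  # // takes the floor for non-negative values

print(conv_output_size(6, 3))                   # valid: 6 - 3 + 1 = 4
print(conv_output_size(6, 3, p=(3 - 1) // 2))   # same: 6
print(conv_output_size(7, 3, p=0, s=2))         # strided: floor((7 - 3) / 2) + 1 = 3
```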

🚀 Convolutions Over Volume

To apply the convolution operation on an RGB image, for example a 10x10 px RGB image, note that technically the image's dimension is 10x10x3, so we apply a filter with a depth of 3, for example 3x3x3 or, in general, fxfx3 🤳

Filters can be applied to a specific color channel 🎨
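A NumPy sketch of convolution over volume, assuming the channels-last layout (height, width, channels): the filter has the same depth as the input and the products are summed over all three dimensions, so each filter yields one 2-D map; stacking the maps of n filters gives an output of depth n. The sizes and random values are arbitrary.

```python
import numpy as np

def conv_volume(image, filters):
    """image: (H, W, C); filters: (n_filters, f, f, C). Valid convolution, stride 1.
    Returns an (H - f + 1, W - f + 1, n_filters) volume."""
    n_filters, f, _, _ = filters.shape
    H, W, _ = image.shape
    out = np.zeros((H - f + 1, W - f + 1, n_filters))
    for k in range(n_filters):
        for i in range(H - f + 1):
            for j in range(W - f + 1):
                region = image[i:i + f, j:j + f, :]         # f x f x C block
                out[i, j, k] = np.sum(region * filters[k])  # sum over height, width and channels
    return out

rng = np.random.default_rng(0)
rgb_image = rng.random((10, 10, 3))  # a 10x10 RGB image
filters = rng.random((2, 3, 3, 3))   # two 3x3x3 filters
print(conv_volume(rgb_image, filters).shape)  # (8, 8, 2)
```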

👀 Visualization

🤸‍♀️ Multiple Filters

🎨 Types of Layer In A Convolutional Network

👩‍🏫 Usually when people report the number of layers in an NN, they only count the layers that have weights (parameters)
Convention: CONV1 + POOL1 = LAYER1

🤔 Why Convolutions?

Better performance, since they greatly decrease the number of parameters that have to be tuned (the same filter is shared across all positions of the image) 💫
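A back-of-the-envelope comparison makes this concrete. The sizes are an assumed example: mapping a 32x32x3 image to a 28x28x6 volume with six 5x5 filters, versus a fully connected layer between the same two volumes. Each filter has 5·5·3 weights plus one bias, and it is reused at every position.

```python
# Convolutional layer: 6 filters of size 5x5x3, each with one bias
conv_params = (5 * 5 * 3 + 1) * 6

# Fully connected layer between the same input and output volumes
n_in = 32 * 32 * 3   # 3072 input units
n_out = 28 * 28 * 6  # 4704 output units
dense_params = n_in * n_out + n_out  # weights + biases

print(conv_params)   # 456
print(dense_params)  # 14455392
```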

🧐 References

Visualization

Visualization of concepts explained in P1 and P2 to wrap them up 👩‍🎓

💫 Convolution

Applying a filter to extract features 🤗

Problem 😰: Images are shrinking 😱

😏 Take A Look At Padding

Images Are Too Large, Performance is Down 😔

😉 Let's See Pooling
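Pooling is not spelled out elsewhere in these notes, so here is a minimal NumPy sketch of 2x2 max pooling with stride 2 (the setting used in the CNN debugging example later in these notes): each output value is the maximum of a non-overlapping 2x2 block, which halves the height and width and has no parameters to learn.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array; an odd last row/column is dropped."""
    H, W = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:H, :W].reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])
print(max_pool_2x2(x))
# [[6 5]
#  [7 9]]
```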

🙄 Well, I have an RGB image

Filters must have a depth that is equal to the number of color channels

🤡 Ok, now I want to apply `n` filters

Depth of the output will be equal to n

🤗 Check Your Understanding With A Full Example

🧐 References

DeepLearning series: Convolutional Neural Networks (😍✨✨✨)

Classic Networks

🔢 LeNet-5

LeNet-5 is a very simple network by modern standards. It has only 7 layers, among which there are:

3 convolutional layers (C1, C3 and C5)
2 sub-sampling (pooling) layers (S2 and S4)
1 fully connected layer (F6)
Output layer

👀 Visualization of the network

🙌 Summary of the network

🛸 AlexNet

Very similar to LeNet-5, but much bigger
It has more filters per layer
It uses ReLU instead of tanh
It is trained with SGD with momentum
It uses dropout for regularization

👀 Visualization of the network

🔎 More Detailed

🙌 Summary of the network

🌱 VGG-16

👀 Visualization of the network

🙌 Summary of the network

🔎 More Detailed

😐 Drawbacks

It is painfully slow to train (It has 138 million parameters 🙄)

👩‍🔧 Implementation

🧐 Read More

Other Approaches

🔄 Residual Networks

🙄 Problem

During each iteration of training a neural network, all weights receive an update proportional to the partial derivative of the error function with respect to the current weight. If the gradient is very small, the weights will barely change, and this may completely stop the neural network from further training 🙄😪. This phenomenon is called vanishing gradients 🙁

Simply 😅: we can say that the learning signal (the gradient) fades away as it flows back through the layers of a deep neural network, so gradient descent becomes very slow

The core idea of ResNet is introducing a so-called identity shortcut connection that skips one or more layers, like the following

🙌 Plain Nets vs ResNets

👀 Visualization

🤗 Advantages

It is easy for a residual block to learn the identity function
Can go deeper without hurting the performance
- In plain NNs, because of the vanishing and exploding gradient problems, the performance of the network suffers as it goes deeper.
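A NumPy sketch of a two-layer residual block, assuming fully connected layers and ReLU for simplicity: the shortcut adds the block's input a before the last activation, a_out = ReLU(z + a). If the block's weights and biases are all zero (for example, pushed there by weight decay), the output is ReLU(a) = a for the non-negative activations coming from an earlier ReLU, so the block falls back to the identity function.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a, W1, b1, W2, b2):
    """Two layers with an identity shortcut: a_out = relu(W2 @ relu(W1 @ a + b1) + b2 + a)."""
    a1 = relu(W1 @ a + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a)  # shortcut: add the input before the final activation

n = 4
a = relu(np.random.default_rng(0).normal(size=(n, 1)))  # output of an earlier ReLU layer
zeros_W, zeros_b = np.zeros((n, n)), np.zeros((n, 1))

out = residual_block(a, zeros_W, zeros_b, zeros_W, zeros_b)
print(np.array_equal(out, a))  # with zero weights the block is the identity
```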

1️⃣ One By One Convolutions

Problem (Or motivation 🤔)

We can reduce the size of the input by applying pooling and various convolutions; these filters can reduce the height and the width of the input image. But what about the color channels 🌈, in other words, what about the depth?

🤸‍♀️ Solution

We know that the depth of the output of a convolutional layer is equal to the number of filters that we applied on the input;

In the example above, we applied 2 filters, so the output depth is 2

How can we use this info to improve our CNNs? 🙄

Let's say that we have a 28x28x192 dimensional input. If we apply 32 filters of size 1x1x192 (a 1x1 filter keeps the height and width without any padding), our output will become 28x28x32 ✨
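Since a 1x1xC filter only looks at one pixel position at a time, applying 32 of them amounts to multiplying each pixel's 192-long channel vector by a 192x32 weight matrix. A NumPy sketch of that view (random weights; biases and the activation are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((28, 28, 192))    # input volume: height x width x channels
filters = rng.random((192, 32))  # 32 filters of size 1x1x192, one per column

# For every (row, col), combine the 192 channels into 32 outputs
out = np.einsum('hwc,ck->hwk', x, filters)
print(out.shape)  # (28, 28, 32)

# Same result for one pixel, computed as a plain dot product
print(np.allclose(out[0, 0], x[0, 0] @ filters))
```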

🧐 Read More

👩‍💻 Works and Notes on CNNs

🔦 Convolutional Neural Networks Codes

Introduction


This section will be filled with code and notes gradually

👩‍💻 Codes

👶 Basic CNNs
👀 CNN Visualization
👨‍👩‍👧‍👧 Human vs Horse Classifier with CNN
🐱 Dog vs Cat Classifier with CNN
🎨 Multi-Class Classification
🌐 Tensorflow.js based hand written digit recognizer

✋ RPS Dataset

Rock Paper Scissors is an available dataset containing 2,892 images of diverse hands in Rock/Paper/Scissors poses.
Rock Paper Scissors contains images from a variety of different hands, from different races, ages and genders, posed into Rock / Paper or Scissors and labelled as such.

🔎 All of this data is posed against a white background. Each image is 300×300 pixels in 24-bit color

🐛 CNN Debugging

We can get info about our CNN by calling

model.summary()

And the output will be like:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_18 (Conv2D)           (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_9 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_14 (Dense)             (None, 128)               204928    
_________________________________________________________________
dense_15 (Dense)             (None, 10)                1290      
=================================================================

👩‍💻 For code in the notebook:

Here 🐾

🔎 The original dimensions of the images were 28x28 px
1️⃣ 1st layer: The 3x3 filter cannot be centered on the pixels at the edges (no padding)
- The output of the first layer has 26x26 px
2️⃣ 2nd layer: After applying 2x2 max pooling, the dimensions are divided by 2
- The output of this layer has 13x13 px
3️⃣ 3rd layer: The filter cannot be centered on the pixels at the edges
- The output of this layer has 11x11 px
4️⃣ 4th layer: After applying 2x2 max pooling, the dimensions are divided by 2 (rounded down)
- The output of this layer has 5x5 px
5️⃣ 5th layer: The output of the previous layer is flattened
- This layer has 5x5x64=1600 units
6️⃣ 6th layer: We set it to contain 128 units
7️⃣ 7th layer: Since we have 10 categories, it consists of 10 units
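The same arithmetic can be scripted to double-check the summary. This sketch assumes what the numbers imply: 3x3 filters with no padding, 64 filters per convolution, a single (grayscale) input channel and 2x2 pooling. A Conv2D layer has (f·f·channels_in + 1)·filters parameters, and a Dense layer has (inputs + 1)·units.

```python
f, filters, pool = 3, 64, 2

size = 28
size = size - f + 1           # conv2d_18   -> 26
size = size // pool           # max_pooling -> 13
size = size - f + 1           # conv2d_19   -> 11
size = size // pool           # max_pooling -> 5 (11 / 2 rounded down)
flat = size * size * filters  # flatten     -> 1600

params = {
    "conv2d_18": (f * f * 1 + 1) * filters,        # 1 input channel (grayscale)
    "conv2d_19": (f * f * filters + 1) * filters,  # 64 input channels
    "dense_14": (flat + 1) * 128,
    "dense_15": (128 + 1) * 10,
}
print(size, flat)  # 5 1600
print(params)      # {'conv2d_18': 640, 'conv2d_19': 36928, 'dense_14': 204928, 'dense_15': 1290}
```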

😵 😵

👀 Visualization

The visualization of the output of each layer is available here 🔎

👷‍♀️ Network Visualization Tool

Netron ✨✨

🧐 References

Popular Strategies of Deep Learning

Introduction

🥽 Popular Strategies Used In the Context of Deep Learning

📚 Popular Terms

| Term | Description |
|---|---|
| 🚙 Transfer Learning | Learning from one task and applying the knowledge to separate tasks 🛰🚙 |
| ➰ Multi-Task Learning | Training one NN to do several things at the same time, so that each of these tasks helps all of the other tasks 🚀 |
| 🏴 End to End Deep Learning | Replacing a pipeline of smaller sub-tasks with a single NN ✂ |

👩‍💻 My Codes

🚙 Transfer Learning, Dog vs Cat 🐶🐱

👷‍♀️ Network Visualization Tool

Netron ✨✨

Transfer Learning

Applying knowledge to separate tasks

In short: Learning from one task and applying knowledge to separate tasks 🛰🚙

❓ What is Transfer Learning?

🕵️‍♀️ Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task.
🌟 In addition, it is an optimization method that allows rapid progress or improved performance when modeling the second task.
🤸‍♀️ Transfer learning only works in deep learning if the model features learned from the first task are general.

Long story short: Rather than training a neural network from scratch we can instead download an open-source model that someone else has already trained on a huge dataset maybe for weeks and use these parameters as a starting point to train our model just a little bit more with the smaller dataset that we have ✨
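A toy NumPy sketch of the "feature extraction" flavor of transfer learning. Everything here is hypothetical: `W_pretrained` stands in for weights downloaded from a model trained on a large dataset (here it is just random), it is kept frozen, and only a new logistic-regression output layer is trained on our small dataset. In Keras this corresponds to marking the base layers as non-trainable; fine-tuning some later layers afterwards is also common.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature extractor (random here): 10 inputs -> 8 features
W_pretrained = rng.normal(size=(8, 10))
W_frozen_copy = W_pretrained.copy()

def features(X):
    return np.maximum(0, W_pretrained @ X)  # frozen layer + ReLU, never updated

# Small labeled dataset for the new task (made up): 10 features x 40 examples
X = rng.normal(size=(10, 40))
y = (X[0] > 0).astype(float).reshape(1, -1)

# New output layer: the only trainable parameters
w, b = np.zeros((1, 8)), 0.0
F = features(X)
for _ in range(500):
    y_hat = 1 / (1 + np.exp(-(w @ F + b)))  # sigmoid output
    dz = y_hat - y                           # gradient of the log loss w.r.t. z
    w -= 0.1 * dz @ F.T / X.shape[1]
    b -= 0.1 * dz.mean()

print(np.array_equal(W_pretrained, W_frozen_copy))  # the pretrained weights are untouched
```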

💫 Traditional ML vs Transfer Learning

🙄 Problem

Layers in a neural network can sometimes end up having similar weights and possibly impacting each other, leading to over-fitting. With a big, complex model this is a risk. So, if you can imagine, the dense layers can look a little bit like this.

We can randomly drop out some neurons during training, so that neurons cannot end up depending on neighbors with similar weights; this reduces overfitting.

🔃 Comparison

🤸‍♀️ An NN before and after dropout

✨ Accuracy before and after dropout

🤔 When is it practical?

It is practical when we have a lot of data for the problem that we are transferring from and usually relatively less data for the problem we are transferring to 🕵️‍

More accurately:

For task A to task B, it is sensible to do transfer learning from A to B when:

🚩 Task A and task B have the same input x
⭐ We have a lot more data for task A than task B
🔎 Low level features from task A could be helpful for learning task B

🧐 References

Other Strategies

Other Strategies of Deep Learning

➰ Multi-Task Learning

In short: we train one NN to do several things at the same time, and each of these tasks helps all of the other tasks 🚀

In other words: let's say that we want to build a detector for 4 classes of objects; instead of building 4 separate NNs, one per class, we can build one NN that detects all four classes 🤔 (the output layer has 4 units)
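In the usual formulation of this example (an assumption, the notes do not spell out the loss), one image can contain several of the 4 objects at once, so unlike softmax each output unit gets its own sigmoid and the loss sums 4 separate logistic losses. A NumPy sketch with made-up labels and predictions laid out as (4, m):

```python
import numpy as np

def multi_task_cost(Y_hat, Y, eps=1e-12):
    """Y_hat, Y: (tasks, m). Sum the logistic loss over tasks, average over examples."""
    losses = -(Y * np.log(Y_hat + eps) + (1 - Y) * np.log(1 - Y_hat + eps))
    return losses.sum(axis=0).mean()

# 4 tasks (e.g. 4 object types) x 2 images; an image may contain several objects
Y = np.array([[0, 1],
              [1, 1],
              [1, 0],
              [0, 0]])
Y_hat = np.array([[0.1, 0.8],
                  [0.9, 0.7],
                  [0.6, 0.2],
                  [0.2, 0.1]])
print(multi_task_cost(Y_hat, Y))
```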

🤔 When Is It Practical?

🤳 Training on a set of tasks that could benefit from having shared lower level features
⛱ Amount of data we have for each task is quite similar (sometimes) ⛱
🤗 Can train a big enough NN to do well on all the tasks (instead of building a separate network for each task)

👓 Multi task learning is used much less than transfer learning

👀 Visualization

🏴 End to End Deep Learning

Briefly, there have been some data processing systems or learning systems that require multiple stages of processing.
End to end learning can take all these multiple stages and replace them with just a single NN

👩‍🔧 Long Story Short: replacing a pipeline of smaller sub-tasks with a single NN

➕ Pros:

🦸‍♀️ Shows the power of the data
✨ Less hand designing of components needed

➖ Cons:

💔 May need large amount of data
🔎 Excludes potentially useful hand designed components

🚩 Guideline to Make Decision to Use It

Key question: do you have sufficient data to learn a function of the complexity needed to map x to y?

🔃 End to End Learning vs Transfer Learning

Image Augmentation

🤸‍♀️ Notes on Applied Machine Learning

Introduction

👷‍♀️ Guidelines for Structuring Machine Learning Projects

👩‍🎓 Orthogonalisation

One of the challenges with building machine learning systems is that there are so many things we could try, including, for example, many hyperparameters we could tune. The art of knowing which parameter to tune to get which effect is called orthogonalisation.

What should we pay attention to while evaluating an ML project? How do we optimize it? How do we speed it up? Since there are a lot of parameters, how do we know what to fix and which parameter to tune? 🤔🤕

Before answering these questions let's take a look at the whole process 🧐

⛓ Chain of assumptions in ML

The model should:

Fit training set well on cost function (Human level performance ❌❌)

⬇

Fit dev set well on cost function

⬇

Fit test set well on cost function

⬇

Perform well in real world ✨

Figuring out what is exactly wrong can help us to choose a suitable solution and then to fix that part without affecting the whole project 👩‍🔧

👩‍🔧 Notes on Structuring Machine Learning Projects

Make your training procedure more effective

✨ How to effectively set up evaluation metrics?

While looking at precision P and recall R (for example), we may not be able to choose the best model correctly
- So we have to create a new evaluation metric that combines P and R
- Now we can choose the best model based on our new metric 🐣
- For example, a popular combined metric is the F1 score, the harmonic mean of P and R: $$F_1 = \frac{2PR}{P + R}$$

To summarize: we can construct our own metrics based on our models and priorities to be able to make the best choice 👩‍🏫
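To make the F1 comparison concrete, here is a small Python sketch. The precision/recall values for the two classifiers A and B are hypothetical numbers chosen for illustration; the function computes F1 as the harmonic mean of P and R, 2PR / (P + R).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two hypothetical classifiers: A has higher recall, B has higher precision.
models = {"A": (0.95, 0.90), "B": (0.98, 0.85)}  # name -> (precision, recall)
scores = {name: f1_score(p, r) for name, (p, r) in models.items()}
best = max(scores, key=scores.get)
print(best)  # A
```

A single number makes the choice mechanical: B wins on precision and A wins on recall, but A has the higher F1.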

📚 Types of Metrics

For better evaluation we have to classify our metrics as follows:

Technically, if we have N metrics, we should try to optimize 1 metric and satisfice the other N-1 metrics 🙄

🙌 Clarification: a satisficing metric only has to meet a threshold that we determine
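A minimal sketch of the "optimize one, satisfice the rest" rule, assuming accuracy as the optimizing metric and running time as the single satisficing metric with a hypothetical 100 ms threshold (the candidate models and their numbers are made up):

```python
# Hypothetical candidates: (name, accuracy, running time in ms)
candidates = [
    ("A", 0.90, 80),
    ("B", 0.92, 95),
    ("C", 0.95, 1500),
]
MAX_RUNTIME_MS = 100  # satisficing threshold (assumed value)

# Step 1: keep only the models that meet the satisficing metric...
feasible = [c for c in candidates if c[2] <= MAX_RUNTIME_MS]
# Step 2: ...then pick the one with the best optimizing metric.
best = max(feasible, key=lambda c: c[1])
print(best[0])  # B
```

C has the best accuracy, but it fails the running-time threshold, so it is never compared on accuracy at all.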

🚀 How to set up datasets to maximize the efficiency

It is recommended to choose the dev and test sets from the same distribution, so we have to shuffle the data randomly and then split it.
As a result, both test and dev sets have data from all categories ✨

👩‍🏫 Guideline

We have to choose a dev set and a test set (from the same distribution) that reflect the data we expect to get in the future and consider important to do well on

🤔 How to choose the size of sets

If we have a small dataset (m < 10,000)
- 60% training, 20% dev, 20% test will be good
If we have a huge dataset (1M for example)
- 98% training, 1% dev, 1% test will be acceptable
  And so on: considering these two cases, we can choose a suitable ratio 👮‍
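A minimal sketch of shuffling before splitting, in plain Python. The function name `split_dataset` and the default 60/20/20 ratios are illustrative assumptions; for a huge dataset you would pass something like `train=0.98, dev=0.01`.

```python
import random


def split_dataset(examples, train=0.6, dev=0.2, seed=0):
    """Shuffle, then split into train/dev/test; the test set gets the remainder."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # shuffle first so every split mixes all categories
    n_train = int(len(items) * train)
    n_dev = int(len(items) * dev)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])


train_set, dev_set, test_set = split_dataset(range(1000))
print(len(train_set), len(dev_set), len(test_set))  # 600 200 200
```

Using a fixed seed keeps the split reproducible, so the dev and test sets do not silently change between experiments.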

🙄 When to change dev/test sets and metrics

Guideline: if doing well on our metric + dev/test set doesn't correspond to doing well in the real-world application, we have to change our metric and/or dev/test set 🏳

👩‍🏫 Implementation Guidelines

Implementation guidelines and error analysis

📚 Common Terms

I did my best, but my project is still doing badly. What shall I do? 😥

Well, at this stage we have a criterion: is your model doing worse than humans (because humans are quite good at a lot of tasks 👩‍🎓)? If yes, you can:

👩‍🏫 Get labeled data from humans
👀 Gain insight from manual error analysis; (Why did a person get this right? 🙄)
🔎 Better analysis of bias / variance 🔍

🤔 Note: knowing how well humans can do on a task can help us to understand better how much we should try to reduce bias and variance

🧐 Is your model doing better than humans?

Processes are less clear 😥

Suitable techniques will be added here

🤓 Study case

Let's assume that we have these two situations:

Even though the training and dev errors are the same in both cases, we will apply different tactics for better performance

In Case 1, we have high bias, so we have to focus on bias reduction techniques 🤔; in other words, we have to reduce the difference between training and human-level errors (the avoidable bias)
- Better algorithm, better NN structure, ......
In Case 2, we have high variance, so we have to focus on variance reduction techniques 🙄; in other words, we have to reduce the difference between training and dev errors
- Adding regularization, getting more data, ......

We call this procedure of analysis Error analysis 🕵️‍
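The comparison behind this analysis fits in a few lines of Python. `diagnose` is a hypothetical helper, human-level error is used as a proxy for Bayes error, and the error rates are made-up numbers for two cases with identical training and dev errors:

```python
def diagnose(human_error, train_error, dev_error):
    """Compare avoidable bias with variance and say which one to work on first."""
    avoidable_bias = train_error - human_error  # gap to human level (proxy for Bayes error)
    variance = dev_error - train_error          # gap between training and dev
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Same training (8%) and dev (10%) errors, different human-level error.
print(diagnose(0.01, 0.08, 0.10)[2])   # Case 1 -> bias
print(diagnose(0.075, 0.08, 0.10)[2])  # Case 2 -> variance
```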

👀 Error Types Visualization

In computer vision tasks, human-level error ≈ Bayes error because humans are good at vision tasks

🤗 Problems where ML surpasses human-level performance

Online advertising
Product recommendations
Logistics
Loan approvals
.....

🤸‍♀️ It is recommended to

When we start a new project, it is recommended to build an initial model quickly and then iterate on it until we get the best model. This is more practical than spending time designing the model in theory and searching for the best hyperparameters up front, which is almost impossible 🙄

So, just don't overthink! (In both ML problems and life problems 🤗🙆‍)

🕵️‍♀️ Basics of Object Detection

🕵️‍♀️ Popular Object Detection Techniques

Introduction

🕵️‍♀️ Popular Object Detection Techniques

📚 Common Terms

📑 More Detailed

✨ Popular Detection CNNs

R-CNN (Region-Based Convolutional Neural Networks)
Fast R-CNN (Fast Region-Based Convolutional Neural Networks)
Faster R-CNN (Faster Region-Based Convolutional Neural Networks)
R-FCN (Region-Based Fully Convolutional Networks)
YOLO (You Only Look Once)
- YOLO V1
- YOLO V2
- YOLO V3
SSD (Single Shot MultiBox Detector)

🤸‍♀️ Object Detection Series

SSD and YOLO

Single Shot Detectors and You Only Look Once

😉 You Only Look Once

💥 The approach involves a single neural network trained end to end
- It takes an image as input and directly predicts bounding boxes and a class label for each bounding box.
😕 The technique offers lower predictive accuracy (e.g. more localization errors) compared with region-based models
➗ YOLO divides the input image into an S×S grid; each grid cell predicts only one object

👷‍♀️ Long Story Short: The system divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
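A small sketch of the "responsible cell" rule, assuming boxes given in pixel coordinates (xmin, ymin, xmax, ymax) and S = 7 (the grid size of the original YOLO v1 paper; later versions differ). It only shows the cell assignment, not the network itself:

```python
def responsible_cell(box, image_w, image_h, S=7):
    """Return (row, col) of the S x S grid cell that contains the box center."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    # Clamp so a center lying exactly on the right/bottom edge stays inside the grid.
    col = min(int(cx / image_w * S), S - 1)
    row = min(int(cy / image_h * S), S - 1)
    return row, col


# Hypothetical 448x448 image with one object centered at (200, 300).
print(responsible_cell((100, 200, 300, 400), 448, 448))  # (4, 3)
```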

🎀 Advantages

🚀 Speed
🤸‍♀️ Feasible for real time applications

🙄 Disadvantages

😕 Poor performance on small-sized objects
- It tends to give imprecise object locations.

TODO: Compare versions of YOLO

🤸‍♀️ SSD

💥 Predicts objects in images using a single deep neural network.
🤓 The network generates scores for the presence of each object category using small convolutional filters applied to feature maps.
✌ This approach uses a feed-forward CNN that produces a collection of bounding boxes and scores for the presence of certain objects.
❗ In this model, each feature map cell is linked to a set of default bounding boxes

👩‍🏫 Details

🖼️ After going through a certain number of convolutions for feature extraction, we obtain a feature layer of size m×n (number of locations) with p channels, such as 8×8 or 4×4.
- A 3×3 conv is then applied on this m×n×p feature layer.
📍 For each location, we get k bounding boxes. These k bounding boxes have different sizes and aspect ratios.
- The idea is that a vertical rectangle may fit a human better, and a horizontal rectangle may fit a car better.
💫 For each of the bounding boxes, we compute c class scores and 4 offsets relative to the original default bounding box shape.
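The bookkeeping for one SSD feature map can be sketched as follows. The helper name and the example numbers (k = 4 default boxes per location, c = 21 classes, e.g. 20 classes plus background) are assumptions for illustration; the point is that the 3×3 prediction conv needs k × (c + 4) output channels per location.

```python
def ssd_head_shapes(m, n, k, c):
    """For an m x n feature map with k default boxes per location and c classes,
    return (number of default boxes, output channels of the 3x3 prediction conv)."""
    num_boxes = m * n * k
    channels = k * (c + 4)  # c class scores + 4 box offsets for each default box
    return num_boxes, channels


print(ssd_head_shapes(8, 8, 4, 21))  # (256, 100)
print(ssd_head_shapes(4, 4, 6, 21))  # (96, 150)
```

The detector's total number of predictions is the sum of `num_boxes` over all the feature maps it predicts from.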

🤓 Long Story Short

The SSD object detection algorithm is composed of 2 parts:

Extract feature maps
Apply convolution filters to detect objects.

🕵️‍♀️ Evaluation

Better accuracy compared to YOLO
Better speed compared to Region based algorithms

👀 Visualization

🚫 SSD vs YOLO

🧐 References

Model Debugging

🙄 Problems that we can face while training custom object detection

Model is not doing well on test set
Model is doing well on test set but doing bad on real world images

If the model is not doing well on the test set, you can try one or more of the following:

Add dropout to .config file

Replace fixed_shape_resizer with keep_aspect_ratio_resizer, example:

👮‍♀️ You have to choose these values according to your model
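For illustration only, this is roughly where both settings sit in an SSD pipeline config (protobuf text format). It assumes a config that uses `convolutional_box_predictor`, as the SSD MobileNet configs in the model zoo do; the numbers are placeholders and `...` stands for the fields already in your file:

```
model {
  ssd {
    image_resizer {
      # was: fixed_shape_resizer { height: 300 width: 300 }
      keep_aspect_ratio_resizer {
        min_dimension: 300  # placeholder values
        max_dimension: 600
      }
    }
    box_predictor {
      convolutional_box_predictor {
        use_dropout: true
        dropout_keep_probability: 0.8  # placeholder value
        ...
      }
    }
    ...
  }
}
```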

Sequence Models In Deep Learning

Introduction

⛓ Basics of Sequence Models

⛓ Sequence Models In General

Sequences are data structures where each example could be seen as a series of data points, for example 🧐:

Since we have labeled data X and Y so all of these tasks are addressed as Supervised Learning 👩‍🏫
In sequence-to-sequence tasks, the lengths of the input and output can even be different ❗

🤔 Why Do We Need Sequence Models?

Machine learning algorithms typically require the text input to be represented as a fixed-length vector 🙄
Thus, to model sequences, we need a specific learning framework able to:
- ✔ Deal with variable-length sequences
- ✔ Maintain sequence order
- ✔ Keep track of long-term dependencies rather than cutting input data too short
- ✔ Share parameters across the sequence (so not re-learn things across the sequence)

👩‍💻 My Codes

General Concepts

General Concepts of Sequence Models

👩‍🏫 Notation

In the context of text processing (e.g: Natural Language Processing NLP)

🚀 One Hot Encoding

A way to represent words so we can work with them easily

🔎 Example

Let's say that we have a dictionary that consists of 10 words (🤭) and the words of the dictionary are:

Car, Pen, Girl, Berry, Apple, Likes, The, And, Boy, Book.

Our $$X^{(i)}$$ is: The Girl Likes Apple And Berry

So we can represent this sequence like the following 👀

By representing sequences in this way, we can feed our data to neural networks ✨
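A minimal Python sketch of one-hot encoding the example sentence, using the 10-word dictionary in the order listed (index 0 = Car, ..., index 9 = Book); the variable and function names are illustrative:

```python
vocabulary = ["Car", "Pen", "Girl", "Berry", "Apple",
              "Likes", "The", "And", "Boy", "Book"]
index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a vector of len(vocabulary) zeros with a 1 at the word's index."""
    vec = [0] * len(vocabulary)
    vec[index[word]] = 1
    return vec


sentence = "The Girl Likes Apple And Berry".split()
encoded = [one_hot(w) for w in sentence]
for word, vec in zip(sentence, encoded):
    print(f"{word:>6}: {vec}")
```

Each word becomes a column of the input matrix for $$X^{(i)}$$, so this 6-word sentence turns into six 10-dimensional vectors.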

🙄 Disadvantage

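If our dictionary consists of 10,000 words, then each vector will be 10,000-dimensional 🤕
This representation cannot capture semantic features: every pair of one-hot vectors is equally far apart, so Apple and Berry look no more similar than Apple and Car 💔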

Mixed Info On NLP

Mixed Info On Natural Language Processing

🤸‍♀️ Applications

🔠 Neural Machine Translation

A machine translation model is similar to a language model, except that it has an encoder network placed before it.
It is sometimes referred to as a conditional language model.

🕵️‍♀️ Neural Machine Translation with Attention

If you had to translate a book's paragraph from French to English, you would not read the whole paragraph, then close the book and translate 😅
Even during the translation process, you would read/re-read and focus on the parts of the French paragraph corresponding to the parts of the English you are writing down 🤔
The attention mechanism tells a Neural Machine Translation model where it should pay attention at each step 👩‍🏫
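To make "where to pay attention" concrete, here is a generic sketch of one attention step: alignment scores for the current output step are turned into weights with a softmax, and the context vector is the weighted sum of the encoder states. The scores and the 2-dimensional states are hypothetical; how a real model computes the scores (e.g. with a small neural network) is left out.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(scores, encoder_states):
    """weights = softmax(scores); context = sum over t of weight_t * state_t."""
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Hypothetical: 3 source words with 2-dimensional encoder states,
# and alignment scores that favor the first source word at this step.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([2.0, 0.0, 0.0], states)
print(weights, context)
```

The weights always sum to 1, so the context vector is a blend of the source positions, dominated by the ones the model "looks at".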

🔊 Speech Recognition

Converting audio (x, input) to text (y, output)
- By measuring air pressure 🙄
It is a sequence-to-sequence model

TODO: Add details

NLP

Introduction

Under development 🚧

👶 The Growth of NLP

Rule based systems
Probabilistic systems
End to end systems

Applied NLP

🙌🏻 Handling texts

Handling texts using Python's built-in functions

📕 Notebooks

💠 Python built-in functions

📏 Length of a string

🔢 Number of characters

text = "Beauty always reserved in details, don't let the big picture steal your attention!"
len(text)
# 82

🧾 Number of words

text = "Beauty always reserved in details, don't let the big picture steal your attention!"
words = text.split(' ')
len(words)
# 13

4️⃣ Getting words with length greater than 4

text = "Beauty always reserved in details, don't let the big picture steal your attention!"
words = text.split(' ')
moreThan4 = [w for w in words if len(w) > 4]
# ['Beauty', 'always', 'reserved', 'details,', "don't", 'picture', 'steal', 'attention!']

🎒 Word properties

🔠 Getting capitalized words

text = "Beauty Always reserved in details, Don't let the big picture steal your attention!"
words = text.split(' ')
capitalized = [w for w in words if w.istitle()]
# ['Beauty', 'Always']
# "Don't" is not found 🙄
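Why is "Don't" missing? `str.istitle()` requires lowercase letters to follow cased characters, and the `t` after the apostrophe breaks that rule. A sketch of a looser check, assuming "starts with an uppercase letter" is what we actually want:

```python
text = "Beauty Always reserved in details, Don't let the big picture steal your attention!"
words = text.split(' ')

# The 't' follows an apostrophe (an uncased character), so istitle() is False.
print("Don't".istitle())  # False

# Looser check: only look at the first character (w[:1] is safe for empty strings).
capitalized = [w for w in words if w[:1].isupper()]
print(capitalized)  # ['Beauty', 'Always', "Don't"]
```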

🔚 Getting words that end with a specific suffix

or that start with a specific prefix, using .startswith()

text = "You can hide whatever you want to hide but your eyes will always expose you, eyes never lie."
words = text.split(' ')
endsWithEr = [w for w in words if w.endswith('er')]
# ['whatever', 'never']

🐥 Upper and lower

"ESMA".isupper() # True
"Esma".isupper() # False
"esma".isupper() # False

"esma".islower() # True
"ESMA".islower() # False
"Esma".islower() # False

🤵 Membership test

'm' in 'esma' # True
'es' in 'esma' # True
'ed' in 'esma' # False

🕵️‍♀️ Unique Words

🔍 Case sensitive

text = "To be or not to be"
words = text.split(' ')
unique = set(words)
# {'be', 'To', 'not', 'or', 'to'}

✖️ 🔍 Ignore case

text = "To be or not to be"
words = text.split(' ')
unique = set(w.lower() for w in words)
# {'not', 'or', 'be', 'to'}

👮‍♀️ Checking Ops

Is Digit?

'17'.isdigit() # True
'17.7'.isdigit() # False

Is Alphabetic?

'esma'.isalpha() # True
'esma17'.isalpha() # False

Is alphanumeric (letters or digits)?

'17esma'.isalnum() # True
'17esma;'.isalnum() # False

🔤 String Ops

"Esma".lower() # esma
"Esma".upper() # ESMA
"EsmA".title() # Esma

🧵 Split & Join

Split by a specific character

text = "Beauty,Always,reserved,in,details,Don't,let,the,big,picture,steal,your,attention!"
words = text.split(',')
# ['Beauty', 'Always', 'reserved', 'in', 'details', "Don't", 'let', 'the', 'big', 'picture', 'steal', 'your', 'attention!']

Join with a specific character

text = "Beauty,Always,reserved,in,details,Don't,let,the,big,picture,steal,your,attention!"
words = text.split(',')
joined = " ".join(words)
# Beauty Always reserved in details Don't let the big picture steal your attention!

Quick Visual Info

👀 Visual materials to give lots of information in short time

Materials will be divided into different files (or categories) as they increase 👮‍

📚 Types of Machine Learning

👓 Supervised Learning vs Unsupervised Learning

🕶 Machine Learning vs Deep Learning

🧠 Machine Learning Mind Map

Good Sources That Must Be Followed

Instagram AI Machine Learning

PDFs that I found and recommend

List of useful PDFs that I recommend

PDFs will be categorized as they increase 👩‍🔧

📂 Table of Contents

TensorFlow Object Detection API

Training Custom Object Detector Step by Step

🌱 Introduction

✨ The TensorFlow Object Detection API is a powerful tool that allows us to create custom object detectors by fine-tuning pre-trained models, even if we don't have a strong AI background or strong TensorFlow knowledge.
💁‍♀️ Building models on top of pre-trained models saves us a lot of time and labor, since we reuse models that may have been trained for weeks on very powerful machines; this principle is called Transfer Learning.
🗃️ As the dataset, I will show you how to use the OpenImages dataset and how to convert its data to a TensorFlow-friendly format.
🎀 You can find this article on Medium too.

🤕 If you get errors while applying the instructions, you can check out the 🐞 Common Issues section at the end of the article

👩‍💻 Environment Preparation

🔸 Environment Info

| 💻 Platform | 🏷️ Version |
|---|---|
| Python version | 3.7 |
| TensorFlow version | 1.15 |

🥦 Conda env Setting

🔮 Create new env

🥦 Install Anaconda
💻 Open cmd and run:

# conda create -n <ENV_NAME> python=<REQUIRED_VERSION>
conda create -n tf1 python=3.7

▶️ Activate the new env

# conda activate <ENV_NAME>
conda activate tf1

🔽 Install Packages

💥 GPU vs CPU Computing

| 🚙 CPU | 🚀 GPU |
|---|---|
| Brain of computer | Brawn of computer |
| Very few complex cores | Hundreds of simpler cores with parallel architecture |
| Single-thread performance optimization | Thousands of concurrent hardware threads |
| Can do a bit of everything, but not great at much | Good for math-heavy processes |

🚀 Installing TensorFlow

# GPU version (requires a CUDA-capable NVIDIA GPU)
conda install tensorflow-gpu=1.15

# CPU-only version
conda install tensorflow=1.15

📦 Installing other packages

conda install pillow Cython lxml jupyter matplotlib

conda install -c anaconda protobuf

🤖 Downloading models repository

🤸‍♀️ Cloning from GitHub

A repository that contains the required utils for the training and evaluation process
Open CMD on the E: drive and run:

# note that every time you open CMD you have 
# to activate your env again by running: 
# under E:\>
conda activate tf1
git clone https://github.com/tensorflow/models.git
cd models/research

🧐 I assume that you are running your commands on the E: drive

🔃 Compiling Protobufs

# Windows (CMD), under (tf1) E:\models\research>
for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.

# Linux/macOS, under /models/research
$ protoc object_detection/protos/*.proto --python_out=.

📦 Compiling Packages

# under (tf1) E:\models\research>
python setup.py build
python setup.py install

🚩 Setting Python Path Temporarily

# Windows (CMD), under (tf1) E:\models\research> or anywhere 😅
set PYTHONPATH=E:\models\research;E:\models\research\slim

# Linux/macOS, under /models/research
$ export PYTHONPATH=`pwd`:`pwd`/slim

👮‍♀️ Every time you open CMD you have to set PYTHONPATH again

👩‍🔬 Installation Test

🧐 Check that everything is set up correctly

💻 Command

# under (tf1) E:\models\research>
python object_detection/builders/model_builder_tf1_test.py

🎉 Expected Output

Ran 17 tests in 0.833s

OK (skipped=1)

🖼️ Image Acquiring

👮‍♀️ Directory Structure

🏗️ I assume that you have created a structure like:

E:
|___ models
|___ demo
      |___ annotations
      |___ eval
      |___ images
      |___ inference
      |___ OIDv4_ToolKit
      |___ OpenImagesTool
      |___ pre_trained_model
      |___ scripts
      |___ training

| 📂 Folder | 📃 Description |
|---|---|
| 🤖 models | the repo |
| 📄 annotations | will contain the generated .csv and .record files |
| 👮‍♀️ eval | will contain the results of evaluation |
| 🖼️ images | will contain the image dataset |
| ▶️ inference | will contain the exported models after training |
| 🔽 OIDv4_ToolKit | the repo (OpenImages Downloader) |
| 👩‍🔧 OpenImagesTool | the repo (OpenImages Organizer) |
| 👩‍🏫 pre_trained_model | will contain the files of the TensorFlow model that we will retrain |
| 👩‍💻 scripts | will contain the scripts that we will use for pre-processing and training |
| 🚴‍♀️ training | will contain the checkpoints generated during training |

🚀 OpenImages Dataset

🕵️‍♀️ You can get images in various ways
👩‍🏫 I will show the process of organizing the OpenImages dataset
🗃️ OpenImages is a huge dataset containing annotated images of 600 object classes
🔍 You can explore images by category from here

🎨 Downloading By Category

OIDv4_Toolkit is a tool that we can use to download OpenImages dataset by category and by set (test, train, validation)

💻 To clone and build the project, open CMD and run:

# under (tf1) E:\demo>
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit

# under (tf1) E:\demo\OIDv4_ToolKit>
pip install -r requirements.txt

⏬ To start downloading by category:

# python main.py downloader --classes <OBJECT_LIST> --type_csv <TYPE>
# TYPE: all | test | train | validation 
# under (tf1) E:\demo\OIDv4_ToolKit>
python main.py downloader --classes Apple Orange --type_csv validation

👮‍♀️ If object name consists of 2 parts then write it with '_', e.g. Bell_pepper

🤹‍♀️ Image Organization

🔮 OpenImagesTool

👩‍💻 OpenImagesTool is a tool to convert OpenImages images and annotations to a TensorFlow-friendly structure.
🙄 OpenImages provides annotations as .txt files in a format like: <OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>, which is not the Pascal VOC (.xml) annotation format that the following TensorFlow preparation steps expect
💫 To do that conversion, we can do the following
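As a rough idea of what this conversion involves, the sketch below turns OpenImages-style label lines into a Pascal VOC `<annotation>` XML string with the standard library. It is not OpenImagesTool's code: the function names, the file name, and fixing `depth` to 3 are assumptions, and `rsplit` is used in case class names contain spaces (e.g. Bell pepper).

```python
import xml.etree.ElementTree as ET


def oid_line_to_voc_object(line):
    """Convert one '<OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>' line into a VOC <object>."""
    name, xmin, ymin, xmax, ymax = line.strip().rsplit(" ", 4)
    obj = ET.Element("object")
    ET.SubElement(obj, "name").text = name
    box = ET.SubElement(obj, "bndbox")
    for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
        # VOC files usually hold integer pixel coordinates; the source values may be floats.
        ET.SubElement(box, tag).text = str(int(round(float(value))))
    return obj


def build_voc_annotation(filename, width, height, lines):
    """Build a minimal VOC annotation string for one image (hypothetical helper)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumed RGB images
    for line in lines:
        root.append(oid_line_to_voc_object(line))
    return ET.tostring(root, encoding="unicode")


xml_text = build_voc_annotation("apple_001.jpg", 640, 480, ["Apple 12.5 30 200.2 180"])
print(xml_text)
```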

💻 To clone and build the project, open CMD and run:

# under (tf1) E:\demo>
git clone https://github.com/asmaamirkhan/OpenImagesTool.git
cd OpenImagesTool/src

💻 Applying Organizing

🚀 Now, we will convert images and annotations that we have downloaded and save them to images folder

# under (tf1) E:\demo\OpenImagesTool\src> 
# python script.py -i <INPUT_PATH> -o <OUTPUT_PATH>
python script.py -i E:\demo\OIDv4_ToolKit\OID\Dataset -o E:\demo\images

👩‍🔬 OpenImagesTool adds validation images to the training set by default; if you want to disable this behavior, add the -v flag to the command.

🏷️ Creating Label Map

⛓️ label_map.pbtxt is a file that maps object names to their corresponding IDs
➕ Create a label_map.pbtxt file under the annotations folder and open it in a text editor
🖊️ Write your object names and IDs in the following format

item {
    id: 1
    name: 'Hamster'
}

item {
    id: 2
    name: 'Apple'
}

👮‍♀️ id: 0 is reserved for the background, so don't use it

🐞 Related error: ValueError: Label map id 0 is reserved for the background label
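With many classes, writing the file by hand gets tedious. A small sketch that generates the same format from a list of names, numbering from 1 so that id 0 stays reserved for the background (`make_label_map` is a hypothetical helper, not part of the API):

```python
def make_label_map(class_names):
    """Build label_map.pbtxt content; IDs start at 1 because 0 is the background."""
    items = []
    for i, name in enumerate(class_names, start=1):
        items.append("item {\n    id: %d\n    name: '%s'\n}\n" % (i, name))
    return "\n".join(items)


text = make_label_map(["Hamster", "Apple"])
print(text)
# To save it:
# with open(r"E:\demo\annotations\label_map.pbtxt", "w") as f:
#     f.write(text)
```

Class names must match the names in your annotations exactly, including upper- and lowercase letters.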

🏭 Generating CSV Files

🔄 Now we have to convert the .xml files to .csv files
🔻 Download the xml_to_csv.py script and save it under the scripts folder
💻 Open CMD and run:

👩‍🔬 Generating train csv file

# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\train -o E:\demo\annotations\train_labels.csv

👩‍🔬 Generating test csv file

# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\test -o E:\demo\annotations\test_labels.csv
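For reference, a minimal sketch of what such an .xml to .csv conversion involves, using only the standard library. It is not the downloadable xml_to_csv.py script: the helper names are hypothetical, and the column order (filename, width, height, class, xmin, ymin, xmax, ymax) is assumed to match the CSV layout shown in the Common Issues section.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_root_to_rows(root):
    """Yield one CSV row per <object> in a parsed Pascal VOC annotation."""
    filename = root.find("filename").text
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        yield [filename, width, height, obj.find("name").text] + coords


def xml_dir_to_csv(xml_dir, csv_path):
    """Convert every *.xml file in xml_dir into a single CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for xml_path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            root = ET.parse(xml_path).getroot()
            writer.writerows(voc_root_to_rows(root))


# Example with this article's folder layout:
# xml_dir_to_csv(r"E:\demo\images\train", r"E:\demo\annotations\train_labels.csv")
```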

👩‍🏭 Generating TF Records

🙇‍♀️ Now we will generate the TFRecords that will be used in the training process
🔻 Download the generate_tfrecords.py script and save it under the scripts folder

👩‍🔬 Generating train tfrecord

# under (tf1) E:\demo\scripts>
# python generate_tfrecords.py --label_map=<PATH_TO_LABEL_MAP> 
# --csv_input=<PATH_TO_CSV_FILE> --img_path=<PATH_TO_IMAGE_FOLDER>
# --output_path=<PATH_TO_OUTPUT_FILE>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\train_labels.csv --img_path=E:\demo\images\train --output_path=E:\demo\annotations\train.record

👩‍🔬 Generating test tfrecord

# under (tf1) E:\demo\scripts>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\test_labels.csv --img_path=E:\demo\images\test --output_path=E:\demo\annotations\test.record

🤖 Model Selecting

🎉 The TensorFlow Object Detection Model Zoo provides a lot of pre-trained models
🕵️‍♀️ Models differ in accuracy and speed; select the suitable model according to your priorities
💾 Select a model, extract it and save it under the pre_trained_model folder
👀 Check out my notes here to get insight into the differences between popular models

👩‍🔧 Model Configuration

⏬ Downloading config File

😎 We have downloaded the model (pre-trained weights), but now we have to download the configuration file that contains the training parameters and settings
👮‍♀️ Every model in the TensorFlow Object Detection Model Zoo has a configuration file presented here
💾 Download the config file that corresponds to the model you have selected and save it under the training folder

👩‍🔬 Updating config File

You have to update the following lines:

🙄 Take a look at Loss exploding issue

# number of classes
num_classes: 1 # set it to the total number of classes you have

# path of pre-trained checkpoint
fine_tune_checkpoint: "E:/demo/pre_trained_model/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18/model.ckpt"

train_input_reader: {
  # path to train tfrecord
  tf_record_input_reader {
    input_path: "E:/demo/annotations/train.record"
  }
  # path to label map
  label_map_path: "E:/demo/annotations/label_map.pbtxt"
}

# number of images that will be used in the evaluation process
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  # I suggest setting it to the total number of test images to get accurate results
  num_examples: 11193
}

eval_input_reader: {
  tf_record_input_reader {
    # path to test tfrecord
    input_path: "E:/demo/annotations/test.record"
  }
  # path to label map
  label_map_path: "E:/demo/annotations/label_map.pbtxt"
  # set it to true if you want to shuffle the test set at each evaluation
  shuffle: false
  num_readers: 1
}

🤹‍♀️ If you give the whole test set to the evaluation process, shuffling won't affect the results; it will only show you different examples on TensorBoard

👶 Training

🎉 Now we have done all preparations
🚀 Let the computer start learning
💻 Open CMD and run:

# under (tf1) E:\models\research\object_detection\legacy> 
# python train.py --train_dir=<DIRECTORY_TO_SAVE_CHECKPOINTS> 
# --pipeline_config_path=<PATH_TO_CONFIG_FILE>
python train.py --train_dir=E:/demo/training --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config

🕐 This process will take a long time (you can take a nap 🤭, but a long nap 🙄)
🕵️‍♀️ While the model is being trained, you will see loss values on CMD
✋ You can stop the process when the loss reaches a good value (for me, under 1; a good value depends on the model and dataset)
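Single-step loss values are noisy, so it can help to look at a moving average before stopping. A sketch that parses lines in the `global step N: loss = X` format that train.py prints (the sample lines and the window size are made up; duplicate lines for the same step are collapsed):

```python
import re

LOSS_RE = re.compile(r"global step (\d+): loss = ([0-9.eE+-]+|nan|inf)")


def parse_losses(lines):
    """Extract sorted (step, loss) pairs; a repeated step keeps its last value."""
    seen = {}
    for line in lines:
        m = LOSS_RE.search(line)
        if m:
            seen[int(m.group(1))] = float(m.group(2))
    return sorted(seen.items())


def moving_average(values, window=100):
    """Mean of the last `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)


log = [
    "INFO:tensorflow:global step 1: loss = 5.2100 (0.410 sec/step)",
    "INFO:tensorflow:global step 2: loss = 1.3000 (0.402 sec/step)",
    "INFO:tensorflow:global step 3: loss = 0.9000 (0.405 sec/step)",
]
losses = parse_losses(log)
avg = moving_average([loss for _, loss in losses], window=2)
print(losses, avg)
```

In practice you would redirect the training output to a file and feed its lines to `parse_losses`.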

👮‍♀️ Evaluation

🎳 Evaluating Script

🤭 After the training process is done, let's do an exam to know how well (or badly 🙄) our model is doing
🎩 The following command will run the model on the whole test set and then print the results, so that we can do error analysis.
💻 So, open CMD and run:

# under (tf1) E:\models\research\object_detection\legacy> 
# python eval.py --logtostderr --pipeline_config_path=<PATH_TO_CONFIG_FILE>
# --checkpoint_dir=<DIRECTORY_OF_CHECKPOINTS> --eval_dir=<DIRECTORY_TO_SAVE_EVAL_RESULTS>
python eval.py --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --checkpoint_dir=E:/demo/training --eval_dir=E:/demo/eval

👀 Visualizing Results

✨ To see the results as charts and images, we can use TensorBoard for better analysis
💻 Open CMD and run:

👩‍🏫 Training Values Visualization

🧐 Here you can see graphs of loss, learning rate and other values
🤓 And much more (You can investigate tabs at the top)
😋 It is feasible to use it while training (and exciting 🤩)

# under (tf1) E:\>
tensorboard --logdir=E:/demo/training

👮‍♀️ Evaluation Values Visualization

👀 Here you can see images from your test set with corresponded predictions
🤓 And much more (You can inspect tabs at the top)
❗ You must use this after running evaluation script

# under (tf1) E:\>
tensorboard --logdir=E:/demo/eval

🔍 See the visualized results at localhost:6006
🧐 You can also inspect the numerical values in the report on the terminal; example result:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.708
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.868
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.767
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.779
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.703
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.824
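The IoU in these lines is the overlap area between a predicted box and a ground-truth box divided by the area of their union; `IoU=0.50:0.95` means AP is averaged over the ten thresholds 0.50, 0.55, ..., 0.95. A minimal sketch, assuming (xmin, ymin, xmax, ymax) boxes; the two boxes are hypothetical:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds averaged in "IoU=0.50:0.95"
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

pred, truth = (0, 0, 10, 10), (5, 0, 15, 10)
overlap = iou(pred, truth)  # 50 / 150 = 1/3
matched_at = [t for t in thresholds if overlap >= t]
print(overlap, matched_at)  # this detection is not a match at any of the thresholds
```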

🎨 If you want a metric report for each class, you have to change the evaluation protocol to Pascal VOC metrics by configuring metrics_set in the .config file:

eval_config: {
  ...
  metrics_set: "weighted_pascal_voc_detection_metrics"
  ...
}

👒 Model Exporting

🔧 After the training and evaluation processes are done, we have to export the model in a format that we can use
🦺 For now we only have checkpoints, so we have to export a .pb file
💻 So, open CMD and run:

# under (tf1) E:\models\research\object_detection>
# python export_inference_graph.py --input_type image_tensor 
# --pipeline_config_path <PATH_TO_CONFIG_FILE> 
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_inference_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant

If you are using SSD and plan to convert it to tflite later, you have to run:

# under (tf1) E:\models\research\object_detection>
# python export_tflite_ssd_graph.py --input_type image_tensor 
# --pipeline_config_path <PATH_TO_CONFIG_FILE> 
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_tflite_ssd_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant

📱 Converting to tflite

💁‍♀️ If you want to use the model in mobile apps or tflite supported embedded devices you have to convert .pb file to .tflite file

📙 About TFLite

📱 TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices.
🧐 It enables on-device machine learning inference with low latency and a small binary size.
😎 TensorFlow Lite uses many techniques for this such as quantized kernels that allow smaller and faster (fixed-point math) models.
📍 Official site

🍫 Converting Command

💻 To apply converting open CMD and run:

# under (tf1) E:\>
# toco --graph_def_file=<PATH_TO_PB_FILE>
# --output_file=<PATH_TO_SAVE> --input_shapes=<INPUT_SHAPES>
# --input_arrays=<INPUT_ARRAYS> --output_arrays=<OUTPUT_ARRAYS>
# --inference_type=<QUANTIZED_UINT8|FLOAT> --change_concat_input_ranges=<true|false>
# --allow_custom_ops
# args for QUANTIZED_UINT8 inference
# --mean_values=<MEAN_VALUES> --std_dev_values=<STD_DEV_VALUES>
toco --graph_def_file=E:\demo\inference\ssd_v1_quant\tflite_graph.pb --output_file=E:\demo\tflite\ssd_mobilenet.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_custom_ops
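For QUANTIZED_UINT8 inference, `--mean_values` and `--std_dev_values` describe how the uint8 input maps back to real values; as I understand the converter documentation, real_value = (quantized_value - mean_value) / std_dev_value. A small sketch of that mapping with the 128/128 values used above (the helper names are mine):

```python
def dequantize(q, mean=128.0, std=128.0):
    """Map a uint8 input value to the real value the float model expects."""
    return (q - mean) / std


def quantize(real, mean=128.0, std=128.0):
    """Inverse mapping, rounded and clamped to the uint8 range [0, 255]."""
    return max(0, min(255, round(real * std + mean)))


print(dequantize(0), dequantize(128), dequantize(255))  # -1.0 0.0 0.9921875
```

So with mean = std = 128, pixel values 0..255 land in roughly [-1, 1), which should match the input normalization the float model was trained with.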

🐞 Common Issues

🥅 nets module issue

ModuleNotFoundError: No module named 'nets'

This means that there is a problem in setting PYTHONPATH, try to run:

(tf1) E:\models\research>set PYTHONPATH=E:\models\research;E:\models\research\slim

🗃️ tf_slim module issue

ModuleNotFoundError: No module named 'tf_slim'

This means that tf_slim module is not installed, try to run:

(tf1) E:\models\research>pip install tf_slim

🗃️ Allocation error

2020-08-11 17:44:00.357710: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                 10661327
InUse:                 10656704
MaxInUse:              10657688
NumAllocs:                 2959
MaxAllocSize:           3045064

For me it was fixed by reducing batch_size in the .config file; the right value depends on your computational resources

train_config: {
  ....
  batch_size: 128 # lower this value if you run out of memory
  ....
}

❗ no such file or directory error

train.py tensorflow.python.framework.errors_impl.notfounderror no such file or directory

🙄 For me it was a typo in train.py command
📍 Related discussion 1
📍 Related discussion 2

🤯 LossTensor is inf issue

LossTensor is inf or nan. : Tensor had NaN values

👀 The related discussion is here; this is commonly caused by an annotation problem
🙄 Maybe some bounding boxes are outside the image boundaries
🤯 The solution for me was reducing the batch size in the .config file

🙄 Ground truth issue

The following classes have no ground truth examples

👀 Related discussion is here
👩‍🔧 For me it was a misspelling in the label_map file
🙄 Pay attention to uppercase and lowercase letters

🏷️ labelmap issue

ValueError: Label map id 0 is reserved for the background label

👮‍♀️ id: 0 is reserved for the background; we cannot use it for objects
🆔 Start IDs from 1

🔦 No Variable to Save issue

ValueError: No variables to save

👀 Related solution is here
👩‍🔧 Adding the following line to .config file solved the problem

train_config: {
  ...
  fine_tune_checkpoint_type:  "detection"
  ...
}

🧪 pycocotools module issue

ModuleNotFoundError: No module named 'pycocotools'

👀 Related discussion is here
👩‍🔧 Applying the downloading instructions provided here solved the problem for me (on Windows 10)

$ conda install -c conda-forge pycocotools

🥴 pycocotools type error issue

pycocotools typeerror: object of type cannot be safely interpreted as an integer.

👩‍🔧 I solved the problem by editing the following lines in the cocoeval.py script of the pycocotools package (adding an int cast)
👮‍♀️ Make sure that you are editing the package in your env, not in another env.

self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)

💣 Loss Exploding

INFO:tensorflow:global step 440: loss = 2106942657570782838784.0000 (0.405 sec/step)
INFO:tensorflow:global step 440: loss = 2106942657570782838784.0000 (0.405 sec/step)
INFO:tensorflow:global step 441: loss = 7774169971762292326400.0000 (0.401 sec/step)
INFO:tensorflow:global step 441: loss = 7774169971762292326400.0000 (0.401 sec/step)
INFO:tensorflow:global step 442: loss = 25262924095336287830016.0000 (0.404 sec/step)
INFO:tensorflow:global step 442: loss = 25262924095336287830016.0000 (0.404 sec/step)

🙄 For me there were 2 problems:

First:

Some annotations were wrong and exceeded the image boundaries (e.g. xmax > width)
I found them by inspecting the .csv file
Example (here ymax = 492 is greater than height = 480):

| filename | width | height | class | xmin | ymin | xmax | ymax |
|---|---|---|---|---|---|---|---|
| 104.jpg | 640 | 480 | class_1 | 284 | 406 | 320 | 492 |
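Checking the CSV by eye does not scale, so here is a sketch that flags boxes that are empty or fall outside the image. It assumes the column names of the CSV table above; `find_invalid_rows` is a hypothetical helper, and the sample reuses the bad rows quoted in this article plus one valid row.

```python
import csv
import io


def find_invalid_rows(csv_file):
    """Return (line_number, filename, reason) for each box that is empty or
    extends outside the image."""
    problems = []
    reader = csv.DictReader(csv_file)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        w, h = float(row["width"]), float(row["height"])
        xmin, ymin = float(row["xmin"]), float(row["ymin"])
        xmax, ymax = float(row["xmax"]), float(row["ymax"])
        if not (0 <= xmin < xmax <= w):
            problems.append((line_no, row["filename"], "bad x range"))
        elif not (0 <= ymin < ymax <= h):
            problems.append((line_no, row["filename"], "bad y range"))
    return problems


SAMPLE_CSV = (
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "104.jpg,640,480,class_1,284,406,320,492\n"
    "file123.jpg,134,63,3,0,0,-1029,-615\n"
    "ok.jpg,640,480,class_1,10,10,100,100\n"
)

for problem in find_invalid_rows(io.StringIO(SAMPLE_CSV)):
    print(problem)

# On a real file:
# with open(r"E:\demo\annotations\train_labels.csv", newline="") as f:
#     print(find_invalid_rows(f))
```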

Second:

The learning rate in the .config file was too large (the default value was large 🙄)
The following values are valid and were tested on mobilenet_ssd_v1_quantized (the results were not very good 🙄)

learning_rate: {
  cosine_decay_learning_rate {
    learning_rate_base: .01
    total_steps: 50000
    warmup_learning_rate: 0.005
    warmup_steps: 2000
  }
}
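To see what these four numbers do, here is a sketch of a cosine decay schedule with linear warmup, using the values above as defaults. It is modeled on how such a schedule is usually defined and may differ in details (for example, optional hold steps) from the Object Detection API's own implementation:

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.01, total_steps=50000,
                             warmup_learning_rate=0.005, warmup_steps=2000):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step >= total_steps:
        return 0.0
    if step < warmup_steps:
        # Straight line from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))


for step in (0, 1000, 2000, 26000, 50000):
    print(step, cosine_decay_with_warmup(step))
```

The peak rate is `learning_rate_base`, so lowering it is the main lever when the loss explodes.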

🥴 Getting convolution Failure

Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

It may be a CUDA version incompatibility issue
For me it was a memory issue, and I solved it by adding the following lines near the top of the train.py script (the variable has to be set before TensorFlow initializes the GPU):

import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

📦 Invalid box data error

raise ValueError('Invalid box data. data must be a numpy array of '
ValueError: Invalid box data. data must be a numpy array of N*[y_min, x_min, y_max, x_max]

🙄 For me it was a logical error, in test_labels.csv there were some invalid values like: file123.jpg,134,63,3,0,0,-1029,-615
🏷 So, it was a labeling issue, fixing these lines solved the problem
👀 Related discussion

🔄 Image with id added issue

raise ValueError('Image with id {} already added.'.format(image_id))
ValueError: Image with id 123.png already added.

☝ It is an issue in the .config file caused by setting num_examples to a value greater than the total number of images in the test directory

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  num_examples: 1265 # <--- this value was greater than the total number of test images
}