🤖TensorFlow Object Detection API
Training Custom Object Detector Step by Step
🌱 Introduction
✨ The TensorFlow Object Detection API is a powerful tool that allows us to create custom object detectors based on pre-trained, fine-tuned models, even if we don't have a strong AI background or deep TensorFlow knowledge.
💁‍♀️ Building on pre-trained models saves us a lot of time and labor, since we reuse models that may have been trained for weeks on very powerful machines; this principle is called Transfer Learning.
🗃️ As a data set, I will show you how to use the OpenImages data set and convert its data to a TensorFlow-friendly format.
🎀 You can find this article on Medium too.
🚩 Development Pipeline
🤕 While you are applying the instructions, if you get errors you can check out the 🐞 Common Issues section at the end of the article.
👩💻 Environment Preparation
🔸 Environment Info
| 💻 Platform | 🏷️ Version |
| --- | --- |
| Python | 3.7 |
| TensorFlow | 1.15 |
🥦 Conda env Setting
🔮 Create new env
🥦 Install Anaconda
💻 Open cmd and run:
▶️ Activate the new env
🔽 Install Packages
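Assuming Anaconda is already installed, creating and activating the env looks like this (the env name `tf1` is my placeholder; pick any name you like):

```shell
conda create -n tf1 python=3.7
conda activate tf1
```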
💥 GPU vs CPU Computing
| 🚙 CPU | 🚀 GPU |
| --- | --- |
| Brain of the computer | Brawn of the computer |
| Very few complex cores | Hundreds of simpler cores with a parallel architecture |
| Optimized for single-thread performance | Thousands of concurrent hardware threads |
| Can do a bit of everything, but not great at much | Good for math-heavy processes |
🚀 Installing TensorFlow
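For the versions in the table above, the install command is the following (use the `-gpu` package only if you have a CUDA-capable GPU):

```shell
pip install tensorflow==1.15
# or, for GPU computing:
pip install tensorflow-gpu==1.15
```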
📦 Installing other packages
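The Object Detection API also depends on a handful of common packages; this list follows the official installation instructions, so treat any extras as optional:

```shell
pip install pillow lxml jupyter matplotlib opencv-python pandas cython
```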
🤖 Downloading models repository
🤸♀️ Cloning from GitHub
A repository that contains the utilities required for the training and evaluation process.
Open CMD in the `E:` disk and run:
🧐 I assume that you are running your commands under the `E:` disk.
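Cloning the repository, for example:

```shell
E:
git clone https://github.com/tensorflow/models.git
```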
🔃 Compiling Protobufs
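From the `research` directory, compile the protobuf definitions (note: on some Windows setups the wildcard is not expanded, and each `.proto` file must be compiled individually):

```shell
cd E:\models\research
protoc object_detection\protos\*.proto --python_out=.
```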
📦 Compiling Packages
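Then build and install the `object_detection` package itself:

```shell
cd E:\models\research
python setup.py build
python setup.py install
```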
🚩 Setting Python Path Temporarily
👮‍♀️ Every time you open CMD, you have to set PYTHONPATH again.
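Assuming the repository lives under `E:\models`, on Windows CMD that looks like:

```shell
set PYTHONPATH=E:\models;E:\models\research;E:\models\research\slim
```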
👩🔬 Installation Test
🧐 Check that everything is done correctly:
💻 Command
🎉 Expected Output
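A smoke test ships with the repository; if the installation is correct, it should finish by reporting that the tests ran with an `OK` status:

```shell
cd E:\models\research
python object_detection\builders\model_builder_test.py
```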
🖼️ Image Acquiring
👮♀️ Directory Structure
🏗️ I suppose that you created a structure like:
| 📂 Folder | 📃 Description |
| --- | --- |
| 🤖 models | the cloned TensorFlow models repository |
| 📄 annotations | will contain the generated `.csv` and `.record` files |
| 👮‍♀️ eval | will contain the results of evaluation |
| 🖼️ images | will contain the image data set |
| ▶️ inference | will contain the exported models after training |
| 🔽 OIDv4_ToolKit | the cloned dataset downloading tool |
| 👩‍🔧 OpenImagesTool | the cloned annotation conversion tool |
| 👩‍🏫 pre_trained_model | will contain the files of the TensorFlow model that we will retrain |
| 👩‍💻 scripts | will contain the scripts that we will use for the pre-processing and training processes |
| 🚴‍♀️ training | will contain the checkpoints generated during training |
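The project folders from the table (excluding the three cloned repositories) can be created in one command:

```shell
mkdir annotations eval images inference pre_trained_model scripts training
```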
🚀 OpenImages Dataset
🕵️‍♀️ You can get images in various ways.
👩‍🏫 I will show the process of organizing the OpenImages data set.
🗃️ OpenImages is a huge data set that contains annotated images of 600 object classes.
🔍 You can explore images by categories from here
🎨 Downloading By Category
OIDv4_ToolKit is a tool that we can use to download the OpenImages dataset by category and by set (test, train, validation).
💻 To clone and build the project, open CMD and run:
⏬ To start downloading by category:
👮‍♀️ If the object name consists of 2 parts, write it with '_', e.g. `Bell_pepper`.
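With the toolkit cloned, the commands look like this (`Apple` is just an example class; repeat the last command with `--type_csv test` for the test set):

```shell
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit
pip install -r requirements.txt
python main.py downloader --classes Apple --type_csv train
```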
🤹♀️ Image Organization
🔮 OpenImagesTool
👩💻 OpenImagesTool is a tool to convert OpenImages images and annotations to TensorFlow-friendly structure.
🙄 OpenImages provides annotations as `.txt` files in a format like: `<OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>`, which is not compatible with TensorFlow, which requires the VOC (XML) annotation format.
💫 To do that conversion, we can do the following.
💻 To clone and build the project, open CMD and run:
💻 Applying Organizing
🚀 Now, we will convert the images and annotations that we have downloaded and save them to the `images` folder.
👩‍🔬 OpenImagesTool adds validation images to the training set by default; if you want to disable this behavior, add the `-v` flag to the command.
🏷️ Creating Label Map
⛓️ `label_map.pbtxt` is a file that maps object names to their corresponding IDs.
➕ Create a `label_map.pbtxt` file under the annotations folder and open it in a text editor.
🖊️ Write your object names and IDs in the following format.
👮‍♀️ `id: 0` is reserved for the background, so don't use it.
🐞 Related error: ValueError: Label map id 0 is reserved for the background label
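A minimal `label_map.pbtxt` might look like this (the class names here are placeholders for your own):

```
item {
    id: 1
    name: 'class_1'
}
item {
    id: 2
    name: 'class_2'
}
```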
🏭 Generating CSV Files
🔄 Now we have to convert the `.xml` files to `.csv` files.
🔻 Download the xml_to_csv.py script and save it under the `scripts` folder.
💻 Open CMD and run:
👩🔬 Generating train csv file
👩🔬 Generating test csv file
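Under the hood, the conversion flattens each VOC `.xml` annotation into one CSV row per bounding box. The following self-contained sketch illustrates that logic (it is not the exact xml_to_csv.py script):

```python
import xml.etree.ElementTree as ET

# A single VOC-style annotation, as produced by the organizing step above.
SAMPLE_XML = """
<annotation>
  <filename>104.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>class_1</name>
    <bndbox><xmin>284</xmin><ymin>406</ymin><xmax>320</xmax><ymax>492</ymax></bndbox>
  </object>
</annotation>
"""

def xml_to_rows(xml_text):
    """Flatten one VOC annotation into CSV rows: one row per bounding box."""
    root = ET.fromstring(xml_text)
    filename = root.find("filename").text
    size = root.find("size")
    width, height = size.find("width").text, size.find("height").text
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.find("name").text,
                     box.find("xmin").text, box.find("ymin").text,
                     box.find("xmax").text, box.find("ymax").text))
    return rows

print(xml_to_rows(SAMPLE_XML))
```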
👩🏭 Generating TF Records
🙇‍♀️ Now, we will generate the tfrecords that will be used in the training process.
🔻 Download the generate_tfrecords.py script and save it under the `scripts` folder.
👩🔬 Generating train tfrecord
👩🔬 Generating test tfrecord
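The invocation usually looks like the following; the exact flag names depend on the version of generate_tfrecords.py you downloaded, so check its header before running:

```shell
python generate_tfrecords.py --csv_input=annotations\train_labels.csv --output_path=annotations\train.record --img_path=images\train
python generate_tfrecords.py --csv_input=annotations\test_labels.csv --output_path=annotations\test.record --img_path=images\test
```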
🤖 Model Selecting
🎉 The TensorFlow Object Detection Zoo provides a lot of pre-trained models.
🕵️‍♀️ Models differ in terms of accuracy and speed; you can select the suitable model according to your priorities.
💾 Select a model, extract it, and save it under the `pre_trained_model` folder.
👀 Check out my notes here to get insight into the differences between popular models.
👩🔧 Model Configuration
⏬ Downloading config File
😎 We have downloaded the model (pre-trained weights), but now we have to download the configuration file that contains the training parameters and settings.
👮♀️ Every model in TensorFlow Object Detection Zoo has a configuration file presented here
💾 Download the config file that corresponds to the model you have selected and save it under the `training` folder.
👩🔬 Updating config File
You have to update the following lines:
🙄 Take a look at Loss exploding issue
🤹‍♀️ If you give the whole test set to the evaluation process, the shuffle functionality won't affect the results; it will only give you different examples on TensorBoard.
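The exact fields vary per model, but the lines to update typically include these (the paths assume the directory structure above and are illustrative, not mandatory):

```
num_classes: 1  # number of classes in your label map
fine_tune_checkpoint: "pre_trained_model/model.ckpt"
train_input_reader: {
  tf_record_input_reader { input_path: "annotations/train.record" }
  label_map_path: "annotations/label_map.pbtxt"
}
eval_config: {
  num_examples: 100  # number of images in your test set
}
eval_input_reader: {
  tf_record_input_reader { input_path: "annotations/test.record" }
  label_map_path: "annotations/label_map.pbtxt"
}
```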
👶 Training
🎉 Now we are done with all the preparations.
🚀 Let the computer start learning
💻 Open CMD and run:
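A typical invocation of the training script looks like this (`<your_model>.config` is a placeholder for the config file you downloaded):

```shell
python train.py --logtostderr --train_dir=training --pipeline_config_path=training\<your_model>.config
```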
🕐 This process will take a long time (you can take a nap 🤭, but a long nap 🙄).
🕵️‍♀️ While the model is being trained, you will see loss values in CMD.
✋ You can stop the process when the loss reaches a good value (under 1).
👮♀️ Evaluation
🎳 Evaluating Script
🤭 After the training process is done, let's do an exam to see how good (or bad 🙄) our model is doing.
🎩 The following command will run the model on the whole test set and then print the results, so that we can do error analysis.
💻 So, open CMD and run:
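Assuming the same layout as in training, the evaluation command looks like this (`<your_model>.config` is a placeholder):

```shell
python eval.py --logtostderr --checkpoint_dir=training --eval_dir=eval --pipeline_config_path=training\<your_model>.config
```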
👀 Visualizing Results
✨ To see the results on charts and images, we can use TensorBoard for better analysis.
💻 Open CMD and run:
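TensorBoard just needs the directory that holds the event files:

```shell
# training values:
tensorboard --logdir=training
# evaluation values (after running the evaluation script):
tensorboard --logdir=eval
```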
👩🏫 Training Values Visualization
🧐 Here you can see graphs of loss, learning rate, and other values.
🤓 And much more (you can investigate the tabs at the top).
😋 You can also use it while training (and it is exciting 🤩).
👮♀️ Evaluation Values Visualization
👀 Here you can see images from your test set with the corresponding predictions.
🤓 And much more (You can inspect tabs at the top)
❗ You must use this after running the evaluation script.
🔍 See the visualized results at localhost:6006.
🧐 You can inspect the numerical values from the report in the terminal; result example:
🎨 If you want to get a metric report for each class, you have to change the evaluation protocol to pascal metrics by configuring `metrics_set` in the `.config` file:
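For example, in the `eval_config` block:

```
eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
}
```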
👒 Model Exporting
🔧 After the training and evaluation processes are done, we have to export the model in a format that we can use.
🦺 For now, we only have checkpoints, so we have to export a frozen `.pb` file.
💻 So, open CMD and run:
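The exporter script lives in the `object_detection` folder of the models repository; replace `<step>` with the number of your latest checkpoint:

```shell
python export_inference_graph.py --input_type=image_tensor --pipeline_config_path=training\<your_model>.config --trained_checkpoint_prefix=training\model.ckpt-<step> --output_directory=inference
```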
If you are using SSD and planning to convert it to tflite later, you have to run a different exporter.
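In that case, use the SSD-specific exporter, which writes a tflite-compatible `tflite_graph.pb` (placeholders as above):

```shell
python export_tflite_ssd_graph.py --pipeline_config_path=training\<your_model>.config --trained_checkpoint_prefix=training\model.ckpt-<step> --output_directory=inference --add_postprocessing_op=true
```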
📱 Converting to tflite
💁‍♀️ If you want to use the model in mobile apps or tflite-supported embedded devices, you have to convert the `.pb` file to a `.tflite` file.
📙 About TFLite
📱 TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices.
🧐 It enables on-device machine learning inference with low latency and a small binary size.
😎 TensorFlow Lite uses many techniques for this such as quantized kernels that allow smaller and faster (fixed-point math) models.
🍫 Converting Command
💻 To apply converting open CMD and run:
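For an SSD graph exported with export_tflite_ssd_graph.py, the conversion typically looks like this (the 300x300 input shape matches common SSD MobileNet models; adjust it to your model's input size):

```shell
tflite_convert --graph_def_file=inference\tflite_graph.pb --output_file=inference\model.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --inference_type=FLOAT --allow_custom_ops
```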
🐞 Common Issues
🥅 nets module issue
ModuleNotFoundError: No module named 'nets'
This means that there is a problem with setting PYTHONPATH; try to run:
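Assuming the repository is under `E:\models`:

```shell
set PYTHONPATH=E:\models;E:\models\research;E:\models\research\slim
```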
🗃️ tf_slim module issue
ModuleNotFoundError: No module named 'tf_slim'
This means that the tf_slim module is not installed; try to run:
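Installing the module fixes it:

```shell
pip install tf_slim
```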
🗃️ Allocation error
For me, it was fixed by reducing `batch_size` in the `.config` file; it depends on your computational resources.
❗ no such file or directory error
train.py tensorflow.python.framework.errors_impl.notfounderror no such file or directory
🙄 For me, it was a typo in the train.py command.
🤯 LossTensor is inf issue
LossTensor is inf or nan. : Tensor had NaN values
👀 The related discussion is here; it is commonly an annotation problem.
🙄 Maybe some bounding boxes are outside the image boundaries.
🤯 The solution for me was reducing the batch size in the `.config` file.
🙄 Ground truth issue
The following classes have no ground truth examples
👀 Related discussion is here
👩‍🔧 For me, it was a misspelling issue in the `label_map` file.
🙄 Pay attention to lowercase and capital letters.
🏷️ labelmap issue
ValueError: Label map id 0 is reserved for the background label
👮‍♀️ `id: 0` is reserved for the background; we cannot use it for objects.
🆔 Start IDs from 1.
🔦 No Variable to Save issue
Value Error: No Variable to Save
👀 Related solution is here
👩‍🔧 Adding the following line to the `.config` file solved the problem:
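The line in question goes inside the `train_config` block:

```
from_detection_checkpoint: true
```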
🧪 pycocotools module issue
ModuleNotFoundError: No module named 'pycocotools'
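Installing the missing package usually fixes it (on Windows this may need a C++ build toolchain; some people install a prebuilt wheel instead):

```shell
pip install pycocotools
```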
🥴 pycocotools type error issue
pycocotools typeerror: object of type cannot be safely interpreted as an integer.
👩‍🔧 I solved the problem by editing the following lines in the `cocoeval.py` script under the pycocotools package (by adding a cast).
👮‍♀️ Make sure that you are editing the package in your env, not in another env.
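The affected lines are in `Params.setDetParams`, where `np.linspace` used to receive a float count; the fix is wrapping the computed count in `int(...)`, as sketched here:

```python
import numpy as np

# Original cocoeval.py passed np.round(...) + 1 (a float) as the number of
# samples, which newer numpy versions reject. Casting to int fixes it:
iouThrs = np.linspace(.5, 0.95,
                      int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
recThrs = np.linspace(.0, 1.00,
                      int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
print(len(iouThrs), len(recThrs))  # 10 101
```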
💣 Loss Exploding
🙄 For me, there were 2 problems:
First: some of the annotations were wrong and overflowed the image (e.g. xmax > width).
I could check that by inspecting the `.csv` file. Example:
| filename | width | height | class | xmin | ymin | xmax | ymax |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 104.jpg | 640 | 480 | class_1 | 284 | 406 | 320 | 492 |
Second: the learning rate in the `.config` file is too big (the default value was big 🙄).
The following values are valid and tested on `mobilenet_ssd_v1_quantized` (not very good 🙄).
🥴 Getting convolution Failure
It may be a CUDA version incompatibility issue.
For me, it was a memory issue, and I solved it by adding the following line to the train.py script:
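The line in question enables gradual GPU memory allocation; in train.py it is applied to the session config that is passed to the trainer (a sketch, assuming TF 1.x):

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front.
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True
```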
📦 Invalid box data error
🙄 For me, it was a logical error: in `test_labels.csv` there were some invalid values like:
`file123.jpg,134,63,3,0,0,-1029,-615`
🏷 So, it was a labeling issue; fixing these lines solved the problem.
🔄 Image with id added issue
☝ It is an issue in the `.config` file caused by giving `num_examples` a value greater than the total number of test images in the test directory.
🧐 References