# TensorFlow Object Detection API
**Training a Custom Object Detector Step by Step**
## Introduction

- The TensorFlow Object Detection API is a powerful tool that lets us create custom object detectors on top of pre-trained, fine-tuned models, even without a strong AI background or deep TensorFlow knowledge.
- Building models on top of pre-trained models saves a lot of time and labor, since we reuse models that may have been trained for weeks on very powerful machines; this principle is called Transfer Learning.
- As a dataset, I will show you how to use the OpenImages dataset and convert its data to a TensorFlow-friendly format.
- You can find this article on Medium too.
## Development Pipeline

If you get errors while applying the instructions, check out the Common Issues section at the end of the article.
## Environment Preparation
### Environment Info

| Platform | Version |
| --- | --- |
| Python version | 3.7 |
| TensorFlow version | 1.15 |
### Conda Environment Setup

#### Creating a new environment

Install Anaconda, then open CMD and run:
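A minimal sketch; the environment name `tf1` is just a placeholder:

```
conda create -n tf1 python=3.7
```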
#### Activating the new environment
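Assuming the environment was named `tf1` as above:

```
conda activate tf1
```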
### Installing Packages
#### GPU vs CPU Computing

| CPU | GPU |
| --- | --- |
| The brain of the computer | The brawn of the computer |
| Very few complex cores | Hundreds of simpler cores with a parallel architecture |
| Optimized for single-thread performance | Thousands of concurrent hardware threads |
| Can do a bit of everything, but not great at much | Good for math-heavy processes |
#### Installing TensorFlow
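For the version listed above, one of the following (pick the GPU build only if you have a compatible CUDA/cuDNN setup):

```
:: CPU-only build
pip install tensorflow==1.15

:: or, for GPU support
pip install tensorflow-gpu==1.15
```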
#### Installing other packages
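The exact list depends on your setup; this is the set commonly needed by the TF1 Object Detection API:

```
pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python
```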
### Downloading the Models Repository

#### Cloning from GitHub

The tensorflow/models repository contains the utilities required for the training and evaluation process.

Open CMD in the E disk and run the clone command below.

Note: I assume in the rest of the article that you are running your commands under the E disk.
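A sketch of the clone step, run from the E disk assumed throughout this article:

```
E:
git clone https://github.com/tensorflow/models.git
```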
### Compiling Protobufs
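The API's `.proto` files must be compiled to Python. Run this from `models\research` (protoc must be installed and on your PATH; plain CMD may not expand the `*.proto` wildcard, in which case compile the files one by one or use PowerShell):

```
cd E:\models\research
protoc object_detection/protos/*.proto --python_out=.
```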
### Compiling Packages
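Still inside `models\research`, build and install the object_detection package:

```
cd E:\models\research
python setup.py build
python setup.py install
```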
### Setting PYTHONPATH Temporarily

Note: every time you open a new CMD window you have to set PYTHONPATH again.
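Assuming the repository was cloned to `E:\models`:

```
set PYTHONPATH=E:\models;E:\models\research;E:\models\research\slim
```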
### Installation Test

Check that everything is set up correctly by running the test command below and comparing its output with the expected result.
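One common check is the model builder test that ships with the repository; if everything is installed correctly, the tests should finish with an `OK` status:

```
cd E:\models\research
python object_detection\builders\model_builder_test.py
```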
## Image Acquiring
### Directory Structure

I assume that you have created a structure like:

| Folder | Description |
| --- | --- |
| models | the cloned tensorflow/models repository |
| annotations | will contain the generated `.csv` and `.record` files |
| eval | will contain the results of evaluation |
| images | will contain the image data set |
| inference | will contain the exported models after training |
| OIDv4_ToolKit | the cloned OIDv4_ToolKit repository (dataset downloader) |
| OpenImagesTool | the cloned OpenImagesTool repository (dataset converter) |
| pre_trained_model | will contain the files of the TensorFlow model that we will retrain |
| scripts | will contain the scripts that we will use for pre-processing and training |
| training | will contain the checkpoints generated during training |
### OpenImages Dataset

You can get images in various ways; here I will show the process of organizing the OpenImages dataset.

OpenImages is a huge dataset that contains annotated images of 600 object classes. You can explore the images by category from here.
### Downloading by Category

OIDv4_ToolKit is a tool that we can use to download the OpenImages dataset by category and by set (train, test, validation).

To clone and build the project, open CMD and run:
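Assuming the commonly used EscVM/OIDv4_ToolKit repository (adjust the URL if you use a fork):

```
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit
pip install -r requirements.txt
```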
To start downloading by category:
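For example, to download the class `Apple` (the class name here is only an example):

```
:: training images
python main.py downloader --classes Apple --type_csv train

:: test images
python main.py downloader --classes Apple --type_csv test
```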
Note: if an object name consists of 2 parts, write it with '_', e.g. `Bell_pepper`.
### Image Organization

#### OpenImagesTool

OpenImagesTool is a tool that converts OpenImages images and annotations to a TensorFlow-friendly structure.
OpenImages provides annotations as `.txt` files in a format like `<OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>`, which is not compatible with TensorFlow, since TensorFlow expects VOC-style annotations. OpenImagesTool does that conversion for us.

To clone and build the project, open CMD and run:
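A sketch of the clone step; the URL is not reproduced here, use the URL of the OpenImagesTool project you are following:

```
git clone <URL of the OpenImagesTool repository>
cd OpenImagesTool
```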
#### Applying the Organization

Now we will convert the images and annotations that we have downloaded and save them to the `images` folder, using a command like the sketch below.

Note: OpenImagesTool adds validation images to the training set by default; if you want to disable this behavior, add the `-v` flag to the command.
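A hypothetical invocation; the script name and the `-i`/`-o` flags below are placeholders (check the tool's README for the real interface), only the `-v` behavior is described in this article:

```
:: placeholders: input = OIDv4_ToolKit download folder, output = the images folder
python openimagestool.py -i E:\OIDv4_ToolKit\OID\Dataset -o E:\images

:: add -v to keep validation images out of the training set (see the note above)
python openimagestool.py -i E:\OIDv4_ToolKit\OID\Dataset -o E:\images -v
```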
### Creating the Label Map

`label_map.pbtxt` is a file that maps object names to their corresponding IDs.

Create a `label_map.pbtxt` file under the `annotations` folder, open it in a text editor, and write your object names and IDs in the following format.
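For example, for two classes (the class names are placeholders):

```
item {
  id: 1
  name: 'Apple'
}

item {
  id: 2
  name: 'Bell_pepper'
}
```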
Note: `id: 0` is reserved for the background, so don't use it.

Related error: `ValueError: Label map id 0 is reserved for the background label`
### Generating CSV Files

Now we have to convert the `.xml` files to `.csv` files.

Download the xml_to_csv.py script and save it under the `scripts` folder, then open CMD and run the commands below.
Generating the train csv file:
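A sketch; the `-i`/`-o` flags are assumptions based on common versions of this script, so check its usage message:

```
cd E:\scripts
python xml_to_csv.py -i E:\images\train -o E:\annotations\train_labels.csv
```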
Generating the test csv file:
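And similarly for the test set:

```
python xml_to_csv.py -i E:\images\test -o E:\annotations\test_labels.csv
```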
### Generating TF Records

Now we will generate the tfrecord files that will be used in the training process.

Download the generate_tfrecords.py script and save it under the `scripts` folder.
Generating the train tfrecord:
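A sketch with commonly used flag names (verify them against the script you downloaded; some versions also take a label map path):

```
python generate_tfrecords.py --csv_input=E:\annotations\train_labels.csv --image_dir=E:\images\train --output_path=E:\annotations\train.record
```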
Generating the test tfrecord:
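And for the test set:

```
python generate_tfrecords.py --csv_input=E:\annotations\test_labels.csv --image_dir=E:\images\test --output_path=E:\annotations\test.record
```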
## Model Selecting

The TensorFlow Object Detection Zoo provides a lot of pre-trained models.

Models differ in terms of accuracy and speed; you can select the suitable model according to your priorities.

Select a model, extract it and save it under the `pre_trained_model` folder.

Check out my notes here to get insight into the differences between popular models.
## Model Configuration

### Downloading the config File

We have downloaded the model (pre-trained weights), but now we have to download the configuration file that contains the training parameters and settings.

Every model in the TensorFlow Object Detection Zoo has a configuration file presented here.

Download the config file that corresponds to the model you have selected and save it under the `training` folder.
### Updating the config File

You have to update the following lines (see the sketch below):
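A sketch of the fields that typically need editing; the paths and numbers are placeholders for your own setup:

```
# inside model { ... }:
num_classes: 1                                  # number of your object classes

# inside train_config { ... }:
batch_size: 24                                  # reduce it if you hit memory errors
fine_tune_checkpoint: "E:/pre_trained_model/model.ckpt"

train_input_reader {
  label_map_path: "E:/annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "E:/annotations/train.record"
  }
}

eval_config {
  num_examples: 100                             # number of images in your test set
}

eval_input_reader {
  label_map_path: "E:/annotations/label_map.pbtxt"
  shuffle: false
  tf_record_input_reader {
    input_path: "E:/annotations/test.record"
  }
}
```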
Take a look at the Loss exploding issue.

If you give the whole test set to the evaluation process, then the shuffle setting won't affect the results; it will only give you different examples on TensorBoard.
## Training

Now we have done all the preparations, so let the computer start learning.

Open CMD and run:
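A sketch using the legacy `train.py` from the models repository (found under `object_detection/legacy` in recent TF1 releases); the config file name is a placeholder:

```
cd E:\models\research\object_detection\legacy
python train.py --logtostderr --train_dir=E:\training --pipeline_config_path=E:\training\your_model.config
```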
This process will take a long time (you can take a nap, but a long nap).

While the model is being trained you will see the loss values on CMD. You can stop the process when the loss reaches a good value (under 1).
## Evaluation

### Evaluating Script

After the training process is done, let's run an exam to find out how good (or bad) our model is.

The following command will run the model on the whole test set and then print the results, so that we can do error analysis.

Open CMD and run:
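A sketch using the legacy `eval.py` (in the same folder as `train.py`); the paths are placeholders:

```
python eval.py --logtostderr --pipeline_config_path=E:\training\your_model.config --checkpoint_dir=E:\training --eval_dir=E:\eval
```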
## Visualizing Results

To see the results as charts and images we can use TensorBoard for better analysis.

Open CMD and run:
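For example (point `--logdir` at the training folder for training values, or at the eval folder for evaluation results):

```
:: training values
tensorboard --logdir=E:\training

:: evaluation values (after running the evaluation script)
tensorboard --logdir=E:\eval
```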
### Training Values Visualization

Here you can see graphs of the loss, the learning rate and other values, and much more (you can investigate the tabs at the top).

It is feasible to use it while training (and exciting).
### Evaluation Values Visualization

Here you can see images from your test set with the corresponding predictions, and much more (you can inspect the tabs at the top).

Note: you must use this after running the evaluation script.

See the visualized results on localhost:6006.

You can also inspect the numerical values of the report on the terminal.
If you want to get a metric report for each class, change the evaluation protocol to Pascal metrics by configuring `metrics_set` in the `.config` file:
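Inside the `eval_config` block:

```
eval_config {
  metrics_set: "pascal_voc_detection_metrics"
}
```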
## Model Exporting

After the training and evaluation processes are done, we have to export the model into a format that we can actually use.

For now we only have checkpoints, so we have to export a `.pb` file. Open CMD and run:
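A sketch using `export_inference_graph.py` from the `object_detection` folder; `XXXX` stands for the checkpoint step number you want to export:

```
cd E:\models\research\object_detection
python export_inference_graph.py --input_type=image_tensor --pipeline_config_path=E:\training\your_model.config --trained_checkpoint_prefix=E:\training\model.ckpt-XXXX --output_directory=E:\inference
```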
If you are using an SSD model and plan to convert it to tflite later, you have to run the tflite export script instead (see the sketch below).
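A sketch using `export_tflite_ssd_graph.py` (also in the `object_detection` folder):

```
python export_tflite_ssd_graph.py --pipeline_config_path=E:\training\your_model.config --trained_checkpoint_prefix=E:\training\model.ckpt-XXXX --output_directory=E:\inference --add_postprocessing_op=true
```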
## Converting to tflite

If you want to use the model in mobile apps or on tflite-supported embedded devices, you have to convert the `.pb` file to a `.tflite` file.
### About TFLite

TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded devices. It enables on-device machine learning inference with low latency and a small binary size. TensorFlow Lite uses many techniques for this, such as quantized kernels that allow smaller and faster (fixed-point math) models.

Official site
### Converting Command

To apply the conversion, open CMD and run:
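A sketch for a quantized SSD graph exported with `export_tflite_ssd_graph.py`; the input shape, the array names and the quantization flags depend on your model, so treat these values as a starting point rather than a definitive command:

```
tflite_convert ^
  --graph_def_file=E:\inference\tflite_graph.pb ^
  --output_file=E:\inference\detect.tflite ^
  --input_shapes=1,300,300,3 ^
  --input_arrays=normalized_input_image_tensor ^
  --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 ^
  --inference_type=QUANTIZED_UINT8 ^
  --mean_values=128 --std_dev_values=128 ^
  --allow_custom_ops
```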
## Common Issues
### nets module issue

`ModuleNotFoundError: No module named 'nets'`

This means that there is a problem with setting `PYTHONPATH`; try to run:
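Assuming the models repo is at `E:\models`, as earlier:

```
set PYTHONPATH=E:\models;E:\models\research;E:\models\research\slim
```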
### tf_slim module issue

`ModuleNotFoundError: No module named 'tf_slim'`

This means that the tf_slim module is not installed; try to run:
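```
pip install tf_slim
```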
### Allocation error

For me it was fixed by reducing `batch_size` in the `.config` file; it is related to your computational resources.
### No such file or directory error

`train.py tensorflow.python.framework.errors_impl.NotFoundError: no such file or directory`

For me it was a typo in the train.py command.
### LossTensor is inf issue

`LossTensor is inf or nan. : Tensor had NaN values`

The related discussion is here; it is commonly an annotation problem, e.g. some bounding boxes lying outside the image boundaries.

The solution for me was reducing the batch size in the `.config` file.
### Ground truth issue

`The following classes have no ground truth examples`

The related discussion is here. For me it was a misspelling issue in the `label_map` file; pay attention to lowercase and capital letters.
### Label map issue

`ValueError: Label map id 0 is reserved for the background label`

`id: 0` is reserved for the background, so we can not use it for objects; start your IDs from 1.
### No Variable to Save issue

`ValueError: No variable to save`

The related solution is here. Adding the following line to the `.config` file solved the problem:
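The commonly cited fix is this line inside `train_config` (treat it as an assumption and check the linked solution for your case):

```
from_detection_checkpoint: true
```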
### pycocotools module issue

`ModuleNotFoundError: No module named 'pycocotools'`
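Installing the package usually fixes it; on Windows the prebuilt `pycocotools-windows` wheel is often easier to install:

```
pip install pycocotools

:: or, on Windows:
pip install pycocotools-windows
```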
### pycocotools type error issue

`pycocotools TypeError: object of type cannot be safely interpreted as an integer.`

I solved the problem by editing the following lines in the `cocoeval.py` script under the pycocotools package (by adding casting).

Note: make sure that you are editing the package in your env, not in another env.
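The affected lines are in `Params.setDetParams` inside `cocoeval.py`; the fix is to cast the `np.round(...)` result to `int` (shown from memory, your local line numbers may differ):

```python
# before (raises the TypeError on newer numpy versions):
# self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
# self.recThrs = np.linspace(.0, 1.00, np.round((1.00 - .0) / .01) + 1, endpoint=True)

# after (cast added):
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
```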
### Loss Exploding

For me there were 2 problems.

First: some of the annotations were wrong and overflowed the image (e.g. xmax > width). I could check that by inspecting the `.csv` file. Example:
| filename | width | height | class | xmin | ymin | xmax | ymax |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 104.jpg | 640 | 480 | class_1 | 284 | 406 | 320 | 492 |
Second: the learning rate in the `.config` file was too big (the default value was big). The values I ended up with are valid and tested on `mobilenet_ssd_v1_quantized` (though not very good).
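As an illustration of where these values live, the relevant block of `train_config` looks like the sketch below; lowering `learning_rate_base` (and the warmup rate) is what tames the loss. The numbers are placeholders, not the tested values, so tune them for your own training:

```
optimizer {
  momentum_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.004     # placeholder: much lower than the default
        total_steps: 25000
        warmup_learning_rate: 0.001   # placeholder
        warmup_steps: 1000
      }
    }
  }
}
```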
### Getting convolution failure

It may be a CUDA version incompatibility issue. For me it was a memory issue, and I solved it by adding the following line to the train.py script.
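One common way to do that in TF 1.x is to allow GPU memory growth near the top of `train.py`; this is an equivalent workaround, not necessarily the exact line used here:

```python
import os

# let TensorFlow allocate GPU memory gradually instead of grabbing it all at once
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
```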
### Invalid box data error

For me it was a logical error: in `test_labels.csv` there were some invalid values like:

`file123.jpg,134,63,3,0,0,-1029,-615`

So, it was a labeling issue; fixing these lines solved the problem.

Related discussion
### Image with id added issue

It is an issue in the `.config` file caused by setting `num_examples` to a value greater than the total number of test images in the test directory.
## References