# TensorFlow Object Detection API: Training a Custom Object Detector Step by Step
## Introduction

The TensorFlow Object Detection API is a powerful tool that lets us create custom object detectors on top of pre-trained, fine-tuned models, even without a strong AI background or deep TensorFlow knowledge.

Building on pre-trained models saves us a lot of time and labor, since we reuse models that may have been trained for weeks on very powerful machines; this principle is called Transfer Learning.

As a dataset, I will show you how to use the OpenImages dataset and convert its data to a TensorFlow-friendly format.

You can find this article on Medium too.
## Development Pipeline

The sections below walk through the whole pipeline: preparing the environment, acquiring and organizing images, selecting and configuring a model, training, evaluation, exporting, and conversion to tflite.
## Environment Preparation

### Environment Info

| Platform | Version |
| --- | --- |
| Python | 3.7 |
| TensorFlow | 1.15 |
### Conda Env Setup

#### Create a new env

Install Anaconda, then open CMD and run:

```bash
# conda create -n <ENV_NAME> python=<REQUIRED_VERSION>
conda create -n tf1 python=3.7
```

#### Activate the new env

```bash
# conda activate <ENV_NAME>
conda activate tf1
```
### Install Packages

#### GPU vs CPU Computing

| CPU | GPU |
| --- | --- |
| Brain of the computer | Brawn of the computer |
| Very few complex cores | Hundreds of simpler cores with a parallel architecture |
| Optimized for single-thread performance | Thousands of concurrent hardware threads |
| Can do a bit of everything, but not great at much | Great for math-heavy processing |
#### Installing TensorFlow

```bash
conda install tensorflow-gpu=1.15
```
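It is worth confirming right away that this build can actually see your GPU instead of silently falling back to the CPU; a quick check from Python (a sketch using the TF 1.x API, assuming the install above succeeded):

```python
import tensorflow as tf

print(tf.__version__)              # expecting 1.15.x
print(tf.test.is_gpu_available())  # False means TF will fall back to the CPU
```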
#### Installing other packages

```bash
conda install pillow Cython lxml jupyter matplotlib
conda install -c anaconda protobuf
```
### Downloading the models repository

#### Cloning from GitHub

tensorflow/models is a repository that contains the utils required for the training and evaluation process. I assume that you are running your commands under the E: disk. Open CMD and run:

```bash
# note that every time you open CMD you have
# to activate your env again by running:
# under E:\>
conda activate tf1
git clone https://github.com/tensorflow/models.git
cd models/research
```
#### Compiling Protobufs

```bash
# under (tf1) E:\models\research>
for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.
```
#### Compiling Packages

```bash
# under (tf1) E:\models\research>
python setup.py build
python setup.py install
```
#### Setting PYTHONPATH Temporarily

```bash
# under (tf1) E:\models\research> or anywhere
set PYTHONPATH=E:\models\research;E:\models\research\slim
```
### Installation Test

Check that everything is done correctly:

```bash
# under (tf1) E:\models\research>
python object_detection/builders/model_builder_tf1_test.py
```

Expected output:

```
Ran 17 tests in 0.833s

OK (skipped=1)
```
## Image Acquiring

### Directory Structure

I assume that you have created a structure like:

```
E:
|___ models
|___ demo
     |___ annotations
     |___ eval
     |___ images
     |___ inference
     |___ OIDv4_ToolKit
     |___ OpenImagesTool
     |___ pre_trained_model
     |___ scripts
     |___ training
```
| Folder | Description |
| --- | --- |
| models | the [tensorflow/models](https://github.com/tensorflow/models) repo |
| annotations | will contain the generated .csv and .record files |
| eval | will contain the results of evaluation |
| images | will contain the image dataset |
| inference | will contain the exported models after training |
| OIDv4_ToolKit | the [OIDv4_ToolKit](https://github.com/EscVM/OIDv4_ToolKit) repo (OpenImages downloader) |
| OpenImagesTool | the [OpenImagesTool](https://github.com/asmaamirkhan/OpenImagesTool) repo (OpenImages organizer) |
| pre_trained_model | will contain the files of the TensorFlow model that we will retrain |
| scripts | will contain the scripts we will use for pre-processing and training |
| training | will contain the checkpoints generated during training |
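If you prefer to create the demo skeleton programmatically instead of by hand, a minimal Python sketch like this works (the repo folders come from git clone later, so they are skipped; the paths are assumptions matching the tree above):

```python
import os

ROOT = r"E:\demo"
# models, OIDv4_ToolKit and OpenImagesTool are created by git clone
FOLDERS = ["annotations", "eval", "images", "inference",
           "pre_trained_model", "scripts", "training"]

for folder in FOLDERS:
    os.makedirs(os.path.join(ROOT, folder), exist_ok=True)
```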
### OpenImages Dataset

You can get images in various ways; here I will show the process of organizing the OpenImages dataset. OpenImages is a huge dataset that contains annotated images of 600 object classes. You can explore the images by category here.

#### Downloading by Category

OIDv4_ToolKit is a tool that we can use to download the OpenImages dataset by category and by set (test, train, validation).

To clone and build the project, open CMD and run:

```bash
# under (tf1) E:\demo>
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit
# under (tf1) E:\demo\OIDv4_ToolKit>
pip install -r requirements.txt
```

To start downloading by category:

```bash
# python main.py downloader --classes <OBJECT_LIST> --type_csv <TYPE>
# TYPE: all | test | train | validation
# under (tf1) E:\demo\OIDv4_ToolKit>
python main.py downloader --classes Apple Orange --type_csv validation
```

If an object name consists of two words, join them with '_', e.g. Bell_pepper.
### Image Organization

#### OpenImagesTool

OpenImagesTool is a tool that converts OpenImages images and annotations to a TensorFlow-friendly structure.

OpenImages provides annotations as .txt files in a format like `<OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX>`, which is not compatible with the TensorFlow Object Detection API, which expects the VOC annotation format. To do that conversion we can do the following.

To clone and build the project, open CMD and run:

```bash
# under (tf1) E:\demo>
git clone https://github.com/asmaamirkhan/OpenImagesTool.git
cd OpenImagesTool/src
```

#### Applying Organizing

Now we will convert the images and annotations that we have downloaded and save them to the images folder:

```bash
# under (tf1) E:\demo\OpenImagesTool\src>
# python script.py -i <INPUT_PATH> -o <OUTPUT_PATH>
python script.py -i E:\demo\OIDv4_ToolKit\OID\Dataset -o E:\demo\images
```
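For intuition, here is a minimal sketch (not the tool's actual code) of the per-box conversion the tool performs, turning one OIDv4 annotation line into a VOC-style XML object entry; it assumes single-word class names and absolute pixel coordinates:

```python
def oid_line_to_voc_object(line):
    # OIDv4 line format: <OBJECT_NAME> <XMIN> <YMIN> <XMAX> <YMAX> (absolute pixels)
    name, xmin, ymin, xmax, ymax = line.split()
    coords = [int(float(v)) for v in (xmin, ymin, xmax, ymax)]
    return ("<object>\n"
            "  <name>{}</name>\n"
            "  <bndbox>\n"
            "    <xmin>{}</xmin><ymin>{}</ymin><xmax>{}</xmax><ymax>{}</ymax>\n"
            "  </bndbox>\n"
            "</object>").format(name, *coords)

print(oid_line_to_voc_object("Apple 34.2 50.0 120.8 200.1"))
```

A real converter also writes the surrounding `<annotation>` element with the image path and size, and handles multiple boxes per image.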
## Creating the Label Map

`label_map.pbtxt` is a file that maps object names to corresponding IDs. Create a `label_map.pbtxt` file under the annotations folder, open it in a text editor and write your object names and IDs in the following format:

```
item {
  id: 1
  name: 'Hamster'
}

item {
  id: 2
  name: 'Apple'
}
```
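Typos here bite later (see the ground truth issue under Common Issues), so it is worth parsing the file right away with the API's own utility; a small sketch, with the path as an assumption:

```python
from object_detection.utils import label_map_util

# complains if the file is malformed; prints {1: {'id': 1, 'name': 'Hamster'}, ...}
category_index = label_map_util.create_category_index_from_labelmap(
    r"E:\demo\annotations\label_map.pbtxt", use_display_name=True)
print(category_index)
```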
## Generating CSV Files

Now we have to convert the .xml files to .csv files. Download the xml_to_csv.py script and save it under the scripts folder, then open CMD and run:

Generating the train csv file:

```bash
# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\train -o E:\demo\annotations\train_labels.csv
```

Generating the test csv file:

```bash
# under (tf1) E:\demo\scripts>
python xml_to_csv.py -i E:\demo\images\test -o E:\demo\annotations\test_labels.csv
```
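A quick sanity check of the generated CSV pays off before the next step; this sketch assumes pandas is installed (pip install pandas) and uses the columns the script emits:

```python
import pandas as pd

df = pd.read_csv(r"E:\demo\annotations\train_labels.csv")

# one row per bounding box; class names must match label_map.pbtxt exactly
print(df["class"].value_counts())
print("images:", df["filename"].nunique(), "boxes:", len(df))
```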
## Generating TF Records

Now we will generate the tfrecords that will be used in the training process. Download the generate_tfrecords.py script and save it under the scripts folder.

Generating the train tfrecord:

```bash
# under (tf1) E:\demo\scripts>
# python generate_tfrecords.py --label_map=<PATH_TO_LABEL_MAP>
# --csv_input=<PATH_TO_CSV_FILE> --img_path=<PATH_TO_IMAGE_FOLDER>
# --output_path=<PATH_TO_OUTPUT_FILE>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\train_labels.csv --img_path=E:\demo\images\train --output_path=E:\demo\annotations\train.record
```

Generating the test tfrecord:

```bash
# under (tf1) E:\demo\scripts>
python generate_tfrecords.py --label_map=E:/demo/annotations/label_map.pbtxt --csv_input=E:\demo\annotations\test_labels.csv --img_path=E:\demo\images\test --output_path=E:\demo\annotations\test.record
```
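The .record files are opaque binaries, so it is reassuring to count what landed inside; the TF1 record iterator does the job, and the test-set count is also the value that num_examples in the config (next sections) refers to:

```python
import tensorflow as tf

path = r"E:\demo\annotations\test.record"
# iterate once over the file, counting serialized examples
count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
print(count, "examples in", path)
```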
## Model Selecting

The TensorFlow Object Detection Zoo provides a lot of pre-trained models. Models differ in terms of accuracy and speed, so you can select the most suitable model according to your priorities. Select a model, extract it and save it under the pre_trained_model folder. Check out my notes here to get insight into the differences between popular models.
## Model Configuration

### Downloading the Config File

We have downloaded the model (the pre-trained weights), but now we have to download the configuration file that contains the training parameters and settings. Every model in the TensorFlow Object Detection Zoo has a configuration file presented here. Download the config file that corresponds to the model you have selected and save it under the training folder.
### Updating the Config File

You have to update the following lines (comments use the # syntax of the protobuf text format):

```
# set num_classes to the total number of classes you have
num_classes: 1

# path of the pre-trained checkpoint
fine_tune_checkpoint: "E:/demo/pre_trained_model/ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18/model.ckpt"

# path to the train tfrecord
tf_record_input_reader {
  input_path: "E:/demo/annotations/train.record"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  # number of images that will be used in the evaluation process;
  # I suggest setting it to the size of the test set to get accurate results
  num_examples: 11193
}

eval_input_reader: {
  tf_record_input_reader {
    # path to the test tfrecord
    input_path: "E:/demo/annotations/test.record"
  }
  # path to the label map
  label_map_path: "E:/demo/annotations/label_map.pbtxt"
  # set it to true if you want to shuffle the test set at each evaluation
  shuffle: false
  num_readers: 1
}
```
## Training

Now we have done all the preparations; let the computer start learning! Open CMD and run:

```bash
# under (tf1) E:\models\research\object_detection\legacy>
# python train.py --train_dir=<DIRECTORY_TO_SAVE_CHECKPOINTS>
# --pipeline_config_path=<PATH_TO_CONFIG_FILE>
python train.py --train_dir=E:/demo/training --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config
```

This process will take a long time (you can take a nap, but make it a long one). While the model is being trained you will see the loss values in CMD; you can stop the process when the loss reaches a good value (under 1).
## Evaluation

### Evaluating Script

After the training process is done, let's run an exam to see how well (or how badly) our model is doing. The following command will run the model on the whole test set and then print the results, so that we can do error analysis. Open CMD and run:

```bash
# under (tf1) E:\models\research\object_detection\legacy>
# python eval.py --logtostderr --pipeline_config_path=<PATH_TO_CONFIG_FILE>
# --checkpoint_dir=<DIRECTORY_OF_CHECKPOINTS> --eval_dir=<DIRECTORY_TO_SAVE_EVAL_RESULTS>
python eval.py --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --checkpoint_dir=E:/demo/training --eval_dir=E:/demo/eval
```
### Visualizing Results

To see the results as charts and images we can use TensorBoard for better analysis. Open CMD and run the commands below.

#### Training Values Visualization

Here you can see graphs of the loss, the learning rate and other values, and much more (investigate the tabs at the top). You can also use it while training (and it is exciting):

```bash
# under (tf1) E:\>
tensorboard --logdir=E:/demo/training
```

#### Evaluation Values Visualization

Here you can see images from your test set with the corresponding predictions, and much more (inspect the tabs at the top). Use this after running the evaluation script:

```bash
# under (tf1) E:\>
tensorboard --logdir=E:/demo/eval
```

See the visualized results at localhost:6006. You can also inspect the numerical values of the report in the terminal; example output:
```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.708
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.868
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.767
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.779
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.703
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.824
```
If you want to get a metric report for each class, you have to change the evaluation protocol to PASCAL metrics by configuring `metrics_set` in the .config file:

```
eval_config: {
  ...
  metrics_set: "weighted_pascal_voc_detection_metrics"
  ...
}
```
## Model Exporting

After the training and evaluation processes are done, we have to export the model to a format we can actually use. For now we only have checkpoints, so we have to export a .pb file. Open CMD and run:

```bash
# under (tf1) E:\models\research\object_detection>
# python export_inference_graph.py --input_type image_tensor
# --pipeline_config_path <PATH_TO_CONFIG_FILE>
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_inference_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant
```
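As a smoke test of the exported model, a minimal TF1 sketch that loads the frozen graph and runs it on one image looks like this; the tensor names are the standard ones the Object Detection API exports, while the file paths are assumptions:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

PB_PATH = r"E:\demo\inference\ssd_v1_quant\frozen_inference_graph.pb"

# load the frozen graph into a fresh tf.Graph
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PB_PATH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

# shape (1, H, W, 3), uint8
image = np.expand_dims(np.array(Image.open(r"E:\demo\images\test\104.jpg")), 0)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})

print(scores[0][:5])  # confidences of the top detections
```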
If you are using an SSD and planning to convert it to tflite later, you have to run this instead:

```bash
# under (tf1) E:\models\research\object_detection>
# python export_tflite_ssd_graph.py --input_type image_tensor
# --pipeline_config_path <PATH_TO_CONFIG_FILE>
# --trained_checkpoint_prefix <PATH_TO_LAST_CHECKPOINT>
# --output_directory <PATH_TO_SAVE_EXPORTED_MODEL>
python export_tflite_ssd_graph.py --input_type image_tensor --pipeline_config_path=E:/demo/training/ssd_mobilenet_v1_quantized_300x300_coco14_sync.config --trained_checkpoint_prefix E:/demo/training/model.ckpt-16438 --output_directory E:/demo/inference/ssd_v1_quant
```
## Converting to tflite

If you want to use the model in mobile apps or on tflite-supported embedded devices, you have to convert the .pb file to a .tflite file.

### About TFLite

TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded devices. It enables on-device machine learning inference with low latency and a small binary size. TensorFlow Lite uses many techniques to achieve this, such as quantized kernels that allow smaller and faster (fixed-point math) models. See the official site.
### Converting Command

To apply the conversion, open CMD and run:

```bash
# under (tf1) E:\>
# toco --graph_def_file=<PATH_TO_PB_FILE>
# --output_file=<PATH_TO_SAVE> --input_shapes=<INPUT_SHAPES>
# --input_arrays=<INPUT_ARRAYS> --output_arrays=<OUTPUT_ARRAYS>
# --inference_type=<QUANTIZED_UINT8|FLOAT> --change_concat_input_ranges=<true|false>
# --allow_custom_ops
# args for QUANTIZED_UINT8 inference:
# --mean_values=<MEAN_VALUES> --std_dev_values=<STD_DEV_VALUES>
toco --graph_def_file=E:\demo\inference\ssd_v1_quant\tflite_graph.pb --output_file=E:\demo\tflite\ssd_mobilenet.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_custom_ops
```
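Before shipping the .tflite file to a device, you can confirm it loads and runs on the desktop with the TF Lite interpreter; a sketch with a dummy input (shapes and dtypes are read from the model itself):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=r"E:\demo\tflite\ssd_mobilenet.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)  # expecting shape [1, 300, 300, 3], dtype uint8

# a zero image is enough to prove the graph executes end to end
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print([interpreter.get_tensor(o["index"]).shape for o in output_details])
```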
## Common Issues

### nets module issue

```
ModuleNotFoundError: No module named 'nets'
```

This means that there is a problem with your PYTHONPATH setting; try running:

```bash
(tf1) E:\models\research>set PYTHONPATH=E:\models\research;E:\models\research\slim
```

### tf_slim module issue

```
ModuleNotFoundError: No module named 'tf_slim'
```

This means that the tf_slim module is not installed; try running:

```bash
(tf1) E:\models\research>pip install tf_slim
```
### Allocation error

```
2020-08-11 17:44:00.357710: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit:        10661327
InUse:        10656704
MaxInUse:     10657688
NumAllocs:        2959
MaxAllocSize:  3045064
```

For me this was fixed by reducing batch_size in the .config file; the value your machine can handle depends on your computational resources:

```
train_config: {
  ....
  batch_size: 128
  ....
}
```
### No such file or directory error

```
train.py tensorflow.python.framework.errors_impl.notfounderror no such file or directory
```

For me it was a typo in the train.py command.

### LossTensor is inf issue

```
LossTensor is inf or nan. : Tensor had NaN values
```

The related discussion is here; it is commonly an annotation problem, e.g. some bounding boxes lying outside the image boundaries. The solution for me was reducing the batch size in the .config file.
### Ground truth issue

```
The following classes have no ground truth examples
```

The related discussion is here. For me it was a misspelling in the label_map file; pay attention to lowercase and capital letters.

### labelmap issue

```
ValueError: Label map id 0 is reserved for the background label
```

id 0 is reserved for the background label, so we cannot use it for our objects; start your IDs from 1.
### No Variable to Save issue

```
Value Error: No Variable to Save
```

The related solution is here. Adding the following line to the .config file solved the problem:

```
train_config: {
  ...
  fine_tune_checkpoint_type: "detection"
  ...
}
```
### pycocotools module issue

```
ModuleNotFoundError: No module named 'pycocotools'
```

This means that pycocotools is not installed in your env; try running `pip install pycocotools`.

### pycocotools type error issue

```
pycocotools typeerror: object of type cannot be safely interpreted as an integer.
```

I solved the problem by editing the following lines in the cocoeval.py script under the pycocotools package (by adding int casts). Make sure that you are editing the package in your env, not in another env:

```python
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
```
### Loss Exploding

```
INFO:tensorflow:global step 440: loss = 2106942657570782838784.0000 (0.405 sec/step)
INFO:tensorflow:global step 441: loss = 7774169971762292326400.0000 (0.401 sec/step)
INFO:tensorflow:global step 442: loss = 25262924095336287830016.0000 (0.404 sec/step)
```

For me there were two problems.

First: some of the annotations were wrong and overflowed the image (e.g. xmax > width). I could check that by inspecting the .csv file (see the pandas sketch after the example). Example:
| filename | width | height | class | xmin | ymin | xmax | ymax |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 104.jpg | 640 | 480 | class_1 | 284 | 406 | 320 | 492 |
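A pandas sketch to hunt for such rows automatically (note that in the example above ymax 492 exceeds the height 480); column names follow the generated CSVs:

```python
import pandas as pd

df = pd.read_csv(r"E:\demo\annotations\train_labels.csv")

# boxes with negative, out-of-image or degenerate coordinates
bad = df[(df.xmin < 0) | (df.ymin < 0) |
         (df.xmax > df.width) | (df.ymax > df.height) |
         (df.xmax <= df.xmin) | (df.ymax <= df.ymin)]
print(bad)  # every row printed here can make the loss explode
```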
Second: the learning rate in the .config file was too big (the default value was big). The following values are valid and tested on mobilenet_ssd_v1_quantized (not very good, but working):

```
learning_rate: {
  cosine_decay_learning_rate {
    learning_rate_base: .01
    total_steps: 50000
    warmup_learning_rate: 0.005
    warmup_steps: 2000
  }
}
```
### Getting Convolution Failure

```
Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
```

It may be a CUDA version incompatibility issue. For me it was a memory issue, and I solved it by adding the following line to the train.py script:

```python
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
```
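train.py builds its own session internally, so the environment variable is the easiest route; for reference, this is roughly what the flag does, expressed in TF1 session code:

```python
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory on demand instead of all at once
sess = tf.Session(config=config)
```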
### Invalid box data error

```
raise ValueError('Invalid box data. data must be a numpy array of '
ValueError: Invalid box data. data must be a numpy array of N*[y_min, x_min, y_max, x_max]
```

For me it was a logical error: in test_labels.csv there were some invalid values like:

```
file123.jpg,134,63,3,0,0,-1029,-615
```

So it was a labeling issue; fixing these lines solved the problem. Related discussion is here.
### Image with id already added issue

```
raise ValueError('Image with id {} already added.'.format(image_id))
ValueError: Image with id 123.png already added.
```

It is an issue in the .config file, caused by giving num_examples a value greater than the total number of images in the test directory:

```
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  num_examples: 1265  # <--- this value was greater than the total number of test images
}
```