DIY a classifier with TensorFlow

Yiling Liu · Feb 18, 2020

What if I'm lonely and I want to play a game of rock-paper-scissors?

Here's the answer for lonely nerds like me: train a classifier with TensorFlow, then play against myself.

But how? Let's roll our sleeves up and get our hands dirty. By the end we'll have built something like this.

The big plan

Let’s plan something big. It’s easy.

  1. Build a model that can recognise rock, paper and scissors
  2. Convert the model into a form that can run in the browser
  3. Find an existing GitHub project that can capture an image from the camera and return something (thanks, GitHub!)
  4. Link the model with the project code and build the website locally
  5. Put it on Google Cloud to make it available to the world

Prepare environment and code

Install git, Python, pip and virtualenv. (These are pretty standard.)

Create a virtualenv called venv and install all the packages we need there.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
sudo pip3 install virtualenv
virtualenv venv
source venv/bin/activate
pip install tensorflow
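
A quick sanity check that TensorFlow imports inside the virtualenv (the retrain script we'll use below is written against the TensorFlow 1.x API, so it's worth noting which major version pip gave you):

# confirm TensorFlow is importable and check its version
import tensorflow as tf

print(tf.__version__)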

Build the model

We'll reuse existing code from a GitHub repository called tensorflow-for-poets-2.

git clone https://github.com/googlecodelabs/tensorflow-for-poets-2.git

Prepare training videos

We need lots of images to train the model, and video is perfect for this because we can extract loads of frames from it. So let's record some videos of our hands and put them in three folders called rock, paper and scissor.

The training videos need to sit under the right directory for training to work, so let's create a folder called training_images under tf_files and put those three folders inside it.

Then we create a script, video2image.py, to turn the videos into images (a minimal sketch is shown after the command below). Run the following command to spin up the images.

(py3) macbookpro3:scripts yiling$ python video2image.py
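
The post doesn't include video2image.py itself, so here is a minimal sketch of what it could look like, assuming OpenCV (opencv-python) and .mp4 recordings; the frame-skipping interval and the output file naming are just illustrative:

# video2image.py: extract frames from the training videos (minimal sketch)
import glob
import os

import cv2  # pip install opencv-python

VIDEO_DIR = "../tf_files/training_images"  # run from the scripts/ folder, as in the prompt above
LABELS = ["rock", "paper", "scissor"]
EVERY_NTH_FRAME = 5                        # keep one frame in five to avoid near-duplicates

for label in LABELS:
    for video_path in glob.glob(os.path.join(VIDEO_DIR, label, "*.mp4")):
        capture = cv2.VideoCapture(video_path)
        base = os.path.splitext(video_path)[0]
        frame_index = 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if frame_index % EVERY_NTH_FRAME == 0:
                # write frames next to the source video, e.g. VID_..._000010.png
                cv2.imwrite(f"{base}_{frame_index:06d}.png", frame)
            frame_index += 1
        capture.release()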

Train the model

We'll do transfer learning on top of a pre-trained MobileNet (the script below uses mobilenet_0.25_224).

The MobileNet is configurable in two ways:

  • Input image resolution: 128, 160, 192, or 224px. Unsurprisingly, feeding in a higher-resolution image takes more processing time, but results in better classification accuracy.
  • The relative size of the model as a fraction of the largest MobileNet: 1.0, 0.75, 0.50, or 0.25. (codelab)
IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.25_${IMAGE_SIZE}"
MODEL_NAME=grab
python -m scripts.retrain \
--bottleneck_dir=tf_files/bottlenecks \
--how_many_training_steps=500 \
--model_dir=tf_files/models/ \
--summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
--output_graph=tf_files/${MODEL_NAME}.pb \
--output_labels=tf_files/${MODEL_NAME}_labels.txt \
--architecture="${ARCHITECTURE}" \
--image_dir=tf_files/training_images

Model in action

Given a picture like this:

python -m scripts.label_image \
--graph=tf_files/${MODEL_NAME}.pb \
--image=tf_files/training_images/paper/VID_20190820_2000145.png

This is what the model returns:

2019-08-21 15:47:56.580116: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Evaluation time (1-image): 0.234s
scissor (score=1.00000)
rock (score=0.00000)
paper (score=0.00000)

Transform to TFJS model

Now we've got a frozen model. At the time of writing, to convert a frozen model into a TFJS model with the converter on GitHub you need to downgrade Python to 3.6.8, create another virtual environment, and then install these packages:

pip install tensorflowjs==0.8.6
pip install tf-nightly-2.0-preview

Find out the output_node_names with a short Python script:

# Load the frozen graph and print every node name so we can spot the output node.
import tensorflow as tf

def load_graph_def(model_file):
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph_def

graph_def = load_graph_def("tf_files/grab.pb")  # the --output_graph produced by the retrain step
for node in graph_def.node:
    print(node.name)

You'll see output like this, which indicates that the output node name is final_result. We'll need it as an input for the tfjs-converter.

input
MobilenetV1/Conv2d_0/weights
MobilenetV1/Conv2d_0/weights/read
MobilenetV1/MobilenetV1/Conv2d_0/convolution

final_training_ops/Wx_plus_b/add
final_result

Run the following command to convert:

tensorflowjs_converter \
--input_format=tf_frozen_model \
--output_json=true \
--output_node_names='final_result' \
--saved_model_tags=serve \
tf_files/${MODEL_NAME}.pb \
tf_files/${MODEL_NAME}_web_model

If you trained the frozen model with a different version of Python/TensorFlow, you may see errors like the one below, which just means the graph-interpreting binary can't understand your graph binary:

ValueError: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: {{node MobilenetV1/MobilenetV1/Conv2d_0/convolution}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.)

Now, under your web_model directory (tf_files/${MODEL_NAME}_web_model), model.json and the weight shards will appear. Upload them to Google Cloud Storage, set their visibility to public, and change the CORS settings.

echo '[{"origin": ["*"],"responseHeader": ["Content-Type"],"method": ["GET", "HEAD"],"maxAgeSeconds": 3600}]' > cors-config.json
gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME

Replace YOUR_BUCKET_NAME with your actual bucket name in the command above to update the CORS settings of your bucket.
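
The upload step itself isn't shown here; one option is to script it with the google-cloud-storage Python client. A minimal sketch, assuming default credentials and a bucket with fine-grained (not uniform) access control, since make_public() needs per-object ACLs:

# upload the converted model files and make them publicly readable
import glob
import os

from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "YOUR_BUCKET_NAME"
WEB_MODEL_DIR = "tf_files/grab_web_model"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
for path in glob.glob(os.path.join(WEB_MODEL_DIR, "*")):
    blob = bucket.blob(os.path.basename(path))
    blob.upload_from_filename(path)
    blob.make_public()
    print("uploaded", blob.public_url)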

Now we've got our models nicely uploaded to GCS, and they're accessible to anyone! It's time to find a web framework to embed our model.json in.

https://storage.googleapis.com/user_yiling/model.json
https://storage.googleapis.com/user_yiling/group1-shard2of2
https://storage.googleapis.com/user_yiling/group1-shard1of2
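
Before wiring these URLs into a web page, you can sanity-check that the files are publicly readable and that CORS is configured, for example with a few lines of Python (assuming the requests package; the URL is the model.json above):

# check that the model is public and that GCS returns CORS headers
import requests

MODEL_URL = "https://storage.googleapis.com/user_yiling/model.json"

# GCS only sends CORS headers when the request carries an Origin header
resp = requests.get(MODEL_URL, headers={"Origin": "http://localhost:8080"})
print(resp.status_code)                                 # 200 = publicly readable
print(resp.headers.get("Access-Control-Allow-Origin"))  # set if the CORS config applies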

Web framework

This is the pain of every creative engineer: they build amazing things, but when they show them off it's always either without a UI, or with a UI so ugly that no one else gets excited. Let's find a nice web framework to showcase the model!

Looking around on GitHub, this project is really interesting. It lets you point a camera at an object, get a label back from the Vision API, and then uses Google Translate to translate it into other languages. It's built with choo.js, a lovely, easy-to-use JavaScript framework. (I'd never used JavaScript before, but I still followed the tutorial and understood choo.js.)

I encountered some error messages like:

regeneratorRuntime is not defined

I solved it by installing the @babel/preset-env package with npm.

Change the call to the Vision API into a call to our own model.json, and the app now runs successfully locally! Let's make it available to the world by putting it on Google Cloud.

Put your app on Google Cloud

npm run build 
gcloud app deploy

Congrats! You’ve done it! You can even change the images as input and get more fun out of it.

I'm planning to train it on the twelve Chinese zodiac animals, so I can tell whether my silly friend looks more like a dog or a chicken!

Written by Yiling Liu

Creative Technologist, ex-googler