Wally Part 2

Posted: 7 April 2016

I was reminded of this task as I had to do a write up on this. As such, I went back to fix the data sets! Unfortunately, I really wanted to put a collage that looks like this to show you the information.


However, this was hardcoded for this task. I haven’t gotten down to writing a nice script with a GUI to do collages. In any case, the images are similar to my previous post on this. I won’t talk much about the data set, I’ll talk more about the results obtained.

I have switched to using DIGITS instead of TensorFlow. DIGITS is really quite a neat tool that allows one to train and test neural networks very quickly. I highly recommend it.

The neural network achieved an accuracy of 100% within 1 epoch, which is pretty amazing. I hypothesize that this happens because it’s a binary classification problem and I only have 10 unique wally faces.

Graph from DIGITS

I proceeded to test some sample images on this neural network.

Test Image 1

Naturally, I first tested on this image. I was wondering if it would classify it correctly. To my horror, it classified it as 54% background. I further analyzed the situation and postulated that it could be due to the wrong image aspect ratio. I changed it to a square aspect ratio and centered it around Wally, and it was a 99% hit! This shows that it is not entirely robust.

Test Image 2

I needed to test on more images. Hypothetically, it would also identify Wanda as Wally. I cropped a 60x60 patch and tested it but it was classified as background. I had a hunch that it was because Wanda’s face is relatively smaller. I should make it such that her face is roughly to scale when scaled to a 30x30 patch for the neural network. I cropped a 40x40 patch and this time, it classified it as 100% Wally!

Random Old Man

To further test this neural network, I tested it on a random old man. Well, it was 100% background. I’m quite sure it’s classifying Wallys pretty well if Wally is well-centered.

And therein lies the problem of many image recognition tasks now. Classification is relatively easy once we have the patches. However, getting the patches is still a very tough problem. Running a sliding window through the entire thing does not help as one would get a heat map that would have a lot of noise as well. I learnt this from a project on detecting text in natural scenes.