Posted: 10 November 2017
EDIT on 29 December 2017: Deep learning research is moving so quickly. The idea I had for the paper I was trying to write was released by NVIDIA on 4 December 2017. They probably submitted it for CVPR 2018. Note that I'm not saying I could have achieved their results, because I clearly could not have in 3 months! The paper's results are really stunning.
I’ve worked as an RA for the past 3 months. If I could only say one thing that I learnt, it would be that reproducibility in machine learning is really difficult and the devil lies in the details.
I was interested in computer vision research as I'd been doing it for the past 2 years and thought it would be a good time to write a paper on it. I explored the topics available and found something of a niche: there were publications on it, but not too many people were looking at it. My main consideration was to choose a topic that could realistically become a paper within 3 months. We started with the idea of multi-view inpainting using GANs. That was hard because I had no prior experience with 3D pipelines, and pursuing it would have meant spending most of my time just learning them. I discussed this with my advisor and suggested pivoting to the single-view case, but trying to improve on existing state-of-the-art methods. He supported the idea, and I began exploring the latest CVPR 2016/2017 papers on it. It turned out there were only 3 seminal papers, so gaining a thorough understanding of each one wasn't that hard. Or so I thought.
I started off with the paper by Pathak et al. (CVPR 2016). It was a standard encoder-decoder pipeline, with the one exception of a channel-wise fully connected layer that greatly reduces the number of connections, and hence both memory usage and computation time, with memory being the more important factor. They used a masked L2 loss together with a GAN loss, weighting the two with lambdas of 0.999 and 0.001 respectively. Their code was in Torch, something I wasn't familiar with. I could run their code though, so it wasn't too bad.
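To make the loss weighting concrete, here is a minimal NumPy sketch of how the two terms combine. This is not Pathak et al.'s actual Torch code; the function names and the toy adversarial-loss scalar are my own, and only the masked-L2 structure and the 0.999/0.001 weighting come from the paper.

```python
import numpy as np

def masked_l2_loss(pred, target, mask):
    # L2 reconstruction loss computed only over the masked (missing) region.
    # pred/target: image arrays; mask: 1 where pixels were removed, 0 elsewhere.
    diff = (pred - target) * mask
    return np.mean(diff ** 2)

def total_loss(pred, target, mask, adv_loss, lam_rec=0.999, lam_adv=0.001):
    # Pathak et al. weight reconstruction vs. adversarial terms 0.999 : 0.001,
    # so the GAN term acts as a small regularizer on top of the L2 term.
    return lam_rec * masked_l2_loss(pred, target, mask) + lam_adv * adv_loss
```

The extreme 0.999 : 0.001 ratio is the interesting design choice: the adversarial term only nudges the reconstruction toward sharper textures rather than driving training.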
The next paper was by Yang et al. (CVPR 2017). It follows Pathak et al., except that the architecture uses a standard fully connected layer instead of a channel-wise one, and they opted for ELUs instead of LeakyReLUs. This made more sense from a coding perspective (it's a lot easier to code!), and I also wasn't sure the channel-wise layer actually had a significant effect. On top of that, they use an optimization step that matches neural patches, i.e. patches of feature activations from a pretrained network, between the hole and the surrounding context. This was a pretty neat idea, I must say.
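The core of neural patch matching is just nearest-neighbor search over small windows of a feature map. Below is a toy NumPy sketch of that step, under my own assumptions: the feature map is a plain (H, W, C) array rather than real VGG activations, the helper names are hypothetical, and similarity is measured by cosine distance, which is one common choice for patch matching.

```python
import numpy as np

def extract_patches(feat, size=3):
    # feat: (H, W, C) feature map. Returns all size x size patches,
    # each flattened into a vector, stacked into a (num_patches, size*size*C) array.
    H, W, _ = feat.shape
    patches = []
    for i in range(H - size + 1):
        for j in range(W - size + 1):
            patches.append(feat[i:i + size, j:j + size].ravel())
    return np.stack(patches)

def nearest_patch(query, patches):
    # Cosine similarity between a patch from the hole region and all
    # context patches; returns the index of the best match.
    q = query / (np.linalg.norm(query) + 1e-8)
    p = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    return int(np.argmax(p @ q))
```

In the real method this matching happens in the feature space of a pretrained network and feeds a texture optimization; the sketch only shows the search itself.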
The next paper was by Yeh et al. (CVPR 2017). It's a simple idea that's pretty obvious in my opinion: take a standard DCGAN, optimize the random latent vector so the generated image best matches the known pixels, and then use the generator's output to fill in the hole. It was a simple idea, and Brandon Amos wrote a really great blog post about it.
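The latent-optimization step can be sketched in a few lines. This is not Yeh et al.'s code: I'm standing in for the DCGAN with a toy linear "generator" G(z) = W @ z so the gradient can be written by hand, and `inpaint_latent` is a hypothetical helper name. The real method adds a GAN prior term and a trained generator; only the "descend on z under a masked reconstruction loss" structure is from the paper.

```python
import numpy as np

def inpaint_latent(y, mask, W, steps=500, lr=0.1):
    # Toy "generator" G(z) = W @ z standing in for a DCGAN.
    # Minimize || mask * (G(z) - y) ||^2 over the latent z by gradient descent,
    # where mask is 1 on known pixels and 0 on the hole.
    rng = np.random.default_rng(0)
    z = rng.standard_normal(W.shape[1])  # random init, as in the paper
    for _ in range(steps):
        residual = mask * (W @ z - y)
        grad = 2 * W.T @ residual        # gradient of the masked L2 context loss
        z -= lr * grad
    return W @ z                         # generator output from the optimized z
```

With a real generator, the learned image prior is what makes the hole region plausible; the toy linear map only demonstrates that the known pixels get matched.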
Those are the papers in general. I won't talk much about how we are doing it, as it hasn't been published yet. I'd like to talk more about my learning points:
Once again, great learning experience overall and looking forward to writing more TensorFlow code!