In a previous post we explained how we fed 1,000 pictures to a StyleGAN network to generate new images of coffee cups. We took this experiment a step further and fed an image of a coffee cup to a neural network designed to generate 3D objects from 2D images. We used an existing project, AtlasNet, developed by Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry (École des Ponts and Adobe Research).
AtlasNet is a neural network that reconstructs 3D objects from 2D images. The researchers trained their models on several example object categories, for instance chairs and airplanes. We used one of their pre-trained models to see whether the network would also handle a coffee cup. Although we spent quite some time figuring out how to prepare the project for generation, the first results were rather disappointing. Since we knew what their model had been trained on, we also tried images of airplanes found on the internet, in order to reproduce their results. However, these results did not match the quality of their outcomes either; see the last image on the left.

Another thing we discovered is that the resolution of the input image strongly affects the end result. The images show three types of results, all based on the same picture but at different resolutions. Apparently the researchers used very low-resolution images to construct their models; our final attempt, with a low-resolution image of the cup, already worked out considerably better.
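To illustrate the kind of preprocessing that made the difference for us, here is a minimal sketch of downsampling an image before feeding it to a network. This is not AtlasNet's own pipeline: the nearest-neighbor method, the plain nested-list image representation, and the 2×2 target size in the example are all our own assumptions for illustration.

```python
# Minimal sketch: nearest-neighbor downsampling of a 2D pixel grid.
# The image is represented as a plain nested list of pixel values;
# a real pipeline would use an image library, but the idea is the same.

def downsample(pixels, target_w, target_h):
    """Shrink a 2D grid of pixels to target_w x target_h (nearest neighbor)."""
    src_h = len(pixels)
    src_w = len(pixels[0])
    out = []
    for y in range(target_h):
        src_y = y * src_h // target_h   # map output row to nearest source row
        row = []
        for x in range(target_w):
            src_x = x * src_w // target_w  # map output column to source column
            row.append(pixels[src_y][src_x])
        out.append(row)
    return out

# Example: shrink a 4x4 grid to 2x2.
grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
print(downsample(grid, 2, 2))  # prints [[1, 2], [3, 4]]
```

In practice we simply rescaled the input pictures to a much smaller size before running the model; the sketch above just shows the principle of mapping each output pixel back to its nearest source pixel.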
To find out why we did not manage to obtain good results, we will have to dive deeper into the design of the model and the underlying code, as well as train our own model based on images of coffee cups. We think this is definitely worth investigating.