Last month, a team of software engineers working for Google released images created by software designed for image recognition[1a]. Image recognition software works by using artificial neural networks, which attempt to mimic neural networks in the brain. An image is input, layers of artificial neurons process it, and an identification is output.
If you want to produce software that can identify human faces, for example, then you would input millions of pictures, some identified as human faces and some not, until the software ‘learns’ what a human face looks like.
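The 'learning' described above can be sketched in miniature. The toy below is not Google's code or a face detector: a single artificial neuron learns to separate two clusters of labelled points, using the same nudge-the-weights-to-reduce-error idea that trains real recognition software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake labelled data: class 1 clusters near (2, 2), class 0 near (-2, -2).
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.concatenate([np.ones(50), np.zeros(50)])

w = np.zeros(2)
b = 0.0

def sigmoid(z):
    # Squash a raw score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent: repeatedly nudge the weights to reduce prediction error.
for _ in range(500):
    p = sigmoid(X @ w + b)           # current predictions in [0, 1]
    grad_w = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

A face detector replaces the two numbers per example with millions of pixels, and the single neuron with many layers, but the principle is the same.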
Google’s artificial neural network typically has 10-30 layers of artificial neurons, with each layer passing information on to the next, so that the information represented becomes more complex as the layers get higher. Software engineers adjust the parameters until the correct output is given, but do not fully understand what is going on in individual neural layers[1b].
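To see what 'layers passing information upward' means mechanically, here is a hypothetical sketch with random weights (a real network's weights are learned, and it has far more neurons): each layer transforms the previous layer's output, and the number of neurons typically shrinks as representations become more abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Layer sizes: many raw inputs (e.g. pixels) in, few abstract features out.
layer_sizes = [64, 32, 16, 8]
weights = [rng.normal(0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # Pass the input through each layer in turn, keeping every layer's output.
    activations = [x]
    for W in weights:
        x = np.maximum(0, x @ W)   # ReLU: keep only positive responses
        activations.append(x)
    return activations

acts = forward(rng.normal(0, 1, 64))
for i, a in enumerate(acts):
    print(f"layer {i}: {a.shape[0]} neurons")
```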
In order to gain a better understanding of how the software works, Google’s software engineers have run the program a different way. Rather than getting it to recognise an image of a banana, for example, they got it to ‘draw’ what it thought a banana looks like. To do this, they input an image of ‘noise’, like television static, and then adjusted the image, pixel by pixel, until an image was created that the software identified as a banana.
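The pixel-by-pixel adjustment works by gradient ascent: each pixel is nudged in whichever direction raises the network's 'banana' score. The real system backpropagates through a deep network; in the toy below the 'network' is just a template matcher, so the mechanics are easy to see. All names here are illustrative, not Google's code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend this 8x8 pattern is what the 'network' scores as banana-like.
template = np.zeros((8, 8))
template[2:6, 3:5] = 1.0

def banana_score(img):
    # How strongly the image matches the 'banana' pattern.
    return np.sum(img * template)

image = rng.normal(0, 0.1, (8, 8))   # start from 'television static'

for _ in range(100):
    grad = template                  # gradient of the score w.r.t. each pixel
    image += 0.01 * grad             # adjust pixels to raise the score
    image = np.clip(image, 0, 1)     # keep pixel values in a valid range

print(f"final score: {banana_score(image):.2f}")
```

Starting from pure noise, the image drifts toward whatever the scorer responds to, which is exactly why the drawn bananas reveal what the network has actually learned.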
In some cases, the software got the image wrong. It could not draw a dumbbell, for example, without also drawing an arm attached, presumably because its training images mostly showed dumbbells being held. Now that they know this, the software engineers can make the software more accurate by showing it more images of dumbbells that are not being held.
In order to see what’s going on in different layers of the artificial neural network, they got the software to enhance what it ‘saw’ at specific layers. They found that lower layers, which have less complex information, tended to see basic outlines.
The images began to look stranger, however, when higher layers were chosen. These layers look for whole objects, many of which would usually be dismissed by the time the software reached its output.
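Enhancing a chosen layer works like the banana trick, except that instead of maximising one output class, the input is adjusted to amplify whatever that layer already responds to. The sketch below uses a tiny two-layer network with random stand-in weights; a 'lower' and a 'higher' layer can each be boosted by ascending the gradient of that layer's total activation energy.

```python
import numpy as np

rng = np.random.default_rng(3)

W1 = rng.normal(0, 0.3, (16, 12))   # 'lower' layer: simple features
W2 = rng.normal(0, 0.3, (12, 6))    # 'higher' layer: complex features

def activations(x):
    h1 = np.maximum(0, x @ W1)
    h2 = np.maximum(0, h1 @ W2)
    return h1, h2

def enhance(x, layer, steps=200, lr=0.05):
    """Adjust the input to amplify the chosen layer's response."""
    for _ in range(steps):
        h1, h2 = activations(x)
        if layer == 1:
            grad = h1 @ W1.T                 # d(0.5*||h1||^2)/dx
        else:
            g1 = (h2 @ W2.T) * (h1 > 0)      # backprop through the ReLU
            grad = g1 @ W1.T
        # Normalised gradient-ascent step on the input itself.
        x = x + lr * grad / (np.abs(grad).max() + 1e-8)
    return x

x0 = rng.normal(0, 0.1, 16)
before = np.sum(activations(x0)[1] ** 2)
after = np.sum(activations(enhance(x0, layer=2))[1] ** 2)
print(f"layer-2 energy before: {before:.3f}, after: {after:.3f}")
```

Boosting layer 1 exaggerates simple outlines; boosting layer 2 exaggerates whole learned patterns, which is why the higher-layer images look so strange.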
This is similar to when humans see images of objects in clouds, in a process known as pareidolia. We might think a cloud looks like an object, but we understand that it's not really that object, and so we would still identify the image as a cloud.
In this case, however, the engineers programmed the network to enhance the images it thought it saw, so that they would not be dismissed and would be evident in the output. The output image reflects whatever the artificial neural network was trained to identify, and so many output images contain animals and buildings.
Some have noted that these images look similar to experiences people have while on LSD, or other psychedelic substances. This may not be a coincidence. While we do not fully understand how these substances work, there is evidence that they inhibit parts of our brain that would otherwise filter this information out, so, just like with the artificial neural network, we have to make a ‘best guess’ based on less advanced information.
The software engineers behind these images have since released their code to the public so that anyone (with the right software and an understanding of how to code) can generate these images. Many examples can be found under #deepdream.