From Giraffes to Dogs: Unveiling the Unpredictability of Computer Image Recognition

Most image recognition systems rely on what is known as probabilistic image recognition, a technique that identifies objects or patterns in images based on statistical probabilities. Rather than providing a definitive answer, the system assigns a probability value to each potential category or label associated with the image.

The process typically involves training the system on a large dataset of labelled images, where each image is associated with a known category. The system analyses these images and learns patterns, features and statistical relationships between the visual elements present in the images and their corresponding labels.

When presented with a new image, the system compares its visual features with the learned patterns and calculates the probability that it belongs to each of the known categories. The probabilities are often calculated based on the similarity between the new image and the training examples. The system assigns the highest probability to the category that best matches the visual features of the image.
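As a rough illustration of how per-category scores can become probabilities, here is a minimal sketch in Python. It assumes the system produces one raw score per category and converts them with the softmax function, which is common in neural-network classifiers; the article does not say which method any particular system uses, and the scores below are made-up numbers.

```python
import math

def softmax(scores):
    """Turn raw per-category scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores for one photo (illustrative values, not real model output).
scores = {"glasses": 5.0, "rocking chair": 2.0, "handrail": 1.5}

probabilities = softmax(scores)

# The reported label is simply the category with the highest probability.
best_label = max(probabilities, key=probabilities.get)

for label, p in probabilities.items():
    print(f"{label}: {p:.1%}")
print("Chosen category:", best_label)
```

Note that every category receives some probability, however small: the system never rules anything out, it only ranks the options.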

Alexander Turner, from the University of Nottingham, explains in a video that image recognition by computers essentially assigns each image a probability value for every category it could belong to.

For example, if one of these software programs is shown a photo of a pair of glasses, the software responds that it falls into the category of “glasses” with a probability of 93%, but it does not exclude the possibility that it is a rocking chair or a staircase handrail, although with much lower probabilities.

In reality, the software does not “know” what glasses or rocking chairs are. It relies solely on the shapes and colors present in the image and compares them to the millions of sample images of glasses, rocking chairs, and handrails on which it has been trained: it measures how closely the proposed image aligns with each of the known categories and then selects the category with the highest probability of a match.

This probabilistic approach, so unlike human certainty, leads to an unexpected vulnerability in these image recognition systems. As Alexander Turner explains, the software typically assigns a very high probability to a single category and very low probabilities to the others, but these assignments can be strongly influenced by a trick. Simply change a few random pixels of the image and see whether the probability of correct identification increases or decreases by a few decimal points. If it decreases, the changed pixel is retained; if not, the change is discarded. Then another pixel is tried, and so on, repeatedly, keeping only the altered pixels that decrease the probability of correct identification and increase the probability of an incorrect one.
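The trial-and-error procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "classifier" is a hypothetical toy model (random linear scores followed by softmax) standing in for a real trained network, the image is a flat list of 64 grayscale values, and the function names are invented for this example. It is not the code used in Turner's demonstration.

```python
import math
import random

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_toy_classifier(num_pixels, num_classes, rng):
    """Hypothetical stand-in for a trained model: random linear scores + softmax."""
    weights = [[rng.uniform(-1, 1) for _ in range(num_pixels)]
               for _ in range(num_classes)]

    def classify(image):
        scores = [sum(w * p for w, p in zip(row, image)) for row in weights]
        return softmax(scores)

    return classify

def random_pixel_attack(classify, image, true_class, steps, rng):
    """Keep random single-pixel changes that lower the true class's probability."""
    adversarial = list(image)
    best = classify(adversarial)[true_class]
    for _ in range(steps):
        i = rng.randrange(len(adversarial))
        old = adversarial[i]
        adversarial[i] = rng.random()        # try a new random brightness
        prob = classify(adversarial)[true_class]
        if prob < best:
            best = prob                      # the change hurt recognition: keep it
        else:
            adversarial[i] = old             # otherwise undo it
    return adversarial, best

rng = random.Random(0)
classify = make_toy_classifier(num_pixels=64, num_classes=3, rng=rng)
image = [rng.random() for _ in range(64)]

start_probs = classify(image)
true_class = start_probs.index(max(start_probs))  # treat the initial guess as "correct"

adversarial, final_prob = random_pixel_attack(
    classify, image, true_class, steps=500, rng=rng
)
print(f"Probability of the original category: "
      f"{start_probs[true_class]:.3f} -> {final_prob:.3f}")
```

The attack never looks inside the model. It only needs to submit images and read back the probabilities, which is why this kind of trick works against systems whose internals are unknown.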

Furthermore, the altered pixels that affect recognition have nothing to do with the object in the image; they appear as a cloud of seemingly random colored dots. For example, take a photo of a giraffe that the software correctly identifies as a giraffe with a 61% probability. Changing a few pixels here and there, perhaps even just in the background, is enough to make the software identify the image as a dog with a 63% probability.

To human eyes, the photo still clearly shows a giraffe, but for the software, that giraffe is now a dog.

Turner continues his demonstration with a photo of a television remote control against a white background, which is correctly recognized by the software.

However, once colored pixels are strategically scattered across the image, the software declares it to be a cup, assigning a staggering 99% probability to this identification.

The conclusion of this experiment is that not only do computers recognize objects very differently from humans, but there are also images that completely confuse them even though they are unambiguous to our eyes, appearing simply as photos of an object smeared with some randomly arranged dots.