Learning to Discriminate Between Images via Machine Learning: A Computer Scientist Reflects on Generative AI and Diffusion Models
These models have changed industry and users alike. A computer scientist at the California Institute of Technology called this an exciting time for generative models. While the realistic-looking images created by diffusion models can sometimes perpetuate social and cultural biases, she said, generative models are also useful for downstream tasks “[that] improve the fairness of predictive AI models.”
Computer vision has been in development in some form since the mid-20th century. Initially, researchers attempted to build tools top-down, manually defining rules (“human faces have two symmetrical eyes”) to identify a desired class of images. These rules would be converted into a computational formula, then programmed into a computer to help it search for pixel patterns that corresponded to those of the described object. This approach, however, proved largely unsuccessful given the sheer variety of subjects, angles, and lighting conditions that could constitute a photo— as well as the difficulty of translating even simple rules into coherent formulae.
A more bottom-up process became achievable via machine learning, thanks to an increase in publicly available images. With this methodology, mass aggregates of labeled data are fed into a system, and the algorithm learns to discriminate between the various categories designated by the researchers. This technique is much more flexible than the top-down method, since it doesn’t rely on rules that might vary across different conditions. By training on a variety of inputs, the machine can identify the relevant similarities between images of a given class without being told explicitly what those similarities are, creating a much more adaptable model.
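The bottom-up idea can be sketched in miniature. The snippet below is a toy illustration, not any real production system: it uses a hypothetical synthetic dataset of four-pixel grayscale “images” and a simple nearest-centroid rule, so that the program derives a summary of each labeled category from examples rather than from hand-written rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 4-pixel grayscale "images" with shades 0-255.
# Category 0 holds mostly dark images; category 1 holds mostly bright ones.
dark = rng.uniform(0, 100, size=(50, 4))
bright = rng.uniform(155, 255, size=(50, 4))
X = np.vstack([dark, bright])
y = np.array([0] * 50 + [1] * 50)

# "Training": summarize each labeled category as the mean of its examples.
# No one tells the system what makes an image dark or bright.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(image):
    # Assign a new image to the category whose learned summary is closest.
    dists = np.linalg.norm(centroids - image, axis=1)
    return int(np.argmin(dists))

print(classify(np.array([10, 20, 30, 15])))      # a dark image -> 0
print(classify(np.array([240, 230, 200, 250])))  # a bright image -> 1
```

The point of the sketch is the division of labor: the researchers supply only labeled examples, and the statistical summary that separates the categories is inferred from the data itself.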
Still, the bottom-up method isn’t perfect. The systems are largely dependent on the data they are given. As the tech writer Rob Horning puts it, technologies of this kind “presume a closed system.” Microsoft’s FaceDetect, for example, had a 20 percent error rate for dark-skinned women, while its error rate for white men was close to zero, a gap traced to discrepancies in the training data. The ripple effects of such training biases on performance are the reason that technology ethicists began preaching the importance of dataset diversity, and why companies and researchers are racing to solve the problem. As the popular saying in AI goes, “garbage in, garbage out.”
Goldfish, Coca-Cola, and Beaches: DALL·E 2 as a Case Study in Neural Network Machine Learning
Ask DALL·E 2, an image generation system created by OpenAI, to paint a picture of “goldfish slurping Coca-Cola on a beach,” and it will spit out surreal images of exactly that. The program almost certainly never saw goldfish, Coca-Cola, and beaches together during training, yet it can produce something that might have made Dalí proud.
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
DALL·E 2 is a type of generative model—a system that attempts to use training data to generate something new that’s comparable to the data in quality and variety. This is a notoriously difficult task in machine learning.
The first important generative models for images were built on artificial neural networks, but even as the images got better, the models remained unreliable and hard to train. Meanwhile, a more powerful generative model was put on hold until two graduate students made technical breakthroughs that brought it to life.
DALL·E 2 is one such model. The key insight that makes DALL·E 2’s images possible—as well as those of its competitors Stable Diffusion and Imagen—comes from the world of physics. The system that underpins them is heavily influenced by nonequilibrium thermodynamics, which governs fluids and gases. “There are a lot of techniques that were initially invented by physicists and now are very important in machine learning,” said Yang Song, a machine-learning researcher at OpenAI.
To understand how this works for image data, start with a simple image that contains only two adjacent grayscale pixels. We can fully describe the image with two values, one for each pixel’s shade, from a completely black 0 to a completely white 255. You can use these two values to plot the image as a point in 2D space.
As we plot more images as points, clusters may emerge, since certain images tend to occur more frequently than others. Now imagine a surface above the plane whose height matches how dense those clusters are. This surface maps out a probability distribution: you are most likely to find data points under the highest parts of the surface, and rarely where the surface is lowest.
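The surface described above can be approximated numerically. The sketch below is illustrative only: it invents a hypothetical collection of two-pixel images clustered around the shade pair (100, 150), an assumption made purely for the example, and uses a 2D histogram as a stand-in for the density surface.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset of two-pixel grayscale images: each image is a
# point (pixel1, pixel2) in 2D space, with shades from 0 to 255.
# Assume for illustration that most images cluster around (100, 150).
images = rng.normal(loc=(100, 150), scale=20, size=(5000, 2))
images = np.clip(images, 0, 255)

# Approximate the "surface" above the plane with a 2D histogram:
# the height of each cell is how many images fall inside it.
heights, xedges, yedges = np.histogram2d(
    images[:, 0], images[:, 1], bins=16, range=[[0, 255], [0, 255]]
)
density = heights / heights.sum()  # normalize into a probability distribution

# The peak of the surface sits over the densest cluster of images.
i, j = np.unravel_index(np.argmax(density), density.shape)
peak = ((xedges[i] + xedges[i + 1]) / 2, (yedges[j] + yedges[j + 1]) / 2)
print(peak)  # lands near (100, 150), where images occur most frequently
```

The histogram cell with the greatest height corresponds to the highest part of the surface, which is exactly where a randomly drawn image from this collection is most likely to land.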