Post by account_disabled on Mar 10, 2024 13:05:06 GMT 5
Two years ago I was amazed to see how a piece of software, GPT-3, was able to generate meaningful texts, indistinguishable from those of a human, starting from a minimal initial input. Today we face a new technological leap, because it has become possible to make a machine create images simply by providing it with a short piece of text. The transition from text to image is even more astonishing because it makes the "magic" of creation more evident and raises questions about the origin of creativity, the meaning of art, and authorship.
These are extremely stimulating topics that I leave for a future discussion. For now I am interested in introducing you to these new tools, without going into technical details, but giving you some references so you can try them independently. We can divide them into two categories: the less sophisticated but easily accessible ones (Dream, StarryAI, Craiyon) and the more sophisticated but limited-access ones (DALL-E 2, Imagen, Parti, Midjourney).

How image generators work

Classic image generators are based on generative adversarial networks (GANs). These are architectures in which two neural networks compete in a sort of zero-sum game. The network called the Generator starts from random numbers and has the task of producing realistic images, trying to deceive the Discriminator. The Discriminator network is trained on millions of appropriately labeled examples of pre-existing images, with the aim of judging whether the images produced by the Generator are real or artificial.
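To make the zero-sum game concrete, here is a minimal GAN training loop in PyTorch. It is only a sketch: it learns a one-dimensional Gaussian distribution rather than images, and the network sizes and hyperparameters are arbitrary choices of mine, not those of any production system. The structure, though (a Generator trying to fool, a Discriminator trying to judge), is exactly the adversarial setup described above.

```python
# Toy GAN: the Generator learns to imitate samples from N(2.0, 0.5).
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0      # "real" data the Generator must imitate
    fake = G(torch.randn(64, 8))               # Generator starts from random numbers

    # Discriminator turn: learn to label real data 1 and Generator output 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator turn: adjust weights so the Discriminator calls its samples real.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

After enough alternating turns, the Generator's output distribution drifts toward the real one, which is the "little by little, from attempt to attempt" dynamic described next.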
Little by little, from attempt to attempt, the Generator learns to produce synthetic images that appear to have been created by a human. The most advanced text-to-image systems, such as OpenAI's DALL-E 2 and Google's Imagen, instead use "diffusion models". Both start from a language model capable of understanding complex sentences, not just simple keywords. In the OpenAI system, the sentence is passed to a model called the "prior", whose task is to generate "CLIP image embeddings", that is, to "get an idea" of those words (much as happens to us humans when we are asked to draw a beach with umbrellas and boats on the horizon). These CLIP image embeddings are then passed to another network, a diffusion decoder (unCLIP), which begins to draw that idea in successive steps.
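That successive-refinement idea can be sketched in a few lines. The code below is a toy stand-in and assumes nothing about the real unCLIP internals: the "prior" is replaced by a fixed random vector and the learned decoder by a hand-written update rule, purely to show the shape of the reverse-diffusion loop (start from pure noise, repeatedly denoise toward the image the embedding describes).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the "prior": the real model turns a text embedding into a
# CLIP image embedding; here we just use a fixed random 64-dim vector.
image_embedding = rng.normal(size=64)

def toy_denoise_step(noisy, embedding, step_size=0.1):
    """Stand-in for the learned diffusion decoder. The real decoder is a
    trained neural network that predicts and removes noise; this toy rule
    simply nudges the image toward a fixed rendering of the embedding."""
    target = np.tanh(embedding.reshape(8, 8))  # pretend this is the "idea"
    return noisy + step_size * (target - noisy)

# Reverse diffusion: start from pure noise and refine in successive steps.
image = rng.normal(size=(8, 8))
for _ in range(50):
    image = toy_denoise_step(image, image_embedding)

# The residual shrinks geometrically, so after 50 steps the "drawing" has
# essentially converged to the idea encoded in the embedding.
print(np.abs(image - np.tanh(image_embedding.reshape(8, 8))).max())
```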