Generative AI for Images

When I think of AI capabilities, I tend to classify them into two groups. First, doing things that are impossible for humans. Second, doing things that humans can do, but faster or cheaper.

An example of the first is asking Google Bard to search across a million papers and answer questions about them. There’s no way a human can or will ever be able to read a million scientific papers to answer a question.

An example of the second is image generation. Using an image generation model is like having a team of photographers, graphic designers, and 3D artists on demand, ready to create anything you can imagine.

Image generation models are especially good at generating image types that are somehow present in their training set. If an artist anywhere in the world has ever thought of something, or if you have ever seen a work of art before - no matter how original you thought it was at that time - the models can probably recreate it.

The variety of images that are being generated is astonishing. Here are some of my favorite prompts.

Pictures of people

Here is Anya:

Young woman

And here she is in a variety of action scenes. Keep in mind that even when using an image as part of the prompt, the character may not look exactly the same.

Young woman in different scenes

We can also get multiple shots from different angles in the same image.

Young woman in various angles

We can also keep the seed fixed while changing the prompt, to see the effect of changing words or the word order. Keep in mind that even with the same seed and very similar prompts, the images may vary substantially. The images below were all generated from the same seed, changing only one word (the color of the hair).

Young women
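The same-seed experiment can be sketched in a few lines of Python. This is only an illustration: the prompt template, the list of colors, and the seed value are made up for the example, and the code only prepares the requests. How you actually submit them depends on the image generation tool you use, which is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationRequest:
    """One image to generate: a text prompt plus the random seed to use."""
    prompt: str
    seed: int


def hair_color_variants(template, colors, seed):
    """Build one request per color, all sharing the same seed.

    `template` must contain a `{hair_color}` placeholder, so the only
    thing that changes between requests is that single word.
    """
    return [
        GenerationRequest(prompt=template.format(hair_color=color), seed=seed)
        for color in colors
    ]


# Hypothetical values, chosen only for illustration.
requests = hair_color_variants(
    "portrait photo of a young woman with {hair_color} hair, soft light",
    ["black", "red", "blonde", "blue"],
    seed=1234,
)

for request in requests:
    # Each request would be passed to the image generator of your choice.
    print(request.seed, request.prompt)
```

Keeping everything except one word constant makes it easier to attribute differences between the images to that word, although, as noted above, the results can still vary more than you would expect.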

Generative AI models are very good at creating pictures of people, maybe because training datasets include so many of them. This is one of the most significant use cases, since getting professional-quality photos of people can be expensive.

One interesting use case (which seems to be very popular on social media) is style transfer applied to photos.

For example, putting the same character in a painting:

Young women

Pictures of other things

Apart from people, we can use these models for all sorts of other things. But they will tend to work best for things that someone, somewhere, has already imagined.

For example, macro photos:

Bee

Or drawings:

Bee

Or very refined illustrations, for example in the style of Alphonse Mucha (one of my favorite artists):

Illustration

One category with a lot of interesting potential is architecture.

Building

And also interior design.

Building interior

And in particular furniture. The training datasets presumably contain tons of photos from furniture catalogs.

Living room

A relatively unexplored category (at least compared with Instagram) is food:

Food

Also calligraphy:

Letters

These models are especially useful in the initial ideation stages for video games or other media products. They are probably not ready to be used for production assets yet, but they are very quick to come up with a wide variety of ideas.

Game room

What’s next?

One of the main problems with these models is that it’s still very hard to control the composition of the scenes. In general they are good at following directions about style and lighting, but it’s hard to insert a specific character into a scene.

House with woman

Finally, something to be worried about. With the quality becoming so good, it’s increasingly difficult to distinguish real photos from AI-generated images. This will have wide implications for society and for what we can trust, even when there is supposedly “visual evidence”.

Written on April 16, 2023