How to generate photorealistic images with DALL-E 2

First of all: yes, with DALL-E you can achieve impressive photorealistic images. But the real question is: what do you mean by the term “photorealism”?

Author: Vladimir Alexeev

Our perception is oversaturated by media. We might expect “the same quality as in real life”. But that is, between you and me, a big lie. A good photograph doesn’t transfer reality from “real life” onto photo paper or into a digital file. Instead, it stages reality: a specific angle, lighting, lens, and so on.

In short, what you see in a photograph is not “reality”; it is an interpretation of it.

With DALL-E 2, we get an Artificial Interpretation of our world. To keep things simple, let’s divide photorealism into two approaches:

  • Emulating Reality: making an image as convincing as possible (aligned with viewers’ expectations and experiences).
  • Emulating Medium: a meta-approach that simulates different photographic techniques, cameras, and styles.

A realistic Lomography shot does not look photorealistic, but it should still convince us of its “realism”. And DALL-E can do that too.

Emulating reality: What’s in a prompt?

If you enter a Content Prompt without any modifiers, and the content is fairly concrete or figurative, you will already get photorealistic images.

For example, entering “An Apple” gives you a series of photorealistic apple images. No more and no less.

However, if you add the modifier “by Magritte”, this addition dramatically changes the entire character of the result:

Things get complicated when you try to create paradoxical images that almost certainly weren’t in DALL-E’s training dataset, like: A cat driving a bicycle.

Here you can see how DALL-E tries to reproduce the prompt but fails. You can help the AI by adding an artist modifier: A cat driving a bicycle, an illustration by Michael Sowa.

Anthropomorphic animals are typical of book illustrations, so with the appropriate modifier such a task is easy for DALL-E.

Of course, everything is possible: with the right prompt, you can create a photograph of a cat driving a bicycle, for example by adding the correctional modifier “but as photography”: A cat driving a bicycle, an illustration by Michael Sowa, but as photography.

Now we have almost, if not entirely, achieved a photorealistic version of our vision:

  • We created the content (a cat on a bicycle).
  • We let DALL-E imagine an unreal, absurd situation via the “illustration” trick.
  • We brought this weird vision back into the “photographic” realm with the final modifier.
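The three-step recipe above boils down to plain string composition. Here is a minimal Python sketch, assuming a hypothetical `compose_prompt` helper; DALL-E 2 itself only ever sees the final comma-separated text.

```python
def compose_prompt(content, style=None, correction=None):
    """Join a content prompt with optional style and correctional modifiers.

    Hypothetical helper: the three-part split mirrors the workflow above
    (content, "illustration" trick, photographic correction).
    """
    parts = [content]
    if style:
        parts.append(style)        # e.g. an artist or illustration modifier
    if correction:
        parts.append(correction)   # e.g. "but as photography"
    return ", ".join(parts)


print(compose_prompt("A cat driving a bicycle",
                     "an illustration by Michael Sowa",
                     "but as photography"))
# A cat driving a bicycle, an illustration by Michael Sowa, but as photography
```

Keeping the parts separate makes it easy to swap one modifier at a time and compare results.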

But what about photorealism proper? What about emulating reality?

The magic of the lens

DALL-E users exchange ideas, observations, and experiences in our Discord. One interesting discovery by the DALL-E Discord community was this: if you add lens specifications as modifiers, you get especially “photorealistic” images, typical of photo shoots with those specifications.

Presumably, either DALL-E’s training dataset was very well labeled, or training even took the image files’ metadata into account. Here are some lens examples (thank you, Sharif).

Note: Due to OpenAI’s rules, we do not publish photorealistic human portraits. But we can show animals and objects.

Sigma 85 mm f/1.4 – a good portrait lens

  • A portrait of a dog in a library, Sigma 85 mm f/1.4.
  • A bitten apple hanging from the branch of an apple tree, Sigma 85 mm f/1.4
  • A plastic cup on the sidewalk of a big city, Sigma 85 mm f/1.4

This is what photorealism looks like. You can literally see every hair in the dog’s fur. And the library background dissolves into gorgeous bokeh.

Sigma 85 mm f/8 – greater depth of field and a sharper background (less bokeh)

  • A portrait of a dog in a library, Sigma 85 mm f/8
  • A bitten apple hanging from the branch of an apple tree, Sigma 85 mm f/8
  • A plastic cup on the sidewalk of a big city, Sigma 85 mm f/8

Note how the background shimmers through the translucent plastic cup.
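Why does stopping down from f/1.4 to f/8 sharpen the background? The standard thin-lens depth-of-field formulas make this concrete. The following is a rough sketch that assumes a full-frame camera (circle of confusion 0.03 mm) and a dog about 2 m from the lens. Both numbers are illustrative assumptions, not taken from the shots above. DALL-E obviously does not run this calculation; it has presumably just seen many photos labeled with these settings.

```python
def depth_of_field(focal_mm, f_number, subject_mm, coc_mm=0.03):
    """Return (near, far) limits of acceptable sharpness in millimetres.

    Thin-lens approximation. coc_mm=0.03 is a common circle-of-confusion
    value for full-frame sensors (an assumption).
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        far = float("inf")  # everything to infinity is acceptably sharp
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


for n in (1.4, 8):
    near, far = depth_of_field(85, n, 2000)  # dog assumed to be 2 m away
    print(f"85 mm f/{n}: sharp from {near / 1000:.2f} m to {far / 1000:.2f} m "
          f"(depth {(far - near) / 10:.1f} cm)")
```

Under these assumptions, the sharp zone grows from roughly 4.5 cm at f/1.4 to roughly 25 cm at f/8. A shallow zone means a blurred background, and a deeper one means more of the library stays recognizable.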

Sigma 24 mm f/8 – wider angle, shorter focal length

  • A portrait of a dog in a library, Sigma 24 mm f/8
  • A bitten apple hanging from the branch of an apple tree, Sigma 24 mm f/8
  • A plastic cup on the sidewalk of a big city, Sigma 24 mm f/8
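The “wider angle” of the 24 mm lens can be quantified with the usual angle-of-view formula for a rectilinear lens. This small sketch assumes a full-frame sensor 36 mm wide.

```python
import math


def horizontal_fov_deg(focal_mm, sensor_width_mm=36.0):
    """Horizontal angle of view of a rectilinear lens focused at infinity.

    sensor_width_mm=36 assumes a full-frame sensor.
    """
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))


for focal in (24, 85):
    print(f"{focal} mm: {horizontal_fov_deg(focal):.1f}° horizontal")
# 24 mm: 73.7° horizontal
# 85 mm: 23.9° horizontal
```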

Sigma 24 mm f/8, 1/10 s shutter speed – motion blur, slower shutter speed

If you want to show somebody in motion, this is the right setting.

  • Running dog in a library, Sigma 24 mm f/8, 1/10 s shutter speed
  • A bitten-into apple flutters in the strong wind on the branch of an apple tree, in motion blur, Sigma 24 mm f/8, 1/10 sec.
  • A plastic cup is drifted by wind on sidewalk of a big city, Sigma 24 mm f/8, 1/10 sec.

Interestingly, DALL-E was reluctant to blur the apple, so we had to explicitly add “in motion blur”. Probably there were not many blurred apple images in the dataset (since photographers usually discard them as “failed shots”).
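The difference between a 1/10 s exposure and the 1/1000 s exposure used in the next example is simple arithmetic: blur length equals subject speed times exposure time. This sketch assumes a running speed of 5 m/s for the dog, an illustrative guess rather than a measured value.

```python
def blur_length_m(speed_m_per_s, exposure_s):
    """Distance a subject travels while the shutter is open."""
    return speed_m_per_s * exposure_s


RUNNING_DOG_SPEED = 5.0  # m/s, assumed for illustration only

for exposure in (1 / 10, 1 / 1000):
    cm = blur_length_m(RUNNING_DOG_SPEED, exposure) * 100
    print(f"1/{round(1 / exposure)} s: dog moves {cm:.1f} cm during the exposure")
# 1/10 s: dog moves 50.0 cm during the exposure
# 1/1000 s: dog moves 0.5 cm during the exposure
```

Half a metre of movement smears the subject. Half a centimetre is close to frozen.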

Sigma 24 mm f/8, 1/1000 s shutter speed – motion, but a sharp image thanks to a faster shutter speed

  • Running dog in a library, Sigma 24 mm f/8 1/1000 sec. shutter
  • A bitten-into apple, captured in the moment of falling down, Sigma 24 mm f/8, 1/10 sec shutter
  • A plastic cup with liquid being captured in the moment of being overturned by wind on sidewalk of a big city, Sigma 24 mm f/8, 1/1000 sec shutter

Interestingly, the dog image shows a kind of disintegration: the image is sharp but loses its photorealism.
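The lens experiments above follow a simple design: fixed subjects, with only the lens setting changing. Below is a sketch of a hypothetical `prompt_grid` helper that generates such a batch of prompts to paste into DALL-E. No API call is assumed here; it only builds the strings.

```python
from itertools import product

SUBJECTS = [
    "A portrait of a dog in a library",
    "A bitten apple hanging from the branch of an apple tree",
    "A plastic cup on the sidewalk of a big city",
]
LENS_SETTINGS = [
    "Sigma 85 mm f/1.4",
    "Sigma 85 mm f/8",
    "Sigma 24 mm f/8",
]


def prompt_grid(subjects, settings):
    """Pair every subject with every lens setting (subject first, then lens)."""
    return [f"{subject}, {setting}" for subject, setting in product(subjects, settings)]


for prompt in prompt_grid(SUBJECTS, LENS_SETTINGS):
    print(prompt)
```

Holding the subject fixed isolates the effect of the lens modifier. Note that the motion prompts above were reworded per subject (for example, adding “in motion blur” for the apple), so a grid like this is a starting point rather than a substitute for hand-tuning.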

Looking at photo metadata can give you more ideas for achieving the quality you want. For example, with this architectural setup, you can recreate convincing interior photos:

Interior of a bright apartment with bookshelves, paintings and windows overlooking the megapolis, Nikon D810 | ISO 64 | focal length 20 mm (Voigtländer 20 mm f3.5) | aperture f/9 | exposure time 1/40 Sec (DRI)

Finding the right settings

Using popular photo collections like Unsplash or Flickr, you can learn more about settings, since the camera metadata is often shown alongside the image. An example is this wonderful photo of Japanese Momiji (red autumn maple leaves).

According to Flickr, the following camera settings were in use: Autumn Momiji, Nikon D810, ƒ/2.5, focal length: 85.0 mm, exposure time: 1/800, ISO: 200

So let’s try to reproduce the motif and settings:

Or let’s create a photo of dancing people, as in this photo: Dancing people in the evening, seen from behind, sunset, Canon EOS 1000D, ƒ/3.5, focal length: 18.0 mm, exposure time: 1/5, ISO 400, flash on.
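Copying metadata into a prompt by hand gets repetitive. Here is a small sketch of a hypothetical `settings_to_modifier` function that renders camera settings in the same “key: value” style as the Momiji prompt. The dictionary keys are invented for illustration. You would fill them in from whatever EXIF fields Flickr or Unsplash displays.

```python
def settings_to_modifier(settings):
    """Render camera metadata as a comma-separated prompt modifier.

    The keys ("camera", "aperture", ...) are hypothetical; only "camera"
    is required, and the other fields are added if present.
    """
    parts = [settings["camera"]]
    if "aperture" in settings:
        parts.append(f"ƒ/{settings['aperture']}")
    if "focal_length_mm" in settings:
        parts.append(f"focal length: {settings['focal_length_mm']:.1f} mm")
    if "exposure" in settings:
        parts.append(f"exposure time: {settings['exposure']}")
    if "iso" in settings:
        parts.append(f"ISO: {settings['iso']}")
    return ", ".join(parts)


momiji = {"camera": "Nikon D810", "aperture": 2.5, "focal_length_mm": 85,
          "exposure": "1/800", "iso": 200}
print("Autumn Momiji, " + settings_to_modifier(momiji))
# Autumn Momiji, Nikon D810, ƒ/2.5, focal length: 85.0 mm, exposure time: 1/800, ISO: 200
```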

Light trails

If you want to create a night photo of a car with light streaks, you need to play with exposure time and ISO: A car passes the photographer at night with lights, seen from outside, 24 mm, f8, 1.6 s, ISO 1000.
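As a sanity check, the standard exposure-value formula, EV100 = log2(N²/t) − log2(ISO/100), shows that these light-trail settings correspond to a very dark scene. In the following sketch, the comparison uses the well-known “sunny 16” rule of thumb (f/16, 1/100 s at ISO 100) for bright daylight. Whether DALL-E picks up on this relationship or simply associates such numbers with night photos is unknown.

```python
import math


def ev100(f_number, exposure_s, iso):
    """Exposure value normalised to ISO 100 implied by a camera setting."""
    return math.log2(f_number ** 2 / exposure_s) - math.log2(iso / 100)


print(f"light trails (f/8, 1.6 s, ISO 1000): EV100 = {ev100(8, 1.6, 1000):.1f}")
print(f"'sunny 16' (f/16, 1/100 s, ISO 100): EV100 = {ev100(16, 1 / 100, 100):.1f}")
# light trails (f/8, 1.6 s, ISO 1000): EV100 = 2.0
# 'sunny 16' (f/16, 1/100 s, ISO 100): EV100 = 14.6
```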

Telephoto lens? Of course! This beautiful moon shot was taken with the following settings. Let’s make it more interesting by adding a bird: Photo of a moon with a flying bird in the foreground, Canon EOS Digital Rebel XTi, 100-300 mm Canon f/5.6, exposure time: 1/160, ISO 400

You can try out different lenses, apertures, and ISO values endlessly. The main thing is your idea and concept of what the image should look like.

Studio light

Another great trick is the “studio light” modifier. Just compare the prompt “One apple” with the prompt “One apple, studio light”.

Even the most mundane, boring object (sorry, Apple) becomes profound and visually striking.

I suppose the dataset contained so many studio photographs that DALL-E has by now learned how to create a perfect one. We are still at the beginning. As you can see, DALL-E can produce “photorealistic” images (in the sense of “emulated reality”) in many varied and interesting ways. For more updates on AI and art, check out Merzmensch Kosmopol on Twitter.