Thoughts about AI

I have been using Midjourney for over a year - initially out of curiosity, but quite quickly also to create image material for a wide variety of projects. I won't start the discussion here about where the whole AI topic will lead in terms of image and film generation, because the systems are here to stay. So let's use them to our advantage - that's how I see it at the moment.

The fact that different AI models can also be used to create image descriptions is nothing new. Since mid-2023, for example, you have been able to upload an image to Midjourney with the "/describe" function, or provide it via a link, and after a few seconds it shows you how Midjourney sees the image. The function is very helpful for creating your own prompts for image generation: you "feed" the AI an image that resembles the desired result, the generated descriptions then usually go in the right direction, and you learn quite well what matters when writing prompts.

I have been using this function for a long time, e.g. to create ALT tags for images on websites. The output from the AI is compact, contains the essentials, and with a few adjustments you can build up a larger set of descriptions in a short time.
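
Just to make the idea concrete: here is a minimal sketch in Python of how such a batch of descriptions could be turned into image tags with ALT attributes. The file descriptions.csv, its columns and the image path are made up for the example.

```python
# Minimal sketch: turn a CSV of AI-generated descriptions into <img> tags
# with ALT attributes. The file "descriptions.csv" and its columns
# "filename" and "description" are hypothetical; adjust them to your own export.
import csv
import html

with open("descriptions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Escape the description so it is safe inside an HTML attribute.
        alt = html.escape(row["description"].strip())
        print(f'<img src="/images/{row["filename"]}" alt="{alt}">')
```

The generated tags can then be pasted into a template or CMS as they are.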

Last week, however, I was a little taken aback. Midjourney not only produced an image description, but also a kind of interpretation of the image.

Example 1: A picture of an abandoned greenhouse.

"An abandoned greenhouse surrounded by overgrown vegetation, captured in an old film style with soft focus and natural lighting. The scene is set against the backdrop of mountains, adding to its timeless feel. This photograph captures nature's resilience as it reclaims the structure, symbolizing life on everything that once was."

OK, the part about the timeless feel because of the mountains is a bit strange, but the rest is quite amazing. The trigger is certainly the dilapidated greenhouse and the plants. A photo like this has surely been taken umpteen times, but Midjourney's data model is probably also enriched with image interpretations, not just keywords.

Example 2: A hotel room in East Germany.

"A simple black and white photograph of an empty bed in front of a window, with light seeping through vertical blinds. The room is dark but has subtle hints of color from outside. There is no one around, creating a feeling of solitude or contemplation."

I could hardly have done it better myself. The picture is described perfectly in just a few words. The title image of this blog post was generated based on that image description.

Example 3: On the race track at the Le Mans 24-hour race.

"A black and white photo of the Le Mans crowd in front of their stands with Dunlop written on it, shot from behind them at night, capturing the excitement as they watch an intense race between two cars. The scene is illuminated by spotlights that cast long shadows over the grandstand's roof. The shot was taken during evening time, the atmosphere conveys energy and anticipation among fans, creating a sense of place within the ambiance of live racing action."

That's a long way from a purely factual description of the image.

 

Besides "interpretation"

But even leaving the "interpretation" aside, the results of the image description are for the most part quite astonishing. For example, I ran an image from a market hall in Hanoi through the describe function, and Midjourney even located the scene quite accurately.

"Black and white street photography of the interior view inside Hanoi old market. Walking people with different accessories on their heads. A wide angle lens captures a fish eye, low angle shot. Multiple goods are stacked in piles around them. Light comes from the top left corner, creating a dark mood with strong contrast between shadows and lights."

I am curious to see where the journey will take us here. Google and the like have certainly been using similar techniques to index images on the web for a long time, but end users have had few comparable options until now.

For an SEO test, I have set up a page on this site - the catalogue - which contains all images that already have a generated description. I am keeping an eye on Google Search Console to see whether this has any effect. If it does, I'll let you know.
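
And to sketch what such a catalogue page could look like in the simplest case - again with made-up file names, columns and paths - something along these lines would do:

```python
# Rough sketch: build a bare-bones catalogue page from a CSV of images and
# their AI-generated descriptions. File name, columns and paths are
# hypothetical placeholders.
import csv
import html

figures = []
with open("descriptions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Use the description both as ALT text and as a visible caption.
        alt = html.escape(row["description"].strip())
        figures.append(
            "  <figure>\n"
            f'    <img src="/images/{row["filename"]}" alt="{alt}">\n'
            f"    <figcaption>{alt}</figcaption>\n"
            "  </figure>"
        )

page = "<!doctype html>\n<main>\n" + "\n".join(figures) + "\n</main>\n"
with open("catalogue.html", "w", encoding="utf-8") as out:
    out.write(page)
```

The visible captions are just one option; plain ALT attributes would work as well.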