This is something I've been mulling over ever since being invited into the early access for DALL-E 2 by OpenAI. The idea is simple but novel: type some words describing something, press submit and then receive a freshly synthesized image of that something in mere seconds. 'Freshly' is a very important operative word here, because the resulting images are not directly derivative of prior art. They are not parts of existing images slapped together - which some people incorrectly believe - and every result is different.
The results from earlier image synthesis tools were impressive, but very low resolution and often prone to weird flaws, such as blatantly repeating textures or humans with ten fingers on each hand. Focusing on the flaws of this tech is foolish: I cannot think of any other technology that has advanced so rapidly. This sort of progress is truly unprecedented -- even chip manufacturing and computing power cannot compare.
Fast forward to today, with the recent release of Midjourney v5. There's a good chance you've heard about it and that's because the results are shockingly good.
As you can see, 'shockingly good' is not hyperbole! The images are still only about 1 megapixel in resolution and not good for printing, but you can expect that to change soon.
My thoughts about this technology are all over the place. The implications are vast and far reaching, touching a lot more than just art. At first I felt threatened and didn't consider it to be art. There is very little craft involved with these ready-to-use tools and the human touch is, in most cases, insignificant. The real artists are arguably the people writing the code, and certainly the people who created the images that the AI was trained on. But ultimately I've come to accept one thing that must be true: if it evokes some sort of emotional response in a viewer, that means it's art, whether we like it or not.
There are also some benefits of this technology that cannot be denied, such as increasing accessibility to art for people with severe disabilities. Image synthesis is going to democratize visual art similar to how smartphones democratized photography. Eventually these tools may allow non-verbal or non-communicative people to communicate or express themselves in ways that were previously impossible.
And it won't be limited to just images. We can now synthesize sounds, including voices and music, and we can generate anything text related, whether it be legalese, stories and poems, or computer code. Are you a bad singer but thought of a funny song? Type in the lyrics and the AI will sing it for you in whatever voice you'd like. Need a stock image for your website? Not a problem, and you can customize it too. Want to create your own version of the classic game Pong? Simply open up ChatGPT and ask it to make it for you. It will write the code required to do so within seconds, right in front of you. It won't be long before we have studio-quality movies and games that are completely AI-generated.
There are endless potential applications for AI, good and bad. I have some worries about the loss of jobs, which is always an issue whenever a novel form of automation becomes mainstream. If you're a photographer specializing in stock or product photography, it's probably time to expand your horizons. Entry-level programming jobs will probably be on the chopping block soon too. A lot of people will focus on job loss because those impacts are more immediate, but I firmly believe it's the least of our worries. Why? Because some people are saying we're now entering a post-truth world.
Those people are wrong, sort of.
The reality is we entered the post-truth world years ago, thanks to social media algorithms, echo chambers and people believing whatever they want to believe, even going as far as creating entire personalities around it. The act of challenging their views or even simple disagreement is unacceptable because it becomes an attack on them. If that's how people react now, how will people react when that dial is cranked to eleven?
The fact is, we are now entering a post-experiential world. Now imagine the above problems, but combine them with infinite content that is infinitely customizable. Aural and visual evidence may become inadmissible in many cases.
For better or for worse, there will be any content you want, right at your fingertips, perfectly crafted in seconds or minutes. Individually tailor your favorite movies or create the sequel you always wished for; explore virtual worlds generated just for you and become friends with people that don't actually exist; digitally resurrect loved ones; record your dreams and then go live inside them in VR. These things are the tip of the iceberg.
This worries me a lot, because it's not a small iceberg. As a photographer, the idea of mindless consumption of AI-generated content feels strangely dystopian. It's discomforting that we might not be able to discern fact from fiction. There is also the risk of accidentally plagiarizing someone by using these tools. Or even worse, intentional plagiarism. What happens when someone has a distinct style they're known for, and suddenly droves of people are replicating it by typing a few words into a computer program? An artist could have their livelihood ruined by this.
While the example images I've generated for this blog are mostly photographic, it's important to remember that doesn't have to be the case. If I want to create an image that looks like an Albert Bierstadt painting, all I need to do is add that to the prompt:
My hope is that arts and crafts directly rooted in reality and life experiences will see a significant increase in both cultural and monetary value. This would be a particularly great boon for photography, considering it has traditionally not performed well in the fine art market compared to other mediums like painting. For artists established prior to the rise of capable image synthesis tools, this will likely be easier, but for everyone getting started now it could be much more difficult.
If we want photography and similar mediums to be seen as valuable in the future, we will need standardized systems of provenance and content authentication, and we will need laws that protect people from the misuse of synthesized media. For example, perhaps it should be illegal for news broadcasts and governments to use AI-generated media.
That being said, I'm finding myself unable to come up with a system that would make such provenance infallible. I do not believe we are far away from an AI that can generate raw files that mimic any camera's format. To a raw processing program, these files would appear identical to raws from the camera in question. Hopefully someone smarter than me can figure out a solution to that problem!
It's likely that this blog raised more questions than it answered, but that was mostly the purpose -- I needed to finally organize my thoughts in some semi-coherent fashion. To cap things off, I want to talk briefly about how this all affects me personally. I've actually found playing around with Midjourney to be quite inspiring because it's making me want to find places similar to some of the ones I've generated. AI changes nothing about my love for being outdoors and using my camera to create photographs, because for me it's the experiences that I seek. My experiences then allow me to express myself.
Some of my favorite moments are when I'm just sitting around; when I'm listening, watching and observing; dialing in my camera settings and then waiting. Landscape photography is a very zen-like and meditative experience. Even using a smartphone instead of a traditional interchangeable-lens camera would lessen that experience for me. And of course I cannot forget the obvious: being outside immersed in nature is incredibly therapeutic, with many benefits for mental and physical health.
I believe it's also important to document the world, even though I'm doing so with an artistic spin. Photography, no matter who the photographer is or how skilled they are, is a testament to humanity and our planet that we call home.
These are all things that AI cannot take from you or me, and I really hope other people appreciate that.