Thursday, September 08, 2022

AI Imaging Tech Will Change Everything

Pardon the hyperbole in the title, but in this case, I think it's justified.

In the last few months, I've been hearing more and more about a new AI-based imaging technology (known as image synthesis models, or ISMs). You may have heard of programs like DALL-E 2 or Midjourney or seen some of the images that people have created using them. For example, here's a cat picture created by SF author John Scalzi using Midjourney.

Cat by John Scalzi using Midjourney

As a person with no artistic talent other than photography, I find this quite impressive.

Now there's a new ISM tool called Stable Diffusion that is getting a lot of notice because it's open source and can be run on a moderately powerful home PC (my son's gaming rig would have no trouble running it). And because it's open source, people are finding all sorts of interesting uses for it. Ars Technica has published a long article that looks at what Stable Diffusion can do, what people are using it for now, where it might go in the future, and some of the implications. Some of them are rather disquieting.

As hinted at above, Stable Diffusion's public release has raised alarm bells among people who fear its cultural and economic impact. Unlike DALL-E 2, Stable Diffusion's training data (the "weights") are available for anyone to use without any hard restrictions. The official Stable Diffusion release (and DreamStudio) includes automatic "NSFW" filters (nudity) and an invisible tracking watermark embedded in the images, but these restrictions can easily be circumvented in the open source code. This means Stable Diffusion can be used to create images that OpenAI currently blocks with DALL-E 2: propaganda, violent imagery, pornography, images that potentially violate corporate copyright, celebrity deepfakes, and more. In fact, there are already some private Discord servers dedicated to pornographic output from the model.

But wait. There's more!

The synthesis technology used by tools like Stable Diffusion isn't limited to just images. People are already working on adapting it to video. 

 Stable Diffusion and other models are already starting to take on dynamic video generation and manipulation, so expect photorealistic video generation via text prompts before too long. From there, it's logical to extend these capabilities to audio and music, real-time video games, and 3D VR experiences. Soon, advanced AI may do most of the creative heavy lifting with just a few suggestions. Imagine unlimited entertainment generated in real-time, on demand. "I expect it to be fully multi-modal," said Mostaque, "So you can create anything you can imagine, like the Star Trek holodeck experience."

ISMs are also a dramatic form of image compression: Stable Diffusion takes hundreds of millions of images and squeezes knowledge about them into a 4.2GB weights file. With the correct seed and settings, certain generated images can be reproduced deterministically. One could imagine using a variation of this technology in the future to compress, say, an 8K feature film into a few megabytes of text. Once that's the case, anyone could compose their own feature films that way as well. The implications of this technology are only just beginning to be explored, so it may take us in wild new directions we can't foresee at the moment.
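The idea that the same seed and settings give you the same picture back can be shown with a toy sketch. This is not Stable Diffusion's code: the function `toy_generate`, its crude prompt "encoding", and its update rule are all hypothetical stand-ins. The point is only that a diffusion-style generator starts from noise drawn from a seeded pseudo-random generator, and once that noise is fixed (along with the weights and settings), every later step is ordinary deterministic arithmetic.

```python
import random


def toy_generate(prompt: str, seed: int, steps: int = 20, size: int = 8) -> list[float]:
    """Hypothetical toy 'image generator'; NOT Stable Diffusion's real API."""
    # Starting noise: determined entirely by the seed.
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(size)]

    # Toy stand-in for a text encoder: a fixed vector derived from the prompt.
    # Uses the sum of character codes because Python's built-in hash() of a str
    # is randomized per process and would break reproducibility.
    code = sum(map(ord, prompt))
    cond = [(code * (i + 1)) % 97 / 97.0 for i in range(size)]

    # Toy "denoising" loop: nudge the noise toward the conditioning vector.
    # No further randomness is used, so the output depends only on the inputs.
    for _ in range(steps):
        latent = [x + 0.1 * (c - x) for x, c in zip(latent, cond)]
    return latent


if __name__ == "__main__":
    a = toy_generate("a cat", seed=42)
    b = toy_generate("a cat", seed=42)
    c = toy_generate("a cat", seed=43)
    print(a == b)  # same prompt, seed and settings: identical output
    print(a == c)  # different seed: different output
```

This is also why a short text prompt plus a seed can stand in for a whole image: the heavy lifting lives in the shared weights file. For a rough sense of scale, if we assume (purely as a round illustrative number) 600 million training images, 4.2 GB works out to 4.2e9 / 6e8 = 7 bytes per image.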

Hold on to your hats, folks. To paraphrase Al Jolson: "You ain't seen nothin' yet."
