

Stability AI adopts new architecture in Stable Diffusion 3

The new version of the image model uses a different architecture than previous versions. It comes in different sizes and has better spelling capabilities.

Esther Shittu, News Writer

Published: 22 Feb 2024

Image-generating vendor Stability AI on Thursday introduced Stable Diffusion 3, the latest version of its text-to-image model, touting it as better able to classify images accurately and to represent text.

Available as an early preview, Stable Diffusion 3 is a suite of models that range from 800 million to 8 billion parameters.

It combines a diffusion transformer architecture and flow matching, the AI image vendor said.

Stable Diffusion 3 comes two weeks after Stability AI introduced Stable Cascade, a text-to-image model that uses the Würstchen diffusion model architecture.

The architecture allows for hierarchical compression of images, the vendor said.

Significance of a new architecture

With Stable Diffusion 3, the generative AI vendor is taking a different approach, and the new diffusion transformer architecture addresses concerns about earlier versions of its technology.

The diffusion transformer architecture helps the model use compute power more efficiently when it is being trained, Futurum Research analyst Keith Kirkpatrick said.

Flow matching trains the model along a probability path that gradually transforms random noise into an image. For example, if someone asks the model to generate an image of a car, the model can more easily classify the various parts of the car and compartmentalize them.
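To make the idea of a probability path more concrete, here is a minimal Python sketch of one common flow-matching formulation: a straight-line (rectified flow) path between a random noise sample and a data sample. The function names, the toy two-number "image" and the placeholder predictor are hypothetical illustrations, not Stability AI's code, and the exact parameterization used in Stable Diffusion 3 is not described in this article.

```python
import random

def flow_matching_pair(x0, x1, t):
    """Return the point at time t on the straight path from noise x0 to data x1,
    plus the velocity a flow-matching model would be trained to predict there."""
    # Straight-line probability path (one common convention): x_t = (1 - t) * x0 + t * x1
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight path the velocity is constant: dx_t/dt = x1 - x0
    v_target = [b - a for a, b in zip(x0, x1)]
    return xt, v_target

def flow_matching_loss(predict_velocity, x1, rng):
    """Mean squared error between a model's predicted velocity and the target
    velocity, for one data sample x1 and one random noise draw."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                          # time drawn uniformly from [0, 1)
    xt, v_target = flow_matching_pair(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    return sum((p - v) ** 2 for p, v in zip(v_pred, v_target)) / len(x1)

# Usage with a toy two-value "image" and a hypothetical untrained predictor.
xt, v = flow_matching_pair([0.0, 0.0], [2.0, 4.0], 0.25)
# xt == [0.5, 1.0], v == [2.0, 4.0]

zero_model = lambda xt, t: [0.0 for _ in xt]  # always predicts zero velocity
loss = flow_matching_loss(zero_model, [2.0, 4.0], random.Random(0))
```

Training would adjust a real network to drive this loss down; generating an image then means starting from noise and following the learned velocity toward the data end of the path.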

The image-generating vendor has also improved the appearance and spelling of text within the generated image.

The new architecture also helps address a common weakness of image-generating models, according to Gartner analyst Arun Chandrasekaran.

"One of the challenges that I think we've seen with these image generation models is they are great at creating photorealistic images, but they're not very good at representing text within those images," Chandrasekaran said. "They're trying to create a more seamless image generation with ... texture and language representation in those images."

Offering Stable Diffusion 3 in several sizes lets developers weigh tradeoffs among accuracy, performance and cost for different applications.

The 8-billion-parameter model also makes Stability AI's offerings comparable to those of bigger competitors such as Adobe, Kirkpatrick said.

"That's going to allow for the creation of complex models and complex scenes," he said. "In that aspect, it's very similar to the Adobes of the world."

Some challenges

Despite the improved architecture, one challenge for Stability AI is demonstrating how its image model applies to enterprises.

While Stable Diffusion tends to appeal to individual designers and creators, other image tools, such as OpenAI's Dall-E, are more geared toward enterprises because of OpenAI's partnership with tech giant Microsoft.

"B2B opportunity here is a significant opportunity, particularly in industries like media and entertainment, gaming, insurance, agriculture," Chandrasekaran said. "But we just haven't seen companies like Stability AI demonstrate a very robust enterprise go-to-market."


Another challenge for Stability AI and other image-generating vendors is making sure that their safety guardrails are effective in enterprise settings.

"Everyone is talking about putting in guardrails," Kirkpatrick said. For enterprises, what matters most is that generative AI tools are usable in commercial settings, he said.

While it is important for vendors to test for bias, it's also necessary that they don't overcompensate, Kirkpatrick continued. He referred to Google having to pause the image-generation feature of its new Gemini generative AI model family after the model produced contextually inaccurate images of key historical figures. In those cases, Gemini depicted the figures only as Black, Native American or Asian people.

"There is a vast opportunity for [the tools] when we're talking about marketing content or commerce," he said. "You want to make sure that you have that trust that the images that are generated are going to be free from bias. And you want to make sure that whatever is generated, the source material is okay to use from a copyright perspective."

Stability AI said it has deployed multiple safeguards to prevent the misuse of Stable Diffusion by bad actors.

The introduction of the updated model comes on the same day generative AI vendor Jasper AI revealed it had acquired Clipdrop, Stability AI's image platform.

Esther Ajao is a TechTarget Editorial news writer covering artificial intelligence software and systems.
