News & Analysis

Stability AI’s Text-to-Image Model

At a time when generative AI models are facing ethical scrutiny, the company has launched a new text-to-image model

Generative AI may have polarized users around ethical and competition issues, but that is hardly stopping companies engaged in this next-gen technology from refining their models and making them more human-like. Stability AI has now come up with an advanced version of its text-to-image model, one that it says delivers better customized results.

The AI startup released its new Stable Diffusion XL 1.0 as open source on GitHub, alongside Stability's API and its consumer apps ClipDrop and DreamStudio. The new version delivers more vibrant and accurate colors as well as better contrast, shadows and lighting than its predecessors, the company says in a blog post.

The previous version of the Stable Diffusion model could also produce higher-resolution images, but it needed additional computing power. The latest version is customizable and can fine-tune concepts and styles on the fly. It is also easier to use and can generate complex designs from basic natural-language prompts.

The new version offers better text generation too

A report published in TechCrunch quotes Stability AI's head of applied machine learning, Joe Penna, as saying that Stable Diffusion XL 1.0 contains 3.5 billion parameters and can yield full 1-megapixel images in seconds across multiple aspect ratios. It also offers improved text generation capabilities.

Penna claimed that while most text-to-image models struggle to generate images with legible logos, let alone calligraphy or fonts, the latest version of Stable Diffusion can manage advanced text generation and legibility. The new version also supports inpainting, which reconstructs missing parts of an image, and outpainting, which extends existing images.

With Stable Diffusion XL 1.0, users can input an image and provide additional text prompts to create detailed variations of it. The model is also capable of interpreting complicated, multi-part instructions given in short prompts, a major difference from the earlier version, which required longer text prompts.

However, the moral and ethical issues persist

Of course, the new launch does not take away from the old questions around ethical and moral issues. The open-source version of the new model can, in theory, be used by bad actors to generate toxic content such as non-consensual deepfakes. The chances of this happening are higher because millions of images scraped from the web were used to train the model.

In the TechCrunch article, Penna does not refute this idea and accepts that the model contains some biases. However, he claims that Stability AI has taken additional measures to mitigate harmful content generation, filtering the model's training data for what he describes as unsafe imagery and attaching warnings to problematic prompts.

There is also the ethical question around training data, as several artists have protested the use of their work to train generative AI models. Stability AI says that, for now, it is shielded from legal liability by fair-use doctrine in the United States. However, several artists and the photo company Getty Images have already filed lawsuits.

The company has also joined hands with another startup, Spawning, to handle opt-out requests from these artists. The company says it hasn't removed or flagged artwork from its training data sets but will continue to incorporate artists' requests, which it hopes to use to further iterate its training process.
