Stable Diffusion: Researchers Can Extract Training Data from Image AI

Image diffusion models such as DALL-E 2, Imagen and Stable Diffusion have attracted considerable attention for their ability to create high-quality, realistic images. Researchers have now succeeded in extracting training data from these models.

Anyone who uses an image generator has probably wondered where the AI's training data actually comes from. Diffusion models are trained on images collected from the Internet. This is potentially risky from a copyright perspective, since the training sets draw on virtually anything that can be found online.

What the Stable Diffusion Research Examined

Models like Stable Diffusion are often trained on copyrighted, trademarked, private and sensitive images. Researchers have now examined whether such models reproduce this training material in the images they generate.

They found that the AI sometimes memorizes individual training images and produces almost identical copies of them. Many of these images are copyrighted or licensed, and some are photographs of individuals.
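To make the idea of an "almost identical copy" concrete, here is a minimal sketch of how a generated image could be compared against a set of training images. It is an illustration only, not the researchers' actual procedure: the function names, the representation of an image as a flat list of pixel values between 0 and 1, and the threshold of 0.1 are all assumptions chosen for this example.

```python
import math

def rms_distance(a, b):
    """Root-mean-square pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def find_near_copies(generated, training_set, threshold=0.1):
    """Return the indices of training images that lie within `threshold`
    of the generated image. The threshold is a hypothetical value."""
    return [
        i for i, image in enumerate(training_set)
        if rms_distance(generated, image) < threshold
    ]

# Toy data: three tiny 4-pixel "images" and one generated image
# that is nearly identical to the first training image.
training_set = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
generated = [0.02, 0.0, 0.01, 0.0]
matches = find_near_copies(generated, training_set)
```

A small distance to a single training image suggests the output is a reproduction rather than a new image. Real images would need to be resized to a common resolution first, and a suitable threshold would have to be chosen empirically.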

What Will the Impact Be in the Future?

One of the contributing researchers, Eric Wallace, tweeted about the paper: “Personally, I have many thoughts about this paper. First, everyone should deduplicate their data as it reduces memorization. In rare cases, however, we can still extract non-duplicated images!”

“Finally, there are still open questions about the impact of our work on ongoing lawsuits against StabilityAI, OpenAI, and GitHub,” Wallace added. In particular, models that memorize training points could run into problems with laws such as copyright law, US trademark law or the GDPR.
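Wallace's first recommendation, deduplicating the training data, can be sketched in a few lines. The example below is a simplified assumption rather than the method used for any real dataset: it only removes byte-identical files by comparing SHA-256 hashes, and the function name `deduplicate_exact` is made up for this illustration. Resized or re-encoded copies of the same picture would slip through and would need a near-duplicate comparison instead.

```python
import hashlib

def deduplicate_exact(images):
    """Keep the first occurrence of each byte-identical image.

    `images` is an iterable of raw image files as bytes objects.
    """
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique

# Toy data standing in for raw image files.
images = [b"cat", b"dog", b"cat"]
unique_images = deduplicate_exact(images)
```

Hashing keeps the check cheap even for large collections, because each file is read once and only the fixed-size digests are compared.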
