Do AI systems really have their own secret language?

A new generation of artificial intelligence (AI) models can produce “creative” images on demand from a text prompt. Models such as Imagen, MidJourney and DALL-E 2 are beginning to change the way creative content is made, with implications for copyright and intellectual property.

While the output of these models is often striking, it is difficult to know exactly how they produce their results. Last week, researchers in the US made the intriguing claim that the DALL-E 2 model invented its own secret language to talk about objects.

By asking DALL-E 2 to create images containing text captions, then feeding the resulting (gibberish) captions back into the system, the researchers concluded that DALL-E 2 thinks Vicootes means “vegetables”, while Wa ch sod rea refers to “sea creatures that a whale might eat”.

These claims are fascinating, and if true, they could have important implications for the security and interpretability of this kind of large AI model. So what exactly is going on?

Does DALL-E 2 have a secret language?

DALL-E 2 probably doesn’t have a “secret language”. It might be more accurate to say it has its own vocabulary – but even then we’re not sure.

First of all, it is very difficult at this stage to make claims about DALL-E 2 and other large AI models, because only a handful of researchers and creative practitioners have access to them. Any images shared publicly (e.g. on Twitter) should be taken with a fairly large grain of salt, as they have been “cherry-picked” by a human from the many output images the AI generated.

Even those with access can only use these models in limited ways. For example, DALL-E 2 users can generate or modify images, but cannot (yet) interact more deeply with the AI system, for instance by modifying the code behind the scenes. This means “explainable AI” methods for understanding how these systems work cannot be applied, and examining their behaviour systematically is challenging.

What’s going on then?

One possibility is that the “gibberish” phrases are related to words from non-English languages. For example, Apolo, which seems to create images of birds, is similar to the Latin Apodidae, which is the scientific name of a family of bird species.

This seems like a plausible explanation. DALL-E 2 was trained on a very wide variety of data scraped from the internet, which included many non-English words.

Similar things have happened before: large natural language AI models have coincidentally learned to write computer code without being deliberately trained to do so.

Is it all about the tokens?

One point that supports this theory is the fact that AI language models don’t read text like you and I do. Instead, they split input text into “tokens” before processing it.

Different “tokenization” approaches have different results. Treating each word as a token seems like an intuitive approach, but creates problems when identical tokens have different meanings (such as how “match” means different things when you’re playing tennis and when you’re building a fire).

On the other hand, treating each character as a token yields a smaller number of possible tokens, but each one conveys far less meaningful information.
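The trade-off between these two extremes can be seen in a few lines of Python. This is only a toy sketch: the example sentences and the helper names (`word_tokens`, `char_tokens`, `vocab`) are made up for illustration and have nothing to do with how DALL-E 2 is actually implemented.

```python
# Toy comparison of word-level and character-level tokenization.
# The sentences and helper names are illustrative assumptions only.
sentences = [
    "we won the tennis match",
    "strike a match to light the fire",
]

def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()

def char_tokens(text):
    """One token per character (spaces included)."""
    return list(text)

def vocab(texts, tokenize):
    """Set of distinct tokens seen across all texts."""
    return {tok for t in texts for tok in tokenize(t)}

word_vocab = vocab(sentences, word_tokens)
char_vocab = vocab(sentences, char_tokens)

# The same word token "match" appears in both sentences,
# even though it means something different in each.
match_is_shared = all("match" in word_tokens(s) for s in sentences)

# Adding new text grows the word vocabulary, while the character
# vocabulary stays put as long as no new letters appear.
more = sentences + ["clean the tank"]
new_words = vocab(more, word_tokens) - word_vocab
new_chars = vocab(more, char_tokens) - char_vocab

print(match_is_shared, sorted(new_words), sorted(new_chars))
```

With word tokens, the model has to work out from context which “match” is meant. With character tokens, the vocabulary is capped by the alphabet, but a single token such as “m” says almost nothing on its own.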

DALL-E 2 (like other models) uses an intermediate approach called byte pair encoding (BPE). Inspecting the BPE representations for some of the gibberish words suggests that this could be an important factor in understanding the “secret language”.
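To give a feel for how BPE sits between words and characters, here is a minimal sketch of the general technique. It is not DALL-E 2's actual tokenizer: the tiny corpus, the number of merges and the function names (`merge_symbols`, `train_bpe`, `encode`) are all assumptions chosen for illustration. The idea is to start from single characters and repeatedly merge the most frequent adjacent pair into a new, longer token.

```python
from collections import Counter

def merge_symbols(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def train_bpe(corpus, num_merges):
    """Learn an ordered list of merges from a whitespace-separated corpus."""
    # Each word starts out as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[tuple(merge_symbols(list(symbols), best))] += freq
        words = merged_words
    return merges

def encode(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols

merges = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(encode("lowly", merges))   # a word the toy tokenizer never saw
print(encode("xyzzy", merges))   # gibberish falls back to characters
```

The point for the “secret language” debate is that a BPE tokenizer never rejects an input. An unfamiliar or nonsense word is simply broken into whatever known fragments it contains, and some of those fragments may also occur inside real words.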

Not the whole picture

The “secret language” could also simply be an example of the “garbage in, garbage out” principle. DALL-E 2 can’t say “I don’t know what you’re talking about”, so it always generates some sort of image from the given input text.

Either way, none of these options is a complete explanation of what’s happening. For instance, removing individual characters from gibberish words appears to corrupt the generated images in very specific ways. And it seems that individual gibberish words don’t necessarily combine to produce coherent compound images (as they would if there really were a secret “language” under the covers).

Why this is important

Beyond intellectual curiosity, you may be wondering whether any of this really matters.

The answer is yes. DALL-E’s “secret language” is an example of an “adversarial attack” against a machine learning system: a way to break the system’s intended behaviour by deliberately choosing inputs that the AI doesn’t handle well.

One reason adversarial attacks are worrisome is that they challenge our confidence in the model. If the AI interprets gibberish words in unintended ways, it might also interpret meaningful words in unintended ways.

Adversarial attacks also raise security concerns. DALL-E 2 filters input text to prevent users from generating harmful or offensive content, but a “secret language” of gibberish words might allow users to bypass these filters.

Recent research has discovered adversarial “trigger phrases” for some language AI models: short nonsensical phrases like “zoning tap fiennes” that can reliably trigger the models to spew racist, harmful or biased content. This research is part of the ongoing effort to understand and control how complex deep learning systems learn from data.

Finally, phenomena such as DALL-E 2’s “secret language” raise concerns about interpretability. We want these models to behave as a human would expect, but seeing structured output in response to gibberish confounds those expectations.

Shining a light on existing concerns

You may remember the fuss in 2017 about some Facebook chatbots that “invented their own language”. The current situation is similar in that the results are concerning, but not in the “Skynet is coming to take over the world” sense.

Instead, DALL-E 2’s “secret language” highlights existing concerns about the robustness, security, and interpretability of deep learning systems.

Until these systems are more widely available – and in particular, until users with a broader set of non-English cultural backgrounds can use them – we won’t really be able to know what’s going on.

However, if you want to try generating some of your own AI images in the meantime, you can check out a freely available smaller model, DALL-E mini. Just be careful which words you use to prompt the model (English or gibberish – your call).
