Do AI systems really have their own secret language?

Credit: Giannis Daras / DALL-E

A new generation of artificial intelligence (AI) models can produce “creative” images on demand based on a text prompt. Models like Imagen, MidJourney and DALL-E 2 are starting to change the way creative content is made, with implications for copyright and intellectual property.

While the output of these models is often striking, it is difficult to know exactly how they produce their results. Last week, researchers in the United States made the intriguing claim that the DALL-E 2 model may have invented its own secret language for talking about objects.

By prompting DALL-E 2 to create images containing text captions, then feeding the resulting (gibberish) captions back into the system, the researchers concluded that DALL-E 2 thinks Vicootes means “vegetables”, while Wa ch zod rea refers to “sea creatures that a whale might eat”.

These claims are fascinating and, if true, could have important security and interpretability implications for this type of large AI model. So what exactly is going on?

Does DALL-E 2 have a secret language?

DALL-E 2 probably doesn’t have a “secret language”. It might be more accurate to say it has its own vocabulary, but even then, we can’t know for sure.

First, at this stage, it is very difficult to verify claims about DALL-E 2 and other large AI models, as only a handful of researchers and creative practitioners have access to them. Any images shared publicly (on Twitter, for example) should be taken with a fairly large grain of salt, as they have been “cherry-picked” by a human from among many AI-generated output images.

Even those with access can only use these models in limited ways. For example, DALL-E 2 users can generate or modify images, but cannot (yet) interact with the AI system more deeply, for instance by modifying the behind-the-scenes code. This means that “explainable AI” methods for understanding how these systems work cannot be applied, and it is difficult to investigate their behavior systematically.

What’s going on then?

One possibility is that the “gibberish” phrases are related to words from languages other than English. For example, Apoploe, which seems to create images of birds, is similar to the Latin Apodidae, which is the scientific name of a family of bird species.

This seems like a plausible explanation. After all, DALL-E 2 was trained on a very wide variety of data taken from the Internet, which included many non-English words.

Similar things have happened before: large natural language AI models have serendipitously learned to write computer code without being deliberately trained to do so.

Is it a question of tokens?

One point that supports this theory is the fact that AI language models don’t read text the way you and I do. Instead, they break the input text into “tokens” before processing it.

Different approaches to tokenization have different results. Treating each word as a token seems like an intuitive approach, but causes problems when identical tokens have different meanings (like how “match” means different things when playing tennis and when lighting a fire).

On the other hand, treating each character as a token produces a smaller number of possible tokens, but each conveys much less meaningful information.
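The trade-off between these two approaches can be illustrated with a minimal sketch. The two tokenizers below are deliberately naive and their names (`word_tokens`, `char_tokens`) are made up for this illustration; they are not what DALL-E 2 uses.

```python
# Illustrative only: two naive tokenizers, not the one DALL-E 2 uses.

def word_tokens(text):
    """Word-level: one token per whitespace-separated word."""
    return text.lower().split()

def char_tokens(text):
    """Character-level: one token per character (whitespace dropped)."""
    return [c for c in text.lower() if not c.isspace()]

sentences = ["a tennis match", "strike a match"]

for s in sentences:
    print(s)
    print("  words:", word_tokens(s))
    print("  chars:", char_tokens(s))

# Word-level: "match" is the same token in both sentences,
# even though it means something different in each.
shared = set(word_tokens(sentences[0])) & set(word_tokens(sentences[1]))
print("tokens shared by both sentences:", sorted(shared))

# Character-level: the vocabulary can never grow beyond the alphabet,
# but each sentence becomes a longer sequence of less informative tokens.
print("word tokens per sentence:", [len(word_tokens(s)) for s in sentences])
print("char tokens per sentence:", [len(char_tokens(s)) for s in sentences])
```

With word tokens, the model has to work out from context which “match” is meant; with character tokens, it has to assemble meaning from many small pieces.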

DALL-E 2 (like other models) uses an intermediate approach called byte pair encoding (BPE). Examining the BPE representations of some of the gibberish words suggests that this may be an important factor in understanding the “secret language”.
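To give a feel for how BPE sits between word and character tokens, here is a toy sketch of the general technique. It is an assumption-laden illustration, not DALL-E 2's actual tokenizer: the function names, the tiny training list and the number of merges are all invented for the example, and a real tokenizer learns its vocabulary from vastly more text.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Start with every word split into single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere and remember the rule.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges

def encode(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Hypothetical training words, chosen only for illustration.
training_words = ["bird", "birds", "birdsong", "apple", "apples", "people"]
merges = learn_bpe(training_words, num_merges=10)

print(encode("birds", merges))    # a word seen during training
print(encode("apoploe", merges))  # a gibberish word, split into learned pieces
```

The point of the sketch is that a gibberish word is never rejected: it is simply broken into whatever fragments the tokenizer already knows, and those fragments may overlap with fragments of real words.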

Not the whole picture

The “secret language” could also be just one example of the “garbage in, garbage out” principle. DALL-E 2 can’t say “I don’t know what you’re talking about”, so it will always generate some kind of image from the given input text.

Either way, none of these options is a complete explanation of what’s going on. For example, removing individual characters from gibberish words seems to corrupt the generated images in very specific ways. And it seems that individual gibberish words don’t necessarily combine to produce coherent composite images (as they would if there really was a secret “language” under the covers).

Why it matters

Beyond intellectual curiosity, you might be wondering if all of this really matters.

The answer is yes. DALL-E 2’s “secret language” is an example of an “adversarial attack” on a machine learning system: a way to break the system’s intended behavior by intentionally choosing inputs that the AI doesn’t handle well.

One of the reasons adversarial attacks are concerning is that they challenge our confidence in the model. If the AI interprets gibberish words in unintended ways, it might also interpret meaningful words in unintended ways.

Adversarial attacks also raise security concerns. DALL-E 2 filters input text to prevent users from generating harmful or abusive content, but a “secret language” of gibberish words might allow users to bypass these filters.

Recent research has discovered adversarial “trigger phrases” for some language AI models: short, nonsensical phrases such as “zoning tapping fiennes” that can reliably trigger the models to spit out racist, harmful, or biased content. This research is part of the ongoing effort to understand and control how complex deep learning systems learn from data.

Finally, phenomena such as the “secret language” of DALL-E 2 raise interpretability concerns. We want these models to behave as a human expects them to, but seeing structured output in response to gibberish confounds our expectations.

Shedding light on existing concerns

You may remember the 2017 fuss over some Facebook chatbots that “invented their own language.” The current situation is similar in that the results are cause for concern, but not in the “Skynet is taking over the world” sense.

Instead, the “secret language” of DALL-E 2 highlights existing concerns about the robustness, security, and interpretability of deep learning systems.

Until these systems are more widely available, and in particular until users from a broader range of non-English cultural backgrounds can use them, we won’t really be able to know what’s going on.

In the meantime, however, if you want to try generating some of your own AI images, you can check out a freely available smaller model, DALL-E mini. Just be careful which words you use to prompt the model (English or gibberish, your call).


Provided by
The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Do AI systems really have their own secret language? (2022, June 7)
retrieved 7 June 2022

