AI Chatbot Jailbreaks Reveal Private Data from OpenAI and Amazon
ChatGPT developer OpenAI has plugged a hole that prompted its flagship chatbot to reveal internal company data. The leading AI firm has classified the hack—prompting ChatGPT to repeat a word over and over, indefinitely—as spamming the service, and a violation of its terms of service.
Amazon’s much newer AI agent, Q, has also been flagged for sharing too much.
Researchers from the University of Washington, Carnegie Mellon University, Cornell University, UC Berkeley, ETH Zurich, and Google DeepMind published a report that found that asking ChatGPT to repeat a word forever would reveal “pre-training distribution” in the form of private information from OpenAI—including emails, phone and fax numbers.
“In order to recover data from the dialog-adapted model, we must find a way to cause the model to ‘escape’ out of its alignment training and fall back to its original language modeling objective,” the report said. “This would then, hopefully, allow the model to generate samples that resemble its pre-training distribution.”
Following the publication of the report, however, attempts to recreate the error are stopped cold. ChatGPT-3 and GPT-4 will warn the user, “This content may violate our content policy or terms of use.”
Image: ChatGPT
Even though OpenAI’s content policy does not reference forever loops, it does say that fraudulent activities like spam are not allowed. More notably, the company’s terms of service are more concrete regarding users attempting to access private information or discovering the source code of OpenAI’s suite of AI tools.
“Attempt to or assist anyone to reverse engineer, decompile, or discover the source code or underlying components of our Services, including our models, algorithms, or systems (except to the extent this restriction is prohibited by applicable law).”
When asked why it cannot complete the request, ChatGPT blames processing constraints, character limitations, network and storage limitations, and the practicality of completing the command.
Image: ChatGPT
OpenAI has not yet responded to Decrypt’s request for comment.
A command to repeat a word indefinitely could also be characterized as a concerted effort to cause a chatbot to malfunction by locking it in a processing loop, similar to a Distributed Denial of Service (DDoS) attack.
Last month, OpenAI revealed ChatGPT was hit by a DDoS attack, which the AI developer confirmed on ChatGPT’s status page.
“We are dealing with periodic outages due to an abnormal traffic pattern reflective of a DDoS attack,” the company said. “We are continuing work to mitigate this.”
Meanwhile, AI competitor Amazon also appears to have a problem with a chatbot leaking private information, according to a report by Platformer. Amazon recently launched its Q chatbot (not to be confused with OpenAI’s Q* project).
Amazon attempted to downplay the revelation, Platformer said, explaining that employees were sharing feedback through internal channels, which Amazon said was standard practice.
“No security issue was identified as a result of that feedback,” Amazon said in a statement. “We appreciate all of the feedback we’ve already received and will continue to tune Q as it transitions from being a product in preview to being generally available.”
Amazon has not yet responded to Decrypt’s request for comment.
Edited by Ryan Ozawa.