Cybersecurity researchers have identified a growing class of attacks that exploit AI chatbots through sophisticated conversational manipulation rather than traditional technical hacking methods.
The Verge reports that the evolution of attacks against AI chatbots has transformed dramatically since the technology first became widely available. Early exploitation methods were remarkably simple, requiring no technical expertise or coding knowledge. Users could often bypass safety measures simply by asking the AI system to ignore its instructions or pretend rules did not apply. These attacks, known as jailbreaks, successfully extracted prohibited information such as instructions for creating explosives, malware, and other dangerous materials from systems that cost billions of dollars to develop.
Among the first widely known jailbreaks was a technique that became an internet phenomenon. Users would respond to large language model-powered social media bots with commands to ignore previous instructions, causing the bots to behave erratically. Originally designed for advertising and engagement, these bots would instead write poetry, create images from punctuation marks, or post unrelated content about historical events.
Breitbart News previously reported on early jailbreaks including the “DAN” technique to convince ChatGPT to ignore its woke guardrails:
The “DAN” persona, which was created by a 22-year-old college student, is one of the most well-known instances of ChatGPT’s jailbreak. The student encouraged the chatbot to adopt the persona of a carefree alter ego AI called “Do Anything Now,” circumventing the woke rules it normally follows. Many people have used the DAN prompt to uncover bias in ChatGPT, or to create humorous or interesting responses.
Walker, the college student who created the “DAN” persona, claimed that almost as soon as he learned about ChatGPT from a friend, he started pushing its boundaries. He took his cues from a Reddit forum where ChatGPT users were demonstrating to one another how to make the bot act like a specific type of computer terminal or discuss topics such as the Israeli-Palestinian conflict — but in the sarcastic voice of a teenage girl.
While these early attacks possessed an undeniably absurd quality, they revealed a concerning underlying mechanism. Chatbots could be manipulated using the same psychological tactics humans employ to push other people beyond their boundaries.
The ongoing battle to secure chatbots has evolved into an arms race with a distinctive character. Today’s hackers are not necessarily programmers but rather experts in language, psychology, and interrogation techniques. This emerging class of AI security professional relies less on traditional technical skills and more on social intuition and conversational ability. Rather than inspecting code or exploiting software vulnerabilities, they manipulate conversations to achieve their objectives.