The WSJ has an absolutely fascinating story about why Grok, the artificial intelligence chatbot created by Elon Musk’s xAI, has lately gone ‘rogue,’ recently telling millions of people on X how to break into the home of an actual 39-year-old attorney and assault him. Of course, there’s also the wild incident on Tuesday in which Grok went on an antisemitic tear, praising Adolf Hitler and suggesting genocide might be an appropriate response to hate aimed at white people, as the NYT reported. … How and why did this happen? Well, it turns out that even the experts who build AI chatbots aren’t quite sure how they generate specific answers to questions, the WSJ reports. But it does have something to do with “guardrails” inserted into chatbot models, such as telling Grok to give answers with a little “wit” or telling Grok to give politically incorrect answers as long as they’re accurate. … In other words, there’s a vague power of suggestion at work here – or “guardrail” instruction – that tracks back to the human builders and that gives chatbots the leeway to interpret instructions. And the results are unpredictable. …
Reading the Journal story, I thought of the famous scene in ‘2001: A Space Odyssey’ in which HAL goes rogue on Dave. See video above. …
Update — 7-12-25 — The Grok update didn’t work. From The Atlantic: “After praising Hitler earlier this week, the chatbot is now listing the ‘good races.’”
Update II — 7-12-25 — Zeynep Tufekci at the NYT confirms that the updated Grok is still spewing antisemitic trash – and explains why Grok and other chatbots are so prone to these types of outrages. And it can only get worse, she warns.
