Cybersecurity experts discover flaws in OpenAI’s GPT-5

Cybersecurity experts discover flaws in OpenAI's GPT-5

Recent tests conducted by two cybersecurity firms on OpenAI’s newly launched GPT-5 have revealed that the AI model remains vulnerable to manipulation, despite the implementation of enhanced safety measures. Separate analyses conducted by Neural Trust and SPLX reveal that specialised techniques have the potential to circumvent the guardrails of GPT-5, prompting concerns regarding its suitability for high-stakes enterprise applications. 

Martí Jordà Roca, a software engineer at the security firm NeuralTrust, detailed in a recent blog post the potential risks associated with a method known as the Echo Chamber algorithm. He explained how this technique, when paired with narrative-driven steering, could manipulate GPT-5 into producing harmful content while evading safety filters. In a striking demonstration of manipulation, researchers embedded risky keywords within an innocuous narrative, progressively escalating their requests. This tactic led to the model inadvertently revealing unsafe procedural details. The strategy takes advantage of GPT-5’s ability to sustain narrative coherence, enabling detrimental context to accumulate unnoticed across several interactions. 

In a significant move, SPLX undertook an extensive security assessment, evaluating GPT-5 against a total of 1,000 attack scenarios. A recent blog post by Dorian Granoša, a data scientist at the AI security testing firm SPLX, revealed concerning findings regarding GPT-5. The analysis indicated that the model exhibited subpar performance in security, safety, and business alignment tests when not supported by additional safeguards. In its unrefined and vulnerable condition, GPT-5 achieved a mere 11 out of 100 in terms of security resilience. Despite the activation of OpenAI’s default safety prompts, the system achieved a score of only 57 out of 100, highlighting considerable deficiencies, especially in its ability to prevent misuse or unauthorised access to data. 

A straightforward yet highly effective attack utilised a StringJoin Obfuscation technique, incorporating hyphens between each character in a malicious prompt that masqueraded as an encryption challenge. Granoša points out that GPT-5 was unable to detect the deception, emphasising that improved reasoning abilities do not automatically lead to better security measures. 

In a surprising turn of events, GPT-4o has demonstrated superior performance over GPT-5 in a series of rigorous security assessments, especially when both models were enhanced with extra protective features. Despite the advancements in reasoning and task performance demonstrated by GPT-5, researchers have cautioned enterprises against assuming it is inherently secure. Both companies highlighted the importance of external oversight, timely fortification, and continuous red-teaming as essential measures to address potential risks. 

OpenAI has unveiled GPT-5, touting it as the company’s most sophisticated and safety-focused model to date. This latest iteration includes self-validation checks and the capability for automatic reasoning mode-switching. These tests indicate that resolute attackers may continue to take advantage of vulnerabilities, especially via multi-turn conversation strategies or obscured prompts. 

Researchers recommend that businesses contemplating the deployment of GPT-5 should implement additional security measures. As artificial intelligence models advance in their capabilities, the techniques to undermine them are also evolving, highlighting that safety should not be assumed, even in the most recent systems.

SHARE NOW

Share on facebook
Facebook
Share on whatsapp
WhatsApp
Share on twitter
Twitter
Share on linkedin
LinkedIn

RECOMMEND FOR YOU

Leave a Reply

Your email address will not be published. Required fields are marked *