GPT-4 exploited 87% of vulnerabilities: LLMs and their emerging role in cybersecurity

A recent study demonstrates that large language models (LLMs), particularly GPT-4, can autonomously exploit one-day vulnerabilities in real-world systems with an 87% success rate when provided with CVE descriptions, a capability matched by none of the other models or open-source vulnerability scanners tested. Without CVE descriptions, GPT-4's success rate drops sharply to 7%, indicating its dependence on detailed vulnerability data for successful exploitation. The research marks a significant advance in the use of AI in cybersecurity, posing both potential risks and benefits. The findings prompt a reevaluation of how such capable AI agents are deployed in cybersecurity, given their ability to exploit vulnerabilities autonomously. Ethical considerations are discussed, emphasizing responsible usage and the importance of secure deployment of LLM technologies in sensitive environments.

The study "LLM Agents can Autonomously Exploit One-day Vulnerabilities" authored by Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang explores the capabilities of large language models (LLMs), specifically GPT-4, in autonomously exploiting one-day vulnerabilities in real-world systems. The study is significant as it highlights the advanced abilities of LLMs not just in benign applications but also in potentially malicious uses, such as cybersecurity exploitation.

The abstract introduces the main findings of the study, emphasizing that LLMs, particularly GPT-4, have shown a high success rate (87%) in exploiting one-day vulnerabilities from a dataset when provided with detailed CVE descriptions. This is contrasted with other models and tools which showed no success, underscoring the advanced capability of GPT-4.

LLM applications and their emerging role in cybersecurity

The background section elaborates on the concept of computer security and the role of LLM agents. It points out that while previous research mostly involved "toy problems" or controlled environments, this study leverages real-world scenarios to test the efficacy of LLMs in hacking. This section sets the stage by discussing the broader context of LLM applications in various fields and their emerging role in cybersecurity.

Benchmark of 15 real-world one-day vulnerabilities

The paper details the methodology, which involves creating a benchmark of 15 real-world one-day vulnerabilities. These vulnerabilities are sourced from the Common Vulnerabilities and Exposures (CVE) database and academic papers, focusing on those that can be reproduced in a controlled environment. The LLM agent used in the study, equipped with access to CVE descriptions and a small set of tools, shows that even a simple setup can be effective for such cybersecurity tasks.
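The agent design described above, an LLM prompted with a CVE description and allowed to issue tool calls in a loop, can be sketched roughly as follows. This is an illustrative sketch only: `query_llm` and `run_tool` are hypothetical stubs standing in for a real GPT-4 API call and sandboxed tool execution, and are not the authors' actual implementation.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (e.g. GPT-4).
    Stubbed: acts only when a CVE identifier appears in the context,
    mirroring the study's finding that detailed CVE info is decisive."""
    if "CVE-" in prompt:
        return "ACTION: run_tool curl http://target/vulnerable-endpoint"
    return "ACTION: give_up"

def run_tool(command: str) -> str:
    """Stubbed tool execution; the study ran all tools in a sandbox."""
    return f"[sandboxed output of: {command}]"

def exploit_agent(cve_description: str, max_steps: int = 5) -> list:
    """Loop: feed the running context to the model, execute the suggested
    tool call, and append the observation back into the context."""
    context = f"Goal: exploit the target.\nCVE info: {cve_description}\n"
    trace = []
    for _ in range(max_steps):
        action = query_llm(context)
        trace.append(action)
        if action == "ACTION: give_up":
            break
        observation = run_tool(action.removeprefix("ACTION: run_tool "))
        context += f"{action}\n{observation}\n"
    return trace
```

With the stub above, passing a CVE description yields a full trace of tool calls, while omitting it makes the agent give up immediately, a toy analogue of the 87% vs. 7% gap the paper reports.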

Findings and Analysis

The core findings show that GPT-4 successfully exploited 87% of the vulnerabilities when given CVE descriptions, far outperforming other models and tools such as ZAP and Metasploit, which had a 0% success rate. The sharp drop to 7% without CVE descriptions underscores how much successful exploitation by LLMs depends on detailed vulnerability information.

Downside and upside of LLM technologies

The discussion section reflects on the implications of such capabilities, considering both the potential misuse of LLM technologies in malicious contexts and the possibilities for enhancing cybersecurity defenses by understanding and anticipating such exploits. The ability of LLMs to perform complex tasks autonomously raises important questions about the deployment and control of such technologies in sensitive environments.

Ethical Considerations

The ethics statement addresses the potential negative uses of LLMs in hacking, stressing the importance of responsible use and further research to mitigate risks associated with AI capabilities in cybersecurity. The research adheres to ethical guidelines, with experiments conducted in sandboxed environments to avoid real-world harm.


Conclusion

The paper concludes by reiterating the potential of LLMs to impact cybersecurity significantly, showcasing a dual-use technology that can both aid security efforts and pose new challenges. It calls on the cybersecurity community to adapt and evolve in response to these new technological capabilities.

In summary, the document provides a thorough examination of the autonomous capabilities of LLMs like GPT-4 in exploiting cybersecurity vulnerabilities, presenting both the technological advancements and the accompanying risks. It serves as a call to action for both the AI and cybersecurity communities to collaborate on developing robust safeguards and ethical guidelines for the deployment of AI technologies in sensitive fields.

LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang 

