CTIBench: How Good Are LLMs at Detecting Cyber Threats? [Nidhi Rastogi] - 729

The TWIML AI Podcast with Sam Charrington 494 2 weeks ago

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench, a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world tasks of the cybersecurity analyst. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more.

🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/729.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
3:00 - LLMs in the intersection of cybersecurity and AI
6:04 - RAG in cybersecurity
8:11 - Cyber threat intelligence (CTI)
11:00 - LLMs in CTI
13:37 - How LLMs perform with log-style data
16:35 - CTI Bench
19:41 - CTI Bench examples
25:53 - Building CTI bench
31:16 - Performance of LLMs
38:32 - LLM-as-judge
41:09 - Evaluation of LLM responses
41:41 - Examples of LLMs hallucinating
44:12 - Updating the benchmark
45:41 - Future directions
48:55 - Surprising challenges while building CTIBench
50:08 - SecGemini
51:45 - AI4Sec Research Lab

🔗 LINKS & RESOURCES
===============================
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence - https://arxiv.org/abs/2406.07599
AI4Sec Research Lab - https://nidhirastogi.github.io/

📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
