🎁 GenAI x Sec Advent #1

💡 RAG (Retrieval Augmented Generation) is an excellent way to use GenAI with your data!

RAG lets you add your own data as context for a GenAI model such as GPT, Claude, or Mistral. Think of it as incorporating your data directly into your prompts. 🤖

🤓 But it’s more than that: RAG doesn’t just add context; it retrieves the most relevant information for your query, like a search engine for your own data! 😉

We usually talk about the following RAG Techniques:

1️⃣ Naive RAG:
It retrieves relevant documents using vector similarity search and adds them to the LLM's context. It works by embedding user queries and documents into a shared vector space to find semantically similar content. Performance depends on a well-tuned chunking strategy and embedding parameters.
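
To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt loop. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and the names `embed`, `retrieve`, `build_prompt`, and `CHUNKS` are illustrative, not taken from any library.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Place the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical, already-chunked documents.
CHUNKS = [
    "The phishing campaign delivered a malicious macro document.",
    "Quarterly revenue grew by four percent.",
    "The macro downloaded a second-stage payload from a remote server.",
]

if __name__ == "__main__":
    print(build_prompt("What did the malicious macro do?", CHUNKS))
```

In practice the term-count vectors would be replaced by dense embeddings and the sorted scan by a vector index, but the flow stays the same: embed, compare, keep the top results, build the prompt.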

2️⃣ Advanced RAG:
It optimizes data indexing, retrieval quality, and post-retrieval processing to improve the relevance and coherence of the generated responses. For example, it adds techniques such as re-ranking, prompt compression, and noise filtering.
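
As a rough illustration of the post-retrieval step, the sketch below re-scores first-stage candidates and drops low-scoring ones. It assumes a toy term-overlap scorer standing in for a real re-ranking model; `rerank_score`, `rerank_and_filter`, `min_score`, and the sample candidates are all hypothetical. Prompt compression is not shown.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rerank_score(query: str, chunk: str) -> float:
    """Toy re-ranker: fraction of query terms found in the chunk.
    A real pipeline would typically use a dedicated re-ranking model here."""
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q) if q else 0.0

def rerank_and_filter(query: str, candidates: list[str],
                      min_score: float = 0.3, top_n: int = 2) -> list[str]:
    """Score candidates, drop noise below min_score, return the best top_n."""
    scored = [(rerank_score(query, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= min_score]   # noise filtering
    kept.sort(key=lambda pair: pair[0], reverse=True)      # re-ranking
    return [c for _, c in kept[:top_n]]

# Hypothetical output of a first-stage vector search, in its original order.
CANDIDATES = [
    "Server logs rotate weekly.",
    "The attacker exfiltrated credentials through a VPN server.",
    "Credentials were exfiltrated over DNS tunneling.",
]

if __name__ == "__main__":
    for chunk in rerank_and_filter("How were credentials exfiltrated", CANDIDATES):
        print(chunk)
```

The unrelated log-rotation chunk is filtered out, and the chunk that matches more of the query terms moves ahead of the one the first stage ranked higher.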

3️⃣ Modular RAG:
It introduces flexible modules—such as search, memory, fusion, and routing—that can be customized and rearranged to address specific tasks.
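
One way to picture this is a pipeline whose pieces are plain, swappable functions. The sketch below assumes two stub retrievers and a keyword-based router; `search_logs`, `search_reports`, `route`, and `ModularRAG` are made-up names for illustration, and a real router might rely on a classifier or an LLM instead of keywords.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]

def search_logs(query: str) -> list[str]:
    """Stub retriever over a hypothetical log store."""
    return [f"log entry matching '{query}'"]

def search_reports(query: str) -> list[str]:
    """Stub retriever over a hypothetical report store."""
    return [f"report paragraph matching '{query}'"]

def route(query: str) -> str:
    """Routing module: choose a retriever name with simple keyword rules."""
    keywords = ("log", "event", "alert")
    return "logs" if any(w in query.lower() for w in keywords) else "reports"

class ModularRAG:
    """Minimal pipeline with swappable search, routing, and memory modules."""

    def __init__(self, retrievers: dict[str, Retriever],
                 router: Callable[[str], str]) -> None:
        self.retrievers = retrievers
        self.router = router
        self.memory: list[str] = []  # memory module: past queries

    def retrieve(self, query: str) -> list[str]:
        self.memory.append(query)
        return self.retrievers[self.router(query)](query)

if __name__ == "__main__":
    rag = ModularRAG({"logs": search_logs, "reports": search_reports}, route)
    print(rag.retrieve("Show failed login alerts"))
    print(rag.retrieve("Summarize the threat actor's tooling"))
```

Because each module is just a callable, a retriever or router can be swapped without touching the rest of the pipeline.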

🤓 RAG in Cybersecurity:
This year, I used it in a very practical way to analyze the ISOON leak. I processed the data by:

- Extracting text via OCR,
- Translating it from Chinese to English,
- Embedding it into vectors and indexing them with FAISS to build my RAG.

The entire process is documented in the link below, and you can reuse my code! 👇

https://jupyter.securitybreak.io/ISOON_DataLeak_OCR_GenAI/ISOON_ChinLeaks.html

#genai #threatintel #RAG #cybersecurity #infosec