
RedSage - Building a Cybersecurity Generalist LLM with Agentic Data Augmentation
Niket Girdhar / February 1, 2026
Large Language Models have shown impressive general reasoning abilities, but cybersecurity exposes their weakest points:
- incorrect command syntax
- hallucinated tools
- shallow understanding of real-world workflows
RedSage addresses this gap by asking a different question:
How do we teach an LLM to think like a security practitioner, without turning it into a narrow specialist?
The answer lies in a data-centric, agentic training pipeline that bridges general knowledge and domain expertise.
The Core Problem: General vs Domain-Specific Models
Traditional approaches fall into two categories:
- General-Purpose LLMs
- Strong reasoning and language skills
- Weak at:
- tool usage
- vulnerability workflows
- framework grounding (MITRE, OWASP, NIST)
- Security-Specialized Models
- Memorize terminology and benchmarks
- Often suffer from:
- catastrophic forgetting
- degraded math and reasoning
- brittle performance outside cybersecurity
RedSage aims to combine both.
RedSage Architecture Overview
RedSage is an 8B-parameter, open-source, locally deployable LLM, designed for privacy-sensitive security environments.
Its training pipeline consists of three tightly coupled stages:
- Continual Pre-Training
- Agentic Fine-Tuning
- Rigorous Multi-Axis Evaluation
Stage 1: Data-Centric Continual Pre-Training
CyberFineWeb: Domain Filtering at Scale
RedSage introduces CyberFineWeb, an 11.7B-token cybersecurity corpus built by:
- Filtering FineWeb using a BERT-based cybersecurity classifier
- Retaining both theoretical and operational content
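Conceptually, the filtering step looks something like the sketch below: score each FineWeb document with a cybersecurity classifier and keep what clears a threshold. The classifier model id, label name, and threshold here are placeholders for illustration, not the paper's exact setup.

```python
# Illustrative sketch of classifier-based corpus filtering (not the paper's code).
# "your-org/cyber-classifier" is a placeholder model id; the label and 0.5 threshold
# are assumptions about how such a classifier might be configured.
from transformers import pipeline

classifier = pipeline("text-classification", model="your-org/cyber-classifier")

def is_cyber(doc_text: str, threshold: float = 0.5) -> bool:
    # Truncate long web documents to fit the classifier's context window.
    result = classifier(doc_text[:2000], truncation=True)[0]
    return result["label"] == "cybersecurity" and result["score"] >= threshold

web_docs = [
    "Nmap sends crafted packets to enumerate open ports on a target host.",
    "The recipe calls for two cups of flour and a pinch of salt.",
]
cyber_fineweb = [doc for doc in web_docs if is_cyber(doc)]
```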
Preventing Catastrophic Forgetting
Instead of training purely on security data, RedSage uses controlled replay:
- ~30% general-knowledge replay via FineWeb-Edu
- Maintains broad reasoning, math, and instruction-following skills
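A minimal sketch of what controlled replay can look like in practice, assuming document-level sampling between the domain corpus and FineWeb-Edu. The 70/30 split follows the post; the sampling code itself is illustrative, not the paper's data loader.

```python
import random

# Illustrative replay mixing (not the paper's implementation).
# Each batch draws ~70% cybersecurity documents and ~30% general-knowledge replay.
REPLAY_RATIO = 0.30

def sample_batch(cyber_docs, fineweb_edu_docs, batch_size=8):
    batch = []
    for _ in range(batch_size):
        pool = fineweb_edu_docs if random.random() < REPLAY_RATIO else cyber_docs
        batch.append(random.choice(pool))
    return batch

# Toy usage
cyber = ["SQL injection lets an attacker alter database queries.", "Nmap scans enumerate open ports."]
general = ["Photosynthesis converts light into chemical energy.", "The derivative of x^2 is 2x."]
print(sample_batch(cyber, general, batch_size=4))
```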
High-Trust Seed Data
The model also ingests:
- MITRE ATT&CK
- OWASP documentation
- NIST standards
- Offensive security write-ups
This RedSage-Seed dataset turns out to be surprisingly important, not just for security, but also for math reasoning.
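One way to picture the seed corpus is as source-tagged documents that later stages can ground on. The structure below is a hypothetical illustration of that idea, not the released dataset format.

```python
from dataclasses import dataclass

# Hypothetical structure for RedSage-Seed-style documents (illustrative only).
@dataclass
class SeedDoc:
    source: str   # e.g. "MITRE ATT&CK", "OWASP", "NIST", "writeup"
    title: str
    text: str

seed_corpus = [
    SeedDoc("MITRE ATT&CK", "Technique description", "Adversaries may abuse command interpreters..."),
    SeedDoc("OWASP", "Injection overview", "Injection flaws occur when untrusted data reaches an interpreter..."),
    SeedDoc("NIST", "Control guidance", "Organizations should enforce least privilege..."),
    SeedDoc("writeup", "Offensive security write-up", "After gaining an initial foothold, escalate privileges by..."),
]
```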
Stage 2: Agentic Data Augmentation (The Key Innovation)
The most novel part of RedSage is its agentic augmentation pipeline.
Why Static Docs Aren’t Enough
Most cybersecurity knowledge exists as:
- manuals
- reports
- PDFs
- blog posts
But real security work is conversational and procedural:
- multi-step reasoning
- tool chaining
- role-based collaboration
Planner + Augmenter Agents
RedSage converts static text into workflows using two agents (a sketch of this loop follows the list below):
- Planner Agent
- Analyzes seed documents
- Extracts implicit skills
- Proposes multiple augmentation strategies:
- exploitation walkthroughs
- threat mapping
- SOC troubleshooting
- tool command generation
- Augmenter Agent
- Executes the plan
- Generates multi-turn, role-based dialogues
- Grounded strictly in the source material
- Preserves:
- exact commands
- flags
- vulnerability identifiers
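Here is a minimal sketch of how such a two-agent loop could be wired up with an LLM API. The prompts, the backend model name, and the plan format are assumptions for illustration; they are not the paper's implementation.

```python
# Illustrative Planner + Augmenter loop (not the paper's code).
# Uses the OpenAI chat API as a stand-in for whatever LLM backs the agents.
from openai import OpenAI

client = OpenAI()

PLANNER_PROMPT = (
    "You are a planning agent. Read the cybersecurity document below and list the "
    "augmentation strategies it supports (e.g. exploitation walkthrough, threat mapping, "
    "SOC troubleshooting, tool command generation). Return one strategy per line.\n\n{doc}"
)

AUGMENTER_PROMPT = (
    "You are an augmentation agent. Using ONLY the source document, write a multi-turn "
    "dialogue between a senior and a junior analyst that follows the strategy below. "
    "Preserve exact commands, flags, and vulnerability identifiers.\n\n"
    "Strategy: {strategy}\n\nSource document:\n{doc}"
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder backend model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def augment(doc: str) -> list[str]:
    # Planner proposes strategies; Augmenter turns each into a grounded dialogue.
    strategies = [s.strip() for s in ask(PLANNER_PROMPT.format(doc=doc)).splitlines() if s.strip()]
    return [ask(AUGMENTER_PROMPT.format(strategy=s, doc=doc)) for s in strategies]
```

The key design point is that the Augmenter only ever sees the source document plus one strategy at a time, which is what keeps commands, flags, and identifiers grounded rather than hallucinated.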
Scaling Effect
- ~28K documents → 266K conversations
- 9.2× sample expansion
- 2.3× token growth
- Average ~10 turns per dialogue
This solves the human labeling bottleneck in cybersecurity.
Tool-Specific Performance: Why RedSage Excels
Most LLMs fail when precision matters.
RedSage doesn’t.
Because it trains on workflow-level conversations, it learns:
- exact CLI syntax
- correct sequencing of actions
- framework-aligned reasoning
Qualitative evaluations show:
- Correct command construction where others hallucinate flags
- Accurate threat actor attribution
- Better grounding in real tools and procedures
Stage 3: Evaluation Without Blind Spots
RedSage is evaluated across:
- RedSage-Bench (knowledge, skills, tools)
- External security benchmarks
- Open LLM Leaderboard (general reasoning)
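In practice, "evaluation without blind spots" means scoring every checkpoint on both the domain and general axes and rejecting any regression on the general side. The toy check below illustrates that gate; the scores and acceptance rule are made-up numbers, not results from the paper.

```python
# Illustrative multi-axis regression gate (not the paper's evaluation harness).
# Toy scores: require a gain on the security axis and no drop on general reasoning.
baseline = {"cyber": 55.0, "general": 68.0}
candidate = {"cyber": 60.5, "general": 69.1}

def accept(candidate: dict, baseline: dict, max_general_drop: float = 0.0) -> bool:
    improves_domain = candidate["cyber"] > baseline["cyber"]
    keeps_general = candidate["general"] >= baseline["general"] - max_general_drop
    return improves_domain and keeps_general

print(accept(candidate, baseline))  # True: better on security, no general regression
```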
Results Snapshot
- +5.5 points over 8B baselines on cybersecurity benchmarks
- Near-parity with much larger models on security tasks
- Improved general reasoning, not degraded
This confirms the central claim:
Specialization does not require sacrificing general intelligence.
Why This Matters
RedSage demonstrates a broader lesson for applied AI:
- Data strategy > model size
- Agentic pipelines > static supervision
- Workflows > isolated Q&A
For security teams, this means:
- privacy-preserving local deployment
- models that understand real tools
- assistants that behave like junior analysts—not encyclopedias
Final Takeaway
RedSage is not just a cybersecurity LLM. It’s a template for domain adaptation done right.
If you’re building:
- domain-specific assistants
- agentic training pipelines
- AI systems for high-precision tasks
RedSage is worth studying closely.
Paper link: