Graph-Structured RAG

When the GraphReasoner’s confidence falls below 0.8, the RAG pipeline retrieves compliance context from Memgraph and injects it into the LLM prompt. This grounds the LLM’s analysis in specific regulatory articles rather than relying on general knowledge.
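A minimal sketch of that gating check, assuming `needs_llm` simply compares the reasoner's confidence against the 0.8 threshold (the real implementation may also weigh signals from the event itself):

```python
CONFIDENCE_THRESHOLD = 0.8

def needs_llm(result, event) -> bool:
    # Hypothetical sketch: escalate to the LLM (which triggers RAG retrieval)
    # whenever the GraphReasoner's confidence falls below the threshold.
    # The production needs_llm may consider additional event properties.
    return result.confidence < CONFIDENCE_THRESHOLD
```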

Architecture

GraphReasoner ── fires rules ──▶ extract risk_factor_ids
                                      │
                                      ▼
                        GraphRAGContextRetriever
                           │  │  │  │
    ┌──────────────────────┘  │  │  └──────────────────────┐
    ▼                         ▼  ▼                         ▼
Article Context    Mitigation Context    Violation Text    Cross-Framework
(RiskFactor →      (RiskFactor →         (Direct article   (Multi-framework
 Article →          Mitigation)           ID lookup)        impact)
 Framework)
    │                   │                   │                │
    └───────────────────┴─────────┬─────────┴────────────────┘
                                  ▼
                              RAGContext
                                  │
                                  ▼
                        format_rag_context()
                                  │
                                  ▼
                        Markdown for LLM prompt
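Since the four retrieval branches are independent, they can run concurrently against Memgraph. A sketch under that assumption, where each callable stands in for one branch's query method (the names here are placeholders, not the retriever's real internals):

```python
import asyncio

async def retrieve_branches(article_q, mitigation_q, violation_q, cross_q):
    # The four context queries share no state, so they can be awaited
    # together; each coroutine stands in for one Cypher query round trip.
    return await asyncio.gather(
        article_q(), mitigation_q(), violation_q(), cross_q()
    )
```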

Retriever API

from quint_graph.rag import GraphRAGContextRetriever, format_rag_context

retriever = GraphRAGContextRetriever(memgraph_client)

# Retrieve compliance context from fired rules
context = await retriever.retrieve(
    result=inference_result,
    max_articles=10,
    max_mitigations=5
)

# Format as markdown for LLM prompt injection
markdown = format_rag_context(context)

Data Models

RAGContext

@dataclass
class RAGContext:
    risk_factors: list[RiskFactorContext]
    articles: list[ArticleContext]
    mitigations: list[MitigationContext]
    cross_framework: CrossFrameworkContext

ArticleContext

@dataclass
class ArticleContext:
    id: str              # e.g., "gdpr_art_6_1_a"
    label: str           # e.g., "GDPR Art. 6(1)(a)"
    text: str            # Full regulatory article text
    framework_name: str  # e.g., "GDPR"

Cypher Queries

The retriever executes four Cypher queries against Memgraph:

1. Article context: traverses RiskFactor → (TRIGGERS) → Article → (GOVERNED_BY) ← Category → (HAS_CATEGORY) ← Framework. Returns articles with their framework context, ordered by PageRank (most authoritative first).
2. Mitigation context: traverses RiskFactor → (MITIGATED_BY) → Mitigation. Returns mitigations ordered by coverage count (mitigations that address the most risk factors first).
3. Violation text: direct lookup by article ID for articles already referenced in violations.
4. Cross-framework impact: counts how many frameworks are affected by the detected risk factors. Events crossing 3+ frameworks receive a severity boost.
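For illustration, the article-context traversal might look like the following Cypher (a sketch: the relationship directions, property names such as `pagerank`, and the `LIMIT` are assumptions, not the retriever's actual query):

```python
# Hypothetical Cypher for the article-context branch; the production query
# in GraphRAGContextRetriever may differ in shape and naming.
ARTICLE_CONTEXT_QUERY = """
MATCH (rf:RiskFactor)-[:TRIGGERS]->(a:Article)
      -[:GOVERNED_BY]->(c:Category)<-[:HAS_CATEGORY]-(f:Framework)
WHERE rf.id IN $risk_factor_ids
RETURN a.id AS id, a.label AS label, a.text AS text, f.name AS framework_name
ORDER BY a.pagerank DESC  // most authoritative articles first
LIMIT 10
"""
```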

Formatted Output

The format_rag_context() function produces markdown injected into the LLM prompt:
## Compliance Context (from knowledge graph)

### Applicable Risk Factors
- **Data Exfiltration** (rf:data_exfiltration): Unauthorized transfer of data
  outside organizational boundaries
- **Bulk Data Access** (rf:bulk_data_access): Accessing large volumes of data
  in a single operation

### Relevant Compliance Articles
- **GDPR Art. 5(1)(c)** [GDPR]: Data shall be adequate, relevant and limited
  to what is necessary in relation to the purposes for which they are processed.
- **SOC2 CC6.1** [SOC2]: The entity implements logical access security software,
  infrastructure, and architectures over protected information assets.

### Recommended Mitigations
- Restrict bulk exports to internal resources or require approval
- Implement data loss prevention (DLP) controls on outbound transfers

### Cross-Framework Impact
- **3 frameworks affected**: GDPR, SOC2, ISO27001
- This indicates a systemic compliance gap, not an isolated issue
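An illustrative reimplementation of the formatter, assuming the context objects expose the fields shown in the data models above (the shipped `format_rag_context` may differ in detail):

```python
def format_rag_context(ctx) -> str:
    # Build the markdown sections shown above from a RAGContext-like object,
    # skipping any section whose list is empty.
    lines = ["## Compliance Context (from knowledge graph)", ""]
    if ctx.risk_factors:
        lines.append("### Applicable Risk Factors")
        lines += [f"- **{rf.name}** ({rf.id}): {rf.description}" for rf in ctx.risk_factors]
        lines.append("")
    if ctx.articles:
        lines.append("### Relevant Compliance Articles")
        lines += [f"- **{a.label}** [{a.framework_name}]: {a.text}" for a in ctx.articles]
        lines.append("")
    if ctx.mitigations:
        lines.append("### Recommended Mitigations")
        lines += [f"- {m.description}" for m in ctx.mitigations]
    return "\n".join(lines)
```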

Integration in Scoring Pipeline

The RAG pipeline is wired into the event scoring route:
# In routes/events.py
result = graph_reasoner.evaluate(event, policies, tenant_id)

if graph_reasoner.needs_llm(result, event):
    # Retrieve compliance context from Memgraph
    rag_context = await rag_retriever.retrieve(result)
    compliance_markdown = format_rag_context(rag_context)

    # Inject into LLM prompt
    llm_score = await gemini_client.evaluate(
        event=event,
        policies=policies,
        compliance_context=compliance_markdown,
        graph_reasoning=result.explanation
    )

Rule-to-Ontology Mapping

The rule_mapping.py module maps fired rule names to ontology risk factor IDs:
RULE_TO_RISK_FACTORS = {
    "bulk_external_exfiltration": ["rf:data_exfiltration", "rf:bulk_data_access"],
    "gdpr_bulk_email_no_consent": ["rf:gdpr_consent_violation", "rf:bulk_communication"],
    "hipaa_phi_unencrypted": ["rf:hipaa_phi_exposure", "rf:encryption_missing"],
    # ... 90 rules mapped
}
This mapping bridges the forward-chaining engine (which knows rule names) with the Memgraph ontology (which knows risk factor IDs), enabling graph traversal for article retrieval.
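A small helper can flatten fired rule names into a deduplicated list of risk factor IDs for the retriever; a sketch using two entries from the mapping above (the helper name is illustrative):

```python
RULE_TO_RISK_FACTORS = {
    "bulk_external_exfiltration": ["rf:data_exfiltration", "rf:bulk_data_access"],
    "gdpr_bulk_email_no_consent": ["rf:gdpr_consent_violation", "rf:bulk_communication"],
}

def risk_factor_ids(fired_rules):
    # Collect ontology IDs for every fired rule, preserving first-seen
    # order and silently skipping rules with no mapping.
    seen, ids = set(), []
    for rule in fired_rules:
        for rf in RULE_TO_RISK_FACTORS.get(rule, []):
            if rf not in seen:
                seen.add(rf)
                ids.append(rf)
    return ids
```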
The RAG pipeline gracefully degrades when Memgraph is unavailable — it returns an empty RAGContext and the LLM operates without compliance context grounding.
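That degradation path can be sketched as a guarded retrieve. The exception type and the empty-context constructor here are assumptions, and RAGContext is repeated in minimal form so the sketch is self-contained:

```python
from dataclasses import dataclass, field

@dataclass
class RAGContext:  # minimal stand-in for the real dataclass shown above
    risk_factors: list = field(default_factory=list)
    articles: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)
    cross_framework: object = None

async def retrieve_or_empty(retriever, result):
    # Fall back to an empty context when Memgraph is unreachable, so the
    # LLM call still proceeds, just without graph grounding.
    try:
        return await retriever.retrieve(result)
    except ConnectionError:  # assumption: real code may catch a driver-specific error
        return RAGContext()
```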