TL;DR: I built a lightweight auditor that detects hallucinations by monitoring transformer hidden-state dynamics in real time. It achieves 0.90+ ROC-AUC on Gemma/Llama-3.2/Mistral using a single RTX 3050 (4GB), with a core computation time of <1 ms.
What it is
The Sibainu Engine is a pre-emptive auditing layer that identifies "latent trajectory collapse"—geometric turbulence in the vector transformations between transformer layers—before the token is even sampled. It requires no training and works with frozen weights.
The "15ms vs 1ms" Latency Reality
I prioritized no-nonsense performance reporting. In a local Python/FastAPI environment, the total response time is 15-25 ms, but it's important to distinguish the components:

- Auditing Core (NumPy): < 1.0 ms. The actual vectorized math is near-instant.
- System Overhead: ~12.0 ms spent on Pydantic validation and JSON-to-array conversion.
The Bottom Line: The core logic is significantly faster than the LLM's token generation speed (typically 30-70ms), meaning the audit is theoretically "zero-overhead" if integrated directly into the C++/CUDA inference pipeline.
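To make the breakdown above concrete, here is a minimal timing harness separating the JSON-to-array overhead from the vectorized core. The payload shape and the `audit_core` scoring function are illustrative stand-ins, not the engine's actual API.

```python
import json
import time
import numpy as np

def audit_core(hidden: np.ndarray) -> float:
    # Stand-in for the vectorized audit math: a cheap norm over
    # layer-to-layer deltas. Not the real scoring formula.
    return float(np.linalg.norm(np.diff(hidden, axis=0)))

# Simulated request body: 8 layers of 2048-dim hidden states as JSON.
payload = json.dumps({"hidden": np.random.rand(8, 2048).tolist()})

t0 = time.perf_counter()
arr = np.asarray(json.loads(payload)["hidden"])  # JSON -> array (the overhead)
t1 = time.perf_counter()
score = audit_core(arr)                          # the vectorized core
t2 = time.perf_counter()

print(f"deserialize: {(t1 - t0) * 1e3:.2f} ms")
print(f"core:        {(t2 - t1) * 1e3:.2f} ms")
```

On typical hardware the deserialize step dominates by an order of magnitude, which is exactly why moving the audit inside the inference pipeline eliminates the overhead.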
Key Metrics (Gemma-2B / HaluEval-QA)
- ROC-AUC: 0.9176
- Recall @ 5% False Signal Rate (FSR): 59.7% (it captures ~60% of hallucinations while flagging only 5% of factual answers)
- Hardware: validated on a consumer-grade RTX 3050 (4GB) using 4-bit (NF4) quantization
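For readers who want to reproduce these two headline metrics from raw scores, here is a NumPy-only sketch. The `labels`/`scores` arrays are synthetic stand-ins for the engine's per-response audit scores and HaluEval ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)                 # 1 = hallucination
scores = rng.normal(loc=labels * 1.5, scale=1.0)  # higher score = riskier

def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    # AUC = probability a random positive outscores a random negative,
    # with ties counted as half.
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def recall_at_fsr(labels: np.ndarray, scores: np.ndarray, fsr: float = 0.05) -> float:
    # Pick the threshold so only `fsr` of factual answers are flagged,
    # then measure how many hallucinations exceed it.
    thresh = np.quantile(scores[labels == 0], 1 - fsr)
    return float((scores[labels == 1] > thresh).mean())

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"Recall @ 5% FSR: {recall_at_fsr(labels, scores):.3f}")
```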
How it works: Layer Dissonance
Instead of just looking at logit entropy, v6.4 monitors Layer Dissonance—the structural inconsistency between the middle and final layers. When a model hallucinates, the geometric stability between these layers exhibits a specific turbulence that is absent during factual recall.
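The exact v6.4 dissonance formula isn't published in this post, but the general shape of the idea can be sketched as geometric disagreement between the middle-layer and final-layer hidden states. Cosine instability here is my illustrative proxy, not the real metric.

```python
import numpy as np

def layer_dissonance(hidden_states: list) -> float:
    """Toy proxy: 1 - cosine similarity between the middle and final
    layer's hidden-state vectors for the current token position.
    0 = perfectly aligned trajectory, 2 = fully opposed."""
    mid = hidden_states[len(hidden_states) // 2]
    final = hidden_states[-1]
    cos = mid @ final / (np.linalg.norm(mid) * np.linalg.norm(final))
    return 1.0 - float(cos)

# Toy example: 24 layers of 2048-dim hidden states (random, so the
# "trajectory" is incoherent and dissonance sits near 1.0).
rng = np.random.default_rng(0)
states = [rng.normal(size=2048) for _ in range(24)]
print(f"dissonance: {layer_dissonance(states):.3f}")
```

A real integration would pull these vectors from the model's `output_hidden_states` during the forward pass, before sampling.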
Closed-Loop Recovery
I’ve included a recovery_agent_gemma.py that demonstrates Autonomous Safety Control. If the engine detects a physical neural anomaly (Score > 3.6510), it immediately aborts the session and triggers a re-generation using deterministic greedy search to stabilize the output.
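The control loop in `recovery_agent_gemma.py` can be sketched as follows. The 3.6510 threshold is from the post; `generate` and `audit_score` are hypothetical placeholders for the actual model call and the engine's scoring function.

```python
THRESHOLD = 3.6510  # anomaly cutoff from the post

def generate(prompt: str, greedy: bool = False) -> str:
    # Placeholder for the real LLM call (sampled vs. deterministic greedy).
    return f"[{'greedy' if greedy else 'sampled'} answer to: {prompt}]"

def audit_score(response: str) -> float:
    # Placeholder for the engine's trajectory-collapse score; contrived so
    # the sampled path trips the threshold in this demo.
    return 4.2 if "sampled" in response else 1.1

def safe_generate(prompt: str) -> str:
    response = generate(prompt)
    if audit_score(response) > THRESHOLD:
        # Anomaly detected: abort and re-generate with greedy search
        # to stabilize the output.
        response = generate(prompt, greedy=True)
    return response

print(safe_generate("What year was the Eiffel Tower built?"))
```

In the real agent the abort would happen mid-generation, before the unstable token is ever emitted, rather than after a full response.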