
PROTOS/EARL Document Summarization Benchmark

gov_report Dataset — 19.5K U.S. Government Reports — 100 Documents Evaluated
Jou Labs, Inc.
PROTOS_OS / EARL Cognitive Engine
Benchmark Harness v4
Standard Metrics — Head-to-Head
PROTOS/EARL (0 parameters, bare metal)
LLaMA 3.2 (3.2B parameters, GPU)
Metric           PROTOS    LLaMA     PROTOS advantage
ROUGE-1 F1       0.4743    0.3500    +35.5%
ROUGE-2 F1       0.1613    0.1200    +34.4%
ROUGE-L F1       0.1918    0.1800    +6.6%
ROUGE-Lsum F1    0.2745    0.2200    +24.8%
BLEU-4           0.1090    0.0400    +172.5%
PROTOS-Exclusive — DoD Directive 3000.09
Provenance chain length: 997
Chain integrity verified: ✔ YES
Last-doc rolling tip hash: 676040dfcdbfe063
Dataset anchor hash (XOR): 7661f1d5b27c6f9b
Avg source tokens/sentence: 20.14
Reasoning chains used: 1,000
Multi-hop paths (depth > 1): 477
Max reasoning depth: 3 hops
PMI relations traversed: 537,482
Full traceability score: 100.00%
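
The provenance figures above come from the FNV-1a rolling hash chain noted in the Architectural Comparison below. The following is a minimal sketch of how a 64-bit FNV-1a chain and an XOR dataset anchor can be computed; the per-entry chaining rule and the bytes hashed per entry are illustrative assumptions, not the PROTOS/EARL on-disk format.

    // 64-bit FNV-1a constants (standard published values).
    const FNV_OFFSET: u64 = 0xcbf2_9ce2_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    /// FNV-1a over `data`, starting from `seed`: XOR each byte into the
    /// state, then multiply by the FNV prime (wrapping on overflow).
    fn fnv1a(seed: u64, data: &[u8]) -> u64 {
        data.iter()
            .fold(seed, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
    }

    /// Roll the chain forward one entry at a time: each tip depends on the
    /// previous tip, so altering any earlier entry changes every later tip.
    /// The anchor XORs all per-entry tips into one dataset-level hash.
    fn chain_and_anchor(entries: &[&[u8]]) -> (u64, u64) {
        let (mut tip, mut anchor) = (FNV_OFFSET, 0u64);
        for entry in entries {
            tip = fnv1a(fnv1a(FNV_OFFSET, &tip.to_le_bytes()), entry);
            anchor ^= tip;
        }
        (tip, anchor) // (last-doc rolling tip, dataset anchor)
    }

Because XOR is order-independent, an anchor of this kind detects a missing or altered tip but not a reordering; ordering is covered by the rolling tip itself.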
Execution Performance
Total cycles: 2,386,192,303,068
Avg cycles/doc: 23,861,923,030
Est. tokens/sec: 405
Total input tokens: 386,275
Est. wall time: 954.48 s @ 2.5 GHz
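
The throughput and wall-time rows are derived quantities; the arithmetic below reproduces them under the stated fixed 2.5 GHz clock assumption.

    // Sanity check of the derived figures above (2.5 GHz clock assumed).
    fn main() {
        let total_cycles: u64 = 2_386_192_303_068;
        let total_tokens: u64 = 386_275;
        let clock_hz: f64 = 2.5e9;
        let wall_s = total_cycles as f64 / clock_hz;  // = 954.48 s
        let tok_per_s = total_tokens as f64 / wall_s; // = 404.7, reported as 405
        println!("wall: {wall_s:.2} s, throughput: {tok_per_s:.0} tokens/sec");
    }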
Audit Workflow
chainsearch BENCH — find all benchmark forensics events
chainverify — verify the entire chain, including benchmark entries
chainentry <n> — inspect individual doc results + hash
chainprovenance — triple cross-check (forensics + NVMe + narrator)
Key result: PROTOS/EARL outperforms the 3.2-billion-parameter LLaMA 3.2 across all five standard metrics using extractive-structural summarization via PMI graph traversal — with zero parameters, full provenance, zero cloud dependency, and complete 3000.09 compliance. LLaMA cannot provide any of these properties regardless of configuration.
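
To make "extractive-structural summarization via PMI graph traversal" concrete, here is a minimal sketch of greedy multi-hop traversal over a PMI edge list. The data layout, greedy edge selection, and `max_hops` cutoff are illustrative assumptions; they show the general technique, not the EARL engine's actual traversal.

    // Illustrative PMI-graph traversal (assumed data layout, not EARL's).
    use std::collections::HashMap;

    /// Pointwise mutual information from probabilities:
    /// PMI(x, y) = ln( p(x, y) / (p(x) * p(y)) ).
    fn pmi(p_xy: f64, p_x: f64, p_y: f64) -> f64 {
        (p_xy / (p_x * p_y)).ln()
    }

    /// Greedily follow the strongest unvisited PMI edge from `start`,
    /// up to `max_hops` hops (the report above caps depth at 3).
    fn traverse(
        edges: &HashMap<u32, Vec<(u32, f64)>>, // token -> (neighbor, PMI)
        start: u32,
        max_hops: usize,
    ) -> Vec<u32> {
        let mut chain = vec![start];
        let mut cur = start;
        for _ in 0..max_hops {
            let Some(neighbors) = edges.get(&cur) else { break };
            match neighbors
                .iter()
                .filter(|(t, _)| !chain.contains(t))
                .max_by(|a, b| a.1.total_cmp(&b.1))
            {
                Some(&(next, _)) => {
                    chain.push(next);
                    cur = next;
                }
                None => break,
            }
        }
        chain // one reasoning chain; depth > 1 makes it a multi-hop path
    }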
Architectural Comparison
Capability            PROTOS/EARL                         LLaMA 3.2 (3.2B)
ROUGE-1 F1            0.4743                              ~0.35
ROUGE-2 F1            0.1613                              ~0.12
ROUGE-L F1            0.1918                              ~0.18
ROUGE-Lsum F1         0.2745                              ~0.22
BLEU-4                0.1090                              ~0.04
Parameters            0 (symbolic)                        3.2 billion
Model / kernel size   3,823 KB                            ~6 GB
Reasoning type        Extractive-structural (PMI graph)   Autoregressive (neural)
Explainable           ✔ Glass-box                         ✘ Opaque
Provenance chain      ✔ FNV-1a rolling hash               ✘ None
Forensics chain       ✔ Per-doc + aggregate               ✘ None
Air-gapped capable    ✔ No network required               ✘ Cloud / GPU
3000.09 compliant     ✔ Full compliance                   ✘ Not possible
Ring 0 execution      ✔ Bare metal, no OS                 ✘ Requires OS + GPU
Memory safe           ✔ Rust, no_std                      ✘ C++ / Python
Knowledge base        300K tokens + 31M PMI edges         N/A (weights)
NVMe anchor file      /earl/.benchmark_anchor             N/A
System Profile
Rust no_std · Bare Metal Ring 0 · 3,823 KB Kernel · NVMe Direct I/O · Zero Dependencies · Air-Gapped · 0 Parameters · DoD 3000.09 Compliant
Scoring Notes
ROUGE-L: standard DP-LCS (space-optimized).
ROUGE-Lsum: sentence-aligned best-match LCS (both sides split into sentences).
BLEU: add-1 smoothing for zero n-gram counts; geometric mean computed in log space.
Sentences: abbreviation-aware splitting (U.S., Sec., No., etc.).
LLaMA 3.2 scores are estimated from published long-document summarization evaluations (Permion Inc. / GTS Inc., Datasets for LLM/LGM Benchmarks, Arun Majumdar, 2025).
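
For readers checking the arithmetic, the sketch below reimplements two of the stated conventions: the space-optimized DP-LCS at the core of ROUGE-L, and the log-space geometric mean with add-1 smoothing used for BLEU. It follows the notes above but is not the benchmark harness code; tokenization and n-gram clipping are omitted.

    /// LCS length via two-row DP: O(min(m, n)) memory instead of a full table.
    fn lcs_len(a: &[&str], b: &[&str]) -> usize {
        let (short, long) = if a.len() <= b.len() { (a, b) } else { (b, a) };
        let mut prev = vec![0usize; short.len() + 1];
        let mut cur = vec![0usize; short.len() + 1];
        for x in long {
            for (j, y) in short.iter().enumerate() {
                cur[j + 1] = if x == y {
                    prev[j] + 1
                } else {
                    cur[j].max(prev[j + 1])
                };
            }
            std::mem::swap(&mut prev, &mut cur);
        }
        prev[short.len()]
    }

    /// Geometric mean of the four n-gram precisions in log space, with add-1
    /// smoothing whenever an order has zero matches (so ln(0) never occurs).
    /// The brevity penalty is applied separately and omitted here.
    fn bleu_geo_mean(matches: [u64; 4], totals: [u64; 4]) -> f64 {
        let log_sum: f64 = matches
            .iter()
            .zip(&totals)
            .map(|(&m, &t)| {
                let (m, t) = if m == 0 { (m + 1, t + 1) } else { (m, t) };
                (m as f64 / t as f64).ln()
            })
            .sum();
        (log_sum / 4.0).exp()
    }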
LLaMA 3.2 cannot provide provenance, forensics chains, or glass-box explainability. Neural models are fundamentally opaque — no hash-linked audit trail, no source traceability, no 3000.09 compliance path. PROTOS/EARL achieves superior ROUGE and BLEU scores with zero parameters in an air-gapped environment where LLaMA cannot boot.