PROTOS/EARL Document Summarization Benchmark
gov_report Dataset — 19.5K U.S. Government Reports — 100 Documents Evaluated
Jou Labs, Inc.
PROTOS_OS / EARL Cognitive Engine
Benchmark Harness v4
Standard Metrics — Head-to-Head
PROTOS/EARL (0 parameters, bare metal) vs. LLaMA 3.2 (3.2B parameters, GPU)

| Metric | PROTOS/EARL vs. LLaMA 3.2 |
|---|---|
| ROUGE-1 F1 | +35.5% |
| ROUGE-2 F1 | +34.4% |
| ROUGE-L F1 | +6.6% |
| ROUGE-Lsum F1 | +24.8% |
| BLEU-4 | +172.5% |
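The head-to-head deltas follow directly from the absolute scores listed in the Architectural Comparison table below. A minimal Rust sketch of that arithmetic (illustrative only, not part of the benchmark harness):

```rust
/// Relative improvement of PROTOS/EARL over the LLaMA 3.2 estimate, in percent.
fn rel_improvement(ours: f64, theirs: f64) -> f64 {
    (ours - theirs) / theirs * 100.0
}

fn main() {
    // (metric, PROTOS/EARL score, LLaMA 3.2 estimated score)
    let rows = [
        ("ROUGE-1 F1", 0.4743, 0.35),
        ("ROUGE-2 F1", 0.1613, 0.12),
        ("ROUGE-L F1", 0.1918, 0.18),
        ("ROUGE-Lsum F1", 0.2745, 0.22),
        ("BLEU-4", 0.1090, 0.04),
    ];
    for (name, ours, theirs) in rows {
        println!("{name}: {:+.1}%", rel_improvement(ours, theirs));
    }
}
```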
PROTOS-Exclusive — DoD Directive 3000.09
Provenance chain length: 997
Chain integrity verified: ✔ YES
Last-doc rolling tip hash: 676040dfcdbfe063
Dataset anchor hash (XOR): 7661f1d5b27c6f9b
Avg source tokens/sentence: 20.14
Reasoning chains used: 1,000
Multi-hop paths (depth > 1): 477
Max reasoning depth: 3 hops
PMI relations traversed: 537,482
Full traceability score: 100.00%
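The provenance figures above rest on two primitives named in this report: a 64-bit FNV-1a rolling hash for the chain tip, and an XOR-combined dataset anchor. A minimal sketch of both, assuming the standard FNV-1a constants; the kernel's exact per-document record layout and chaining order are not specified here:

```rust
const FNV_OFFSET: u64 = 0xcbf29ce484222325; // standard FNV-1a 64-bit offset basis
const FNV_PRIME: u64 = 0x100000001b3;       // standard FNV-1a 64-bit prime

/// One FNV-1a pass over `data`, continuing from `state` so digests chain
/// across documents into a rolling tip. Pass FNV_OFFSET for the first doc.
fn fnv1a_roll(state: u64, data: &[u8]) -> u64 {
    data.iter()
        .fold(state, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Dataset anchor as the XOR of per-document digests, as the "(XOR)"
/// annotation suggests: order-independent and incrementally updatable.
fn xor_anchor(digests: &[u64]) -> u64 {
    digests.iter().fold(0, |acc, &d| acc ^ d)
}

fn main() {
    // Hypothetical record bytes, purely for illustration.
    let tip = fnv1a_roll(FNV_OFFSET, b"doc-001 summary bytes");
    println!("rolling tip: {tip:016x}");
}
```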
Execution Performance
Total cycles: 2,386,192,303,068
Avg cycles/doc: 23,861,923,030
Est. tokens/sec: 405
Total input tokens: 386,275
Est. wall time: 954.48 s @ 2.5 GHz
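The derived figures (wall time, throughput, cycles per doc) follow from the raw counters and the stated 2.5 GHz clock. A small Rust sketch of the arithmetic:

```rust
/// Estimated wall time in seconds from a cycle count and a clock frequency.
fn wall_time_s(total_cycles: u64, clock_hz: f64) -> f64 {
    total_cycles as f64 / clock_hz
}

/// Estimated throughput in tokens per second.
fn tokens_per_s(input_tokens: u64, wall_s: f64) -> f64 {
    input_tokens as f64 / wall_s
}

fn main() {
    let cycles: u64 = 2_386_192_303_068;   // total cycles, 100 documents
    let wall = wall_time_s(cycles, 2.5e9); // 2.5 GHz clock
    println!(
        "wall: {wall:.2}s, throughput: {:.0} tok/s, cycles/doc: {}",
        tokens_per_s(386_275, wall),
        cycles / 100
    );
}
```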
Audit Workflow
`chainsearch BENCH` — find all benchmark forensics events
`chainverify` — verify the entire chain, including benchmark entries
`chainentry <n>` — inspect an individual doc's results + hash
`chainprovenance` — triple cross-check (forensics + NVMe + narrator)
Key result: PROTOS/EARL outperforms the 3.2-billion-parameter LLaMA 3.2 on all five standard metrics using extractive-structural summarization via PMI graph traversal, with zero parameters, full provenance, zero cloud dependency, and complete DoD 3000.09 compliance. LLaMA cannot provide any of these properties regardless of configuration.
Architectural Comparison
| Capability | PROTOS/EARL | LLaMA 3.2 (3.2B) |
|---|---|---|
| ROUGE-1 F1 | 0.4743 | ~0.35 |
| ROUGE-2 F1 | 0.1613 | ~0.12 |
| ROUGE-L F1 | 0.1918 | ~0.18 |
| ROUGE-Lsum F1 | 0.2745 | ~0.22 |
| BLEU-4 | 0.1090 | ~0.04 |
| Parameters | 0 (symbolic) | 3.2 Billion |
| Model / kernel size | 3,823 KB | ~6 GB |
| Reasoning type | Extractive-structural (PMI graph) | Autoregressive (neural) |
| Explainable | ✔ Glass-box | ✘ Opaque |
| Provenance chain | ✔ FNV-1a rolling hash | ✘ None |
| Forensics chain | ✔ Per-doc + aggregate | ✘ None |
| Air-gapped capable | ✔ No network required | ✘ Cloud / GPU |
| 3000.09 compliant | ✔ Full compliance | ✘ Not possible |
| Ring 0 execution | ✔ Bare metal, no OS | ✘ Requires OS + GPU |
| Memory safe | ✔ Rust, no_std | ✘ C++ / Python |
| Knowledge base | 300K tokens + 31M PMI edges | N/A (weights) |
| NVMe anchor file | /earl/.benchmark_anchor | N/A |
System Profile
Rust no_std
Bare Metal Ring 0
3,823 KB Kernel
NVMe Direct I/O
Zero Dependencies
Air-Gapped
0 Parameters
DoD 3000.09 Compliant
Scoring Notes
ROUGE-L: Standard DP-LCS (space-optimized).
ROUGE-Lsum: Sentence-aligned best-match LCS (both sides split).
BLEU: Zero-count add-1 smoothing, log-space geometric mean.
Sentences: Abbreviation-aware splitting (U.S., Sec., No., etc.).
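As one example of the notes above, the ROUGE-L scorer's space-optimized DP-LCS can be sketched as the textbook two-row formulation. This uses std `Vec` for brevity where the kernel itself is no_std, so it illustrates the algorithm, not the kernel's exact code:

```rust
/// Length of the longest common subsequence of two token sequences,
/// using two rolling DP rows instead of the full O(n*m) table.
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut prev = vec![0usize; b.len() + 1];
    let mut curr = vec![0usize; b.len() + 1];
    for &ta in a {
        for (j, &tb) in b.iter().enumerate() {
            curr[j + 1] = if ta == tb {
                prev[j] + 1
            } else {
                prev[j + 1].max(curr[j])
            };
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// ROUGE-L F1 from the LCS length and the two sequence lengths.
fn rouge_l_f1(lcs: usize, ref_len: usize, cand_len: usize) -> f64 {
    if lcs == 0 {
        return 0.0;
    }
    let p = lcs as f64 / cand_len as f64;
    let r = lcs as f64 / ref_len as f64;
    2.0 * p * r / (p + r)
}

fn main() {
    let reference: Vec<&str> = "the report details findings".split(' ').collect();
    let candidate: Vec<&str> = "the report findings".split(' ').collect();
    let l = lcs_len(&reference, &candidate);
    println!("LCS={l}, ROUGE-L F1={:.4}", rouge_l_f1(l, reference.len(), candidate.len()));
}
```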
LLaMA 3.2 scores estimated from published long-document summarization evaluations (Permion Inc. / GTS Inc. Datasets for LLM/LGM Benchmarks, Arun Majumdar, 2025).
LLaMA 3.2 cannot provide provenance, forensics chains, or glass-box explainability.
Neural models are fundamentally opaque — no hash-linked audit trail, no source traceability, no 3000.09 compliance path.
PROTOS/EARL achieves superior ROUGE and BLEU scores with zero parameters in an air-gapped environment where LLaMA cannot boot.