
PROTOS/EARL Document Summarization Benchmark

gov_report Dataset — 19.5K U.S. Government Reports — 100 Documents Evaluated
Jou Labs, Inc.
PROTOS_OS / EARL Cognitive Engine
Benchmark Harness v4
Standard Metrics — Head-to-Head
PROTOS/EARL (0 parameters, bare metal)
LLaMA 3.2 (3.2B parameters, GPU)
Metric           PROTOS    LLaMA     PROTOS advantage
ROUGE-1 F1       0.4743    0.3500    +35.5%
ROUGE-2 F1       0.1613    0.1200    +34.4%
ROUGE-L F1       0.1918    0.1800    +6.6%
ROUGE-Lsum F1    0.2745    0.2200    +24.8%
BLEU-4           0.1090    0.0400    +172.5%
PROTOS-Exclusive — DoD Directive 3000.09
Provenance chain length: 997
Chain integrity verified: ✔ YES
Last-doc rolling tip hash: 676040dfcdbfe063
Dataset anchor hash (XOR): 7661f1d5b27c6f9b
Avg source tokens/sentence: 20.14
Reasoning chains used: 1,000
Multi-hop paths (depth > 1): 477
Max reasoning depth: 3 hops
PMI relations traversed: 537,482
Full traceability score: 100.00%
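
The provenance figures above come from the FNV-1a rolling hash chain noted in the Architectural Comparison below. The following is a minimal sketch of how a 64-bit FNV-1a chain and an XOR dataset anchor can be computed; the per-entry chaining rule and the bytes hashed per entry are illustrative assumptions, not the PROTOS/EARL on-disk format.

    // 64-bit FNV-1a constants (standard published values).
    const FNV_OFFSET: u64 = 0xcbf2_9ce2_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    /// FNV-1a over `data`, starting from `seed`: XOR each byte into the
    /// state, then multiply by the FNV prime (wrapping on overflow).
    fn fnv1a(seed: u64, data: &[u8]) -> u64 {
        data.iter()
            .fold(seed, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
    }

    /// Roll the chain forward one entry at a time: each tip depends on the
    /// previous tip, so altering any earlier entry changes every later tip.
    /// The anchor XORs all per-entry tips into one dataset-level hash.
    fn chain_and_anchor(entries: &[&[u8]]) -> (u64, u64) {
        let (mut tip, mut anchor) = (FNV_OFFSET, 0u64);
        for entry in entries {
            tip = fnv1a(fnv1a(FNV_OFFSET, &tip.to_le_bytes()), entry);
            anchor ^= tip;
        }
        (tip, anchor) // (last-doc rolling tip, dataset anchor)
    }

Because XOR is order-independent, an anchor of this kind detects a missing or altered tip but not a reordering; ordering is covered by the rolling tip itself.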
Execution Performance
Total cycles: 2,386,192,303,068
Avg cycles/doc: 23,861,923,030
Est. tokens/sec: 405
Total input tokens: 386,275
Est. wall time: 954.48 s @ 2.5 GHz
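
The throughput and wall-time rows are derived quantities; the arithmetic below reproduces them under the stated fixed 2.5 GHz clock assumption.

    // Sanity check of the derived figures above (2.5 GHz clock assumed).
    fn main() {
        let total_cycles: u64 = 2_386_192_303_068;
        let total_tokens: u64 = 386_275;
        let clock_hz: f64 = 2.5e9;
        let wall_s = total_cycles as f64 / clock_hz;  // = 954.48 s
        let tok_per_s = total_tokens as f64 / wall_s; // = 404.7, reported as 405
        println!("wall: {wall_s:.2} s, throughput: {tok_per_s:.0} tokens/sec");
    }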
Audit Workflow
chainsearch BENCH — find all benchmark forensics events
chainverify — verify the entire chain, including benchmark entries
chainentry <n> — inspect individual doc results + hash
chainprovenance — triple cross-check (forensics + NVMe + narrator)
Key result: PROTOS/EARL outperforms the 3.2-billion-parameter LLaMA 3.2 across all five standard metrics using extractive-structural summarization via PMI graph traversal — with zero parameters, full provenance, zero cloud dependency, and complete 3000.09 compliance. LLaMA cannot provide any of these properties regardless of configuration.
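
To make "extractive-structural summarization via PMI graph traversal" concrete, here is a minimal sketch of greedy multi-hop traversal over a PMI edge list. The data layout, greedy edge selection, and `max_hops` cutoff are illustrative assumptions; they show the general technique, not the EARL engine's actual traversal.

    // Illustrative PMI-graph traversal (assumed data layout, not EARL's).
    use std::collections::HashMap;

    /// Pointwise mutual information from probabilities:
    /// PMI(x, y) = ln( p(x, y) / (p(x) * p(y)) ).
    fn pmi(p_xy: f64, p_x: f64, p_y: f64) -> f64 {
        (p_xy / (p_x * p_y)).ln()
    }

    /// Greedily follow the strongest unvisited PMI edge from `start`,
    /// up to `max_hops` hops (the report above caps depth at 3).
    fn traverse(
        edges: &HashMap<u32, Vec<(u32, f64)>>, // token -> (neighbor, PMI)
        start: u32,
        max_hops: usize,
    ) -> Vec<u32> {
        let mut chain = vec![start];
        let mut cur = start;
        for _ in 0..max_hops {
            let Some(neighbors) = edges.get(&cur) else { break };
            match neighbors
                .iter()
                .filter(|(t, _)| !chain.contains(t))
                .max_by(|a, b| a.1.total_cmp(&b.1))
            {
                Some(&(next, _)) => {
                    chain.push(next);
                    cur = next;
                }
                None => break,
            }
        }
        chain // one reasoning chain; depth > 1 makes it a multi-hop path
    }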
Architectural Comparison
Capability            PROTOS/EARL                         LLaMA 3.2 (3.2B)
ROUGE-1 F1            0.4743                              ~0.35
ROUGE-2 F1            0.1613                              ~0.12
ROUGE-L F1            0.1918                              ~0.18
ROUGE-Lsum F1         0.2745                              ~0.22
BLEU-4                0.1090                              ~0.04
Parameters            0 (symbolic)                        3.2 billion
Model / kernel size   3,823 KB                            ~6 GB
Reasoning type        Extractive-structural (PMI graph)   Autoregressive (neural)
Explainable           ✔ Glass-box                         ✘ Opaque
Provenance chain      ✔ FNV-1a rolling hash               ✘ None
Forensics chain       ✔ Per-doc + aggregate               ✘ None
Air-gapped capable    ✔ No network required               ✘ Cloud / GPU
3000.09 compliant     ✔ Full compliance                   ✘ Not possible
Ring 0 execution      ✔ Bare metal, no OS                 ✘ Requires OS + GPU
Memory safe           ✔ Rust, no_std                      ✘ C++ / Python
Knowledge base        300K tokens + 31M PMI edges         N/A (weights)
NVMe anchor file      /earl/.benchmark_anchor             N/A
System Profile
Rust no_std · Bare Metal Ring 0 · 3,823 KB Kernel · NVMe Direct I/O · Zero Dependencies · Air-Gapped · 0 Parameters · DoD 3000.09 Compliant
Scoring Notes
ROUGE-L: standard DP-LCS (space-optimized).
ROUGE-Lsum: sentence-aligned best-match LCS (both sides split into sentences).
BLEU: add-1 smoothing for zero n-gram counts; geometric mean computed in log space.
Sentences: abbreviation-aware splitting (U.S., Sec., No., etc.).
LLaMA 3.2 scores are estimated from published long-document summarization evaluations (Permion Inc. / GTS Inc., Datasets for LLM/LGM Benchmarks, Arun Majumdar, 2025).
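
For readers checking the arithmetic, the sketch below reimplements two of the stated conventions: the space-optimized DP-LCS at the core of ROUGE-L, and the log-space geometric mean with add-1 smoothing used for BLEU. It follows the notes above but is not the benchmark harness code; tokenization and n-gram clipping are omitted.

    /// LCS length via two-row DP: O(min(m, n)) memory instead of a full table.
    fn lcs_len(a: &[&str], b: &[&str]) -> usize {
        let (short, long) = if a.len() <= b.len() { (a, b) } else { (b, a) };
        let mut prev = vec![0usize; short.len() + 1];
        let mut cur = vec![0usize; short.len() + 1];
        for x in long {
            for (j, y) in short.iter().enumerate() {
                cur[j + 1] = if x == y {
                    prev[j] + 1
                } else {
                    cur[j].max(prev[j + 1])
                };
            }
            std::mem::swap(&mut prev, &mut cur);
        }
        prev[short.len()]
    }

    /// Geometric mean of the four n-gram precisions in log space, with add-1
    /// smoothing whenever an order has zero matches (so ln(0) never occurs).
    /// The brevity penalty is applied separately and omitted here.
    fn bleu_geo_mean(matches: [u64; 4], totals: [u64; 4]) -> f64 {
        let log_sum: f64 = matches
            .iter()
            .zip(&totals)
            .map(|(&m, &t)| {
                let (m, t) = if m == 0 { (m + 1, t + 1) } else { (m, t) };
                (m as f64 / t as f64).ln()
            })
            .sum();
        (log_sum / 4.0).exp()
    }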
LLaMA 3.2 cannot provide provenance, forensics chains, or glass-box explainability. Neural models are fundamentally opaque — no hash-linked audit trail, no source traceability, no 3000.09 compliance path. PROTOS/EARL achieves superior ROUGE and BLEU scores with zero parameters in an air-gapped environment where LLaMA cannot boot.