Evidence-first, not marketing-first.
Everything we claim is in the repo.
Nerviq Research publishes the data, the methodology, and the reproduction instructions for every claim we make. All artifacts below are CC0 or research-repo-native — clone, re-run, cite, or criticize.
State of AI Agent Governance — 2026 Q2
20 popular public OSS repos audited across 8 platforms with NERVIQ CLI v1.29.1. 85% of AI-forward projects ship an agent config; median score 60; AGENTS.md is the emergent portable format. CC0 licensed.
Studies, measurements, and policy
20-repo external audit dataset (v1 + v2)
Every audit score for every platform on every repo, plus per-category breakdowns. Reproducible: `python tools/run-external-audit-batch.py` re-runs the whole thing.
Harmony value — 7-archetype before/after
Harmony lift quantified across 7 OSS repos spanning starting-harmony 28-71. Low-harmony repos (<40) averaged +20.5 lift; repos at 60+ showed 0 movement. Lift is bounded by starting drift.
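The banded averages above can be recomputed from before/after pairs with a few lines of Python. This is a minimal sketch, not the script used in the study: the function names, the middle `40-59` band, and the example pairs are all assumptions made for illustration, and the numbers are placeholders rather than the published data.

```python
from statistics import mean


def band(start):
    """Map a starting harmony score to a band label (the 40-59 band is assumed)."""
    if start < 40:
        return "<40"
    if start < 60:
        return "40-59"
    return "60+"


def mean_lift_by_band(pairs):
    """Group (before, after) pairs by starting band and average after - before."""
    lifts = {}
    for before, after in pairs:
        lifts.setdefault(band(before), []).append(after - before)
    return {label: mean(values) for label, values in lifts.items()}


# Placeholder pairs for illustration only; these are NOT the study's data.
example = [(30, 50), (35, 45), (50, 55), (65, 65)]
print(mean_lift_by_band(example))
```

Swapping the placeholder pairs for the per-repo before/after scores in the published dataset should let you check the band averages yourself.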
Self-dogfood — Nerviq audits itself
We ran `nerviq audit` on our own repos and published the full numbers, including the embarrassing ones: harmony 33/100 on the research repo, organic 66 on both. An audit tool that scores itself 100/100 is either fake or trivial.
Catalog verification session (+34 items in one day)
34 catalog items verified in one working day via the VMP tier-2 pipeline and API feature verifier, for ~$0.05 total API spend. Documents the full methodology from hypothesis to bump.
Synergy EXPERIMENTAL → BETA exit criteria
The public, pre-committed bar for promoting Nerviq Synergy out of its EXPERIMENTAL label. Three non-negotiable evidence requirements plus an explicit list of non-criteria (surveys, stars, press). Every criterion is falsifiable.
Measurement: Meta-prompting + Chain-of-Thought
Two catalog items (#63, #11) earned the 📏 Measured badge today via 2-archetype tier-4 measurements with cross-model judge. Both techniques showed the same pattern: null on simple tasks, +1.00 on complex — technique benefit lives where the failure mode lives.
Methodology — verification & measurement rules
The rubric behind ✅ Tested, 📏 Measured, and the badge award rules. Updated 2026-04-17 with the Automated API Feature Verification pattern that produced 34 catalog bumps in one day.
Reproduce any of this
Every measurement and audit can be re-run on your own machine with the Nerviq CLI and the research-repo scripts. Example: re-generate the full 20-repo dataset from scratch:
# 1. Install Nerviq CLI
npm i -g @nerviq/cli
# 2. Clone the research repo + install Python deps
git clone https://github.com/DnaFin/nerviq-research.git
cd nerviq-research
pip install -r requirements.txt
# 3. Reproduce the State-of-Governance dataset (~15 min)
python tools/run-external-audit-batch.py
# Output: research/external-audits/exp11-v2-<date>.{json,md}
# 4. Reproduce a tier-4 measurement (requires ANTHROPIC_API_KEY, ~$0.02)
python tools/tier4-runner.py MEAS-2026-04-001
Full methodology + badge award rules: catalog/METHODOLOGY.md.
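Once step 3 has finished, the generated dataset can be picked up from Python. The sketch below is hedged: it assumes the output lands in `research/external-audits/` under the `exp11-v2-<date>.json` pattern shown in the comment above, picks the most recently modified match, and makes no assumption about the JSON's internal schema. The helper names are hypothetical and not part of the research repo.

```python
import glob
import json
import os


def latest_audit_file(directory="research/external-audits", pattern="exp11-v2-*.json"):
    """Return the most recently modified matching file, or None if there is none."""
    matches = glob.glob(os.path.join(directory, pattern))
    return max(matches, key=os.path.getmtime) if matches else None


def load_audit(path):
    """Parse the JSON file; nothing is assumed about its internal structure."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    path = latest_audit_file()
    if path is None:
        print("No dataset found; run tools/run-external-audit-batch.py first.")
    else:
        data = load_audit(path)
        print(f"Loaded {path}: top-level type is {type(data).__name__}")
```

Run it from the root of the cloned `nerviq-research` directory so the relative path resolves.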
Want to cite this work? DnaFin/nerviq-research. CC0 where applicable; open an issue tagged [citation] if you want attribution guidance.