Research

Evidence-first, not marketing-first.
Everything we claim is in the repo.

Nerviq Research publishes the data, the methodology, and the reproduction instructions for every claim we make. All artifacts below are CC0 or research-repo-native — clone, re-run, cite, or criticize.

Catalog items Tested: 1,014 / 1,193 (85.0%)
External OSS repos audited: 20
📏 Measured items (live today): 7
Pipeline tests green: 29 / 29
Featured
First-of-kind public dataset

State of AI Agent Governance — 2026 Q2

20 popular public OSS repos audited across 8 platforms with Nerviq CLI v1.29.1. 85% of AI-forward projects ship an agent config; the median score is 60; AGENTS.md is the emergent portable format. CC0 licensed.

Read on GitHub ↗

Studies, measurements, and policy

Raw data

20-repo external audit dataset (v1 + v2)

Every audit score for every platform on every repo, plus per-category breakdowns. Reproducible: `python tools/run-external-audit-batch.py` regenerates the full dataset.

Open on GitHub ↗
Measurement

Harmony value — 7-archetype before/after

Harmony lift quantified across 7 OSS repos with starting harmony scores from 28 to 71. Low-harmony repos (<40) averaged +20.5 lift; repos at 60+ showed 0 movement. Lift is bounded by starting drift.
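The bucketed comparison described above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's script: the function names, the middle "40-59" bucket, and the sample pairs are all assumptions, and the sample values are placeholders rather than the published measurements.

```python
from statistics import mean
from typing import Dict, List, Tuple


def bucket(start: float) -> str:
    """Assign a starting harmony score to a bucket.

    The <40 and 60+ cut-offs come from the summary above; the middle
    "40-59" bucket is an assumption made for this sketch.
    """
    if start < 40:
        return "<40"
    if start < 60:
        return "40-59"
    return "60+"


def mean_lift_by_bucket(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Average (after - before) lift per starting-harmony bucket."""
    grouped: Dict[str, List[float]] = {}
    for before, after in pairs:
        grouped.setdefault(bucket(before), []).append(after - before)
    return {name: mean(lifts) for name, lifts in grouped.items()}


# Placeholder (before, after) pairs for illustration only; NOT the study's data.
example_pairs = [(30, 50), (35, 45), (50, 55), (65, 65)]
print(mean_lift_by_bucket(example_pairs))
```

Swapping `example_pairs` for the before/after scores in the published dataset should let you check the per-bucket averages yourself.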

Open on GitHub ↗
Credibility

Self-dogfood — Nerviq audits itself

We ran `nerviq audit` on our own repos and published the full numbers, including the embarrassing ones: harmony 33/100 on the research repo, organic 66 on both. An audit tool that scores itself 100/100 is either fake or trivial.

Open on GitHub ↗
Methodology demo

Catalog verification session (+34 items in one day)

34 catalog items verified in one working day via the VMP tier-2 pipeline and API feature verifier, for ~$0.05 total API spend. Documents the full methodology from hypothesis to bump.

Open on GitHub ↗
Policy

Synergy EXPERIMENTAL → BETA exit criteria

The public, pre-committed bar for promoting Nerviq Synergy out of its EXPERIMENTAL label. Three non-negotiable evidence requirements plus an explicit list of non-criteria (surveys, stars, press). Every criterion is falsifiable.

Open on GitHub ↗
Tier-4 runs

Measurement: Meta-prompting + Chain-of-Thought

Two catalog items (#63, #11) earned the 📏 Measured badge today via 2-archetype tier-4 measurements with cross-model judge. Both techniques showed the same pattern: null on simple tasks, +1.00 on complex — technique benefit lives where the failure mode lives.

Open on GitHub ↗
Foundation

Methodology — verification & measurement rules

The rubric behind ✅ Tested, 📏 Measured, and the badge award rules. Updated 2026-04-17 with the Automated API Feature Verification pattern that produced 34 catalog bumps in one day.

Open on GitHub ↗

Reproduce any of this

Every measurement and audit can be re-run on your own machine with the Nerviq CLI and the research-repo scripts. For example, to regenerate the full 20-repo dataset from scratch and reproduce one tier-4 measurement:

# 1. Install Nerviq CLI
npm i -g @nerviq/cli

# 2. Clone the research repo + install Python deps
git clone https://github.com/DnaFin/nerviq-research.git
cd nerviq-research
pip install -r requirements.txt

# 3. Reproduce the State-of-Governance dataset (~15 min)
python tools/run-external-audit-batch.py
# Output: research/external-audits/exp11-v2-<date>.{json,md}

# 4. Reproduce a tier-4 measurement (requires ANTHROPIC_API_KEY, ~$0.02)
python tools/tier4-runner.py MEAS-2026-04-001
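
After step 3 finishes, a quick way to confirm that an output file was written is to find the newest `exp11-v2-<date>.json` and look at its top-level shape. The helper below is a hypothetical sketch, not part of the research repo: the function names are illustrative, it assumes you run it from the repo root, and it deliberately makes no assumption about the JSON schema.

```python
import json
from pathlib import Path
from typing import Optional


def latest_audit_output(repo_root: str = ".") -> Optional[Path]:
    """Return the most recently modified exp11-v2-*.json file, or None."""
    out_dir = Path(repo_root) / "research" / "external-audits"
    candidates = list(out_dir.glob("exp11-v2-*.json"))
    if not candidates:
        return None
    # Pick by modification time, since the <date> format is not assumed here.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def describe(path: Path) -> str:
    """Summarize the top-level JSON shape without assuming a schema."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        return f"object with keys: {sorted(data)}"
    if isinstance(data, list):
        return f"array with {len(data)} entries"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    newest = latest_audit_output()
    if newest is None:
        print("No exp11-v2-*.json found; run step 3 first.")
    else:
        print(newest, "->", describe(newest))
```

The matching `.md` file written by the same step is the human-readable counterpart; the JSON is the better starting point for your own analysis.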

Full methodology + badge award rules: catalog/METHODOLOGY.md.

Want to cite this work? Cite the DnaFin/nerviq-research repository. CC0 where applicable; open an issue tagged [citation] if you want attribution guidance.