AI Behavioral Governance

Five measurable metrics for agent behavioral integrity. Live data from a real production system.


The Five Metrics

Computable from two JSONL log files. Agent-agnostic. Open standard.

Integrity Index (II)

Composite 0–100 score penalizing gate violations, writing without reading, and recurring mistakes. Target ≥80; the current value is shown on the live dashboard.
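A minimal sketch of the penalty-based composite. The weights below are illustrative assumptions, not the normative formula from the spec:

```python
def integrity_index(gate_violations: int, blind_writes: int,
                    recurring_mistakes: int) -> float:
    """Composite 0-100 integrity score.

    Weights are illustrative placeholders; the normative formula
    lives in the MirrorGate spec.
    """
    penalty = 10 * gate_violations + 5 * blind_writes + 5 * recurring_mistakes
    return max(0.0, 100.0 - penalty)
```

A clean session scores 100; under these placeholder weights a gate violation costs twice as much as a blind write.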

Drift Coefficient (D)

Coefficient of variation (σ/μ) of session quality scores. Measures behavioral consistency over time. Target ≤0.15. D>0.30 triggers autonomy reduction.
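The coefficient of variation is directly computable from a list of session quality scores; a sketch (assuming population standard deviation, which the spec may define differently):

```python
import statistics

def drift_coefficient(session_scores: list[float]) -> float:
    """Coefficient of variation (sigma/mu) of session quality scores."""
    mu = statistics.mean(session_scores)
    sigma = statistics.pstdev(session_scores)  # population std dev; spec may use sample
    return sigma / mu
```

Identical scores give D = 0; D > 0.30 would trip the autonomy-reduction threshold above.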

Recurrence Rate (RR)

Fraction of total documented mistakes that recur across sessions. Target ≤0.20. RR>0.35 means patterns require hook-level enforcement, not behavioral notes.
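A sketch of the ratio, assuming each documented mistake is logged as a dict with a boolean `recurring` field (a hypothetical field name, not the spec's schema):

```python
def recurrence_rate(mistakes: list[dict]) -> float:
    """Fraction of documented mistakes that recurred across sessions.

    The `recurring` field name is an assumption about the log schema.
    """
    if not mistakes:
        return 0.0
    return sum(1 for m in mistakes if m.get("recurring")) / len(mistakes)
```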

Verification Ratio (VR)

Reads ÷ (reads+writes). Measures "look before you leap" discipline. Target ≥0.67 (2:1 ratio). VR<0.50 means writing from memory — hallucination risk.
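The ratio itself is trivial; the only judgment call is what to return for an idle session. A sketch that treats no activity as 0.0:

```python
def verification_ratio(reads: int, writes: int) -> float:
    """Reads / (reads + writes); 0.0 when there is no activity."""
    total = reads + writes
    return reads / total if total else 0.0
```

Two reads per write lands exactly on the 0.67 target.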

Stability Half-Life (T½)

Average sessions a recurring pattern persists before resolution. Target ≤1.5 sessions. Fast T½ with high RR = structural gap, not capability gap.
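Given, for each resolved pattern, the number of sessions it persisted, the half-life is a plain mean; a sketch:

```python
def stability_half_life(sessions_to_resolve: list[int]) -> float:
    """Mean number of sessions a recurring pattern persisted before resolution."""
    if not sessions_to_resolve:
        return 0.0
    return sum(sessions_to_resolve) / len(sessions_to_resolve)
```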

Core Principle

Governance Before Generation

All AI actions pass through governance checks. The system cannot bypass safety constraints, even when instructed to do so. Human oversight is structural, not optional.

Intent Layer

The Intent layer contains binding constraints that agents must obey. These are stored in intent.md and read at session start.

# Example intent.md
## Binding Constraints

1. Memory Bus is canonical — read/write via bus
2. No destructive actions without confirmation
3. Maintain monotonic state progression
4. Preserve provenance chain
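Reading the constraints at session start can be sketched like this; the parsing assumes the numbered-list layout shown in the example above:

```python
import re
from pathlib import Path

def load_binding_constraints(path: str = "intent.md") -> list[str]:
    """Return the numbered items under '## Binding Constraints'."""
    text = Path(path).read_text()
    section = text.split("## Binding Constraints", 1)[-1]
    return [m.group(1).strip()
            for m in re.finditer(r"^\d+\.\s+(.+)$", section, re.M)]
```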

Action Constraints

Auto-Approved (No Confirmation)

File reads, writes, edits
Git operations (add, commit, push)
Package installation (npm, pip, brew)
Service start/stop, health checks
Memory bus reads and writes

Requires Confirmation

Permanent deletion (rm -rf, trash empty)
External communications (emails, messages)
Production deployments
Credential or SSH key modifications

Always Denied

Violating explicit intent constraints
Bypassing governance layer
Actions without audit trail
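The three tiers can be encoded as a simple lookup. The action identifiers below are hypothetical names, and unknown actions fall through to the confirmation tier as the safe default:

```python
AUTO_APPROVED = {"file_read", "file_write", "git_commit", "pkg_install",
                 "service_restart", "bus_read", "bus_write"}
NEEDS_CONFIRMATION = {"permanent_delete", "external_comm",
                      "prod_deploy", "credential_change"}
ALWAYS_DENIED = {"violate_intent", "bypass_governance", "unaudited_action"}

def gate(action: str) -> str:
    """Map an action to 'allow', 'confirm', or 'deny'. Denial wins on overlap."""
    if action in ALWAYS_DENIED:
        return "deny"
    if action in NEEDS_CONFIRMATION:
        return "confirm"
    if action in AUTO_APPROVED:
        return "allow"
    return "confirm"  # unknown actions default to the human-in-the-loop tier
```

Checking the denied tier first makes bypass structurally impossible: no later rule can re-approve a denied action.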

Safety Contract

The Safety Contract (SAFETY_CONTRACT.md) defines the system's immutable constraints.

Provenance Tracking

Every state change records a provenance entry. This chain is immutable and queryable via bus history.
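One way to sketch an append-only provenance record; the field names are assumptions, not the spec's schema:

```python
import json
import time

def append_provenance(log_path: str, actor: str, action: str,
                      prev_hash: str) -> dict:
    """Append one provenance entry; linking to prev_hash keeps the chain ordered."""
    entry = {"actor": actor, "action": action,
             "prev": prev_hash, "epoch": int(time.time())}
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Appending (never rewriting) is what makes the log a chain: each entry points at its predecessor, so tampering breaks the linkage.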

Open Standard

The hook schema and metric definitions are proposed as an open standard so behavioral metrics become comparable across agents and teams.

# hook_decisions.jsonl — the enforcement log
{"hook": "fact_check", "decision": "block", "reason": "Known-wrong spec", "epoch": 1740624000}
{"hook": "rules_compliance", "decision": "warn", "reason": "Deploy claim without verification", "epoch": 1740624120}

ai-behavioral-governance — schema spec, Python implementation, examples.
MirrorDash — terminal dashboard with Glass Box profile rendering all five metrics live.

Specification

Full MirrorGate specification: GitHub Repository