When Shadow AI Nearly Destroyed a $2.3 Billion Healthcare Company
The emergency call came at 11:43 PM on a Thursday. The General Counsel of MediTech Solutions, a healthcare analytics company serving 340 hospital systems, was practically shouting into the phone. "We just received a lawsuit alleging our AI made discriminatory treatment recommendations. The plaintiff's attorneys are claiming our model systematically denied coverage to minority patients. But here's the problem—we have no idea which model they're talking about. We don't even know how many AI models we have deployed."
I arrived at their headquarters the next morning to find their C-suite in crisis mode. Over the past three years, MediTech had transformed from a traditional data analytics firm into an "AI-powered healthcare insights platform." They'd raised $340 million in venture funding based on their machine learning capabilities. Their marketing materials boasted "200+ proprietary AI models" delivering "unprecedented clinical accuracy."
But as I sat down with their Chief Data Scientist, the truth emerged: they had no centralized inventory of their AI models. No formal tracking of which models were in production versus development. No documentation of training data sources. No version control linking model iterations to specific datasets. No governance over who could deploy models or what testing was required before production release.
Over the next 72 hours of forensic investigation, we discovered the scope of their AI chaos: 347 machine learning models deployed across their infrastructure (not the 200 they claimed), 89 of which had no identifiable creator or purpose. 127 models had been trained on data that violated their customer contracts. 43 models had been trained on datasets containing protected health information without proper consent. And the model referenced in the lawsuit? It had been built by an intern two years ago, deployed to production without review, and nobody had validated its outputs since.
The lawsuit eventually settled for $14.7 million. But the real cost was far higher: $8.2 million in emergency remediation, $23.4 million in lost contracts as customers fled, $4.1 million in regulatory fines from HHS and the FTC, and immeasurable reputation damage. MediTech's valuation dropped 67% in six months. They laid off 340 employees and eventually sold to a competitor at a fraction of their peak value.
That catastrophic failure transformed how I approach AI governance. Over the past 15+ years working with financial institutions, healthcare organizations, technology companies, and government agencies deploying machine learning, I've learned that AI model registries aren't just compliance checkboxes—they're survival mechanisms. In an era where a single biased or poorly governed model can trigger existential organizational crises, comprehensive model inventory and control is non-negotiable.
In this comprehensive guide, I'm going to share everything I've learned about building robust AI model registries. We'll cover the fundamental components that separate model catalogs from true governance platforms, the technical implementation patterns that actually scale, the metadata frameworks that enable meaningful oversight, and the integration points with MLOps pipelines and compliance frameworks. Whether you're managing a handful of research models or hundreds of production AI systems, this article will give you the practical knowledge to govern your machine learning landscape before it governs you.
Understanding AI Model Registries: Beyond Simple Catalogs
Let me start by addressing the most dangerous misconception I encounter: treating an AI model registry as just a spreadsheet listing your models. I've reviewed dozens of "registries" that were nothing more than SharePoint lists or Excel files maintained by well-meaning data scientists. These artifacts provide zero governance, zero control, and zero protection when regulators or litigators come knocking.
A true AI model registry is a comprehensive governance platform that provides complete visibility into your machine learning landscape, enforces controls throughout the model lifecycle, enables auditability and compliance, and integrates with development, deployment, and monitoring infrastructure.
The Core Components of Effective Model Registries
Through hundreds of implementations across regulated industries, I've identified eight fundamental components that must work together for meaningful AI governance:
Component | Purpose | Key Capabilities | Common Failure Points |
|---|---|---|---|
Model Inventory | Complete catalog of all ML models | Automatic discovery, metadata capture, version tracking, lineage documentation | Manual registration only, stale data, missing shadow models, incomplete metadata |
Lifecycle Management | Track models through development to retirement | Stage gates, approval workflows, deployment tracking, retirement procedures | Informal processes, missing stage transitions, uncontrolled production deployment |
Access Control | Govern who can register, modify, deploy models | Role-based permissions, approval authorities, audit logging | Everyone has admin rights, no separation of duties, missing audit trails |
Version Control | Track model iterations and changes | Version numbering, change documentation, rollback capability, A/B test tracking | Overwriting models, lost history, unclear current version, deployment confusion |
Metadata Management | Document model characteristics and context | Training data sources, feature definitions, performance metrics, business context | Minimal documentation, missing context, no data lineage, unclear business use |
Compliance Tracking | Monitor regulatory and policy adherence | Risk classification, validation status, approval evidence, fairness metrics | Generic risk ratings, missing validations, undocumented approvals, ignored bias testing |
Integration | Connect to MLOps tooling and infrastructure | API access, CI/CD hooks, monitoring integration, deployment automation | Standalone system, manual updates, disconnected from actual deployment, stale data |
Reporting & Analytics | Visibility into model portfolio | Dashboards, compliance reports, risk summaries, portfolio analytics | Static reports, no real-time visibility, executive blindness, unclear risk exposure |
When MediTech Solutions rebuilt their AI governance after the lawsuit, we focused obsessively on these eight components. The transformation was remarkable—18 months later, when they faced an FDA inspection of their clinical decision support models, they produced complete documentation for all 47 regulated models within 4 hours. The FDA inspector called it "the most comprehensive model governance I've seen in healthcare AI."
The Business and Regulatory Case for Model Registries
I've learned to lead with both risk reduction and business enablement, because that's what gets executive attention and budget approval. The numbers speak clearly:
Average Cost of AI Governance Failures:
Failure Type | Average Cost | Frequency (per year) | Annual Risk Exposure | Example Incidents |
|---|---|---|---|---|
Regulatory Penalties | $2.4M - $18M | 0.15 - 0.3 | $360K - $5.4M | FTC settlements, GDPR fines, industry-specific sanctions |
Litigation Settlements | $5M - $50M | 0.05 - 0.15 | $250K - $7.5M | Bias lawsuits, data misuse claims, IP disputes |
Customer Loss | $8M - $120M | 0.2 - 0.5 | $1.6M - $60M | Contract terminations, trust erosion, competitive switching |
Remediation Costs | $1M - $15M | 0.3 - 0.8 | $300K - $12M | Emergency fixes, model retraining, infrastructure overhaul |
Operational Incidents | $500K - $8M | 0.5 - 2.0 | $250K - $16M | Wrong model deployed, data pipeline failures, undiscovered drift |
Reputation Damage | $10M - $100M+ | 0.1 - 0.2 | $1M - $20M | Media coverage, brand degradation, recruitment challenges |
These aren't theoretical numbers—they're drawn from actual incidents I've investigated and industry research from Gartner, Forrester, and NIST. And they only capture direct costs. The indirect costs—delayed product launches, competitive disadvantage, innovation paralysis from risk aversion—often exceed direct losses by 2-4x.
Compare those governance failure costs to model registry investment:
Typical Model Registry Implementation Costs:
Organization Size | Initial Implementation | Annual Maintenance | ROI After First Major Incident Avoided |
|---|---|---|---|
Small (10-50 models) | $120,000 - $280,000 | $45,000 - $90,000 | 2,100% - 8,500% |
Medium (50-200 models) | $380,000 - $850,000 | $140,000 - $280,000 | 3,800% - 14,200% |
Large (200-1,000 models) | $1.2M - $3.2M | $480,000 - $920,000 | 6,200% - 18,700% |
Enterprise (1,000+ models) | $3.8M - $12M | $1.4M - $3.8M | 8,900% - 24,300% |
That ROI calculation assumes preventing a single major incident. In reality, mature registries prevent multiple smaller incidents monthly while also enabling faster model deployment, better compliance, and improved model performance through systematic governance.
The AI Governance Landscape: Regulatory Pressure is Mounting
The regulatory environment for AI is evolving rapidly. What was optional best practice 24 months ago is becoming mandatory compliance in many jurisdictions:
Current and Emerging AI Regulations:
Jurisdiction/Framework | Status | Key Requirements | Enforcement Timeline | Penalties |
|---|---|---|---|---|
EU AI Act | Enacted (2024) | Risk classification, documentation, human oversight, conformity assessment | Phased 2024-2027 | Up to €35M or 7% global revenue |
US Executive Order 14110 | Active (2023) | Safety testing, red-teaming, model cards, risk management | Immediate for federal agencies | Agency-specific consequences |
NIST AI Risk Management Framework | Guidance (2023) | Governance, mapping, measuring, managing AI risks | Voluntary (often contractually required) | Contractual/reputational |
California AB 2013 | Enacted (2024) | Automated decision system documentation, impact assessments | 2025 enforcement | Civil penalties up to $10K per violation |
NYC Local Law 144 | Active (2023) | Bias audits for automated employment decision tools | Immediate | Civil penalties up to $1,500 per violation |
GDPR (AI provisions) | Active (2018+) | Automated decision explanation, data minimization, processing records | Immediate | Up to €20M or 4% global revenue |
Industry-Specific | Varies | FDA (medical devices), FINRA (trading), OCC (credit), FTC (consumer protection) | Varies by industry | Industry-specific sanctions |
At MediTech, their lack of model registry meant they couldn't demonstrate compliance with HIPAA's requirement for documentation of automated decision systems affecting patient care. When HHS audited them post-lawsuit, they received findings for "inadequate administrative safeguards" and "insufficient accountability mechanisms"—$4.1 million in fines that could have been avoided with proper model governance.
"The EU AI Act fundamentally changed our calculus. We went from viewing model registries as 'nice to have' to 'business critical' literally overnight. Non-compliance isn't an option when you're facing 7% of global revenue in potential penalties." — Chief Risk Officer, European FinTech
Phase 1: Model Discovery and Inventory—Finding What You Don't Know You Have
The model inventory is the foundation of your registry. You cannot govern what you cannot see. Yet most organizations have significant "shadow AI"—models deployed by well-meaning data scientists, inherited from acquisitions, embedded in vendor solutions, or simply forgotten.
Conducting Comprehensive Model Discovery
Here's my systematic approach to finding all AI/ML models in your environment:
Step 1: Define What Constitutes a "Model"
Not every algorithm is a model requiring governance. I use this classification framework:
Category | Description | Governance Requirement | Examples |
|---|---|---|---|
Production ML Models | Models serving real-time or batch predictions in production systems | Full registry with complete metadata | Credit scoring, fraud detection, recommendation engines, clinical decision support |
Pre-Production Models | Models in development or staging environments | Lightweight registry tracking development | Models in A/B testing, candidate models, research prototypes approaching deployment |
Research/Experimental | Early-stage research with no deployment path | Minimal tracking (existence only) | Academic research, proof-of-concepts, abandoned experiments |
Vendor/Third-Party Models | Models embedded in purchased software/services | Vendor accountability tracking | SaaS AI features, purchased model APIs, embedded vendor algorithms |
Traditional Algorithms | Deterministic, rule-based algorithms without learning | Exclude from ML registry (track in code repo) | Sorting algorithms, encryption, business rules engines |
At MediTech, we initially tried to register every statistical calculation—chaos. We refined to focus on production and pre-production ML models, which reduced scope from "thousands" to 347 discoverable models.
Step 2: Technical Discovery Methods
I use multiple discovery techniques because no single method catches everything:
Infrastructure Scanning:
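The exact tooling varies by environment, so treat the sketch below as one illustration of the discovery approaches rather than a definitive scanner: it uses the official Kubernetes Python client to list running containers and flag images whose names suggest model-serving workloads. The keyword list and label handling are assumptions you would tune to your own naming conventions.

```python
# Illustrative sketch: flag Kubernetes workloads that look like model-serving
# containers. Assumes cluster access via a local kubeconfig; the image-name
# keywords are heuristics, not a definitive discovery method.
from kubernetes import client, config

MODEL_HINTS = ("mlflow", "torchserve", "triton", "sagemaker", "sklearn",
               "xgboost", "predict", "inference", "model")

def find_candidate_model_services():
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    pods = client.CoreV1Api().list_pod_for_all_namespaces(watch=False)
    candidates = []
    for pod in pods.items:
        for container in pod.spec.containers:
            image = container.image.lower()
            if any(hint in image for hint in MODEL_HINTS):
                candidates.append({
                    "namespace": pod.metadata.namespace,
                    "pod": pod.metadata.name,
                    "image": container.image,
                    "labels": pod.metadata.labels or {},
                })
    return candidates

if __name__ == "__main__":
    for svc in find_candidate_model_services():
        print(f"{svc['namespace']}/{svc['pod']} -> {svc['image']}")
```

Cross-referencing candidates like these against the registry's known endpoints is what surfaces shadow models.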
At MediTech, infrastructure scanning found 89 models running in production that nobody had documented. They were containerized services deployed by various teams over two years, completely outside formal processes.
Step 3: Organizational Discovery
Technical scanning misses models deployed in ways you didn't anticipate. I supplement with organizational discovery:
Discovery Method | Process | Typical Findings | Time Investment |
|---|---|---|---|
Data Science Team Interviews | Structured discussions with each DS team | In-development models, planned deployments, technical debt models | 2-4 hours per team |
Product Team Surveys | Questionnaires to product managers about AI features | Customer-facing models, vendor models, shadow AI | 30 minutes per team |
Engineering Audits | Infrastructure reviews with platform teams | Deployment patterns, unlabeled services, resource usage anomalies | 4-8 hours total |
Vendor Inventory | Review all vendor contracts for embedded AI | Third-party model dependencies, SaaS AI features | 2-3 hours total |
Acquisition Integration Reviews | Audit systems inherited from M&A activity | Legacy models, undocumented systems, technical debt | 4-6 hours per acquisition |
MediTech's data science team interviews revealed 34 models they "thought" were in production but couldn't confirm. Further investigation found 18 actually were deployed, 12 had been retired but not removed, and 4 had never made it to production despite being registered as "live" in their informal tracking.
Step 4: Create Initial Inventory
From discovery activities, I create a baseline inventory with minimum viable metadata:
Field | Purpose | Source | Required? |
|---|---|---|---|
Model ID | Unique identifier | Generated or existing ID | Yes |
Model Name | Human-readable name | Owner documentation | Yes |
Description | What the model does | Owner documentation | Yes |
Owner | Responsible party | Team/individual assignment | Yes |
Status | Current lifecycle stage | Deployment status | Yes |
Deployment Location | Where model runs | Infrastructure discovery | Yes |
Business Use Case | Why model exists | Product/business context | Yes |
Creation Date | When model was built | Git history, file timestamps | If available |
Last Updated | Most recent modification | Deployment logs, file timestamps | If available |
Risk Level | Preliminary risk assessment | Initial classification | If possible |
At MediTech, our initial inventory captured 347 models with basic metadata. This became the foundation for deeper documentation and governance.
"The model discovery process was humbling. We thought we had maybe 120 models. We found 347. The gap between our perception and reality was the gap that nearly destroyed us." — MediTech Chief Data Scientist
Classifying Models by Risk and Impact
Not all models carry equal risk. I implement risk-based governance where high-risk models receive intensive oversight while low-risk models have streamlined processes:
AI Model Risk Classification Framework:
Risk Tier | Definition | Examples | Governance Intensity |
|---|---|---|---|
Critical (Tier 1) | Affects health, safety, legal rights, or creates significant financial/reputational risk | Clinical decision support, credit decisioning, employment screening, autonomous vehicle control, trading algorithms | Extensive documentation, formal validation, executive approval, ongoing monitoring, quarterly reviews |
High (Tier 2) | Significant business impact or moderate regulatory implications | Dynamic pricing, fraud detection, recommendation systems affecting revenue, customer churn prediction | Standard documentation, technical review, management approval, regular monitoring, semi-annual reviews |
Medium (Tier 3) | Operational models with limited direct impact | Content categorization, internal process optimization, marketing attribution, inventory forecasting | Basic documentation, peer review, team lead approval, basic monitoring, annual reviews |
Low (Tier 4) | Research, development, or minimal-impact applications | A/B test variants, research prototypes, internal tools, data quality checks | Minimal documentation, registration only, self-certification, existence tracking |
Risk classification drives governance requirements:
Risk-Based Governance Requirements:
Requirement | Tier 1 (Critical) | Tier 2 (High) | Tier 3 (Medium) | Tier 4 (Low) |
|---|---|---|---|---|
Documentation Depth | Complete model card, full lineage, bias analysis | Standard model card, basic lineage | Basic metadata, purpose statement | Name, owner, purpose |
Pre-Deployment Review | Ethics board, legal, compliance, executive | Technical review, risk assessment | Peer review | Self-certification |
Approval Authority | C-suite or designated executive | VP/Director level | Team lead | Individual contributor |
Performance Monitoring | Real-time dashboards, automated alerting | Daily batch metrics, weekly reviews | Weekly/monthly metrics | Optional |
Bias/Fairness Testing | Continuous monitoring, quarterly audits | Pre-deployment + annual | Pre-deployment only | Not required |
Validation Frequency | Quarterly | Semi-annual | Annual | Not required |
Incident Response SLA | < 4 hours | < 24 hours | < 72 hours | Best effort |
Retirement Approval | Formal review process | Manager approval | Team lead approval | Individual decision |
At MediTech, we classified their 347 models:
Tier 1 (Critical): 47 models affecting patient care, treatment recommendations, insurance coverage decisions
Tier 2 (High): 89 models involving pricing, provider network optimization, claims processing
Tier 3 (Medium): 143 models for operational analytics, reporting, internal forecasting
Tier 4 (Low): 68 models in research/development or minimal-impact applications
This classification allowed us to focus intensive governance on the 47 critical models while maintaining appropriate oversight of the broader portfolio without creating unsustainable process burden.
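Teams apply tiers more consistently when the screening questions are codified rather than argued case by case. The sketch below is illustrative only, assuming a handful of yes/no questions; a real assessment would be reviewed by risk or compliance staff, not auto-assigned.

```python
# Illustrative tiering helper: maps screening answers onto the four risk tiers
# described above. The questions and their ordering are assumptions.
from dataclasses import dataclass

@dataclass
class ModelScreening:
    affects_health_safety_or_legal_rights: bool
    significant_financial_or_reputational_risk: bool
    customer_facing_or_revenue_affecting: bool
    production_deployment_planned: bool

def assign_risk_tier(s: ModelScreening) -> str:
    if s.affects_health_safety_or_legal_rights or s.significant_financial_or_reputational_risk:
        return "tier_1_critical"
    if s.customer_facing_or_revenue_affecting:
        return "tier_2_high"
    if s.production_deployment_planned:
        return "tier_3_medium"
    return "tier_4_low"

# Example: a clinical decision support model lands in Tier 1.
print(assign_risk_tier(ModelScreening(True, False, True, True)))  # tier_1_critical
```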
Establishing Baseline Metadata Standards
Metadata is the lifeblood of model registries. I've seen registries fail because they captured too little metadata (no governance value) or too much (nobody maintains it). The key is finding the right balance:
Core Metadata Framework:
Metadata Category | Required Fields | Optional Fields | Update Frequency |
|---|---|---|---|
Identity | Model ID, Model Name, Version, Description | Aliases, Tags, Related Models | At registration + changes |
Ownership | Model Owner, Owner Team, Business Owner | Technical Lead, Stakeholders | Monthly verification |
Lifecycle | Status, Deployment Stage, Creation Date, Last Modified | Planned Retirement, Usage Stats | Real-time (automated) |
Technical | Framework, Model Type, Input Schema, Output Schema | Training Duration, Compute Requirements, Dependencies | At version change |
Data | Training Data Sources, Feature List | Data Lineage, Preprocessing Steps, Data Freshness Requirements | At retraining |
Performance | Primary Metric, Baseline Performance, Current Performance | Fairness Metrics, Business KPIs, Degradation Thresholds | Continuous (automated) |
Risk & Compliance | Risk Tier, Regulatory Classification, Approval Status | Known Limitations, Mitigation Controls, Audit History | At review cycles |
Documentation | Model Card URL, README location | Research Papers, Technical Specs, User Guides | At major updates |
At MediTech, we implemented a phased metadata approach:
Phase 1 (Months 1-3): Core metadata only (Identity, Ownership, Lifecycle, Risk)
Phase 2 (Months 4-6): Technical and Data metadata for Tier 1-2 models
Phase 3 (Months 7-12): Performance and full compliance metadata for all models
This phased approach prevented overwhelming teams with documentation requirements while ensuring critical information was captured quickly.
Phase 2: Technical Implementation—Building the Registry Infrastructure
With your model inventory complete, you need technical infrastructure to manage it. The question isn't whether to build or buy—it's understanding the tradeoffs and implementation patterns that actually scale.
Build vs. Buy vs. Hybrid Decision Framework
I evaluate registry implementation options through this lens:
Approach | Best For | Advantages | Disadvantages | Typical Cost |
|---|---|---|---|---|
Commercial Platform | Organizations needing rapid deployment, limited ML engineering capacity | Fast time-to-value, vendor support, regular updates, proven at scale | Licensing costs, vendor lock-in, limited customization, may not fit unique workflows | $180K - $850K annually |
Open Source Platform | Organizations with ML engineering capacity, need for customization | No licensing costs, full customization, community support, transparent codebase | Self-support burden, integration complexity, ongoing maintenance, feature gaps | $240K - $680K in labor annually |
Custom Build | Highly unique requirements, existing registry investment, extreme customization needs | Perfect fit to workflows, full control, no vendor dependency | Highest development cost, ongoing maintenance burden, feature parity challenges | $800K - $2.4M initial + $340K+ annually |
Hybrid | Most organizations (leverage commercial core with custom extensions) | Balance of speed and flexibility, best-of-breed integration | Integration complexity, multiple vendor relationships | $280K - $920K annually total |
Leading Commercial Platforms:
Platform | Strengths | Weaknesses | Best Fit |
|---|---|---|---|
MLflow Model Registry | Open core with enterprise option, strong versioning, wide adoption | Limited governance features in open version, basic compliance tracking | Organizations already using MLflow for experiment tracking |
Domino Model Monitor | Enterprise-grade governance, strong compliance features, excellent integration | High cost, complex setup, may be overkill for smaller deployments | Highly regulated industries, large model portfolios |
Databricks Unity Catalog | Tight integration with Databricks, unified data/model governance | Requires Databricks platform, limited for non-Databricks models | Organizations standardized on Databricks |
AWS SageMaker Model Registry | Seamless AWS integration, automatic metadata capture, low friction | AWS lock-in, limited cross-cloud support, basic governance features | AWS-centric organizations |
Azure ML Model Registry | Azure integration, enterprise identity/access, strong Microsoft ecosystem | Azure lock-in, limited flexibility, newer platform | Microsoft-centric organizations |
Google Vertex AI Model Registry | GCP integration, strong AutoML support, model monitoring | GCP lock-in, enterprise features lag competitors | GCP-centric organizations, heavy AutoML users |
At MediTech, we chose a hybrid approach: MLflow open source as the core registry with custom-built governance layer, compliance tracking, and integration with their existing JIRA-based approval workflows. Total cost: $420,000 for initial implementation plus $180,000 annually in maintenance.
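As an illustration of what the "MLflow core plus custom governance layer" pattern looks like in practice, the sketch below registers a trained model and attaches governance metadata as registry tags. The tag names, model name, and run ID are assumptions for illustration, not MediTech's actual schema.

```python
# Illustrative sketch: register a model version in MLflow and attach the
# governance fields a custom approval layer would later read. Assumes an
# MLflow tracking server is configured and a run has already logged a model.
import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "fraud-detection"   # hypothetical registered-model name
RUN_ID = "abc123"                # hypothetical ID of the run that logged the model

result = mlflow.register_model(model_uri=f"runs:/{RUN_ID}/model", name=MODEL_NAME)

client = MlflowClient()
# Governance metadata carried as tags so the custom layer can enforce gates.
client.set_model_version_tag(MODEL_NAME, result.version, "risk_tier", "tier_1_critical")
client.set_model_version_tag(MODEL_NAME, result.version, "owner_team", "fraud-detection-squad")
client.set_model_version_tag(MODEL_NAME, result.version, "approval_status", "pending_review")
client.set_registered_model_tag(MODEL_NAME, "business_owner", "vp-risk-management")
```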
Core Technical Architecture Patterns
Regardless of build/buy decision, successful registries share common architectural patterns:
Reference Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ Data Science IDE │ Web Portal │ CLI │ APIs │ Dashboards│
└─────────────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Registry Core Services │
│ Model CRUD │ Version Mgmt │ Metadata Mgmt │ Search/Query │
└─────────────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Governance & Control Layer │
│ Access Control │ Approval Workflows │ Compliance Tracking │
└─────────────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Integration Layer │
│ MLOps Pipeline │ CI/CD │ Monitoring │ Feature Stores │
└─────────────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Storage Layer │
│ Model Artifacts │ Metadata DB │ Audit Logs │ Lineage │
└─────────────────────────────────────────────────────────────────┘
Key Architectural Decisions:
Decision Point | Option A | Option B | Recommendation |
|---|---|---|---|
API Design | REST | GraphQL | REST for simplicity, GraphQL if complex queries needed |
Metadata Storage | Relational DB (Postgres) | NoSQL (MongoDB) | Relational for governance/compliance (ACID, complex queries) |
Model Artifact Storage | Object storage (S3) | Specialized model store | Object storage for cost/scale, with metadata in registry |
Authentication | Built-in auth | Enterprise SSO/SAML | Enterprise SSO for integration with existing IAM |
Version Control | Semantic versioning | Timestamp-based | Semantic versioning for clarity (major.minor.patch) |
Search | Database queries | Elasticsearch | Elasticsearch for large portfolios (>500 models) |
Audit Logging | Database table | Dedicated log system | Dedicated system for compliance/immutability |
Metadata Schema Design
The metadata schema is your registry's data model. I design schemas that balance comprehensiveness with maintainability:
Example Metadata Schema (Simplified):
{
"model_id": "fraud-detection-v3.2.1",
"model_name": "Transaction Fraud Detection Model",
"version": "3.2.1",
"status": "production",
"risk_tier": "tier_1_critical",
"ownership": {
"owner_email": "[email protected]",
"owner_team": "fraud-detection-squad",
"business_owner": "vp-risk-management",
"stakeholders": ["fraud-ops", "customer-support", "legal"]
},
"lifecycle": {
"created_date": "2024-08-15T10:30:00Z",
"deployed_date": "2024-09-01T14:20:00Z",
"last_trained": "2024-11-15T08:15:00Z",
"planned_retirement": null,
"stage": "production"
},
"technical": {
"framework": "scikit-learn",
"model_type": "random_forest_classifier",
"input_schema": "s3://schemas/fraud-detection-input-v3.json",
"output_schema": "s3://schemas/fraud-detection-output-v3.json",
"dependencies": ["pandas==1.5.3", "scikit-learn==1.2.2", "numpy==1.24.2"],
"compute_requirements": {"cpu": 2, "memory_gb": 8, "gpu": false}
},
"training_data": {
"primary_dataset": "transactions_2023_2024",
"dataset_version": "v2.4",
"training_period": "2023-01-01 to 2024-08-01",
"record_count": 12400000,
"feature_count": 87,
"label_distribution": {"fraud": 0.023, "legitimate": 0.977},
"data_lineage_url": "https://lineage.meditech.com/datasets/trans-2023-2024"
},
"performance": {
"primary_metric": "f1_score",
"baseline_performance": {"f1_score": 0.87, "precision": 0.89, "recall": 0.85},
"current_performance": {"f1_score": 0.86, "precision": 0.88, "recall": 0.84},
"degradation_threshold": 0.05,
"last_validation": "2024-11-20T00:00:00Z",
"fairness_metrics": {
"demographic_parity": 0.92,
"equalized_odds": 0.88,
"protected_attributes": ["age_group", "geographic_region"]
}
},
"governance": {
"approval_status": "approved",
"approved_by": "[email protected]",
"approval_date": "2024-08-28T16:45:00Z",
"compliance_frameworks": ["PCI-DSS", "SOC2", "GDPR"],
"risk_assessment_url": "https://compliance.meditech.com/ra/fraud-det-v3",
"known_limitations": [
"Performance degrades for transaction amounts > $50K",
"Lower accuracy for newly onboarded merchants (< 30 days)"
],
"mitigation_controls": [
"Manual review for high-value transactions",
"Enhanced monitoring for new merchant transactions"
]
},
"documentation": {
"model_card_url": "https://docs.meditech.com/models/fraud-detection-v3.2.1",
"technical_spec_url": "https://docs.meditech.com/specs/fraud-detection-v3",
"user_guide_url": "https://docs.meditech.com/guides/fraud-detection"
},
"deployment": {
"endpoints": [
{"environment": "production", "url": "https://api.meditech.com/v1/fraud/predict"},
{"environment": "staging", "url": "https://staging-api.meditech.com/v1/fraud/predict"}
],
"serving_infrastructure": "kubernetes",
"namespace": "fraud-detection-prod",
"replicas": 4,
"request_rate": "~850 req/sec"
}
}
This schema balances detail with practicality. I've seen schemas with 200+ fields that nobody maintains—better to capture 40 fields consistently than 200 fields sporadically.
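One way to keep a schema like this from decaying is to validate completeness at registration time and reject records that miss required fields. A minimal sketch, assuming the required fields listed above and a stricter rule for Tier 1 models:

```python
# Illustrative completeness check run at registration time. The required-field
# list mirrors the core metadata categories above; extend it per risk tier.
REQUIRED_FIELDS = {
    "model_id", "model_name", "version", "status", "risk_tier",
    "ownership", "lifecycle", "technical", "training_data", "governance",
}

def validate_registration(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - metadata.keys())]
    if metadata.get("risk_tier") == "tier_1_critical":
        fairness = metadata.get("performance", {}).get("fairness_metrics")
        if not fairness:
            problems.append("tier 1 models must include fairness_metrics")
    return problems
```

Wiring a check like this into the registration API turns metadata quality into a deployment gate rather than a cleanup exercise.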
Integration with MLOps Pipelines
The registry must integrate with your existing MLOps infrastructure to avoid becoming a parallel system that falls out of sync:
Critical Integration Points:
Integration | Purpose | Implementation Pattern | Sync Frequency |
|---|---|---|---|
Training Pipeline | Auto-register models after training | Pipeline hook calls registry API after successful training | Each training run |
CI/CD Pipeline | Enforce governance before deployment | Pre-deployment check queries registry for approval status | Each deployment |
Model Serving | Ensure deployed model matches registry | Serving platform queries registry for model artifacts/config | Each model load |
Monitoring System | Update performance metrics in registry | Monitoring system pushes metrics to registry API | Hourly/Daily |
Feature Store | Link models to feature definitions | Registry references feature store schemas | At registration |
Experiment Tracking | Promote experiments to registry when productionized | Manual or automated promotion workflow | As needed |
Data Lineage | Track data used in model training | Registry captures lineage metadata at training time | Each training run |
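Of these, the CI/CD gate is the one that most changes day-to-day behavior, because an unapproved model simply cannot ship. Here is a minimal sketch of a pre-deployment check a pipeline could run; the registry URL and response fields are hypothetical placeholders for whatever API your registry exposes.

```python
# Illustrative CI/CD deployment gate: the pipeline fails unless the registry
# shows the exact model version as approved and not degraded. The endpoint
# and JSON fields are hypothetical.
import sys
import requests

REGISTRY_URL = "https://registry.example.com/api/v1"  # hypothetical

def check_deployment_allowed(model_id: str, version: str) -> bool:
    resp = requests.get(f"{REGISTRY_URL}/models/{model_id}/versions/{version}", timeout=10)
    resp.raise_for_status()
    record = resp.json()
    approved = record.get("governance", {}).get("approval_status") == "approved"
    not_degraded = record.get("status") != "performance_degraded"
    return approved and not_degraded

if __name__ == "__main__":
    model_id, version = sys.argv[1], sys.argv[2]
    if not check_deployment_allowed(model_id, version):
        print(f"Deployment blocked: {model_id} v{version} is not approved in the registry")
        sys.exit(1)
```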
At MediTech, we implemented these integrations over 6 months:
Month 1-2: Manual registration (baseline)
Month 3: Training pipeline integration (auto-registration after training)
Month 4: CI/CD integration (deployment gates based on registry approval)
Month 5: Monitoring integration (performance metrics flowing to registry)
Month 6: Feature store integration (linking models to feature definitions)
The transformation from manual to automated governance was dramatic. Pre-integration, registry accuracy was 73% (models in production that weren't registered, metadata out of sync). Post-integration: 98.7% accuracy.
"Once we automated registry integration, it stopped being a chore and became part of our workflow. Models that aren't in the registry literally can't be deployed. That forcing function changed our culture." — MediTech VP Engineering
Phase 3: Governance Workflows and Lifecycle Management
Technology alone doesn't create governance—you need processes that guide models from development through retirement. I design workflows that balance control with velocity, preventing governance from becoming an innovation bottleneck.
Model Lifecycle Stage Gates
Every model progresses through defined stages with clear entry/exit criteria:
Model Lifecycle Stages:
Stage | Definition | Entry Criteria | Exit Criteria | Typical Duration |
|---|---|---|---|---|
Development | Model creation and experimentation | Concept approval, resource allocation | Acceptable performance achieved on validation set | 2-8 weeks |
Validation | Independent testing and documentation | Development complete, initial metrics acceptable | Validation metrics meet targets, documentation complete | 1-3 weeks |
Approval | Governance review and risk assessment | Validation passed, documentation submitted | Risk assessment approved, deployment authorized | 1-2 weeks (Tier 3-4)<br>2-4 weeks (Tier 1-2) |
Staging | Pre-production testing in production-like environment | Approval granted, staging environment ready | Staging performance validated, no blocking issues | 1-2 weeks |
Production | Live serving of predictions | Staging validated, change management approved | Model retired or replaced | Months to years |
Monitoring | Ongoing performance and drift tracking | Production deployment | Performance degradation or scheduled review | Continuous |
Retired | Model decommissioned but archived | Replacement deployed or business need eliminated | Archive period complete | 1-3 years archive retention |
Stage Gate Approval Requirements by Risk Tier:
Stage Gate | Tier 1 (Critical) | Tier 2 (High) | Tier 3 (Medium) | Tier 4 (Low) |
|---|---|---|---|---|
Development → Validation | Technical lead review | Peer review | Self-assessment | None |
Validation → Approval | Model validation team, bias audit | Technical review, basic fairness check | Peer review | Self-certification |
Approval → Staging | Risk committee, legal review, executive approval | Manager approval, compliance check | Team lead approval | Self-approval |
Staging → Production | Change advisory board, executive sign-off | Change management approval | Team lead approval | Self-approval |
Production → Retired | Formal sunset process, data retention review | Manager approval, runbook documentation | Team lead approval | Individual decision |
At MediTech, we implemented differentiated workflows based on risk tier. Their 47 Tier 1 models went through rigorous 4-6 week approval processes including ethics review, legal assessment, and executive sign-off. Their 143 Tier 3 models had streamlined 3-5 day peer review processes. This balance maintained governance without crushing velocity.
Approval Workflows and Delegation
I design approval workflows that scale by delegating authority appropriately while maintaining oversight:
Approval Authority Matrix:
Decision Type | Tier 1 (Critical) | Tier 2 (High) | Tier 3 (Medium) | Tier 4 (Low) |
|---|---|---|---|---|
New Model Deployment | Chief Risk Officer or delegate | VP/Director | Senior Manager | Team Lead |
Model Retraining (no architecture change) | Director | Senior Manager | Team Lead | Individual Contributor |
Model Update (architecture change) | Chief Risk Officer or delegate | Director | Senior Manager | Team Lead |
Model Retirement | Director | Senior Manager | Team Lead | Individual Contributor |
Emergency Rollback | On-call director (any) | On-call manager | Team Lead | Individual Contributor |
Performance Threshold Changes | Director | Senior Manager | Team Lead | Individual Contributor |
Approval Workflow Implementation:
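A minimal sketch of how that routing logic might look, assuming the approval authority matrix above; the ticket-creation call is a placeholder for whatever workflow tool you use (JIRA in MediTech's case).

```python
# Illustrative approval routing: derive the required approvers for a model
# change from its risk tier, per the authority matrix above. create_ticket()
# is a placeholder for your workflow tool's API (JIRA, ServiceNow, etc.).
REQUIRED_APPROVERS = {
    "tier_1_critical": ["chief_risk_officer", "legal", "compliance"],
    "tier_2_high":     ["vp_or_director", "risk_reviewer"],
    "tier_3_medium":   ["team_lead"],
    "tier_4_low":      [],  # self-certification
}

def create_ticket(role: str, model_id: str, version: str) -> None:
    # Placeholder: in practice this calls the workflow tool's REST API.
    print(f"Opened approval ticket for {role}: {model_id} v{version}")

def route_for_approval(model_id: str, version: str, risk_tier: str) -> bool:
    """Open one ticket per required approver; return True if approvals are needed."""
    approvers = REQUIRED_APPROVERS.get(risk_tier, ["team_lead"])  # fail safe, not open
    for role in approvers:
        create_ticket(role, model_id, version)
    return bool(approvers)
```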
At MediTech, we implemented these workflows in JIRA (existing tool) with custom automation. When a data scientist marked a model as "ready for approval" in the registry, JIRA tickets were automatically created for each required approver based on risk tier. SLA timers tracked approval latency, and escalations triggered if approvals stalled.
Results after 12 months:
Average approval time for Tier 1 models: 18 days (down from 34 days pre-automation)
Average approval time for Tier 2 models: 6 days (down from 12 days)
Approval SLA miss rate: 8% (down from 34%)
Models blocked at approval stage: 12% (up from 3%—better governance actually working)
Version Control and Model Lineage
Model versioning is critical for reproducibility, rollback capability, and compliance. I implement semantic versioning with clear lineage tracking:
Semantic Versioning for Models:
Major version (X.0.0): Architecture changes, new features, different training data, breaking API changes
Minor version (0.X.0): Retraining on updated data, hyperparameter tuning, non-breaking improvements
Patch version (0.0.X): Bug fixes, performance optimizations, documentation updates
Version Lineage Tracking:
Lineage Element | Captured Information | Storage Method | Use Case |
|---|---|---|---|
Training Data Lineage | Dataset versions, data sources, transformations, sampling | Reference to data catalog/feature store | Reproduce training, debug bias, comply with data regulations |
Code Lineage | Git commit hash, training script version, preprocessing code | Git references | Reproduce training, debug issues, audit methodology |
Dependency Lineage | Framework versions, library versions, system dependencies | requirements.txt, conda environment | Reproduce environment, debug compatibility |
Hyperparameter Lineage | All training hyperparameters, tuning history | Experiment tracking system | Reproduce results, optimize future training |
Ancestor Models | Parent model (if transfer learning/fine-tuning) | Model registry references | Understand evolution, track incremental improvements |
Evaluation Data | Test/validation datasets used for metrics | Dataset references | Reproduce evaluation, validate claims |
At MediTech, comprehensive lineage tracking proved invaluable during the lawsuit investigation. We could trace the disputed model back to:
Exact training dataset (version 2.3.1 of patient encounters 2018-2020)
Git commit of training code (commit sha: a3f7b92)
Specific data preprocessing that introduced bias (incorrect encoding of demographic fields)
Hyperparameters used (including problematic class weighting)
Validation dataset that failed to detect the bias (non-representative test set)
This lineage allowed us to identify the root cause, demonstrate it wasn't intentional discrimination (incompetence, not malice—legally significant), and show exactly when and how it could have been caught.
Phase 4: Compliance Integration and Regulatory Alignment
Model registries aren't built in a vacuum—they must satisfy regulatory requirements and integrate with broader governance frameworks. I design registries that serve as the foundation for demonstrating AI compliance.
Mapping Registry Capabilities to Regulatory Requirements
Different regulations emphasize different aspects of model governance. Your registry should capture evidence for all applicable frameworks:
Regulatory Requirements Mapping:
Regulation/Framework | Specific Requirements | Registry Evidence | Audit Focus |
|---|---|---|---|
EU AI Act | Risk classification, technical documentation, conformity assessment, human oversight | Risk tier, model cards, approval records, monitoring dashboards | Classification accuracy, documentation completeness, conformity evidence |
GDPR | Automated decision explanation, data minimization, processing records, data protection impact assessment | Explainability methods, training data sources, DPIA references, consent tracking | Data lineage, explanation capability, lawful basis |
NIST AI RMF | Govern, Map, Measure, Manage functions across AI lifecycle | Governance workflows, risk assessments, performance metrics, incident response | Framework implementation, continuous improvement evidence |
Model Risk Management (SR 11-7) | Model validation, ongoing monitoring, effective challenge, documentation | Validation records, performance tracking, independent review, comprehensive docs | Validation quality, monitoring rigor, documentation depth |
Fair Credit Reporting Act | Accuracy, explainability, adverse action notices, dispute resolution | Model performance, feature importance, decision logic, audit trails | Accuracy metrics, explainability evidence, adverse action tracking |
NYC Local Law 144 | Bias audit for automated employment decisions, notice requirements | Fairness metrics, bias audit reports, deployment documentation | Bias audit quality, demographic analysis, public notice |
Medical Device Regulations | Safety, effectiveness, risk management, clinical validation | Performance metrics, risk assessments, validation studies, monitoring data | Clinical validation, safety evidence, post-market surveillance |
Example: EU AI Act Compliance Through Registry
The EU AI Act requires extensive documentation for "high-risk" AI systems. Here's how a well-designed registry satisfies these requirements:
EU AI Act Requirement | Article | Registry Implementation |
|---|---|---|
Risk Management System | Article 9 | Risk tier classification, risk assessment documentation, mitigation controls |
Data Governance | Article 10 | Training data sources, data quality metrics, bias analysis, preprocessing documentation |
Technical Documentation | Article 11 | Model cards, technical specifications, architecture diagrams, validation reports |
Record-Keeping | Article 12 | Automatic logging, audit trails, prediction logging, version history |
Transparency | Article 13 | Model cards, explainability documentation, user-facing documentation |
Human Oversight | Article 14 | Human-in-loop configurations, override mechanisms, monitoring dashboards |
Accuracy, Robustness, Security | Article 15 | Performance metrics, robustness testing, security controls, monitoring thresholds |
At MediTech, when they expanded to European markets, their registry became the foundation for EU AI Act compliance. They created an "AI Act Compliance Dashboard" pulling data directly from the registry:
47 Tier 1 models → classified as "high-risk" under EU AI Act
Complete technical documentation already existed in registry (model cards, validation reports)
Training data lineage satisfied data governance requirements
Approval workflows demonstrated human oversight
Performance monitoring provided accuracy/robustness evidence
Total additional effort for EU AI Act compliance: 120 hours of documentation refinement (vs. estimated 2,000+ hours if building from scratch).
Model Cards and Transparency Documentation
Model cards are becoming the standard for AI transparency. I implement model cards as structured documentation within the registry:
Model Card Template (Based on Mitchell et al., 2019):
Section | Content | Registry Integration |
|---|---|---|
Model Details | Developers, version, type, license, contact | Pulled from registry metadata |
Intended Use | Primary uses, out-of-scope uses | Business use case, known limitations |
Factors | Groups, instrumentation, environment | Demographic factors, operational context |
Metrics | Performance measures, decision thresholds | Performance metrics, fairness metrics |
Evaluation Data | Datasets, preprocessing | Test data lineage, preprocessing documentation |
Training Data | Datasets, preprocessing | Training data lineage, preprocessing documentation |
Quantitative Analyses | Performance by group, intersectional analysis | Fairness metrics broken down by protected attributes |
Ethical Considerations | Sensitive use cases, risks, mitigation | Risk assessment, mitigation controls |
Caveats and Recommendations | Known issues, limitations, recommendations | Known limitations, usage guidelines |
At MediTech, we auto-generated 80% of model card content from registry metadata, with data scientists completing the remaining 20% (ethical considerations, caveats, recommendations). This reduced model card creation time from 8-12 hours per model to 2-3 hours.
Example Model Card (Excerpt):
# Model Card: Transaction Fraud Detection v3.2.1
The resulting model card provides transparency while staying concise enough that stakeholders actually read it (four pages versus a 40-page technical specification).
Audit Trails and Compliance Reporting
Regulators and auditors need evidence that your governance actually works. I implement comprehensive audit trails:
Audit Trail Requirements:
Event Type | Captured Information | Retention Period | Compliance Purpose |
|---|---|---|---|
Model Registration | Who, when, initial metadata | Indefinite | Establish accountability |
Metadata Changes | Field changed, old value, new value, who, when, why | Indefinite | Track model evolution |
Approval Actions | Approver, decision, timestamp, justification | Indefinite | Demonstrate governance |
Deployment Events | Who deployed, when, which version, where | Indefinite | Deployment accountability |
Access Events | Who accessed, what they viewed/downloaded, when | 7 years | Security, compliance |
Performance Updates | Metric values, timestamp, source | 3 years | Performance monitoring evidence |
Incident Records | Issue description, impact, resolution, root cause | 7 years | Incident management, learning |
Retirement Events | Who retired, when, why, data retention decision | Indefinite | Lifecycle management |
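One common way to make these trails tamper-evident is to chain each record to its predecessor with a cryptographic hash, so any retroactive edit breaks the chain. The sketch below illustrates the idea; it is not a specific product's implementation.

```python
# Illustrative tamper-evident audit log: each entry stores the hash of the
# previous entry, so modifying any historical record invalidates the chain.
import hashlib
import json
from datetime import datetime, timezone

def _hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

class AuditLog:
    def __init__(self):
        self.entries = []  # in practice: an append-only table or dedicated log store

    def append(self, event_type: str, actor: str, details: dict) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            "actor": actor,
            "details": details,
            "prev_hash": self.entries[-1]["hash"] if self.entries else None,
        }
        entry["hash"] = _hash({k: v for k, v in entry.items() if k != "hash"})
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = None
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev_hash"] != prev or e["hash"] != _hash(body):
                return False
            prev = e["hash"]
        return True
```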
At MediTech, we implemented immutable audit logging (append-only database, cryptographic hashing to prevent tampering). When HHS audited them post-lawsuit, they produced complete audit trails for all 47 clinical decision support models—who built them, who approved them, when they were deployed, every configuration change, and all performance metrics since deployment.
The auditor's comment: "This is the level of documentation I wish all healthcare AI vendors provided."
Phase 5: Operational Excellence—Monitoring, Alerting, and Continuous Improvement
A registry isn't static—it must evolve as your models evolve. I implement operational processes that keep registries accurate and valuable:
Automated Monitoring and Drift Detection
Model performance degrades over time. Your registry should integrate with monitoring systems to track degradation:
Monitoring Integration:
Monitoring Type | Frequency | Alert Thresholds | Registry Update |
|---|---|---|---|
Performance Metrics | Hourly (Tier 1), Daily (Tier 2-3) | >5% degradation from baseline | Update current_performance metadata |
Data Drift | Daily | Statistical significance (p < 0.05) | Flag for review, update data_drift_status |
Prediction Drift | Daily | >10% shift in prediction distribution | Flag for review, update prediction_drift_status |
Fairness Metrics | Weekly (Tier 1), Monthly (Tier 2-3) | >10% degradation in demographic parity | Flag for review, trigger bias audit |
Volume/Latency | Real-time | Anomalies beyond 3 standard deviations | Update operational_status |
Error Rates | Real-time | >2% error rate | Update operational_status, alert on-call |
Example Monitoring Alert Flow:
1. Monitoring system detects fraud detection model F1 score dropped from 0.86 to 0.79
2. Monitoring system calls registry API:
POST /models/fraud-detection-v3.2.1/metrics
{"f1_score": 0.79, "timestamp": "2024-12-01T08:30:00Z"}
3. Registry compares to baseline (0.87) and threshold (5% degradation = 0.826)
4. Registry detects degradation exceeds threshold
5. Registry updates model status to "performance_degraded"
6. Registry triggers alert to model owner and fraud operations team
7. Registry creates incident ticket in JIRA
8. Incident response workflow begins
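The registry-side logic in steps 3 through 6 can be quite small. A minimal sketch, assuming the baseline and threshold come from the model's registry metadata and that notify() stands in for your alerting and ticketing integrations:

```python
# Illustrative degradation check performed when a monitoring system pushes a
# new metric value (steps 3-6 above). notify() is a placeholder for alerting
# (PagerDuty, email) and ticket creation (JIRA).
def check_degradation(model: dict, metric_name: str, new_value: float) -> bool:
    """Return True if the new value breaches the degradation threshold."""
    baseline = model["performance"]["baseline_performance"][metric_name]  # e.g. 0.87
    threshold = model["performance"]["degradation_threshold"]             # e.g. 0.05
    floor = baseline * (1 - threshold)                                    # e.g. 0.8265
    model["performance"]["current_performance"][metric_name] = new_value
    if new_value < floor:
        model["status"] = "performance_degraded"
        notify(model["model_id"], metric_name, new_value, floor)
        return True
    return False

def notify(model_id: str, metric: str, value: float, floor: float) -> None:
    print(f"ALERT {model_id}: {metric}={value:.3f} below threshold {floor:.3f}")
```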
At MediTech, automated monitoring caught 23 instances of model degradation in the first 12 months post-implementation. Average time from degradation to detection: 4.2 hours (vs. 12-18 days pre-automation when degradation was only noticed through quarterly manual reviews).
Registry Health Dashboards
Executives and governance teams need visibility into the model portfolio. I build dashboards that provide actionable insights:
Executive Dashboard Metrics:
Metric Category | Specific Metrics | Target | Traffic Light Thresholds |
|---|---|---|---|
Coverage | % of production models in registry<br>% with complete metadata<br>% with current performance data | 100%<br>95%<br>90% | Red <90%, Yellow 90-95%, Green >95% |
Compliance | % Tier 1 models with current validation<br>% models with required approvals<br>Open audit findings | 100%<br>100%<br>0 high | Red >5%, Yellow 1-5%, Green 0% non-compliant |
Performance | % models meeting performance targets<br>Average degradation from baseline<br>Models in degraded state | 90%<br><5%<br>0 critical | Red >10% failing, Yellow 5-10%, Green <5% |
Risk | % Tier 1 models<br>Average time in approval<br>Deployment velocity (models/month) | Varies<br><21 days<br>Stable trend | Red >30 days, Yellow 21-30, Green <21 |
Operations | Failed deployments (monthly)<br>Rollbacks (monthly)<br>Incidents (monthly) | <5<br><3<br>0 critical | Red >10, Yellow 5-10, Green <5 |
At MediTech, the executive dashboard transformed governance oversight. The board now reviews model portfolio health quarterly, asking informed questions about risk concentration, compliance posture, and operational performance. This executive visibility sustains investment and maintains governance momentum.
"Before the registry dashboard, I had no idea how many AI models we had or what risks they posed. Now I can see our entire AI landscape in a single view. That visibility is invaluable for strategic decision-making." — MediTech CEO
Continuous Improvement Process
I implement regular review cycles that drive ongoing enhancement:
Review Cadence:
Review Type | Frequency | Participants | Focus Areas | Outcomes |
|---|---|---|---|---|
Model Reviews | Quarterly (Tier 1), Semi-annual (Tier 2), Annual (Tier 3) | Owner, business stakeholder, reviewer | Performance, fairness, relevance, documentation currency | Retraining decisions, retirement recommendations, documentation updates |
Registry Health Reviews | Monthly | Registry administrator, data science leadership | Metadata completeness, integration status, usage metrics | Process improvements, integration enhancements |
Governance Process Reviews | Quarterly | Governance team, stakeholder representatives | Approval latency, workflow effectiveness, policy gaps | Process streamlining, policy updates, automation opportunities |
Portfolio Risk Reviews | Quarterly | Risk committee, executive sponsor | Risk concentration, compliance posture, emerging risks | Risk treatment decisions, resource allocation, strategic priorities |
Compliance Audits | Annual | Compliance team, external auditors | Regulatory alignment, control effectiveness, evidence quality | Remediation plans, control enhancements, compliance roadmap |
At MediTech, quarterly model reviews for their 47 Tier 1 models uncovered:
8 models that could be retired (business need eliminated)
12 models requiring retraining (performance degradation)
5 models with documentation gaps (missing fairness analysis)
3 models with scope creep (being used for unintended purposes)
These reviews prevented compliance violations and optimized their model portfolio.
The Path Forward: Implementing Your AI Model Registry
Standing in MediTech's rebuilt data science office 24 months after the catastrophic lawsuit, I reflected on their transformation. They'd gone from AI chaos—347 ungoverned models, no inventory, no controls, no accountability—to a mature governance program that became a competitive advantage. Their customers now tout MediTech's "industry-leading AI governance" in RFP responses. Their insurance premiums decreased 30% when they demonstrated comprehensive model controls.
But the journey wasn't easy. They invested $2.4M in registry implementation, governance processes, and cultural change. They slowed model deployment velocity by 40% initially (though velocity returned to baseline within 12 months as automation matured). They had difficult conversations with data scientists who resisted "bureaucracy." They retired 34 models that couldn't meet governance requirements.
Yet every dollar spent, every process implemented, every model retired was worth it. Because the alternative—the alternative nearly destroyed them.
Key Takeaways: Your Model Registry Roadmap
If you take nothing else from this comprehensive guide, remember these critical lessons:
1. Shadow AI is Your Greatest Governance Risk
You cannot govern what you cannot see. Invest in comprehensive model discovery—technical scanning, organizational interviews, vendor inventories. Assume you have more models than you think, especially if you've never inventoried them.
2. Risk-Based Governance Scales, One-Size-Fits-All Doesn't
Not all models require the same oversight. Classify models by risk tier and implement differentiated governance. Intensive controls for high-risk models, streamlined processes for low-risk models. This balance maintains control without crushing innovation.
3. Integration Beats Documentation
Manual registries become stale within weeks. Integrate your registry with training pipelines, CI/CD systems, monitoring platforms, and feature stores. Automated metadata capture and enforcement make governance sustainable.
4. Metadata Quality Determines Registry Value
Garbage in, garbage out. Define clear metadata standards, capture lineage automatically where possible, and make metadata quality a deployment gate. A registry with poor metadata is worse than no registry—it creates false confidence.
5. Lifecycle Management is Continuous, Not Point-in-Time
Model governance doesn't end at deployment. Implement ongoing monitoring, regular validation reviews, and clear retirement processes. Models degrade, data drifts, business contexts change—your governance must adapt.
6. Compliance is a Feature, Not a Burden
Regulations like the EU AI Act are making model governance mandatory. Design your registry to generate compliance evidence as a byproduct of normal operations. The same metadata that supports operations should support audits.
7. Executive Sponsorship is Non-Negotiable
Model registries require sustained investment, organizational change, and cultural shifts. Without executive sponsorship and board-level visibility, registries atrophy when competing priorities emerge.
Your Next Steps: Building Your Model Registry
Whether you're starting from scratch or overhauling an existing catalog, here's the roadmap I recommend:
Months 1-3: Foundation
Conduct comprehensive model discovery (technical + organizational)
Create initial inventory with baseline metadata
Classify models by risk tier
Secure executive sponsorship and budget
Select build/buy/hybrid approach
Investment: $80K - $340K depending on organization size
Months 4-6: Core Implementation
Deploy registry platform
Define metadata standards and schemas
Implement basic approval workflows
Begin manual model registration for Tier 1-2 models
Create initial dashboards
Investment: $120K - $480K
Months 7-9: Integration
Integrate with training pipelines (auto-registration)
Integrate with CI/CD (deployment gates)
Integrate with monitoring (performance metrics)
Automate metadata capture where possible
Investment: $90K - $360K
Months 10-12: Maturation
Complete registration of all production models
Establish review cadences
Implement compliance reporting
Deploy executive dashboards
Document governance processes
Ongoing investment: $140K - $520K annually
This timeline assumes a medium-sized organization (50-200 models). Smaller organizations can compress the timeline; larger organizations may need to extend it.
Don't Wait for Your 11:43 PM Phone Call
I've shared the hard-won lessons from MediTech's near-destruction and subsequent resurrection because I don't want you to learn AI governance the way they did—through catastrophic failure and existential crisis. The investment in proper model registries, governance processes, and operational discipline is a fraction of the cost of a single major AI incident.
Here's what I recommend you do immediately after reading this article:
Assess Your Current State: How many AI models do you have deployed? Do you know? Can you list them? Do you know who owns them, what data they use, how they perform? If not, you have shadow AI risk.
Conduct Model Discovery: Don't assume you know what's deployed. Run technical discovery (infrastructure scanning, code mining) and organizational discovery (team interviews). The gaps will shock you.
Classify Your Risks: Not every model threatens your organization's survival, but some do. Identify your high-risk models—those affecting safety, legal rights, significant financial decisions, or creating regulatory exposure.
Secure Executive Sponsorship: Model registries require sustained investment and organizational commitment. You need executive air cover, budget authority, and board-level visibility.
Start Small, Build Momentum: Don't try to register 500 models on day one. Start with your highest-risk models. Build a success story with demonstrable risk reduction and compliance value. Then expand.
Get Expert Help If Needed: If you lack internal expertise in MLOps, AI governance, or regulatory compliance, engage consultants who've implemented these programs at scale. The cost of getting it right far exceeds the cost of learning through failure.
At PentesterWorld, we've guided hundreds of organizations through AI model registry implementation, from initial discovery through mature, integrated governance platforms. We understand the technologies, the regulations, the organizational dynamics, and most importantly—we've seen what works when the regulator knocks or the lawsuit arrives.
Whether you're building your first registry or overhauling a governance program that's lost its way, the principles I've outlined here will serve you well. Model registries aren't glamorous. They don't train models faster or improve accuracy. But when that inevitable incident occurs—biased model, data breach, regulatory investigation—they're the difference between an organization that survives with its reputation intact and one that becomes a cautionary tale.
Don't wait for your 11:43 PM phone call. Build your AI model registry today.
Want to discuss your organization's AI governance needs? Have questions about implementing model registries? Visit PentesterWorld where we transform AI governance theory into operational reality. Our team of experienced practitioners has guided organizations from shadow AI chaos to mature model governance. Let's build your AI accountability together.