Chapter 57 — Feature Stores & Model Registries
Overview
Standardize features and model lifecycle; ensure discoverability and governance. Feature stores solve the training-serving skew problem while enabling feature reuse across teams. Model registries provide a single source of truth for model versions, lineage, and lifecycle management, essential for reproducibility and compliance.
Core Concepts
- Feature contracts; online/offline parity; backfills.
- Model versioning, lineage, and stage transitions.
- Discoverability, reuse, and collaboration.
- Point-in-time correctness and temporal joins.
Deliverables
- Feature contracts and registry policies.
- Feature catalog with documentation.
- Model registry with full lineage tracking.
- Promotion workflows and approval gates.
Why It Matters
Consistent features and governed model lifecycles reduce duplication, speed iteration, and simplify compliance. Without these foundational components:
- Feature Duplication: Teams independently compute the same features, wasting resources
- Training-Serving Skew: Models trained on one feature computation perform poorly in production where features are computed differently
- Model Chaos: No single answer to "which model is in production?" or "what data trained this model?"
- Compliance Gaps: Cannot prove what model version made a decision or trace back to training data
- Slow Iteration: Teams spend weeks debugging feature differences instead of experimenting
Organizations with mature feature stores and model registries report 40-60% reduction in time-to-production for new models and 80% fewer production incidents due to training-serving skew.
Feature Store & Model Registry Architecture
```mermaid
graph TB
  subgraph Feature Store
    A[Batch Sources<br/>Data Warehouse, S3] --> B[Feature Pipeline]
    C[Stream Sources<br/>Kafka, Kinesis] --> B
    B --> D[Offline Store<br/>Historical Features]
    B --> E[Online Store<br/>Low-latency]
    D --> F[Training Jobs]
    E --> G[Inference Services]
  end
  subgraph Model Registry
    F --> H[Model Artifacts]
    H --> I[Registry Metadata]
    I --> J[Staging]
    J --> K{Approval Gates}
    K -->|Pass| L[Production]
    K -->|Fail| M[Archive]
    L --> G
  end
  G --> N[Predictions]
  N -.->|Feedback| C
```
Feature Store Deep Dive
Feature Lifecycle
```mermaid
flowchart LR
  A[Define Feature] --> B[Implement Pipeline]
  B --> C[Materialize Offline]
  C --> D[Materialize Online]
  D --> E[Training Uses Offline]
  E --> F[Inference Uses Online]
  F --> G[Monitor Drift]
  G --> H{Drift Detected?}
  H -->|Yes| I[Backfill & Retrain]
  H -->|No| F
  I --> C
```
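Materialization (loading feature values from the offline store into the online store) is usually driven from the feature store SDK rather than hand-rolled jobs. A minimal sketch, assuming Feast's Python SDK; the repo path and time windows are illustrative, and exact method signatures vary by Feast version:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Load the feature repository (assumes a feature_store.yaml in the current directory).
store = FeatureStore(repo_path=".")

# Incrementally load newly arrived feature rows from the offline store into the online store.
store.materialize_incremental(end_date=datetime.utcnow())

# Full-range backfill, e.g. after fixing a pipeline bug (illustrative 30-day window).
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=30),
    end_date=datetime.utcnow(),
)
```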
Training-Serving Skew Problem & Solution
The Problem:
```mermaid
graph TB
  subgraph Training["Training (Python/Pandas)"]
    A[Raw Data] --> B[Feature Logic v1<br/>Pandas groupby]
    B --> C[Training Features]
  end
  subgraph Production["Production (SQL)"]
    D[Raw Data] --> E[Feature Logic v2<br/>SQL window function]
    E --> F[Serving Features]
  end
  C -.->|Different Logic!| G[Model Trained Here]
  F -.->|Different Values!| H[Model Serves Here]
  G -.->|Performance Mismatch| H
  style G fill:#f99
  style H fill:#f99
```
The Solution - Feature Store:
```mermaid
graph TB
  A[Feature Definition<br/>Single Source of Truth] --> B[Offline Materialization]
  A --> C[Online Materialization]
  B --> D[Training]
  C --> E[Serving]
  D --> F[Same Features!]
  E --> F
  style F fill:#9f9
```
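In code, "same features" means a single list of feature references consumed by both the training path (offline store, point-in-time correct) and the serving path (online store). A minimal sketch, assuming Feast; the feature references and entity values are illustrative:

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# One list of feature references, shared by training and serving.
FEATURE_REFS = [
    "user_transaction_features:transaction_count_7d",
    "user_transaction_features:avg_amount_30d",
    "user_transaction_features:merchant_diversity",
]

# Training: historical, point-in-time correct values from the offline store.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-15 12:00", "2024-01-16 09:30"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df, features=FEATURE_REFS
).to_df()

# Serving: the same feature references, read from the low-latency online store.
online_features = store.get_online_features(
    features=FEATURE_REFS, entity_rows=[{"user_id": 1001}]
).to_dict()
```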
Feature Contract Example
| Component | Specification | Example |
|---|---|---|
| Feature Group | user_transaction_features | Version 2.1.0 |
| Owner | payments-ml-team | On-call rotation |
| Description | Aggregated transaction features for fraud detection | 7d/30d windows |
| Entities | user_id | Primary key |
| Features | transaction_count_7d, avg_amount_30d, merchant_diversity | 3 features |
| Data Types | int64, float64, float64 | Validated |
| Validation | Non-null, range checks | transaction_count >= 0 |
| Sources | BigQuery (batch), Kafka (stream) | Dual ingestion |
| SLA | Freshness: 15min, Availability: 99.9%, Latency: <10ms | Monitored |
| Upstream | raw.transactions, raw.merchant_info | Lineage tracked |
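A contract like this maps almost one-to-one onto a feature view definition. A sketch assuming Feast (the source path, TTL, and tags are illustrative; validation rules and SLA monitoring typically live in separate tooling):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64

# Entity from the contract: the primary key features are joined on.
user = Entity(name="user", join_keys=["user_id"])

# Offline source backing the feature group (path is illustrative).
transactions_source = FileSource(
    path="s3://features/user_transaction_features.parquet",
    timestamp_field="event_timestamp",
)

user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=1),  # staleness bound; the 15 min freshness SLA is monitored separately
    schema=[
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="avg_amount_30d", dtype=Float64),
        Field(name="merchant_diversity", dtype=Float64),
    ],
    source=transactions_source,
    tags={"owner": "payments-ml-team", "version": "2.1.0"},
)
```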
Point-in-Time Correctness
Without Point-in-Time Join (Data Leakage):
```mermaid
sequenceDiagram
  participant Training as Training Job
  participant Features as Feature Store
  participant Future as Future Data
  Training->>Features: Get features for 2024-01-15
  Features->>Future: Accidentally uses data from 2024-01-20
  Future-->>Features: Latest available features
  Features-->>Training: Features with future leak
  Training->>Training: Model learns from future!
  Note over Training: Artificially high accuracy in training<br/>Poor performance in production
```
With Point-in-Time Join (Correct):
```mermaid
sequenceDiagram
  participant Training as Training Job
  participant Features as Feature Store (PIT Join)
  participant Historical as Historical Data
  Training->>Features: Get features for 2024-01-15 12:00
  Features->>Historical: Query features <= 2024-01-15 12:00
  Historical-->>Features: Features as of that timestamp
  Features-->>Training: Historically accurate features
  Training->>Training: Model learns from valid data
  Note over Training: Training mimics production reality
```
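Mechanically, a point-in-time join is an as-of join: for each label event, use only the newest feature row at or before that event's timestamp. A minimal pandas sketch of the mechanics a feature store performs for you (column names and values are illustrative):

```python
import pandas as pd

# Label events: what we predict, stamped with the time the prediction would have been made.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-15 12:00", "2024-01-20 08:00", "2024-01-15 12:00"]),
    "is_fraud": [0, 1, 0],
})

# Feature snapshots over time, as materialized in the offline store.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-14 00:00", "2024-01-19 00:00", "2024-01-10 00:00"]),
    "transaction_count_7d": [3, 9, 1],
})

# As-of join: each label row gets the latest feature value known at or before its timestamp,
# so no future information leaks into training.
training_df = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="user_id",
    direction="backward",
)
```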
Feature Store Platform Comparison
| Platform | Best For | Strengths | Limitations | Pricing |
|---|---|---|---|---|
| Feast (OSS) | Self-hosted, flexibility | Free, extensible, cloud-agnostic | DIY infrastructure | Free (infra costs) |
| Tecton | Enterprise, managed | Fully managed, excellent UI, real-time | Expensive, vendor lock-in | Usage-based, $$$$ |
| AWS SageMaker Feature Store | AWS ecosystem | Native AWS integration, managed | AWS lock-in, limited features | Storage + requests |
| GCP Vertex AI Feature Store | GCP ecosystem | Managed, scalable, BigQuery integration | GCP lock-in | Storage + requests |
| Azure ML Feature Store | Azure ecosystem | Azure integration, Unity Catalog | Azure lock-in, newer | Storage + requests |
| Databricks Feature Store | Spark/Databricks users | Delta Lake, Unity Catalog, lineage | DBU costs, Databricks required | Included with DBU |
| Hopsworks | ML platform integration | Complete ML platform, Kubernetes | Complex, steep learning curve | OSS + enterprise |
Model Registry Deep Dive
Model Lifecycle Management
```mermaid
stateDiagram-v2
  [*] --> Development
  Development --> Staging: ML Lead Approval
  Staging --> StagingValidation: Deploy to Staging
  StagingValidation --> Production: Security + Product Approval
  StagingValidation --> Staging: Validation Failed
  Production --> Canary: Deploy Canary
  Canary --> FullProduction: Metrics Pass
  Canary --> Production: Metrics Fail (Rollback)
  FullProduction --> Production: Monitor
  Production --> Archived: Superseded
  Staging --> Archived: Abandoned
  Archived --> [*]
```
Complete Model Lineage
```mermaid
graph TB
  subgraph Data Lineage
    A[Raw Data<br/>transactions_2025_q3] --> B[Cleaned Data<br/>v1.2.3]
    B --> C[Feature Engineering<br/>v2.1.0]
    C --> D[Training Dataset<br/>sha256:abc123]
  end
  subgraph Model Lineage
    D --> E[Training Run<br/>run_id:456]
    E --> F[Model Artifact<br/>fraud_detector:v2.3.1]
    F --> G[Model Registry<br/>Stage: Staging]
    G --> H{Promotion Gates}
    H --> I[Production<br/>fraud_detector:v2.3.1]
  end
  subgraph Feature Lineage
    J[Feature Store<br/>user_features:v3.0] --> C
    K[Feature Store<br/>transaction_features:v2.5] --> C
  end
  subgraph Deployment Lineage
    I --> L[Canary Deployment<br/>5% traffic]
    L --> M[Full Deployment<br/>100% traffic]
    M --> N[Predictions]
  end
  style D fill:#e1f5ff
  style F fill:#fff4e1
  style I fill:#e7f5e1
```
Model Metadata Schema
Essential Metadata Components:
| Category | Fields | Purpose |
|---|---|---|
| Identity | model_id, version, created_at, created_by | Unique identification |
| Training | framework, data_version, data_hash, hyperparameters, seed | Reproducibility |
| Evaluation | test_metrics, fairness_metrics, cost_metrics, eval_date | Quality assurance |
| Governance | risk_category, approvals, compliance_flags, known_limitations | Risk management |
| Deployment | current_stage, endpoints, traffic_percentage, rollback_target | Operations |
| Artifacts | model_uri, container_image, model_card, eval_report | Assets |
| Lineage | parent_version, data_sources, feature_versions, code_commit | Traceability |
Minimal Metadata Example:
```json
{
  "model_id": "fraud_detector:v2.3.1",
  "training": {
    "dataset_version": "1.2.3",
    "dataset_hash": "sha256:abc123...",
    "features": {"user_features": "v3.0", "txn_features": "v2.5"}
  },
  "evaluation": {
    "f1": 0.89, "precision": 0.91, "recall": 0.87,
    "fairness_demographic_parity": 0.05
  },
  "governance": {
    "risk": "high",
    "approvals": ["ml_lead", "security"],
    "limitations": ["Lower recall on new merchant types"]
  },
  "deployment": {"stage": "production", "traffic": 100}
}
```
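Most of this metadata can be captured automatically at training time instead of being filled in by hand. A sketch assuming MLflow and scikit-learn; the tracking URI, toy model, tag names, and metric values mirror the example above and are illustrative:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # illustrative tracking server

# Toy model standing in for the real training job.
X, y = make_classification(n_samples=200, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

with mlflow.start_run() as run:
    # Reproducibility: data and feature lineage as params and tags.
    mlflow.log_params({"dataset_version": "1.2.3", "seed": 42})
    mlflow.set_tags({
        "dataset_hash": "sha256:abc123",
        "feature_versions": "user_features:v3.0,txn_features:v2.5",
        "risk_category": "high",
    })
    # Evaluation metrics from the held-out test set (values illustrative).
    mlflow.log_metrics({"f1": 0.89, "precision": 0.91, "recall": 0.87})
    # Log the artifact and register a new version in one step.
    mlflow.sklearn.log_model(model, artifact_path="model", registered_model_name="fraud_detector")
```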
Promotion Workflow
Approval Requirements by Risk Level:
| Risk Level | Required Approvals | Evaluation Gates | Deployment Strategy | Typical Timeline |
|---|---|---|---|---|
| Low | ML Lead | Basic metrics | Direct to prod | 1 day |
| Medium | ML Lead + Security | Metrics + safety | Canary 5% → 100% | 2-3 days |
| High | ML Lead + Security + Compliance | All evals + fairness | Shadow → Canary → Full | 5-7 days |
| Critical | ML Lead + Security + Compliance + Product + Legal | Comprehensive + external audit | Shadow → Canary 5% → 25% → 50% → 100% (24h stages) | 10-14 days |
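These requirements can be enforced mechanically at promotion time rather than by convention. A minimal, hypothetical helper encoding the matrix above (not part of any registry API):

```python
# Hypothetical encoding of the approval matrix above.
REQUIRED_APPROVALS = {
    "low": {"ml_lead"},
    "medium": {"ml_lead", "security"},
    "high": {"ml_lead", "security", "compliance"},
    "critical": {"ml_lead", "security", "compliance", "product", "legal"},
}

def can_promote(risk_level: str, granted_approvals: set[str]) -> bool:
    """Allow promotion only when every approval required for the risk level is present."""
    missing = REQUIRED_APPROVALS[risk_level] - granted_approvals
    if missing:
        print(f"Blocked: missing approvals {sorted(missing)}")
        return False
    return True

# A high-risk model with only ML lead and security sign-off is blocked.
can_promote("high", {"ml_lead", "security"})                # False: compliance missing
can_promote("high", {"ml_lead", "security", "compliance"})  # True
```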
Semantic Versioning for Models
```mermaid
graph LR
  A[v2.3.1] --> B{Change Type?}
  B -->|Breaking Change<br/>New architecture| C[v3.0.0]
  B -->|New Feature<br/>Additional output| D[v2.4.0]
  B -->|Bug Fix<br/>Same behavior| E[v2.3.2]
  C --> F[Major Version<br/>Incompatible]
  D --> G[Minor Version<br/>Backward Compatible]
  E --> H[Patch Version<br/>Bug Fixes]
```
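The same decision tree can be applied programmatically when a new version is registered. A small, hypothetical helper (not part of any registry API):

```python
def bump_version(version: str, change_type: str) -> str:
    """Bump a model version string like 'v2.3.1' according to the change type."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change_type == "breaking":   # new architecture, incompatible inputs or outputs
        return f"v{major + 1}.0.0"
    if change_type == "feature":    # backward-compatible addition, e.g. an extra output
        return f"v{major}.{minor + 1}.0"
    if change_type == "fix":        # same behavior, bug fix or retrain on corrected data
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change_type}")

assert bump_version("v2.3.1", "breaking") == "v3.0.0"
assert bump_version("v2.3.1", "feature") == "v2.4.0"
assert bump_version("v2.3.1", "fix") == "v2.3.2"
```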
Case Study: Banking Feature & Model Standardization
Background: Large bank with 15+ data science teams building credit risk, fraud, and marketing models independently. Each team computed similar features with different logic.
Problems Before Implementation
| Problem | Impact | Frequency | Cost |
|---|---|---|---|
| Duplicate Feature Pipelines | 8 different "transaction velocity" implementations | Ongoing | $200K/year wasted |
| Training-Serving Skew | 25% of models underperformed in production | Every 4th model | Lost business value |
| No Lineage | 3 weeks to trace model decisions for audits | Every audit | Compliance risk |
| Slow Iteration | New models took 6-8 weeks to production | Every model | Opportunity cost |
| Compliance Gaps | Could not prove which version made decisions | Ongoing | Regulatory risk |
Solution Architecture
```mermaid
graph TB
  subgraph Centralized Feature Store
    A[150 Standard Features] --> B[Offline: Delta Lake on S3]
    A --> C[Online: Redis <10ms]
    B --> D[Training Jobs]
    C --> E[Inference Services]
  end
  subgraph Model Registry
    D --> F[MLflow Registry]
    F --> G[Dev → Staging → Prod]
    G --> H[Approval Workflow]
    H --> I[Deployment]
  end
  J[15 Teams] --> A
  K[40+ Models] --> F
  style A fill:#9f9
  style F fill:#9f9
```
Implementation Phases
| Month | Milestone | Features | Teams | Models | Key Win |
|---|---|---|---|---|---|
| 1 | MVP | 20 critical features | 3 pilot teams | 5 models | Proved concept |
| 2 | Expansion | 50 features | 10 teams | 15 models | Hit tipping point |
| 3 | Growth | 100 features | All 15 teams | 30 models | Full adoption |
| 6 | Maturity | 150 features | All teams | 40+ models | Standardized |
Results After 12 Months
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time to Production | 6-8 weeks | 2-3 weeks | 60% reduction |
| Feature Reuse | 10% | 75% | 7.5x increase |
| Training-Serving Skew Incidents | 12/year | 1/year | 92% reduction |
| Duplicate Feature Pipelines | 8 versions | 1 canonical | Eliminated |
| Compliance Audit Prep | 3 weeks | 2 days | 90% reduction |
| Cost Savings | Baseline | $400K/year | Infrastructure consolidation |
| Developer Satisfaction | 6.2/10 | 8.7/10 | 40% improvement |
Key Success Factors
```mermaid
mindmap
  root((Success Factors))
    Executive Sponsorship
      CTO mandate
      Cross-team priority
    Gradual Rollout
      20 features first
      Proved value
      Then scaled
    Documentation
      Feature catalog
      Examples
      Best practices
    Developer Experience
      Easier than DIY
      Great tooling
      Fast support
    Compliance Focus
      Risk reduction
      Audit trail
      Regulatory win
```
Implementation Checklist
Feature Store Implementation
Phase 1: Foundation (Weeks 1-3)
- Choose platform (Feast, Tecton, cloud-native)
- Set up offline store (data warehouse, Delta Lake)
- Set up online store (Redis, DynamoDB)
- Define feature schema and contracts
- Build pipeline for 5-10 critical features
Phase 2: Core Features (Weeks 4-6)
- Identify 20-30 most-used features across teams
- Implement batch feature computation
- Implement streaming features (if needed)
- Set up materialization (offline → online)
- Add monitoring for freshness/availability
Phase 3: Integration (Weeks 7-9)
- Integrate with training pipelines
- Integrate with serving infrastructure
- Build feature discovery catalog
- Document all features with examples
- Onboard first 2-3 teams
Phase 4: Expansion (Months 3-6)
- Add point-in-time correctness
- Implement backfill capabilities
- Add feature validation and drift detection
- Expand to 100+ features
- Onboard all ML teams
Model Registry Implementation
Phase 1: Setup (Week 1)
- Choose platform (MLflow, SageMaker, Vertex AI)
- Set up central tracking server
- Define model metadata schema
- Set up artifact storage (S3, GCS, Azure Blob)
Phase 2: Workflows (Weeks 2-3)
- Define lifecycle stages and promotion criteria
- Implement approval workflows
- Set up automated registration from CI/CD (see the sketch after this checklist)
- Create model card templates
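For the automated-registration item above, a sketch assuming MLflow's classic stage-based registry (newer MLflow releases favor model aliases); the model name, metric, and "only register if better than production" policy are illustrative:

```python
import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "fraud_detector"  # illustrative
client = MlflowClient()

def register_if_better(run_id: str, metric: str = "f1") -> None:
    """Called from CI after training: register the run's model and stage it if it beats production."""
    candidate_score = client.get_run(run_id).data.metrics[metric]

    # Score of the current production version (0.0 if none exists yet).
    prod_versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
    prod_score = (
        client.get_run(prod_versions[0].run_id).data.metrics[metric] if prod_versions else 0.0
    )

    if candidate_score <= prod_score:
        print(f"Skipping: {metric}={candidate_score:.3f} does not beat production ({prod_score:.3f})")
        return

    version = mlflow.register_model(f"runs:/{run_id}/model", MODEL_NAME)
    client.transition_model_version_stage(MODEL_NAME, version.version, stage="Staging")
```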
Phase 3: Governance (Weeks 4-6)
- Add lineage tracking
- Implement risk-based approval matrix
- Set up compliance evidence collection
- Create rollback procedures
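A rollback procedure can be as simple as re-pointing the Production stage at the previous known-good version. A sketch assuming MLflow's stage-based registry; the model name and version numbers are illustrative:

```python
from mlflow.tracking import MlflowClient

MODEL_NAME = "fraud_detector"  # illustrative
client = MlflowClient()

def rollback(bad_version: int, good_version: int) -> None:
    """Demote the faulty version and restore the previous known-good one."""
    # Archive the version that is misbehaving in production.
    client.transition_model_version_stage(MODEL_NAME, str(bad_version), stage="Archived")
    # Promote the previous version back; serving that resolves
    # "models:/fraud_detector/Production" picks it up on its next reload.
    client.transition_model_version_stage(MODEL_NAME, str(good_version), stage="Production")
    client.set_model_version_tag(MODEL_NAME, str(good_version), "rollback_from", str(bad_version))

# Example: version 24 caused an incident, restore version 23.
rollback(bad_version=24, good_version=23)
```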
Phase 4: Optimization (Ongoing)
- Add model comparison dashboards
- Implement automated validation
- Set up performance monitoring
- Regular cleanup of archived models
Best Practices
Feature Store
| Practice | Why | How |
|---|---|---|
| Design for Reusability | Avoid duplication | Generic features, not model-specific |
| Document Everything | Enable discovery | Name, description, owner, SLA, examples |
| Monitor Feature Health | Catch issues early | Freshness, availability, drift, usage |
| Version Carefully | Manage breaking changes | Semantic versioning (v2.1.0) |
| Point-in-Time Joins | Prevent data leakage | Use historical feature values |
Model Registry
| Practice | Why | How |
|---|---|---|
| Automate Registration | Reduce errors | Auto-register from training pipeline |
| Enforce Stage Gates | Maintain quality | Validation at each promotion |
| Immutable Lineage | Enable compliance | Never modify, always append |
| Risk-Based Approval | Right oversight level | Match rigor to impact |
| Complete Metadata | Enable debugging | Training, eval, governance, deployment |
Success Metrics
Feature Store Metrics
| Metric | Target | Indicates |
|---|---|---|
| Feature Reuse Rate | >60% | Effective sharing |
| Training-Serving Skew Incidents | <2/year | Consistency achieved |
| Feature Freshness SLA | >99% | Reliable data |
| Time to Add New Feature | <1 week | Operational efficiency |
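As one example, the freshness SLA above can be checked by a scheduled probe that compares the newest materialized timestamp against the target. A minimal, illustrative sketch; how you obtain `last_materialized_at` (materialization metadata, or the max event_timestamp in the online store) is platform-specific:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)  # target from the feature contract

def check_freshness(feature_view: str, last_materialized_at: datetime) -> bool:
    """Alert when the newest materialized feature row is older than the SLA."""
    age = datetime.now(timezone.utc) - last_materialized_at
    if age > FRESHNESS_SLA:
        print(f"ALERT {feature_view}: {age} since last update exceeds SLA of {FRESHNESS_SLA}")
        return False
    return True

# Example probe, e.g. run from a 5-minute cron job.
check_freshness(
    "user_transaction_features",
    last_materialized_at=datetime.now(timezone.utc) - timedelta(minutes=9),
)
```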
Model Registry Metrics
| Metric | Target | Indicates |
|---|---|---|
| Models Registered | 100% of production | Complete governance |
| Mean Time to Promote | <3 days | Efficient process |
| Audit Prep Time | <1 day | Automated compliance |
| Rollback Time | <10 min | Operational maturity |