Chapter 57 — Feature Stores & Model Registries
Overview
Standardize features and model lifecycle; ensure discoverability and governance. Feature stores solve the training-serving skew problem while enabling feature reuse across teams. Model registries provide a single source of truth for model versions, lineage, and lifecycle management, essential for reproducibility and compliance.
Core Concepts
- Feature contracts; online/offline parity; backfills.
- Model versioning, lineage, and stage transitions.
- Discoverability, reuse, and collaboration.
- Point-in-time correctness and temporal joins.
Deliverables
- Feature contracts and registry policies.
- Feature catalog with documentation.
- Model registry with full lineage tracking.
- Promotion workflows and approval gates.
Why It Matters
Consistent features and governed model lifecycles reduce duplication, speed iteration, and simplify compliance. Without these foundational components:
- Feature Duplication: Teams independently compute the same features, wasting resources
- Training-Serving Skew: Models trained on one feature computation perform poorly in production where features are computed differently
- Model Chaos: No single answer to "which model is in production?" or "what data trained this model?"
- Compliance Gaps: Cannot prove what model version made a decision or trace back to training data
- Slow Iteration: Teams spend weeks debugging feature differences instead of experimenting
Organizations with mature feature stores and model registries report 40-60% reduction in time-to-production for new models and 80% fewer production incidents due to training-serving skew.
Feature Store & Model Registry Architecture
```mermaid
graph TB
  subgraph Feature Store
    A[Batch Sources<br/>Data Warehouse, S3] --> B[Feature Pipeline]
    C[Stream Sources<br/>Kafka, Kinesis] --> B
    B --> D[Offline Store<br/>Historical Features]
    B --> E[Online Store<br/>Low-latency]
    D --> F[Training Jobs]
    E --> G[Inference Services]
  end
  subgraph Model Registry
    F --> H[Model Artifacts]
    H --> I[Registry Metadata]
    I --> J[Staging]
    J --> K{Approval Gates}
    K -->|Pass| L[Production]
    K -->|Fail| M[Archive]
    L --> G
  end
  G --> N[Predictions]
  N -.->|Feedback| C
```
Feature Store Deep Dive
Feature Lifecycle
```mermaid
flowchart LR
  A[Define Feature] --> B[Implement Pipeline]
  B --> C[Materialize Offline]
  C --> D[Materialize Online]
  D --> E[Training Uses Offline]
  E --> F[Inference Uses Online]
  F --> G[Monitor Drift]
  G --> H{Drift Detected?}
  H -->|Yes| I[Backfill & Retrain]
  H -->|No| F
  I --> C
```
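Materialization (loading feature values from the offline store into the online store) is usually driven from the feature store SDK rather than hand-rolled jobs. A minimal sketch, assuming Feast's Python SDK; the repo path and time windows are illustrative, and exact method signatures vary by Feast version:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Load the feature repository (assumes a feature_store.yaml in the current directory).
store = FeatureStore(repo_path=".")

# Incrementally load newly arrived feature rows from the offline store into the online store.
store.materialize_incremental(end_date=datetime.utcnow())

# Full-range backfill, e.g. after fixing a pipeline bug (illustrative 30-day window).
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=30),
    end_date=datetime.utcnow(),
)
```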
Training-Serving Skew Problem & Solution
The Problem:
```mermaid
graph TB
  subgraph Training["Training (Python/Pandas)"]
    A[Raw Data] --> B[Feature Logic v1<br/>Pandas groupby]
    B --> C[Training Features]
  end
  subgraph Production["Production (SQL)"]
    D[Raw Data] --> E[Feature Logic v2<br/>SQL window function]
    E --> F[Serving Features]
  end
  C -.->|Different Logic!| G[Model Trained Here]
  F -.->|Different Values!| H[Model Serves Here]
  G -.->|Performance Mismatch| H
  style G fill:#f99
  style H fill:#f99
```
The Solution - Feature Store:
```mermaid
graph TB
  A[Feature Definition<br/>Single Source of Truth] --> B[Offline Materialization]
  A --> C[Online Materialization]
  B --> D[Training]
  C --> E[Serving]
  D --> F[Same Features!]
  E --> F
  style F fill:#9f9
```
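In code, "same features" means a single list of feature references consumed by both the training path (offline store, point-in-time correct) and the serving path (online store). A minimal sketch, assuming Feast; the feature references and entity values are illustrative:

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# One list of feature references, shared by training and serving.
FEATURE_REFS = [
    "user_transaction_features:transaction_count_7d",
    "user_transaction_features:avg_amount_30d",
    "user_transaction_features:merchant_diversity",
]

# Training: historical, point-in-time correct values from the offline store.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-15 12:00", "2024-01-16 09:30"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df, features=FEATURE_REFS
).to_df()

# Serving: the same feature references, read from the low-latency online store.
online_features = store.get_online_features(
    features=FEATURE_REFS, entity_rows=[{"user_id": 1001}]
).to_dict()
```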
Feature Contract Example
| Component | Specification | Example |
|---|---|---|
| Feature Group | user_transaction_features | Version 2.1.0 |
| Owner | payments-ml-team | On-call rotation |
| Description | Aggregated transaction features for fraud detection | 7d/30d windows |
| Entities | user_id | Primary key |
| Features | transaction_count_7d, avg_amount_30d, merchant_diversity | 3 features |
| Data Types | int64, float64, float64 | Validated |
| Validation | Non-null, range checks | transaction_count >= 0 |
| Sources | BigQuery (batch), Kafka (stream) | Dual ingestion |
| SLA | Freshness: 15min, Availability: 99.9%, Latency: <10ms | Monitored |
| Upstream | raw.transactions, raw.merchant_info | Lineage tracked |
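A contract like this maps almost one-to-one onto a feature view definition. A sketch assuming Feast (the source path, TTL, and tags are illustrative; validation rules and SLA monitoring typically live in separate tooling):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64

# Entity from the contract: the primary key features are joined on.
user = Entity(name="user", join_keys=["user_id"])

# Offline source backing the feature group (path is illustrative).
transactions_source = FileSource(
    path="s3://features/user_transaction_features.parquet",
    timestamp_field="event_timestamp",
)

user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=1),  # staleness bound; the 15 min freshness SLA is monitored separately
    schema=[
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="avg_amount_30d", dtype=Float64),
        Field(name="merchant_diversity", dtype=Float64),
    ],
    source=transactions_source,
    tags={"owner": "payments-ml-team", "version": "2.1.0"},
)
```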
Point-in-Time Correctness
Without Point-in-Time Join (Data Leakage):
```mermaid
sequenceDiagram
  participant Training as Training Job
  participant Features as Feature Store
  participant Future as Future Data
  Training->>Features: Get features for 2024-01-15
  Features->>Future: Accidentally uses data from 2024-01-20
  Future-->>Features: Latest available features
  Features-->>Training: Features with future leak
  Training->>Training: Model learns from future!
  Note over Training: Artificially high accuracy in training<br/>Poor performance in production
```
With Point-in-Time Join (Correct):
```mermaid
sequenceDiagram
  participant Training as Training Job
  participant Features as Feature Store (PIT Join)
  participant Historical as Historical Data
  Training->>Features: Get features for 2024-01-15 12:00
  Features->>Historical: Query features <= 2024-01-15 12:00
  Historical-->>Features: Features as of that timestamp
  Features-->>Training: Historically accurate features
  Training->>Training: Model learns from valid data
  Note over Training: Training mimics production reality
```
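Mechanically, a point-in-time join is an as-of join: for each label event, use only the newest feature row at or before that event's timestamp. A minimal pandas sketch of the mechanics a feature store performs for you (column names and values are illustrative):

```python
import pandas as pd

# Label events: what we predict, stamped with the time the prediction would have been made.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-15 12:00", "2024-01-20 08:00", "2024-01-15 12:00"]),
    "is_fraud": [0, 1, 0],
})

# Feature snapshots over time, as materialized in the offline store.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-14 00:00", "2024-01-19 00:00", "2024-01-10 00:00"]),
    "transaction_count_7d": [3, 9, 1],
})

# As-of join: each label row gets the latest feature value known at or before its timestamp,
# so no future information leaks into training.
training_df = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="user_id",
    direction="backward",
)
```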
Feature Store Platform Comparison
| Platform | Best For | Strengths | Limitations | Pricing |
|---|---|---|---|---|
| Feast (OSS) | Self-hosted, flexibility | Free, extensible, cloud-agnostic | DIY infrastructure | Free (infra costs) |
| Tecton | Enterprise, managed | Fully managed, excellent UI, real-time | Expensive, vendor lock-in | Usage-based, $$$$ |
| AWS SageMaker Feature Store | AWS ecosystem | Native AWS integration, managed | AWS lock-in, limited features | Storage + requests |
| GCP Vertex AI Feature Store | GCP ecosystem | Managed, scalable, BigQuery integration | GCP lock-in | Storage + requests |
| Azure ML Feature Store | Azure ecosystem | Azure integration, Unity Catalog | Azure lock-in, newer | Storage + requests |
| Databricks Feature Store | Spark/Databricks users | Delta Lake, Unity Catalog, lineage | DBU costs, Databricks required | Included with DBU |
| Hopsworks | ML platform integration | Complete ML platform, Kubernetes | Complex, steep learning curve | OSS + enterprise |
Model Registry Deep Dive
Model Lifecycle Management
```mermaid
stateDiagram-v2
  [*] --> Development
  Development --> Staging: ML Lead Approval
  Staging --> StagingValidation: Deploy to Staging
  StagingValidation --> Production: Security + Product Approval
  StagingValidation --> Staging: Validation Failed
  Production --> Canary: Deploy Canary
  Canary --> FullProduction: Metrics Pass
  Canary --> Production: Metrics Fail (Rollback)
  FullProduction --> Production: Monitor
  Production --> Archived: Superseded
  Staging --> Archived: Abandoned
  Archived --> [*]
```
Complete Model Lineage
```mermaid
graph TB
  subgraph Data Lineage
    A[Raw Data<br/>transactions_2025_q3] --> B[Cleaned Data<br/>v1.2.3]
    B --> C[Feature Engineering<br/>v2.1.0]
    C --> D[Training Dataset<br/>sha256:abc123]
  end
  subgraph Model Lineage
    D --> E[Training Run<br/>run_id:456]
    E --> F[Model Artifact<br/>fraud_detector:v2.3.1]
    F --> G[Model Registry<br/>Stage: Staging]
    G --> H{Promotion Gates}
    H --> I[Production<br/>fraud_detector:v2.3.1]
  end
  subgraph Feature Lineage
    J[Feature Store<br/>user_features:v3.0] --> C
    K[Feature Store<br/>transaction_features:v2.5] --> C
  end
  subgraph Deployment Lineage
    I --> L[Canary Deployment<br/>5% traffic]
    L --> M[Full Deployment<br/>100% traffic]
    M --> N[Predictions]
  end
  style D fill:#e1f5ff
  style F fill:#fff4e1
  style I fill:#e7f5e1
```
Model Metadata Schema
Essential Metadata Components:
| Category | Fields | Purpose |
|---|---|---|
| Identity | model_id, version, created_at, created_by | Unique identification |
| Training | framework, data_version, data_hash, hyperparameters, seed | Reproducibility |
| Evaluation | test_metrics, fairness_metrics, cost_metrics, eval_date | Quality assurance |
| Governance | risk_category, approvals, compliance_flags, known_limitations | Risk management |
| Deployment | current_stage, endpoints, traffic_percentage, rollback_target | Operations |
| Artifacts | model_uri, container_image, model_card, eval_report | Assets |
| Lineage | parent_version, data_sources, feature_versions, code_commit | Traceability |
Minimal Metadata Example:
```json
{
  "model_id": "fraud_detector:v2.3.1",
  "training": {
    "dataset_version": "1.2.3",
    "dataset_hash": "sha256:abc123...",
    "features": {"user_features": "v3.0", "txn_features": "v2.5"}
  },
  "evaluation": {
    "f1": 0.89, "precision": 0.91, "recall": 0.87,
    "fairness_demographic_parity": 0.05
  },
  "governance": {
    "risk": "high",
    "approvals": ["ml_lead", "security"],
    "limitations": ["Lower recall on new merchant types"]
  },
  "deployment": {"stage": "production", "traffic": 100}
}
```
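Most of this metadata can be captured automatically at training time instead of being filled in by hand. A sketch assuming MLflow and scikit-learn; the tracking URI, toy model, tag names, and metric values mirror the example above and are illustrative:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # illustrative tracking server

# Toy model standing in for the real training job.
X, y = make_classification(n_samples=200, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

with mlflow.start_run() as run:
    # Reproducibility: data and feature lineage as params and tags.
    mlflow.log_params({"dataset_version": "1.2.3", "seed": 42})
    mlflow.set_tags({
        "dataset_hash": "sha256:abc123",
        "feature_versions": "user_features:v3.0,txn_features:v2.5",
        "risk_category": "high",
    })
    # Evaluation metrics from the held-out test set (values illustrative).
    mlflow.log_metrics({"f1": 0.89, "precision": 0.91, "recall": 0.87})
    # Log the artifact and register a new version in one step.
    mlflow.sklearn.log_model(model, artifact_path="model", registered_model_name="fraud_detector")
```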
Promotion Workflow
Approval Requirements by Risk Level:
| Risk Level | Required Approvals | Evaluation Gates | Deployment Strategy | Typical Timeline |
|---|---|---|---|---|
| Low | ML Lead | Basic metrics | Direct to prod | 1 day |
| Medium | ML Lead + Security | Metrics + safety | Canary 5% → 100% | 2-3 days |
| High | ML Lead + Security + Compliance | All evals + fairness | Shadow → Canary → Full | 5-7 days |
| Critical | ML Lead + Security + Compliance + Product + Legal | Comprehensive + external audit | Shadow → Canary 5% → 25% → 50% → 100% (24h stages) | 10-14 days |
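These requirements can be enforced mechanically at promotion time rather than by convention. A minimal, hypothetical helper encoding the matrix above (not part of any registry API):

```python
# Hypothetical encoding of the approval matrix above.
REQUIRED_APPROVALS = {
    "low": {"ml_lead"},
    "medium": {"ml_lead", "security"},
    "high": {"ml_lead", "security", "compliance"},
    "critical": {"ml_lead", "security", "compliance", "product", "legal"},
}

def can_promote(risk_level: str, granted_approvals: set[str]) -> bool:
    """Allow promotion only when every approval required for the risk level is present."""
    missing = REQUIRED_APPROVALS[risk_level] - granted_approvals
    if missing:
        print(f"Blocked: missing approvals {sorted(missing)}")
        return False
    return True

# A high-risk model with only ML lead and security sign-off is blocked.
can_promote("high", {"ml_lead", "security"})                # False: compliance missing
can_promote("high", {"ml_lead", "security", "compliance"})  # True
```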
Semantic Versioning for Models
```mermaid
graph LR
  A[v2.3.1] --> B{Change Type?}
  B -->|Breaking Change<br/>New architecture| C[v3.0.0]
  B -->|New Feature<br/>Additional output| D[v2.4.0]
  B -->|Bug Fix<br/>Same behavior| E[v2.3.2]
  C --> F[Major Version<br/>Incompatible]
  D --> G[Minor Version<br/>Backward Compatible]
  E --> H[Patch Version<br/>Bug Fixes]
```
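The same decision tree can be applied programmatically when a new version is registered. A small, hypothetical helper (not part of any registry API):

```python
def bump_version(version: str, change_type: str) -> str:
    """Bump a model version string like 'v2.3.1' according to the change type."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change_type == "breaking":   # new architecture, incompatible inputs or outputs
        return f"v{major + 1}.0.0"
    if change_type == "feature":    # backward-compatible addition, e.g. an extra output
        return f"v{major}.{minor + 1}.0"
    if change_type == "fix":        # same behavior, bug fix or retrain on corrected data
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change_type}")

assert bump_version("v2.3.1", "breaking") == "v3.0.0"
assert bump_version("v2.3.1", "feature") == "v2.4.0"
assert bump_version("v2.3.1", "fix") == "v2.3.2"
```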
Case Study: Banking Feature & Model Standardization
Background: Large bank with 15+ data science teams building credit risk, fraud, and marketing models independently. Each team computed similar features with different logic.
Problems Before Implementation
| Problem | Impact | Frequency | Cost |
|---|---|---|---|
| Duplicate Feature Pipelines | 8 different "transaction velocity" implementations | Ongoing | $200K/year wasted |
| Training-Serving Skew | 25% of models underperformed in production | Every 4th model | Lost business value |
| No Lineage | 3 weeks to trace model decisions for audits | Every audit | Compliance risk |
| Slow Iteration | New models took 6-8 weeks to production | Every model | Opportunity cost |
| Compliance Gaps | Could not prove which version made decisions | Ongoing | Regulatory risk |
Solution Architecture
```mermaid
graph TB
  subgraph Centralized Feature Store
    A[150 Standard Features] --> B[Offline: Delta Lake on S3]
    A --> C[Online: Redis <10ms]
    B --> D[Training Jobs]
    C --> E[Inference Services]
  end
  subgraph Model Registry
    D --> F[MLflow Registry]
    F --> G[Dev → Staging → Prod]
    G --> H[Approval Workflow]
    H --> I[Deployment]
  end
  J[15 Teams] --> A
  K[40+ Models] --> F
  style A fill:#9f9
  style F fill:#9f9
```
Implementation Phases
| Month | Milestone | Features | Teams | Models | Key Win |
|---|---|---|---|---|---|
| 1 | MVP | 20 critical features | 3 pilot teams | 5 models | Proved concept |
| 2 | Expansion | 50 features | 10 teams | 15 models | Hit tipping point |
| 3 | Growth | 100 features | All 15 teams | 30 models | Full adoption |
| 6 | Maturity | 150 features | All teams | 40+ models | Standardized |
Results After 12 Months
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time to Production | 6-8 weeks | 2-3 weeks | 60% reduction |
| Feature Reuse | 10% | 75% | 7.5x increase |
| Training-Serving Skew Incidents | 12/year | 1/year | 92% reduction |
| Duplicate Feature Pipelines | 8 versions | 1 canonical | Eliminated |
| Compliance Audit Prep | 3 weeks | 2 days | 90% reduction |
| Cost Savings | Baseline | $400K/year | Infrastructure consolidation |
| Developer Satisfaction | 6.2/10 | 8.7/10 | 40% improvement |
Key Success Factors
```mermaid
mindmap
  root((Success Factors))
    Executive Sponsorship
      CTO mandate
      Cross-team priority
    Gradual Rollout
      20 features first
      Proved value
      Then scaled
    Documentation
      Feature catalog
      Examples
      Best practices
    Developer Experience
      Easier than DIY
      Great tooling
      Fast support
    Compliance Focus
      Risk reduction
      Audit trail
      Regulatory win
```
Implementation Checklist
Feature Store Implementation
Phase 1: Foundation (Weeks 1-3)
- Choose platform (Feast, Tecton, cloud-native)
- Set up offline store (data warehouse, Delta Lake)
- Set up online store (Redis, DynamoDB)
- Define feature schema and contracts
- Build pipeline for 5-10 critical features
Phase 2: Core Features (Weeks 4-6)
- Identify 20-30 most-used features across teams
- Implement batch feature computation
- Implement streaming features (if needed)
- Set up materialization (offline → online)
- Add monitoring for freshness/availability
Phase 3: Integration (Weeks 7-9)
- Integrate with training pipelines
- Integrate with serving infrastructure
- Build feature discovery catalog
- Document all features with examples
- Onboard first 2-3 teams
Phase 4: Expansion (Months 3-6)
- Add point-in-time correctness
- Implement backfill capabilities
- Add feature validation and drift detection
- Expand to 100+ features
- Onboard all ML teams
Model Registry Implementation
Phase 1: Setup (Week 1)
- Choose platform (MLflow, SageMaker, Vertex AI)
- Set up central tracking server
- Define model metadata schema
- Set up artifact storage (S3, GCS, Azure Blob)
Phase 2: Workflows (Weeks 2-3)
- Define lifecycle stages and promotion criteria
- Implement approval workflows
- Set up automated registration from CI/CD (see the sketch after this checklist)
- Create model card templates
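For the automated-registration item above, a sketch assuming MLflow's classic stage-based registry (newer MLflow releases favor model aliases); the model name, metric, and "only register if better than production" policy are illustrative:

```python
import mlflow
from mlflow.tracking import MlflowClient

MODEL_NAME = "fraud_detector"  # illustrative
client = MlflowClient()

def register_if_better(run_id: str, metric: str = "f1") -> None:
    """Called from CI after training: register the run's model and stage it if it beats production."""
    candidate_score = client.get_run(run_id).data.metrics[metric]

    # Score of the current production version (0.0 if none exists yet).
    prod_versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
    prod_score = (
        client.get_run(prod_versions[0].run_id).data.metrics[metric] if prod_versions else 0.0
    )

    if candidate_score <= prod_score:
        print(f"Skipping: {metric}={candidate_score:.3f} does not beat production ({prod_score:.3f})")
        return

    version = mlflow.register_model(f"runs:/{run_id}/model", MODEL_NAME)
    client.transition_model_version_stage(MODEL_NAME, version.version, stage="Staging")
```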
Phase 3: Governance (Weeks 4-6)
- Add lineage tracking
- Implement risk-based approval matrix
- Set up compliance evidence collection
- Create rollback procedures
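A rollback procedure can be as simple as re-pointing the Production stage at the previous known-good version. A sketch assuming MLflow's stage-based registry; the model name and version numbers are illustrative:

```python
from mlflow.tracking import MlflowClient

MODEL_NAME = "fraud_detector"  # illustrative
client = MlflowClient()

def rollback(bad_version: int, good_version: int) -> None:
    """Demote the faulty version and restore the previous known-good one."""
    # Archive the version that is misbehaving in production.
    client.transition_model_version_stage(MODEL_NAME, str(bad_version), stage="Archived")
    # Promote the previous version back; serving that resolves
    # "models:/fraud_detector/Production" picks it up on its next reload.
    client.transition_model_version_stage(MODEL_NAME, str(good_version), stage="Production")
    client.set_model_version_tag(MODEL_NAME, str(good_version), "rollback_from", str(bad_version))

# Example: version 24 caused an incident, restore version 23.
rollback(bad_version=24, good_version=23)
```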
Phase 4: Optimization (Ongoing)
- Add model comparison dashboards
- Implement automated validation
- Set up performance monitoring
- Regular cleanup of archived models
Best Practices
Feature Store
| Practice | Why | How |
|---|---|---|
| Design for Reusability | Avoid duplication | Generic features, not model-specific |
| Document Everything | Enable discovery | Name, description, owner, SLA, examples |
| Monitor Feature Health | Catch issues early | Freshness, availability, drift, usage |
| Version Carefully | Manage breaking changes | Semantic versioning (v2.1.0) |
| Point-in-Time Joins | Prevent data leakage | Use historical feature values |
Model Registry
| Practice | Why | How |
|---|---|---|
| Automate Registration | Reduce errors | Auto-register from training pipeline |
| Enforce Stage Gates | Maintain quality | Validation at each promotion |
| Immutable Lineage | Enable compliance | Never modify, always append |
| Risk-Based Approval | Right oversight level | Match rigor to impact |
| Complete Metadata | Enable debugging | Training, eval, governance, deployment |
Success Metrics
Feature Store Metrics
| Metric | Target | Indicates |
|---|---|---|
| Feature Reuse Rate | >60% | Effective sharing |
| Training-Serving Skew Incidents | <2/year | Consistency achieved |
| Feature Freshness SLA | >99% | Reliable data |
| Time to Add New Feature | <1 week | Operational efficiency |
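As one example, the freshness SLA above can be checked by a scheduled probe that compares the newest materialized timestamp against the target. A minimal, illustrative sketch; how you obtain `last_materialized_at` (materialization metadata, or the max event_timestamp in the online store) is platform-specific:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)  # target from the feature contract

def check_freshness(feature_view: str, last_materialized_at: datetime) -> bool:
    """Alert when the newest materialized feature row is older than the SLA."""
    age = datetime.now(timezone.utc) - last_materialized_at
    if age > FRESHNESS_SLA:
        print(f"ALERT {feature_view}: {age} since last update exceeds SLA of {FRESHNESS_SLA}")
        return False
    return True

# Example probe, e.g. run from a 5-minute cron job.
check_freshness(
    "user_transaction_features",
    last_materialized_at=datetime.now(timezone.utc) - timedelta(minutes=9),
)
```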
Model Registry Metrics
| Metric | Target | Indicates |
|---|---|---|
| Models Registered | 100% of production | Complete governance |
| Mean Time to Promote | <3 days | Efficient process |
| Audit Prep Time | <1 day | Automated compliance |
| Rollback Time | <10 min | Operational maturity |