Part 10: MLOps & Platform Engineering

Chapter 60: Deployment Patterns


Overview

Choose cloud, on-prem, hybrid, or edge; manage risk with blue/green and canary. Deployment patterns are architectural decisions that fundamentally shape system reliability, cost, compliance posture, and iteration velocity. The right pattern balances business constraints (security, compliance, latency) with operational realities (team skills, budget, scale). Organizations with mature deployment strategies achieve 99.9%+ availability with deployment frequencies of 10-50x per day.

Key Objectives

  • Select appropriate deployment environments (cloud, on-prem, hybrid, edge)
  • Implement progressive delivery patterns (canary, blue/green, shadow)
  • Design multi-region failover and disaster recovery
  • Establish secrets management and identity controls
  • Create runbooks for deployment and rollback procedures

Deliverables

  • Deployment architecture diagrams and decision rationale
  • Infrastructure-as-code templates (Terraform, Kubernetes manifests)
  • Deployment runbook and automated rollback procedures
  • Disaster recovery plan with RTO/RPO targets
  • Multi-region architecture with failover automation

Why It Matters

Deployment patterns determine three critical business outcomes: reliability (can users access your AI?), velocity (how fast can you improve?), and risk exposure (what's the blast radius of failures?). The right choices enable daily deployments with confidence; wrong choices create week-long release cycles riddled with incidents.

Critical Questions Deployment Patterns Answer:

  • Where do models run? (Cloud vs on-prem vs edge)
  • How do updates roll out? (Big bang vs progressive)
  • What happens when failures occur? (Automatic vs manual recovery)
  • How quickly can we iterate? (Minutes vs days)
  • Can we meet compliance requirements? (Data residency, sovereignty)

Common Deployment Mistakes:

  • Over-Engineering: Small startup builds multi-region Kubernetes when a single managed endpoint would suffice
  • Under-Engineering: Enterprise deploys mission-critical fraud model with no redundancy or rollback plan
  • Vendor Lock-In: All infrastructure tightly coupled to one cloud provider, migration becomes impossible
  • No Rollback Plan: "We'll just fix forward" leads to extended outages during incidents
  • No Testing in Production: Assuming staging behaves exactly like production, then being surprised when it doesn't

Deployment Environment Patterns Decision Tree

flowchart TD
    A[Deployment Decision] --> B{Primary Constraint?}
    B -->|Cost Optimization| C{Scale?}
    B -->|Compliance/Data Residency| D[On-Prem or<br/>Regional Cloud]
    B -->|Global Low Latency| E[Multi-Region Cloud<br/>or Edge]
    B -->|Speed to Market| F[Cloud Managed]
    C -->|Small/Variable| F
    C -->|Large/Stable| G[On-Prem]
    D --> H{Experimentation Needed?}
    H -->|Yes| I[Hybrid:<br/>On-Prem + Cloud]
    H -->|No| G
    E --> J{User Device Capable?}
    J -->|Yes| K[Edge Deployment]
    J -->|No| L[Multi-Region Cloud]
    F --> M[Choose Cloud Provider]
    G --> N[Setup Infrastructure]
    I --> O[Dual Environment]
    K --> P[Mobile/IoT Deploy]
    L --> Q[Global CDN + Regional]
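
Read as code, the same branching logic looks like the minimal sketch below. It is purely illustrative; the constraint and scale labels are assumptions that mirror the diagram, not a formal policy engine.

# deployment_decision.py — illustrative sketch of the decision tree above.
def choose_deployment(primary_constraint: str,
                      scale: str = "small_or_variable",
                      needs_experimentation: bool = False,
                      device_capable: bool = False) -> str:
    """Return a deployment pattern name for the given constraints."""
    if primary_constraint == "speed_to_market":
        return "cloud_managed"
    if primary_constraint == "cost":
        return "cloud_managed" if scale == "small_or_variable" else "on_prem"
    if primary_constraint == "compliance":
        return "hybrid" if needs_experimentation else "on_prem_or_regional_cloud"
    if primary_constraint == "global_low_latency":
        return "edge" if device_capable else "multi_region_cloud"
    raise ValueError(f"Unknown constraint: {primary_constraint}")

print(choose_deployment("compliance", needs_experimentation=True))  # -> hybrid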

Deployment Environment Patterns

1. Cloud Deployment

flowchart TB
    subgraph Cloud["Cloud Provider (AWS/GCP/Azure)"]
        LB[Load Balancer] --> GW[API Gateway]
        GW --> TS{Traffic Split}
        TS -->|90%| PROD[Production v2.3<br/>3 replicas]
        TS -->|10%| CAN[Canary v2.4<br/>1 replica]
        PROD --> REG[Model Registry]
        CAN --> REG
        REG --> FS[Feature Store]
        FS --> VDB[Vector DB]
        PROD --> MON[Monitoring]
        CAN --> MON
        MON --> RB{Metrics OK?}
        RB -->|No| ROLL[Auto-Rollback<br/>to v2.3]
    end
    USERS[Users] --> LB

Pros:

  • Managed services reduce operational burden (managed K8s, serverless, etc.)
  • Elasticity: scale from 1 to 10,000 instances automatically
  • Global reach: deploy to 20+ regions in hours
  • Pay-as-you-go: no upfront infrastructure investment
  • Rapid experimentation: spin up/down environments easily

Cons:

  • Vendor lock-in (hard to migrate once deeply integrated)
  • Data residency challenges (some countries require local storage)
  • Cost unpredictability (usage spikes can balloon bills)
  • Shared infrastructure security concerns
  • Less control over hardware optimization

Best For:

  • Startups and scale-ups prioritizing speed over control
  • Variable workloads with unpredictable traffic patterns
  • Global applications requiring multi-region presence
  • Teams lacking deep infrastructure expertise

Cloud Platform Comparison:

| Platform | Strengths | ML-Specific Features | Pricing Model | Lock-In Risk |
| --- | --- | --- | --- | --- |
| AWS SageMaker | Mature ecosystem, most regions | End-to-end ML platform, model registry | Pay per compute + storage | Medium-High |
| GCP Vertex AI | Best AI/ML tools, TPU access | Unified platform, AutoML | Pay per compute + storage | Medium-High |
| Azure ML | Enterprise integration, hybrid | Strong enterprise features | Pay per compute + storage | Medium-High |
| Modal | Serverless, developer-friendly | Auto-scaling, spot instances | Pay per second of GPU use | Low |
| Replicate | Simple API, model hosting | Pre-built models, API-first | Pay per prediction | Low |

2. On-Premises Deployment

Architecture:

graph TB subgraph "On-Prem Data Center" A[Hardware Load Balancer] --> B[Reverse Proxy] B --> C[Kubernetes Cluster] C --> D[Model Pods<br/>10 replicas] D --> E[GPU Nodes] D --> F[Internal Model Registry] F --> G[Internal Feature Store] D --> H[On-Prem Monitoring] H --> I[AlertManager] end J[Internal Users<br/>Corp Network] --> A subgraph "Security Layer" K[VPN Gateway] L[Identity Provider] M[Secrets Vault] end J --> K D --> M

Pros:

  • Complete control over infrastructure and data
  • No vendor lock-in
  • Predictable costs (capex vs opex)
  • Compliance and data sovereignty requirements met
  • Hardware optimization possible (custom GPUs, etc.)

Cons:

  • High upfront capital expenditure
  • Slower provisioning (weeks vs minutes)
  • Requires deep infrastructure expertise
  • Limited scalability (bound by physical hardware)
  • Higher operational burden (patching, maintenance)

Best For:

  • Regulated industries (finance, healthcare, government)
  • Organizations with strict data residency requirements
  • Workloads with stable, predictable demand
  • Companies with existing data center investments

Cost Comparison Example:

| Aspect | Cloud (3-year) | On-Prem (3-year) |
| --- | --- | --- |
| 4x A100 GPUs | $280K-400K | $120K (hardware) |
| Infrastructure | Included | $50K (network, storage) |
| Operations | $60K (platform fees) | $180K (staff) |
| Total | $340K-460K | $350K |
| Break-Even | N/A | ~2 years |

On-prem becomes cheaper at scale and with a long-term commitment; cloud stays cheaper for variable or bursty workloads.
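
To see roughly where that break-even lands, the sketch below runs the table's illustrative figures through a cumulative-cost comparison. Spreading cloud spend evenly per month (using the midpoint of the range) is a simplifying assumption.

# tco_breakeven.py — back-of-the-envelope comparison using the illustrative
# figures from the table above; even monthly amortization is an assumption.
MONTHS = 36

# Cloud: roughly $340K-$460K over 3 years -> use midpoint, spread evenly.
cloud_total = (340_000 + 460_000) / 2
cloud_monthly = cloud_total / MONTHS

# On-prem: hardware + infrastructure paid up front, staff cost spread monthly.
onprem_upfront = 120_000 + 50_000       # 4x A100 GPUs + network/storage
onprem_ops_monthly = 180_000 / MONTHS   # operations staff

def cumulative_cost(month: int) -> tuple[float, float]:
    cloud = cloud_monthly * month
    onprem = onprem_upfront + onprem_ops_monthly * month
    return cloud, onprem

for month in range(1, MONTHS + 1):
    cloud, onprem = cumulative_cost(month)
    if cloud >= onprem:
        print(f"Break-even around month {month}: "
              f"cloud ${cloud:,.0f} vs on-prem ${onprem:,.0f}")
        break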

3. Hybrid Deployment

Architecture:

graph TB subgraph "On-Prem (Secure Zone)" A[Sensitive Model<br/>PII Processing] B[Feature Store<br/>Customer Data] A --> B end subgraph "Cloud (Experimentation Zone)" C[Experiment Tracking] D[Model Training<br/>On Synthetic Data] E[Staging Environment] C --> D D --> E end subgraph "Edge (Low Latency)" F[CDN Endpoints] G[Edge Models<br/>Quantized] end H[Internal Users] --> A I[External Users] --> F D -->|Approved Models| A A -->|Anonymized Metrics| C F -->|Fallback| J[Cloud Models]

Pros:

  • Best of both worlds: control where needed, flexibility elsewhere
  • Optimize cost by workload (cheap cloud burst, on-prem baseline)
  • Compliance meets agility (sensitive data on-prem, experiments in cloud)
  • Gradual cloud migration path

Cons:

  • Increased complexity managing multiple environments
  • Network latency between on-prem and cloud
  • Challenging to maintain consistency across environments
  • Requires expertise in multiple platforms

Best For:

  • Large enterprises with legacy infrastructure modernizing
  • Regulated workloads with some non-sensitive components
  • Organizations testing cloud before full commitment
  • Applications with mixed latency/security requirements

Hybrid Strategy Example (Bank):

# Workload placement strategy
workloads:
  fraud_detection:
    location: on-prem
    reason: "PII data, regulatory requirement"
    fallback: none

  customer_support_bot:
    location: cloud
    reason: "Variable load, not sensitive"
    fallback: on-prem (degraded mode)

  model_training:
    location: cloud
    reason: "Bursty compute, synthetic data OK"
    data_sync: "Pull anonymized samples from on-prem nightly"

  real_time_scoring:
    location: edge + cloud
    reason: "Global users, low latency required"
    fallback: "Edge → Regional Cloud → Central Cloud"

4. Edge Deployment

Architecture:

graph TB subgraph "Edge Devices" A[Mobile App] B[IoT Device] C[Edge Server] A --> D[Quantized Model<br/>50MB] B --> E[Tiny Model<br/>5MB] C --> F[Medium Model<br/>500MB] end subgraph "Regional Cloud" G[Model Update Service] H[Telemetry Collector] end subgraph "Central Cloud" I[Model Training] J[Model Registry] end D -.->|OTA Updates| G E -.->|OTA Updates| G F -.->|OTA Updates| G D -.->|Metrics| H E -.->|Metrics| H F -.->|Metrics| H G --> J H --> I I --> J

Pros:

  • Ultra-low latency (<50ms, no network roundtrip)
  • Privacy (data never leaves device)
  • Works offline
  • Reduced cloud costs (fewer API calls)

Cons:

  • Limited compute/memory on edge devices
  • Model updates require OTA (over-the-air) deployment
  • Harder to monitor and debug
  • Fragmentation (many device types, OS versions)

Best For:

  • Mobile applications (camera AI, voice assistants)
  • IoT/robotics (autonomous vehicles, drones)
  • Privacy-critical applications
  • Offline-first applications

Edge Deployment Challenges:

| Challenge | Solution |
| --- | --- |
| Large models won't fit | Quantization (INT8/INT4), pruning, distillation |
| Different devices | Multiple model variants (mobile, tablet, desktop) |
| Model updates | OTA with version rollback, gradual rollout |
| Monitoring gaps | Local telemetry, periodic sync to cloud |
| Offline functionality | Cache last N results, graceful degradation |
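
As a concrete example of the first row, the sketch below applies dynamic INT8 quantization in PyTorch to shrink a model before edge delivery; the tiny model here is a toy stand-in for a real edge candidate.

# quantize_for_edge.py — one common shrinking technique (dynamic INT8
# quantization in PyTorch); the model below is a toy stand-in.
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Quantize Linear weights to INT8; activations stay in float at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the state dict to measure on-disk size."""
    with tempfile.NamedTemporaryFile(suffix=".pt") as f:
        torch.save(m.state_dict(), f.name)
        return os.path.getsize(f.name) / 1e6

print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")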

Release Safety Patterns

1. Blue/Green Deployment

How It Works:

graph LR
    A[Users] --> B[Load Balancer]
    subgraph "Blue (Current Production)"
        C[v2.3<br/>100% traffic]
    end
    subgraph "Green (New Version)"
        D[v2.4<br/>0% traffic]
    end
    B -->|100%| C
    B -.->|0%| D
    E[Deploy v2.4] --> D
    F[Validate Green] --> D
    G[Switch Traffic] --> B
    G -->|Now 0%| C
    G -->|Now 100%| D

Implementation:

# blue_green_deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: model-service
spec:
  selector:
    app: model
    version: v2.3  # Switch to v2.4 to cutover
  ports:
    - port: 80
      targetPort: 8080

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model
      version: v2.3
  template:
    metadata:
      labels:
        app: model
        version: v2.3
    spec:
      containers:
      - name: model
        image: model:v2.3

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model
      version: v2.4
  template:
    metadata:
      labels:
        app: model
        version: v2.4
    spec:
      containers:
      - name: model
        image: model:v2.4
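
The cutover itself is a one-field change to the Service selector. The sketch below shows one way to automate it with the Kubernetes Python client, assuming kubeconfig access and the manifests above; the "default" namespace is an assumption.

# cutover.py — minimal sketch of the blue/green traffic switch: patch the
# Service selector from v2.3 (blue) to v2.4 (green).
from kubernetes import client, config

def cutover(service_name: str = "model-service",
            namespace: str = "default",
            target_version: str = "v2.4") -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    patch = {"spec": {"selector": {"app": "model", "version": target_version}}}
    v1.patch_namespaced_service(service_name, namespace, patch)
    print(f"Traffic now routed to {target_version}")

def rollback(service_name: str = "model-service",
             namespace: str = "default") -> None:
    # Instant rollback: point the selector back at the blue deployment.
    cutover(service_name, namespace, target_version="v2.3")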

Pros:

  • Instant rollback (just switch traffic back)
  • Zero downtime deployments
  • Full validation before cutover

Cons:

  • 2x infrastructure cost during deployment
  • Database migrations challenging (both versions must work with same schema)
  • Not suitable for stateful applications

2. Canary Deployment

How It Works:

sequenceDiagram
    participant Users
    participant LB as Load Balancer
    participant V23 as v2.3 (stable)
    participant V24 as v2.4 (canary)
    participant Monitor
    Users->>LB: Request
    LB->>V23: 95% traffic
    LB->>V24: 5% traffic
    V24->>Monitor: Metrics (latency, errors, quality)
    alt Metrics Good
        Monitor->>LB: Increase canary to 25%
        LB->>V24: 25% traffic
        LB->>V23: 75% traffic
    else Metrics Bad
        Monitor->>LB: Rollback canary
        LB->>V23: 100% traffic
        LB->>V24: 0% traffic (terminate)
    end

Automated Canary with Flagger:

# canary_deployment.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: model-canary
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5  # Max failed checks before rollback
    maxWeight: 50 # Max canary traffic
    stepWeight: 10 # Increment by 10% each step

    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99  # 99% success rate required
      interval: 1m

    - name: request-duration
      thresholdRange:
        max: 500  # P95 latency < 500ms
      interval: 1m

    - name: model-accuracy
      templateRef:
        name: model-accuracy
        namespace: monitoring
      thresholdRange:
        min: 0.85  # F1 score > 0.85
      interval: 5m

  webhooks:
    - name: load-test
      url: http://load-tester/
      timeout: 5s
      metadata:
        type: cmd
        cmd: "hey -z 1m -q 10 -c 2 http://model-canary/"

Progressive Stages:

  1. 5% for 10 minutes (detect obvious breakages)
  2. 25% for 30 minutes (catch edge cases)
  3. 50% for 1 hour (validate at scale)
  4. 100% (full rollout if all gates pass)
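
Flagger drives these stages automatically from the Canary resource above; for teams rolling their own, the sketch below shows the same gate logic in plain Python. get_metrics() and set_traffic_weight() are hypothetical hooks into your monitoring and routing layers, and the thresholds mirror the analysis config above.

# progressive_rollout.py — the staged plan above expressed as a gate loop.
import time

STAGES = [          # (canary weight %, soak time in seconds)
    (5, 10 * 60),
    (25, 30 * 60),
    (50, 60 * 60),
    (100, 0),
]

def metrics_healthy(metrics: dict) -> bool:
    return (metrics["success_rate"] >= 0.99
            and metrics["p95_latency_ms"] <= 500
            and metrics["f1_score"] >= 0.85)

def progressive_rollout(get_metrics, set_traffic_weight) -> bool:
    for weight, soak_seconds in STAGES:
        set_traffic_weight(canary_pct=weight)
        time.sleep(soak_seconds)
        if not metrics_healthy(get_metrics()):
            set_traffic_weight(canary_pct=0)   # automatic rollback
            return False
    return True                                 # full rollout succeeded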

3. Shadow Deployment

flowchart LR
    A[User Request] --> B[Load Balancer]
    B --> C[Production v2.3<br/>User sees this]
    B -.->|Copy Traffic| D[Shadow v2.4<br/>Silent comparison]
    C --> E[Return to User]
    D --> F[Compare Outputs]
    F --> G{Similarity > 95%?}
    G -->|Yes| H[Promote v2.4]
    G -->|No| I[Investigate<br/>Differences]
    style D stroke-dasharray: 5 5

Simplified Shadow Implementation:

# shadow.py
import asyncio
import logging

logger = logging.getLogger(__name__)

class ShadowDeployment:
    def __init__(self, prod_model, shadow_model):
        self.prod = prod_model
        self.shadow = shadow_model
        self.comparisons = []

    async def predict(self, input_data):
        # Run both models in parallel
        prod_task = asyncio.create_task(self.prod.predict(input_data))
        shadow_task = asyncio.create_task(self.shadow.predict(input_data))

        # Wait only for production (the user-facing path)
        prod_result = await prod_task

        # Compare asynchronously (don't block the user)
        asyncio.create_task(self._compare(input_data, prod_task, shadow_task))

        return prod_result

    async def _compare(self, input_data, prod_task, shadow_task):
        # calc_similarity() and alert() are left to the concrete implementation
        try:
            prod_out = await prod_task      # already resolved; returns cached result
            shadow_out = await shadow_task
            similarity = self.calc_similarity(prod_out, shadow_out)
            self.comparisons.append({
                "input": input_data,
                "similarity": similarity,
                "prod": prod_out,
                "shadow": shadow_out
            })
            if similarity < 0.9:
                self.alert(f"Divergence detected: {similarity}")
        except Exception as e:
            logger.warning(f"Shadow failed: {e}")  # Never impact users

    def report(self):
        if not self.comparisons:
            return {"avg_similarity": None, "recommendation": "INSUFFICIENT_DATA"}
        sims = [c["similarity"] for c in self.comparisons]
        avg_sim = sum(sims) / len(sims)
        return {
            "avg_similarity": avg_sim,
            "recommendation": "PROMOTE" if avg_sim > 0.95 else "INVESTIGATE"
        }

When to Use Shadow:

  • High-risk changes (model architecture change, provider switch)
  • Need production traffic data for validation
  • Can't afford even small user impact

Trade-offs:

  • Pro: Zero user risk, real production data
  • Con: 2x compute cost, can't test user-facing metrics

4. Feature Flags

flowchart TD
    A[User Request] --> B{Flag Enabled?}
    B -->|Yes| C{Beta User?}
    B -->|No| D[Production Model]
    C -->|Yes| E[New Model v2.4]
    C -->|No| F{User Hash %?}
    F -->|< Rollout %| E
    F -->|≥ Rollout %| D
    E --> G[Return Result]
    D --> G
    H[Admin Dashboard] -.->|Toggle Flag| B
    H -.->|Adjust Rollout %| F

Simplified Feature Flag Implementation:

# flags.py
import hashlib

class FeatureFlags:
    def __init__(self, flag_service):
        self.flags = flag_service

    def get_model(self, user_id):
        # Beta users always get the new model
        if self.flags.is_enabled("new_model_beta", user_id):
            return new_model

        # Gradual rollout by percentage
        rollout_pct = self.flags.get_value("rollout_percentage", default=0)
        if self._hash_user(user_id) < rollout_pct:
            return new_model
        return prod_model

    def _hash_user(self, user_id):
        """Deterministically map a user to a bucket in 0-99"""
        return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

# API usage (new_model, prod_model, flags, and app are defined elsewhere)
@app.post("/predict")
async def predict(req):
    model = flags.get_model(req.user_id)
    return await model.predict(req.input)

LaunchDarkly Integration:

# With LaunchDarkly
import ldclient
from ldclient.config import Config

ldclient.set_config(Config("YOUR_SDK_KEY"))  # initialize once at startup
ld = ldclient.get()
user = {"key": user_id, "custom": {"plan": "enterprise"}}

# Boolean flag
if ld.variation("new-model-enabled", user, False):
    model = new_model

# Multivariate (A/B/C test)
variant = ld.variation("model-variant", user, "control")
# Returns: "control", "variant_a", or "variant_b"

Benefits:

  • Instant rollback - Toggle flag, no deployment
  • Targeted rollouts - Beta users, segments, percentages
  • A/B testing - Built-in experimentation infrastructure
  • Kill switch - Emergency disable without code changes

Multi-Region & High Availability

graph TB subgraph "Region: US-East" A[Load Balancer US-E] B[Model Pods<br/>3 replicas] C[Feature Store<br/>Read Replica] end subgraph "Region: EU-West" D[Load Balancer EU-W] E[Model Pods<br/>3 replicas] F[Feature Store<br/>Read Replica] end subgraph "Region: Asia-Pacific" G[Load Balancer AP] H[Model Pods<br/>3 replicas] I[Feature Store<br/>Read Replica] end J[Global Load Balancer<br/>Geo-Routing] --> A J --> D J --> G K[Primary Feature Store<br/>US-East] -.Replicate.-> C K -.Replicate.-> F K -.Replicate.-> I L[Users Worldwide] --> J

Multi-Region Strategy:

# multi_region_config.yaml
regions:
  us-east-1:
    role: primary
    features:
      - Model serving
      - Feature store primary
      - Model training
    failover_to: us-west-2

  us-west-2:
    role: secondary
    features:
      - Model serving
      - Feature store replica
    failover_to: us-east-1

  eu-west-1:
    role: regional
    features:
      - Model serving (GDPR compliance)
      - Feature store replica
    failover_to: eu-central-1
    data_residency: true

routing:
  strategy: geo_proximity
  health_checks:
    interval: 30s
    timeout: 5s
    unhealthy_threshold: 3

  failover:
    automatic: true
    max_latency_increase: 200ms  # Failover if latency increases >200ms
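
A minimal sketch of the health-check and failover loop implied by this config is shown below; the region endpoints and the reroute() hook are hypothetical.

# region_failover.py — illustrative health-check/failover cycle matching the
# config above; endpoints and the routing hook are hypothetical.
import httpx

REGIONS = {
    "us-east-1": {"endpoint": "https://us-east-1.example.com/healthz", "failover_to": "us-west-2"},
    "us-west-2": {"endpoint": "https://us-west-2.example.com/healthz", "failover_to": "us-east-1"},
    "eu-west-1": {"endpoint": "https://eu-west-1.example.com/healthz", "failover_to": "eu-central-1"},
}
UNHEALTHY_THRESHOLD = 3
failure_counts = {region: 0 for region in REGIONS}

def check_region(region: str, timeout_s: float = 5.0) -> bool:
    try:
        resp = httpx.get(REGIONS[region]["endpoint"], timeout=timeout_s)
        return resp.status_code == 200
    except httpx.HTTPError:
        return False

def run_health_cycle(reroute) -> None:
    """One 30s-interval cycle; reroute() is a hypothetical hook into geo-routing."""
    for region in REGIONS:
        if check_region(region):
            failure_counts[region] = 0
            continue
        failure_counts[region] += 1
        if failure_counts[region] >= UNHEALTHY_THRESHOLD:
            reroute(from_region=region, to_region=REGIONS[region]["failover_to"])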

Case Study: Regulated Bank Hybrid Deployment

Background: A multinational bank needed to deploy fraud detection ML models while meeting:

  • Data residency requirements (EU customer data must stay in EU)
  • Real-time latency (<100ms P95)
  • 99.99% availability
  • Regulatory audit trails

Initial Attempt (Failed):

  • All models on-prem in single data center
  • Manual deployments taking 2-3 weeks
  • No redundancy (single point of failure)
  • Couldn't scale for peak loads (month-end processing)

Hybrid Solution Implemented:

1. On-Premises (Secure Zone):

workloads:
  - fraud_detection_scoring:
      location: on-prem
      infrastructure: Kubernetes on bare metal
      replicas: 10 (min) to 50 (max)
      data: Production transaction data (PII)
      latency_target: 50ms P95

  - feature_store:
      location: on-prem
      infrastructure: Redis + PostgreSQL
      replication: Active-passive across 2 data centers
      data: Customer features (PII)

2. Cloud (Experimentation Zone):

workloads:
  - model_training:
      location: AWS (eu-central-1)
      infrastructure: SageMaker
      data: Synthetic + anonymized historical data
      cost_optimization: Spot instances for training

  - model_experimentation:
      location: AWS
      infrastructure: SageMaker Studio
      data: Synthetic data only
      users: Data science team

  - ci_cd_pipeline:
      location: AWS
      infrastructure: CodePipeline + ECS
      purpose: Automated testing and validation

3. Deployment Pipeline:

graph LR
    A[Data Scientist] -->|Experiment| B[Cloud: SageMaker]
    B -->|Champion Model| C[Cloud: Model Registry]
    C -->|Approval| D[Compliance Review]
    D -->|Approved| E[On-Prem: Staging]
    E -->|Validation| F{Metrics Pass?}
    F -->|Yes| G[On-Prem: Canary 5%]
    F -->|No| H[Reject]
    G -->|Monitor 1h| I{Canary Metrics OK?}
    I -->|Yes| J[On-Prem: Production 100%]
    I -->|No| K[Auto Rollback]

4. Feature Flags for Safety:

# Gradual rollout with instant rollback
feature_flags:
  new_fraud_model_v2:
    enabled: true
    rollout_percentage: 10  # Start at 10%
    segments:
      - internal_transactions  # Test on internal first
    kill_switch: true  # Can disable instantly

  high_risk_transaction_threshold:
    value: 0.85  # Can tune without deployment
    override_by_region:
      EU: 0.90  # More conservative in EU

Results After 12 Months:

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Deployment Frequency | 1x/month | 3x/week | 12x faster |
| Deployment Time | 2-3 weeks | 2 hours | 97% reduction |
| Mean Time to Rollback | 4 hours | 2 minutes | 99% reduction |
| Availability | 99.5% | 99.98% | Higher reliability |
| False Positive Rate | 3.2% | 2.1% | 34% improvement |
| P95 Latency | 85ms | 62ms | 27% improvement |
| Cloud Costs (training) | $0 (limited training) | $12K/month | Investment in velocity |
| On-Prem Costs | $450K/year | $380K/year | 16% reduction (better utilization) |

Key Success Factors:

  1. Clear boundary: On-prem for production, cloud for experimentation
  2. Automated testing: Synthetic data in cloud validates models before on-prem deployment
  3. Progressive rollout: Canary + feature flags caught 5 regressions before user impact
  4. Compliance-first: All controls in place before deployment, not retrofitted

Implementation Checklist

Phase 1: Pattern Selection (Week 1)

  • Document requirements: latency, compliance, budget, scale
  • Evaluate cloud vs on-prem vs hybrid vs edge
  • Assess team skills and operational capacity
  • Choose deployment pattern(s)
  • Get architectural approval from stakeholders

Phase 2: Infrastructure Setup (Weeks 2-4)

  • Provision infrastructure (cloud accounts, on-prem hardware, etc.)
  • Set up CI/CD pipelines
  • Configure identity and access management
  • Implement secrets management
  • Set up monitoring and logging

Phase 3: Release Safety (Weeks 5-6)

  • Choose release strategy (canary, blue/green, shadow)
  • Implement progressive rollout automation
  • Set up feature flag system
  • Define rollback procedures
  • Test rollback in staging

Phase 4: Multi-Region (Weeks 7-9, if needed)

  • Identify regions for deployment
  • Set up geo-routing
  • Implement data replication
  • Configure failover automation
  • Test failover scenarios

Phase 5: Runbooks & Training (Week 10)

  • Write deployment runbooks
  • Document rollback procedures
  • Create incident response playbooks
  • Train team on deployment procedures
  • Conduct disaster recovery drill

Phase 6: Production Hardening (Ongoing)

  • Regular failover testing (quarterly)
  • Optimize deployment speed
  • Review and update runbooks after incidents
  • Continuous cost optimization
  • Security audits and compliance reviews

Success Metrics

  • Deployment Frequency: Daily deployments without incidents
  • Lead Time: <1 hour from code commit to production
  • MTTR: <5 minutes with automated rollback
  • Change Failure Rate: <2% of deployments cause incidents
  • Availability: >99.9% uptime
  • Deployment Confidence: Team deploys without fear
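
These metrics are straightforward to compute from a deployment log, as the sketch below shows; the record format is a hypothetical example, and the calculations are simple averages and rates.

# deploy_metrics.py — compute lead time, change failure rate, and MTTR from a
# deployment log; the record format here is a hypothetical example.
from datetime import datetime, timedelta

deployments = [  # hypothetical log entries
    {"committed": datetime(2024, 6, 1, 9, 0), "deployed": datetime(2024, 6, 1, 9, 40),
     "caused_incident": False, "recovery_minutes": None},
    {"committed": datetime(2024, 6, 2, 14, 0), "deployed": datetime(2024, 6, 2, 14, 55),
     "caused_incident": True, "recovery_minutes": 4},
]

lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)
change_failure_rate = sum(d["caused_incident"] for d in deployments) / len(deployments)
recoveries = [d["recovery_minutes"] for d in deployments if d["recovery_minutes"] is not None]

print(f"Avg lead time: {avg_lead_time}")                  # target: < 1 hour
print(f"Change failure rate: {change_failure_rate:.0%}")  # target: < 2%
print(f"MTTR: {sum(recoveries) / len(recoveries):.1f} min"
      if recoveries else "MTTR: n/a")                     # target: < 5 minutes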

Release Strategy Comparison

| Strategy | Deployment Speed | Rollback Speed | Infrastructure Cost | Risk Exposure | Validation Quality | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| Big Bang | 5-10 min | 10-30 min | 0% overhead | 100% users | Low | Low-risk changes, small projects |
| Blue/Green | 10-15 min | 1-2 min | 100% (temporary 2x) | 50% momentary | Medium | Critical services, instant rollback |
| Canary | 30-120 min | 1-2 min | 10-25% | 5-25% users | High | Standard practice, most ML deployments |
| Shadow | 2-4 hours | N/A (no user traffic) | 100% (permanent) | 0% users | Very High | High-risk changes, provider switches |
| Feature Flags | Instant | Instant | ~0% | 0-100% (configurable) | Medium-High | Gradual rollouts, A/B tests |
| Progressive | 4-8 hours | 1-2 min | 25-50% | 5% → 100% | Very High | Production best practice |

Deployment Pattern Decision Matrix

| If You Need... | Choose... | Why |
| --- | --- | --- |
| Fastest time to market | Cloud (managed services) | No infrastructure setup, instant scaling |
| Lowest long-term cost at scale | On-prem | No cloud markups, hardware depreciation |
| Data must stay in-country | On-prem or regional cloud | Compliance requirements |
| Global users, low latency | Multi-region cloud or edge | Geo-distributed serving |
| Variable/unpredictable load | Cloud with autoscaling | Pay for what you use |
| Privacy (data never leaves device) | Edge deployment | No network transfer |
| Experimentation + compliance | Hybrid (cloud + on-prem) | Best of both worlds |
| Instant rollback required | Blue/green or feature flags | Zero-downtime rollback |
| High-risk model changes | Shadow deployment first | Validate before user impact |
| Gradual user rollout | Canary + feature flags | Progressive risk exposure |

Environment Selection Matrix

| Requirement | Cloud | On-Prem | Hybrid | Edge |
| --- | --- | --- | --- | --- |
| Setup Time | Days | Months | Months | Weeks |
| Operational Complexity | Low | High | Very High | Medium |
| Cost (small scale <10 GPUs) | $ | $$$$ | $$$$ | $$ |
| Cost (large scale >100 GPUs) | $$$$ | $$ | $$$ | N/A |
| Compliance Control | Medium | High | High | Medium |
| Latency (global users) | Low (multi-region) | High (single DC) | Medium | Very Low |
| Data Privacy | Medium | High | High | Very High |
| Iteration Speed | Very Fast | Slow | Medium | Medium |
| Vendor Lock-In Risk | High | None | Medium | Low |