# Deployment Overview

## What is This Deployment?
This project implements a production-ready ML deployment architecture that mimics real-world industry patterns. The deployment consists of three integrated services:
- MLflow Model Registry - Centralized model storage and versioning
- FastAPI Prediction Service - REST API for serving model predictions
- Streamlit Dashboard - User-friendly web interface for business users
This architecture demonstrates how trained ML models transition from experimentation to production serving, following industry best practices for model deployment and inference.
## Why This Architecture?

### Industry-Standard Pattern
Modern ML deployments separate concerns across three layers:
- Model Registry Layer: Centralized storage of trained models with versioning and lifecycle management
- API Layer: Scalable prediction service that loads models from registry and serves inference requests
- UI Layer: Business-facing interface that abstracts technical complexity
This separation enables:
- Independent scaling: API can handle high throughput while UI serves fewer users
- Model updates: Deploy new models without changing UI or API code
- Multiple interfaces: Same API can serve web UI, mobile apps, batch jobs, etc.
- Security: API can enforce authentication, rate limiting, and input validation
### Learning Objectives
This deployment showcases:
- Model serving patterns: Loading registered models and handling prediction requests
- API design: RESTful endpoints with proper validation and error handling
- Feature engineering consistency: Ensuring training and inference use identical transformations
- User experience: Making ML predictions accessible to non-technical stakeholders
- Service orchestration: Managing dependencies between multiple services
## Architecture Overview

```mermaid
flowchart BT
subgraph Registry["MLflow Model Registry (Port 5000)"]
MR[(Model Storage)]
MV[Version Management]
MS[Stage Tracking]
end
subgraph API["FastAPI Service (Port 8000)"]
direction TB
ML[Model Loader]
PP[Prediction Pipeline]
FE[Feature Engineering]
VAL[Input Validation]
end
subgraph UI["Streamlit Dashboard (Port 8501)"]
direction TB
SP[Single Prediction]
BP[Batch Upload]
VIZ[Results Display]
end
subgraph User["End Users"]
BU[Business Analyst]
DS[Data Scientist]
end
User --> UI
UI -->|HTTP POST| API
API -->|Load Model| Registry
Registry -->|Ensemble Model| API
API -->|Predictions| UI
UI -->|Results| User
style Registry fill:#e1f5ff
style API fill:#fff4e1
style UI fill:#f0f0f0
style User fill:#e8f5e9
```
## Service Responsibilities

### MLflow Model Registry (Port 5000)
Purpose: Centralized model artifact storage and version control
Key Functions:
- Stores trained ensemble models (LightGBM + XGBoost + CatBoost)
- Manages model versions (1, 2, 3, ...)
- Tracks model stages (Production, Staging, Archived)
- Provides API for model retrieval by stage or version
- Maintains model lineage and metadata
In Production: Would be a dedicated MLflow Tracking Server with database backend and cloud artifact storage (S3, Azure Blob, etc.)
In This Project: Runs locally with file-based storage (./mlruns, ./mlartifacts)
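Retrieval by stage or version uses MLflow's `models:/` URI scheme. A minimal sketch of loading the Production model (the registered model name `rossmann-ensemble` is a placeholder, not necessarily this project's name):

```python
import mlflow
import mlflow.pyfunc

# Point the client at this project's local tracking server
mlflow.set_tracking_uri("http://localhost:5000")

# Load the latest version currently in the Production stage
model = mlflow.pyfunc.load_model("models:/rossmann-ensemble/Production")

# Or pin an exact version for reproducibility
model_v2 = mlflow.pyfunc.load_model("models:/rossmann-ensemble/2")
```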
### FastAPI Prediction Service (Port 8000)
Purpose: Serve model predictions via REST API
Key Functions:
- Loads production ensemble model from MLflow on startup
- Accepts prediction requests in the simple `train.csv` format (7 fields)
- Automatically merges store metadata from `store.csv`
- Applies the complete feature engineering pipeline (46 features)
- Returns predictions with model version information
- Supports both single and batch predictions
In Production: Would run behind load balancer with autoscaling, monitoring, and logging
In This Project: Runs as single process with Uvicorn ASGI server
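A minimal sketch of calling the prediction endpoint with the 7-field `train.csv` input (the exact request and response schema is an assumption for illustration):

```python
import requests

# One record in train.csv format -- the API fills in everything else
payload = {
    "Store": 1,
    "DayOfWeek": 5,
    "Date": "2015-07-31",
    "Open": 1,
    "Promo": 1,
    "StateHoliday": "0",
    "SchoolHoliday": 1,
}

resp = requests.post("http://localhost:8000/predict", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # prediction plus the serving model's version
```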
### Streamlit Dashboard (Port 8501)
Purpose: User-friendly interface for business users
Key Functions:
- Single Prediction: Interactive form for one store/date forecast
- Batch Upload: CSV file upload for multiple predictions
- Auto-calculation: Day of week derived from date
- Flexible dates: Accepts multiple date formats (YYYY-MM-DD, MM/DD/YY, etc.)
- Results download: Export predictions to CSV
- Store summaries: Aggregate statistics by store
In Production: Would be protected behind authentication/authorization
In This Project: Open access for demonstration purposes
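A condensed sketch of what the single-prediction form boils down to (widget layout and defaults are illustrative, not the dashboard's actual code):

```python
import datetime
import requests
import streamlit as st

st.title("Sales Forecast")

store = st.number_input("Store ID", min_value=1, max_value=1115, value=1)
date = st.date_input("Date", value=datetime.date(2015, 7, 31))
promo = st.checkbox("Promo running?")

if st.button("Predict"):
    payload = {
        "Store": int(store),
        "DayOfWeek": date.isoweekday(),  # auto-derived: Monday=1 ... Sunday=7
        "Date": date.isoformat(),
        "Open": 1,
        "Promo": int(promo),
        "StateHoliday": "0",
        "SchoolHoliday": 0,
    }
    resp = requests.post("http://localhost:8000/predict", json=payload, timeout=30)
    st.metric("Forecast", resp.json().get("prediction", "n/a"))
```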
## Data Flow

### Single Prediction Request

```mermaid
sequenceDiagram
actor User
participant UI as Streamlit UI
participant API as FastAPI
participant FE as Feature Engineering
participant Model as Ensemble Model
participant Registry as MLflow Registry
User->>UI: Enter store, date, promo, etc.
UI->>UI: Validate inputs
UI->>UI: Calculate day of week
UI->>API: POST /predict
API->>Registry: Load model (if not cached)
Registry-->>API: Ensemble model
API->>API: Merge store metadata
API->>FE: Engineer features
FE-->>API: 46 model-ready features
API->>Model: Generate prediction
Model-->>API: Sales forecast
API-->>UI: Response (prediction + version)
UI->>User: Display results
```
### Batch Prediction Request

```mermaid
sequenceDiagram
actor User
participant UI as Streamlit UI
participant API as FastAPI
participant FE as Feature Engineering
participant Model as Ensemble Model
User->>UI: Upload CSV (6 fields)
UI->>UI: Validate CSV schema
UI->>UI: Add DayOfWeek column
UI->>UI: Normalize dates
UI->>API: POST /predict (batch)
loop For each row
API->>API: Merge store metadata
API->>FE: Engineer features
FE-->>API: 46 features
API->>Model: Generate prediction
Model-->>API: Sales forecast
end
API-->>UI: Response (predictions array)
UI->>UI: Add predictions to DataFrame
UI->>User: Display table + download
```
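On the UI side, the preparation steps above (normalize dates, add `DayOfWeek`) amount to a couple of pandas operations. A sketch, assuming a hypothetical upload file:

```python
import pandas as pd

# 6-field batch file: Store, Date, Open, Promo, StateHoliday, SchoolHoliday
df = pd.read_csv("batch_requests.csv")  # file name is illustrative

# Normalize dates to a single format before calling the API
dates = pd.to_datetime(df["Date"])  # pandas infers common formats
df["Date"] = dates.dt.strftime("%Y-%m-%d")

# Derive DayOfWeek (Monday=1 ... Sunday=7, matching train.csv)
df["DayOfWeek"] = dates.dt.dayofweek + 1
```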
## Quick Start

**Prerequisites Required**

You must have a registered model before using the deployment services. The FastAPI service loads the Production model from MLflow on startup; if no model is registered, the service will fail to start.
First-time setup:
- Complete the DataOps workflow to prepare your data
- Complete the ModelOps workflow to train and register a model
- Ensure at least one model is promoted to Production stage
Verify you have a Production model:
```bash
# Option 1: Check via Python
python -c "from mlflow import MlflowClient; \
client = MlflowClient(); \
models = client.search_registered_models(); \
print('Registered models:', [m.name for m in models])"
# Option 2: Start MLflow UI and check manually
bash scripts/start_mlflow.sh
# Visit http://localhost:5000 → Models tab
```
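To check specifically for a Production-stage version rather than just registered names, one option (the model name `rossmann-ensemble` is a placeholder):

```python
from mlflow import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")

# Returns an empty list if nothing is in the Production stage
versions = client.get_latest_versions("rossmann-ensemble", stages=["Production"])
print(versions or "No Production model found")
```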
If you don't have a Production model, see the ModelOps Training Guide to train and register your first model.
### Launch All Services

Use the unified launcher script (see the Launcher Script page) to start all three services in the correct order. This will:
- Start MLflow tracking server (port 5000)
- Start FastAPI backend (port 8000, waits for MLflow)
- Launch Streamlit dashboard (port 8501, waits for API)
Services:
- MLflow UI: http://localhost:5000
- FastAPI Docs: http://localhost:8000/docs
- Streamlit App: http://localhost:8501
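Conceptually, the launcher starts each service and waits for its port to answer before starting the next. A simplified sketch, not the project's actual script (the health-check helper is an assumption):

```bash
#!/usr/bin/env bash
set -euo pipefail

wait_for_port() {  # poll until something answers on localhost:$1
  until curl -s "http://localhost:$1" > /dev/null; do sleep 1; done
}

bash scripts/start_mlflow.sh &            # 1. MLflow registry
wait_for_port 5000

(cd deployment/api && python main.py) &   # 2. FastAPI, once MLflow is up
wait_for_port 8000

cd deployment/streamlit                   # 3. Streamlit, once the API is up
streamlit run Home.py
```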
### Manual Launch (Alternative)

Start each service individually if you need more control:

```bash
# Terminal 1: MLflow
bash scripts/start_mlflow.sh
# Terminal 2: FastAPI
cd deployment/api
python main.py
# Terminal 3: Streamlit
cd deployment/streamlit
streamlit run Home.py
```
### Stop Services

When you stop Streamlit (Ctrl+C), MLflow and FastAPI continue running in the background and must be stopped separately.
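One way to stop them is by the ports they listen on (a sketch for Unix-like systems, not the project's own stop script):

```bash
# Stop whatever is listening on the MLflow and FastAPI ports
kill $(lsof -ti :5000)  # MLflow tracking server
kill $(lsof -ti :8000)  # FastAPI service
```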
## Key Design Principles

### 1. Consistency Between Training and Inference
The same feature engineering code is used in both training and prediction:
- Training: `src/features/build_features.py` processes data for model training
- Inference: `src/data/prepare_predictions.py` uses identical feature functions
- Result: predictions use exactly the same transformations as training data
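The pattern is simply that both code paths import the same functions. A minimal sketch (the function below is illustrative, not the project's actual feature code):

```python
import pandas as pd

def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive calendar features from the Date column -- used verbatim
    by both the training pipeline and the prediction service."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    df["Month"] = df["Date"].dt.month
    df["WeekOfYear"] = df["Date"].dt.isocalendar().week.astype(int)
    return df

# Training:   features = add_calendar_features(train_df)
# Inference:  features = add_calendar_features(request_df)
# Because it is one function, the transformations cannot drift apart.
```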
### 2. Simplified User Input

Users only provide 7 basic fields (`train.csv` format):
- Store, DayOfWeek, Date, Open, Promo, StateHoliday, SchoolHoliday
The API automatically handles:
- Merging store metadata (type, assortment, competition distance)
- Engineering 46 features (lags, rolling averages, calendar features)
- Converting data types and formatting
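The store-metadata step, for example, is a left join on `Store` (a sketch; `store.csv` columns follow the Rossmann dataset, and the file path is an assumption):

```python
import pandas as pd

# Incoming request rows in train.csv format
requests_df = pd.DataFrame([{
    "Store": 1, "DayOfWeek": 5, "Date": "2015-07-31",
    "Open": 1, "Promo": 1, "StateHoliday": "0", "SchoolHoliday": 1,
}])

store_df = pd.read_csv("data/store.csv")  # StoreType, Assortment, CompetitionDistance, ...

# Left join keeps every request row, even if metadata is missing
merged = requests_df.merge(store_df, on="Store", how="left")
```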
### 3. Model Versioning

Every prediction response includes the version of the model that produced it.
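An illustrative response shape (field names here are an assumption, not the exact schema):

```json
{
  "prediction": 7234.51,
  "model_version": "3"
}
```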
This enables:
- Reproducibility: Know exactly which model produced each prediction
- A/B testing: Compare predictions from different model versions
- Rollback: Quickly identify if new model performs worse
### 4. Graceful Error Handling
The API validates inputs and returns clear error messages:
- Invalid store IDs (must be 1-1115)
- Invalid dates or formats
- Missing required fields
- Out-of-range values
Users get actionable feedback instead of stack traces.
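A sketch of how such constraints can be declared with Pydantic so FastAPI rejects bad input automatically (the model below is illustrative, not the service's actual schema):

```python
from pydantic import BaseModel, Field

class PredictionRequest(BaseModel):
    """Illustrative request schema -- constraints mirror the rules above."""
    Store: int = Field(ge=1, le=1115)  # valid store IDs
    DayOfWeek: int = Field(ge=1, le=7)
    Date: str                          # parsed and validated downstream
    Open: int = Field(ge=0, le=1)
    Promo: int = Field(ge=0, le=1)
    StateHoliday: str = "0"
    SchoolHoliday: int = Field(ge=0, le=1)

# FastAPI turns violations of these constraints into a structured
# 422 response with a field-by-field message, not a stack trace.
```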
## Documentation Structure
- Overview (this page) - Architecture and high-level flow
- FastAPI Service - API endpoints, model loading, feature engineering
- Streamlit Dashboard - UI features, single/batch predictions
- Launcher Script - Automated service orchestration
## Next Steps
- Explore the individual deployment documentation pages linked above
- Try making predictions through the Streamlit UI
- Experiment with the FastAPI endpoints at http://localhost:8000/docs
- See ModelOps Documentation for model training and registry