Keep Your AI Models Accurate as the World Changes

We automate your model retraining pipeline — triggered by drift, schedule, or new data — so your AI stays competitive without manual engineering effort.

Duration: Monthly retainer
Team: 1 MLOps Engineer (shared)

You might be experiencing...

Your model retraining is a manual, high-effort process that requires your ML team to drop everything for a week every quarter.
You have no formal A/B testing or champion/challenger framework — when you retrain, you deploy the new model to 100% of traffic immediately with no safety net.
Multiple stakeholders argue about when to retrain, and nobody has agreed on criteria for what triggers a retraining event.
Your model registry has 23 versions but nobody knows which is in production, which are staging, and which can be safely deleted.

AI model retraining is the mechanism by which your AI investment maintains its value as markets evolve. A model that is not retrained is a depreciating asset — accurate today, degrading tomorrow, unreliable within a year.

Engagement Phases

Weeks 1-3

Retraining Pipeline Automation

Automate training pipeline: scheduled runs, drift-triggered runs, and on-demand runs via API. Version data and code at each run. Integrate with model registry for automated registration and staging promotion.
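As a minimal sketch of the run-versioning idea above: every pipeline run records its trigger type, a fingerprint of the training data, and the code version before any model is registered. The `RetrainingRun` schema and function names here are illustrative, not the API of any specific registry tool.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class RetrainingRun:
    """Metadata recorded for every pipeline run (illustrative schema)."""
    trigger: str        # "scheduled" | "drift" | "on_demand"
    data_hash: str      # fingerprint of the training dataset
    code_version: str   # e.g. the git commit SHA of the training code
    started_at: str     # UTC timestamp of the run

def fingerprint_dataset(rows: list[dict]) -> str:
    """Hash the serialized training data so each run is reproducible."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def start_run(trigger: str, rows: list[dict], code_version: str) -> RetrainingRun:
    return RetrainingRun(
        trigger=trigger,
        data_hash=fingerprint_dataset(rows),
        code_version=code_version,
        started_at=datetime.now(timezone.utc).isoformat(),
    )

run = start_run("drift", [{"x": 1, "y": 0}], code_version="a1b2c3d")
```

In a real pipeline this record would be attached to the registered model version, so any production model can be traced back to the exact data and code that produced it.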

Weeks 3-5

A/B Testing & Champion/Challenger

Implement traffic splitting for new model evaluation. Define promotion criteria: new model must match or exceed champion on defined metrics before 100% rollout. Automated rollback if challenger underperforms.
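The promotion logic above can be sketched as a small state machine over canary stages. This is an illustrative sketch, assuming the 5% → 20% → 100% stages from the engagement plan and a single comparison metric; real rollouts would evaluate several metrics with statistical tests.

```python
CANARY_STAGES = [5, 20, 100]  # % of live traffic sent to the challenger

def next_traffic_split(current_pct: int,
                       challenger_metric: float,
                       champion_metric: float,
                       tolerance: float = 0.0) -> int:
    """Advance the canary one stage if the challenger matches or exceeds
    the champion; otherwise roll back to 0% (champion keeps all traffic)."""
    if challenger_metric + tolerance < champion_metric:
        return 0  # automated rollback
    idx = CANARY_STAGES.index(current_pct)
    return CANARY_STAGES[min(idx + 1, len(CANARY_STAGES) - 1)]
```

For example, `next_traffic_split(5, 0.91, 0.90)` advances the canary to 20%, while `next_traffic_split(20, 0.85, 0.90)` rolls straight back to 0%.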

Ongoing

Governance & Reporting

Monthly model lifecycle report. Retraining event log with before/after performance metrics. Model registry hygiene: deprecation and archival of obsolete versions.

Deliverables

Automated retraining pipeline with trigger configuration
A/B testing framework with defined promotion criteria
Champion/challenger traffic splitting configuration
Model registry governance policy
Monthly model lifecycle report
Retraining runbook for your team

Before & After

Metric | Before | After
Retraining Effort | Manual retraining: 5-10 engineering days per cycle | Automated pipeline: 2 hours of review per retraining event
Deployment Risk | Big-bang deployment: new model to 100% of traffic immediately | Canary rollout: 5% → 20% → 100% with automated rollback gates
Governance | 23 model versions, no registry, unknown production state | Single registry, clear lifecycle stages, automated deprecation

Tools We Use

MLflow Model Registry
Kubeflow Pipelines / Airflow
Istio / Nginx Ingress
GitHub Actions

Frequently Asked Questions

How do you decide when to retrain a model?

We define three trigger types for each model: schedule-based (retrain every 30/60/90 days regardless of drift), drift-based (retrain when statistical drift score exceeds threshold), and event-based (retrain when a known business event occurs — Ramadan season, regulatory change, new product launch). Most production models use all three triggers with OR logic: whichever fires first triggers a retraining run. Trigger thresholds are calibrated to the model's measured drift rate during the first 90 days of production.
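The OR logic over the three trigger types can be sketched as a single predicate. The default values below are placeholders; as noted above, real thresholds are calibrated to each model's measured drift rate during its first 90 days in production.

```python
from datetime import date

def should_retrain(last_trained: date,
                   today: date,
                   drift_score: float,
                   business_event: bool,
                   schedule_days: int = 30,
                   drift_threshold: float = 0.15) -> bool:
    """OR logic across the three trigger types: whichever fires first
    starts a retraining run. Thresholds here are illustrative defaults."""
    scheduled = (today - last_trained).days >= schedule_days   # schedule-based
    drifted = drift_score > drift_threshold                    # drift-based
    return scheduled or drifted or business_event              # event-based
```

For example, a model last trained ten days ago with a drift score of 0.20 retrains on the drift trigger even though the 30-day schedule has not elapsed.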

What is champion/challenger testing?

Champion/challenger is a model promotion strategy where the current production model (champion) and a newly trained candidate (challenger) run simultaneously on a split of live traffic — typically 5-10% to the challenger. Both models' predictions and outcomes are logged. After a defined evaluation period (usually 1-4 weeks), if the challenger outperforms the champion on defined metrics, it is promoted to champion. If it underperforms, it is automatically rolled back. This eliminates the risk of a big-bang model swap where a degraded model is immediately deployed to all traffic.
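One common way to implement the traffic split is deterministic hashing of a request or user ID, which keeps each user's routing sticky across requests. This is a minimal sketch, not the routing mechanism of any particular gateway (Istio and Nginx express the same idea as weighted routing rules).

```python
import hashlib

def route_to_challenger(request_id: str, challenger_pct: float = 5.0) -> bool:
    """Deterministically route ~challenger_pct% of traffic to the challenger.
    Hashing the ID (rather than random sampling) makes routing sticky
    and reproducible, so the same user always sees the same model."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket < challenger_pct * 100  # e.g. 5.0% -> buckets 0..499
```

Because the hash is uniform, roughly 5% of distinct IDs land on the challenger, and the split can be widened by changing a single parameter during canary stages.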

Do you retrain from scratch or incrementally?

It depends on the model architecture and data volume. For gradient boosting models (XGBoost, LightGBM), retraining from scratch on a rolling window of recent data is typically faster and more reliable than incremental updates. For neural networks and fine-tuned LLMs, incremental fine-tuning on new data is often more practical. For very large models where full retraining is cost-prohibitive, we use continued fine-tuning with catastrophic forgetting mitigation techniques.

What happens if a retrained model is worse than the current one?

Automated gates prevent promotion. Every retraining pipeline includes performance evaluation gates: the new model must meet minimum thresholds on holdout test data before staging promotion, and must match or exceed the champion's performance on a live traffic sample before production promotion. If the new model fails either gate, it is flagged for human review and the champion remains in production. We investigate root causes: data quality regression, distribution shift in new training data, or labelling errors.
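The two gates described above can be sketched as one check that returns both a verdict and a reason for the human-review flag. The metric names and dict layout are illustrative, not a fixed schema.

```python
def passes_gates(candidate: dict, champion: dict,
                 min_holdout: dict) -> tuple[bool, str]:
    """Gate 1: candidate must clear minimum thresholds on holdout test data.
    Gate 2: candidate must match or exceed the champion on live-sample metrics.
    Failing either gate keeps the champion in production and returns the
    failure reason so the run can be flagged for human review."""
    for metric, floor in min_holdout.items():
        if candidate["holdout"][metric] < floor:
            return False, f"holdout gate failed on {metric}"
    for metric, champ_value in champion["live"].items():
        if candidate["live"][metric] < champ_value:
            return False, f"live gate failed on {metric}"
    return True, "promoted"
```

A candidate that clears the holdout floor but trails the champion on live traffic is rejected with a `live gate failed` reason, which is where root-cause investigation (data quality, distribution shift, labelling errors) begins.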

Build It. Run It. Own It.

Book a free 30-minute AI discovery call with our Vertical AI experts in Dubai, UAE. We scope your first model, estimate data requirements, and show you the fastest path to production.

Talk to an Expert