Design, Implementation, and Empirical Evaluation of a Multi-Agent Architecture for E-Commerce Inventory Optimization
A reproducible study on whether a small society of cooperating agents — forecasting, drift detection, replenishment, inventory state, and coordination — can outperform classical inventory policies under non-stationary demand on real e-commerce data.
The problem
Inventory decisions in e-commerce live on a knife's edge. Hold too much and capital is locked in slow-moving SKUs; hold too little and a single stockout drives lost margin, customer churn, and platform penalties. The classical literature — Arrow-style newsvendor, Scarf's (s, S) policies, periodic forecasting — assumes a relatively stable demand distribution and adapts slowly when that assumption breaks.
Real e-commerce demand does not behave that way. Promotions, virality, supply disruptions, and seasonality cause concept drift: the distribution generating today's orders is meaningfully different from last week's. A forecasting model that re-fits on a fixed schedule will trail the change; a policy with static safety stock will either stock out during the shift or sit on excess inventory after it settles.
The approach
We design a five-agent system that splits the inventory loop along its natural seams. Each agent owns a narrow concern and publishes results the next agent consumes, with a coordinator that arbitrates when their recommendations conflict.
- Forecasting agent — tiered MA → SimpleExpSmoothing → Holt-Winters, promoted per-SKU based on data sufficiency.
- Drift-detection agent — ADWIN windowing on per-SKU residuals plus a global ADWIN on the population mean, triggering refits when the data distribution shifts.
- Replenishment agent — EOQ-derived (s, S) policy with dynamic safety stock as a function of forecast uncertainty and recent volatility.
- Inventory-state agent — single source of truth for on-hand, on-order, and pipeline positions.
- Coordinator — orchestrates the per-tick interaction, handles tie-breaking, and emits the decision log used for the audit trail you see in this dashboard.
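To make the replenishment idea concrete, here is a minimal sketch of an EOQ-derived (s, S) policy with a safety stock that widens when forecast error or recent volatility grows. The function names, the cost parameters, the z-factor of 1.65, and the choice to take the larger of the two uncertainty measures are all assumptions made for illustration; the repository's actual rule may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class SSPolicy:
    reorder_point: float  # s: order when inventory position is at or below this
    order_up_to: float    # S: target inventory position after ordering


def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Textbook economic order quantity: sqrt(2 * D * K / h)."""
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def safety_stock(forecast_rmse: float, recent_std: float,
                 lead_time_days: float, z: float = 1.65) -> float:
    """Assumed rule: z * sigma * sqrt(lead time), with sigma taken as the
    larger of forecast error and recent demand volatility."""
    sigma = max(forecast_rmse, recent_std)
    return z * sigma * math.sqrt(lead_time_days)


def derive_policy(daily_forecast: float, forecast_rmse: float, recent_std: float,
                  lead_time_days: float, order_cost: float,
                  holding_cost: float, z: float = 1.65) -> SSPolicy:
    """Build (s, S): s covers lead-time demand plus safety stock, S = s + EOQ."""
    q = eoq(daily_forecast * 365.0, order_cost, holding_cost)
    s = daily_forecast * lead_time_days + safety_stock(
        forecast_rmse, recent_std, lead_time_days, z)
    return SSPolicy(reorder_point=s, order_up_to=s + q)


def order_quantity(policy: SSPolicy, inventory_position: float) -> float:
    """Order up to S when the position (on-hand + on-order) is at or below s."""
    if inventory_position <= policy.reorder_point:
        return policy.order_up_to - inventory_position
    return 0.0
```

In this sketch a refit triggered by the drift detector would simply call `derive_policy` again with the updated forecast and error estimates, so the reorder point moves with the demand regime instead of staying fixed.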
Evaluation
Demand is replayed from the Olist Brazilian E-Commerce public dataset across cohorts of 50 / 100 / 200 / 500 / 1000 SKUs. Each configuration runs N = 10 seeds. The MAS architecture is compared against two baselines (Static ROP and Periodic Forecasting) under six demand regimes — stationary, gradual drift, seasonal, abrupt, severe abrupt, and catastrophic.
Three hypotheses, all pre-registered before the sweep ran:
- H1: MAS reduces stockouts vs both baselines under drift.
- H2: MAS recovers service level faster after distribution shifts.
- H3: Per-step runtime grows sub-linearly with SKU count.
Statistical tests: Mann-Whitney U as the primary test, Welch's t-test as the parametric comparison, and Cohen's d for effect size. Results, raw seed-level data, and the 23k-event decision log are all browsable from this dashboard.
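For readers who want to see what these three statistics compute, the following is a small standard-library sketch. It returns the test statistics only; in practice p-values would come from a statistics package such as SciPy (`scipy.stats.mannwhitneyu`, and `scipy.stats.ttest_ind` with `equal_var=False`). The function names are illustrative, and nothing here is taken from the project's analysis code.

```python
import math
from statistics import mean, variance


def mann_whitney_u(a, b):
    """U statistic for sample a: number of pairs (x in a, y in b) with x > y,
    counting ties as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    (does not assume equal variances)."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled
```

With N = 10 seeds per configuration, each call would take two lists of ten per-seed values, for example the stockout rates of MAS and of one baseline under the same scenario.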
What this dashboard shows
Everything is read from the same SQLite catalog that the thesis manuscript pulls from. No numbers diverge between paper and UI.
- All 36 sweep configurations with per-seed aggregates.
- Stockout-rate + total-cost comparisons, every scenario, every baseline.
- Power-law fits on per-step runtime vs SKU count.
- Contribution of each component (drift detector, safety stock, forecast tier).
- Per-tick agent log — every action with the inputs that produced it.
- Re-run any configuration; results stream live into the catalog.
- Every abbreviation, scenario code, metric, and stat term defined.
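The power-law fits on runtime can be illustrated with a short sketch: fit runtime = a * n^b by ordinary least squares in log-log space and read off the exponent b, where b < 1 indicates sub-linear growth. The function name is hypothetical and this is one common way to do such a fit, not necessarily the procedure used for the dashboard.

```python
import math


def fit_power_law(sku_counts, runtimes):
    """Least-squares fit of runtime = a * n**b on log-transformed data.
    Returns (a, b); b < 1 means runtime grows sub-linearly in n."""
    xs = [math.log(n) for n in sku_counts]
    ys = [math.log(t) for t in runtimes]
    k = len(xs)
    mx = sum(xs) / k
    my = sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space = exponent
    a = math.exp(my - b * mx)     # intercept in log-log space = log(a)
    return a, b


# Synthetic check only (not measured data): points generated from 2 * n**0.8
counts = [50, 100, 200, 500, 1000]
synthetic = [2.0 * n ** 0.8 for n in counts]
a, b = fit_power_law(counts, synthetic)
print(f"a={a:.3f}, b={b:.3f}, sub-linear={b < 1}")
```

The cohort sizes in the example match the 50 to 1000 SKU cohorts of the sweep; the runtimes are generated from a known curve purely to confirm that the fit recovers its parameters.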
Reproducing the results
The full source — simulator, backend, frontend, thesis sources, and the canonical sweep outputs — is on GitHub. A typical end-to-end reproduction:
    git clone https://github.com/BisRyy/mas.git
    cd mas
    make download-data   # fetch Olist CSVs
    make sweep CONFIG=olist_mas_catastrophic SEEDS=001..010
    make reports         # regenerate H1/H3/ablation tables
The deployed instance you are looking at was built from this exact repository. Hit Settings in the sidebar to see what version is running and to trigger a fresh ingestion.
Author
Bisrat Kebere Derebe — MSc candidate, Software Engineering, Addis Ababa Science and Technology University (AASTU).
If you cite this work, please link to the repository and reference the manuscript in thesis/.