MSc Thesis · Addis Ababa Science and Technology University

Design, Implementation, and Empirical Evaluation of a Multi-Agent Architecture for E-Commerce Inventory Optimization

A reproducible study on whether a small society of cooperating agents — forecasting, drift detection, replenishment, inventory state, and coordination — can outperform classical inventory policies under non-stationary demand on real e-commerce data.

The problem

Inventory decisions in e-commerce live on a knife's edge. Hold too much and capital is locked in slow-moving SKUs; hold too little and a single stockout drives lost margin, customer churn, and platform penalties. The classical literature — Arrow-style newsvendor, Scarf's (s, S) policies, periodic forecasting — assumes a relatively stable demand distribution and adapts slowly when that assumption breaks.

Real e-commerce demand does not behave that way. Promotions, virality, supply disruptions, and seasonality cause concept drift: the distribution generating today's orders is meaningfully different from last week's. A forecasting model that re-fits on a fixed schedule will trail the change; a policy with static safety stock will either stock out during the shift or sit on excess inventory after it settles.
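To see why a fixed refit schedule lags, consider a detector that compares the older and newer parts of a recent window and flags a shift as soon as their means differ by more than chance would allow. The sketch below is a simplified two-window check in the spirit of ADWIN (the detector named in the approach section). It is not the full algorithm and not the thesis code. It assumes values scaled to [0, 1]; `detect_shift`, `delta`, and `min_side` are illustrative names.

```python
import math


def detect_shift(window, delta=0.01, min_side=5):
    """Return the first split index at which the older and newer parts
    of `window` have significantly different means, else None.

    Simplified ADWIN-style test: values are assumed to lie in [0, 1],
    and the threshold is the Hoeffding-based cut used by ADWIN.
    """
    n = len(window)
    total = sum(window)
    head = 0.0
    for i in range(1, n):
        head += window[i - 1]
        n0, n1 = i, n - i
        if n0 < min_side or n1 < min_side:
            continue  # skip splits with too few points on one side
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean style sample size
        eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
        if abs(head / n0 - (total - head) / n1) > eps_cut:
            return i
    return None


# Synthetic, illustrative series (not thesis data).
flat = [0.2] * 60
shifted = [0.1] * 40 + [0.9] * 40

flat_result = detect_shift(flat)
shift_result = detect_shift(shifted)
```

On the flat series the check stays silent; on the series that jumps from 0.1 to 0.9 it reports a split at or before the true change point. Full ADWIN additionally drops the stale part of the window once a cut is found and stores the window in compressed buckets.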

The approach

We design a five-agent system that splits the inventory loop along its natural seams. Each agent owns a narrow concern and publishes results the next agent consumes, with a coordinator that arbitrates when their recommendations conflict.

  • Forecasting agent — tiered MA → SimpleExpSmoothing → Holt-Winters, promoted per-SKU based on data sufficiency.
  • Drift-detection agent — ADWIN windowing on per-SKU residuals plus a global ADWIN on the population mean; it triggers refits when the data distribution shifts.
  • Replenishment agent — EOQ-derived (s, S) policy with dynamic safety stock as a function of forecast uncertainty and recent volatility.
  • Inventory-state agent — single source of truth for on-hand, on-order, and pipeline positions.
  • Coordinator — orchestrates the per-tick interaction, handles tie-breaking, and emits the decision log used for the audit trail you see in this dashboard.
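The replenishment bullet above can be made concrete with textbook formulas. The sketch below assumes the classic EOQ, Q = sqrt(2DK/h), a normal-approximation safety stock z·σ·sqrt(L), and an order-up-to level S = s + Q. Taking σ as the larger of the forecast RMSE and the recent demand standard deviation is only an illustrative guess at what "dynamic safety stock" could mean, not the rule used in the thesis. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist, pstdev


def eoq(annual_demand, order_cost, holding_cost):
    """Classic economic order quantity: sqrt(2 * D * K / h)."""
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def s_S_levels(daily_forecast, forecast_rmse, recent_demand,
               lead_time_days, order_cost, holding_cost,
               service_level=0.95):
    """Return (s, S) for one SKU.

    ASSUMPTION: sigma combines forecast uncertainty and recent volatility
    by taking the larger of the two; the thesis may combine them differently.
    """
    z = NormalDist().inv_cdf(service_level)
    sigma = max(forecast_rmse, pstdev(recent_demand))
    safety_stock = z * sigma * math.sqrt(lead_time_days)
    s = daily_forecast * lead_time_days + safety_stock  # reorder point
    q = eoq(daily_forecast * 365, order_cost, holding_cost)
    return s, s + q  # order-up-to level S = s + Q


def order_quantity(inventory_position, s, S):
    """(s, S) rule: when the position is at or below s, order up to S."""
    return S - inventory_position if inventory_position <= s else 0.0


# Illustrative numbers only (not thesis data).
s, S = s_S_levels(daily_forecast=4.0, forecast_rmse=1.5,
                  recent_demand=[3, 5, 4, 6, 2, 4, 5],
                  lead_time_days=7, order_cost=100.0, holding_cost=6.0)
```

Because σ is recomputed from the latest residuals and demand, s rises when the forecast becomes less reliable and falls again once demand settles.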

Evaluation

Demand is replayed from the Olist Brazilian E-Commerce public dataset across cohorts of 50, 100, 200, 500, and 1000 SKUs. Each configuration is run with N = 10 seeds. The MAS architecture is compared against two baselines, Static ROP (reorder point) and Periodic Forecasting, under six demand regimes: stationary, gradual drift, seasonal, abrupt, severe abrupt, and catastrophic.
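To give a sense of scale, the design above can be enumerated as a grid. The sketch assumes the factors are fully crossed, which the text does not confirm, and the policy and regime identifiers are hypothetical labels rather than the repository's actual config names.

```python
from itertools import product

COHORTS = (50, 100, 200, 500, 1000)  # SKU counts from the text
POLICIES = ("mas", "static_rop", "periodic_forecasting")  # hypothetical ids
REGIMES = ("stationary", "gradual_drift", "seasonal",
           "abrupt", "severe_abrupt", "catastrophic")  # hypothetical ids
SEEDS = range(1, 11)  # N = 10 seeds

# One tuple per simulated run, assuming a full cross of all factors.
runs = list(product(COHORTS, POLICIES, REGIMES, SEEDS))
n_runs = len(runs)
```

Under that assumption the sweep would contain 5 × 3 × 6 × 10 = 900 runs.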

Three hypotheses, all pre-registered before the sweep ran:

H1 · Performance

MAS reduces stockouts vs. both baselines under drift.

H2 · Adaptability

MAS recovers service level faster after distribution shifts.

H3 · Scalability

Per-step runtime grows sub-linearly with SKU count.
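One hedged way to check a sub-linearity claim such as H3 is to fit a line to log(runtime) against log(SKU count): a slope below 1 indicates sub-linear growth. In the sketch below the SKU counts are the cohorts used in the evaluation, the runtimes are synthetic placeholders generated from n^0.8 rather than measured results, and `loglog_slope` is an illustrative name. The thesis may use a different criterion.

```python
import math


def loglog_slope(sizes, runtimes):
    """Least-squares slope of log(runtime) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


sku_counts = [50, 100, 200, 500, 1000]
# SYNTHETIC runtimes for illustration only: t = 0.002 * n ** 0.8
synthetic_runtimes = [0.002 * n ** 0.8 for n in sku_counts]

slope = loglog_slope(sku_counts, synthetic_runtimes)
is_sublinear = slope < 1.0
```

With real measurements the fitted slope will carry noise, so a confidence interval on the slope is a safer basis for the conclusion than the point estimate alone.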

Statistical tests: Mann-Whitney U as the primary test, Welch's t-test for parametric comparison, and Cohen's d for effect size. Results, raw seed-level data, and the 23k-event decision log are all browsable from this dashboard.
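The three statistics named above can be computed from seed-level results with the standard library alone. The sketch below returns the test statistics only (no p-values) and uses made-up stockout counts for five seeds per arm. The function names and data are illustrative and are not taken from the thesis code or results.

```python
import math
from statistics import mean, variance


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a-value > b-value, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# MADE-UP stockout counts per seed, for illustration only.
mas_stockouts = [3, 4, 2, 5, 3]
baseline_stockouts = [7, 8, 6, 9, 7]

u_stat = mann_whitney_u(mas_stockouts, baseline_stockouts)
t_stat, dof = welch_t(mas_stockouts, baseline_stockouts)
d = cohens_d(mas_stockouts, baseline_stockouts)
```

A U of 0 for the first sample means every MAS value is below every baseline value. For p-values, SciPy's `scipy.stats.mannwhitneyu` and `scipy.stats.ttest_ind(..., equal_var=False)` are the usual choices.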

Reproducing the results

The full source — simulator, backend, frontend, thesis sources, and the canonical sweep outputs — is on GitHub. A typical end-to-end reproduction:

git clone https://github.com/BisRyy/mas.git
cd mas
make download-data           # fetch Olist CSVs
make sweep CONFIG=olist_mas_catastrophic SEEDS=001..010
make reports                 # regenerate H1/H3/ablation tables

The deployed instance you are looking at was built from this exact repository. Hit Settings in the sidebar to see what version is running and to trigger a fresh ingestion.

Author

Bisrat Kebere Derebe — MSc candidate, Software Engineering, Addis Ababa Science and Technology University (AASTU).

If you cite this work, please link to the repository and reference the manuscript in thesis/.