MSc Thesis · Addis Ababa Science and Technology University

Design, Implementation, and Empirical Evaluation of a Multi-Agent Architecture for E-Commerce Inventory Optimization

A reproducible study on whether a small society of cooperating agents — forecasting, drift detection, replenishment, inventory state, and coordination — can outperform classical inventory policies under non-stationary demand on real e-commerce data.


The problem

Inventory decisions in e-commerce live on a knife's edge. Hold too much and capital is locked in slow-moving SKUs; hold too little and a single stockout drives lost margin, customer churn, and platform penalties. The classical literature — Arrow-style newsvendor, Scarf's (s, S) policies, periodic forecasting — assumes a relatively stable demand distribution and adapts slowly when that assumption breaks.

Real e-commerce demand does not behave that way. Promotions, virality, supply disruptions, and seasonality cause concept drift: the distribution generating today's orders is meaningfully different from last week's. A forecasting model that re-fits on a fixed schedule will trail the change; a policy with static safety stock will either stock out during the shift or sit on excess inventory after it settles.

The approach

We design a five-agent system that splits the inventory loop along its natural seams. Each agent owns a narrow concern and publishes results the next agent consumes, with a coordinator that arbitrates when their recommendations conflict.

Forecasting agent: a tiered model ladder (moving average → simple exponential smoothing → Holt-Winters); each SKU is promoted to the next tier once it has enough history to fit it.
Drift-detection agent: ADWIN windowing on per-SKU forecast residuals plus a global ADWIN on the population mean; it triggers a refit when the data distribution shifts.
Replenishment agent: EOQ-derived (s, S) policy with dynamic safety stock computed from forecast uncertainty and recent volatility.
Inventory-state agent: single source of truth for on-hand, on-order, and pipeline positions.
Coordinator: orchestrates the per-tick interaction, handles tie-breaking, and emits the decision log behind the audit trail you see in this dashboard.
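To make the replenishment agent's role concrete, here is a minimal sketch of an EOQ-derived (s, S) rule with a dynamic safety stock. It is an illustration under stated assumptions, not the thesis implementation: the function names, the rule of taking the larger of forecast RMSE and recent demand standard deviation as the uncertainty signal, and the service factor z = 1.645 are all hypothetical choices made for this example.

```python
import math


def dynamic_safety_stock(forecast_rmse, recent_std, lead_time_days, z=1.645):
    """Safety stock from the larger of two uncertainty signals.

    Assumption for this sketch: sigma = max(forecast RMSE, recent demand std).
    """
    sigma = max(forecast_rmse, recent_std)
    return z * sigma * math.sqrt(lead_time_days)


def s_S_levels(mean_daily_demand, forecast_rmse, recent_std,
               lead_time_days, order_cost, holding_cost_per_unit_day):
    """Return (s, S): the reorder point and the order-up-to level."""
    safety = dynamic_safety_stock(forecast_rmse, recent_std, lead_time_days)
    # Reorder point: expected lead-time demand plus safety stock.
    s = mean_daily_demand * lead_time_days + safety
    # EOQ with demand and holding cost both per day: sqrt(2 * D * K / h).
    q = math.sqrt(2.0 * mean_daily_demand * order_cost / holding_cost_per_unit_day)
    return s, s + q


def order_quantity(inventory_position, s, S):
    """Order up to S when the position is at or below s; otherwise order nothing."""
    return S - inventory_position if inventory_position <= s else 0.0


# Hypothetical SKU: 10 units/day, 4-day lead time, 50 per order, 0.1 per unit-day.
s, S = s_S_levels(10.0, 3.0, 4.0, 4, 50.0, 0.1)
print(round(s, 2), round(S, 2), round(order_quantity(50.0, s, S), 2))
```

Because the safety stock is recomputed from current uncertainty, a drift-triggered refit that raises the residual error widens the buffer automatically, and a settled regime shrinks it again.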

Evaluation

Demand is replayed from the Olist Brazilian E-Commerce public dataset across cohorts of 50 / 100 / 200 / 500 / 1000 SKUs. Each configuration runs N = 10 seeds. The MAS architecture is compared against two baselines (Static ROP and Periodic Forecasting) under six demand regimes — stationary, gradual drift, seasonal, abrupt, severe abrupt, and catastrophic.

Three hypotheses, all pre-registered before the sweep ran:

H1 · Performance

MAS reduces stockouts vs both baselines under drift.

H2 · Adaptability

MAS recovers service level faster after distribution shifts.

H3 · Scalability

Per-step runtime grows sub-linearly with SKU count.
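One way to test the sub-linearity claim in H3 is to fit a power law, runtime ≈ a · n^b, by ordinary least squares on the log-log scale and check whether the exponent b is below 1. The sketch below assumes that approach; the runtimes are synthetic numbers generated for illustration, not measurements from the sweep, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(sizes, runtimes):
    """Fit runtime = a * n**b by least squares on log(n) vs log(runtime)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to the raw scale
    return a, b


# SKU cohort sizes from the evaluation; runtimes are SYNTHETIC (exponent 0.8).
sizes = [50, 100, 200, 500, 1000]
runtimes = [0.002 * n ** 0.8 for n in sizes]

a, b = fit_power_law(sizes, runtimes)
print(f"a={a:.4f} b={b:.2f} sub_linear={b < 1.0}")
```

On real measurements the points will not lie exactly on a line, so a confidence interval on b (for example from the per-seed runtimes) is a more defensible basis for the sub-linearity verdict than the point estimate alone.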

Statistical tests: Mann-Whitney U as the primary test, Welch's t-test for the parametric comparison, and Cohen's d for effect size. Results, raw seed-level data, and the 23k-event decision log are all browsable from this dashboard.
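For readers who want to see what these three statistics compute, here is a standard-library sketch. It returns the raw U statistic, the Welch t statistic, and Cohen's d with a pooled standard deviation; p-values are omitted, and the two samples are made-up numbers, not seed-level results from the thesis. The function names are illustrative.

```python
import math
from statistics import mean, variance


def mann_whitney_u(x, y):
    """U statistic for x: number of pairs with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def welch_t(x, y):
    """Welch's t statistic, which does not assume equal variances."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# MADE-UP per-seed stockout counts for two policies (illustration only).
mas = [1, 2, 3, 4, 5]
baseline = [4, 5, 6, 7, 8]

print(mann_whitney_u(mas, baseline), welch_t(mas, baseline), round(cohens_d(mas, baseline), 3))
```

A small U for the first sample means its values rarely exceed those of the second, which is the direction H1 predicts for stockouts. With only ten seeds per configuration, the rank-based test is a sensible primary choice because it does not depend on normality.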

What this dashboard shows

Everything is read from the same SQLite catalog that the thesis manuscript pulls from. No numbers diverge between paper and UI.

Experiment catalog

All 36 sweep configurations with per-seed aggregates.

H1 report

Stockout-rate and total-cost comparisons for every scenario against every baseline.

H3 scalability

Power-law fits on per-step runtime vs SKU count.

Ablation

Contribution of each component (drift detector, safety stock, forecast tier).

Decision audit

Per-tick agent log — every action with the inputs that produced it.

Launch a run

Re-run any configuration; results stream live into the catalog.

Glossary

Every abbreviation, scenario code, metric, and stat term defined.

Reproducing the results

The full source — simulator, backend, frontend, thesis sources, and the canonical sweep outputs — is on GitHub. A typical end-to-end reproduction:

git clone https://github.com/BisRyy/mas.git
cd mas
make download-data           # fetch Olist CSVs
make sweep CONFIG=olist_mas_catastrophic SEEDS=001..010
make reports                 # regenerate H1/H3/ablation tables

The deployed instance you are looking at was built from this exact repository. Hit Settings in the sidebar to see what version is running and to trigger a fresh ingestion.

Author

Bisrat Kebere Derebe — MSc candidate, Software Engineering, Addis Ababa Science and Technology University (AASTU).

If you cite this work, please link to the repository and reference the manuscript in thesis/.