8th MBC2 Workshop on
Models and Learning in Clustering and Classification
Catania, 25-28 August 2026
Dipartimento di Economia e Impresa, Università di Catania (Italy)
We are pleased to announce a Data Challenge dedicated to the development of predictive models for inventory management in the retail sector. The initiative aims to foster applied statistical research and collaboration between academia and industry, promoting data-driven solutions for more sustainable retail supply chains. The challenge focuses on fresh dairy products—such as milk, yogurt, and fresh cheeses—sold in the large-scale retail sector (GDO). This product category presents several operational and analytical challenges:
These characteristics make fresh dairy products an ideal testbed for statistical and machine learning approaches to predictive inventory management.
Participants will work with a real-world dataset provided by Astrea Consulting Srl, derived from a network of retail stores operating within the Conad and Todis distribution channels. The dataset includes information on:
for 30 products, each observed over the January 2023-December 2025 time span.
For each inventory movement, the following variables are available:
We gratefully acknowledge Astrea Consulting Srl for making this dataset available and for supporting this initiative that bridges academic research and real-world retail analytics.
The goal of the challenge is to develop unsupervised predictive models capable of identifying inventory conditions for each product in each store. Based on historical inventory movements, participants must classify each observation into one of three states. The predicted classifications should be provided in a .csv file containing a single column with 229,499 rows (one for each observation), without a header. The three categories must be encoded as integers:
To ensure both methodological rigor and practical relevance, all submissions will be evaluated by a joint academia–industry committee, which will shortlist the five best projects according to the following weighted criteria:
| Criterion | Description | Weight |
|---|---|---|
| Predictive Performance | Accuracy of the predicted classifications compared to the true labels, measured using the Adjusted Rand Index (ARI) | 60% |
| Scientific Rigor | Soundness of the methodology, appropriateness of the modeling approach, and statistical validity | 20% |
| Clarity and Reproducibility | Quality of the report, clarity of presentation, and reproducibility of the analysis | 20% |
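Since the ARI carries most of the weight, teams may want to compute it themselves while developing a model, for instance against a small hand-labelled subset. The block below is a minimal sketch of the standard pair-counting (Hubert–Arabie) formula using only the Python standard library. The function name `adjusted_rand_index` is illustrative and is not part of any official evaluation script; the committee's own implementation may differ in how it handles edge cases.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same observations."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement

    # Contingency table cells plus row and column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of observation pairs that fall together in each cell/row/column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound on agreement
    if max_index == expected:
        return 1.0  # both labelings are trivial (one cluster, or all singletons)
    return (sum_cells - expected) / (max_index - expected)
```

The index is invariant to relabelling: `adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])` returns `1.0` because the two partitions coincide. Values near 0 indicate agreement no better than chance, and negative values are possible.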
The leaders of the five best projects will be invited to present their work during a dedicated session at the workshop. The final ranking will be determined by a majority vote of the workshop attendees. The winning team will be announced during the workshop dinner (27 August, 20:00).
Teams must first register for the Data Challenge through the Registration webform.
Registration must include:
During registration, participants must upload a PDF copy of the proof of payment. A zipped file containing the technical report, the vector of predicted classifications, and the software script must be submitted through the Submission webform.
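As a rough illustration of preparing the submission, the sketch below writes the predictions to a single-column, header-less .csv file and bundles the deliverables into one zip archive. The helper names `write_predictions` and `bundle_submission`, and any file names passed to them, are hypothetical. The integer codes are deliberately left as a required argument (`allowed_codes`) and should be taken from the official challenge instructions. Only the row count of 229,499 comes from the challenge description.

```python
import csv
import zipfile
from pathlib import Path

N_OBSERVATIONS = 229_499  # rows required by the challenge description


def write_predictions(labels, csv_path, allowed_codes, expected_rows=N_OBSERVATIONS):
    """Write one integer label per row: single column, no header."""
    labels = list(labels)
    if len(labels) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(labels)}")
    unknown = set(labels) - set(allowed_codes)
    if unknown:
        raise ValueError(f"labels outside the allowed codes: {sorted(unknown)}")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for label in labels:
            writer.writerow([int(label)])


def bundle_submission(zip_path, files):
    """Zip the report, predictions and script into a flat archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # Store each file under its base name, without directory prefixes.
            archive.write(file, arcname=Path(file).name)
```

Validating the row count and label codes before writing catches the most common formatting mistakes locally, before the archive is uploaded.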
The registration fee for the Data Challenge is € 60 per team. Payment must be made by either:
The data will be made available to teams after registration.
Last update: 31 March 2026.