Datasets

This website provides a set of public benchmark datasets for evaluating learning algorithms in nonstationary environments. In particular, it provides datasets with incremental and gradual concept drift. These datasets are well suited to evaluating stream algorithms that do not require the actual labels during the online classification phase, a condition known as extreme verification latency.
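To make the extreme verification latency setting concrete, the sketch below trains on a small initial labeled batch and then classifies the rest of the stream without receiving any further labels. It is only an illustration under stated assumptions: the data is a synthetic 1-D stream generated in the code (not one of the benchmark datasets), the classifier is a plain nearest-centroid model that updates itself from its own predictions (not SCARGC), and all names (make_drifting_stream, evaluate_evl, n_initial, rate) are hypothetical.

```python
import random


def make_drifting_stream(n=2000, seed=0):
    """Synthetic two-class 1-D stream with incremental drift (assumed data)."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        shift = 4.0 * t / n                      # both class means move right over time
        label = rng.randint(0, 1)
        mean = shift + (0.0 if label == 0 else 3.0)
        stream.append((rng.gauss(mean, 0.5), label))
    return stream


def evaluate_evl(stream, n_initial=50, adapt=True, rate=0.05):
    """Accuracy on the unlabeled part of the stream.

    True labels are read only for the first n_initial examples (training)
    and afterwards only to score predictions, never to update the model.
    """
    initial, rest = stream[:n_initial], stream[n_initial:]

    # Fit one centroid per class from the initial labeled batch.
    centroids = {}
    for c in (0, 1):
        xs = [x for x, y in initial if y == c]
        centroids[c] = sum(xs) / len(xs)

    correct = 0
    for x, y in rest:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += int(pred == y)                # y is used for scoring only
        if adapt:
            # Self-update: move the predicted class centroid toward x.
            centroids[pred] += rate * (x - centroids[pred])
    return correct / len(rest)


if __name__ == "__main__":
    stream = make_drifting_stream()
    print("static model:  ", evaluate_evl(stream, adapt=False))
    print("adaptive model:", evaluate_evl(stream, adapt=True))
```

On this synthetic stream the static model degrades as the classes drift away from the initial centroids, while the self-updating model can follow the drift because the change between consecutive examples is small. That gradual change is the property the incremental-drift datasets are meant to exercise.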

We hope that this benchmark will encourage other researchers to share their data, code, and detailed results, improving reproducibility in the area.

For a better understanding of the properties of each dataset, see an animated visualization of each dataset.

Download (all datasets ~15MB)


Stream Classification Algorithm Guided by Clustering - SCARGC

Download (Source code)

How to cite this benchmark?

Souza, V.M.A.; Silva, D.F.; Gama, J.; Batista, G.E.A.P.A.: Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency. SIAM International Conference on Data Mining (SDM), pp. 873-881, 2015.

DOI: http://dx.doi.org/10.1137/1.9781611974010.98


@inproceedings{souzaSDM:2015,
  title={Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency},
  author={Souza, V. M. A. and Silva, D. F. and Gama, J. and Batista, G. E. A. P. A.},
  booktitle={Proceedings of SIAM International Conference on Data Mining (SDM)},
  pages={873--881},
  year={2015}
}

Dataset donors:

[1] - These datasets were kindly provided by the authors of the following paper: Dyer, K.B., Capo, R., Polikar, R.: COMPOSE: A Semisupervised Learning Framework for Initially Labeled Nonstationary Streaming Data. IEEE Transactions on Neural Networks and Learning Systems, Vol. 25, No. 1, pp. 12-26, 2014.

[2] - Ditzler, G., Polikar, R.: Incremental learning of concept drift from streaming imbalanced data. IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 10, pp. 2283-2301, 2013.

[3] - Dataset based on the CMU dataset first presented by the authors of the following paper: Killourhy, K., Maxion, R.: Why did my detector do that?! In Recent Advances in Intrusion Detection, pp. 222-233, 2011.