Datasets are taken from the UCI Machine Learning Repository. The format of the data files is as follows:

For some datasets the ground-truth clusters (represented by labels) are known. This information can be used to compare an obtained clustering with the ground-truth clustering. Data files are named datasetName_#objects_#attributes_#clusters.txt, and the labels of a dataset are in a file named datasetName_#objects_#attributes_#clusters_classes.txt. The i-th line of a labels file gives the label of the cluster containing the i-th object.
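The naming convention and the labels files can be used roughly as sketched below in Python. The function names (parse_dataset_filename, read_labels, rand_index) are hypothetical and not part of the datasets. The Rand index is only one possible comparison measure and is an assumption here; the source does not say which measure is intended. Labels are kept as strings because their exact encoding is not specified.

```python
import re
from itertools import combinations
from pathlib import Path

# Assumed pattern: datasetName_#objects_#attributes_#clusters[_classes].txt
_NAME_RE = re.compile(
    r"^(?P<name>.+)_(?P<objects>\d+)_(?P<attributes>\d+)_(?P<clusters>\d+)"
    r"(?P<classes>_classes)?\.txt$"
)


def parse_dataset_filename(filename):
    """Split a file name into dataset name, sizes, and a labels-file flag."""
    match = _NAME_RE.match(Path(filename).name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {filename}")
    return {
        "name": match.group("name"),
        "objects": int(match.group("objects")),
        "attributes": int(match.group("attributes")),
        "clusters": int(match.group("clusters")),
        "is_labels": match.group("classes") is not None,
    }


def read_labels(path):
    """Read one label per line; line i belongs to object i. Blank lines are skipped."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def rand_index(truth, predicted):
    """Fraction of object pairs on which the two clusterings agree.

    A pair counts as agreeing when both clusterings put the two objects
    together, or both put them apart. Runs over all pairs (quadratic time).
    """
    if len(truth) != len(predicted):
        raise ValueError("both label lists must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0
    agreeing = 0
    for i, j in combinations(range(n), 2):
        same_in_truth = truth[i] == truth[j]
        same_in_predicted = predicted[i] == predicted[j]
        if same_in_truth == same_in_predicted:
            agreeing += 1
    return agreeing / (n * (n - 1) // 2)
```

With this strict pattern, a name that departs from the convention, such as waveform-5000_40_3.txt in the file list, raises ValueError instead of being guessed at. The cluster identifiers in the two label lists do not need to match, since only co-membership of pairs is compared.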

File                                   Type  Notes
breastCancer_569_30_2.txt              txt
breastCancer_569_30_2_classes.txt      txt
glass_214_9_7.txt                      txt
glass_214_9_7_classes.txt              txt
imageSegmentation_2000_19_7.txt        txt
ionosphere_351_34_2.txt                txt
ionosphere_351_34_2_classes.txt        txt
iris_150_4_3.txt                       txt
iris_150_4_3_classes.txt               txt
multipleFeatures_2000_10_6.txt         txt
syntheticControl_600_60_6.txt          txt
syntheticControl_600_60_6_classes.txt  txt
userKnowledge_403_5_4.txt              txt
vehicle_846_18_4.txt                   txt
vehicle_846_18_4_classes.txt           txt
waveform-5000_40_3.txt                 txt
wine_178_13_3.txt                      txt
wine_178_13_3_classes.txt              txt
yeast_1484_8_10.txt                    txt
yeast_1484_8_10_classes.txt            txt
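
Not every data file in the list has a matching _classes file. The Python sketch below pairs each data file with its labels file when one is present. The function name pair_with_labels is hypothetical, and the pairing relies only on the _classes.txt suffix described above.

```python
def pair_with_labels(filenames):
    """Map each data file to its labels file, or to None if no labels file is listed."""
    names = set(filenames)
    pairs = {}
    for filename in sorted(names):
        # Skip labels files themselves and anything that is not a .txt file.
        if filename.endswith("_classes.txt") or not filename.endswith(".txt"):
            continue
        candidate = filename[: -len(".txt")] + "_classes.txt"
        pairs[filename] = candidate if candidate in names else None
    return pairs


if __name__ == "__main__":
    listing = [
        "iris_150_4_3.txt",
        "iris_150_4_3_classes.txt",
        "userKnowledge_403_5_4.txt",
    ]
    for data_file, labels_file in pair_with_labels(listing).items():
        print(data_file, "->", labels_file)
```

Datasets mapped to None have no ground-truth labels in the list, so an external comparison against ground truth is not possible for them.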