The website for the UC Irvine Machine Learning Repository is http://archive.ics.uci.edu/ml/.
As of 12/29/2016, the repository states that it maintains 360 data sets as a service to the machine learning community, all of which can be viewed through its searchable interface.
The available datasets span many topics and vary widely in size, from only a few cases (or “instances”) to over 43 million, and from only 1 or 2 variables (or “attributes”) to over 3 million (although most have somewhere from fewer than 100 to about 1,000 variables).
Each dataset links to a page describing the data’s origins, how it was obtained, and its intended use. Previous papers that used the dataset, or that report on the originating study, are often listed as well and are helpful for understanding the dataset and how to analyze it. Each dataset’s webpage has a link to a “Data Set Description” and a “Data Folder”. The Data Folder is where you will find a listing of the data files and links for downloading them.
The “data” provided is often split across multiple files, many of which are compressed or zipped. The usual decompression software (such as the ZIP support available on Windows systems) should work for these. However, some are provided as *.tar or *.tar.Z files. For these you will need software such as:
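One scriptable option, offered here as a minimal sketch rather than anything the repository itself recommends, is Python's standard library, which can unpack ZIP files and *.tar files (including gzip-, bzip2-, and xz-compressed ones). The function name extract_archive and the file names in the usage note are hypothetical. Note that *.tar.Z files use the older Unix "compress" format, which the modules used here do not read, so those files must first be decompressed with a separate tool.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a ZIP or tar archive into dest_dir and return the member names.

    Hypothetical helper for illustration only. Raises ValueError for formats
    the standard library cannot read, such as *.tar.Z.
    """
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # tarfile transparently handles plain, gzip, bzip2, and xz tar archives.
    if tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path) as tf:
            names = tf.getnames()
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions offer a safer extraction filter.
                tf.extractall(dest_dir, filter="data")
            else:
                # Older versions: only extract archives you trust.
                tf.extractall(dest_dir)
            return names

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
            return zf.namelist()

    raise ValueError(
        f"{archive_path.name}: not a ZIP or tar archive readable by the "
        "standard library (a *.tar.Z file needs to be decompressed first)"
    )
```

For example, calling extract_archive("downloaded.tar", "data") would unpack a hypothetical downloaded.tar into a data folder and return the list of file names it contained.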
Copyright © Melinda Higgins, Ph.D.. All contents under (CC) BY-NC-SA license, unless otherwise noted.