AutoClust

College of Engineering

Autonomous Database Partitioning Using Data Mining for High Performance Computing

This material is based upon work supported by the National Science Foundation under Grant No. 0954310. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Right-click a listed file name to download the test queries or the test data.

1. Test Queries and Database Table Information

All test queries and the test database table descriptions: TPC-H.pdf file.

More information about the TPC-H benchmark: TPC-H.org

2. Test Data

Data set for the database table LINEITEM (six million rows):

First one million rows: LINEITEM_1ST1M.csv file.

Second one million rows: LINEITEM_2ND1M.csv file.

Third one million rows: LINEITEM_3RD1M.csv file.

Fourth one million rows: LINEITEM_4TH1M.csv file.

Fifth one million rows: LINEITEM_5TH1M.csv file.

Sixth one million rows: LINEITEM_6TH1M.csv file.
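
Because the LINEITEM data is split across six files, it may be convenient to read the parts in order as one stream of rows. The following is a minimal sketch, not part of the AutoClust tooling. It assumes the files are comma-delimited, have no header row, and have been saved to a local directory of your choice; none of these details are stated on this page, so adjust the delimiter and header handling to match the downloaded files. The helper names iter_rows and count_rows are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterator, List, Sequence

# File names as listed on this page. The directory they are saved in
# is up to the reader (an assumption of this sketch).
LINEITEM_PARTS = [
    "LINEITEM_1ST1M.csv",
    "LINEITEM_2ND1M.csv",
    "LINEITEM_3RD1M.csv",
    "LINEITEM_4TH1M.csv",
    "LINEITEM_5TH1M.csv",
    "LINEITEM_6TH1M.csv",
]


def iter_rows(data_dir: str, part_names: Sequence[str],
              delimiter: str = ",") -> Iterator[List[str]]:
    """Yield rows from each part file in the given order.

    Rows are returned as lists of strings. The comma delimiter and the
    absence of a header row are assumptions, not documented facts.
    """
    for name in part_names:
        with open(Path(data_dir) / name, newline="") as handle:
            yield from csv.reader(handle, delimiter=delimiter)


def count_rows(data_dir: str, part_names: Sequence[str],
               delimiter: str = ",") -> int:
    """Count rows across all part files without holding them in memory."""
    return sum(1 for _ in iter_rows(data_dir, part_names, delimiter))
```

For example, count_rows("data", LINEITEM_PARTS) can serve as a quick check that all six parts downloaded completely. Streaming the rows rather than loading them into a list keeps memory use low for a table of this size.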

Data set for the database table ORDERS (one point five million rows):

First one million rows: ORDERS_1ST1M.csv file.

Remaining half million rows: ORDERS_2ND1M.csv file.

Data set for the database table CUSTOMER:

CUSTOMER.xlsx file.

Data set for the database table PART:

PART.xlsx file.

Data set for the database table SUPPLIER:

SUPPLIER.xlsx file.

Data set for the database table PARTSUPP:

PARTSUPP.xlsx file.