Rhiza supports uploaded .csv files. Before you upload your data, make sure the file is formatted correctly and is free from errors.
Format and Size
The first line of the file contains data fields that become the names of the columns for the rest of the rows in the file. Each column name must be unique. The rest of the lines contain data fields to fill out the dataset.
With a few exceptions, columns become attributes that you can use when creating target series. In the example below, the column Income would be an attribute that you could select when defining your target, or an attribute that you could group to in your target series.
For best results, datasets should have a maximum of 25 columns and 10,000 rows.
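The format and size guidance above can be checked locally before uploading. The following is a minimal sketch in Python using only the standard library, not a Rhiza tool: the function name check_format_and_size is hypothetical, and it assumes the 10,000-row guideline counts data rows and not the header line.

```python
import csv
import io
from collections import Counter

MAX_COLUMNS = 25    # recommended maximum from the guidance above
MAX_ROWS = 10_000   # recommended maximum; assumed to exclude the header row


def check_format_and_size(csv_text):
    """Return a list of warnings for CSV content supplied as a string.

    Hypothetical helper for illustration; it is not part of Rhiza.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    header, data = rows[0], rows[1:]
    warnings = []

    # Each column name in the first line must be unique.
    duplicates = sorted(name for name, count in Counter(header).items() if count > 1)
    if duplicates:
        warnings.append(f"duplicate column names: {duplicates}")

    if len(header) > MAX_COLUMNS:
        warnings.append(f"{len(header)} columns exceeds the recommended {MAX_COLUMNS}")
    if len(data) > MAX_ROWS:
        warnings.append(f"{len(data)} rows exceeds the recommended {MAX_ROWS}")

    # Every data row should supply one field per column (row 1 is the header).
    for row_number, row in enumerate(data, start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {row_number} has {len(row)} fields, expected {len(header)}"
            )
    return warnings
```

To check a file on disk, read it as text first, for example with open("mydata.csv", newline="", encoding="utf-8"), and pass the contents to the function. An empty list means none of these checks found a problem.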
Clean and Eliminate Errors
Before you upload a dataset, take a careful look at all of the cells. The most common problems when uploading datasets are:
- A ZIP code column has cells that contain more than five digits. This can happen if you created the dataset in Excel® (sometimes Excel adds an extra zero to certain ZIP codes in the northeastern United States) or if you've cut and pasted a seven-digit ZIP code into the column.
- There are unreadable characters in your dataset. Again, this might happen if you manipulate the data in Excel before saving it as a .csv file and uploading it, or if you imported the data from another source. The most likely culprit is a hidden Unicode character.
- Your dataset uses inconsistent cases for the attribute values in a column. Rhiza treats attribute values as case sensitive. That means that if you have both Female and female as values for an attribute, the upload can fail.
If your dataset has errors, you might notice that it either does not finish uploading or that you cannot select the expected data type for a column. In either of these cases, examine your .csv file to ensure the data is clean.
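The three common problems listed above can also be looked for with a short script. This is a rough sketch under stated assumptions, not Rhiza's own validation: the function name find_cleaning_problems and the default column name "ZIP" are hypothetical, and "hidden character" is interpreted here as any character in a Unicode control or format category (categories beginning with C), such as a zero-width space.

```python
import csv
import io
import unicodedata
from collections import defaultdict


def find_cleaning_problems(csv_text, zip_column="ZIP"):
    """Return a list of likely problems in CSV content supplied as a string.

    Hypothetical helper for illustration; zip_column is an assumed column name.
    """
    problems = []
    # column name -> lowercased value -> set of spellings seen in the file
    spellings = defaultdict(lambda: defaultdict(set))

    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            # Skip extra or missing fields in rows of the wrong length.
            if column is None or value is None:
                continue

            # Problem 1: ZIP code cells longer than five characters.
            if column == zip_column and len(value.strip()) > 5:
                problems.append(
                    f"row {row_number}: {column} value {value!r} "
                    "has more than five characters"
                )

            # Problem 2: hidden control or format characters.
            for ch in value:
                if unicodedata.category(ch).startswith("C"):
                    problems.append(
                        f"row {row_number}: {column} contains "
                        f"hidden character U+{ord(ch):04X}"
                    )

            spellings[column][value.lower()].add(value)

    # Problem 3: values that differ only by case within one column.
    for column, groups in spellings.items():
        for variants in groups.values():
            if len(variants) > 1:
                problems.append(f"column {column}: inconsistent case {sorted(variants)}")
    return problems
```

For a file whose Gender column contains both Female and female, the returned list includes an "inconsistent case" entry naming both spellings. The script only reports candidates for review; deciding which spelling or ZIP code value is correct still requires looking at the data.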