Uploading single data files¶
You can upload single data files by dropping them over the home screen. The file will be analyzed and its contents converted into a new dataset with a corresponding schema and database.
Preparing data files¶
Spreadsheet files are useful for storing tabular data, but they must be formatted in a way that allows Rational BI to read the data and convert it into tables and columns in a database.
Each sheet in the workbook will be mapped to a separate table. The name of the sheet is the basis for the table name, and the system retains the sheet title as the table title.
Within each sheet, data should be formatted as tabular data starting in the top left position where the first row contains column names.
The following rows should contain the data to be loaded. Columns should be formatted with correct data types and format masks.
Formatting such as colors and cell sizes is generally ignored, but cells should not span multiple rows or columns. Charts, pivot tables, formulas and other non-tabular content should not be used and may interfere with the data upload.
The system will make note of sheets that can’t be converted in the dataset page.
Rational BI is able to read data in spreadsheet format, but care must be taken to ensure that the data contained within the spreadsheet is formatted in a way that can be converted into table form.
- Each sheet in the spreadsheet corresponds to a table. The sheet name should be the same as the table name.
- Data should start at coordinate A1 (top left) without any empty rows or columns.
- The first row should be a header row, with each column corresponding to a column in the database. The name in each header cell should match the column name in the dataset metadata.
- Each column should ideally have an explicit format set. In other words, columns with dates should be real dates as represented by the spreadsheet. Strings with date-like formats will not work.
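The rules above can be sketched with a small script that writes a well-formed workbook. This uses openpyxl as an assumed authoring tool (any tool that writes real typed cells will do); the sheet, column, and file names are illustrative:

```python
# Sketch: a workbook that follows the formatting rules above.
# Sheet name = table name, header row starts at A1, and dates
# are stored as real date values, not date-like strings.
from datetime import date
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "orders"  # becomes the table name

ws.append(["Order ID", "Customer", "Order Date", "Total"])  # header row at A1
ws.append([1001, "Acme Corp", date(2023, 5, 1), 249.90])    # real date, not "2023-05-01"
ws.append([1002, "Globex", date(2023, 5, 2), 99.00])

wb.save("orders.xlsx")
```

Because the dates are written as typed values, the spreadsheet stores them with a date format rather than as text, which is what allows the date column to be detected on upload.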
Uploading the file¶
You can upload data by dropping files in various formats into the dataset designer.
Data uploads are atomic: each upload must contain all the data required for every table and column defined in the dataset. With this in mind, formats that support multiple sheets (or the equivalent) work best, since they can supply data for datasets with more than one defined table.
If your dataset has only a single table defined, you'll be able to upload data through any of the formats that support a single sheet or table.
Supported file types¶
| File type | Multiple tables | File extension |
| --- | --- | --- |
| Excel 2007+ Workbook | yes | .xlsx |
| Excel 97-2004 Workbook | yes | .xls |
| Excel 5.0/95 Workbook | yes | .xls |
| Excel 2.0 Worksheet | no | .xls |
| Excel 2003-2004 (SpreadsheetML) | yes | .xls |
| Comma Separated Values | no | .csv |
| Tab Separated Values | no | .tsv |
The dataset schema will automatically be reverse engineered from an existing file. The system will analyze the file and try to extract tables, columns and column formats based on the available data.
Spreadsheets such as Excel Workbooks are best suited for schema autogeneration since they allow you to format each column with the appropriate data type and format mask. Whenever a column is detected without a specified type, the system will attempt to infer a datatype based on the data found in the column.
The schema generation is typically more accurate the more carefully formatted the input file is.
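To illustrate what datatype inference of an untyped column can look like, here is a minimal sketch that tries progressively stricter parses over sampled values. The function and the parsing order are assumptions for illustration; Rational BI's actual inference rules are not documented here:

```python
# Illustrative sketch: infer a column type from sampled string values
# by attempting integer, number, then ISO-date parses before falling
# back to string. (Not Rational BI's actual algorithm.)
from datetime import datetime

def infer_type(values):
    def all_parse(parse):
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"
```

Under a scheme like this, a column containing "2023-05-01" is only recognized as a date if every value parses; one stray free-text cell demotes the whole column to string, which is why careful formatting improves the generated schema.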
Columns should preferably be named as they should appear in the user interface. The system will construct an appropriate database column name by converting the column header to lower case and replacing spaces with underscore characters ("_"), and will also attempt to create a standardized naming scheme by adjusting casing in the columns if required.
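The header-to-column-name conversion described above can be sketched as follows (the exact rules Rational BI applies, such as its casing adjustments, may differ):

```python
# Sketch of the documented conversion: lower-case the header
# and replace spaces with underscores.
def to_column_name(header: str) -> str:
    return header.strip().lower().replace(" ", "_")

print(to_column_name("Order Date"))   # order_date
print(to_column_name("Customer ID"))  # customer_id
```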
Size limits of data uploaded through the user interface¶
Data uploaded through the user interface is processed and converted to a database in the web browser. This means that although large files can be uploaded, there are certain limitations to be aware of.
When uploading data, consider that any consumers of your data will be downloading a database that will have a size corresponding to your original source file. The actual database file may differ in size depending on the data and how it is stored, but generally more concise data will yield a better user experience.
It is typically reasonable to upload spreadsheets and files that are up to approximately 100MB in size. If your data is larger than that, consider using a remote database connection to avoid excessive memory and network load for report consumers. For analytics and data science purposes where the target users have computers with more memory and better connectivity, databases in excess of 1GB can be used.
Uploading data via the API¶
It is possible to use the Rational BI API to upload databases for use with datasets. This can be beneficial in certain scenarios such as when automating construction of reporting datamarts. By having an ETL process create and upload databases according to a schedule, it’s possible to tightly control the reporting data that is made available to users, to build targeted databases for specific uses and allow very wide usage of data and queries against the uploaded data without placing any load on the original data stores.
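A scheduled ETL job along these lines would build a database file and push it through the API. The endpoint path, authentication scheme, and field names below are assumptions for illustration only; consult the Rational BI API documentation for the actual contract:

```python
# Sketch only: builds an HTTP request that uploads a database file
# for a dataset. The URL, path, and Bearer-token auth are assumed,
# not taken from the Rational BI API reference.
import urllib.request

API_BASE = "https://rationalbi.example"  # hypothetical base URL

def build_upload_request(dataset_id: str, data: bytes, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        url=f"{API_BASE}/api/datasets/{dataset_id}/databases",  # assumed path
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",        # assumed auth scheme
            "Content-Type": "application/octet-stream",
        },
    )

req = build_upload_request("sales-mart", b"database-bytes", "API-TOKEN")
# urllib.request.urlopen(req) would then perform the upload
```

Keeping the upload in a single POST matches the atomic-upload model described earlier: the ETL process stages a complete database and only swaps it in once the upload succeeds.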
Rational BI will manage the uploaded databases and enforce security based on the ACLs applied to the dataset. Since a single dataset can contain multiple databases, it’s easy to maintain a robust reporting system by maintaining a trail of historical databases that are only replaced when a new database has been uploaded and integrity checked.
When a dataset has more than one database uploaded, it will serve the database marked active to report consumers. If a problem is detected, a user with sufficient privileges can navigate to the data tab of the dataset in the user interface and mark an alternative database active.