The dataset schema¶
Each dataset is associated with a database schema: a description of the tables and columns accessible through the dataset.
The dataset itself doesn’t contain any data but rather acts as a proxy that provides database files conforming to the specified schema. The user interface uses the schema to drive the data studio by classifying data columns by their type and format.
It is still possible for data to exist that isn’t registered in the metadata. Such data can be accessed through direct SQL queries, but it will be harder to discover and the user interface will not be able to assist as much with formatting or data discovery.
In most cases, the schema will be automatically reverse-engineered from the source data when uploading data files or connecting databases, but there are scenarios where you may want to hand-craft or modify the database schema.
In particular, it is sometimes necessary to adjust the schema to set appropriate data types and formatting options.
Inspecting and changing the dataset schema¶
To see the dataset schema, navigate to the dataset from the home page or by searching for it in the header search bar. On the dataset screen, select the schema tab.
The left side of the tab shows a list of tables and clicking on a table will update the right panel to display the columns defined for the selected table.
You can edit the schema of datasets sourced from direct data uploads. Datasets syndicated from cloud drives, open data or remote SQL databases cannot be modified; to update the schema of these types of datasets, change the source data instead.
When a schema is editable, additional icons will appear in the interface. You can add and delete tables and columns.
Select the data type that most closely resembles the source data. Since data can come in any number of formats and data types, it is not always possible for Rational BI to have an exact correspondence with the source data type. In such cases, select the closest match.
It is especially important to select dates and currencies correctly, since the system applies special-purpose processing to these types: dates can be grouped and truncated in various ways, and currencies are calculated with decimal arithmetic to ensure that rounding errors do not occur.
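The rounding problem that decimal arithmetic avoids is easy to demonstrate. The sketch below is not Rational BI’s implementation, just an illustration of why currency amounts are handled as whole cents rather than binary floating-point values:

```python
# Summing dollar amounts as binary floats accumulates rounding error,
# because values like 0.10 have no exact binary representation:
float_total = 0.10 + 0.10 + 0.10
print(float_total == 0.30)   # False: the sum is 0.30000000000000004

# Summing whole cents (the value * 100 convention) stays exact:
cents_total = 10 + 10 + 10
print(cents_total == 30)     # True
print(cents_total / 100)     # divide by 100 only once, at display time
```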
Table and column metadata¶
Name: The database-level column name used in SQL statements and in the schema.
Since Rational BI uses SQLite as a transfer format, the column name must follow the restrictions enforced by the database engine.
Specifically, the name should not start with a number, should not contain any spaces and should not be any of the SQLite reserved keywords.
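These naming rules can be checked with a simple validator. The helper below is hypothetical (not part of Rational BI), and the keyword set is a small subset of SQLite’s full reserved-word list, shown only to illustrate the restrictions described above:

```python
import re

# Small subset of SQLite's reserved keywords, for illustration only.
SQLITE_KEYWORDS_SUBSET = {"select", "table", "where", "group", "order", "index"}

def is_valid_column_name(name: str) -> bool:
    """True if `name` avoids the restrictions described above:
    no leading digit, no spaces or punctuation, not a reserved keyword."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return False  # rejects leading digits, spaces, and symbols
    return name.lower() not in SQLITE_KEYWORDS_SUBSET

print(is_valid_column_name("order_total"))  # True
print(is_valid_column_name("2021_sales"))   # False: starts with a number
print(is_valid_column_name("unit price"))   # False: contains a space
print(is_valid_column_name("order"))        # False: reserved keyword
```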
Caption: The column caption as it appears in the user interface. The caption is the default name of columns in the views rendered into widgets, unless overridden in the data designer.
Type: Controls how column data is stored, presented and filtered. The type decides which underlying SQLite data type is used, whether a column is considered a dimension or a measure, how it is aggregated and how it is presented in views.
It is important that column types are correctly configured since an incorrect type setting can lead to a poor user experience or incorrect data.
The following column types are recognized:
| Type | Database type | Storage format | Classification | Default formatting | Filtering |
| --- | --- | --- | --- | --- | --- |
| String | Text | Stored as-is in the database | Dimension | No special formatting | Dropdown filter with distinct column values |
| Number | Numeric | Stored as-is in the database | Measure | Formatted based on locale settings for numbers | Numerical selection |
| Currency | Integer | Stored as the currency value × 100 to ensure accurate aggregation of decimal values; for example, $4.84 is stored as 484 | Measure | Formatted based on locale settings for currencies | Numerical selection |
| Date | Integer | Stored as UNIX time | Dimension | Formatted based on locale settings for dates | Date range selector |
| Boolean | Integer | Stored as a number where 1 = true and 0 = false | Dimension | Shown as true/false or an equivalent representation | Checkbox |
| Latitude | Numeric | Stored as a decimal number | Measure | Like number columns | Numerical selection |
| Longitude | Numeric | Stored as a decimal number | Measure | Like number columns | Numerical selection |
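The storage conventions in the table can be sketched with plain SQLite. The table and column names below are invented for the example; only the conventions themselves (currency as integer cents, dates as UNIX time, booleans as 0/1) come from the table above:

```python
import datetime
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sold_at INTEGER, amount INTEGER, refunded INTEGER)")

utc = datetime.timezone.utc
rows = [
    # sold_at: UNIX time, amount: cents ($4.84 -> 484), refunded: 0/1 boolean
    (int(datetime.datetime(2021, 3, 1, tzinfo=utc).timestamp()), 484, 0),
    (int(datetime.datetime(2021, 3, 2, tzinfo=utc).timestamp()), 1050, 0),
]
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Aggregation stays exact in integer cents; convert to dollars only for display.
total_cents, = con.execute("SELECT SUM(amount) FROM sales").fetchone()
print(total_cents / 100)  # 15.34
```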
Data is by default formatted based on the mechanisms built into the browser, such as properly localized date, number and currency formats.
Column formatting in Rational BI uses Excel formatting rules. The system attempts to format data according to local conventions based on the currency and location information specified in the account settings, but sometimes a custom data format is needed.
The default formatting can be overridden by specifying a format string using the Excel formatting language.
For convenience, a reference guide can be found here.
Description: Explanatory information about the column, such as what it represents and how to interpret it. Shown in the user interface.
Merging the schema with upload¶
When you upload a data file into a dataset with a specified schema, the system will attempt to ensure that the uploaded data conforms to the existing schema. If there are differences, such as new tables or columns that were not previously configured, the system will present you with options to merge the new data format with the existing schema such that both existing and new data can be accommodated.
This is appropriate only when the new data deviates slightly from the existing schema and you want to update the existing dataset. If you are uploading distinctly different data, consider creating a new dataset instead. There is no limit to the number of datasets you can create, and it’s often better to leave the existing dataset in place so as not to disrupt existing reporting that relies on particular tables and columns being present.
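The merge described above can be sketched as a union of tables and columns in which existing definitions are preserved. The dictionary layout and column-type labels below are assumptions made for the example, not Rational BI’s actual schema representation:

```python
def merge_schema(existing: dict, incoming: dict) -> dict:
    """Union of tables; within a shared table, union of columns.
    Where both sides define a column, the existing definition wins."""
    merged = {table: dict(cols) for table, cols in existing.items()}
    for table, cols in incoming.items():
        merged.setdefault(table, {})
        for col, col_type in cols.items():
            merged[table].setdefault(col, col_type)  # keep existing types
    return merged

existing = {"sales": {"sold_at": "Date", "amount": "Currency"}}
incoming = {"sales": {"amount": "Number", "region": "String"},
            "returns": {"reason": "String"}}
print(merge_schema(existing, incoming))
# New table "returns" and new column "region" are added;
# "amount" keeps its existing Currency type.
```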
You can also merge a new data file into an existing schema without uploading new data. To do so, navigate to the schema tab of the dataset page and upload a data file at the indicated area. This performs the same schema merge as a regular data upload, but without replacing the existing database files.