This type of dataset is created any time you upload a file directly into Rational BI by dropping it into the web browser. The system analyzes the file and builds a database from it. The result of this process is a set of metadata: the name of the dataset, a list of tables, and the columns of each table.
How much metadata the system can extract depends on the type of the source file. Spreadsheets are among the richest sources: Rational BI can extract formatting and column types from them.
The system also loads the data itself into a self-contained database that conforms to the dataset's schema. This database file is stored securely within Rational BI and is subject to the access controls configured within the account.
Each dataset can contain multiple databases
When you upload additional data (such as a new file) into a stored dataset, the system checks that the new file conforms to the tables and columns already configured. If the new data deviates from the established schema, the system may reject the data, or it may give you options to merge the schemas and make changes so that both existing and new data fit within the same database.
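The conformance check described above can be pictured with a small sketch. This is not Rational BI's implementation: the schema representation (a dict mapping table names to column-name/type pairs) and the function names `diff_schema`, `conforms`, and `merge_schema` are hypothetical, chosen only to illustrate how an incoming schema might be compared against an established one and then merged.

```python
# Illustrative only: a schema is modelled as {table: {column: type}}.
# None of these names come from Rational BI.

def diff_schema(established, incoming):
    """Describe how `incoming` deviates from `established`."""
    diff = {"new_tables": [], "new_columns": [], "type_changes": []}
    for table, columns in incoming.items():
        if table not in established:
            diff["new_tables"].append(table)
            continue
        for column, col_type in columns.items():
            known_type = established[table].get(column)
            if known_type is None:
                diff["new_columns"].append((table, column))
            elif known_type != col_type:
                diff["type_changes"].append((table, column, known_type, col_type))
    return diff


def conforms(diff):
    """True when the incoming schema introduced nothing new or different."""
    return not any(diff.values())


def merge_schema(established, incoming):
    """Union of both schemas; established column types win on conflict."""
    merged = {table: dict(columns) for table, columns in established.items()}
    for table, columns in incoming.items():
        target = merged.setdefault(table, {})
        for column, col_type in columns.items():
            target.setdefault(column, col_type)
    return merged


established = {"orders": {"id": "integer", "total": "real"}}
incoming = {"orders": {"id": "integer", "total": "text", "region": "text"}}

deviation = diff_schema(established, incoming)
if not conforms(deviation):
    # A real system would ask the user here; this sketch simply merges.
    established = merge_schema(established, incoming)
```

Letting established types win on conflict is just one possible merge policy. The actual options Rational BI offers when schemas differ are presented in its user interface.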
More than one physical database can map to the same dataset. You can inspect the stored databases in the user interface by navigating to the dataset and selecting the Data tab. The Data tab only appears for datasets where Rational BI is storing physical data.
The active database
When a dataset contains more than one uploaded database, the system defaults to the one that is marked active. This is typically the most recently uploaded database, but the active database can be overridden through the user interface and the API.
To mark a specific database as active, navigate to the Data tab, select the database you would like to activate, and select Activate from the hover menu. You can also mark a database as active when you upload it through the API.
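The selection rule for the active database can be sketched as a small model. Everything here is hypothetical: the `StoredDatabase` record, the `activate` and `resolve_active` helpers, and in particular the fallback to the newest upload when nothing is flagged are assumptions made for illustration, not a description of Rational BI's API or internals.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StoredDatabase:
    """Hypothetical record for one uploaded database in a dataset."""
    name: str
    uploaded_at: datetime
    active: bool = False


def activate(databases: List[StoredDatabase], name: str) -> None:
    """Mark exactly one database active, clearing the flag on all others."""
    if not any(db.name == name for db in databases):
        raise KeyError(name)
    for db in databases:
        db.active = (db.name == name)


def resolve_active(databases: List[StoredDatabase]) -> Optional[StoredDatabase]:
    """Return the database flagged active, or the newest upload if none is."""
    for db in databases:
        if db.active:
            return db
    # Assumption: with no explicit flag, the most recent upload is used.
    return max(databases, key=lambda db: db.uploaded_at, default=None)


databases = [
    StoredDatabase("sales-january", datetime(2024, 1, 31)),
    StoredDatabase("sales-february", datetime(2024, 2, 29)),
]
activate(databases, "sales-january")  # override the most-recent default
current = resolve_active(databases)
```

Keeping the flag exclusive means a report always resolves to a single database, which matches the behaviour described above.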
Data in Rational BI should be considered a reporting copy.
The system makes an effort to ensure the durability of uploaded data, but Rational BI should not act as the system of record for any data.
Rational BI stores reporting databases in highly durable systems (such as Amazon S3), which makes data loss very unlikely but not impossible. If data is lost, it must be re-uploaded into the system before reports relying on the dataset can be accessed.
Since the databases associated with the dataset are transferred to the web browser when used in a report, it’s important to limit the data to what is reasonable for real-time reporting. Rational BI recommends keeping each database to 100 MB or less, but the appropriate size depends on the audience and intended usage. Database files are compressed when transferred from the Rational BI servers to the data consumer’s web browser, and the on-the-wire download size is shown in the “compressed size” column of the Data tab on the dataset page.
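Before uploading, it can help to estimate how a file measures up against this recommendation. The sketch below reports a file's raw size and a gzip-compressed size. It rests on several assumptions: the function name `size_report` is hypothetical, gzip is used only as a stand-in because the compression Rational BI applies is not specified here, 100 MB is interpreted as 100 × 1024 × 1024 bytes, and the size of a source file is only a rough proxy for the database the system builds from it.

```python
import os
import zlib

# Assumption: "100 MB" interpreted as binary megabytes.
RECOMMENDED_MAX_BYTES = 100 * 1024 * 1024


def size_report(path, chunk_size=1024 * 1024):
    """Return raw and gzip-compressed sizes for the file at `path`."""
    raw_bytes = os.path.getsize(path)

    # wbits=31 selects the gzip container; the file is streamed in chunks
    # so large files are never held in memory all at once.
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)
    compressed_bytes = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            compressed_bytes += len(compressor.compress(chunk))
    compressed_bytes += len(compressor.flush())

    return {
        "raw_bytes": raw_bytes,
        "compressed_bytes": compressed_bytes,  # estimate of transfer size
        "within_recommendation": raw_bytes <= RECOMMENDED_MAX_BYTES,
    }
```

Calling `size_report("export.csv")` on a local file (the file name is a placeholder) gives a rough idea of both numbers. The authoritative transfer size remains the “compressed size” value Rational BI reports after upload.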
Report performance is directly related to the transferred data size. Keep in mind that some users and mobile devices may be subject to data caps, so limit the size of uploaded databases for the best performance and experience.
If you need to query across large volumes of data, consider using a remote dataset, where queries are evaluated on the server rather than in the user’s web browser.