Cloud Drive Connections¶
Connecting Rational BI to your cloud drive is a convenient way to make data available for reporting and analytics. Services like Google Drive and Dropbox work like virtual drives, not unlike USB drives that can be connected to your computer and shared between any number of users.
Rational BI can be given permission to access these drives, monitor the files, and make most common data files available to you for reporting. The software automatically monitors the file system for changes and makes sure that any reports and analytics that use the data are always up to date with the latest changes.
When data files are discovered on the connected drive, they are made available in Rational BI as datasets: virtual containers of data that act as databases, with tables and columns mirroring the source data.
When files are discovered, they will be listed as syndicated datasets, initially without a schema or data. This allows them to be discovered by browsing or searching for them in the interface.
Data is seamlessly loaded when the dataset is first accessed and table and column metadata will be built from the source data. At the same time, the system creates a relational database from the file data which will automatically be loaded when data is requested.
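As a rough illustration of what building a relational database from file data can look like, the sketch below loads CSV text into an in-memory SQLite database and reads back the column metadata. It is not Rational BI's actual build process, which is not documented here. The function names, the `data` table name and the choice to store every column as TEXT are assumptions made for the example.

```python
import csv
import io
import sqlite3


def build_database(csv_text, table_name="data"):
    """Build an in-memory SQLite database from CSV text.

    Hypothetical helper: the first row is treated as the header and
    every column is stored as TEXT to keep the sketch short.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', body)
    conn.commit()
    return conn


def column_names(conn, table_name="data"):
    """Read the column metadata back from the generated table."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table_name}")')]


conn = build_database("region,amount\nnorth,10\nsouth,20\n")
print(column_names(conn))
print(conn.execute('SELECT COUNT(*) FROM "data"').fetchone()[0])
```

A real loader would also need to infer column types and handle Excel workbooks with multiple sheets, which this sketch leaves out.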
The generated database is cached by Rational BI for a period and will be served on demand to any user of the data. If the source file changes, the cached database will be discarded, and a new database will be built before it is provided to the requester.
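The caching behavior described above can be pictured as a cache keyed by file and revision. The following is a minimal sketch under stated assumptions: `DatabaseCache`, the `revision` marker and the `build` callback are hypothetical names, and time-based expiry of cached databases is omitted for brevity.

```python
class DatabaseCache:
    """Serve a cached database until the source file's revision changes.

    Illustrative only: `revision` stands in for whatever change marker
    the cloud provider exposes (for example a modification time or tag).
    """

    def __init__(self, build):
        self._build = build      # callable that turns file content into a database
        self._entries = {}       # file_id -> (revision, database)

    def get(self, file_id, revision, content):
        cached = self._entries.get(file_id)
        if cached is not None and cached[0] == revision:
            return cached[1]     # source unchanged: serve the cached database
        database = self._build(content)  # source changed or first access: rebuild
        self._entries[file_id] = (revision, database)
        return database


builds = []
cache = DatabaseCache(lambda content: builds.append(content) or f"db({content})")
cache.get("file-1", "rev-a", "v1")   # first access builds the database
cache.get("file-1", "rev-a", "v1")   # same revision is served from the cache
cache.get("file-1", "rev-b", "v2")   # changed source discards and rebuilds
print(len(builds))
```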
Connecting to a cloud drive¶
To connect to a cloud drive, navigate to the home screen and select New under the Cloud Drive Connections section. You will get a choice of supported third-party providers to connect to.
Rational BI needs to be granted permission to access the remote file system and you will be redirected to the provider’s consent screen for approval before the connection can be established. After approval, you will be redirected back to the Rational BI user interface, and the connection is established.
The system will queue the file system for an initial scan to look for any files that can be used to build reports.
If you’re unable to approve the connection, Rational BI will still create it, but it will be left in a disconnected state. If that happens, either delete the connection or reestablish it.
When you create a connection to a cloud drive in Rational BI, you potentially give every user of your Rational BI account access to the files that are accessible to the user who set up the connection. Since cloud drives can have extensive folder structures, it’s easy to accidentally index files that should not be accessible.
To prevent such a scenario, consider creating a separate service account for the connection and only grant access to specific files as required. Google Drive connections have access to all files accessible to the connecting user. Dropbox connections are limited to a specific application folder.
Rational BI does not have access to the username or password of the connecting user and only accesses files indirectly, on behalf of the connecting user.
Connections can be in multiple states depending on the remote system. When a connection is first created, it will be in an initial state, waiting for a connection to be established. The connection is not yet active and can’t access any files.
After the authentication session with the remote system completes successfully, the connection moves to an active (connected) state. When the connection is active, it will construct an initial index and make any files in the remote system available within Rational BI as well.
The initial scan can take some time depending on the number of files in the remote system but should typically conclude within a few minutes.
The system will also start a monitoring thread and watch for new files, file deletions and file changes and update the mirrored/syndicated datasets in Rational BI accordingly.
If the connection can no longer be maintained for some reason, it will move to a disconnected state. Connections can become disconnected for a variety of reasons, including lost privileges of the establishing user, authentication expiration, or removal of the connection on the third-party side. A disconnected connection can be reestablished by reconnecting it.
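The connection lifecycle described above can be summarized as a small state machine. The sketch below is an illustration only: the state names and the transition table are inferred from this page and are not Rational BI's internal identifiers.

```python
from enum import Enum


class ConnectionState(Enum):
    INITIAL = "initial"            # created, waiting for authentication
    CONNECTED = "connected"        # active, files are indexed and monitored
    DISCONNECTED = "disconnected"  # approval failed or access was lost


# Transitions as described on this page (assumed, not an official list).
ALLOWED_TRANSITIONS = {
    ConnectionState.INITIAL: {ConnectionState.CONNECTED, ConnectionState.DISCONNECTED},
    ConnectionState.CONNECTED: {ConnectionState.DISCONNECTED},
    ConnectionState.DISCONNECTED: {ConnectionState.CONNECTED},  # via Reconnect
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def can_access_files(state):
    """Only an active connection can read files from the remote drive."""
    return state is ConnectionState.CONNECTED


state = transition(ConnectionState.INITIAL, ConnectionState.CONNECTED)
print(state.value, can_access_files(state))
```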
Reestablishing a broken connection¶
If a connection has become disconnected, navigate to it from the home screen or account administration and click Reconnect. The system will attempt to reset the connection. Follow the prompts and click to provide access when indicated.
Rational BI is designed to automatically monitor the remote servers and ensure that any changes to files or file contents are automatically reflected in the user interface.
In certain scenarios it’s possible for files to change without Rational BI being notified and in such cases it can be necessary to initiate a manual synchronization.
To synchronize manually, navigate to the connection from the home screen or account administration and click Synchronize Files. The system will scan the remote drive and update any changed files.
Deleting a connection¶
Connections can be deleted at any time. Deleting a connection also deletes any syndicated datasets and cached databases.
To delete a connection, navigate to the connection from the home screen or account administration and click Delete. The system will remove the connection and any datasets syndicated through the connection will be deleted.
Deleting a connection will make any reports that depend on the syndicated data inoperative.
Don’t delete a connection that is in use.
File and database names¶
The system automatically creates a virtual dataset for each remote data file. Since a dataset is essentially a database with data, the system also needs to assign a database name that uniquely identifies the dataset when it is requested.
Database names will be generated from the remote file names by converting the name to conform to programmatic limitations. Since more than one file can have the same name, either in different remote directories or between different connections, the system will append additional identifiers when more than one file would map to the same dataset.
This can be confusing when trying to access datasets and it is recommended that file names be kept unique for data where reliable access is required.
There are some restrictions on valid database names. Specifically, the name should not start with a number, should not contain any spaces, and should not be any of the SQLite reserved keywords. The system will make adjustments as necessary to ensure that the generated names are valid.
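To make the naming restrictions concrete, here is a minimal sketch of one way a file name could be turned into a valid database name. The exact adjustments Rational BI applies are not documented on this page, so the rules below (dropping the extension, lowercasing, replacing other characters with underscores, prefixing an underscore) are assumptions. `SQLITE_KEYWORDS` is only a small sample of the SQLite keyword list.

```python
import re
from pathlib import PurePosixPath

# Small sample of SQLite reserved keywords; the full list is much longer.
SQLITE_KEYWORDS = {"select", "table", "index", "order", "group", "where", "from", "values"}


def to_database_name(file_name):
    """Hypothetical conversion of a remote file name to a database name."""
    stem = PurePosixPath(file_name).stem               # drop the file extension
    name = re.sub(r"[^0-9A-Za-z_]+", "_", stem).strip("_").lower()
    if not name:
        name = "dataset"                               # fallback for empty results
    if name[0].isdigit() or name in SQLITE_KEYWORDS:
        name = "_" + name                              # avoid leading digit or keyword
    return name


print(to_database_name("2023 Sales Report.csv"))
print(to_database_name("Order.csv"))
print(to_database_name("customers.csv"))
```

Under these assumed rules the three calls print `_2023_sales_report`, `_order` and `customers`.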
Duplicate file names
If more than one file with the same name exists, additional identifiers will be appended to the end of the database name. When a connection is recreated, the identifiers can change and reports relying on them may no longer work.
Make sure to provide a unique name for each data file.
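The sketch below illustrates why duplicate file names make database names fragile. It assumes, purely for illustration, that the appended identifier is derived from a provider-side file ID. The real identifier scheme is not documented here, and `assign_database_names` is a hypothetical helper.

```python
import hashlib
from collections import Counter


def assign_database_names(files):
    """Map (file_id, base_name) pairs to database names.

    Names that occur once are kept as-is. Colliding names get a short
    suffix derived from the file ID (an assumed scheme).
    """
    counts = Counter(base for _, base in files)
    names = {}
    for file_id, base in files:
        if counts[base] == 1:
            names[file_id] = base
        else:
            suffix = hashlib.sha1(file_id.encode("utf-8")).hexdigest()[:6]
            names[file_id] = f"{base}_{suffix}"
    return names


names = assign_database_names([
    ("id-001", "sales"),      # /2023/sales.csv
    ("id-002", "sales"),      # /2024/sales.csv
    ("id-003", "customers"),
])
print(names)
```

If the connection were recreated and the files received new IDs, the suffixes in this sketch would change as well, which mirrors the warning above: a report that refers to a suffixed name would stop working, while `customers` would be unaffected.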
The system limits the maximum number of files indexed per connection. If you have a large number of files available, some files may not appear in Rational BI. If this is the case, consider establishing the connection through a user with more limited access or move files out of the accessible locations in order to avoid indexing too many files.
There is also a limit on the maximum size of files that can be mirrored. This limit applies separately to each file and exists to protect the system and to ensure that the generated mirrored databases retain a size that is practical to load into the browser database.
If a file is too large, it will not show up in the user interface. In certain cases, the file may still appear in Rational BI but will be marked as oversized in the interface.
Rational BI will monitor changes to files and refresh the data when the original source files change. Since some files change very frequently, there’s a limit built into the system to ensure that data is rebuilt no more often than a certain maximum frequency.
Data that changes more than a few times a day may not immediately reflect changes if the data is simultaneously accessed by reports very frequently. If the dataset has not been accessed recently, it will be immediately synchronized. In other words, this limitation only comes into effect when there is a high frequency of both data access and change.
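The throttling rule can be sketched as a single decision: rebuild only when the source has changed and enough time has passed since the last build. The one-hour default below is an arbitrary placeholder, since the actual interval is not stated on this page, and `should_rebuild` is a hypothetical name.

```python
def should_rebuild(source_changed, seconds_since_last_build, min_interval_seconds=3600):
    """Decide whether to rebuild the mirrored database on access.

    A dataset that has not been accessed (and therefore not rebuilt)
    recently passes the interval check, so it is synchronized at once.
    """
    return source_changed and seconds_since_last_build >= min_interval_seconds


print(should_rebuild(True, 7200))    # changed, last build is old: rebuild
print(should_rebuild(True, 60))      # changed, but rebuilt a minute ago: serve existing data
print(should_rebuild(False, 7200))   # unchanged source: nothing to do
```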
If you find that the dataset is not keeping up with the source data, navigate to the dataset and press the Synchronize Schema button since this will also trigger a refresh of the underlying database if required.
Connections to Google Drive will index files made available to the connecting user. Rational BI will only search for files with certain file types. To limit the scope of visible files, consider creating a service account and sharing only specific files with the service user.
Connections to Google Drive will automatically index Google Sheets documents. Google Sheets are updated in real-time and Rational BI will mirror any changes made to the spreadsheets in the reporting databases to ensure that your data is always up-to-date.
Connections to Dropbox create an app-specific directory (/rationalbi) in the Dropbox file system. Rational BI can only access files stored in the application directory. Move files you wish to index into the application directory.
Indexed File Types¶
Files are indexed by their file type. The system currently searches for the following files:
- Excel files (application/vnd.ms-excel)
- CSV files (text/csv)
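Filtering a file listing by type could look like the sketch below. The two MIME types come from the list above. The shape of the file records (`name` and `mime_type` keys) and the helper name are assumptions made for the example.

```python
# MIME types taken from the list above.
INDEXED_MIME_TYPES = {
    "application/vnd.ms-excel": "Excel",
    "text/csv": "CSV",
}


def indexable_files(files):
    """Keep only the files whose MIME type is in the indexed set."""
    return [f for f in files if f["mime_type"] in INDEXED_MIME_TYPES]


listing = [
    {"name": "sales.csv", "mime_type": "text/csv"},
    {"name": "budget.xls", "mime_type": "application/vnd.ms-excel"},
    {"name": "photo.png", "mime_type": "image/png"},
]
print([f["name"] for f in indexable_files(listing)])
```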