Brainstorm's Database Structure

Authors: Martin Cousineau, Raymundo Cassani

Understanding how data is organized and managed in Brainstorm is essential for scripting your own pipelines and for adding new processes to the software. This page describes the structure of the Brainstorm database and the metadata needed to manage it.

On the hard drive

The Brainstorm database resides on the hard drive in the database directory (brainstorm_db/). It consists of an organized collection of protocols, each with its own directory that follows a strict hierarchy:

tree_db.png

The protocol metadata files (protocol.mat and protocol.db) contain all the metadata required to manage the protocol data. If any of these files is accidentally deleted or becomes corrupted, the protocol metadata can be regenerated from all the brainstormsubject.mat and brainstormstudy.mat files in the protocol directory.
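As a rough illustration of the first step of such a regeneration, the sketch below walks a protocol directory and collects the per-subject and per-study metadata files. It is written in Python for convenience and is not Brainstorm's own reload code; the folder names used in the demo tree (anat/, data/, Subject01, Run01) and both function names are assumptions made for the example.

```python
import os

# File names taken from the text above; everything else is illustrative.
SUBJECT_FILE = "brainstormsubject.mat"
STUDY_FILE = "brainstormstudy.mat"


def find_metadata_files(protocol_dir):
    """Return sorted relative paths of subject and study metadata files."""
    found = {"subjects": [], "studies": []}
    for root, _dirs, files in os.walk(protocol_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), protocol_dir)
            rel = rel.replace(os.sep, "/")  # normalize separators
            if name == SUBJECT_FILE:
                found["subjects"].append(rel)
            elif name == STUDY_FILE:
                found["studies"].append(rel)
    found["subjects"].sort()
    found["studies"].sort()
    return found


def build_demo_protocol(base_dir):
    """Create a tiny fake protocol tree (hypothetical folder names)."""
    layout = [
        "anat/Subject01/brainstormsubject.mat",
        "data/Subject01/Run01/brainstormstudy.mat",
        "data/Subject01/Run02/brainstormstudy.mat",
        "data/Subject01/Run01/data_trial001.mat",
    ]
    for rel in layout:
        path = os.path.join(base_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()  # empty placeholder file
    return base_dir
```

A real reload would then read each of these files to rebuild the subject and study entries; that part depends on the file contents and is left out here.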

All required metadata should always be saved on the hard drive outside of the protocol.mat / protocol.db files, so that no information is lost if these files are corrupted or deleted, or if the database is reloaded from scratch. This is why information such as which cortex file is selected, or which trials are marked as bad, is also saved in separate files (brainstormsubject.mat and brainstormstudy.mat).

The filename of each file should always clearly indicate the basic type of the file, hence the required prefixes (e.g. data_*.mat).

For the other anatomy files and functional data files, the prefix in the filename indicates the content of the file. For example, a file surface_*.mat contains cortex surface information. The data is stored as a Matlab structure whose layout depends on the file type.
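The prefix convention can be sketched as a simple lookup. The Python snippet below is only an illustration: the mapping contains just the two prefixes named in the text (data_ and surface_), and the function name and type labels are hypothetical rather than taken from Brainstorm.

```python
# Illustrative mapping built only from the two prefixes named in the text;
# the labels on the right are hypothetical.
PREFIX_TO_TYPE = {
    "data": "data",
    "surface": "surface",
}


def file_type_from_name(filename):
    """Guess the basic file type from the filename prefix, or return None."""
    base = filename.rsplit("/", 1)[-1]
    if not base.endswith(".mat"):
        return None
    stem = base[: -len(".mat")]
    # The prefix is everything before the first underscore, if there is one.
    prefix = stem.split("_", 1)[0]
    return PREFIX_TO_TYPE.get(prefix)
```

For instance, `file_type_from_name("data_trial001.mat")` maps to the "data" label, while a file without a known prefix yields `None`.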

The content of the structure in each .mat file is defined by the function db_template(). The mat suffix is used to differentiate it from the in-memory metadata structure and the in-memory data structure of each type. For example:

Terms that need definition

Protocol metadata

TODO: A brief paragraph describing why there are two versions, the main differences, and when the user is expected to see one or the other.

Matlab structure: protocol.mat

Once a protocol is loaded, Brainstorm keeps Matlab structures in memory that contain the metadata of all the files in this protocol. They are located in the GlobalData.DataBase global variable, so that they are easily accessible from any function of the software. This variable is defined in db_template('GlobalData'). Let's walk through some of the important variables in this structure:

The two main variables of interest are ProtocolSubjects and ProtocolStudies. Indeed, together they contain the metadata of every single file in the protocol! Let's dive a bit deeper into both of them.

ProtocolSubjects

This structure contains the metadata of every subject and every anatomy file.

ProtocolStudies

This structure contains the metadata of every study (also known as a first-level folder, or condition) and of every functional file inside each study.

SQLite structure: protocol.db

With the new database structure, the metadata of the protocol files is no longer held in memory in Matlab structures. It is now saved in a SQLite relational database: when a protocol is first created, an empty database is created and then populated with SQL rows. Whenever metadata is needed, it must be queried from the database. While a bit slower, this supports concurrency, which is essential for sharing protocols across contributors and for running multiple jobs on the same protocol at the same time.

SQLite stores all database information in a single file, protocol.db. Since the metadata no longer needs to be stored in Matlab structures, protocol.mat is no longer used. This means we can easily tell whether a protocol was created with Brainstorm v3.4 (protocol.mat) or Brainstorm v4.0 (protocol.db) by looking at the root of its folder in the database directory. For debugging purposes, you can explore the .db file with the application DB Browser for SQLite. The file system handles concurrency with regard to SQL access (i.e. if an instance of Brainstorm is currently inserting metadata, the file system won't let another instance access the .db file until the modification is complete).
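The detection described above amounts to checking which metadata file sits at the root of the protocol folder. A minimal Python sketch follows; the function name is hypothetical, and giving protocol.db precedence when both files are present is an assumption of this example, not something the text specifies.

```python
import os


def protocol_metadata_format(protocol_dir):
    """Return 'sqlite', 'matlab' or None depending on the metadata file found."""
    has_db = os.path.isfile(os.path.join(protocol_dir, "protocol.db"))
    has_mat = os.path.isfile(os.path.join(protocol_dir, "protocol.mat"))
    if has_db:
        # Assumption: protocol.db wins if both files exist.
        return "sqlite"
    if has_mat:
        return "matlab"
    return None
```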

To avoid long wait times, whether due to concurrency or to remote databases (when SQL queries need to be sent through a network), it is important to optimize database calls as much as possible so that (1) active connections are closed as quickly as possible and (2) queries only return information that is strictly required. Therefore, when writing SQL queries, include only the columns that are needed (avoid SELECT * when possible) and return only the rows that are needed (for example, if you only need the first row, add LIMIT 1 at the end of your query).
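Both recommendations can be illustrated with Python's standard sqlite3 module. This is a sketch rather than Brainstorm code: the Subject table, its columns and the two function names are assumptions made for the example.

```python
import sqlite3
from contextlib import closing


def build_demo_db(db_path):
    """Create a small demo database (hypothetical table and columns)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE Subject ("
            "Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Comment TEXT)"
        )
        conn.executemany(
            "INSERT INTO Subject (Name, Comment) VALUES (?, ?)",
            [("Subject01", "first"), ("Subject02", "second")],
        )
        conn.commit()


def first_subject_name(db_path):
    """Fetch one column of one row, then release the connection right away."""
    # closing() closes the connection as soon as the block ends.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            # Only the needed column, only the first row.
            "SELECT Name FROM Subject ORDER BY Id LIMIT 1"
        ).fetchone()
    return row[0] if row else None
```

The query names a single column instead of using SELECT *, and LIMIT 1 stops SQLite from returning rows the caller would discard anyway.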

DataBase GlobalData

The GlobalData.DataBase variable is fairly similar to the one described above, with a few notable differences:

SQL Schema

TODO: Image

A schema defines all the tables, the columns of every table, and the relations between tables in the database. Refer to the sql_generate_db() function: the names and types of the table columns are defined in db_template(), and additional properties such as foreign keys, primary keys and NOT NULL constraints are hardcoded in sql_generate_db().
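To make the role of these properties concrete, here is a small Python sketch of a two-table schema with a primary key, a foreign key and NOT NULL constraints. The table and column names are invented for illustration and do not reproduce the schema generated by sql_generate_db(). Note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

# Hypothetical schema: one Subject has many Study rows.
SCHEMA = """
CREATE TABLE Subject (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Study (
    Id      INTEGER PRIMARY KEY,
    Subject INTEGER NOT NULL REFERENCES Subject(Id),
    Name    TEXT NOT NULL
);
"""


def create_demo_schema():
    """Open an in-memory database and create the demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def try_insert_study(conn, subject_id, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Study (Subject, Name) VALUES (?, ?)",
            (subject_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With a single subject of Id 1 in the database, inserting a study for subject 1 succeeds, while a study pointing to a missing subject or a study without a name is rejected by the constraints.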

Querying the SQLite database

To query the SQLite database, you need an active connection. This is defined

Tutorials/Database (last edited 2021-03-26 01:33:08 by MartinCousineau)