Brainstorm's Database Structure

Authors: Martin Cousineau, Raymundo Cassani

Understanding how data is organized and managed in Brainstorm is paramount for scripting your own pipelines, and adding new processes in the software.

Conceptually, Brainstorm database consists in a collection of Protocols. In turn, a Protocol consists in a set of Subjects from whom Anatomical data (MRI) and recordings (MEG, EEG, fNIRS, SEEG, etc.) were acquired. Lastly, for each subject, the sensor locations, the recordings, the computed sources and all the data derived from their processing receive the name of Functional data.

This page describes the structure of the Brainstorm database on the hard drive, and the metadata that is necessary to manage it.

On the hard drive

Brainstorm database resides on the hard drive at the database directory (brainstorm_db/). The database consists in an organized collection of protocols, where each protocol each with its own directory, which has a strict hierarchy, explained in the bullet list below:

tree_db.png

In anatomy data files and functional data files, the prefix in the filename indicates its content. For example, an anatomy file named surface_*.mat contains cortex surface information. Data in these files is stored as a Matlab structure, according to the different file types.

The content of the structure of each type_*.mat files is defined by the function db_template(), using type plus the mat suffix as argument. When the data contained in the type_*.mat file is loaded in memory in Brainstorm, it follows the Matlab structure given by db_template() using loaded plus the type. Lastly, the metadata related to the file type_*.mat is kept in memory as a Matlab structure given by db_template() using type as argument. For example:

The protocol metadata files (protocol.mat and protocol.db) contain all the metadata that describe the relationship among the data files in the protocol directory. In case that any of the protocol metadata files is accidentally deleted or gets corrupted, the protocol metadata can be regenerated from all the brainstormsubject.mat and brainstormstudy.mat in the protocol directory as these files contain the metadata for the anatomy and functional data files for each subject in the protocol directory.

Protocol metadata

In Brainstorm versions <= 3.x, the protocol metadata was stored as a Matlab structure in the protocol.mat file, which content was kept in memory. Starting from version 4.0, the protocol metadata is stored as a SQLite database which is frequently queried, i.e., the protocol metadata is not kept in memory. Below a description of these two protocol metadata approaches.

Matlab structure: protocol.mat

Once a protocol is loaded, Brainstorm has Matlab structures in memory that contain the metadata of all the files in this protocol. These are located in the GlobalData.DataBase global variable, such that it is easily accessible in any function of the software. The structure of this variable is defined by db_template('GlobalData'). Let's walk through some of the important variables in this structure:

The two main variables of interest are ProtocolSubjects and ProtocolStudies. Indeed, together they contain the metadata of every single file in the protocol! Let's dive a bit deeper into both of them.

ProtocolSubjects This structure contains the metadata for the anatomy files of every subject. This metadata is organized in one structure per subject and has the following fields:

ProtocolStudies

This structure contains the metadata for the functional data files of every subject. This metadata is organized in one structure per study (also known as first level folder, or condition) and has the following fields:

SQLite structure: protocol.db

Starting from Brainstorm version 4.0 the protocol metadata is no longer available in memory as Matlab structures. Instead, it is saved in a SQLite relational database. This means that when the protocol is first created, an empty SQLite database is created and is populated with the metadata files in the protocol directory. Whenever metadata needs to be retrieved, it is queried from the SQLite database. While a bit slower, this gives us the ability to support concurrency which is essential for sharing protocols across contributors and running multiple jobs on the same protocol at the same time. More information here.

SQLite implements most of the SQL standard syntax, and stores all database information in a single file, the protocol.db file. Since we no longer need the metadata stored in Matlab structures, the protocol.mat is no longer used. For debugging purposes, you can explore the protocol.db file using the application DB Browser for SQLite. Concurrency is handled with regards to SQL access (i.e. if a Brainstorm instance is currently inserting metadata, another instance will not be able to access the SQLite database file until the modification is complete).

To avoid having long wait times, both due to concurrency or remote databases (when SQL queries need to be sent through a network), it is important to optimize as much as possible the database calls such that (1) active connections are closed as quickly as possible and (2) queries only return information that is absolutely required. Therefore, when writing SQL queries, make sure to only include columns that are needed (avoid SELECT * when possible) and return only rows that are needed (for example, if you only need the first row, add LIMIT 1 at the end of your query).

Despite the fact that the protocol metadata is now stored in SQLite database, the global variable GlobalData.DataBase is still available in Brainstorms and its content is fairly similar to the one described above, but with a few notable differences:

SQL Schema

A SQL schema defines all the tables, the columns of every table and the relation between tables in the database. Refer to the sql_generate_db() function: the name, type and NOT NULL properties of the tables columns are defined in db_template(), and additional properties like foreign keys and primary keys are hardcoded in sql_generate_db().

sqlite_schema.png

Below a description of the tables in the SQLite database.

SQLite API in Brainstorm

This API consists on functions that wrap the functionality of SQL queries to read and write the protocol metadata SQLite database. These functions are divided in two types according to their abstraction level.

High level

The most important functions to read and write the SQLite database are:

A simple example of using db_get and db_set

% Get all info for default subject with Id = 1 
sSubject = db_get('Subject', 1);
% Update the name of this subject.
sSubject.Name = 'J.Doe';;
db_set('Subject', sSubject);

Low level

The db_get and db_set functions rely in lower level functions that interact with the database:

Here are some specific SQL query examples:

1. Direct query

This allows you to write manual SQL queries and execute them directly. The drawback is that it returns a java.sql.ResultSet object so you need to be familiar with this specific syntax, and you need to handle the SQL connection yourself. Example:

sqlConnection = sql_connect();
resultSet = sql_query(sqlConnection, 'SELECT * FROM Subject');
while resultSet.next()
    disp(resultSet.getString('Name'));
end
resultSet.close();
sql_close(sqlConnection);

2. SELECT query

This lets you read data from the database. It generates a SELECT query, executes it for you, and returns a db_template() structure of the proper table. Example:

% Get all info of subjects that use the default anatomy.
sSubject = sql_query('SELECT', 'Subject', '*', struct('UseDefaultAnat', 1));
% Get the name of the first subject.
sSubject = sql_query('SELECT', 'Subject', 'Name', [], 'LIMIT 1');
% Get the name and ID of all subjects.
sSubject = sql_query('SELECT', 'Subject', {'Id', 'Name'});

3. UPDATE query

This lets you modify rows in the database. It generates an UPDATE query, executes it for you, and returns the number of rows modified. Example:

% Makes subject #1 use the default anatomy
numRows = sql_query('UPDATE', 'Subject', struct('UseDefaultAnat', 1), struct('Id', 1));
% Makes all subjects use the default anatomy
numRows = sql_query('UPDATE', 'Subject', struct('UseDefaultAnat', 1));

4. INSERT query

This lets you insert rows into the database. It generates an INSERT query, executes it for you, and returns the generated ID of the inserted rows. It is strongly recommended to use db_template() to define the values to insert. Example:

sSubject = db_template('Subject');
sSubject.Name = 'MyNewSubject';
SubjectId = sql_query('INSERT', 'Subject', sSubject);

5. DELETE query

This lets you delete rows from the database. It generates a DELETE query, executes it for you, and returns the number of rows deleted. Be careful, this cannot be undone! Example:

% Delete Subject #1
numRows = sql_query('DELETE', 'Subject', struct('Id', 1));
% Delete all subjects
numRows = sql_query('DELETE', 'Subject');

Tutorials/Database (last edited 2022-12-05 23:21:05 by RaymundoCassani)