.backtick {font-size: 14px;})>>
= Brainstorm's Database Structure =
''Authors: Martin Cousineau, Raymundo Cassani''
Understanding how data is organized and managed in Brainstorm is paramount for [[Tutorials/Scripting|scripting your own pipelines]], and [[Tutorials/TutUserProcess|adding new processes]] in the software.
Conceptually, '''Brainstorm database''' consists in a collection of '''Protocols'''. In turn, a '''Protocol''' consists in a set of '''Subjects''' from whom '''Anatomical data''' (MRI) and recordings (MEG, EEG, fNIRS, SEEG, etc.) were acquired. Lastly, for each subject, the sensor locations, the recordings, the computed sources and all the data derived from their processing receive the name of '''Functional data'''.
This page describes the structure of the Brainstorm database on the hard drive, and the metadata that is necessary to manage it.
<>
== On the hard drive ==
Brainstorm database resides on the hard drive at the [[Tutorials/CreateProtocol#Brainstorm_folders|database directory]] (`brainstorm_db/`). The database consists in an organized collection of '''protocols''', where each protocol each with its own directory, which has a strict hierarchy, explained in the bullet list below:
{{attachment:tree_db.png}}
In anatomy data files and functional data files, the prefix in the filename indicates its content. For example, an anatomy file named `surface_*.mat` contains cortex surface information. Data in these files is stored as a Matlab structure, according to the [[Tutorials/Scripting#File_structures|different file types]].
The content of the structure of each `type_*.mat` files is defined by the function `db_template()`, using `type` plus the `mat` suffix as argument. When the data contained in the `type_*.mat` file is loaded in memory in Brainstorm, it follows the Matlab structure given by `db_template()` using `loaded` plus the `type`. Lastly, the metadata related to the file `type_*.mat` is kept in memory as a Matlab structure given by `db_template()` using `type` as argument. For example:
* `db_template(`'''`'surfacemat'`'''`)`: Structure saved on the hard drive as `surface_*.mat` file.
* `db_template(`'''`'loadedsurface'`'''`)`: Structure saved in memory containing all the data loaded from the `surface_*.mat` file.
* `db_template(`'''`'surface'`'''`)`: Structure saved in memory containing all the metadata for the `surface_*.mat` file, loaded from the `protocol.db` or `protocol.mat` files rather than the `surface_*.mat` file itself.
The [[#Protocol_metadata|protocol metadata]] files (`protocol.mat` and `protocol.db`) contain all the metadata that describe the relationship among the data files in the protocol directory. In case that any of the protocol metadata files is accidentally deleted or gets corrupted, the protocol metadata can be regenerated from all the `brainstormsubject.mat` and `brainstormstudy.mat` in the protocol directory as these files contain the metadata for the anatomy and functional data files for each subject in the protocol directory.
== Protocol metadata ==
In Brainstorm versions <= 3.x, the protocol metadata was stored as a Matlab structure in the `protocol.mat` file, which content was kept in memory. Starting from version 4.0, the protocol metadata is stored as a SQLite database which is frequently queried, i.e., the protocol metadata is not kept in memory. Below a description of these two protocol metadata approaches.
=== Matlab structure: protocol.mat ===
Once a protocol is loaded, Brainstorm has Matlab structures in memory that contain the metadata of all the files in this protocol. These are located in the `GlobalData.DataBase` global variable, such that it is easily accessible in any function of the software. The structure of this variable is defined by `db_template(`'''`'GlobalData'`'''`)`. Let's walk through some of the important variables in this structure:
* `GlobalData.DataBase.`'''`ProtocolInfo`''': Basic information about the protocols in the Brainstorm database folder, refer to `db_template(`'''`'ProtocolInfo'`'''`)`. Fields:
* '''Comment''': Name of the protocol
* '''STUDIES''': Absolute path to the folder containing the functional data (/data/).
* '''SUBJECTS''': Absolute path to the folder containing the anatomical data (/anat/).
* '''iStudy''': Index of the currently selected study, if applicable.
* '''UseDefaultAnat''': Protocol setting, whether new subjects should use the default anatomy by default.
* '''UseDefaultChannel''': Protocol setting, whether new subjects should share channel files by default.
* `GlobalData.DataBase.`'''`ProtocolSubjects`''': Metadata of all subjects and their anatomical files of each loaded protocol, refer to `db_template(`'''`'ProtocolSubjects'`'''`)`. Fields:
* '''Subject''': Metadata of all regular subjects, refer to `db_template(`'''`'subject'`'''`)`.
* '''DefaultSubject''': Special subject used to store default anatomy. Same structure as regular subjects.
* `GlobalData.DataBase.`'''`ProtocolStudies`''': Metadata of all studies and their functional files of each loaded protocol, refer to `db_template(`'''`'ProtocolStudies'`'''`)`. Fields:
* '''Study''': Metadata of all regular studies, refer to `db_template(`'''`'study'`'''`)`.
* '''DefaultStudy''': Special study used to store Common files, e.g. a global channel file. Same structure as regular studies.
* '''AnalysisStudy''': Special study to store Inter-subject files, e.g. stats results. Same structure as regular studies.
* `GlobalData.DataBase.`'''`isProtocolLoaded`''': List of bool, whether the metadata of the protocol is already in memory or not. This is useful when you're running Brainstorm from the command line.
* `GlobalData.DataBase.`'''`isProtocolModified`''': List of bool, whether the metadata of the protocol was modified and therefore the `protocol.mat` file needs to be overwritten when Brainstorm closes.
* `GlobalData.DataBase.`'''`iProtocol`''': The index of the currently selected protocol.
* `GlobalData.DataBase.`'''`BrainstormDbDir`''': Absolute path to the brainstorm database directory.
* `GlobalData.DataBase.`'''`DbVersion`''': Version number of the currently loaded protocol. Used to update the protocol structure when new fields are added to a specific file for example. Refer to `db_update()`.
* `GlobalData.DataBase.`'''`isReadOnly`''': Flag whether the user has write access to the protocol database directory. Used in `tree_callbacks()` to remove GUI option menus that would modify files.
* `GlobalData.DataBase.`'''`LastSavedTime`''': Timestamp of the last time the protocol.mat file was overwritten. There used to be an automatic save when it had been a few hours.
* `GlobalData.DataBase.`'''`Searches`''': Information about the search tabs of the database. Fields:
* '''iActive''': Index of the currently selected active search tab (0 if none).
* '''Active''': List of all active search tabs, with the first one being the whole database.
* '''Name''': Name of the tab, or empty if whole database
* '''SearchNode''': Tree structure representing the search query
* '''AnatRootNode'''/'''FuncRootNode''': The root node of the database tree, to avoid rebuilding the Java tree when you switch between tabs.
* '''AnatSelNode'''/'''FuncSelNode''': The currently selected node, to remember the last selected object before you switched between tabs.
* '''All''': If searches were saved between Brainstorm sessions, they are stored here.
* '''Name''': Name of the search
* '''Search''': Tree structure representing the search query
The two main variables of interest are '''`ProtocolSubjects`''' and '''`ProtocolStudies`'''. Indeed, together they contain the metadata of every single file in the protocol! Let's dive a bit deeper into both of them.
'''__ProtocolSubjects__''' This structure contains the metadata for the anatomy files of every subject. This metadata is organized in one structure per subject and has the following fields:
* '''Name''': Name of the subject. Two special subjects have specific names:
* @default_subject: Default anatomy subject.
* Group_analysis: Subject for statistical results or projections to a common space.
* '''Comments''': Additional information about the subject set manually by the user.
* '''FileName''': Path to the brainstormsubject.mat file where the subject specific metadata is stored in case the protocol's metadata file is corrupted or needs to be reloaded.
* '''DateOfAcquisition''': Unused, legacy field.
* '''Anatomy''': List of all MRI files, refer to `db_template(`'''`'Anatomy'`'''`)`.
* '''Surface''': List of all surface and fibers files, refer to `db_template(`'''`'Surface'`'''`)`.
* '''iAnatomy''': Index of the MRI file in the ''Anatomy ''list that is currently selected for this subject. This will show in green in the database viewer. Any operation related to this subject that requires an MRI file will load this selected one.
* '''iScalp''': Index of the selected scalp surface file from the ''Surface'' list.
* '''iCortex''': Index of the selected cortex surface file from the ''Surface ''list.
* '''iInnerSkull''': Index of the selected inner skull surface file from the ''Surface ''list.
* '''iOuterSkull''': Index of the selected outer skull surface file from the ''Surface ''list.
* '''iFibers''': Index of the selected fibers file from the ''Surface ''list.
* '''iFEM''': Index of the selected surface file used for FEM from the ''Surface ''list.
* '''iOther''': Index of the selected sub-structure surface file from the ''Surface ''list.
* '''UseDefaultAnat''': Whether this subject uses the default anatomy files rather than its own.
* '''UseDefaultChannel''': Whether this subject shares its channel file across all runs or uses the global channel file.
'''__ProtocolStudies__'''
This structure contains the metadata for the functional data files of every subject. This metadata is organized in one structure per study (also known as first level folder, or condition) and has the following fields:
* '''Name''': Name of the study. If it starts with "@raw", this is a special study that can only contain unprocessed files and is created when you create a new link to a raw file. Three special studies exist:
* @default_study: Common files shared either by the whole protocol (see @default_subject) or the whole subject. Typically channel files.
* @inter: Inter-subject statistical results.
* @intra: Intra-subject statistical results.
* '''FileName''': Path to the brainstormstudy.mat file where the study specific metadata is saved should the protocol metadata file get corrupted or need to be reloaded.
* '''DateOfStudy''': If this is a raw study, this is the acquisition date of the raw file. Otherwise, this is the date this study was created by the user.
* '''BrainstormSubject''': Path to the file of the metadata of the subject this study belongs to.
* '''Condition''': Either a duplicate of the ''Name ''field, or a cell array representing a sub-folder structure if this is a "sub-study", e.g. Subject01/ParentStudy/ChildStudy -> Condition = {"ParentStudy", "ChildStudy"}. This is not fully implemented.
* '''Channel''': List of channel files, refer to `db_template(`'''`'Channel'`'''`)`.
* '''iChannel''': Index of the currently selected channel in ''Channel ''list.
* '''Data''': List of all data files, either raw or imported recordings/trials. Refer to `db_template(`'''`'Data'`'''`)` and the DataType field.
* '''HeadModel''': List of all head model files, refer to `db_template(`'''`'HeadModel'`'''`)` and `db_template(`'''`'HeadModelType'`'''`)`.
* '''iHeadModel''': Index of the currently selected head model in HeadModel list.
* '''Result''': List of all source files, including links, kernels and full sources, refer to `db_template(`'''`'Results'`'''`)`. The filename of links start with "link" and actually contain two files: the kernel and the data file.
* '''Stat''': List of all statistics files, refer to `db_template(`'''`'Stat'`'''`)` and the Type field.
* '''Image''': List of all image or video files, refer to `db_template(`'''`'Image'`'''`)`.
* '''NoiseCov''': Noise and data covariance files. The first file in the list is always the noise covariance, and the second file in the list is always the data covariance (if available). Refer to `db_template(`'''`'NoiseCov'`'''`)`.
* '''Dipoles''': List of dipole files, refer to `db_template(`'''`'Dipoles'`'''`)`.
* '''Timefreq''': List of all time-frequency files, refer to `db_template(`'''`'Timefreq'`'''`)`.
* '''Matrix''': List of all matrix files, refer to `db_template(`'''`'Matrix'`'''`)`.
=== SQLite structure: protocol.db ===
Starting from Brainstorm version 4.0 the protocol metadata is no longer available in memory as Matlab structures. Instead, it is saved in a [[https://en.wikipedia.org/wiki/SQLite|SQLite relational database]]. This means that when the protocol is first created, an empty SQLite database is created and is populated with the metadata files in the protocol directory. Whenever metadata needs to be retrieved, it is queried from the SQLite database. While a bit slower, this gives us the ability to support concurrency which is essential for sharing protocols across contributors and running multiple jobs on the same protocol at the same time. More information [[https://www.sqlite.org/lockingv3.html|here]].
SQLite implements most of the [[https://en.wikipedia.org/wiki/SQL_syntax|SQL standard syntax]], and stores all database information in a single file, the `protocol.db` file. Since we no longer need the metadata stored in Matlab structures, the `protocol.mat` is no longer used. For debugging purposes, you can explore the `protocol.db` file using the application [[https://sqlitebrowser.org/|DB Browser for SQLite]]. Concurrency is handled with regards to SQL access (i.e. if a Brainstorm instance is currently inserting metadata, another instance will not be able to access the SQLite database file until the modification is complete).
To avoid having long wait times, both due to concurrency or remote databases (when SQL queries need to be sent through a network), it is important to optimize as much as possible the database calls such that (1) active connections are closed as quickly as possible and (2) queries only return information that is absolutely required. Therefore, when writing SQL queries, make sure to only include columns that are needed (avoid `SELECT *` when possible) and return only rows that are needed (for example, if you only need the first row, add `LIMIT 1` at the end of your query).
Despite the fact that the protocol metadata is now stored in SQLite database, the global variable `GlobalData.DataBase` is still available in Brainstorms and its content is fairly similar to the one described [[#Matlab_structure:_protocol.mat|above]], but with a few notable differences:
* `GlobalData.DataBase.`'''`ProtocolSubjects`''' and `.`'''`ProtocolStudies`''': These variables no longer exist as this information is stored in the SQLite database instead of in memory.
* `GlobalData.DataBase.`'''`ProtocolInfo`''': Have three additional fields:
* '''iSubject''': The ID of the subject that is currently active. Only the study used to be saved, but since study metadata is no longer in memory this avoids querying the database to get the subject of this study.
* '''iReadFiles''': Hash map containing all files that have already been "read" (selected). This allows the database to highlight files that are "new" but only until they are interacted with.
* ''' Key''': FunctionalFile.Id
* '''Value''': FunctionalFile.LastModified
* '''Database''': Structure describing the information required to connect to the metadata database, refer to `db_template(`'''`'DatabaseInfo'`'''`)`.
* '''Rdbms''': Relational database management system of the active database (only "sqlite" supported).
* '''Location''': Location of the database. For SQLite: the path to the protocol.db file.
* '''Name''': Name of the database to use (not applicable for SQLite).
* '''Username''': Username to connect to the database (not applicable for SQLite).
* '''Password''': Password to connect to the database (not applicable for SQLite).
'''__SQL Schema__'''
A SQL schema defines all the tables, the columns of every table and the relation between tables in the database. Refer to the `sql_generate_db()` function: the name, type and NOT NULL properties of the tables columns are defined in `db_template()`, and additional properties like foreign keys and primary keys are hardcoded in `sql_generate_db()`.
{{attachment:sqlite_schema.png}}
Below a description of the tables in the SQLite database.
* '''Protocol''': Table containing the metadata for the current protocol. '''It always has a single row!'''
* '''Name''': Name of the protocol.
* '''UseDefaultAnat''': Same as `GlobalData.DataBase.ProtocolInfo`.
* '''UseDefaultChannel''': Same as `GlobalData.DataBase.ProtocolInfo`.
* '''Owner''': The PC username of the person that created this protocol, refer to `bst_get('UserName')`.
* '''CreationTime''': When this protocol was created.
* '''DatabaseVersion''': Version of this database schema. If new fields are added to the database in subsequent versions, old protocols should be detected using this field and their schema should be updated.
* '''Subject''': Table containing the metadata for all subjects. Very similar to `GlobalData.DataBase.`'''`ProtocolSubjects`''' in the [[#Matlab_structure:_protocol.mat|previous metadata structure]], with a few notable differences:
* '''Id''': Unique identifier number for each subject. This is used to link this table with other tables (foreign keys).
* '''iAnatomy''', '''iCortex''', etc.: These indices are now unique AnatomyFile IDs rather than the position in the list of all anatomy/surface files.
* '''Anatomy''', '''Surface''': These lists no longer exist and are represented as entries in the AnatomyFile table instead.
* '''AnatomyFile''': Table containing the metadata for all anatomy files (MRI, surface, fibers, etc), refer to `db_template(`'''`'AnatomyFile'`'''`)`.
* '''Id''': Unique identifier number for each anatomy file. This is used to link this table with other tables (foreign keys).
* '''Subject''': ID of the subject this file belongs to (foreign key).
* '''Type''': Type of anatomy file ("volume", "surface"). Used to split back into Anatomy and Surface lists if needed.
* '''FileName''': Relative path to the anatomy file.
* '''Comment''': Name of the entry in the database.
* '''SubType''': Additional field to identify the subtypes of volumes and surfaces.
* '''Study''': Table containing the metadata for all studies. Very similar to '''ProtocolStudies ''' in the [[#Matlab_structure:_protocol.mat|previous metadata structure]], with a few notable differences:
* '''Id''': Unique identifier number for each study. This is used to link this table with other tables (foreign keys).
* '''Subject''': ID of the subject this study belongs to (foreign key).
* '''iChannel ''' and '''iHeadModel''': These indices are now unique FunctionalFile IDs rather than the position in the list of all Channel/HeadModel files.
* '''Channel ''', '''Data ''', etc: These lists no longer exist and are represented as entries in the FunctionalFile table instead.
* '''FunctionalFile''': Table containing the metadata for all functional files (data, sources, etc), refer to `db_template(`'''`'FunctionalFile'`'''`)`.
* '''Id''': Unique identifier number for each functional file. This is used to link this table with other tables (foreign keys).
* '''Study''': ID of the study this file belongs to (foreign key).
* '''Parent''': The ID of the parent functional file this file belongs to, or NULL if it's directly under the study.
* '''Type''': Type of functional file (data, matrix, result, etc).
* '''FileName''': Path to the functional file.
* '''Comment''': Name of the file in the database.
* '''NumChildren''': Number of child files (i.e. the number of files where ParentFile = this ID). This is used to display the number of files in a list and avoids COUNT queries that might slow down displaying the database.
* '''SubType''': Some file types have sub types, they are defined here. E.g.: data -> raw vs recordings.
* '''ExtraNum '''/ '''ExtraStr1 '''/ '''ExtraStr2''': Additional numerical or string columns used by specific file types. These are defined in `db_get().getFuncFileStruct()`.
* '''LastModified''': [[https://en.wikipedia.org/wiki/Unix_time|Unix time]] timestamp of the last time this file was modified.
* '''Lock''': Table containing all active locks on protocol entities, refer to `db_template(`'''`'Lock'`'''`)`.
* '''Id''': Unique identifier number for this lock.
* '''Subject''': ID of the subject this entity belongs to (foreign key).
* '''Study''': ID of the study this entity belongs to (foreign key), or NULL if the lock is on the whole subject.
* '''File''': ID of the locked functional file (foreign key), or NULL if the lock is on the whole subject or study.
* '''Username''': PC username of the person that acquired this lock, refer to `bst_get(`'''`'UserName'`'''`)`.
* '''Computer''': PC name of the person that acquired this lock, refer to `bst_get(`'''`'ComputerName'`'''`)`.
* '''Time''': UTC Date & time when this lock was acquired.
* '''Operation''': Description of the operation that acquired this lock, typically the process name.
=== SQLite API in Brainstorm ===
This API consists on functions that wrap the functionality of SQL queries to read and write the protocol metadata SQLite database. These functions are divided in two types according to their abstraction level.
==== High level ====
The most important functions to read and write the SQLite database are:
* '''`db_get()`''': Get (read) data from the DB. It presents useful cases. It is similar to `bst_get()`.
* '''`db_set()`''': Set (write) data from the DB. It presents useful cases. It is similar to `bst_set()`.
A simple example of using `db_get` and `db_set`
{{{
% Get all info for default subject with Id = 1
sSubject = db_get('Subject', 1);
% Update the name of this subject.
sSubject.Name = 'J.Doe';;
db_set('Subject', sSubject);
}}}
==== Low level ====
The `db_get` and `db_set` functions rely in lower level functions that interact with the database:
* '''`sql_connect()`''': To get an active connection to the database. It also sets the PRAGMAs that are needed for each connection. Optional argument of type db_template('DatabaseInfo') to define how to connect to the database, otherwise it looks into the active ProtocolInfo.
* '''`sql_close()`'''(sqlConnection): Close an active connection to the database.
* '''`sql_create()`'''(sqlConnection, tables): Generate and executes `SQL CREATE` queries for every table as input. Tables input structure defined in `db_template(`'''`'sqltable'`'''`)`.
* '''`sql_query()`''': This is versatile function that:
* Execute direct queries created by the user.
* Execute "SELECT", "EXIST", "INSERT", "UPDATE", "DELETE" and "COUNT" SQL queries to the SQLite database. For the proper syntax, check the documentation of `sql_query()`.
* Can handle the opening and closing of the database if it's necessary.
Here are some specific SQL query examples:
'''1. Direct query'''
This allows you to write manual SQL queries and execute them directly. The drawback is that it returns a [[https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html|java.sql.ResultSet]] object so you need to be familiar with this specific syntax, and you need to handle the SQL connection yourself. Example:
{{{
sqlConnection = sql_connect();
resultSet = sql_query(sqlConnection, 'SELECT * FROM Subject');
while resultSet.next()
disp(resultSet.getString('Name'));
end
resultSet.close();
sql_close(sqlConnection);
}}}
'''2. SELECT query'''
This lets you read data from the database. It generates a `SELECT` query, executes it for you, and returns a `db_template()` structure of the proper table. Example:
{{{
% Get all info of subjects that use the default anatomy.
sSubject = sql_query('SELECT', 'Subject', '*', struct('UseDefaultAnat', 1));
% Get the name of the first subject.
sSubject = sql_query('SELECT', 'Subject', 'Name', [], 'LIMIT 1');
% Get the name and ID of all subjects.
sSubject = sql_query('SELECT', 'Subject', {'Id', 'Name'});
}}}
'''3. UPDATE query'''
This lets you modify rows in the database. It generates an `UPDATE` query, executes it for you, and returns the number of rows modified. Example:
{{{
% Makes subject #1 use the default anatomy
numRows = sql_query('UPDATE', 'Subject', struct('UseDefaultAnat', 1), struct('Id', 1));
% Makes all subjects use the default anatomy
numRows = sql_query('UPDATE', 'Subject', struct('UseDefaultAnat', 1));
}}}
'''4. INSERT query'''
This lets you insert rows into the database. It generates an `INSERT` query, executes it for you, and returns the generated `ID` of the inserted rows. It is strongly recommended to use `db_template()` to define the values to insert. Example:
{{{
sSubject = db_template('Subject');
sSubject.Name = 'MyNewSubject';
SubjectId = sql_query('INSERT', 'Subject', sSubject);
}}}
'''5. DELETE query'''
This lets you delete rows from the database. It generates a `DELETE` query, executes it for you, and returns the number of rows deleted. '''Be careful, this cannot be undone!''' Example:
{{{
% Delete Subject #1
numRows = sql_query('DELETE', 'Subject', struct('Id', 1));
% Delete all subjects
numRows = sql_query('DELETE', 'Subject');
}}}