.backtick {font-size: 16px;})>>
= Brainstorm's Database Structure =
''Authors: Martin Cousineau, Raymundo Cassani''
Understanding how data is organized and managed in Brainstorm is paramount for [[Tutorials/Scripting|scripting your own pipelines]], and [[Tutorials/TutUserProcess|adding new processes]] in the software.
Conceptually, '''Brainstorm database''' consists in a collection of '''Protocols'''. In turn, a '''Protocol''' consists in a set of '''Subjets''' from whom '''Anatomical data''' (MRI) and recordings (MEG, EEG, fNIRS, SEEG, etc.) were acquired. Lastly, for each subject, the sensor locations, the recordings, the computed sources and all the data derived from their processing receive the name of '''Functional data'''.
This page describes the structure of the Brainstorm database on the hard drive, and the metadata that is necessary to manage it.
<>
== On the hard drive ==
Brainstorm database resides on the hard drive at the [[Tutorials/CreateProtocol#Brainstorm_folders|database directory]] (`brainstorm_db/`). The database consists in an organized collection of '''protocols''', where each protocol each with its own directory, which has a strict hierarchy, explained in the bullet list below:
{{attachment:tree_db.png}}
In anatomy data files and functional data files, the prefix in the filename indicates its content. For example, an anatomy file named `surface_*.mat` contains cortex surface information. Data in these files is stored as a Matlab structure, according to the [[Tutorials/Scripting#File_structures|different file types]].
The content of the structure of each `type_*.mat` files is defined by the function `db_template()`, using `type` plus the `mat` suffix as argument. When the data contained in the `type_*.mat` file is loaded in memory in Brainstorm, it follows the Matlab structure given by `db_template()` using `loaded` plus the `type`. Lastly, the metadata related to the file `type_*.mat` is kept in memory as a Matlab structure given by `db_template()` using `type` as argument. For example:
* `db_template(`'''`surfacemat`'''`)`: Structure saved on the hard drive as `surface_*.mat` file.
* `db_template(`'''`loadedsurface`'''`)`: Structure saved in memory containing all the data loaded from the `surface_*.mat` file.
* `db_template(`'''`surface`'''`)`: Structure saved in memory containing all the metadata for the `surface_*.mat` file, loaded from the `protocol.db` or `protocol.mat` files rather than the `surface_*.mat` file itself.
The [[#Protocol_metadata|protocol metadata]] files (`protocol.mat` and `protocol.db`) contain all the metadata that describe the relationship among the data files in the protocol directory. In case that any of the protocol metadata files is accidentally deleted or gets corrupted, the protocol metadata can be regenerated from all the `brainstormsubject.mat` and `brainstormstudy.mat` in the protocol directory as these files contain the metadata for the anatomy and functional data files for each subject in the protocol directory.
== Protocol metadata ==
In Brainstorm versions <= 3.x, the protocol metadata was stored as a Matlab structure in the `protocol.mat` file, which content was kept in memory. Starting from version 4.0, the protocol metadata is stored as a SQLite database which is frequently queried, i.e., the protocol metadata is not kept in memory. Below a description of these two protocol metadata approaches.
=== Matlab structure: protocol.mat ===
Once a protocol is loaded, Brainstorm has Matlab structures in memory that contain the metadata of all the files in this protocol. These are located in the `GlobalData.DataBase` global variable, such that it is easily accessible in any function of the software. The structure of this variable is defined by `db_template(`'''`GlobalData`'''`)`. Let's walk through some of the important variables in this structure:
* `GlobalData.DataBase.`'''`ProtocolInfo`''': Basic information about the protocols in the Brainstorm database folder, refer to `db_template(`'''`ProtocolInfo`'''`)`. Fields:
* '''Comment''': Name of the protocol
* '''STUDIES''': Absolute path to the folder containing the functional data (/data/).
* '''SUBJECTS''': Absolute path to the folder containing the anatomical data (/anat/).
* '''iStudy''': Index of the currently selected study, if applicable.
* '''UseDefaultAnat''': Protocol setting, whether new subjects should use the default anatomy by default.
* '''UseDefaultChannel''': Protocol setting, whether new subjects should share channel files by default.
* `GlobalData.DataBase.`'''`ProtocolSubjects`''': Metadata of all subjects and their anatomical files of each loaded protocol, refer to `db_template(`'''`ProtocolSubjects`'''`)`. Fields:
* '''Subject''': Metadata of all regular subjects, refer to db_template('subject').
* '''DefaultSubject''': Special subject used to store default anatomy. Same structure as regular subjects.
* `GlobalData.DataBase.`'''`ProtocolStudies`''': Metadata of all studies and their functional files of each loaded protocol, refer to `db_template(`'''`ProtocolStudies`'''`)`. Fields:
* '''Study''': Metadata of all regular studies, refer to db_template('study').
* '''DefaultStudy''': Special study used to store Common files, e.g. a global channel file. Same structure as regular studies.
* '''AnalysisStudy''': Special study to store Inter-subject files, e.g. stats results. Same structure as regular studies.
* `GlobalData.DataBase.`'''`isProtocolLoaded`''': List of bool, whether the metadata of the protocol is already in memory or not. This is useful when you're running Brainstorm from the command line.
* `GlobalData.DataBase.`'''`isProtocolModified`''': List of bool, whether the metadata of the protocol was modified and therefore the `protocol.mat` file needs to be overwritten when Brainstorm closes.
* `GlobalData.DataBase.`'''`iProtocol`''': The index of the currently selected protocol.
* `GlobalData.DataBase.`'''`BrainstormDbDir`''': Absolute path to the brainstorm database directory.
* `GlobalData.DataBase.`'''`DbVersion`''': Version number of the currently loaded protocol. Used to update the protocol structure when new fields are added to a specific file for example. Refer to db_update().
* `GlobalData.DataBase.`'''`isReadOnly`''': Flag whether the user has write access to the protocol database directory. Used in tree_callbacks() to remove GUI option menus that would modify files.
* `GlobalData.DataBase.`'''`LastSavedTime`''': Timestamp of the last time the protocol.mat file was overwritten. There used to be an automatic save when it had been a few hours.
* `GlobalData.DataBase.`'''`Searches`''': Information about the search tabs of the database. Fields:
* '''iActive''': Index of the currently selected active search tab (0 if none).
* '''Active''': List of all active search tabs, with the first one being the whole database.
* '''Name''': Name of the tab, or empty if whole database
* '''SearchNode''': Tree structure representing the search query
* '''AnatRootNode'''/'''FuncRootNode''': The root node of the database tree, to avoid rebuilding the Java tree when you switch between tabs.
* '''AnatSelNode'''/'''FuncSelNode''': The currently selected node, to remember the last selected object before you switched between tabs.
* '''All''': If searches were saved between Brainstorm sessions, they are stored here.
* '''Name''': Name of the search
* '''Search''': Tree structure representing the search query
The two main variables of interest are '''`ProtocolSubjects`''' and '''`ProtocolStudies`'''. Indeed, together they contain the metadata of every single file in the protocol! Let's dive a bit deeper into both of them.
'''__ProtocolSubjects__''' This structure contains the metadata for the anatomy files of every subject. This metadata is organized in one structure per subject and has the following fields:
* '''Name''': Name of the subject. Two special subjects have specific names:
* @default_subject: Default anatomy subject.
* Group_analysis: Subject for statistical results or projections to a common space.
* '''Comments''': Additional information about the subject set manually by the user.
* '''FileName''': Path to the brainstormsubject.mat file where the subject specific metadata is stored in case the protocol's metadata file is corrupted or needs to be reloaded.
* '''DateOfAcquisition''': Unused, legacy field.
* '''Anatomy''': List of all MRI files, refer to db_template('Anatomy').
* '''Surface''': List of all surface and fibers files, refer to db_template('Surface').
* '''iAnatomy''': Index of the MRI file in the ''Anatomy ''list that is currently selected for this subject. This will show in green in the database viewer. Any operation related to this subject that requires an MRI file will load this selected one.
* '''iScalp''': Index of the selected scalp surface file from the ''Surface'' list.
* '''iCortex''': Index of the selected cortex surface file from the ''Surface ''list.
* '''iInnerSkull''': Index of the selected inner skull surface file from the ''Surface ''list.
* '''iOuterSkull''': Index of the selected outer skull surface file from the ''Surface ''list.
* '''iFibers''': Index of the selected fibers file from the ''Surface ''list.
* '''iFEM''': Index of the selected surface file used for FEM from the ''Surface ''list.
* '''iOther''': Index of the selected sub-structure surface file from the ''Surface ''list.
* '''UseDefaultAnat''': Whether this subject uses the default anatomy files rather than its own.
* '''UseDefaultChannel''': Whether this subject shares its channel file across all runs or uses the global channel file.
'''__ProtocolStudies__'''
This structure contains the metadata for the functional data files of every subject. This metadata is organized in one structure per study (also known as first level folder, or condition) and has the following fields:
* '''Name''': Name of the study. If it starts with "@raw", this is a special study that can only contain unprocessed files and is created when you create a new link to a raw file. Three special studies exist:
* @default_study: Common files shared either by the whole protocol (see @default_subject) or the whole subject. Typically channel files.
* @inter: Inter-subject statistical results.
* @intra: Intra-subject statistical results.
* '''FileName''': Path to the brainstormstudy.mat file where the study specific metadata is saved should the protocol metadata file get corrupted or need to be reloaded.
* '''DateOfStudy''': If this is a raw study, this is the acquisition date of the raw file. Otherwise, this is the date this study was created by the user.
* '''BrainstormSubject''': Path to the file of the metadata of the subject this study belongs to.
* '''Condition''': Either a duplicate of the ''Name ''field, or a cell array representing a sub-folder structure if this is a "sub-study", e.g. Subject01/ParentStudy/ChildStudy -> Condition = {"ParentStudy", "ChildStudy"}. This is not fully implemented.
* '''Channel''': List of channel files, refer to db_template('Channel).
* '''iChannel''': Index of the currently selected channel in ''Channel ''list.
* '''Data''': List of all data files, either raw or imported recordings/trials. Refer to db_template('Data') and the DataType field.
* '''HeadModel''': List of all head model files, refer to db_template('HeadModel') and the HeadModelType field.
* '''iHeadModel''': Index of the currently selected head model in HeadModel list.
* '''Result''': List of all source files, including links, kernels and full sources, refer to db_template('Results'). The filename of links start with "link" and actually contain two files: the kernel and the data file.
* '''Stat''': List of all statistics files, refer to db_template('Stat') and the Type field.
* '''Image''': List of all image or video files, refer to db_template('Image').
* '''NoiseCov''': Noise and data covariance files. The first file in the list is always the noise covariance, and the second file in the list is always the data covariance (if available). Refer to db_template('NoiseCov').
* '''Dipoles''': List of dipole files, refer to db_template('Dipoles').
* '''Timefreq''': List of all time-frequency files, refer to db_template('Timefreq').
* '''Matrix''': List of all matrix files, refer to db_template('Data').
=== SQLite structure: protocol.db ===
Starting from version 4.0 the protocol metadata is no longer available in memory as Matlab structures. Instead, it is saved in a SQLite relational database. This means that when the protocol is first created, an empty SQLite database is created and is populated with the metadata files in the protocol directory. Whenever metadata needs to be retrieved, it is queried from the SQLite database. While a bit slower, this gives us the ability to support concurrency which is essential for sharing protocols across contributors and running multiple jobs on the same protocol at the same time.
SQLite stores all database information in a single file, the `protocol.db` file. Since we no longer need the metadata stored in Matlab structures, the `protocol.mat` is no longer used. For debugging purposes, you can explore the `protocol.db` file using the application [[https://sqlitebrowser.org/|DB Browser for SQLite]]. Concurrency is handled with regards to SQL access (i.e. if a Brainstorm instance is currently inserting metadata, another instance will not be able to access the SQLite database file until the modification is complete).
To avoid having long wait times, both due to concurrency or remote databases (when SQLite queries need to be sent through a network), it is important to optimize as much as possible the database calls such that (1) active connections are closed as quickly as possible and (2) queries only return information that is absolutely required. Therefore, when writing SQL queries, make sure to only include columns that are needed (avoid SELECT * when possible) and return only rows that are needed (for example, if you only need the first row, add LIMIT 1 at the end of your query).
Despite the fact taht hte metadata is now stored in SQLite database, the global variable `GlobalData.DataBase` is availabe in Brainstorms and its content fairly similar to the one described [[Linko|above]], with a few notable differences:
* `GlobalData.DataBase.`'''`ProtocolSubjects`''' and `.`'''`ProtocolStudies`''': These variables no longer exist as this information is stored in the SQLite database instead of in memory.
* `GlobalData.DataBase.`'''`ProtocolInfo`''': Have three additional fields:
* '''iSubject''': The ID of the subject that is currently active. Only the study used to be saved, but since study metadata is no longer in memory this avoids querying the database to get the subject of this study.
* '''iReadFiles''': Hash map containing all files that have already been "read" (selected). This allows the database to highlight files that are "new" but only until they are interacted with.
* ''' Key''': FunctionalFile.Id
* '''Value''': FunctionalFile.LastModified
* '''Database''': Structure describing the information required to connect to the metadata database, refer to `db_template(`'''`DatabaseInfo`'''`)`.
* '''Rdbms''': Relational database management system of the active database (only "sqlite" supported).
* '''Location''': Location of the database. For SQLite: the path to the protocol.db file.
* '''Name''': Name of the database to use (not applicable for SQLite).
* '''Username''': Username to connect to the database (not applicable for SQLite).
* '''Password''': Password to connect to the database (not applicable for SQLite).
'''__SQL Schema__'''
{{attachment:sqlite_schema.png}}
A schema defines all tables, the columns of every table and the relation between tables in the database. Refer to the sql_generate_db() function: the name & type of the tables columns are defined in db_template(), and additional properties like foreign keys, primary keys and NOT NULL are hardcoded in sql_generate_db().
* '''Protocol''': Table containing the metadata for this protocol. It always has a single row!
* '''Name''': Name of the protocol.
* '''UseDefaultAnat''': Same as ProtocolInfo above.
* '''UseDefaultChannel''': Same as ProtocolInfo above.
* '''Owner''': The PC username of the person that created this protocol, refer to bst_get('UserName').
* '''CreationTime''': When this protocol was created.
* '''DatabaseVersion''': Version of this database schema. If new fields are added to the database in subsequent versions, old protocols should be detected using this field and their schema should be updated.
* '''Subject''': Table containing the metadata for all subjects. Very similar to '''ProtocolSubjects '''above, with a few notable differences:
* '''Id''': Unique identifier number for each subject. This is used to link this table with other tables (foreign keys).
* '''iAnatomy '''/ '''iCortex '''/ etc: These indices are now unique AnatomyFile IDs rather than the position in the list of all anatomy/surface files.
* '''Anatomy '''/ '''Surface''': These lists no longer exist and are represented as entries in the AnatomyFile table instead.
* '''AnatomyFile''': Table containing the metadata for all anatomy files (MRI, surface, fibers, etc), refer to db_template('AnatomyFile').
* '''Id''': Unique identifier number for each anatomy file. This is used to link this table with other tables (foreign keys).
* '''Subject''': ID of the subject this file belongs to (foreign key).
* '''Type''': Type of anatomy file ("anatomy" = MRI, "surface"). Used to split back into Anatomy and Surface lists if needed.
* '''FileName''': Path to the anatomy file.
* '''Name''': Name of the file in the database (also known as "Comment").
* '''SurfaceType''': Additional field for surface files, refer to db_template('Surface').
* '''Study''': Table containing the metadata for all studies. Very similar to '''ProtocolStudies '''above, with a few notable differences:
* '''Id''': Unique identifier number for each study. This is used to link this table with other tables (foreign keys).
* '''Subject''': ID of the subject this study belongs to (foreign key).
* '''iChannel '''/ '''iHeadModel''': These indices are now unique FunctionalFile IDs rather than the position in the list of all Channel/HeadModel files.
* '''Channel '''/ '''Data '''/ etc: These lists no longer exist and are represented as entries in the FunctionalFile table instead.
* '''FunctionalFile''': Table containing the metadata for all functional files (data, sources, etc), refer to db_template('FunctionalFile').
* '''Id''': Unique identifier number for each functional file. This is used to link this table with other tables (foreign keys).
* '''Study''': ID of the study this file belongs to (foreign key).
* '''ParentFile''': The ID of the parent functional file this file belongs to, or NULL if it's directly under the study.
* '''Type''': Type of functional file (data, matrix, result, etc).
* '''FileName''': Path to the functional file.
* '''Name''': Name of the file in the database (also known as "Comment").
* '''NumChildren''': Number of child files (i.e. the number of files where ParentFile = this ID). This is used to display the number of files in a list and avoids COUNT queries that might slow down displaying the database.
* '''SubType''': Some file types have sub types, they are defined here. E.g.: data -> raw vs recordings.
* '''ExtraNum '''/ '''ExtraStr1 '''/ '''ExtraStr2''': Additional numerical or string columns used by specific file types. These are defined in db_get().getFuncFileStruct().
* '''LastModified''': Unix UTC timestamp of the last time this file was modified.
* '''Lock''': Table containing all active locks on protocol entities, refer to db_template('Lock').
* '''Id''': Unique identifier number for this lock.
* '''Subject''': ID of the subject this entity belongs to (foreign key).
* '''Study''': ID of the study this entity belongs to (foreign key), or NULL if the lock is on the whole subject.
* '''File''': ID of the locked functional file (foreign key), or NULL if the lock is on the whole subject or study.
* '''Username''': PC username of the person that acquired this lock, refer to bst_get('UserName').
* '''Computer''': PC name of the person that acquired this lock, refer to bst_get('ComputerName').
* '''Time''': UTC Date & time when this lock was acquired.
* '''Operation''': Description of the operation that acquired this lock, typically the process name.
=== SQLite API in Brainstorm ===
* '''sql_connect''': To get an active connection to the database. Optional argument of type db_template('DatabaseInfo') to define how to connect to the database, otherwise it looks into the active ProtocolInfo.
* '''sql_close'''(sqlConnection): Closes an active connection to the database.
* '''sql_create'''(sqlConnection, tables): Generates and executes SQL CREATE queries for every table as input. Tables input structure defined in db_template('sqltable').
* '''sql_query''': Executes a query to the SQL database.
* '''SqlConnection''': An active connection to the database, or empty if you want the function to handle it for you.
* '''Type''': "select", "insert", "update" or "delete".
* '''Table''': The SQL table to query.
* '''Data''': The data to select, either a single column name or a list of multiple columns. '*' or [] to get all columns.
* '''Condition''': A structure where each non-empty field will be a condition to the query (WHERE field).
* '''AddQuery''': Additional text to add at the end of a query before execution (e.g. LIMIT 1).
Here are some specific query examples:
'''1. Direct query'''
This allows you to write manual SQL queries and execute them directly. The drawback is that it returns a [[https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html|java.sql.ResultSet]] object so you need to be familiar with this specific syntax, and you need to handle the SQL connection yourself. Example:
{{{
sqlConnection = sql_connect();
resultSet = sql_query(sqlConnection, 'SELECT * FROM Subject');
while resultSet.next()
disp(resultSet.getString('Name'));
end
resultSet.close();
sql_close(sqlConnection);
}}}
'''2. Select query'''
This lets you read data from the database. It generates a SELECT query, executes it for you, and returns a db_template() structure of the proper table. Example:
{{{
% Get all info of subjects that use the default anatomy.
sSubject = sql_query([], 'select', 'subject', '*', struct('UseDefaultAnat', 1));
% Get the name of the first subject.
sSubject = sql_query([], 'select', 'subject', 'Name', [], 'LIMIT 1');
% Get the name and ID of all subjects.
sSubject = sql_query([], 'select', 'subject', {'Id', 'Name'});
}}}
'''3. Update query'''
This lets you modify rows in the database. It generates an UPDATE query, executes it for you, and returns the number of rows modified. Example:
{{{
% Makes subject #1 use the default anatomy
numRows = sql_query([], 'update', 'subject', struct('UseDefaultAnat', 1), struct('Id', 1));
% Makes all subjects use the default anatomy
numRows = sql_query([], 'update', 'subject', struct('UseDefaultAnat', 1));
}}}
'''4. Insert query'''
This lets you insert rows into the database. It generates an INSERT query, executes it for you, and returns the generated ID of the inserted rows. We strongly recommend you use db_template() to define the values to insert. Example:
{{{
sSubject = db_template('Subject');
sSubject.Name = 'MyNewSubject';
SubjectId = sql_query([], 'insert', 'subject', sSubject);
}}}
'''5. Delete query'''
This lets you delete rows from the database. It generates a DELETE query, executes it for you, and returns the number of rows deleted. Be careful, this cannot be undone! Example:
{{{
% Delete Subject #1
numRows = sql_query([], 'delete', 'subject', struct('Id', 1));
% Delete all subjects
numRows = sql_query([], 'delete', 'subject');
}}}