Size: 24819
Comment:
|
Size: 24820
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 6: | Line 6: |
<<TableOfContents(2,2)>> | <<TableOfContents(3,2)>> |
Line 252: | Line 252: |
sSubject = sql_query([], 'select', 'subject', {'Id', 'Name'}; | sSubject = sql_query([], 'select', 'subject', {'Id', 'Name'}); |
Brainstorm's Database Structure
Authors: Martin Cousineau, Raymundo Cassani
Understanding how data is organized and managed in Brainstorm is paramount for scripting your own pipelines, and adding new processes in the software. This page describes the structure of the Brainstorm database, and the metadata that is necessary to manage it.
Contents
On the hard drive
Brainstorm database resides on the hard drive at the database directory (brainstorm_db/). The database consists in an organized collection of protocols, where each protocol each with its own directory, which has a strict hierarchy, explained in the bullet list below:
anat: This is where Anatomy files (e.g. MRI, surfaces) and subject's metadata are stored.
One folder per subject
brainstormsubject.mat: Metadata of the subject, see db_template('subjectmat')
subjectimage_*.mat: MRI files, see db_template('mrimat')
tess_*.mat: Surface files, see db_template('surfacemat')
data: This is where Functional files (e.g. channels, recordings) and study's metadata are stored.
protocol.mat: All metadata of this protocol, as Matlab structs. See next section for details. Refer to variable ProtocolMat of function db_save().
protocol.db: All metadata of this protocol, as an SQLite database. See next section for details. Refer to sql_generate_db().
One folder per subject
One folder per study
brainstormstudy.mat: Metadata of the study, see db_template('studymat')
channel_*.mat: Channel files, see db_template('channelmat')
data_*.mat: Data (recording) files, see db_template('datamat')
- Etc.
The protocol metadata files (protocol.mat and protocol.db) contain all the metadata that is required to manage the protocol data. In case that any of these files is accidentally deleted, or gets corrupted, the protocol metadata can regenerated from all the brainstormsubject.mat and brainstormstudy files in the protocol directory.
All required metadata should always be saved on the hard drive outside of the protocol.mat / protocol.db files, such that if they get corrupted, deleted or one does a fresh reload of the database, there is no loss of information. This is why things like: which cortex file is selected, or which trials are marked as bad are also saved in separate files (brainstormsubject.mat and brainstormstudy.mat).
The filename of each file should always clearly indicate the basic type of the file, hence the required prefixes (e.g. data_*.mat).
For the other anatomy files and other functional data files, the prefix in the filename indicates its content. For example, a file surface_*.mat contains cortex surface information. The data is stored as a Matlab structure, according to the different file types.
The content of the structure of each .mat files is defined by the function db_template(). The mat suffix is used to differentiate from the in-memory metadata structure and the in-momery data structure of each type. For example:
db_template('surfacemat'): Structure saved on the hard drive as a surface_*.mat file.
db_template('loadedsurface'): Structure saved in memory containing all the data of a loaded surface_*.mat file.
db_template('surface'): Structure saved in memory containing all the metadata of a surface_*.mat file, loaded from the protocol.db / protocol.mat file rather than the surface_*.mat file itself.
Terms that need definition
Brainstorm database: Collection of Protocols
Protocol:
Anatomy data:
Functional data:
Subject:
Study:
Dataset:
Folder:
Sub-folder:
Protocol metadata
A brief paragraph describing why there are two versions, the main differences, and when is the user expected to see one or the other.
Matlab structure: protocol.mat
Once a protocol is loaded, Brainstorm has Matlab structures in memory that contain the metadata of all the files in this protocol. These are located in the GlobalData.DataBase global variable, such that it is easily accessible in any function of the software. It is defined db_template('GlobalData'). Let's walk through some of the important variables in this structure:
GlobalData.DataBase.ProtocolInfo: Basic information about the protocols in the Brainstorm database folder, refer to db_template('ProtocolInfo').
Comment: Name of the protocol
STUDIES: Absolute path to the folder containing the functional data (/data/).
SUBJECTS: Absolute path to the folder containing the anatomical data (/anat/).
iStudy: Currently selected study, if applicable.
UseDefaultAnat: Protocol setting, whether new subjects should use the default anatomy by default.
UseDefaultChannel: Protocol setting, whether new subjects should share channel files by default.
GlobalData.DataBase.ProtocolSubjects: Metadata of all subjects and their anatomical files of each loaded protocol, refer to db_template('ProtocolSubjects').
Subject: Metadata of all regular subjects, refer to db_template('subject').
DefaultSubject: Special subject used to store default anatomy. Same structure as regular subjects.
GlobalData.DataBase.ProtocolStudies: Metadata of all studies and their functional files of each loaded protocol, refer to db_template('ProtocolStudies').
Study: Metadata of all regular studies, refer to db_template('study').
DefaultStudy: Special study used to store Common files, e.g. a global channel file. Same structure as regular studies.
AnalysisStudy: Special study to store Inter-subject files, e.g. stats results. Same structure as regular studies.
GlobalData.DataBase.isProtocolLoaded: List of bool, whether the metadata of the protocol is already in memory or not. This is useful when you're running Brainstorm from the command line.
GlobalData.DataBase.isProtocolModified: List of bool, whether the metadata of the protocol was modified and therefore the protocol.mat needs to be overwritten when Brainstorm closes.
GlobalData.DataBase.iProtocol: The index of the currently selected protocol.
GlobalData.DataBase.BrainstormDbDir: Absolute path to the brainstorm database directory.
GlobalData.DataBase.DbVersion: Version number of the currently loaded protocol. Used to update the protocol structure when new fields are added to a specific file for example. Refer to db_update().
GlobalData.DataBase.isReadOnly: Flag whether the user has write access to the protocol database directory. Used in tree_callbacks() to remove GUI option menus that would modify files.
GlobalData.DataBase.LastSavedTime: Timestamp of the last time the protocol.mat file was overwritten. There used to be an automatic save when it had been a few hours.
GlobalData.DataBase.Searches: Information about the search tabs of the database:
iActive: Index of the currently selected active search tab (0 if none).
Active: List of all active search tabs, with the first one being the whole database.
Name: Name of the tab, or empty if whole database
SearchNode: Tree structure representing the search query
AnatRootNode/FuncRootNode: The root node of the database tree, to avoid rebuilding the Java tree when you switch between tabs.
AnatSelNode/FuncSelNode: The currently selected node, to remember the last selected object before you switched between tabs.
All: If searches were saved between Brainstorm sessions, they are stored here.
Name: Name of the search
Search: Tree structure representing the search query
The two main variables of interest are ProtocolSubjects and ProtocolStudies. Indeed, together they contain the metadata of every single file in the protocol! Let's dive a bit deeper into both of them.
ProtocolSubjects
This structure contains the metadata of every subjects and anatomy files.
Name: Name of the subject. Two special subjects have specific names:
- @default_subject: Default anatomy subject.
- Group_analysis: Subject for statistical results or projections to a common space.
Comments: Additional information about the subject set manually by the user.
FileName: Path to the brainstormsubject.mat file where the subject specific metadata is stored in case the protocol's metadata file is corrupted or needs to be reloaded.
DateOfAcquisition: Unused, legacy field.
Anatomy: List of all MRI files, refer to db_template('Anatomy').
Surface: List of all surface and fibers files, refer to db_template('Surface').
iAnatomy: Index of the MRI file in the Anatomy list that is currently selected for this subject. This will show in green in the database viewer. Any operation related to this subject that requires an MRI file will load this selected one.
iScalp: Index of the selected scalp surface file from the Surface list.
iCortex: Index of the selected cortex surface file from the Surface list.
iInnerSkull: Index of the selected inner skull surface file from the Surface list.
iOuterSkull: Index of the selected outer skull surface file from the Surface list.
iFibers: Index of the selected fibers file from the Surface list.
iFEM: Index of the selected surface file used for FEM from the Surface list.
iOther: Index of the selected sub-structure surface file from the Surface list.
UseDefaultAnat: Whether this subject uses the default anatomy files rather than its own.
UseDefaultChannel: Whether this subject shares its channel file across all runs or uses the global channel file.
ProtocolStudies
This structure contains the metadata of every studies (also known as first level folders, or conditions) and every functional files inside each study.
Name: Name of the study. If it starts with "@raw", this is a special study that can only contain unprocessed files and is created when you create a new link to a raw file. Three special studies exist:
- @default_study: Common files shared either by the whole protocol (see @default_subject) or the whole subject. Typically channel files.
- @inter: Inter-subject statistical results.
- @intra: Intra-subject statistical results.
FileName: Path to the brainstormstudy.mat file where the study specific metadata is saved should the protocol metadata file get corrupted or need to be reloaded.
DateOfStudy: If this is a raw study, this is the acquisition date of the raw file. Otherwise, this is the date this study was created by the user.
BrainstormSubject: Path to the file of the metadata of the subject this study belongs to.
Condition: Either a duplicate of the Name field, or a cell array representing a sub-folder structure if this is a "sub-study", e.g. Subject01/ParentStudy/ChildStudy -> Condition = {"ParentStudy", "ChildStudy"}. This is not fully implemented.
Channel: List of channel files, refer to db_template('Channel).
iChannel: Index of the currently selected channel in Channel list.
Data: List of all data files, either raw or imported recordings/trials. Refer to db_template('Data') and the DataType field.
HeadModel: List of all head model files, refer to db_template('HeadModel') and the HeadModelType field.
iHeadModel: Index of the currently selected head model in HeadModel list.
Result: List of all source files, including links, kernels and full sources, refer to db_template('Results'). The filename of links start with "link" and actually contain two files: the kernel and the data file.
Stat: List of all statistics files, refer to db_template('Stat') and the Type field.
Image: List of all image or video files, refer to db_template('Image').
NoiseCov: Noise and data covariance files. The first file in the list is always the noise covariance, and the second file in the list is always the data covariance (if available). Refer to db_template('NoiseCov').
Dipoles: List of dipole files, refer to db_template('Dipoles').
Timefreq: List of all time-frequency files, refer to db_template('Timefreq').
Matrix: List of all matrix files, refer to db_template('Data').
SQLite structure: protocol.db
With the new database structure, the metadata of the protocol files is no longer in memory in Matlab structures. It is now saved in a SQLite relational database. This means that when the protocol is first created, an empty database is created and is populated with SQL rows. Whenever metadata needs to be retrieved, the metadata is no longer available in memory and needs to be queried instead. While a bit slower, this gives us the ability to support concurrency which is essential for sharing protocols across contributors and running multiple jobs on the same protocol at the same time.
SQLite stores all database information in a single file, the protocol.db file. Since we no longer need the metadata stored in Matlab structures, the protocol.mat is no longer used. That means we can easily detect if a protocol was created from Brainstorm v3.4 (protocol.mat) vs Brainstorm v.4.0 (protocol.db) by looking at the root of its folder in the database folder. For debugging purposes, you can explore the .db file using the application DB Browser for SQLite. The file system handles concurrency with regards to SQL access (i.e. if an instance of Brainstorm is currently inserting metadata, the file system won't let another instance access the .db file until the modification is complete).
To avoid having long wait times, both due to concurrency or remote databases (when SQL queries need to be sent through a network), it's important to optimize as much as possible the database calls such that (1) active connections are closed as quickly as possible and (2) queries only return information that is absolutely required. Therefore, when writing SQL queries, make sure to only include columns that are needed (avoid SELECT * when possible) and return only rows that are needed (for example, if you only need the first row, add LIMIT 1 at the end of your query).
DataBase GlobalData
The GlobalData.DataBase variable is fairly similar to the one described above, with a few notable differences:
ProtocolSubjects / ProtocolStudies: These variables no longer exist as this information is stored in the SQLite database instead of in memory.
ProtocolInfo.iSubject: The ID of the subject that is currently active. Only the study used to be saved, but since study metadata is no longer in memory this avoids querying the database to get the subject of this study.
ProtocolInfo.iReadFiles: Hash map containing all files that have already been "read" (selected). This allows the database to highlight files that are "new" but only until they are interacted with.
Key: FunctionalFile.Id
Value: FunctionalFile.LastModified
ProtocolInfo.Database: Structure describing the information required to connect to the database, refer to db_template('DatabaseInfo').
Rdbms: Relational database management system of the active database (only "sqlite" supported).
Location: Location of the database. For SQLite: the path to the protocol.db file.
Name: Name of the database to use (not applicable for SQLite).
Username: Username to connect to the database (not applicable for SQLite).
Password: Passwordto connect to the database (not applicable for SQLite).
SQL Schema
TODO: Image
A schema defines all tables, the columns of every table and the relation between tables in the database. Refer to the sql_generate_db() function: the name & type of the tables columns are defined in db_template(), and additional properties like foreign keys, primary keys and NOT NULL are hardcoded in sql_generate_db().
Protocol: Table containing the metadata for this protocol. It always has a single row!
Name: Name of the protocol.
UseDefaultAnat: Same as ProtocolInfo above.
UseDefaultChannel: Same as ProtocolInfo above.
Owner: The PC username of the person that created this protocol, refer to bst_get('UserName').
CreationTime: When this protocol was created.
DatabaseVersion: Version of this database schema. If new fields are added to the database in subsequent versions, old protocols should be detected using this field and their schema should be updated.
Subject: Table containing the metadata for all subjects. Very similar to ProtocolSubjects above, with a few notable differences:
Id: Unique identifier number for each subject. This is used to link this table with other tables (foreign keys).
iAnatomy / iCortex / etc: These indices are now unique AnatomyFile IDs rather than the position in the list of all anatomy/surface files.
Anatomy / Surface: These lists no longer exist and are represented as entries in the AnatomyFile table instead.
AnatomyFile: Table containing the metadata for all anatomy files (MRI, surface, fibers, etc), refer to db_template('AnatomyFile').
Id: Unique identifier number for each anatomy file. This is used to link this table with other tables (foreign keys).
Subject: ID of the subject this file belongs to (foreign key).
Type: Type of anatomy file ("anatomy" = MRI, "surface"). Used to split back into Anatomy and Surface lists if needed.
FileName: Path to the anatomy file.
Name: Name of the file in the database (also known as "Comment").
SurfaceType: Additional field for surface files, refer to db_template('Surface').
Study: Table containing the metadata for all studies. Very similar to ProtocolStudies above, with a few notable differences:
Id: Unique identifier number for each study. This is used to link this table with other tables (foreign keys).
Subject: ID of the subject this study belongs to (foreign key).
iChannel / iHeadModel: These indices are now unique FunctionalFile IDs rather than the position in the list of all Channel/HeadModel files.
Channel / Data / etc: These lists no longer exist and are represented as entries in the FunctionalFile table instead.
FunctionalFile: Table containing the metadata for all functional files (data, sources, etc), refer to db_template('FunctionalFile').
Id: Unique identifier number for each functional file. This is used to link this table with other tables (foreign keys).
Study: ID of the study this file belongs to (foreign key).
ParentFile: The ID of the parent functional file this file belongs to, or NULL if it's directly under the study.
Type: Type of functional file (data, matrix, result, etc).
FileName: Path to the functional file.
Name: Name of the file in the database (also known as "Comment").
NumChildren: Number of child files (i.e. the number of files where ParentFile = this ID). This is used to display the number of files in a list and avoids COUNT queries that might slow down displaying the database.
SubType: Some file types have sub types, they are defined here. E.g.: data -> raw vs recordings.
ExtraNum / ExtraStr1 / ExtraStr2: Additional numerical or string columns used by specific file types. These are defined in db_get().getFuncFileStruct().
LastModified: Unix UTC timestamp of the last time this file was modified.
Lock: Table containing all active locks on protocol entities, refer to db_template('Lock').
Id: Unique identifier number for this lock.
Subject: ID of the subject this entity belongs to (foreign key).
Study: ID of the study this entity belongs to (foreign key), or NULL if the lock is on the whole subject.
File: ID of the locked functional file (foreign key), or NULL if the lock is on the whole subject or study.
Username: PC username of the person that acquired this lock, refer to bst_get('UserName').
Computer: PC name of the person that acquired this lock, refer to bst_get('ComputerName').
Time: UTC Date & time when this lock was acquired.
Operation: Description of the operation that acquired this lock, typically the process name.
SQLite API in Brainstorm
sql_connect: To get an active connection to the database. Optional argument of type db_template('DatabaseInfo') to define how to connect to the database, otherwise it looks into the active ProtocolInfo.
sql_close(sqlConnection): Closes an active connection to the database.
sql_create(sqlConnection, tables): Generates and executes SQL CREATE queries for every table as input. Tables input structure defined in db_template('sqltable').
sql_query: Executes a query to the SQL database.
SqlConnection: An active connection to the database, or empty if you want the function to handle it for you.
Type: "select", "insert", "update" or "delete".
Table: The SQL table to query.
Data: The data to select, either a single column name or a list of multiple columns. '*' or [] to get all columns.
Condition: A structure where each non-empty field will be a condition to the query (WHERE field).
AddQuery: Additional text to add at the end of a query before execution (e.g. LIMIT 1).
Here are some specific query examples:
1. Direct query
This allows you to write manual SQL queries and execute them directly. The drawback is that it returns a java.sql.ResultSet object so you need to be familiar with this specific syntax, and you need to handle the SQL connection yourself. Example:
sqlConnection = sql_connect(); resultSet = sql_query(sqlConnection, 'SELECT * FROM Subject'); while resultSet.next() disp(resultSet.getString('Name')); end resultSet.close(); sql_close(sqlConnection);
2. Select query
This lets you read data from the database. It generates a SELECT query, executes it for you, and returns a db_template() structure of the proper table. Example:
% Get all info of subjects that use the default anatomy. sSubject = sql_query([], 'select', 'subject', '*', struct('UseDefaultAnat', 1)); % Get the name of the first subject. sSubject = sql_query([], 'select', 'subject', 'Name', [], 'LIMIT 1'); % Get the name and ID of all subjects. sSubject = sql_query([], 'select', 'subject', {'Id', 'Name'});
3. Update query
This lets you modify rows in the database. It generates an UPDATE query, executes it for you, and returns the number of rows modified. Example:
% Makes subject #1 use the default anatomy numRows = sql_query([], 'update', 'subject', struct('UseDefaultAnat', 1), struct('Id', 1)); % Makes all subjects use the default anatomy numRows = sql_query([], 'update', 'subject', struct('UseDefaultAnat', 1));
4. Insert query
This lets you insert rows into the database. It generates an INSERT query, executes it for you, and returns the generated ID of the inserted rows. We strongly recommend you use db_template() to define the values to insert. Example:
sSubject = db_template('Subject'); sSubject.Name = 'MyNewSubject'; SubjectId = sql_query([], 'insert', 'subject', sSubject);
5. Delete query
This lets you delete rows from the database. It generates a DELETE query, executes it for you, and returns the number of rows deleted. Be careful, this cannot be undone! Example:
% Delete Subject #1 numRows = sql_query([], 'delete', 'subject', struct('Id', 1)); % Delete all subjects numRows = sql_query([], 'delete', 'subject');