Converting EEG data from .h5 file for import

Hello,

I am trying to import simulated EEG data produced by The Virtual Brain (TVB) into Brainstorm for analysis. The simulated data follows the standard 10-20 montage with 63 surface electrodes. Any data exported from TVB comes out as a .h5 file. I can read these in Python using h5py.File('filename.h5', 'r') and have determined that the simulated EEG file consists of 2 datasets, ['data', 'time']. Viewing each individually reveals that data has shape (6000, 2, 63, 1), while time has shape (6000,). For specifics, I have included the code I ran and the subsequent output below.

The code:

import h5py
file = h5py.File('filename.h5','r')

print('file name')
print(file.name)
print('file keys')
print(file.keys())
print('structure of data')
print(file['/data'])
print('structure of time')
print(file['/time'])

The output:

file name
/
file keys
<KeysViewHDF5 ['data', 'time']>
structure of data
<HDF5 dataset "data": shape (6000, 2, 63, 1), type "<f8">
structure of time
<HDF5 dataset "time": shape (6000,), type "<f8">

All the data is there, but I am unsure how to put it into a format that I can import into Brainstorm. The data is structured like a NumPy array; for example, I can get the data for the first time point with file['/data'][0, :, :, :].
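For example, here is a minimal sketch of pulling the arrays into NumPy and isolating what I believe is the EEG signal (I am assuming the first entry along the second axis is the EEG and that the trailing singleton axis can be dropped):

import h5py

with h5py.File('filename.h5', 'r') as f:
    data = f['/data'][()]    # shape (6000, 2, 63, 1)
    time = f['/time'][()]    # shape (6000,)

# Keep the first entry along the second axis (assumed to be the EEG) and
# drop the singleton last axis, then transpose to channels x time
eeg = data[:, 0, :, 0].T     # shape (63, 6000)
print(eeg.shape, time.shape)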

Additionally, if anyone can help me get the head model data from TVB into Brainstorm, that would be extremely helpful. I am not the one who created the simulation in TVB and have not found the MRI data yet. However, both Brainstorm and TVB can import data from FreeSurfer, and this appears to be the preferred method to prepare head model data for both. From TVB I can export .h5 files containing the cortical surface, face surface, brain-skull interface, skin-air interface, and skull-skin interface data. The .h5 file containing the cortical surface data contains the datasets ['triangle_normals', 'triangles', 'vertex_normals', 'vertices'] and was prepared using scripts available on GitHub: https://github.com/timpx/scripts
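The surface .h5 files can be read the same way; here is a minimal sketch (the file name is just an example, and I am assuming the vertices are stored as N×3 coordinates and the triangles as N×3 vertex indices):

import h5py

with h5py.File('cortical_surface.h5', 'r') as f:   # example file name
    vertices = f['/vertices'][()]     # assumed shape (Nvertices, 3)
    triangles = f['/triangles'][()]   # assumed shape (Ntriangles, 3), vertex indices
print(vertices.shape, triangles.shape)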

If anyone has recommendations or advice on either point it would be greatly appreciated. Stay safe and healthy.

We recently added an external HDF5 reading library to Brainstorm:
https://github.com/brainstorm-tools/brainstorm3/tree/master/external/easyh5
Can you read these .h5 files with the function loadh5.m?

If it works, the best approach would be to write reading functions for the recordings based on this library:

  • Create a file reader for the TVB .h5 file: in_fopen_tvb.m, in_fread_tvbh5.m (plenty of examples available)
  • Reference the file format TVB-H5 in bst_get('FileFilters') > 'data' and 'raw'
  • Add the file format TVB-H5 to the big switch in in_fopen.m and in_fread.m

For the head model, it would be very complicated to import the surfaces to the Brainstorm database without the MRI, as everything in Brainstorm is based on anatomical landmarks set on the MRI.
A lot of the information we need from FreeSurfer is probably useless for the purposes of TVB and therefore not imported (registration information with the MRI, spherical atlas coregistration, atlases...). The coordinate system changes needed for anatomical registration can be an endless nightmare, just so you know before digging further in this direction :slight_smile:

We're evaluating the possibility of using SPM12/CAT12 routinely instead of FreeSurfer. If you find the MRI somewhere, you might find a way to export it as a .nii file; you could then import it into Brainstorm and reprocess it with CAT instead of having to re-run FreeSurfer.
https://neuroimage.usc.edu/brainstorm/Tutorials/SegCAT12

Hello Francois,

I ran loadh5() on the .h5 file from TVB and it returned the following:

ans = 

  struct with fields:

    data: [1×63×2×6000 double]
    time: [6000×1 double]

This is consistent with the results I got in Python when viewing the file.
Could you provide more clarification regarding this portion of your post?

My understanding is that you recommend I write functions to read this file into Brainstorm. I am uncertain what format I would need to organize the data into for Brainstorm to read it properly, and I am not sure where to find the examples/references you mention. Could you point me to some of these? This is my first time adding anything to existing software, and my first time coding directly in Matlab in a few years.

On the head model, I may have found the T1 weighted MRI in a .h5 file format. I will see if I can convert it to something I can use, otherwise I will rely on the default anatomy. Thank you for the heads up :slight_smile:

I ran loadh5() on the .h5 file from TVB and it returned the following:

    data: [1×63×2×6000 double]
    time: [6000×1 double]

This is OK, but not sufficient.
You need to find a file that tells you at least the names of these 63 electrodes.

Do these files have standardized names?
Could you point at some online documentation describing this file format / the way to access these files?

It would only be worth investing time in writing a proper TVB .h5 reader in Brainstorm if any TVB user could benefit from it in an easy way.
If this is the case, please send me a short example file (data + channel names), and I'll see how I can organize this.

Otherwise, you can import these files in Brainstorm with the following procedure:

  1. Permute the dimensions to have [Nchannels, Ntime, Nepochs] (a Python alternative to steps 1-2 is sketched after this list):
    test.data = permute(test.data, [2,4,3,1])
  2. Save the structure as a .mat file:
    save('test_file.mat', '-struct', 'test')
  3. Right-click on your subject in the database > Import MEG/EEG > Select file format "EEG: Matlab matrix (.mat)", select the file test_file.mat
  4. Configure the import so that it matches what you have in the file (the field "time" will be ignored and reconstructed from the information you set in the import options):
    [screenshot: import options window]
  5. Right-click on the channel file > Edit channel file : Edit the types and names of the channels
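If you prefer to prepare the .mat file from Python rather than Matlab, here is a rough equivalent of steps 1 and 2 (just a sketch: it assumes scipy is installed, and the file names are placeholders):

import h5py
import numpy as np
from scipy.io import savemat

with h5py.File('filename.h5', 'r') as f:
    data = f['/data'][()]    # h5py view: (Ntime, Nstatevars, Nchannels, 1)
    time = f['/time'][()]    # (Ntime,)

# Reorder to [Nchannels, Ntime, Nepochs], matching the Matlab permute above
data = np.transpose(data, (2, 0, 1, 3)).squeeze(axis=3)

savemat('test_file.mat', {'data': data, 'time': time})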

Hello Francois,

Thank you for getting back to me. There is a separate .h5 file containing the labels and locations of the EEG sensors. I found that I can import both the data and the channel .h5 files into EEGLab and create a .set file, which I can then import into Brainstorm.

I did look into the Brainstorm software and found the files you referred to above, but since the data and channel information from TVB are stored in different files, I was unsure how to create an importer for them. If you would like, I will send you the two .h5 files so that you can view them and determine a way to import this data in the future. In my opinion, the pairing of these two powerful tools for brain analysis has great potential for future discoveries. Let me know if you would like the files.

Best,
Zoe

Yes, please.

And please share as much information as you can together with this file format:

  1. Do these files have standardized names? Or does it go through some sort of export where you have to define the file name manually?
  2. Could you point at some online documentation describing this file format / the way to access these files?
  3. If this is not documented online, could you describe briefly the structure of all these files (folders/subfolders) => I'd need this info to know how to pair the data and channel information.
  4. If I develop something for the files you send me, will it also work for other TVB users, or are there local particularities that would prevent the exact same code from running in other places?

Hello Francois,

Here is the link to a Google drive folder I created and dropped the 2 files in: https://drive.google.com/drive/folders/1G1438mfArsG367wlVgLm2Mvs91xQx4em?usp=sharing

The Virtual Brain (TVB) provides some documentation regarding the file formats; however, I have not found any specifics regarding the internal structure of the files. Here is the main page on import/export file formats: https://www.thevirtualbrain.org/tvb/zwei/brainsimulator-data

That being said, I will explain what I know below. If anything is unclear or any information is missing, let me know and I will do my best to communicate it to you clearly or find the answer.

TVB only exports data in .h5 file format. The overall name of the file is determined by the contents and the date exported, but this is something the user can change later.

These .h5 files contain datasets which have names associated with them. All .h5 files are structured in groups and subgroups. The main group name for all the .h5 files I have looked at from TVB has been empty, so a simple '/' is all that needs to precede the name of a subgroup. The number, names and contents of these subgroups (which are each datasets) vary depending on the file you are looking at. That is to say, they will be different for a file containing EEG data than they would be for sensor array, sEEG, MEG or BOLD data.

Within the type of data you are looking at, the subgroup names and structures appear consistent, at least for EEG and sensor data (these are the ones I have worked with). I exported and inspected multiple files of each, taken from different simulations and different projects in TVB.
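As a concrete way to check this on any exported file, here is a short Python snippet that lists every group and dataset it contains (there is nothing TVB-specific about it, and the file name is just a placeholder):

import h5py

def describe(name, obj):
    # Print the full path and, for datasets, the shape and type
    if isinstance(obj, h5py.Dataset):
        print('/' + name, obj.shape, obj.dtype)
    else:
        print('/' + name, '(group)')

with h5py.File('exported_from_tvb.h5', 'r') as f:
    f.visititems(describe)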

Breakdown of EEG data format:

  • '/data' contains an array of the form [1×63×2×6000 double]. The first value remains constant at 1. The second value is the number of electrodes in the sensor array. The third value will be either 1 or 2: if it is 1, only the EEG data is present; if it is 2, metabolic data is also present and the EEG data comes first. The last value is the number of time points; the sampling here is in milliseconds, so my data contains 6 seconds of simulated recording.
  • '/time' contains an array of the form [6000×1 double]. The first value is the number of time points and the second will always be 1. For my data, the time starts at 0.5 and increases by 1. I am not sure why that is, but since I set the sampling to 1 ms in the simulation and it increases by the step I would expect, I am guessing this is a quirk of the system related to starting at time 0.

Breakdown of the sensor montage data format:

  • '/labels' contains an array of the form [5×63 char]. The first value is the number of characters in the labels and the second is the number of electrodes. The data contained in this will depend upon the montage uploaded to TVB by the user to run the simulation in the first place. The names of the electrodes may or may not be consistent with a standard format, as this depends on what the user input originally.
  • '/locations' contains an array of the form [3×63 double]. The first value should always be 3, as these are coordinates in 3D space ordered x, y, z. The second is again the number of electrodes. The space these coordinates are in (MNI, patient-specific...) again depends upon the user.

The EEG data and the sensor data can only be exported separately. To my knowledge, the formats of each are consistent in their basic structure to the point where the import structure could be used by other TVB users working on other projects.
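To make the pairing concrete, here is a minimal sketch of reading the two exported files side by side from Python (the file names are placeholders; note that h5py reports the '/data' dimensions in the reverse order of the Matlab view, and presumably the same holds for the sensor datasets):

import h5py

with h5py.File('eeg_data.h5', 'r') as f:
    data = f['/data'][()]            # h5py view: (Ntime, 1 or 2, Nchannels, 1)
    time = f['/time'][()]            # (Ntime,)

with h5py.File('eeg_sensors.h5', 'r') as f:
    labels = f['/labels'][()]        # Matlab view: [5x63 char]
    locations = f['/locations'][()]  # Matlab view: [3x63 double]

print(data.shape, time.shape, labels.shape, locations.shape)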

There is a Google group which serves as the forum for the TVB software and is open to join. The most responsive person on the forum who is directly involved with the software development is Marmaduke. Here is a link to the list of contributors to the software: https://www.thevirtualbrain.org/tvb/zwei/teamwork-contributors

Please let me know if you need anything else. I am new to TVB but will share all that I have learned thus far.

Best,
Zoe

Thanks for the very detailed description!

Here is your importer:

Just update Brainstorm, then select "EEG: The Virtual Brain" in the list of file formats in the options for the menu "Import MEG/EEG".

Let me know if there is anything to fix.

[screenshots of the imported TVB recordings in Brainstorm]

Hello Francois,

Thank you for the importer!

Sorry for the delay; my internet has been down. I imported the EEG data and can see the 2 time series as shown in your image above. The sensor import is giving me an error though.

I right-click Channel file (63) > Add EEG positions > Import from file, choose the new format you created and the file from my directory, and then I get a message stating "There is a transformation to subject coordinates available in the MRI. Would you like to use it to align the sensors with the MRI?" Regardless of whether I select "Yes" or "No", I then get the error message "No channel matching the loaded cap." I am unsure if this has to do with the head model I have generated, or with the importer. Also, mine says Channel file (63) rather than TVB channels (63) as in your image above. Can you clarify?

Thank you!

Best,
Zoe

I just tried the imports again with a different protocol and the import worked. I am refining my protocol and must have introduced an error in the other one. I am now determining how to import the sensors and refine the registration to the head model without generating 2 sets of sensors on the scalp, one from the EEG import and one from the sensor import. When I import the EEG data, it assumes sensor locations, and it seems that, if I refine the registration, I end up with 2 sets after importing the sensors from the file. I believe this is all due to a lack of familiarity on my part with importing the sensors separately, though. I will read through your tutorials again :slight_smile:

Thank you again for the new importer!

"No channel matching the loaded cap."

The sensors file will be loaded automatically together with the recordings in one of these two cases:

  • The SensorsEEG file has exactly the same name as the data file
  • There is one and only one SensorsEEG file in the same folder as the data file

Otherwise, the data channels can't be labelled, and they all end up named "E001", "E002"...
The menu "Add EEG positions" can't be used in that case, because it matches the positions with the data channels using the labels, and the labels do not match (e.g. "Fp1" in the sensors file, "E001" in the data file), so you can't add the positions that way.
Instead, you need to use the menu "Import channel file", which will overwrite the empty channel file generated automatically when importing the data, without any check on the number of channels or their names.

The second bullet point is true for me, so it was loading the sensors with the data. I did not know this relationship existed. Thank you for the clarification.

Nice feature! It works well with EEG data.
Would it be possible to adapt it to MEG data too?

Best

If there are dedicated structures for MEG recordings in TVB .h5 files, then yes, the reading functions in_data_tvb.m and in_channel_tvb.m could be extended to support them.

Is this something you could help us with?