Combine multiple fif files from same session (into *.bst format)

Hi Francois and Martin,

As you already know, Elekta devices produce multiple fif files in typical ~1 h sessions because of a 2 GB limitation in the fif format itself (not an operating system limitation). This is a pain for many reasons: i) we need to repeat the same analysis multiple times; ii) it creates complications with registration, so we are forced to use 'one channel file per subject' to avoid registering every single fif file; iii) we miss trials where the files are arbitrarily split; iv) it is tricky to work with eye-blink projectors, heartbeat projectors, etc., because they can be assigned to one file or to all of them.

I recently found out that MNE has a very easy solution to this. They simply write combined_raw = raw1 + raw2 + raw3 and, because they overloaded the '+' operator, the resulting object is raw data with all the recordings concatenated.

Perhaps Brainstorm could offer a similar solution? In particular, there is already an option to store raw data in the database (the *.bst format; e.g., after resampling or low-pass filtering the fif data). Perhaps you could design a process that lets us select multiple fif files and simply concatenates them into .bst format. It may be wasteful in storage, but I cannot stress enough how amazingly helpful that would be for data analysis! I can provide example data if needed.

Thank you,
Dimitrios

Yes, please send us example datasets with multiple contiguous FIF files, I don't have any.
Does MNE save another .fif file larger than 2 GB, or does it just create a Python object in memory that lets you manipulate the concatenated files as if they were one block?

A solution for handling such files would be to handle all the file*.fif files as a single file automatically, in the low-level reading functions, as we do with CTF recordings (one .ds folder can contain multiple 2 GB .meg4 files).
When you link file1.fif to your Brainstorm database, it would check whether the file reached the maximum size for a single file, then check for the existence of file2.fif, file3.fif, etc. If these additional files exist, it would create only one link in your database, representing the concatenation of all the file*.fif files.

Would that make sense?
Is there any strict naming convention from Elekta for the multiple .fif files that have to be processed together?

Hi Francois,

Matti has told me that the 2 GB limit is due to an internal fif format definition, not to the operating system, so no fif file larger than 2 GB can exist. It is therefore impossible to join them into a single fif file. MNE concatenates fif files in its own Python objects.

The solution you propose (low-level reading functions) is ideal, if possible! But I proposed the .bst file as an easier (though memory-wasteful) way to achieve the same goal. Either of these will change our lives. :slight_smile:

Elekta's naming convention is: myfile.fif, myfile-1.fif, myfile-2.fif, etc. After MaxFilter, you can get files like myfile_tsss_mc.fif, myfile-1_tsss_mc.fif, myfile-2_tsss_mc.fif, etc. But people often rename the first file by adding '-0' so the files sort better (myfile-0.fif, myfile-1.fif, etc.), and this is what I share on the server.
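For what it's worth, the sibling discovery based on this naming convention could be sketched in a few lines. This is only an illustration of the convention described above; the function name and the regular expression are mine, not Brainstorm's or MNE's implementation:

```python
import re
from pathlib import Path

def find_fif_siblings(first_file):
    """Given the first file of an Elekta split recording, return the ordered
    list of sibling files following the 'myfile.fif, myfile-1.fif, ...'
    (or 'myfile-0.fif, myfile-1.fif, ...') convention, where a MaxFilter
    suffix such as '_tsss_mc' may follow the split index."""
    p = Path(first_file)
    # Split the stem into base name, optional '-N' split index, and an
    # optional lowercase underscore suffix (e.g. '_tsss_mc').
    m = re.match(r"^(?P<base>.+?)(?:-(?P<idx>\d+))?(?P<suffix>(?:_[a-z]+)*)$",
                 p.stem)
    base, suffix = m.group("base"), m.group("suffix") or ""
    files = [p]
    # The next index is 1 for 'myfile.fif', or N+1 for 'myfile-N...'.
    n = int(m.group("idx")) + 1 if m.group("idx") else 1
    while True:
        nxt = p.with_name(f"{base}-{n}{suffix}{p.suffix}")
        if not nxt.exists():
            break
        files.append(nxt)
        n += 1
    return files
```

For example, calling it on myfile.fif in a folder that also contains myfile-1.fif and myfile-2.fif returns all three paths in order.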

Checking whether the fif file has reached its maximum size and automatically searching for related files with the above naming convention makes a lot of sense. Each file internally stores the time segment it corresponds to, which can be used to verify correct concatenation.

Since I ran out of Dropbox space, I will send you an email with credentials to access the fif files on our server.

Cheers,
Dimitrios

Hi Francois and Dimitrios,
Because of the fif anonymizer application I wrote some months ago, I had to go deep into the FIFF documentation, and I developed all the reading/writing functions for a fif file from the byte level up. If you need help accessing internal fif file info, let me know.

Yes, the 2 GB limitation exists because fif files are a linked list, and the holder for the "next" link is an int32 specifying a byte offset from the beginning of the file, hence the 2 GB "maximum offset" limitation.
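As a quick sanity check on the arithmetic (plain Python, nothing FIFF-specific):

```python
import struct

# The "next" pointer is a signed 32-bit byte offset from the start of the
# file, so the largest addressable position is 2**31 - 1 bytes.
MAX_INT32 = 2**31 - 1
print(MAX_INT32)            # 2147483647, i.e. just under 2 GiB
print(MAX_INT32 / 2**30)    # roughly 2.0 (in GiB)

# Any offset beyond that cannot even be packed as a big-endian int32:
try:
    struct.pack(">i", MAX_INT32 + 1)
except struct.error:
    print("offset does not fit in an int32")
```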

There is no need to check whether the file has reached the maximum size to assume it continues in a different file; there is actually a structure that stores this info. In a measurement info block (code 101) there should be a raw data block (code 102), since we are dealing with raw data files, right? Inside this block there should be a file-reference tag. This file-reference tag has kind 116 (the file reference is a type of tag in FIFF).

  • In a single file: a measurement block containing raw data only.
  • In the first file of a multi-file series: a measurement block containing raw data and a reference to the next file (so you can look for it right away).
  • In a middle file of a multi-file series: a measurement block containing a reference to the previous file + raw data + a reference to the next file.
  • In the last file of a multi-file series: a measurement block containing a reference to the previous file + raw data.
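To make the byte-level picture concrete, here is a small sketch that walks a FIFF stream tag by tag and looks for a kind-116 file-reference tag. The tag-header layout (four big-endian int32s: kind, type, size, next, followed by `size` bytes of payload) follows the description above; the helper names and the fake two-tag stream are mine, purely for illustration, not Brainstorm's or MNE's reader:

```python
import io
import struct

FIFF_REF_FILE_ID = 116  # tag kind signalling "continued in another file"

def iter_tag_headers(f):
    """Walk a FIFF stream tag by tag. Every tag starts with four big-endian
    int32s (kind, type, size, next); `size` bytes of payload follow."""
    while True:
        header = f.read(16)
        if len(header) < 16:
            return
        kind, dtype, size, nxt = struct.unpack(">iiii", header)
        yield kind, dtype, size, nxt
        f.seek(size, io.SEEK_CUR)  # skip the payload to reach the next tag

def has_file_reference(f):
    """True if any tag in the stream is a file-reference tag (kind 116)."""
    return any(kind == FIFF_REF_FILE_ID for kind, _, _, _ in iter_tag_headers(f))

# Build a fake two-tag stream: one ordinary tag, then a file-reference tag.
buf = io.BytesIO()
buf.write(struct.pack(">iiii", 100, 3, 4, 0) + b"\x00" * 4)  # arbitrary tag
buf.write(struct.pack(">iiii", 116, 31, 0, -1))              # ref-to-file tag
buf.seek(0)
print(has_file_reference(buf))  # True
```

A real reader would of course also descend into blocks and interpret tag types; this only shows how the fixed 16-byte tag header makes the scan possible.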

Hope this helps.

Yes, it does help, thanks.

We had a separate thread of conversation by email with Matti and Alex regarding the way this chaining is done in the FIF files, and how it is handled in MNE-C and MNE-Python, which led to some additional conclusions.

I think I have all the information I need for the Brainstorm reader. I'll work on it soon(-ish).

Here it is: https://github.com/brainstorm-tools/brainstorm3/commit/3b8f20dc6f6423ef50840c37791131fd98597c7b

@pantazis How do you like your Christmas present?
Can you try it and let me know if it works? Otherwise you can return it to the store.
You should link only the first file to the database. It will create a single new entry in the database explorer, but the command window will tell you that additional files were linked (or not found...), and when you open the file you get access to the full recordings (across all the files).

Hopefully it's not too buggy...


Hi Francois,

Thank you for working so fast! I do not like returning Christmas presents, and yours is the best I have had in years! But I need to report a bug... I imported a few datasets and found this issue consistently. For example, I made a link to "KMO1_subj20_sess02-0_tsss_mc.fif" and then imported trials to the database with an epoch time of -300 to 1000 ms. During import, I get the message "Some epochs (2769) are shorter than others, ignore them?", and if I select yes it ignores all trials except a few (4 in this case) that have the wrong duration, -300 to 2301 ms.

Another issue: the command window lists the files in reverse order, from last to first (3_tsss_mc.fif, then 2_tsss_mc.fif, then 1_tsss_mc.fif). Are the files read in the correct order? If so, can we list them in the correct order to avoid alarming users?
FIF> Adding linked file: F:\MYPROJECTS15\project_cvMANOVA\rawdata\MEGrawdata\KMO1_subj20_sess02-3_tsss_mc.fif
FIF> Adding linked file: F:\MYPROJECTS15\project_cvMANOVA\rawdata\MEGrawdata\KMO1_subj20_sess02-2_tsss_mc.fif
FIF> Adding linked file: F:\MYPROJECTS15\project_cvMANOVA\rawdata\MEGrawdata\KMO1_subj20_sess02-1_tsss_mc.fif

Optional: Many users will be surprised by this feature and may not notice it, and as a result load the 0 file, then the 1 file, and so on, without realizing they are reloading the same data. Perhaps you could show a popup warning, something like "Discovered multiple fif files, loading all of them...", with a choice to never show the warning again. This would make the change explicit to all users.

Optional 2: Even when multiple fif files are linked, the database tree shows the name of the first fif file only (e.g., KMO1_subj20_sess02-0_tsss_mc). It would be nice to have some visual confirmation in the data tree that multiple fif files have indeed been combined. Maybe append [x] to the name, where x is the number of fif files?

Thank you for this wonderful feature. Once the bug is fixed, it will really change the way we analyze our fif files!

Cheers,
Dimitrios

I imported a few datasets and found this issue consistently. For example, I made a link to "KMO1_subj20_sess02-0_tsss_mc.fif" and then imported trials to the database with an epoch time of -300 to 1000 ms. During import, I get the message "Some epochs (2769) are shorter than others, ignore them?", and if I select yes it ignores all trials except a few (4 in this case) that have the wrong duration, -300 to 2301 ms.

Fixed:
https://github.com/brainstorm-tools/brainstorm3/commit/6db44b7d5d9392fba30a9c9feee1ac2b52e3812a

Another issue: the prompt window presents the files in opposite order, from last to first

I fixed this. There was no problem with the reading order: the files are read recursively and the reporting was done after the reading, hence starting with the last file.

It would be nice to have some visual confirmation in the data tree that indeed multiple fif files have been combined. Maybe include [x] after the name, where x is the number of fif files?

Done.

Happy new year :slight_smile:

Hi Francois,

You are amazing, I just tried the combined fif import on a new project and it works perfectly! And my processing pipelines are greatly simplified now! Thank you so much!

Cheers,
Dimitrios

Hi Francois,

While importing multiple combined fif files now generally works, I have a new dataset, consisting of 2 fif files, that produces an error. I have uploaded the data to the minea1 server (same location as before, example_data_francois/data_fails/) so you can reproduce the problem.

Specifically, the link to the raw file is created; however, importing the trials causes this error:

BST> Emptying temporary directory...
Error using fif_read_raw_segment (line 65)
No data in this range

Error in in_fread_fif (line 96)
[F, TimeVector] = fif_read_raw_segment(sFile, sfid, SamplesBounds, iChannels);

Error in in_fread (line 81)
[F,TimeVector] = in_fread_fif(sFile, iEpoch, SamplesBounds, iChannels);

Error in in_data (line 298)
[F, TimeVector] = in_fread(sFile, ChannelMat, BlocksToRead(iFile).iEpoch,
BlocksToRead(iFile).iTimes, [], ImportOptions);

Error in import_data (line 176)
[ImportedDataMat, ChannelMat, nChannels, nTime, ImportOptions] = in_data(sFile, ChannelMat,
FileFormat, ImportOptions, nbCall);

Error in import_raw_to_db (line 51)
NewFiles = import_data(sFile, ChannelMat, sFile.format, [], iSubject, [], sStudy.DateOfStudy);

Error in tree_callbacks>@(h,ev)import_raw_to_db(filenameRelative) (line 1296)
gui_component('MenuItem', jPopup, [], 'Import in database',
IconLoader.ICON_EEG_NEW, [], @(h,ev)import_raw_to_db(filenameRelative));

Also, the raw data viewer opens properly and displays the correct extended time span, but when I move the time cursor towards the end of the recording, I get this error:

Error using fif_read_raw_segment (line 65)
No data in this range

Error in in_fread_fif (line 96)
[F, TimeVector] = fif_read_raw_segment(sFile, sfid, SamplesBounds, iChannels);

Error in in_fread (line 81)
[F,TimeVector] = in_fread_fif(sFile, iEpoch, SamplesBounds, iChannels);

Error in panel_record>ReadRawBlock (line 1205)
[F, TimeVector] = in_fread(sFile, ChannelMat, iEpoch, smpBlock, iChannels, ImportOptions);

Error in panel_record (line 30)
eval(macro_method);

Error in bst_memory>LoadRecordingsRaw (line 909)
F = panel_record('ReadRawBlock', GlobalData.DataSet(iDS).Measures.sFile, ChannelMat, iEpoch,
TimeRange, 1, RawViewerOptions.UseCtfComp, RawViewerOptions.RemoveBaseline, UseSsp);

Error in bst_memory>LoadRecordingsMatrix (line 843)
DataMat.F = LoadRecordingsRaw(iDS);

Error in bst_memory (line 72)
eval(macro_method);

Error in panel_record>ReloadRecordings (line 1152)
bst_memory('LoadRecordingsMatrix', iDS);

Error in panel_record>ValidateTimeWindow (line 633)
ReloadRecordings();

Error in panel_record>SetStartTime (line 574)
ValidateTimeWindow();

Error in panel_record>JumpToEvent (line 1549)
SetStartTime(startTime, evtEpoch);

Error in panel_record (line 30)
eval(macro_method);

Error in figure_timeseries>FigureMouseDownCallback (line 461)
panel_record('JumpToEvent', iEvt, iOccur);

Error while evaluating Figure WindowButtonDownFcn.

Thank you!
Dimitrios

Fixed: https://github.com/brainstorm-tools/brainstorm3/commit/44f6789133d5b82e99ee1a83acbd2c43de46692b

It was a very stupid bug... a bad test "> 2" instead of ">= 2".
So it never worked with only two files...

Yeah, it works now! I am really grateful you are so responsive :slight_smile:

Cheers,
Dimitrios