Filtering of long recording

Hello Francois,

I am contacting you because I currently have a long EEG recording (~7 hours) at 1 kHz. The first step of my analysis is to filter between 0.5 and 100 Hz and resample at 250 Hz. However, this first step is very long (~1 hour) and the process seems to use very little of my computer's capacity. So I was wondering if there is any way I could optimize this filtering step in Brainstorm?

Edit: actually it's getting stuck at the writing step.
Edit 2: If I try to filter the data as a continuous recording, it doesn't work and gets stuck during the process. However, if I import the data into the database and then filter, it seems to work fine.

Edit 3: Do you think it would be possible to have a process that does band-pass filtering, notch filtering and then resampling without saving any intermediate data?

Regards,
Edouard

With the various edits, it is difficult to understand what is very slow vs. what is crashing.
Can you please summarize what is not working?

I can confirm some of your observations: the main bottleneck is the repeated I/O operations on the hard drive (the CPU and memory should not be the limiting factors).
To limit the number of times the files are read, you can try significantly increasing the "Memory block size" option in the Brainstorm preferences, so that bst_process.m processes more signals at a time.
Run your tests with a resource monitor open, to make sure RAM usage never goes above 90%.

Optimizing the I/O involved in the processing: there are probably options (using some compression, using some temporary memory mapping...), but that would require significant development; we'd need someone to spend a few weeks or months on this topic. We won't be able to work on this in the near future.
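For illustration only, here is roughly what a memory-mapping approach could look like in plain Matlab (memmapfile is a standard Matlab function; the flat file layout below is hypothetical, not the actual .bst format):

% Illustrative sketch only: map a hypothetical flat binary file containing
% [nChannels x nSamples] single-precision values, so that one channel can be
% read without explicit fseek/fread loops (the OS handles the paging).
nChannels = 64;                              % hypothetical channel count
nSamples  = 7 * 60 * 60 * 1000;              % 7 hours at 1000 Hz
m = memmapfile('recording_flat.dat', ...     % hypothetical file name
    'Format', {'single', [nChannels, nSamples], 'data'}, ...
    'Writable', false);
oneSignal = m.Data.data(12, :);              % read channel 12 through the memory map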

If the processing speed goes down too much or the memory usage becomes unmanageable because of the file size, you have the option to split the original file into smaller blocks with the process Import > Import recordings > Split raw file.

For what is not working: if you have identified blocking bugs, we need to fix them.
7 hours at 1000 Hz means 7 * 60 * 60 * 1000 = 25,200,000 time points. There should not be any issue with this number of samples. Ephys recordings at 30 kHz are commonly denser than this.

To help us debug this:

  • Open a resource monitor to make sure there is nothing that saturates the memory
  • Try identifying at which duration it starts breaking (split the original file at different durations)
  • Please share the smallest file for which it stops working completely: zip the .bst file, upload it somewhere, post the download link here.

If you want to explore this topic further, use the Matlab Profiler (execute profile on, run the process, then profile viewer). This might reveal abnormal behaviors (e.g. some nested for loops executed millions of times).
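For reference, the sequence from the Matlab command window would be (standard Matlab Profiler commands; run whatever Brainstorm process you want to inspect in between):

profile on
% ... run the filtering process here (e.g. start it from the Pipeline editor) ...
profile off
profile viewer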

Do you think it would be possible to have a process that does band-pass filtering, notch filtering and then resampling without saving any intermediate data?

Ideally, yes, it would make sense not to save any intermediate files.
In practice, the way bst_process.m works can't handle this.

If you want to implement this:

  • Create a new process (e.g. "Multiple filters").
  • Its inputs: all the parameters of process_notch.m, process_bandpass.m and process_resample.m.
  • Its Run() function should call the three Compute() functions sequentially.

With all the parameters of the band-pass and notch filters, it might be a bit too confusing for general users. But if you code this, it could be interesting to share it with advanced users. Let us know.
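For the underlying signal processing, a conceptual sketch of "filter everything in memory, save once" could look like the following. This is not the actual Brainstorm process API, just standard Signal Processing Toolbox calls with simplistic placeholder filter designs (process_bandpass.m and process_notch.m are more careful about filter design and edge effects):

% Conceptual sketch only: chain band-pass, notch and resampling in memory,
% writing a single output at the end. F is a [nChannels x nSamples] matrix
% already loaded in memory; the filter designs are crude placeholders.
fs    = 1000;                                        % original sampling rate
fsNew = 250;                                         % target sampling rate
[bBp, aBp] = butter(4, [0.5 100] / (fs/2), 'bandpass');
[bNo, aNo] = butter(2, [58 62]   / (fs/2), 'stop');  % crude 60 Hz notch
F = filtfilt(bBp, aBp, F')';                         % zero-phase band-pass (columns = time)
F = filtfilt(bNo, aNo, F')';                         % zero-phase notch
F = resample(F', fsNew, fs)';                        % resample each channel, transpose back
% ... write F to disk once here ...

In a combined Brainstorm process, the Run() function would perform these three steps on each memory block before writing it, as described above.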

Hello,
Thanks for your reply.

With the various edits, it is difficult to understand what is very slow vs. what is crashing.

Yes, sorry about that.
What is not working / very slow (difficult to tell): applying a filter to the 6-hour continuous recording at 1 kHz. The process seems to get stuck during the writing phase. What is working: importing the file into the database and then filtering. The saving part is long, but it is only done once.

To limit the number of times the files are read, you can try significantly increasing the "Memory block size" option in the Brainstorm preferences, so that bst_process.m processes more signals at a time.

I tried to increase the size from 100 to 900 MB. The process now reports: 'BST> Band-pass filter: Processing 9 blocks of 3 signals'. It doesn't seem to be a memory issue: the task manager shows Matlab using only a few GB of the 62 GB available, but it is much slower than filtering after importing the data. It took ~10 minutes to complete 20% of the process (according to the progress bar). Edit: it took 45 minutes for the entire computation. By the way, why does applying a process to a continuous signal create a new condition rather than just a new file in the existing condition? It makes the database kind of messy.

If the processing speed goes down too much or the memory usage becomes unmanageable because of the file size, you have the option to split the original file into smaller blocks with the process Import > Import recordings > Split raw file.

I would prefer to avoid that, as we recently moved from chunking the recording into two-hour blocks to keeping one long recording, in order to limit filter edge effects.

If you want to explore this topic further, use the Matlab Profiler (execute profile on, run the process, then profile viewer). This might reveal abnormal behaviors (e.g. some nested for loops executed millions of times).

OK, I tried it on a 30-minute file. Here are the results: bst_write_epoch is already called more than 4000 times.

To simulate a longer recording, I reduced the maximum memory usage to 100 MB; now bst_write_epoch is called more than 21,000 times. For the 8-hour recording, it was called almost 200,000 times (with a 900 MB memory block).

The main bottleneck seems to be the fseek function in bst_write_epoch.

With all the parameters of the band-pass and notch filters, it might be a bit too confusing for general users. But if you code this, it could be interesting to share it with advanced users. Let us know.

OK, I'll try and post the result here.

Potential help:

Since we are accessing the data through NFS (we don't have access to a local hard drive on our lab PCs), the issue is probably even worse than it would be with a local SSD, I guess.

nfsstat -m
/NAS/home from perf-loy-nas:/HOME/home
Flags: rw,nodev,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=10.204.192.76,fsc,local_lock=none,addr=172.17.160.10

Maybe the following can help too: Improving fwrite performance - Undocumented Matlab

By the way, why does applying a process to a continuous signal create a new condition rather than just a new file in the existing condition? It makes the database kind of messy.

One continuous file = one folder.
This is a design limitation. The main problem is that, most of the time, the channel file can't be adapted for two different continuous files.

OK, I tried it on a 30-minute file. Here are the results: bst_write_epoch is already called more than 4000 times.

This is expected.
The .bst files are saved channel by channel, by page (= epoch). This seems to offer the best compromise between reading channel by channel (for reading one full signal, i.e. for filtering) and reading time by time (for the Brainstorm file viewer: only a few pages need to be read, making the interactive visualization of the file much faster).
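To illustrate why this layout implies so many seeks, here is a toy example of reading one channel from a paged file (the constants and the exact layout are made up for the example, not the real .bst format):

% Illustration only: read one channel from a file stored page by page,
% where each page holds [nChannels x pageSize] samples, channel-major.
nChannels = 64;  pageSize = 1000;  nPages = 25200;   % ~7 h at 1 kHz (hypothetical)
bytesPerSample = 4;                                  % single precision
pageBytes = nChannels * pageSize * bytesPerSample;
iChannel  = 12;
fid = fopen('recording_paged.dat', 'r');
signal = zeros(1, nPages * pageSize);
for iPage = 1:nPages
    % One fseek per page: jump to this channel's block inside the page
    offset = (iPage-1)*pageBytes + (iChannel-1)*pageSize*bytesPerSample;
    fseek(fid, offset, 'bof');
    signal((iPage-1)*pageSize + (1:pageSize)) = fread(fid, pageSize, 'single');
end
fclose(fid);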

The main bottleneck seems to be the fseek function in bst_write_epoch.

Again, this is expected... the most costly operations are the switches to a different reading/writing position on the hard drive. Reading the entire file at once can be much faster because it is a fully optimized sequential operation.
Checking the option "Process the entire file at once" might make this a bit faster.

Since we are accessing the data through NFS (we don't have access to a local hard drive on our lab PCs), the issue is probably even worse than it would be with a local SSD, I guess.

You can't expect fast I/O over NFS, indeed.
If you want this to be decently fast: copy the entire files locally, process them, and copy the output back.
With the structure of the .bst files (demultiplexed by page), it is not possible to read the file linearly, so many fseek operations are necessary.
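As a sketch of the copy-locally workaround mentioned above (the paths are placeholders based on the ones in this thread; the processing step stands for whatever Brainstorm pipeline you run on the local copy):

% Copy once over NFS, process locally, copy the results back in one go.
nfsFile   = '/NAS/home/edelaire/Documents/data/normal_sleep/recording.eeg';   % placeholder
localDir  = fullfile(tempdir, 'bst_local');
if ~exist(localDir, 'dir'), mkdir(localDir); end
localFile = fullfile(localDir, 'recording.eeg');
copyfile(nfsFile, localFile);                        % 1) single sequential read over NFS
% 2) ... run the Brainstorm processes on localFile here ...
resultFile = fullfile(localDir, 'recording_filtered.edf');                    % hypothetical output
copyfile(resultFile, '/NAS/home/edelaire/Desktop/marking/');                  % 3) single write back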

The future database system will allow working on remote databases, with temporary local copies on the hard drive, in order to avoid the super-slow interactive NFS operations.

Improving fwrite performance - Undocumented Matlab

Can you try, on your local Brainstorm install, replacing 'w' with 'W' to prevent auto-flushing? Does it make any speed difference?
=> In out_fopen_bst.m, line 47.
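The 'W' permission is standard Matlab fopen behavior (open for writing without automatic flushing). Here is a generic micro-benchmark, not the actual out_fopen_bst.m code, to check whether it matters on the NFS mount (point the test file at the NFS share to reproduce the slow case):

% Compare fwrite speed with auto-flushing ('w') and without ('W').
testFile = fullfile('/NAS/home/edelaire/Documents/data', 'bench.bin');   % put this on the NFS share
data = single(randn(64, 1000));
for perm = {'w', 'W'}
    fid = fopen(testFile, perm{1});
    tic
    for k = 1:5000
        fwrite(fid, data, 'single');
    end
    fclose(fid);
    fprintf('fopen(..., ''%s''): %.2f s\n', perm{1}, toc);
end
delete(testFile);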

Hello Francois,

I am having an issue that I think is related. I tried to write a script that loads .eeg files (BrainAmp), filters and resamples them, and then exports them as .edf.

However, the created EDF file doesn't correspond to the data in Brainstorm and contains data discontinuities:

File in Brainstorm (after resampling): [screenshot]

After export: [screenshot]

Brainstorm is telling me: WARNING: Found discontinuity between 1389.000s and 1390.000s, expect blank data in between. 
WARNING: Found discontinuity between 1390.000s and 1391.000s, expect blank data in between.
...

The script I used:

% Script generated by Brainstorm (28-Jan-2022)
data_folder   = '/NAS/home/edelaire/Documents/data/normal_sleep/';
export_folder = '/NAS/home/edelaire/Desktop/marking';

%% Start Brainstorm and select the protocol
if ~brainstorm('status')
    brainstorm
end
bst_report('Start');

protocol_name = 'For_laure';
if isempty(bst_get('Protocol', protocol_name))
    gui_brainstorm('CreateProtocol', protocol_name, 1, 0); % UseDefaultAnat=1, UseDefaultChannel=0
end

% List only the subject directories, not their contents
subjects = dir(fullfile(data_folder, '*'));

for i_subject = 7:length(subjects)

    % Create the subject if it does not exist yet
    sSubject = bst_get('Subject', subjects(i_subject).name, 1);
    if isempty(sSubject)
        [sSubject, iSubject] = db_add_subject(subjects(i_subject).name, [], 1, 0);
    end

    % List the .eeg recordings of this subject
    EEG_recordings = dir(fullfile(data_folder, subjects(i_subject).name, 'EEG/*.eeg'));

    for i_recordings = 1:length(EEG_recordings)

        % Process: Link the raw BrainAmp file to the database
        sFiles = bst_process('CallProcess', 'process_import_data_raw', {}, [], ...
            'subjectname',    subjects(i_subject).name, ...
            'datafile',       {fullfile(EEG_recordings(i_recordings).folder, EEG_recordings(i_recordings).name), 'EEG-BRAINAMP'}, ...
            'channelreplace', 1, ...
            'channelalign',   1, ...
            'evtmode',        'value');

        % Process: Band-pass filter: 0.5-100Hz
        sFiles = bst_process('CallProcess', 'process_bandpass', sFiles, [], ...
            'sensortypes', '', ...
            'highpass',    0.5, ...
            'lowpass',     100, ...
            'tranband',    0, ...
            'attenuation', 'strict', ...  % 60dB
            'ver',         '2019', ...    % 2019
            'mirror',      0, ...
            'overwrite',   0);

        % Process: Notch filter: 60Hz
        sFiles = bst_process('CallProcess', 'process_notch', sFiles, [], ...
            'sensortypes', '', ...
            'freqlist',    60, ...
            'cutoffW',     5, ...
            'useold',      0, ...
            'overwrite',   0);

        % Process: Resample: 250Hz
        sFiles = bst_process('CallProcess', 'process_resample', sFiles, [], ...
            'freq',      250, ...
            'overwrite', 0);

        % Export the filtered recording as EDF
        [filepath, name, ext] = fileparts(EEG_recordings(i_recordings).name);
        if ~exist(fullfile(export_folder, subjects(i_subject).name), 'dir')
            mkdir(fullfile(export_folder, subjects(i_subject).name))
        end
        [ExportFile, sFileOut] = export_data(sFiles.FileName, [], fullfile(export_folder, subjects(i_subject).name, [name '.edf']));
    end

end

Edit:

It seems that changing the way I import the data, from linking the continuous file to importing the recording into the database, fixes the issue:

Changing:

    sFiles = bst_process('CallProcess', 'process_import_data_raw', {}, [], ...
            'subjectname',    subjects(i_subject).name, ...
            'datafile',       {fullfile(EEG_recordings(i_recordings).folder,EEG_recordings(i_recordings).name), 'EEG-BRAINAMP'}, ...
            'channelreplace', 1, ...
            'channelalign',   1, ...
            'evtmode',        'value');

to

    sFiles = import_data(fullfile(EEG_recordings(i_recordings).folder, EEG_recordings(i_recordings).name), [], 'EEG-BRAINAMP', [], iSubject);

It seems that changing the way I import the data, from linking the continuous file to importing the recording into the database, fixes the issue:

The first version links the file to the database as a continuous file (process_import_data_raw).
The second one makes a full copy of the file into the database as an "epoch" (import_data).
It is not surprising that the behavior is different.

However, the created EDF file doesn't correspond to the data in Brainstorm and contains data discontinuities:

This could be due to the length of the EDF pages being set incorrectly.
But I could not reproduce this behavior with the following sequence: linking a BrainVision .eeg file, resampling, exporting as EDF.

Could you try to create a minimal reproducible example?

  • Find one short file with which you can reproduce the error
  • Write the minimal script that reproduces the error from this file (the notch filter is probably not necessary; see the skeleton after this list)
  • Upload the file somewhere and post the download link here
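In case it helps, a possible skeleton for this minimal script, reusing the calls from your own script above (the file names are placeholders):

% Minimal sequence: link the short test file, resample, export as EDF.
testFile   = '/NAS/home/edelaire/Documents/data/test_short.eeg';        % placeholder: the short file reproducing the bug
exportFile = '/NAS/home/edelaire/Desktop/marking/test_short.edf';       % placeholder output path
sFiles = bst_process('CallProcess', 'process_import_data_raw', {}, [], ...
    'subjectname',    'Test', ...
    'datafile',       {testFile, 'EEG-BRAINAMP'}, ...
    'channelreplace', 1, ...
    'channelalign',   1, ...
    'evtmode',        'value');
sFiles = bst_process('CallProcess', 'process_resample', sFiles, [], ...
    'freq',      250, ...
    'overwrite', 0);
export_data(sFiles.FileName, [], exportFile);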

Thanks