Abnormally slow behavior

Dear Brainstormers,

I am working on a MEG protocol with 17 subjects and I’m noticing incredibly slow behavior. Some details:
  • Version of Brainstorm: 27th of April 2016
  • Time to load the protocol (the only one I have): between 10 and 15 minutes
  • Time to save the protocol after a simple step (cond A - cond B at group level, on the already averaged files): between 20 and 30 minutes
  • More or less the same performance on 3 different PCs (all three with ~7GB of physical memory and 4 CPUs)
  • No Matlab-related issues and no slowdowns with any other software
Would you consider this behavior abnormal? What should I check?
What would you suggest to speed things up (if possible)?

I’ve tried copying only the group-level data (and the protocol file) to a different machine and running everything from there, but it does not seem to help (probably this is not the correct way of lightening the protocol…).

Thanks for any suggestion,

vale

Hi Valentina,

No, this is definitely not normal, but it’s very difficult to guess what causes this without having access to the data.
Many database operations are not very fast and could be optimized, but I would need to know which ones precisely.

The bottleneck could be at three levels:

  1. The display of the database tree
    => How long does it take if you just press F5 to refresh the database?
  2. The saving of the database structure (protocol_folder/data/protocol.mat)
    => How big is this file?
    => How long does it take to run the command “db_save(1)” in the Matlab command window? (see the snippet after this list)
  3. The reloading of some parts of the database
    => How long does it take if you right-click on a folder > File > Reload?
    => Do you have a lot of time-frequency or connectivity files?
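
For point 2, a quick check from the Matlab command window; the database path below is only an example, adjust it to your own setup:

    % Check the size of the database structure file (path is an example)
    protFile = fullfile('brainstorm_db', 'MyProtocol', 'data', 'protocol.mat');
    d = dir(protFile);
    fprintf('protocol.mat: %.1f MB\n', d.bytes / 1e6);
    % Time a forced save of the database (db_save is in brainstorm3/toolbox/db)
    tic; db_save(1); fprintf('db_save(1): %.1f s\n', toc);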

One way of exploring exactly what is taking time is to use the Matlab profiler.

  • Type “profile on”
  • Run the operation that takes a lot of time
  • Type “profile viewer”: it shows the operations that take the most time (column “self time”) and the number of times they are executed. You can navigate between functions and see the corresponding code.
    If you can find which lines of code take that much time, I might be able to work on it (the whole procedure is summarized in the snippet below).
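
Put together, it looks like this (this is the standard Matlab profiler, nothing Brainstorm-specific):

    profile on                        % start recording
    % ... run the slow operation in Brainstorm (e.g. reload a folder) ...
    profile viewer                    % open the report, sort by the "self time" column
    p = profile('info');              % optionally save the report to send it to me
    save('brainstorm_profile.mat', 'p');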

How large is your entire protocol?
Is it something that you could send me, or is it in the range of hundreds of Gb?
Otherwise, could I access your computer remotely?

Cheers,
Francois

Dear Francois,

thanks for the fast reply!

Refreshing the database (F5) takes 2 seconds.

The protocol.mat is 6.23 GB. That's huge, isn't it?
Run "db_save(1)" takes 1 second if nothing has been done, more than 20 min if called after even just opening a file.

I think we found the problem. "Right-click on a folder > File > Reload" takes… forever. To describe the weird behavior a bit more: I right-clicked on a folder and the program froze for about 15 minutes before even showing me the menu. Then, the reload itself doesn't take long, but the saving of the database takes more than 20 minutes.
I do not have any time-frequency/connectivity files. However, I had to split the data into multiple different partitions (splitting epochs according to different sub-conditions), so lots of data are duplicated. I guess I should have been smarter in the way I handled my epochs…

I ran the above steps with "profile on", and the profiler confirms that it is definitely the saving part that takes so long.

Would you say that the problem is the size of the individual subject folders, due to the duplication of the epochs?

Thanks!

vale

The file protocol.mat should never be larger than a few Mb (and this is already for a 3Tb database).
No wonder that everything is taking forever: this file is constantly loaded and saved.
It’s supposed to contain only the list of files and some additional information (relations between files, types of data), but no data matrices.
Something went wrong at this level.

Have you been creating or manipulating files by yourself?

I suggest you do the following:

  1. Make a backup of your database (you should already have all your data backed-up somewhere, it’s not safe to work without a backup)
  2. Unload the protocol: menu File > Delete protocol > Only detach from database
  3. Move the file protocol/data/protocol.mat somewhere else outside of the database (or simply delete it, you won’t need it again)
  4. Load again the protocol: menu File > Load protocol > Load from folder.
  5. Close Brainstorm when it’s done reloading (to make sure it saves all the information to protocol.mat)

Check the size of the file protocol.mat after. If it’s again > 2Mb, there is something wrong in my loading procedure or in one of your files.
It could be that one of your files contains an error, something like a 3Gb matrix where there is supposed to be a simple string of characters.

You can try to open it in Matlab directly and see why it is so big.
Or send it to me and I’ll have a look at it (but 6Gb will be painful to transfer).
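
If you open it in Matlab, something like this shows which variable takes all the space (whos with the -file option is standard Matlab; adjust the path to your own database):

    info = whos('-file', fullfile('brainstorm_db', 'MyProtocol', 'data', 'protocol.mat'));
    for i = 1:numel(info)
        fprintf('%-20s %10.1f MB\n', info(i).name, info(i).bytes / 1e6);
    end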

Francois

Thanks Francois!!

[QUOTE=Francois;10370]
Have you been creating or manipulating files by yourself?
I suggest you do the following: […]
[/QUOTE]

I have no idea why the file got so big, but I tried on my colleague's PC - one of those where the data are backed up :) - and it worked perfectly!
I’ll do the same on mine and get back here in case of issues, but hopefully I won’t need to.

Thanks a lot!!

vale

Dear Francois,

sorry to bug you again with this.

I've checked on another computer and apparently the detach/reload trick only works if I do not copy all the individual subject folders (as was done on my colleague's computer).


[see how, on the same computer, if I keep all the subjects' data, the protocol.mat that gets created is huge (on the left)]

So, I did what you suggested: examine the file in Matlab.

ProtocolStudies seems to be the problem.

Those 980 structures contain more than they should, am I right?

The first structures are smallish, but I confess I'm not sure what they are supposed to look like in the first place.

This seems wrong: Results contains 1394 structures, and I feel that's not OK…

Sorry for the bad image, but it looks like it is just a repetition of the same thing over and over (all ProtocolStudies(1,727).Results(1,n) are the same).

I apologize for the mess I (somehow) created inside my subjects' folders.
I'd appreciate any advice on how to clean this up!

Thanks a lot again,

vale

You have 980 sub-folders in the “data” folder? (1 Study = 1 folder)
This means 50 conditions for each of your 19 subjects?

And 1394 source files in each folder? (1 Result = 1 source file or source “link” attached to an epoch)

980 * 1394 * 1Kb = 1.3Gb “only”.
So you probably have some folders with a lot more files…

How many files total do you have in this protocol?
Maybe you just have too much data in there, and you’re reaching the limits of this poorly designed database…

Could you try to send me this protocol.mat file?

Hi Valentina,

I looked at your protocol.mat, and there is no major error. The file is gigantic because you have a gigantic number of files referenced in the database.
And because it is slightly over 2Gb, it uses the "v7.3" .mat format, which is terribly inefficient for storing numerous small variables (and creates a "compressed" file of 6Gb instead of an uncompressed file of 2.3Gb). It probably became a lot slower recently, when it passed the 2Gb threshold.
I need to work on improving the scalability of this element of the database.
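
This 2Gb limit is a general Matlab thing, not something specific to Brainstorm; as a small illustration (made-up variable, nothing from your database):

    % The default -v7 format cannot store a variable larger than 2 GB, so bigger
    % variables must be written with the HDF5-based -v7.3 format, which is much
    % less efficient for huge arrays of small structs.
    S = repmat(struct('FileName', blanks(500), 'Comment', blanks(100)), 1, 100000);
    save('demo_v7.mat',  'S', '-v7');    % fine while the variable stays under 2 GB
    save('demo_v73.mat', 'S', '-v7.3');  % required above 2 GB, but slow and bulky for many small fields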

However, I don't think you are using the database the way you are supposed to.
First anomaly: You sometimes have more than 10 imaging kernels per subject, instead of one in the typical case.
This creates 10 "source link" entries in EACH epoch, hence the folders with 2000 "Results", which makes the protocol.mat go completely out of control.
I don't think these files make much sense in your database. Example:

S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_02.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_03.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_04.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_05.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_06.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_07.mat
S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_KERNEL_150605_1303_abs_zscore_08.mat

Second anomaly: Some files in the @default_study folders are linked to averages in other folders. Have you been moving files around manually?

         FileName: 'S01/@default_study/results_wMNE_MEG_GRAD_MEG_MAG_150605_1303_abs_zscore_02_ssmooth.mat'
          Comment: 'MN: MEG ALL(Constr) | abs | zscored | ssmooth10a'
         DataFile: 'S01/click2_short/data_32768_average_150610_1658_resample.mat'
           isLink: 0
    HeadModelType: 'surface'

Third problem, which is more in the range of data analysis advice: we no longer recommend rectifying the source maps before normalizing them.
See the updated tutorials:
https://neuroimage.usc.edu/brainstorm/Tutorials/SourceEstimation#Standardization_of_source_maps
http://neuroimage.usc.edu/brainstorm/Tutorials/Difference#Source_normalization
http://neuroimage.usc.edu/brainstorm/Tutorials/Workflows#Constrained_cortical_sources

What I recommend:

  1. File > Delete protocol > Only detach from database
  2. Delete protocol/data/protocol.mat
  3. Manually delete ALL the "results_*.mat" files from all the S*/@default_study folders (see the sketch after this list).
  4. Load again the protocol: menu File > Load protocol > Load from folder.
  5. Read the new tutorials #20-27
  6. Re-do your source analysis in a clean way
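
If deleting them all by hand is too tedious, step 3 could be scripted along these lines (the database path is an assumption, adjust it; run this only on a backed-up copy):

    dbDir = fullfile('brainstorm_db', 'YourProtocol', 'data');   % assumed protocol location
    subjDirs = dir(fullfile(dbDir, 'S*'));                       % one folder per subject
    for i = 1:numel(subjDirs)
        if ~subjDirs(i).isdir, continue; end
        defDir = fullfile(dbDir, subjDirs(i).name, '@default_study');
        kernels = dir(fullfile(defDir, 'results_*.mat'));        % shared kernels and their z-scores
        for k = 1:numel(kernels)
            delete(fullfile(defDir, kernels(k).name));
        end
    end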

Have fun :)
Francois

Thanks Francois!

I clearly messed things up when I moved to source space… not sure exactly how, but yes, I did create/move files via (now clearly) inefficient scripting.

I’ll definitely do all the cleaning you suggested!

Thanks again,

vale

Hi Francois,

I have a similar problem with slow saves and have tried the recommended steps above. My protocol.mat is too big (8.4GB!), but I don’t have any of the head model problems described in the previous post; I just have a lot of files.

I have defined 20 epoch types, and each subject has 4 runs. In order to be able to make a source model for each of the 4 runs separately, which I believe is the best way to do it for accurate source localization, I have further separated the epochs within each session by the four runs. That means each session has 80 conditions. So,

90 sessions x 80 conditions (different epoch types) for each, and each has 450-900 epochs in it = big protocol.mat.

I am currently trying to run artefact rejection and make an average on the non-rejected epochs. Each of these steps is quite short, but then it spends a very long time saving the database. Thereafter I can delete or detach the individual epochs and just work with the averages, but in the meantime every save is impossibly long.

Is there some way I can do these steps without saving the database until the end? Or do you have another suggestion?

Best regards,

Emily

P.S. Happy New Year!

Hi Emily,

This means that you have more than 7 million files in your database? And if you have sources computed for each subject, it doubles the number of referenced files. Indeed, this is complicated to handle…

I don’t have any optimization to suggest at this stage. What you could have done while importing the data was to import the epochs, average and delete the single epochs for each subject, instead of importing everything first. I guess you have already realized that :)

As you suggested, one hack could be to temporarily prevent the database from saving this 8Gb file at the end of each process. Edit the file brainstorm3/toolbox/db/db_save.m and add a line “return;” at line 23. Run everything you need to, then restore the original db_save.m and run it manually (type “db_save” in the Matlab command window).
Once you start getting rid of the single epochs, the size of the protocol.mat file should decrease quickly.
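
In practice, the temporary edit and the final manual save would look like this (the exact line number may shift with the Brainstorm version, so check the file before editing):

    % In brainstorm3/toolbox/db/db_save.m, near the top of the function
    % (around line 23 in the current version), add a temporary early exit:
    return;    % TEMPORARY: skip saving protocol.mat after each process

    % When all the processing is done, restore the original db_save.m and
    % save the database once manually from the Matlab command window:
    db_save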

Another thing that might help decrease the size of this file is to delete the inverse kernels. Compute all your sensor-level averages (one per subject / acquisition run / condition), then compute the sources again. You don’t need the single epochs for estimating the sources for the run-level averages.
Example: http://neuroimage.usc.edu/brainstorm/Tutorials/VisualGroup

Happy new year!
Francois

I’m afraid so. And yes, I realized the alternative too late! :’(

Thanks for the quick response, I’ll try the no-save version.

Emily

Footnote: turning off the save until the end is very helpful. I’ve just been saving after critical steps, and this reduces analysis time quite considerably.