Warning for too long "Search database" request

Hi Martin,

When using the 'Search database' request, it is too easy to accidentally launch an overly broad query and then have to wait several minutes for it to finish. Would it be possible to add a warning when the output will return too many items, and let the user revise the query before the entire database tree is repopulated? Perhaps when over 500 items?

For example, I was hoping to get a few Avg trials, but somehow I messed up my AND/OR operators and now I am waiting for the entire database to expand...

Thank you :slight_smile:
Dimitrios

Good suggestion, Dimitrios! I'll have a look to see how much work this would take to implement. The best would probably be to give users a way to stop execution if they feel it's taking too long. For now, you can press Ctrl+C in the Matlab command window to stop execution and then close your search tab (not a pretty way, but it works).

Thanks!
Martin

Hi Dimitrios,

I found some ways to optimize the execution time of the "Search database" results display, but if it's already taking minutes it definitely won't be enough. Do you mind posting a profiler result of a search to help me dig further? (Note: please update Brainstorm first)

I looked into adding a Cancel button to the progress bar but I see this was an avenue @Francois had already tried and disabled so there must have been issues with it. If most of the time is spent in the expanding step then I could disable it for queries that return more than a certain number of items. Other avenues such as performing the search completely before populating the database tree would require major refactoring.

Thanks,
Martin

Hi Martin,

Sorry for the basic question, but how do I run it as a script? What should the input sFiles be? The script below returns an empty result. Do I need to run a 'Select files' process first?

sFiles = [];

% Process: Select files using search query
sFiles = bst_process('CallProcess', 'process_select_search', sFiles, [], ...
    'search', '(([type EQUALS {"Results", "Link"}] AND [parent CONTAINS "faces"] AND [parent CONTAINS "Avg"]))');

Thank you,
Dimitrios

Hi Dimitrios,

Yes, as mentioned in this thread: Select recordings by trial group, for the time being you need to run a Select Files process first to populate the sFiles variable before using the Search query process. This is on my To-Do list...
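For reference, here is a minimal sketch of that workaround. The option names for the file-selection process follow what Brainstorm's "Generate .m script" menu typically produces; treat them as an assumption and adjust to your own protocol:

```matlab
% Step 1 (workaround): run a Select Files process first to populate sFiles.
% Empty 'subjectname'/'condition' strings are assumed to mean "all"; adjust as needed.
sFiles = bst_process('CallProcess', 'process_select_files_results', [], [], ...
    'subjectname', '', ...
    'condition',   '', ...
    'tag',         '', ...
    'includebad',  0);

% Step 2: filter that selection with the search query process.
sFiles = bst_process('CallProcess', 'process_select_search', sFiles, [], ...
    'search', '(([type EQUALS {"Results", "Link"}] AND [parent CONTAINS "faces"] AND [parent CONTAINS "Avg"]))');
```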

However, this should not be related to the issue you reported in this thread, unless I am misunderstanding something. When running the process, it does not update the database tree, so there should not be any slowdowns. It should actually be very fast this way. The slowdown must be coming from the Search dialog in the GUI, which is what I was asking you to profile.

Martin

Hi Martin,

I see. So I used the profiler to time the GUI search in a database of 20 subjects, each having 5 conditions/folders (faces, objects, bodies, scenes, scrambled) with about 300 trials each.

(([type EQUALS {"Results", "Link"}] AND [path CONTAINS "faces"]) OR [path CONTAINS "objects"])

This ran for about 11 minutes before completion. Here is the profiler result, with org.brainstorm.tree.BstTree taking the majority of the time:

I also ran another GUI search using Avg to limit the number of output files, which took about 2.5 minutes to finish:

(([type EQUALS {"Results", "Link"}] AND [parent CONTAINS "Avg"]))

The profiler result:

Anything you can do to make these faster, or at least warn the user about the lengthy run time, would be appreciated :slight_smile:

Cheers,
Dimitrios

I checked for generic options to interrupt the current Matlab thread (the equivalent of a CTRL+C) from another thread at least 4 years ago.
Maybe there are new solutions available now.

And otherwise, it is still possible to write custom wait bars for some processes with many iterations, which would check explicitly at each iteration if the cancel button was clicked.
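As a rough illustration of that pattern in plain MATLAB (using the stock waitbar with a CreateCancelBtn callback, not Brainstorm's own progress bar, which may work differently):

```matlab
% Sketch of a cancellable progress bar: the loop polls a flag that the
% Cancel button sets, once per iteration (here, per database node created).
nNodes = 1000;  % hypothetical number of nodes to create
hBar = waitbar(0, 'Populating tree...', ...
    'CreateCancelBtn', 'setappdata(gcbf, ''canceling'', 1)');
setappdata(hBar, 'canceling', 0);
for i = 1:nNodes
    if getappdata(hBar, 'canceling')
        break;  % user clicked Cancel: stop creating nodes
    end
    % ... create database node #i here ...
    waitbar(i / nNodes, hBar);
end
delete(hBar);
```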

Thanks, Dimitrios. I do not think you updated Brainstorm before running the profiler, so you did not get the optimization I added this morning; hopefully it's a bit better now. As expected, the ExpandAll function is the real bottleneck. I will count the number of files returned and only expand the whole database tree if there are fewer than 500 files, as you suggested.
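The threshold logic could be as simple as the following sketch (the variable and function names here are hypothetical placeholders, not the actual Brainstorm internals):

```matlab
% Hypothetical sketch: skip automatic tree expansion for large result sets.
MAX_EXPAND = 500;  % threshold suggested by Dimitrios
if numel(sFoundFiles) <= MAX_EXPAND
    % Small result set: expand the whole tree as before
    ExpandAll();   % placeholder for the actual tree-expansion call
else
    disp(['Search returned ' num2str(numel(sFoundFiles)) ...
          ' files: skipping automatic tree expansion.']);
end
```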

I like your idea, François, of writing a custom progress bar that is checked at every iteration (in our case, at every database node created). We can go down that route if the search is still deemed too slow after these improvements.

Thanks to both!
Martin

Done in Database search: do not expand result if >500 files returned · brainstorm-tools/brainstorm3@c171508 · GitHub

Done in Database search: process can now be run without inputs · brainstorm-tools/brainstorm3@53cc6b2 · GitHub, thanks for the reminder. I tried to optimize it when you specify a file type in your search query (highly recommended on big datasets).

Do you mind helping me test this @pantazis?
@Francois Can you validate how I check for bad trials? I tried to mirror what was done in the other Select Files process, but that one did not handle statistic files.

Thanks!
Martin