Hello,
TL;DR: I need to run MEM on a lot of files in a short amount of time. To do that, I am trying to run it on our local cluster, but I am having a hard time getting it to work.
Long story:
I am trying to launch some computations on our cluster, and I need access to some Brainstorm functions (but not the database).
I exported the data from the database into a .mat file and created a script ('run_MEM') that calls the appropriate MEM functions.
I am able to run my script using:
matlab -nodisplay -nosplash -nodesktop -r "run_MEM('08-Apr-2024/in/data_sim_240404_1672.mat','08-Apr-2024/in/wMEM_options.json')"
where data_sim_240404_1672.mat contains the data coming from Brainstorm, and wMEM_options.json contains the options of the MEM process.
However, to use our HPC, I need to use something like:
echo matlab -nojvm -nodisplay -nosplash -nodesktop -r \"run ./m.m\" | qsub -j y -o logs/mat.txt -V -cwd -q matlab.q -N matlabJob
(https://perform-wiki.concordia.ca/mediawiki/index.php/SGE_/_Batch-queuing_system)
I tried the following:
echo matlab -nodisplay -nosplash -nodesktop -r "/NAS/home/edelaire//Documents/Project/wMEM-fnirs/run_MEM\('08-Apr-2024/in/data_sim_240404_1672.mat','08-Apr-2024/in/wMEM_options.json'\)" | qsub -j y -o logs/mat.txt -V -cwd -q matlab.q -N matlabJob -cwd -S /bin/bash
Log:
/opt/sge/default/spool/perf-hpc07/job_scripts/180873: line 1: 3481723 Aborted (core dumped) /util/packages/matlab/R2021b/bin/matlab -nodisplay -nosplash -nodesktop -r /NAS/home/edelaire//Documents/Project/wMEM-fnirs/run_MEM\('08-Apr-2024/in/data_sim_240404_1672.mat','08-Apr-2024/in/wMEM_options.json'\)
Opening log file: /NAS/home/edelaire/java.log.30554
pure virtual method called
terminate called without an active exception
--------------------------------------------------------------------------------
abort() detected at Tue Apr 09 15:00:32 2024 -0400
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
GNU C Library : 2.31 stable
MATLAB Architecture : glnxa64
MATLAB Root : /util/packages/matlab/R2019b
MATLAB Version : 9.7.0.1190202 (R2019b)
Operating System : Ubuntu 20.04.6 LTS
Process ID : 3045210
Processor ID : x86 Family 6 Model 63 Stepping 2, GenuineIntel
Session Key : 85124635-f246-4fe8-96a0-4093e7b0a629
Static TLS mitigation : Disabled: Unnecessary
Window System : No active display
Fault Count: 1
Abnormal termination:
abort()
Register State (from fault):
RAX = 0000000000000000 RBX = 00001551ebfff700
RCX = 000015521000c00b RDX = 0000000000000000
RSP = 00001551ebff9620 RBP = 00001551ebffa1a0
RSI = 00001551ebff9620 RDI = 0000000000000002
R8 = 0000000000000000 R9 = 00001551ebff9620
R10 = 0000000000000008 R11 = 0000000000000246
R12 = 00001551ebff98f0 R13 = 00001551ebffa830
R14 = 00001551ebffa9e0 R15 = 00001551eda34800
RIP = 000015521000c00b EFL = 0000000000000246
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x000015521000c00b /lib/x86_64-linux-gnu/libc.so.6+00274443 gsignal+00000203
[ 1] 0x000015520ffeb859 /lib/x86_64-linux-gnu/libc.so.6+00141401 abort+00000299
[ 2] 0x00001551f2e2c965 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11549029
[ 3] 0x00001551f2e2ab86 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11541382
[ 4] 0x00001551f2e2abd1 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11541457
[ 5] 0x00001551f2e20f1f /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11501343
[ 6] 0x00001551f2c48a40 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+09566784
[ 7] 0x00001551f2dfa728 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11343656
[ 8] 0x00001551f2dfc28f /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+11350671
[ 9] 0x00001551f2c41995 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+09537941 JVM_handle_linux_signal+00000421
[ 10] 0x00001551f2c34858 /util/packages/matlab/R2019b/sys/java/jre/glnxa64/jre/lib/amd64/server/libjvm.so+09484376
[ 11] 0x0000155210767420 /lib/x86_64-linux-gnu/libpthread.so.0+00082976
[ 12] 0x00001551eccd3250 <unknown-module>+00000000
[ 13] 0x00001552104dd5ee /util/packages/matlab/R2019b/bin/glnxa64/../../sys/os/glnxa64/libstdc++.so.6+01095150 _ZNSo5flushEv+00000030
[ 14] 0x0000000000000000 <unknown-module>+00000000
** This crash report has been saved to disk as /NAS/home/edelaire/matlab_crash_dump.3045210-1 **
MATLAB is exiting because of fatal error
/opt/sge/default/spool/perf-hpc02/job_scripts/180874: line 1: 3045210 Killed matlab -nodisplay -nosplash -nodesktop -r /NAS/home/edelaire//Documents/Project/wMEM-fnirs/run_MEM\('08-Apr-2024/in/data_sim_240404_1672.mat','08-Apr-2024/in/wMEM_options.json'\)
Any suggestions on how I should proceed?
Solution:
I created a bash script (start_hpc.sh) that contains the following code:
#!/bin/bash
module load matlab/R2019b
cd ~/Documents/Project/wMEM-fnirs
matlab -nodisplay -nosplash -nodesktop -r "run_MEM('08-Apr-2024/in/data_sim_240404_1672.mat','08-Apr-2024/in/wMEM_options.json')"
and I submit it like this:
qsub -j y -o log.txt -pe smp 12 -S /bin/bash -cwd -q matlab.q -N FS ./start_hpc.sh
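As far as I can tell, one thing the script file avoids is a quoting problem in my first attempt: the line written by echo is parsed a second time by bash when the job runs, and that second pass removes the single quotes around the file names. This does not necessarily explain the crash in the log; it is only a small demonstration with placeholder file names (a.mat, b.json), using set -- in place of matlab to show the arguments the job would receive.

```bash
#!/bin/bash
# What echo writes into the job script (same quoting as my first attempt,
# with placeholder file names a.mat and b.json):
job_line=$(echo matlab -nodisplay -r "run_MEM\('a.mat','b.json'\)")
printf 'job script line   : %s\n' "$job_line"

# What bash hands to matlab when it later runs that line:
# eval re-parses the line the way the job shell would.
eval "set -- $job_line"
r_arg="$4"   # the value that follows -r
printf 'MATLAB -r argument: %s\n' "$r_arg"
```

The -r value arrives as run_MEM(a.mat,b.json), without the single quotes, so MATLAB no longer receives two character strings. Putting the command in a script file means it is parsed only once.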
Next, I plan to make the data path and the options file parameters of that script, so that I can easily launch qsub on all the data I need to localize.
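A minimal sketch of that parameterisation, assuming start_hpc.sh is changed to accept the data file and the options file as its two arguments and to forward them to run_MEM (that modified script is not shown here). The function name submit_all and the QSUB variable are only for illustration; setting QSUB=echo prints the commands instead of submitting them.

```bash
#!/bin/bash
# Sketch: submit one SGE job per exported .mat file.
# Assumption: start_hpc.sh has been modified to take <data.mat> <options.json>
# as "$1" and "$2" and to pass them to run_MEM.

QSUB="${QSUB:-qsub}"   # set QSUB=echo for a dry run that only prints the commands

submit_all() {
    local in_dir="$1" options="$2" data
    for data in "$in_dir"/*.mat; do
        [ -e "$data" ] || continue   # the glob matched nothing: skip
        # Same qsub flags as in my command above, with one log file per input
        "$QSUB" -j y -o "log_$(basename "$data" .mat).txt" -pe smp 12 \
            -S /bin/bash -cwd -q matlab.q -N FS \
            ./start_hpc.sh "$data" "$options"
    done
}

# Example call:
#   submit_all 08-Apr-2024/in 08-Apr-2024/in/wMEM_options.json
```

Writing one log file per input keeps the output of the different jobs from being mixed into a single log.txt.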
NOTE: in run_MEM, it seems important to specify the number of workers when opening the parpool; otherwise it hangs forever. In my case I use parpool(12), the same number as in -pe smp 12.
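To avoid hard-coding 12 in two places, the pool size could follow the number of slots granted by -pe smp, which SGE exports as NSLOTS inside the job. The sketch below only builds the MATLAB command string. It assumes a hypothetical variant of run_MEM that accepts the pool size as a third argument (mine currently does not), and the function name build_matlab_cmd is made up for illustration.

```bash
#!/bin/bash
# Sketch: derive the parpool size from the slots SGE granted to the job.
# NSLOTS is set by SGE for jobs submitted with -pe; fall back to 1 otherwise.
# Assumption: run_MEM takes the pool size as a third argument (hypothetical).

build_matlab_cmd() {
    local data="$1" options="$2"
    local n_workers="${NSLOTS:-1}"
    printf "run_MEM('%s','%s',%d)" "$data" "$options" "$n_workers"
}

# Inside start_hpc.sh this could be used as:
#   matlab -nodisplay -nosplash -nodesktop -r "$(build_matlab_cmd "$1" "$2")"
```

With this, changing the number after -pe smp in the qsub command would be enough to change the pool size as well.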
More info here: https://forum.bic.mni.mcgill.ca/t/how-to-use-sge-with-matlab/1140
Edit: edited for clarity
Edouard