
I'm trying to speed up the import of a large number of files by using ParallelTable to store them in an indexed variable, eqDump. The files are in the folder "Frames" and are named "Conf_Run.1000", "Conf_Run.2000", ... Here is what I've tried:

    Clear[eqDump];
    SetSharedFunction[eqDump];
    ParallelTable[
      eqDump[t] = Import["Frames/Conf_Run." <> ToString[t],
        "Table",
        HeaderLines -> 9],
      {t, 5000, 1000000, 5000}];

But the execution doesn't even seem to start: the kernels remain idle. I don't know what's happening, since I think it should work the same way as here, for example. I've also tried SetShared[t] via SetSharedVariable[t], since I supposed each kernel should know the current value of t, but that doesn't seem to help.
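For reference, the plain serial version below runs without errors (just slowly), which suggests the path and the import options themselves are fine:

    (* Serial version: the same loop with Table, run entirely on the main kernel. *)
    Table[
      eqDump[t] = Import["Frames/Conf_Run." <> ToString[t],
        "Table",
        HeaderLines -> 9],
      {t, 5000, 1000000, 5000}];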

I'm using Mathematica 10.0.0 on a Linux system with a 3-core (6-thread) CPU, two 4 GB memory modules that should be working in dual-channel mode, and only one HDD.

Thank you very much!

kl0z
  • Are you working on a computer system with parallel I/O capability? And I mean parallel all the way from RAM to the metal, so multiple drives, multiple channels, ... – High Performance Mark May 29 '20 at 10:43
  • 1
    You may have to provide the full path name.Please also indicate with operating system and which Mathematica version you use. BTW: My experience with SetSharedFunction is that there usually is a better (more efficient) way of doing what I want to do. In your case I would not use SetSharedFunction at all, but try something like res=ParallelTable[ Inactive[Set][ eqDump[t] , Import[... And then on the result, in the main session, use Activate[res] – Rolf Mertig May 29 '20 at 11:22
  • Thanks to you both for the fast comments! @HighPerformanceMark Actually, I hadn't thought about that... maybe that's the thing. – kl0z May 29 '20 at 11:59
  • @RolfMertig Updated the question with system specifications; will try what you suggest. I think there is no problem with the path, since I previously issued a SetDirectory[] and got no errors when doing the same loop with Table[]. Will try the full path anyway, just to be sure. – kl0z May 29 '20 at 12:00
  • 4
    "Only one HDD." So there is no hope to parallelize this. The HDD is by far the slowest resource and you have only one of them. – Henrik Schumacher May 29 '20 at 12:18
  • 1
    If you have only the one HDD it probably helps if you have only one process reading data at a time, and doing so in as large chunks as your RAM and your computation can cope with. Same for writing. The situation might be different with SSDs, but on consumer-grade hardware they seem often to have the same controllers as used for HDDs which makes them no more useful for parallel I/O. – High Performance Mark May 29 '20 at 12:34
  • 2
    The other situation where you might benefit from 'parallelisation' on a computer with only one HDD is where computation time for each file read is significant. You might then see some benefit if Process 1 reads its file, then goes off for a while to think about it, leaving Process 2 to play with the disk, ..., and so on. But this requires careful orchestration. – High Performance Mark May 29 '20 at 12:37
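A minimal sketch of the Inactive[Set] idea from Rolf Mertig's comment, assuming the same file layout as in the question: the subkernels do the imports but return inert assignments, which are then executed in the main session by Activate, so no shared function is needed. (Inactive and Activate were introduced in Version 10.0, so they are available in the version stated above.)

    (* Each subkernel imports its file and returns an inert assignment;
       Activate then performs all the Set operations in the main session. *)
    res = ParallelTable[
      Inactive[Set][eqDump[t],
        Import["Frames/Conf_Run." <> ToString[t],
          "Table",
          HeaderLines -> 9]],
      {t, 5000, 1000000, 5000}];
    Activate[res];

Note, per the hardware comments above, this parallelizes the Wolfram-side work but cannot make a single HDD serve several readers any faster.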

0 Answers