
I have a Do[] loop that references a few global variables but doesn't modify anything. The only output is appending data to an external file. The order of appending doesn't matter; I just can't have duplicates. How can I make this run in parallel? I've tried ParallelDo and Parallelize, but for the life of me I can't figure out the right syntax, and both give errors. Here are the relevant parts of my code:

List1 = (* list one stuff *);
List2 = (* list two stuff *);
Table1 = (* table one stuff *);
max = Length[List2];

Do[
  (* a whole bunch of stuff using data from both lists and the table *);
  data >>> "output.txt";,
  {i, max}];

The code runs perfectly fine this way, but if I change Do to ParallelDo it gives a ton of errors such as:

Part::partd: Part specification ComputationalGeometry`List2[[1]] is longer than depth of object.
Part::pkspec1: The expression ComputationalGeometry`List2 cannot be used as a part specification.
General::stop: Further output of Part::partd will be suppressed during this calculation.
Part::partw: Part 3 of ComputationalGeometry`List2[[1]] does not exist.

These errors were occurring on each running kernel.
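From the ComputationalGeometry` prefix in the messages, my guess is that the subkernels never receive my definitions and end up resolving List2 in a package context instead. Based on the documentation, here is a minimal sketch of what I understand the fix would look like (assuming DistributeDefinitions is the right call for plain global variables; I haven't verified this works):

(* push the main kernel's definitions to every subkernel before the loop *)
DistributeDefinitions[List1, List2, Table1, max];

ParallelDo[
  (* a whole bunch of stuff using data from both lists and the table *);
  data >>> "output.txt";,  (* every subkernel appends to the same file *)
  {i, max}];

Even if that fixes the context errors, I don't know whether concurrent >>> appends from several subkernels are safe with respect to duplicates or interleaving.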

I couldn't figure out the syntax for Parallelize at all. My data files are huge, so all of the parallel kernels need to share the same global variables. I can't just start another notebook instance and run them manually; I already have to buy more RAM as it is.

Is there any way to manually specify what I want run on each kernel? It would be so easy if I could just write:

calc[n_, m_] := Do[
  (* a whole bunch of stuff using data from both lists and the table *);
  data >>> "output.txt";,
  {i, n, m}];

RunParallel[calc[1, 250], calc[251, 500], calc[501, 750], calc[751, 1000]];
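The closest built-in mechanism I can find in the documentation is ParallelSubmit together with WaitAll, which appears to queue specific evaluations on the available kernels. A sketch of what I understand that would look like (untested, and assuming the definitions still have to be distributed first):

calc[n_, m_] := Do[
  (* a whole bunch of stuff using data from both lists and the table *);
  data >>> "output.txt";,
  {i, n, m}];

(* make calc and the globals it reads available on the subkernels *)
DistributeDefinitions[calc, List1, List2, Table1];

(* queue the four chunks and block until all of them finish *)
WaitAll[{
  ParallelSubmit[calc[1, 250]],
  ParallelSubmit[calc[251, 500]],
  ParallelSubmit[calc[501, 750]],
  ParallelSubmit[calc[751, 1000]]}];

Would something like this work, and would the four streams of appends to "output.txt" stay consistent?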

  • You may want to start by checking out Why won't Parallelize speed up my code?. For a more specific answer we will probably need to see a bit more of your code. Can you make a minimal working example out of it? – MarcoB May 26 '22 at 02:09
  • Writing to disk is, without the kind of hardware found on supercomputers and clusters, an inherently serial operation. It's also a slow operation, and without evidence to the contrary I always (cynical, me!) expect disk i/o to dominate execution times. Distributing the work across threads / processes / whatever-you-call-them may very well make the computation faster, but then ... well, imagine twelve trains leaving the platforms of a large terminus simultaneously and trying to get onto the only outbound main line. At best the signalling system will ensure orderly serialisation of the trains (a concrete sketch of that pattern follows these comments) ... – High Performance Mark May 26 '22 at 08:44
  • Incidentally, I made my previous comment because the answer @MarcoB points us to doesn't explicitly call out i/o as a hindrance to parallel speed up. – High Performance Mark May 26 '22 at 08:46
  • @HighPerformanceMark the excellent answer linked is a "community wiki", so we can all contribute to it. Also, new answers will be welcomed. I think, for instance, that it would be nice to have some debugging tips for Parallel work. – rhermans May 26 '22 at 08:49
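To make the serialisation point raised in the comments concrete, here is a minimal sketch (not from the thread) that keeps the computation parallel but funnels every write through the main kernel, which also makes de-duplication trivial; the trade-off is that all results must fit in memory on the main kernel at once:

(* compute in parallel; only the main kernel touches the file *)
results = ParallelTable[
  (* a whole bunch of stuff using data from both lists and the table *),
  {i, max}];

(* drop duplicates, then append serially to the output file *)
Scan[PutAppend[#, "output.txt"] &, DeleteDuplicates[results]];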

0 Answers