apple = {1, 2, 3};
sapple = 1;
banana = {10, 20, 30};
sbanana = 10;
kiwi = {100, 200, 300};
skiwi = 100;
data = {"apple", "banana", "kiwi"};

myfun[data_, scale_] := Total[data]/scale
myfun[apple, sapple]
myfun[banana, sbanana]
myfun[kiwi, skiwi]
myfun[ToExpression[#], ToExpression["s" <> ToString[#]]] & /@ data

In parallel, it still works, but with a warning:

ParallelMap[myfun[ToExpression[#], ToExpression["s" <> ToString[#]]] &, data]

(screenshot of the Total::normal warning messages)

What's the correct way of coding this?

WARNING:

@AlbertRetey commented below that

it probably is worth mentioning that while you do get the expected result the code is not evaluated in the parallel kernels but on the master, which is most probably not what you intended. What happens is that the parallel kernels return the unevaluated expressions which then are evaluated on the master...

I am not sure whether this is true. However, I think it is an extremely important observation, because MMA does not tell you this explicitly, so users may be misled into believing that the computation is still running in parallel.
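
One possible fix, following the DistributeDefinitions suggestion in the comments below, is to send the symbol definitions to the subkernels explicitly before the ParallelMap call. This is only a sketch, assuming the definitions from the top of the question have already been evaluated:

(* sketch: distribute the symbols that ParallelMap cannot see inside the strings *)
DistributeDefinitions[apple, sapple, banana, sbanana, kiwi, skiwi];

ParallelMap[myfun[ToExpression[#], ToExpression["s" <> ToString[#]]] &, data]
(* should now give {6, 6, 6} without the Total::normal messages *)

With the definitions distributed, ToExpression on each subkernel resolves to an actual list or number rather than an undefined symbol, so the evaluation really happens on the subkernels.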

Chen Stats Yu
  • I suspect you need to DistributeDefinitions for apple, sapple... etc. They are only defined on your master kernel, not the slaves. – Ymareth Jan 14 '15 at 15:02
  • Related: (6511) (See Related links there too.) – Mr.Wizard Jan 14 '15 at 15:11
  • it probably is worth mentioning that while you do get the expected result the code is not evaluated in the parallel kernels but on the master, which is most probably not what you intended. What happens is that the parallel kernels return the unevaluated expressions which then are evaluated on the master... – Albert Retey Jan 16 '15 at 09:44
  • @AlbertRetey I think that is an extremely important thing you pointed out as MMA DOES not mention this information!! – Chen Stats Yu Jan 17 '15 at 00:36
  • actually the Total::normal warning messages indicate just that (that's why I made that comment). It is of course a quite concealed message. I'm not sure whether we could expect Mathematica to do something smarter here, though... – Albert Retey Jan 17 '15 at 01:30
  • @Albert I do not believe that you are correct. If you are then Mathematica is using trickery to make us believe otherwise. Two examples: (1) ParallelMap[$KernelID &, Range@12] shows that in the environment in which the function is evaluated the $KernelID varies and (2) ParallelMap[Pause[1] &, Range@12] // AbsoluteTiming (also in "wall clock" time) shows apparent parallel evaluation. – Mr.Wizard Jan 17 '15 at 01:31
  • @Albert Or do you mean that these examples are parallel but the OP's example is not? – Mr.Wizard Jan 17 '15 at 01:33
  • to be precise the evaluation is partially done on the parallel kernels and partially on the master: the problem is that on the parallel kernels Total is called with an undefined symbol as argument (because automatic distribution can't be done as ParallelMap doesn't see symbols but only strings). With a symbol Total gives the error messages as shown and each kernel returns e.g. Total[kiwi]/skiwi, the partially evaluated result. On the master, there are definitions for kiwi and skiwi, so it now starts its own evaluation which will not give messages and return the expected results. – Albert Retey Jan 17 '15 at 01:38
  • @Mr.Wizard: you can see what happens when you make the following additional definitions: total[data : {__?NumericQ}] := (Print[$KernelID]; data); myfun[data_, scale_] := (total[data]/scale) – Albert Retey Jan 17 '15 at 01:39 (a runnable sketch of this diagnostic follows the comment list)
  • @Albert You mean with the OP's troubled code or with the code in my answer? If the latter I'm still not seeing it. – Mr.Wizard Jan 17 '15 at 01:43
  • @Mr.Wizard: I'm just talking about the OP's troubled code. Your code seems to be OK. I was first tricked into believing it had the same problem but it doesn't: the reason is that in your case ParallelMap sees the symbols d and s and can autodistribute them to the parallel kernels... – Albert Retey Jan 17 '15 at 01:47
  • @Albert Okay, I'm glad we're on the same page. Yes, that's how my code fixed the OP's problem, even though I didn't state it. Perhaps I should have. Nevertheless there are other reasons to prefer "indexed objects" (DownValues) over a long list of Symbols, so I chose to simply recommend the (IMO) superior format without justification. As always I would be happy to attempt to explain further if asked. – Mr.Wizard Jan 17 '15 at 01:50
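
A runnable version of Albert Retey's diagnostic from the comment above (the helper total and the redefinition of myfun are his; note that this redefinition is for diagnosis only and no longer sums the data):

(* diagnostic sketch: print which kernel actually receives numeric data *)
total[data : {__?NumericQ}] := (Print[$KernelID]; data);
myfun[data_, scale_] := total[data]/scale

ParallelMap[myfun[ToExpression[#], ToExpression["s" <> ToString[#]]] &, data]
(* the prints should all show 0, i.e. the master kernel: the subkernels never
   receive definitions for apple, sapple, ... and return the expression unevaluated *)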

1 Answer


I would rethink your data format. Consider using "indexed objects" (DownValues) or perhaps Associations. One example:

d["apple"]  = {1, 2, 3};
s["apple"]  = 1;
d["banana"] = {10, 20, 30};
s["banana"] = 10;
d["kiwi"]   = {100, 200, 300};
s["kiwi"]   = 100;
data        = {"apple", "banana", "kiwi"};

myfun[data_, scale_] := Total[data]/scale

ParallelMap[myfun[d[#], s[#]] &, data]
{6, 6, 6}
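
As an aside, since Associations are also mentioned above, here is a hedged sketch of the same layout using Associations (dA and sA are illustrative names, not part of the original answer):

(* sketch: the same data stored in Associations keyed by name *)
dA = <|"apple" -> {1, 2, 3}, "banana" -> {10, 20, 30}, "kiwi" -> {100, 200, 300}|>;
sA = <|"apple" -> 1, "banana" -> 10, "kiwi" -> 100|>;

ParallelMap[myfun[dA[#], sA[#]] &, data]
(* should also give {6, 6, 6} *)

Either way, the symbols (d and s, or dA and sA) appear literally in the mapped function, so ParallelMap can distribute their definitions to the subkernels automatically.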

Evidence that my code is running in parallel:

ParallelMap[(Pause[1]; myfun[d[#], s[#]]) &, data] // AbsoluteTiming

{1.016058, {6, 6, 6}}

Manual timing also confirms this result.
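
For comparison, a sequential Map with the same Pause should take roughly one second per element (a sketch; exact timings depend on the machine and kernel count):

(* sequential baseline: about 3 seconds for the 3 elements, versus ~1 second above *)
Map[(Pause[1]; myfun[d[#], s[#]]) &, data] // AbsoluteTiming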

Mr.Wizard
  • Thanks! I will wait to see if there is a better alternative; otherwise, I might have to rewrite hundreds of datasets using DownValues. – Chen Stats Yu Jan 14 '15 at 15:51
  • @Chen Please don't feel rushed to Accept this answer. Take your time; you may like other answers better. I will say however that in the long run it is much easier to access data by a list of keys than a list of Symbols that constantly "want" to evaluate at the wrong time. – Mr.Wizard Jan 14 '15 at 19:08
  • I can always change my mind about the correct answer :). For now, I used a loop to load all the data into DownValues, so it's kind of sorted out. – Chen Stats Yu Jan 14 '15 at 19:14