I have a simple piece of code:
results={};
Do[If[!FailedQ[g = grab[i]], AppendTo[results, g]], {i, 2000000, 3000000}]
I had thought that the simplest way to parallelize this would be
LaunchKernels[]
SetSharedVariable[results]
and then to rerun the loop above with ParallelDo in place of Do.
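That is:

ParallelDo[
 If[!FailedQ[g = grab[i]], AppendTo[results, g]],
 {i, 2000000, 3000000}]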
But this doesn't work. What is the correct and simplest way to parallelize a trivial accumulation loop like this?
Here are the functions for testing:
$FailedSymbols = $Failed | $Aborted; (* assumed definition; not shown in the original *)
FailedQ[expr_] := FailedQ[expr, 0]
FailedQ[expr_, d : _Integer | \[Infinity]] := !FreeQ[expr, $FailedSymbols, {0, d}]
grab[n_] := Quiet @ Module[{u, r, i},
  u = "http://photo.net/photodb/photo?photo_id=" <> ToString[n];
  Check[
    (* pull the average rating out of the photo page's HTML... *)
    r = First @ StringCases[Import[u, "HTML"],
       "ratings, " ~~ Shortest[s__] ~~ " average" :> ToExpression[StringDrop[s, -2]]];
    (* ...then fetch the large version of the image itself *)
    i = Import["http://gallery.photo.net/photo/" <> ToString[n] <> "-lg.jpg", "Image"],
    Return @ $Failed];
  Return[{n, i, r}]]
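For a valid id, grab[n] returns {n, image, rating}; on any error, Check makes it return $Failed, which FailedQ detects at the top level. A quick smoke test (assuming photo.net still serves these pages; the id is just the start of the range above):

FailedQ[grab[2000000]]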
Comments:

If grab is stateful or has other side-effects, it might be far from trivial. – Oleksandr R. Jul 23 '15 at 22:38

What is FailedQ? Note that Reap and Sow are far more efficient at list accumulation problems than AppendTo, since AppendTo's performance degrades as the list gets larger (I think). – DumpsterDoofus Jul 24 '15 at 01:26

Use URLFetchAsynchronous and friends. Parallel* functions are not the right way to do it, because you'll spend most of your time just waiting for the server to respond/file transfer. I use URLFetchAsynchronous here. – C. E. Jul 24 '15 at 02:01

You can't tell URLFetchAsynchronous to use a specific number of threads. Each time you call URLFetchAsynchronous you start a new job in the background. (There may be a limit on how many background jobs can run at once; I don't know what it is.) These functions are rather low level; I'm not sure what a function that loads images conditionally would look like. Anyway, I would propose to first use URLFetchAsynchronous to get all the HTML that you require for your tests, then based on that create a list of images you want to download. Then use URLSaveAs.. or URLFetchAs.. – C. E. Jul 24 '15 at 02:56
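Two of those suggestions are easy to sketch. First, the Reap/Sow idiom from DumpsterDoofus's comment, applied to the serial loop (same result as the AppendTo version, but without the repeated list copying):

results = Flatten[
   Last @ Reap @ Do[
     If[!FailedQ[g = grab[i]], Sow[g]],
     {i, 2000000, 3000000}],
   1]  (* Flatten[..., 1] also covers the nothing-sown case, where Reap's second part is {} *)

Second, a minimal sketch of the asynchronous pipeline C. E. describes. This assumes Mathematica 10 (for associations) and that URLFetchAsynchronous's callback is invoked as f[task, "event", data], with the whole body delivered as a list of byte values in a single "data" event under the default options; check the URLFetchAsynchronous documentation for your version before relying on this:

pages = <||>;  (* photo id -> page HTML, filled in by the callbacks *)
pageHandler[id_][_, "data", data_] := (pages[id] = FromCharacterCode[data])
pageHandler[id_][_, "error", _] := Null  (* silently skip failed fetches *)

fetchPage[id_Integer] := URLFetchAsynchronous[
  "http://photo.net/photodb/photo?photo_id=" <> ToString[id],
  pageHandler[id]]

tasks = fetchPage /@ Range[2000000, 2000100];  (* a small batch to start *)
WaitAsynchronousTask /@ tasks;  (* block until the whole batch has finished *)

(* From pages, build the list of ids whose images are worth downloading
   (using the same "ratings, ... average" parse as grab), then save those
   images in the background, per the last comment: *)
saveImage[id_Integer] := URLSaveAsynchronous[
  "http://gallery.photo.net/photo/" <> ToString[id] <> "-lg.jpg",
  FileNameJoin[{$TemporaryDirectory, ToString[id] <> "-lg.jpg"}]]

Here pages, pageHandler, fetchPage, and saveImage are hypothetical names introduced for the sketch; only URLFetchAsynchronous, URLSaveAsynchronous, and WaitAsynchronousTask are built-ins.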