In the function I propose, I build an association with keys that are not supposed to be evaluated. There are some issues with this, see this answer by Taliesin, with the following quote.
generally this just sounds like a dangerous and confusing game to play, to me.
I think the function presented in this answer deals with the complications you mention reasonably well. It uses an option to set the levelspec. To see how patterns are handled, see the section Verbatim.
Concerning point (1),(2) and (3): Before there were a lot of additional complications. But I now that we have Association we no longer have to deal with those. Making this work with held expressions is just a matter of being thorough in surrounding expressions with Hold and Unevaluated. My intuition is also that Association should have better performance than a Dispatch table or something similar. An Association should be unbeatable in terms of how long it takes to look up a particular sub-expression. But maybe we should do a proper comparison.
ClearAll[positionIndexGeneral]
Options[positionIndexGeneral] = {Heads -> True};
SetAttributes[positionIndexGeneral, HoldAll];
positionIndexGeneral[expr_, lev_: {1,Infinity}, OptionsPattern[]] :=
Module[{subExprs, positions, len, together, gathered, hGathered,
gatheredSubExprs, gatheredPos},
subExprs =
Level[Unevaluated@expr, lev, Hold, Heads -> OptionValue[Heads]];
positions =
Position[Unevaluated@expr, _, lev, Heads -> OptionValue[Heads]];
len = subExprs // Length;
together = Transpose[{List @@ Hold /@ subExprs, positions}];
gathered = GatherBy[together, First];
hGathered = Hold@Evaluate@gathered;
gatheredSubExprs = hGathered[[All, All, 1, 1, 1]];
gatheredPos = gathered[[All, All, 2]];
AssociationThread @@ {Unevaluated @@ gatheredSubExprs, gatheredPos}]
Example:
a = 3;
positionAssoc =
positionIndexGeneral[{a, 2, {3, 4, a}}]
positionAssoc[Unevaluated[a]]
{{1},{3,3}}
corresponding to
Position[Unevaluated@{a, 2, {3, 4, a}}, Unevaluated[a]]
{{1},{3,3}}
Verbatim
Note that in general we are simulating how Position works with Verbatim.
positionAssoc = positionIndexGeneral[{a, 2, {3, 4, a_}}]
positionAssoc[a_]
{{3,3}}
Corresponding to
Position[Unevaluated@{a, 2, {3, 4, a_}}, Verbatim[a_]]
{{3,3}}
To simulate how Position works without Verbatim in this way is probably not very useful. There are infinitely many patterns against which an expression can be tested, so of course we cannot make a big lookup table. For a very specific pattern like List | Hold we might make some specialised code that looks up both List and Hold in the association.
Timing
My function can kind of compete with a specialised function by Mr.Wiz in the 1D case, and of course it dwarfs the built in PositionIndex for large data.
f[x_] := AssociationThread @@ {Hold[
Unevaluated[x]][[1, {1}, #[[All, 1]]]], #} &@
GatherBy[Range@Length@x, Hold[x][[{1}, #]] &]
Now let's make some data and compare
data = RandomInteger[999, 1*^5];
(jacobGen =
positionIndexGeneral[Evaluate@data, {1, 1}, Heads -> False]) //
Timing // First
(mma1D = PositionIndex[data]) // Timing // First
(wiz1D = f[data]) // Timing // First
Position[data, 115] === jacobGen[115] === List /@ wiz1D[115] ===
List /@ mma1D[115]
0.214873
0.174309
0.164100
True
data = RandomInteger[10, 1*^5];
(jacobGen =
positionIndexGeneral[Evaluate@data, {1, 1}, Heads -> False]) //
Timing // First
(mma1D = PositionIndex[data]) // Timing // First
(wiz1D = f[data]) // Timing // First
0.235508
4.119624
0.153041
data = RandomInteger[10, 1*^6];
(jacobGen = positionIndexGeneral[data, {1, 1}, Heads -> False]) //
Timing // First
(wiz1D = f[data]) // Timing // First
2.294256
1.703060
Possible improvement
When we only want the expressions at level 1, the function provided by Mr.Wizard is faster. With some good metaprogramming it should be possible to get the best of both worlds.
Appendix
It would of course have been cooler to write something like
ClearAll[positionIndexGeneral]
Options[positionIndexGeneral] = {Heads -> True};
positionIndexGeneral[expr_, lev_: {1,Infinity}, OptionsPattern[]] :=
AssociationThread @@
{
Unevaluated @@ #2[[All, All, 1, 1, 1]]
,
Function[{x}, x[[#]] & /@ #[[All, All, 2]]]@
Position[expr, _, lev, Heads -> OptionValue[Heads]]
} &[#, Hold@Evaluate@#] &@
GatherBy[Transpose[{List @@ Hold /@ #, Range[# // Length]}] &@
Level[expr, lev, Hold, Heads -> OptionValue[Heads]], First]
but I prefer the style with Module(/Block when possible) for debugging, as well as to immediately see what happens first.
Position[]-users... :-( – Rod May 21 '13 at 13:23Positionis well used too, and I certainly use it a lot myself (not implying that makes it right). Rather, certain uses of it are quite slow. I hope to bring attention to the issue and perhaps develop a useable general function. – Mr.Wizard May 21 '13 at 13:26Position. So, I believePositionis used more often than you think by the vast majority of users (not only unexperienced ones)... – Rod May 21 '13 at 13:32Positionis commonly used by many people, and many of its applications are good. I don't mean to imply thatPositionitself is slow, only a certain use where it is used repetitively. I want to make sure that is clear. – Mr.Wizard May 21 '13 at 13:36Positionafter participating here. I think this site has warped my sensibilities. :D – rcollyer May 21 '13 at 13:56PositionFunctionwere (only) beneficial in the case were one has repeated position queries about the same input expr. The equivalent is certainly the case forNearestFunctionandInterpolationFunction. What is a (strong) usage case for aPositionFunctionthat meets this criterion? – May 22 '13 at 07:50a = RandomInteger[{1, 1000}, {5000}]; Table[{i, Position[a, i]}, {i, 1, 1000}] // Timing // Firstwhen magnitudes faster would beSort[{#[[1, 1]], #[[All, 2]]} & /@ GatherBy[Transpose[{a, Range@Length@a}], First]] // Timing // First– Mr.Wizard May 22 '13 at 07:56(* retruns a PositionFunction[data] which is listable *) pf = Position[a,levelSpec,n,opts]; pf[Range[10] (*pattern*)] { {{2}, {3}, {7}, {19}, {33}, {35}, {37}, {41}, {50}}, {{1}, {39}, {42}}, {}, ... }– May 22 '13 at 08:39pf /@ Range[10]– May 22 '13 at 08:57PositionFunction, and that is thatNearestandInterpolationare "subject matter" functions whereasPositionis more a language thing. E.g. hypothetically,Nearestcould be constructed from a variety of data structures because all that matters is the mathematical distance between one unit and another, whereasPositionis entirely about the specific list structure you are using. Not saying I'd agree with this, but there is this conceptual difference which may have something to do with the absence of aPositionFunction. – amr May 23 '13 at 01:52GatherByinversion trick, but I think we cannot efficiently use it in the general case. The only thing that is missing now is to use your idea when level 1 andHeads-> Falseis specified, but I'd say that is a minor point. – Jacob Akkerboom Jul 16 '14 at 12:02