I have two lists, list A and list B. All of the elements of A are members of B. The order of B is fixed but the order of A is random. I need to arrange the elements in A as given in B.

For example, if A is

A={5,2,8}

and B is

B={9,2,6,8,5,1}

I need to reorder A to be:

A={2,8,5}

I'm looking for a clever one-liner. This is similar to "How to order a list to match the order of another list?", but there the two lists have the same length. I tried some variations on adding padding.

From the clever solutions below I have settled on using:

OrderLike[A_List,B_List]:=Cases[Alternatives @@ A]@B;

This has been shown to be the fastest and most poetic. (However, its poeticness is a subjective matter, unlike its speed.)
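
As a quick check, the definition above can be applied to the example lists from the question. This is only a sketch; it assumes, as in my case, that B has no repeated elements.

```mathematica
(* OrderLike as defined above: keep those elements of B that match any element of A *)
OrderLike[A_List, B_List] := Cases[Alternatives @@ A] @ B;

OrderLike[{5, 2, 8}, {9, 2, 6, 8, 5, 1}]
(* {2, 8, 5} *)
```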

kglr
c186282

2 Answers

SortBy[Position[B, #]&]@A

{2, 8, 5}

and

SortBy[PositionIndex @ B] @ A (* thanks: WReach *)

{2, 8, 5}
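
To see why both SortBy forms work, it may help to look at the sort keys they compute for each element of A. The sketch below uses the example lists from the question and assumes B has no repeated elements.

```mathematica
A = {5, 2, 8};
B = {9, 2, 6, 8, 5, 1};

(* Position returns a list of position specifications for each element *)
Position[B, #] & /@ A
(* {{{5}}, {{2}}, {{4}}} *)

(* PositionIndex builds an Association from each value to its list of positions;
   applying the Association to an element looks up that list *)
PositionIndex[B] /@ A
(* {{5}, {2}, {4}} *)
```

SortBy then orders A by these keys, which puts 2 (position 2) before 8 (position 4) before 5 (position 5).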

Also,

Cases[Alternatives @@ A] @ B
Select[MatchQ[Alternatives @@ A]] @ B
DeleteCases[Except[Alternatives @@ A]] @ B

Update: Some timing comparisons:

f1[a_, b_] := SortBy[FirstPosition[b, #] &]@a;
f2[a_, b_] := SortBy[Position[b, #] &]@a;
f3[a_, b_] := SortBy[PositionIndex @ b]@a;
f4[a_, b_] := Cases[Alternatives @@ a] @ b;
f5[a_, b_] := Select[MatchQ[Alternatives @@ a]] @ b;
f6[a_, b_] := DeleteCases[Except[Alternatives @@ a]] @b;
funcs = {"f1", "f2", "f3", "f4", "f5", "f6"};

SeedRandom[1]
nb = 10000;
na = 5;
b = RandomSample[Range[10^6], nb];
a = RandomSample[b, na];
t1 = First@RepeatedTiming[r1 = f1[a, b];];
t2 = First@RepeatedTiming[r2 = f2[a, b];];
t3 = First@RepeatedTiming[r3 = f3[a, b];];
t4 = First@RepeatedTiming[r4 = f4[a, b];];
t5 = First@RepeatedTiming[r5 = f5[a, b];];
t6 = First@RepeatedTiming[r6 = f6[a, b];];
r1 == r2 == r3 == r4 == r5 == r6

True

timings = {t1, t2, t3, t4, t5, t6};
Grid[Prepend[SortBy[Last]@Transpose[{funcs, timings}], {"function", "timing"}], 
 Dividers -> All]

$\begin{array}{|c|c|} \hline \text{function} & \text{timing} \\ \hline \text{f4} & 0.0009 \\ \hline \text{f6} & 0.0028 \\ \hline \text{f1} & 0.003 \\ \hline \text{f2} & 0.0034 \\ \hline \text{f5} & 0.004 \\ \hline \text{f3} & 0.012 \\ \hline \end{array}$

With na = 1000 we get:

$\begin{array}{|c|c|} \hline \text{function} & \text{timing} \\ \hline \text{f4} & 0.0014 \\ \hline \text{f3} & 0.016 \\ \hline \text{f2} & 0.0737 \\ \hline \text{f6} & 0.117 \\ \hline \text{f5} & 0.118 \\ \hline \text{f1} & 0.59 \\ \hline \end{array}$

kglr
  • Cases[B, Alternatives@@A] is real-world poetry. – Roman May 04 '19 at 06:12
  • +1. Also: SortBy[A, PositionIndex[B]] – WReach May 04 '19 at 06:38
  • I think the only valid solutions are the SortBy ones; the others do not work when there are repeated elements in A – Fortsaint May 04 '19 at 08:44
  • @Fortsaint, I think with duplicate elements in A (and in B), SortBy methods do not work (other methods do work). – kglr May 04 '19 at 09:10
  • @kglr Consider this: A = {5, 2, 8, 5, 8}, B = {9, 2, 6, 8, 5, 1}. The SortBy methods give {2, 8, 8, 5, 5} while the Cases/Select methods give {2, 8, 5}. Which is right? I still think the SortBy ones are. He needs ".. to arrange the elements in A as given in B.". Deleting duplicates is not rearranging – Fortsaint May 04 '19 at 09:37
  • @Fortsaint, I read (I may be wrong) "All of the elements of A are members of B" as B is A with additional elements shuffled. So, for your example modified to make B have the same duplications, say, A = {5, 2, 8, 5, 8}; B = {9, 2, 6, 8,5,8, 5, 1} ; we get {2, 8, 8, 5, 5} from SortBy[PositionIndex@B]@A and {2, 8, 5, 8, 5} from Cases/Select/DeleteCases. – kglr May 04 '19 at 09:52
  • @WReach good idea to use an Association in SortBy. It's not in the documentation and looks very useful for future reference. – Roman May 04 '19 at 12:52
  • Beautiful one-liners! I did not specify in my question but in my case, B only has unique elements. I have gone with the following solution: OrderLike[A_List,B_List]:=SortBy[A,FirstPosition[B,#]&]; I took out some of the syntactic sugar for us mere mortals. – c186282 May 04 '19 at 16:49
  • @c186282 nice final implementation! Great way to cross simplicity with explanation. I love one-liners. I'm commenting to compliment this thread, and also to ask that you post this final code in your question, possibly crediting those that helped? So it doesn't get lost in the comments :D – CA Trevillian May 04 '19 at 17:14
  • @Roman & WReach & kglr, could you provide some wiki input on this technique? I recognize the power of this combination, but I would like more explanation on how you all view it. Does it provide for a speed-up, or a more applicable nature (generality in implementation)? – CA Trevillian May 04 '19 at 17:17
  • @c186282 it looks like you've chosen the slowest of all proposed methods. The one you picked has a runtime that scales quadratically with the lengths of the lists, whereas other methods scale linearly. – Roman May 04 '19 at 18:44
  • @Roman Bummer! I'll have to look more closely at the others. Luckily, for my application, the lengths of my lists will not get very large $<100$ – c186282 May 04 '19 at 18:54
  • @CATrevillian, updated with some timing comparisons. – kglr May 04 '19 at 19:17
  • ... and just noticed that Roman already posted a more detailed comparison half hour earlier:) – kglr May 04 '19 at 19:21
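
The disagreement in the comments about repeated elements can be reproduced directly. The sketch below reuses the lists given by Fortsaint and kglr; the name B2 is introduced here only to keep the two cases apart, and the outputs are the ones reported in the comments.

```mathematica
A = {5, 2, 8, 5, 8};

(* Case 1: duplicates in A only *)
B = {9, 2, 6, 8, 5, 1};
SortBy[A, FirstPosition[B, #] &]   (* {2, 8, 8, 5, 5}: keeps every copy in A *)
Cases[Alternatives @@ A] @ B       (* {2, 8, 5}: returns elements of B, so copies are lost *)

(* Case 2: the same duplicates also appear in B *)
B2 = {9, 2, 6, 8, 5, 8, 5, 1};
SortBy[PositionIndex @ B2] @ A     (* {2, 8, 8, 5, 5}: equal elements end up adjacent *)
Cases[Alternatives @@ A] @ B2      (* {2, 8, 5, 8, 5}: follows the order of B2 exactly *)
```

Which result counts as correct depends on how the requirement is read; when B has only unique elements and A has no repeats, all methods agree.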
benchmarks

No new methods here, only benchmarks of methods given in @kglr's answer.

Clear[timings];
timings[n_Integer] := timings[n] = 
  Module[{A, B, a1, a2, a3, a4, a5, a6, t1, t2, t3, t4, t5, t6},
    B = RandomSample[Range[n]];
    A = RandomSample[B, Floor[n/2]];
    t1 = First[AbsoluteTiming[a1 = SortBy[Position[B, #] &]@A;]];
    t2 = First[AbsoluteTiming[a2 = SortBy[A, FirstPosition[B, #] &];]];
    t3 = First[AbsoluteTiming[a3 = SortBy[PositionIndex@B]@A;]];
    t4 = First[AbsoluteTiming[a4 = Cases[Alternatives @@ A]@B;]];
    t5 = First[AbsoluteTiming[a5 = Select[MatchQ[Alternatives @@ A]]@B;]];
    t6 = First[AbsoluteTiming[a6 = DeleteCases[Except[Alternatives @@ A]]@B;]];
    If[a1 == a2 == a3 == a4 == a5 == a6, {t1, t2, t3, t4, t5, t6}, $Failed]]

ListLogLogPlot[Transpose[Table[Thread[{n, timings[n]}],
  {n, Round[10^Range[3, 9/2, 1/4]]}]],
  Joined -> True, PlotLegends -> Range[6],
  Frame -> True, FrameLabel -> {"n", "time [s]"}]

[Figure: log-log plot of time [s] versus n for methods 1 to 6]

Observations (method k corresponds to timing tk in the code above):

  • Methods 5 and 6 are almost indistinguishable in their timings.
  • Methods 3 & 4 scale linearly with $n$.
  • Methods 1, 2, 5, 6 scale quadratically with $n$.
  • Method 4 is the absolute front-runner: Cases[Alternatives @@ A]@B. Fast & poetic.
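
One way to read part of this difference in scaling: Position and FirstPosition scan B once for every element of A, whereas a lookup table of positions only has to be built once. The following is a minimal sketch of that idea, not one of the benchmarked methods; it assumes B has no repeated elements, and the name pos is purely illustrative.

```mathematica
A = {5, 2, 8};
B = {9, 2, 6, 8, 5, 1};

(* Build the value -> position table once (assumes the elements of B are unique) *)
pos = AssociationThread[B -> Range[Length[B]]];

(* The Association acts as the key function, so each key is a single lookup *)
SortBy[A, pos]
(* {2, 8, 5} *)
```

This is essentially what SortBy[PositionIndex@B]@A (method 3) does, except that the keys here are plain integers rather than one-element lists.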
Roman