12

I have a bunch of datasets which look like:

3 7 1 6 5
8 2 4 1 2
8 1 5 2 5
...

and I need to make a list of pairs such that the first element makes a pair with other elements in the same line.

3 7 
3 1
3 6
...

I usually use Table because it is fast enough for $10^3$ elements, but as my dataset gets bigger, it's no longer efficient. For example:

data = RandomInteger[{1, 10}, {10^7, 5}];
Flatten[Table[{data[[i, 1]], data[[i, j]]}, {i, 1, Length@data}, {j, 
 2, 5}], 1]//AbsoluteTiming//First
(* 39.828744 *)

What is the most efficient way to find this pair list?

Timing

Finally I had a chance to do a timing on my real machine:

(* Kuba *)
0.726869
(* ciao *)
1.297926
(* nested table*)
2.146111
(* Shutao Tang *)
11.290944
(* Mr.Wizard♦ *)
57.232387

Although it seems results can be (completely) different if I do a timing on my old laptop!

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
Mahdi
  • 1,619
  • 10
  • 23

6 Answers6

16

About 4x faster:

Partition[Flatten @ data[[All, {1, 2, 1, 3, 1, 4, 1, 5}]], 2]
Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
Kuba
  • 136,707
  • 13
  • 279
  • 740
  • 1
    Shouldn't you make this more general? – Mr.Wizard Apr 11 '15 at 22:32
  • It depends. If it is slow, probably doesn't matter. If it is fast, ok. But atm I'm confused because I'm testing your and rasher's method with result that your is 20x slower than OP's. So as soon as I figure it out I will tell you :P – Kuba Apr 11 '15 at 22:35
  • LOL -- looks like I wrote a function for the wrong shape of data. Okay, +1, and back to the drawing board. – Mr.Wizard Apr 11 '15 at 22:43
  • @Mr.Wizard Ok :) I have to go but I will make it more general tomorrow, if it is worth it :). Feel free to delete my non relevant comments when you change something. Good night. :) – Kuba Apr 11 '15 at 22:46
  • Ah, clean. I feel herp-a-derp for not thinking of that... +1 – ciao Apr 12 '15 at 01:54
10

I think you'll find this faster:

pairem[data_] := 
 Module[{c = ConstantArray[0, 8*(Length@data)], 
   p = Flatten@data[[All, 2 ;;]], p2 = data[[All, 1]]},
  c[[2 ;; ;; 2]] = p;
  c[[1 ;; ;; 2]] = Flatten@Transpose[ConstantArray[p2, 4]];
  Partition[c, 2]]
ciao
  • 25,774
  • 2
  • 58
  • 139
  • @Pickett I don't know, I still get only 2.5 speed up here, maybe it is hardware related. – Kuba Apr 11 '15 at 22:27
5

Edit: this method is optimized for long sublists which is exactly the opposite of your example. Sorry. I'll post if I find anything applicable.


This is fairly clean and on larger data at least as fast as rasher's code:

fn = ArrayFlatten[{{First@#, Rest@# ~Partition~ 1 }}] &;

Test:

fn /@ {{3, 7, 1, 6, 5}, {8, 2, 4, 1, 2}, {8, 1, 5, 2, 5}}
{{{3, 7}, {3, 1}, {3, 6}, {3, 5}},
 {{8, 2}, {8, 4}, {8, 1}, {8, 2}},
 {{8, 1}, {8, 5}, {8, 2}, {8, 5}}}
x = RandomInteger[99, {5000, 5000}];

fn /@ x   // AbsoluteTiming // First
pairem[x] // AbsoluteTiming // First
0.334019

0.383022

A related question: Prepend 0 to sublists

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • 3
    Try with x = RandomInteger[99, {10^6, 5}] which is really the case. – Kuba Apr 11 '15 at 22:39
  • @Kuba, x = RandomInteger[99, {10^6, 5}];pairem[x]; // AbsoluteTiming fn /@ x; // AbsoluteTiming ===> {0.2158204, Null},{13.1357421, Null} – xyz Apr 12 '15 at 05:12
4

Here is a refinement of @Kuba's clever Part syntax:

data = RandomInteger[{1,10},{10^7,5}];

r1 = Partition[
    Flatten @ data[[All,{1,2,1,3,1,4,1,5}]],
    2
]; //AbsoluteTiming

r2 = ArrayReshape[
    data[[All, {1,2,1,3,1,4,1,5}]],
    {4 Length[data], 2}
]; //AbsoluteTiming

r1===r2

{1.55516, Null}

{0.568824, Null}

True

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
3

My trail:

data = RandomInteger[{1, 10}, {8, 5}]
{{1, 1, 9, 10, 6}, {7, 2, 8, 5, 4}, {2, 1, 10, 1, 7}, {9, 1, 4, 5, 2},
   {6, 10, 6, 5, 10}, {2, 10, 1, 7, 4}, {3, 5, 3, 2, 2}, {1, 2, 9, 6, 2}}
Thread@{#1, {##2}} & @@@ data
{
   {{1, 1}, {1, 9}, {1, 10}, {1, 6}}, {{7, 2}, {7, 8}, {7, 5}, {7, 4}},
   {{2, 1}, {2, 10}, {2, 1}, {2, 7}}, {{9, 1}, {9, 4}, {9, 5}, {9, 2}},
   {{6, 10}, {6, 6}, {6, 5}, {6, 10}}, {{2, 10}, {2, 1}, {2, 7}, {2, 4}},  
   {{3, 5}, {3, 3}, {3, 2}, {3, 2}}, {{1, 2}, {1,9}, {1, 6}, {1, 2}}
}

Performance test

x = RandomInteger[99, {10^6, 5}];
Thread@{#1, {##2}} & @@@ x; // AbsoluteTiming
{2.3535156, Null}
xyz
  • 605
  • 4
  • 38
  • 117
2

Just for fun a solution with Riffle:

riffle[d_] := Module[{data = Transpose[d]},
  Partition[Flatten[Transpose[Rest[Riffle[data, {First[data]}]]]], 2]
  ]