Searching linked lists that contain lists?

Question

Following the advice I've read here and on other sites, I've been trying to use the Mathematica equivalent of a linked list:

testList = {{a, b}, {{c, d}, {{e, f}, {}}}}

Now, I want to see if {c,d} is a member of testList. How do I do that? MemberQ doesn't traverse the list recursively, and Flatten also nukes the sublists. The following seems to work, but I would expect there to be a cleaner, simpler way:

memberInLinkedList[{}, _] = False;
memberInLinkedList[l_List, v_] := True /; First[l] == v;
memberInLinkedList[l_List, v_] := memberInLinkedList[Last[l], v];

Is there a more elegant or built-in way to do this? Perhaps a general idiom or package that handles this transparently?

What exactly is l in the original definition of testList? — Shredderroy, Sep 11 '13 at 17:30
You could use a head different than List for the linked list to avoid the flattening issue — Rojo, Sep 11 '13 at 18:43

score 15 · Accepted Answer · answered Sep 11 '13 at 17:31

MemberQ[testList, {c, d}, Infinity]

True
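
To see why the third argument matters, here is a small sketch on the testList from the question (assuming a, b, c, d, e, f are symbols with no values assigned). Without a level specification MemberQ only looks at level 1. With Infinity it looks at every level, which also means that anything nested at any depth counts as a member, including the empty terminator.

```mathematica
(* the small linked list from the question *)
testList = {{a, b}, {{c, d}, {{e, f}, {}}}};

MemberQ[testList, {c, d}]            (* False: only level 1 is searched *)
MemberQ[testList, {c, d}, Infinity]  (* True: every level is searched *)

(* caveat: with Infinity, anything at any depth counts as a member *)
MemberQ[testList, e, Infinity]       (* True *)
MemberQ[testList, {}, Infinity]      (* True: the empty terminator *)
```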

ybeltukov

ybeltukov · Answer 2 · 2013-09-12T20:31:00.307

10

Yet another answer

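b1[x_] := Module[{f},
   f[{x, R_}] = True;       (* head element matches the target *)
   f[{L_, R_}] := f[R];     (* otherwise continue with the tail *)
   f[#] /. f[{}] -> False   (* reaching the terminator {} means no match *)
   ] &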

Block[{$RecursionLimit = 1*^6, $IterationLimit = 1*^6},
 {
  MemberQ[ll, {86, 99}, Infinity] // timeAvg,
  t7[{86, 99}][ll] // timeAvg,
  b1[{86, 99}][ll] // timeAvg
  }
 ]

{0.0196000, 0.00751200, 0.00321600}

Your move, Mr. Wizard! :)

edited Sep 12 '13 at 20:31

answered Sep 12 '13 at 11:57

ybeltukov

Well that's really weird; I thought I tried this form and it was slower, which surprised me. I guess I failed to properly clear a definition during testing. +1! Back to testing. :-) – Mr.Wizard Sep 12 '13 at 15:36
I can't think of any approach to try to improve this. You win. – Mr.Wizard Sep 12 '13 at 15:44
b1 works fine with L1 but not with L2: L=Partition[Range@8,2]; L1=Fold[{#2,#}&,{},L]; L2=Fold[{##}&,{},L]; b1[#][L1]&/@L (*{True,True,True,True}*); b1[#][L2]&/@L (*{f$200[8],f$201[8],f$202[8],f$203[8]}*) – Ray Koopman Sep 12 '13 at 20:24
@RayKoopman Like Mr.Wizard, I didn't convert this output to False. I have fixed it now. – ybeltukov Sep 12 '13 at 20:33
I still get the same kind of f$xxx[8] sequence. – Ray Koopman Sep 12 '13 at 21:23
@RayKoopman Now I understand your question! b1 does not support reverse ordering. If you want you can change x <-> R_ and L_ <-> R_ in the definition. – ybeltukov Sep 12 '13 at 21:47
There is a slightly simpler version of b1 which seems to be just as fast and doesn't need the iteration/recursion limits to be touched. Unlike Mr.Wizard's t9, though, it does crash the kernel for longer linked lists (unless it finds a match before the crash). Here it is: b2[x_] := (# //. {{x, _} -> True, {_, r_} :> r, {} -> False}) & – Albert Retey Sep 13 '13 at 10:24
@ybeltukov If something seems too good to be true, it probably isn't. – Ray Koopman Sep 15 '13 at 06:47

score 9 · Answer 3 · edited Apr 13 '17 at 12:55

Methods revisited

ybeltukov posted a cleaner version of t7 that made me feel rather silly. (Thanks ybeltukov; it will teach me to be more careful about clearing definitions while experimenting!) I can't beat it, so instead I'll try to refine it. First, several of my functions and his b1 do not return False on a failure to match, so this should be corrected. Second, one should incorporate extension of $IterationLimit into the function. It would then look something like this:

t8[linked_, x_] :=
 Module[{f},
  Block[{$IterationLimit = ∞},
   f[{x, R_}] = True;
   f[{}] = False;
   f[{L_, R_}] := f[R];
   f @ linked
 ]]

t8[ll, {86, 99}]

True

A crash with most methods

I discovered that on longer linked lists all the methods suggested so far cause a kernel crash. ybeltukov confirmed that this problem also affects version 9.0.1.

Examples:

SeedRandom[1]
RandomInteger[99, {500000, 2}];
ll = Fold[{#2, #} &, {}, %];

MemberQ[ll, {86, 99}, Infinity] (* kernel crash *)

Cases[ll, {86, 99}, -1, 1]      (* kernel crash *)

ll /. {86, 99} :> Return[True]  (* kernel crash *)

One way around this problem is to manage a stack manually as Daniel did here.

t9[linked_, pat_] :=
  Module[{R = linked, L},
    While[R =!= {},
      {L, R} = R;
      If[MatchQ[L, pat], Return @ True];
    ];
    False
  ]

Now:

SeedRandom[1]
RandomInteger[99, {500000, 2}];
ll = Fold[{#2, #} &, {}, %];

t9[ll, {86, 99}]

True

This is not as fast as b1/t8 however.

Original answer

If you did not have MemberQ you could still walk the tree recursively. Here are several ways to do that:

t1[x_] := MatchQ[#, {x, _} | {_, _?#0}] &

t2[x_][{L_, R_}] := MatchQ[L, x] || t2[x][R]

t3[x_] := Module[{f}, f[{L_, R_}] := MatchQ[L, x] || f[R]; f]

t4[x_] := MatchQ[#[[1]], x] || #0 @ #[[2]] &

All functions have the syntax: tfunc[pattern][linkedlist].
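
As a quick illustration of that calling convention, here is a minimal sketch on the testList from the question. Only t2 and t4 are repeated so that the snippet runs on its own, and a through f are assumed to be symbols without values. Both searches below succeed; as written, these functions are not meant for the case where nothing matches.

```mathematica
(* the small linked list from the question *)
testList = {{a, b}, {{c, d}, {{e, f}, {}}}};

(* two of the definitions above, repeated for a self-contained example *)
t2[x_][{L_, R_}] := MatchQ[L, x] || t2[x][R]
t4[x_] := MatchQ[#[[1]], x] || #0 @ #[[2]] &

t2[{c, d}][testList]   (* True *)
t4[{e, _}][testList]   (* True: the first argument is a pattern *)
```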

Sometimes these are even faster than MemberQ:

SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]

SeedRandom[1]
RandomInteger[99, {50000, 2}];
ll = Fold[{#2, #} &, {}, %];

MemberQ[ll, {86, 99}, Infinity] // timeAvg

0.01148

Block[{$RecursionLimit = 1*^6},
 timeAvg @ #[{86, 99}][ll] & /@ {t1, t2, t3, t4}
]

{0.01812, 0.007112, 0.00612, 0.008736}

More experiments

Using the syntax tfunc[pat, list] is somewhat faster, i.e. this is faster than t2:

t5[x_, {L_, R_}] := MatchQ[L, x] || t5[x, R]

Shifting this to an iterative form is a bit faster still:

t6[x_, {L_, R_}] /; MatchQ[L, x] = True;
t6[x_, {L_, R_}] := t6[x, R]

The fastest I have found so far combines this iterative form with the dedicated function à la t3:

t7[x_] := 
  Module[{f},
    f[{L_, R_}] /; MatchQ[L, x] = True;
    f[{L_, R_}] := f[R];
    f
  ]

Timings for these three variations:

Block[{$RecursionLimit = 1*^6, $IterationLimit = 1*^6},
 {
  t5[{86, 99}, ll] // timeAvg,
  t6[{86, 99}, ll] // timeAvg,
  t7[{86, 99}][ll] // timeAvg
 }
]

{0.00624, 0.005864, 0.005368}

I find it fairly impressive that t7 is twice as fast as MemberQ in this application.

I haven't tried them all, but at least t7 breaks when you search for something that isn't there — Rojo, Sep 12 '13 at 21:07
@Rojo Yes; I believe t6 does as well. That's what I meant when I said "First, several of my functions and his b1 do not return False on a failure to match ..." I posted t5/t6/t7 in a hurry and I didn't do a good job. I also tried the cleaner form and somehow convinced myself it was slower. I hope that t8 fixes these problems. — Mr.Wizard, Sep 12 '13 at 23:33
@Mr.Wizard Can you add that the crash depends on the system stack size? For example, on Linux you can run Mathematica as ulimit -s 65536 && mathematica and the problem disappears. — ybeltukov, Sep 13 '13 at 12:30
@ybeltukov It will have to wait a couple of days if I do it. I don't have ulimit in Windows AFAIK so I'd need to use another tool, and I'd like to test it before making claims myself. I don't mind if you edit this answer to note your findings. — Mr.Wizard, Sep 13 '13 at 13:49

score 3 · Answer 4 · edited Apr 13 '17 at 12:55

[Edit: I forgot to include the modified m2 needed for the LL-headed linked lists. I'm including an improved (faster) version.]

Here are a couple of ways:

m1[l_, pat_] := Catch[l /. pat /; Throw[True] :> Null; False];
m2[l_, pat_] := NestWhile[Last, l, # =!= {} && ! MatchQ[First@#, pat] &] =!= {};

Leonid Shifrin suggests in this answer using a special head for linked lists if the elements of the linked list are to be lists themselves. For example,

testLL = LL[{a, b}, LL[{c, d}, LL[{e, f}, LL[]]]]

One can then use Flatten to get a flat expression:

Flatten[testLL, Infinity, LL]
(* LL[{a, b}, {c, d}, {e, f}] *)

Then MemberQ (and other such functions) can be used more or less normally:

MemberQ[Flatten[testLL, Infinity, LL], {c, d}]
(* True *)

If LL has the attribute HoldAllComplete, as in Leonid Shifrin's answer, we can mark the end of the linked list with Throw like this:

testLL2 = LL[{a, b}, LL[{c, d}, LL[{e, f}, LL[Throw["EndLL"]]]]]

Then we can modify m2 as follows:

m2LL[l_, pat_] :=
  Catch[NestWhile[Last, l, ! MatchQ[First@#, pat] &]; True] /. "EndLL" -> False

If NestWhile gets to the end of the linked list, First@# will execute the Throw.
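
A small sketch of both outcomes, assuming the definitions above with the tag spelled "EndLL" throughout (the string thrown must be the same one that m2LL replaces). The symbols g and h are hypothetical placeholders for an element that is not in the list.

```mathematica
ClearAll[LL];
SetAttributes[LL, HoldAllComplete];  (* keeps the Throw unevaluated inside LL *)

testLL2 = LL[{a, b}, LL[{c, d}, LL[{e, f}, LL[Throw["EndLL"]]]]];

m2LL[l_, pat_] :=
  Catch[NestWhile[Last, l, ! MatchQ[First@#, pat] &]; True] /. "EndLL" -> False

m2LL[testLL2, {c, d}]  (* True: a match stops NestWhile before the end *)
m2LL[testLL2, {g, h}]  (* False: First reaches the Throw, Catch returns "EndLL" *)
```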

Timing tests

Using Mr.Wizard's data (updated to include Throw["EndLL"] to mark the end of the list ll2):

SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]

SeedRandom[1];
data = RandomInteger[99, {50000, 2}];
ll = Fold[{#2, #} &, {}, data];
ClearAll[LL];
SetAttributes[LL, HoldAllComplete];
ll2 = Fold[LL[#2, #] &, LL[Throw["EndLL"]], data];

Timing:

m1[ll, {86, 99}] // timeAvg
m2[ll, {86, 99}] // timeAvg
MemberQ[ll, {86, 99}, Infinity] // timeAvg

0.00949383
0.0142940
0.0301806

Below are (updated) timings on the LL linked lists. It is interesting that MemberQ with Flatten is faster than MemberQ with a level spec. of Infinity.

m1[ll2, {86, 99}] // timeAvg
m2LL[ll2, {86, 99}] // timeAvg
MemberQ[Flatten[ll2, Infinity, LL], {86, 99}] // timeAvg

0.00865924
0.01107858
0.0227578

Mr.Wizard's t7 is about as fast as m1. A comparison with MemberQ is included.

Block[{$RecursionLimit = 1*^6, $IterationLimit = 1*^6}, 
 t7[{86, 99}][ll] // timeAvg]

0.00860823

But t7 is faster than m1 on patterns that don't match (or match near the end):

Block[{$RecursionLimit = 1*^6, $IterationLimit = 1*^6}, 
 t7[{86, 999}][ll] // timeAvg]
m1[ll2, {86, 999}] // timeAvg

0.064016
0.074637

(Update: Of course @ybeltukov's b1 beats both t7 and m1.)

@Mr.Wizard Thanks! Nesting Last was conceptually appealing, too, and it avoids having to deal with recursion limit (not particularly important in this case). Your recursive approach is clearly superior. — Michael E2, Sep 12 '13 at 18:00
