15

How can we create a Vlookup function, similar to what we have in Excel? So if we have a table a and another b, we could add columns to table a, based on a common key to both, in order to create a table c.

Tables

Szabolcs
  • 234,956
  • 30
  • 623
  • 1,263
Murta
  • 26,275
  • 6
  • 76
  • 166
  • Closely related: http://mathematica.stackexchange.com/questions/15425/finding-the-intersection-of-two-date-lists. – whuber Jan 30 '13 at 15:31

2 Answers2

13

Inspired in this solution from @Mr.Wizard. That is the function that I made in order to simulate Excel vLookup:

vLookup[data1_, data2_, pk1_:1, pk2_:1,null_:Null] := Module[{index,pickList},
    pickList=Complement[Range[Length[data2[[1]]]],{pk2}];
    SetAttributes[index,Listable];
    (index[#[[pk2]]]=#[[pickList]])&/@data2;
    index[_]=ConstantArray[null,Length[pickList]];
    Join[data1,index@data1[[All,pk1]],2]
]

This function worked just nice, let's do some time test. Doing:

a = RandomInteger[1000, {10000, 2}];
b1 = RandomInteger[1000, {10, 2}];
b2 = RandomInteger[1000, {100, 2}];
b3 = RandomInteger[1000, {1000, 2}];
b4 = RandomInteger[1000, {10000, 2}];

t1 = vLookup[a, b1] // AbsoluteTiming // #[[1]] &;
t2 = vLookup[a, b2] // AbsoluteTiming // #[[1]] &;
t3 = vLookup[a, b3] // AbsoluteTiming // #[[1]] &;
t4 = vLookup[a, b4] // AbsoluteTiming // #[[1]] &;

{t1, t2, t3, t4}
Ratios[{t1, t2, t3, t4}]

and get:

{0.016816, 0.013729, 0.018197, 0.056836}
{0.816, 1.325, 3.123}

Use index Downvalues is a nice way create a "indexed table" in Mathematica, so we can grow b quantity of lines in a order of magnitude with no linear impact in the algorithm time performance. Maybe it could be done with Dispatch and Rule list too.

It's important to state that when the b key value repeats, this vLookup takes the line of the last key value. Excel would return the first value.

Update 1

Instead of use:

(index[#[[pk2]]]=#[[pickList]])&/@data2;

I get a little better performace using

Scan[(index[#[[pk2]]]=#[[pickList]])&,data2];

Update 2

There is a new version with some new implementation:

  1. Option instead of arguments. I think that the code get a much better to read and control.
  2. Match key columns could be more then one, so you could use "Key1" -> {1, 2} and "Key2" -> {2, 3} as match columns, and not just one.
  3. Columns option, where you can control witch column you want to retrive in your vlookup operation.

Here is the code:

Options[vLookup2]={"Key1"->{1},"Key2"->{1},"NullValue"->Null,"Columns"->Automatic};

vLookup2[data1_,data2_,OptionsPattern[]]:=Module[{index,pickList,pKey1,pKey2,null,columns},
    null=OptionValue["NullValue"];
    pKey1=OptionValue["Key1"]//List//Flatten;
    pKey2=OptionValue["Key2"]//List//Flatten;
    columns=OptionValue["Columns"]//List//Flatten;
    pickList=If[columns==={Automatic},Complement[Range[Length[data2[[1]]]],pKey2],columns];
    Scan[(index[#[[pKey2]]]=#[[pickList]])&,data2];
    index[_]=ConstantArray[null,Length[pickList]];
    Join[data1,index/@data1[[All,pKey1]],2]
]

The use is very simple:

a=RandomInteger[5,{20,3}];
b=RandomInteger[5,{10,3}];
vLookup2[a,b,"Key1"-> {1,2},"Key2"-> {2,3},"NullValue"->Null,"Columns"->{2,3}]//TableForm

Update 3 (Dispatch with Rule)

I did not resisted to do the code using Dispatch. As follows:

Options[vLookup3]={"Key1"->{1},"Key2"->{1},"NullValue"->Null,"Columns"->Automatic};

vLookup3[data1_,data2_,OptionsPattern[]]:=Module[{index,pickList,pKey1,pKey2,null,columns,rule},
    null=OptionValue["NullValue"];
    pKey1=OptionValue["Key1"]//List//Flatten;
    pKey2=OptionValue["Key2"]//List//Flatten;
    columns=OptionValue["Columns"]//List//Flatten;
    pickList=If[columns==={Automatic},Complement[Range[Length[data2[[1]]]],pKey2],columns];
    rule=#[[pKey2]]->#[[pickList]]&/@data2;
    rule=Dispatch[Reverse[rule]~Join~{_-> ConstantArray[null,Length[pickList]]}];
    Join[data1,#/.rule&/@data1[[All,pKey1]],2]
]

Code tips for tuning and simplifications are always welcome!

Murta
  • 26,275
  • 6
  • 76
  • 166
3

I don't think I understand what all your options are for but the basic operation can be done fairly simply:

tabA = {{"a", 1}, {"b", 2}, {"c", 3}, {"d", 4}, {"e", 5}, {"f", 1}};

tabB = {
   {5, a1, b1},
   {2, a2, b2},
   {1, a3, b3},
   {10, a4, b4},
   {11, a5, b5}
  };

rls = Append[# :> {##2} & @@@ tabB, _ -> {None, None}] // Dispatch;

Join[tabA, Replace[tabA[[All, 2]], rls, {1}], 2] // MatrixForm

$\left( \begin{array}{cccc} a & 1 & \text{a3} & \text{b3} \\ b & 2 & \text{a2} & \text{b2} \\ c & 3 & \text{None} & \text{None} \\ d & 4 & \text{None} & \text{None} \\ e & 5 & \text{a1} & \text{b1} \\ f & 1 & \text{a3} & \text{b3} \end{array} \right)$

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • This major option that I use a lot is Key1 and Key2. They indicate the simple or composed keys to be matched in list1 and list2. The standard is to look for Column 1 from list1 in Column 1 in list2 – Murta Sep 11 '13 at 22:00