
I have a matrix whose rows I want to extract based on whether the element in one particular column is a member of another vector (call it the "comparison vector"). I would like to get better at functional programming, so I want to avoid using a loop if possible. I believe I have the right functions, Select and MemberQ, but I can't get MemberQ to compare the element of the matrix with each element of the comparison vector.

If this is a duplicate, I'm happy to be pointed to the original (I've searched, though)...

Addition to my question:

mymatrix = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93}, {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};

compvector = {1, 2, 6, 7, 11, 12, 16, 17};

I would like to extract the rows of mymatrix whose first-column value is an element of compvector.

Verde
  • I suppose you want something like this: mat = {{1, 2, 3}, {5, 4, 6}, {6, 20, 13}}; cmpvec = {10, 13, 33, 44}; Select[mat, Intersection[#, cmpvec] != {} &] – Aky Aug 09 '13 at 13:57
  • Hi @Aky. As I said to Anon, thanks for your response, but it's not quite right, and that's due to my lack of information from the beginning... I'm not looking for whether a value is found anywhere in the row; I'm trying to test whether a specific part of the row belongs to the list. – Verde Aug 09 '13 at 14:16
  • @Aky could you briefly explain what != {} & does in your code? – Verde Aug 09 '13 at 14:26
  • {} is just an empty list. So the expression checks if the result of the Intersection is a non-empty list (meaning the two argument lists have at least one common element, which is how I had initially understood your question). – Aky Aug 09 '13 at 16:19
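A quick way to see what the test does is to evaluate it on the rows from Aky's example. This is only an illustration; cmpvec and the rows are copied from the comment above.

```mathematica
cmpvec = {10, 13, 33, 44};

(* Intersection returns the elements common to both lists; {} means none *)
Intersection[{1, 2, 3}, cmpvec]
(* {} *)
Intersection[{6, 20, 13}, cmpvec]
(* {13} *)

(* so the pure function is True only for rows sharing at least one element with cmpvec *)
(Intersection[#, cmpvec] != {} &) /@ {{1, 2, 3}, {5, 4, 6}, {6, 20, 13}}
(* {False, False, True} *)
```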

9 Answers


I recommend using Pick for these things. It has already been used by others, but the simplest form, using Alternatives, hasn't been shown:

Pick[#, #[[All, 1]], Alternatives @@ #2] &[mymatrix, compvector]

{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}
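To illustrate what Alternatives @@ contributes, here is the same Pick written out step by step. The name patt is just an illustrative variable, not part of the answer.

```mathematica
mymatrix = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93},
   {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};
compvector = {1, 2, 6, 7, 11, 12, 16, 17};

(* Alternatives @@ turns the list into one pattern: 1 | 2 | 6 | 7 | 11 | 12 | 16 | 17 *)
patt = Alternatives @@ compvector;

(* Pick keeps each row whose first-column entry matches patt *)
Pick[mymatrix, mymatrix[[All, 1]], patt]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}} *)
```

Because Pick compares against a pattern, the match is structural: a first-column entry of 1. (a Real) would not match the integer 1 in compvector.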

For greater speed, especially with a longer compvector, we can use a Dispatch table:

fast[m_, c_] := Pick[m, m[[All, 1]] /. Dispatch @ Thread[c -> True]]

fast[mymatrix, compvector]

{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}
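The intermediate selector built by fast can be inspected directly. In this sketch, firstCol, rules and mask are illustrative names, and firstCol is simply the first column of the question's mymatrix typed out.

```mathematica
compvector = {1, 2, 6, 7, 11, 12, 16, 17};
firstCol = {1, 1, 2, 3, 3, 4, 5, 5, 6, 7};  (* = mymatrix[[All, 1]] from the question *)

(* Thread builds {1 -> True, 2 -> True, ...}; Dispatch turns it into a hash-based rule table *)
rules = Dispatch @ Thread[compvector -> True];

(* members become True, every other value passes through unchanged *)
mask = firstCol /. rules
(* {True, True, True, 3, 3, 4, 5, 5, True, True} *)

(* Pick with no third argument keeps the positions where mask is exactly True *)
Pick[Range[10], mask]
(* {1, 2, 3, 9, 10} *)
```

Non-member values are left as they are, so this assumes the key column never legitimately contains the symbol True.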

Timings with some larger data:

mymatrix = RandomInteger[9999, {50000, 3}];
compvector = RandomChoice[Range@9999, 150];

(* the faster of Michael's functions *)
michael[m_, c_] := With[{nf = Nearest[c]},  
  Pick[m, # - First /@ nf /@ # &@m[[All, 1]], 0]
 ]

Cases[mymatrix, {x_, _, _} /; MemberQ[compvector, x]] // Timing // First
Pick[#, #[[All, 1]], Alternatives @@ #2] &[mymatrix, compvector] // Timing // First
michael[mymatrix, compvector]                         // Timing // First
fast[mymatrix, compvector]                            // Timing // First

3.447

0.905

0.265

0.047

A run-off with Michael's method on even larger data:

mymatrix = RandomInteger[99999, {500000, 3}];
compvector = RandomChoice[Range@99999, 15000];

michael[mymatrix, compvector] // Timing // First
fast[mymatrix, compvector]    // Timing // First

15.943

0.327

Mr.Wizard
  • The timings for V9: In the first "race" (the 50000-row mymatrix), the timings of the four were respectively 0.594460, 0.158027, 0.140691, 0.059914. In the run-off, they were 2.479786, 0.725754. The relative improvement of michael from V7 to V9 is almost 15X - wow. Bully for NearestFunction, I guess. – Michael E2 Aug 10 '13 at 00:05

Assuming I've (now) understood your question properly, it's a very simple problem that deserves a very simple answer:

Select[mymatrix, MemberQ[compvector, First@#] &]
Aky

New solution

Cases[mymatrix, {x_, _, _} /; MemberQ[compvector, x]]

{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}

I do not take credit for this solution: someone posted it here before me, but in the confusion over what the problem was, that person (whose name I do not remember) deleted his answer. :(

Here's another answer that does not use Cases.

First, let's define which columns are required to be in compvector, for example the first and second:

required = {True, True, False}

Then

Select[mymatrix, And @@ (MemberQ[compvector, #] & /@ Pick[#, required]) &]

{{1, 1, -56}, {1, 2, 3.06}, {6, 1, 9.945}}
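For a matrix with many columns, the required mask does not have to be typed by hand. A sketch of one way to build it; requiredMask is a hypothetical helper name, not a built-in:

```mathematica
(* Hypothetical helper: a mask of length n that is True at the column indices in cols *)
requiredMask[n_Integer, cols_List] := (MemberQ[cols, #] &) /@ Range[n]

requiredMask[3, {1, 2}]
(* {True, True, False}, the same as required above *)

(* the same call for a 100-column matrix, requiring columns 1 and 2 *)
required100 = requiredMask[100, {1, 2}];
{Length[required100], Count[required100, True]}
(* {100, 2} *)
```

The mask length must equal the row length, since Pick[row, required] pairs the two lists element by element.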

If only a single column has to be checked, this can be made a lot simpler; see Aky's answer.

The objection to the first version was that with many columns the pattern would also get very long, {x_, _, _, _, ...}. That is not really true: a short pattern such as {x_, ___} matches a row of any length.
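A sketch of a short pattern of that kind applied to the question's data; wide is an illustrative 100-column version of mymatrix, padded with zeros:

```mathematica
mymatrix = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93},
   {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};
compvector = {1, 2, 6, 7, 11, 12, 16, 17};

(* x_ binds the first entry; ___ (BlankNullSequence) matches any number of remaining entries *)
Cases[mymatrix, {x_, ___} /; MemberQ[compvector, x]]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}} *)

(* the pattern is unchanged for a 100-column version of the matrix *)
wide = Join[mymatrix, ConstantArray[0, {10, 97}], 2];
Length /@ Cases[wide, {x_, ___} /; MemberQ[compvector, x]]
(* {100, 100, 100, 100, 100} *)
```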

Old solution, not what the OP wants

First define some test data:

comparison = Range[10]

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

matrix = RandomInteger[100, {10, 5}]

{{46, 51, 84, 49, 52}, {12, 22, 7, 51, 56}, {74, 61, 9, 23, 93}, {97, 0, 23, 87, 78}, {23, 29, 83, 68, 21}, {79, 1, 25, 13, 84}, {23, 85, 35, 83, 83}, {2, 29, 50, 22, 88}, {34, 61, 91, 84, 29}, {60, 51, 96, 48, 68}}

The test:

Select[matrix, Length[Intersection[comparison, #]] > 0 &]

{{12, 22, 7, 51, 56}, {74, 61, 9, 23, 93}, {79, 1, 25, 13, 84}, {2, 29, 50, 22, 88}}

C. E.
  • Hi @Anon. Thank you for your response, but it's not quite right (my fault). I'm not looking for whether a value is found anywhere in the row; I'm trying to test whether a specific part of the row belongs to the list. But I'm going to try the Intersection function... maybe it's useful. – Verde Aug 09 '13 at 14:14
  • @Verde See update. – C. E. Aug 09 '13 at 14:28
  • Yes, this did it... thank you. I wanted to ask something about your answer, though. If my matrix had many more columns, using the pattern {x_, _, _, _} inside Cases wouldn't be efficient. Do you know of another way of solving my question without using Cases in this way? I was working with the Cases function as well, but never got it to work properly... – Verde Aug 09 '13 at 14:38
  • @Verde I added a different solution which hopefully works better for you. – C. E. Aug 09 '13 at 14:53
  • What I meant by not being efficient is that if I had, for instance, 100 columns in my matrix, I would have to write _ 99 times in the pattern {x_, _, _, ...}. Maybe efficient isn't the right word... – Verde Aug 09 '13 at 14:56
  • @Verde Ah, I see... yeah that's quite a lot. Hopefully you can find a way to programmatically generate the list required and all should be fine though. – C. E. Aug 09 '13 at 14:57
  • @Verde If you are only ever matching on the first column, then it doesn't matter how many columns you have. Just use {x_, ___} to match the first column in a table with one or more columns. – Mike Honeychurch Aug 09 '13 at 21:52

Here are a couple of ways:

nf = Nearest[compvector];
Pick[mymatrix, # - First /@ nf /@ # &@ mymatrix[[All, 1]], 0]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}} *)
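Why the Nearest version works can be seen from the differences it feeds to Pick. The names firstCol, nearest and diff are illustrative, with firstCol standing in for mymatrix[[All, 1]]:

```mathematica
compvector = {1, 2, 6, 7, 11, 12, 16, 17};
firstCol = {1, 1, 2, 3, 3, 4, 5, 5, 6, 7};  (* = mymatrix[[All, 1]] from the question *)

nf = Nearest[compvector];
(* nf[x] returns a list of the element(s) of compvector closest to x; First takes one *)
nearest = First /@ nf /@ firstCol;

(* the difference is 0 exactly when x itself is in compvector *)
diff = firstCol - nearest;
Flatten @ Position[diff, 0]
(* {1, 2, 3, 9, 10}: the rows that Pick[..., diff, 0] keeps *)
```

Pick looks for the pattern 0, which matches only the exact integer 0; with real-valued keys a difference of 0. would not match, so this relies on integer data as in the question.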


Pick[mymatrix, Times @@@ Outer[Plus, mymatrix[[All, 1]], -compvector], 0]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}} *)
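The Outer variant can be unpacked the same way; firstCol, diffs and prods are illustrative names:

```mathematica
compvector = {1, 2, 6, 7, 11, 12, 16, 17};
firstCol = {1, 1, 2, 3, 3, 4, 5, 5, 6, 7};  (* = mymatrix[[All, 1]] *)

(* row i holds firstCol[[i]] - c for every c in compvector *)
diffs = Outer[Plus, firstCol, -compvector];
Dimensions[diffs]
(* {10, 8} *)

(* a product is 0 iff one of its factors is 0, i.e. iff firstCol[[i]] is in compvector *)
prods = Times @@@ diffs;
Flatten @ Position[prods, 0]
(* {1, 2, 3, 9, 10} *)
```

The intermediate array has Length[firstCol] times Length[compvector] entries, so its memory use grows with both lengths.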

This one below is quite fast for a short compvector, but its complexity is poor: it starts to lose to the first method (the faster one) when the length of compvector exceeds about 170, and to Mr.Wizard's fast when the length is above about 70.

Extract[mymatrix, Position[mymatrix[[All, 1]], Alternatives @@ compvector]]
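The two steps of the Extract version, shown separately; pos is an illustrative name:

```mathematica
mymatrix = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93},
   {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};
compvector = {1, 2, 6, 7, 11, 12, 16, 17};

(* Position finds the first-column entries matching 1 | 2 | 6 | 7 | ... *)
pos = Position[mymatrix[[All, 1]], Alternatives @@ compvector]
(* {{1}, {2}, {3}, {9}, {10}} *)

(* Extract pulls out the rows at those positions *)
Extract[mymatrix, pos]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}} *)
```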
Michael E2
  • +1 for being primed by using Nearest recently. :-) – Mr.Wizard Aug 09 '13 at 23:24
  • @Mr.Wizard I considered the binary search, but I left it for you, if you wish. For the sake of variety, I'm looking for a less Pick-y solution that's as fast or faster. – Michael E2 Aug 09 '13 at 23:29
  • Michael, I added an answer with my own preference (Alternatives). I believe that this function was improved some time after v7; could you please run the timings yourself, and also with compvector = RandomChoice[Range@9999, 2500]; and tell me what you get? – Mr.Wizard Aug 09 '13 at 23:34
  • Michael, I added a much faster method using Dispatch. I'm still curious to know how Alternatives compares in later versions. – Mr.Wizard Aug 09 '13 at 23:52

Modified post

I erased my post because I thought the other solutions fit the wording of your question better. Your clarifications, especially the one about 100 columns, suggest that you need:

matrix = {{a, b, c}, {d, e, f}, {g, h, i}};
picker = RandomChoice[{b, h, u}, 30000000];
Timing[Cases[matrix, x_ /; MemberQ[picker, x[[2]]]];]

Interestingly, this calculation is fast even when picker is a long vector.

Deleted post

It seems that you want something like

matrix = {{a, b, c}, {d, e, f}, {g, h, i}}; picker = {b, h, u};
Cases[matrix, {_, x_, _} /; MemberQ[picker, x]]

which returns {{a, b, c}, {g, h, i}}. Those are the rows whose second element is a member of that other vector picker.
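Hector's examples test the second column. The Dispatch idea from Mr.Wizard's fast answer can be parameterized by column index as well; rowsWithColumnIn is a hypothetical helper written for this illustration, not a built-in function:

```mathematica
(* Hypothetical helper: rows of m whose k-th entry belongs to vals.
   Members of vals are rewritten to True via a Dispatch table; Pick keeps the True rows. *)
rowsWithColumnIn[m_List, k_Integer, vals_List] :=
  Pick[m, m[[All, k]] /. Dispatch @ Thread[vals -> True]]

matrix = {{a, b, c}, {d, e, f}, {g, h, i}};
rowsWithColumnIn[matrix, 2, {b, h, u}]
(* {{a, b, c}, {g, h, i}} *)

rowsWithColumnIn[{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}}, 1, {1, 2, 6, 7}]
(* {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}} *)
```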

Hector
  • The first solution (the deleted post) is the same as Anon's first solution and the second solution (the modified post) is the same as mine. The only difference is that you have compared the second element of the lists. – M6299 Aug 10 '13 at 03:50
  • @M6299: Anon refers to "someone posted this … that person … deleted his answer". Anon read my post in the 10 minutes it was alive. My modified post addresses Verde's concern about not writing 99 Blank[]s. I went further and showed with Timing that such code is efficient. – Hector Aug 10 '13 at 05:17
  • I added that my post is the modification of your deleted post. – M6299 Aug 10 '13 at 05:48

Another possibility:

Pick[#, MemberQ[compvector, #] & @@@ #] &@mymatrix

=> {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}
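What Apply at level 1 (@@@) hands to the pure function can be seen by evaluating the selector on its own; selector is an illustrative name:

```mathematica
mymatrix = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93},
   {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};
compvector = {1, 2, 6, 7, 11, 12, 16, 17};

(* f @@@ rows evaluates f[1, 1, -56], f[1, 2, 3.06], ...;
   inside the pure function, # is the first argument, i.e. the first column *)
selector = (MemberQ[compvector, #] &) @@@ mymatrix
(* {True, True, True, False, False, False, False, False, True, True} *)

(* Pick with no pattern keeps the rows where selector is True *)
Pick[mymatrix, selector]
```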

user1066

Modifying Anon's first solution gives:

Cases[mymatrix, x_List /; MemberQ[compvector, x[[1]]]]

{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}

As we can see, there is no need to type {x_, _, _, _, ...}, which is inconvenient for a matrix with a large number of columns.

M6299
  • x_List in this code can be replaced with x_. There is no need to specify the List head. – M6299 Aug 09 '13 at 19:35
  • In fact this is the modification of Hector's deleted post. Thanks to Hector. – M6299 Aug 10 '13 at 05:46
  • @Mr.Wizard How many posts am I allowed to edit per day? – M6299 Aug 13 '13 at 11:36
  • I'm going to say six (6). This is based on Area 51 saying we get 12 questions a day; I figure one person's edits should not be more than half that. We did just approve a lot more than that but it was a one-time thing because of the effort I know you put out. – Mr.Wizard Aug 13 '13 at 11:51
  • @Mr.Wizard Ok. I am sorry for any inconvenience I may have caused. – M6299 Aug 13 '13 at 12:00
  • I see that you have not made any more edits for Greek letters. I would like to encourage you to make a few of these a day as they are helpful. – Mr.Wizard Aug 16 '13 at 11:25
  • @Mr.Wizard Thank you for the encouragement. I will make more edits. – M6299 Aug 16 '13 at 12:14
  • @rm-rf As you can see from the previous comments, Mr.Wizard has encouraged me to make more edits and has set a limit of 6 edits per day, and I have done what he said. I have written this here because I cannot log into the chat. Anyway, if you say so, I will not make such edits anymore. – M6299 Aug 19 '13 at 15:07
  • M6299, you should be aware that @pings are only effective if a user has already participated in a given string of comments. I have notified rm -rf of your comment. Do you know why you were unable to join Chat? Is this an ongoing problem? Regarding the edits I failed to inform others of my instructions to you, and I failed to raise a question about these edits on Meta which would have been appropriate. I take any blame for the mixed messages or wasted effort. – Mr.Wizard Aug 19 '13 at 15:35
  • @Mr.Wizard Yes, it is an ongoing problem. It is caused by a failure to communicate with "stackauth.com" (this is the result of the site test). I do not know how to fix this problem. I have tried to log into the chat several times but could not do it even once. Dear Mr.Wizard, no one is to blame. – M6299 Aug 19 '13 at 15:48
  • @M6299 I replied to Mr.Wizard here. The main issue here is that the homepage gets flooded. Many people only read the front page and see if there's anything new and/or interesting. If one user rapidly edits 1 yr old questions to change \[Pi] to π, then we'll end up pushing new and interesting questions off the page and fill it with old ones that had trivial changes made. I know Mr.W allowed you 6/day, but if you do them all at once, it still floods the home page. Please also see the meta question in the linked message. – rm -rf Aug 19 '13 at 18:13
  • I strongly encourage you to submit edits as well. I'm a prolific editor myself (or used to be), but instead of just searching for Greek letters to fix using a single click, perhaps you can try improving posts that are really bad or have poor formatting, etc. And remember to pace yourself so that it doesn't come in large numbers. That way, the edits will certainly be substantial and not irk users :) Thanks for understanding! – rm -rf Aug 19 '13 at 18:14
list =
  {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93}, {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};

comp = {1, 2, 6, 7, 11, 12, 16, 17};

Join @@ Values @ KeyTake[comp] @ GroupBy[First] @ list

{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}
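One property of the GroupBy approach is worth knowing: KeyTake returns the groups in the order of the key list, not in the row order of the matrix. A sketch with an unsorted key list (groups is an illustrative name; GroupBy and KeyTake need Version 10 or later):

```mathematica
list = {{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {3, 1, -7.291}, {3, 2, 1.93},
   {4, 0, 0}, {5, 0, 0}, {5, 1, -356.4}, {6, 1, 9.945}, {7, 0, -7.512}};

(* GroupBy[First] builds an Association keyed by first-column value,
   with keys in order of first appearance *)
groups = GroupBy[First] @ list;
Keys[groups]
(* {1, 2, 3, 4, 5, 6, 7} *)

(* KeyTake keeps the requested keys in the requested order; absent keys such as 11 are dropped *)
Keys @ KeyTake[groups, {7, 6, 2, 1, 11}]
(* {7, 6, 2, 1} *)

Join @@ Values @ KeyTake[groups, {7, 6, 2, 1, 11}]
(* {{7, 0, -7.512}, {6, 1, 9.945}, {2, 0, -30.02}, {1, 1, -56}, {1, 2, 3.06}} *)
```

With the question's sorted compvector the two orders coincide, which is why the output above matches the other answers.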

eldo
Pick[mymatrix, Times @@ BitXor[compvector, #] & /@ mymatrix[[All, 1]], 0]

(*{{1, 1, -56}, {1, 2, 3.06}, {2, 0, -30.02}, {6, 1, 9.945}, {7, 0, -7.512}}*)
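The BitXor trick relies on x XOR y being 0 exactly when x == y. A sketch of the intermediate values, with firstCol (an illustrative name) standing in for mymatrix[[All, 1]]:

```mathematica
compvector = {1, 2, 6, 7, 11, 12, 16, 17};
firstCol = {1, 1, 2, 3, 3, 4, 5, 5, 6, 7};  (* = mymatrix[[All, 1]] *)

(* BitXor is Listable, so one integer is XOR-ed against every element of compvector *)
BitXor[compvector, 6]
(* {7, 4, 0, 1, 13, 10, 22, 23}: the 0 marks the match 6 == 6 *)

(* the product over compvector is 0 iff some XOR is 0, i.e. iff x is a member *)
prods = (Times @@ BitXor[compvector, #] &) /@ firstCol;
Flatten @ Position[prods, 0]
(* {1, 2, 3, 9, 10} *)
```

BitXor only accepts integers, so this variant is limited to integer-valued key columns.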
chyanog
  • I like this method and have used it myself but in this case I don't see the advantage over simpler options. This is an order of magnitude slower than simply using Alternatives and more than two orders slower than using a Dispatch (hash) table, on the first of the two tests in my answer. – Mr.Wizard Aug 10 '13 at 06:55
  • @Mr.Wizard You are right, I haven't run performance test. – chyanog Aug 10 '13 at 07:18