12

How to match set-patterns against sets?

A set (in the mathematical sense) is a list of elements without repetition and order of elements does not matter. For example, we have a pattern set {3, 1} that should match sets {1, 3}, {1, 2, 3}, {1, 2, 3, 4} and so on. Note, that the list-length of the pattern is not relevant: any set that contains elements 3 and 1 should match the pattern. So far, this is simple subset testing - but there are two problems:

  1. Since order does matter for the patternmatcher, one has to write e.g. Cases[sets, {___, 3, ___, 1, ____}|{___, 1, ___, 3, ____}] which causes a combinatorial expansion for an increasing number of element-wise matches. Thus I used MemberQ instead of structural patterns.

  2. I would like to use more complicated patterns, like: "Find all sets that contain 1 and 3 but not 2!".

I have a working solution, but it is neither effective nor elegant in my opinion. It involves a Boolean description of the pattern (And to include all listed elements, Or to include any listed element, Not to exclude an element), but I am not sure it is the right way to do it. The function simply wraps each element that apperas in the pattern into MemberQ, so the Boolean expression translates to a logical combination of MemberQ and Not@MemberQ calls.

setCases[sets_List, patt_] := Module[{elem = Union @@ sets},
   Cases[sets, _?((patt /. x_ /; MemberQ[elem, x] :> MemberQ[#, x]) &), {1}]
   ];

Define a list of sets, and a list of patterns for testing:

sets = Subsets[{1, 2, 3, 4}];

patterns = {1, \[Not] 1, 1 \[And] \[Not] 2, 1 \[Or] \[Not] 2, 
   1 \[Or] 2 \[Or] 3, 1 \[And] 2 \[And] 3, 
   1 \[And] \[Not] 2 \[And] \[Not] 3, 1 \[Or] (2 \[And] \[Not] 3), 
   1 \[And] \[Not] (2 \[And] 3), 1 \[And] \[Not] (2 \[Or] 3), 
   1 \[And] \[Not] (2 \[Or] (3 \[And] \[Not] 4)),
   \[Not] 1 \[And] \[Not] 2 \[And] \[Not] 3 \[And] \[Not] 4}

Grid[{#, setCases[sets, #]} & /@ patterns, Alignment -> Left, 
 Background -> {None, {{LightGray, White}}}, Spacings -> {1, 1}] // TraditionalForm

Mathematica graphics

Let's examine one case closer, by displaying the ultimate pattern that is tested:

(1 \[Or] 2) \[And] \[Not] 3 /. x_Integer :> MemberQ[#, x]
(MemberQ[#1, 1] & || (MemberQ[#1, 2] &)) && ! (MemberQ[#1, 3] &)

As one can see, the function is far from being economic: alternatives could have been gathered under one MemberQ (MemberQ[#, 1]& || MemberQ[#, 2]& is equivalent to MemberQ[#, 1|2]&) and I think that Except should be used as well, though have no idea how. I am interested in robust, fast solutions.

Note: Do NOT try to simplify the logical patterns, as:

Simplify[And[1, 2]] ==> 2
Simplify[And[0, 1]] ==> False
Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
István Zachar
  • 47,032
  • 20
  • 143
  • 291
  • @Leonid Actually, I've found a case where Mathematica uses a similar approach. Check the Details under TextSearch. It uses List for And, Alternatives for Or and Except for Not. According to PrintDefinitions spelunking, there is a full-blown query-set-algebra, check e.g. TextSearch^IndexSearch^PackagePrivate^compileQuery and ...^exec: QNot, QUnion, QIntersection, QComplement, QString are the specific functions. (backticks are replaced by ^) – István Zachar Mar 09 '16 at 21:23
  • Thanks for letting me know, @Istvan. I will definitely look it up. These things interest me a lot. – Leonid Shifrin Mar 09 '16 at 21:42

4 Answers4

8

A main idea of a pattern-based solution

I don't know why we should make life so complicated, since you can always use things like Intersection and Complement to test whether a given set is a subset of another set. But if you want to use the pattern-matcher, here is one option:

ClearAll[set];
SetAttributes[set, {Orderless, Flat, OneIdentity}];

ClearAll[setCasesLS]
setCasesLS[sets : {__List}, patt_] :=
   List @@@ Cases[set @@@ sets, patt];

Now, for example:

setCasesLS[sets, set[1,__]]

(*  {{1}, {1, 2}, {1, 3}, {1, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}  *)

setCasesLS[sets,set[1,Except[2|set[3 ,Except[4]]]...]]

(* {{1},{1,3},{1,4},{1,3,4}}  *)

It may be an interesting sub-problem to to translate your specs into the patterns used here (involving Except etc), but at least conceptually this could be a valid starting point.

Pattern translator (a sketch, may contain errors)

Ok, it seems that I was able to write a pattern translator which translates your patterns into those which can be used with setCasesLS. But the code is long and ugly, and I would not be suprised if it won't work in more complicated cases. Anyway, here goes:

This is a set of preprocessing functions:

ClearAll[or, and, not];
SetAttributes[{or, and}, {Flat, OneIdentity}];
or[left : Except[_not] ..., x_not, y : Except[_not], rest___] :=
   or[left, y, x, rest];
and[left : Except[_not] ..., x_not, y : Except[_not], rest___] :=
   and[left, y, x, rest];
and[not[x_], not[y_]] /; FreeQ[{x, y}, _not] := not[or[x, y]];
not[not[x_]] := x;
not[and[x_, y_]] := or[not[x], not[y]];
and[left___, or[not[x_], y___], right___] :=
   or[and[left, not[x], right], and[left, y, right]];

Clear[process];
process[expr_] :=  expr /. {And -> and, Not -> not, Or -> or}

Here is a pattern converter:

ClearAll[convert];
convert[HoldPattern[pattern[or[simple : Except[_not | _and] .., rest___]]]] :=
   Alternatives[
     set[Alternatives @@ simple, ___],
     Sequence @@ convert[pattern[or[rest]]]
   ];
convert[HoldPattern[pattern[or[args___]]]] :=
    Alternatives @@ 
        Map[
          If[MatchQ[#, _not], set[convert[#]], convert[pattern[#]]] &, 
          {args}
        ];
convert[HoldPattern[pattern[and[args : Except[_not] ..]]]] :=
   set @@ Append[Map[convert, {args}], ___];
convert[HoldPattern[pattern[and[args___]]]] :=
   set @@ Map[convert, {args}];
convert[HoldPattern[pattern[not[x_]]]] := Except[convert[pattern@x]];
convert[HoldPattern[not[x_]]] := Except[convert[x]] ...;
convert[HoldPattern[or[args___]]] := Alternatives[args];
convert[pattern[x_]] := set[x, ___];
convert[x_] := x;

and this is a main function to bring it all together:

ClearAll[fullConvert];
fullConvert[patt_] :=
  With[{res = convert@pattern@process@patt},
     res /; FreeQ[res, not | and | or]
  ];
fullConvert[patt_] :=
  With[{res = convert@pattern@not@process@Not@patt},
     res /; FreeQ[res, not | and | or]
  ];
fullConvert[patt_] := $Failed;

If it does not succeed in converting a direct pattern, it attempts to convert a negated one. If that also fails, it returns $Failed.

Here is how this works on your patterns:

fullConvert/@patterns
{
    set[1,___],
    Except[set[1,___]],
    set[1,Except[2]...],
    set[1,___]|set[Except[2]...],
    set[1,___]|set[2,___]|set[3,___],
    set[1,2,3,___],
    set[1,Except[2|3]...],
    set[1,___]|set[2,Except[3]...],
    set[1,Except[3]...]|set[1,Except[2]...],
    set[1,Except[2|3]...],
    Except[set[2,___]|set[3,Except[4]...]|set[Except[1]...]],
    Except[set[1,___]|set[4,___]|set[2,___]|set[3,___]]
 }

If you now execute

 setCasesLS[sets, fullConvert[#]]} & /@ patterns

you get the results identical to yours.

I actually think that I am missing some simplificatins which would make the above code shorter, more general and more robust at the same time, but the current solution still seems interesting enough to post it.

Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
  • You know how it is... After 3 days of intensive brainstorming you close in on the core of your problem. And that is the point where you won't see the wood from the tree. – István Zachar Feb 16 '13 at 18:13
  • To answer your Q about the complicatedness of life in general: in my case I refrained to use Union, Intersection and Complement because I inject the sets later into placeholders which are not sets themselves, thus functions like Union must be hold. Instead of holding, I went the other way. – István Zachar Feb 16 '13 at 18:19
  • @IstvánZachar Actually, I changed my mind after reading the question more carefully, for more complex patterns of the type you ask for we may benefit from something else than Complement and Intersection - such as pattern-matcher. Working on an automatic translator from your patterns to the ones I use here... – Leonid Shifrin Feb 16 '13 at 18:24
  • Actually, @IstvánZachar, there are some ambiguities in the patterns. For example, what do you mean by "And" (&&)? I currently interpret 1 && 2 as a requirement that at both 1 and 2 should be present in the subset at least once - but this is obviously not how you interpret it (otherwise the pattern !1 && !2 && !3 && !4 should match all the subsets). Also, can you explain e.g. what is the meaning of 1 && 2 && 3? Again, I currently interpret this as a requirement that the subset should have all these elements, but from your results, this is obviously not how you interpret this. – Leonid Shifrin Feb 16 '13 at 18:29
  • Yes, you are right: my examples are wrong, I intended to use And and Or the way you have interpreted them. I lost my solution somewhere in the process of development. Checking now... – István Zachar Feb 16 '13 at 18:38
  • 2
    Ha! The late addition Simplify was the problem: apparently, Simplify[And[1,2]] simplifies to 2. I didn't know that 0/1 booleans are handled this way! – István Zachar Feb 16 '13 at 18:42
  • 1
    @IstvánZachar I analyzed the current setup and came to a conclusion that your pattern language is ambiguous because it is underspecified. What you need is to introduce two new quantors: exists and for all. Then, your logical operators such as Not or Identity will be applied to these. Then, there will be no ambiguities left. – Leonid Shifrin Feb 16 '13 at 19:13
  • Thanks for the analysis and the answer Leonid! Could you please detail a case where my approach becomes ambiguous? I would really appreciate it. – István Zachar Feb 16 '13 at 19:36
  • @IstvánZachar I am still into it. I still want to write an automatic translator of patterns. As to your current setup, I think it is probably ok, if we interpret Not as not for all elements (so, !2 means "there is no single 2 among the elements"), but certain things are currently impossible to express. For example, I may wish all subsets where there is at least one element which is not 2 (but I don't require all of them to be not 2). – Leonid Shifrin Feb 16 '13 at 19:47
  • I see! Yes, that makes sense, and I did not think of this as my use-case is much more limited than the full potential of your predicate logic. The less powerfull zeroth-order logic is fine with me. – István Zachar Feb 16 '13 at 20:31
  • @IstvánZachar Yes, sure. It is just that it makes it harder to write an automatic pattern converter.But I am on my way ... – Leonid Shifrin Feb 16 '13 at 20:51
  • I am intrigued as to how you are tweaking his specifications. I am under the impression, without giving it deep thought, that specifications using set like yours are quite straighforward, so quite similar to any reasonable specifications that come to mind. – Rojo Feb 16 '13 at 23:15
  • @IstvánZachar I added an ugly first version of a pattern translator. I am sure that I am missing something simple here, but can't think clearly any more today, so posted as is. – Leonid Shifrin Feb 16 '13 at 23:54
  • @Rojo Have a look at my update. It is pretty ugly, and I have a hunch that I am missing something very simple, but posted nonetheless. – Leonid Shifrin Feb 16 '13 at 23:55
  • @LeonidShifrin, what about... Humm, too big for a comment, posting as answer – Rojo Feb 17 '13 at 02:22
  • Why use the attribute OneIdentity? The set {x} which contains the sole element x is not the same as the object x itself. – a06e Jan 25 '14 at 17:12
  • @becko It is more complicated than that. OneIdentity works on the level of pattern-matching, not literally converting {x} to x. Usually, Flat and OneIdentity go together. Giving some symbol one attribute but not the other may result in some really weird stuff. I don't quite remember why I used it in this particular place, and have no time right now to look closer at this, but chances are that I had some reason to do this. You can read more on the relation between Flat and OneIdentity in David Wagner's book, and also in a number of Mathgroup threads. – Leonid Shifrin Jan 25 '14 at 17:20
5

This is just to give set the proper attributes and make it simplify double ___ and __

ClearAll[set];
set[a___, Verbatim[___], Verbatim[___] .., b___] := set[a, ___, b];
set[a___, Verbatim[__], Verbatim[__] .., Verbatim[___] ..., b___] := 
  set[a, __, b];
SetAttributes[set, {Orderless, Flat, OneIdentity}];

The patterns will be a boolean function of subset[el1, el2, el3...], with the possibility of mixing patterns, so Except[subset[1,2]] would represent any subset that is not subset[1,2], and subset[1, 3, _] would represent any subset with 1, 3, and any other element.

forms = {"DNF", "CNF",  "AND", "OR"};

convertPattern[patt_, 
  type : (Alternatives @@ forms | Automatic) : Automatic] := 
 With[{pat = BooleanMinimize[patt, type]}, 
  Internal`InheritedBlock[{And, Or}, SetAttributes[{And, Or}, Orderless]; 
   ClearAttributes[{And, Or}, Flat]; And[patt] //. convertionRules]]

convertionRules = {
   And[a___, b : (\[Not] _) .. // Longest] :> 
    Except[Alternatives[b]~Thread~Not // First, And[a]],
   And[a_subset, b__subset, rest___ // Shortest] :> 
    And[set @@ 
      Append[List @@@ Unevaluated@leastCommonElements[a, b], ___], 
     rest],
   (subset | And)[a___] :> set[a, ___], Or -> Alternatives,
   Not -> Except,
   Verbatim[Alternatives][a_] :> a};

(*Thanks @rm-rf*)
leastCommonElements[lists___List] := 
 Join @@ Composition[Last, Sort] /@ 
   GatherBy[Join @@ Gather /@ {lists}, First]

So given

patterns = {1, \[Not] 1, 1 \[And] \[Not] 2, 1 \[Or] \[Not] 2, 
  1 \[Or] 2 \[Or] 3, 1 \[And] 2 \[And] 3, 
  1 \[And] \[Not] 2 \[And] \[Not] 3, 1 \[Or] (2 \[And] \[Not] 3), 
  1 \[And] \[Not] (2 \[And] 3), 1 \[And] \[Not] (2 \[Or] 3), 
  1 \[And] \[Not] (2 \[Or] (3 \[And] \[Not] 4)), \[Not] 
    1 \[And] \[Not] 2 \[And] \[Not] 3 \[And] \[Not] 4}

sets = Subsets[Range[6]];

To translate your patterns to our form we just need to wrap the integers in subset, if I understood correctly

newPatterns = patterns /. i_Integer :> subset[i]

Now, we can see the patterns converted

Table[{patterns, convertPattern[#, form] & /@ (newPatterns)}\[Transpose] // 
  TableForm , {form, forms}]

Finally, we can test it

Cases[set @@@ sets, convertPattern[#]]&/@newPatterns/.set->List//Column
Rojo
  • 42,601
  • 7
  • 96
  • 188
1

OrderlessPatternSequence

Using the first four patterns ({1, ¬ 1, 1 ∧ ¬ 2, 1 ∨ ¬ 2} ) from Istvan's table:

Pick[sets, MatchQ[{OrderlessPatternSequence[___, 1, ___]}] /@ sets]

{{1}, {1, 2}, {1, 3}, {1, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}

Pick[sets, MatchQ[Except@{OrderlessPatternSequence[___, 1, ___]}] /@ sets]

{{}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}, {2, 3, 4}}

Pick[sets, MatchQ[Except[{OrderlessPatternSequence[___, 2, ___]},
    {OrderlessPatternSequence[___, 1, ___]}]] /@ sets]

{{1}, {1, 3}, {1, 4}, {1, 3, 4}}

Pick[sets, MatchQ[Except[{OrderlessPatternSequence[___,  2, ___]}] | 
  {OrderlessPatternSequence[___, 1, ___]}] /@ sets]

{{}, {1}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}

kglr
  • 394,356
  • 18
  • 477
  • 896
1

This solution is in the same spirit as your approach:

Clear@findSets
findSets[list_, all_, any_, none_] := Block[{set},
    SetAttributes[set, {Flat, Orderless}];
    Select[set @@@ list, 
        Function[s,
            !FreeQ[s, set @@ all] &&
            Or @@ (! FreeQ[s, set@#] & /@ any /. {} -> True) &&        
            And @@ (FreeQ[s, set@#] & /@ none)
        ]
   ] /. set -> List
]

Use an empty set if you're not specifying anything. You can also use default values of {} or convert the arguments to options such as Any -> {1,2}, All -> {3}, None -> {4} if that's easier to read. Here's an example usage:

l = Subsets[{1, 2, 3, 4}];
findSets[l, {2}, {3}, {}]
(* {{2, 3}, {1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}} *)

findSets[l, {1, 2}, {}, {4}]
(* {{1, 2}, {1, 2, 3}} *)

findSets[l, {1}, {2, 4}, {3}]
(* {{1, 2}, {1, 4}, {1, 2, 4}} *)
rm -rf
  • 88,781
  • 21
  • 293
  • 472