Consider the following list of 4215 sublists (note that the file as displayed on GitHub is truncated, but there is a link to the full version). Each of the 4215 sublists contains one or more cst[i] entries for some integer i. I would like to set to zero / remove the maximum number of distinct cst[i] while still keeping at least one cst[i] in each of the 4215 sublists. In particular, I need to know which of the cst[i] are dropped in the process. How can I find this list of cst[i] efficiently?

EDIT:

I was asked in the comments to illustrate my problem on a simple example. Consider the following toy list:

myList= {{cst[1],cst[2],cst[3]},{cst[1]},{cst[2],cst[3]}}

This toy list contains three sublists. The task is to drop the maximum number of cst[i] such that each sublist still contains at least one cst[i]. By direct inspection we see that the second sublist consists of the single element cst[1], so cst[1] definitely cannot be removed. The first and third sublists have more entries, though, and we see that either cst[2] or cst[3] can be dropped (but not both) while still satisfying the condition. Therefore the output might look like

WhichCanBeDropped[myList]

{cst[2]}

Or, it could look like

WhichCanBeDropped[myList]

{cst[3]}

Both results would be considered equivalent.

EDIT2:

Another case of interest would be a list whose sublists each contain two or more elements cst[i], for example:

myList= {{cst[1],cst[2],cst[3]},{cst[1],cst[4]},{cst[2],cst[3],cst[4]}}

Now we want to remove the maximum number of elements cst[i] such that each sublist still contains at least two cst[i] entries. In this new toy list the answer would again be that cst[2] xor cst[3] can be dropped.

Kagaratsch

1 Answer

Generalization

Your second example is easy to accommodate, and while I'm at it I'll wrap up my code as reusable functions. First a function to convert your data to a binary matrix. I shall assume that the input to the function will be e.g. {{1, 2, 3}, {1}, {2, 3}} but I include two external methods to strip the cst from your lists.

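toArray[dat : {{__Integer} ..}] /; Min[dat] > 0 :=
  Module[{m},
    (* one row per cst index, one column per sublist *)
    m = ConstantArray[0, {Max @ dat, Length @ dat}];
    (* sublist number k with indices {i, j, ...} sets m[[{i, j, ...}, {k}]] to 1 *)
    MapIndexed[(m[[##]] = 1) &, dat];
    m
  ]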

Applied to your first input:

myList1 = {{cst[1], cst[2], cst[3]}, {cst[1]}, {cst[2], cst[3]}};

myList1[[All, All, 1]] // toArray // MatrixForm    

$\left( \begin{array}{ccc} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \end{array} \right)$

Each row of this matrix corresponds to one cst[i] and each column to one sublist. The question is then which rows we can delete without causing any column total to fall below a specified value. To speed up the test I Total the array (by column) first, then subtract each row from that total. The minimum of each difference is compared against the reference n, and the positions that pass are found using the fast numeric operations UnitStep and SparseArray.

canBeDropped[m_?MatrixQ, n : _Integer?Positive : 1] :=
  With[{tot = Total[m]},
    Map[Min[tot ~Subtract~ #] &, m] - n //
      UnitStep // SparseArray[#]["AdjacencyLists"] &
  ]
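To make the individual steps easier to follow, here is a minimal sketch that spells out the intermediate values for the first example with n = 1. The names left and mask are only illustrative, and Position stands in for the SparseArray step used above; it is meant as a walk-through, not as a replacement for canBeDropped.

```mathematica
(* Binary matrix for {{1, 2, 3}, {1}, {2, 3}}: rows are cst indices, columns are sublists *)
m = {{1, 1, 0}, {1, 0, 1}, {1, 0, 1}};

tot = Total[m]                   (* entries per sublist: {3, 1, 2} *)
left = Map[Min[tot - #] &, m]    (* smallest sublist size after dropping each row: {0, 1, 1} *)
mask = UnitStep[left - 1]        (* 1 where every sublist keeps at least n = 1 entry: {0, 1, 1} *)
Flatten @ Position[mask, 1]      (* {2, 3} *)
```

Row 1 (cst[1]) fails because dropping it would leave the second sublist empty.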

The complete process for your first example:

myList1[[All, All, 1]] // toArray // canBeDropped
{2, 3}

And for your second:

myList2 = {{cst[1],cst[2],cst[3]},{cst[1],cst[4]},{cst[2],cst[3],cst[4]}};

toArray[myList2 /. cst -> Identity] ~canBeDropped~ 2
{2, 3}

Observe that /. cst -> Identity is another way to prepare your data for toArray. Depending on your needs you may also wish to look at ArrayComponents for this task, though be sure you understand what it is doing before you adopt it.
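Note that canBeDropped reports the entries that can each be removed on their own: in both examples {2, 3} means cst[2] or cst[3], not both at once. As a rough sketch of how a set that can be removed simultaneously might be built, the loop below drops rows one at a time and updates the column totals as it goes. The name dropGreedy is hypothetical and not part of the code above, and a greedy pass like this only guarantees that no further entry can be added to the result; it is not guaranteed to find the largest possible set.

```mathematica
(* dropGreedy is a hypothetical helper, not part of the original answer.
   m: binary matrix from toArray (rows = cst indices, columns = sublists)
   n: minimum number of entries each sublist must keep *)
dropGreedy[m_?MatrixQ, n : _Integer?Positive : 1] :=
  Module[{tot = Total[m], dropped = {}},
    Do[
      (* drop row i only if every sublist still keeps at least n entries *)
      If[Min[tot - m[[i]]] >= n,
        tot -= m[[i]];
        AppendTo[dropped, i]],
      {i, Length[m]}];
    dropped
  ]

dropGreedy[{{1, 1, 0}, {1, 0, 1}, {1, 0, 1}}]                 (* {2} *)
dropGreedy[{{1, 1, 0}, {1, 0, 1}, {1, 0, 1}, {0, 1, 1}}, 2]   (* {2} *)
cst /@ dropGreedy[{{1, 1, 0}, {1, 0, 1}, {1, 0, 1}}]          (* {cst[2]} *)
```

The last line maps the integer positions back to the cst[i] form used in the question. Rows are visited in index order, so a different order may return a different (equally valid) set, such as {3} instead of {2}.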

Mr.Wizard