In my algorithm I need to maintain a set (an unordered list of distinct elements) of expressions supporting two operations:

  • Test an expression for membership in the set
  • Add a new expression to the set

Expressions are to be compared using SameQ. The set can have hundreds of thousands of elements and I want it to work as fast as possible. In most programming languages I would use a hash-table or a balanced tree to implement such a set. Is there any better data structure in Mathematica for this purpose than a plain List? Is it worth trying to manually implement a better structure?

Vladimir Reshetnikov
  • Try linked lists. – Spawn1701D Jun 07 '13 at 20:13
  • You can use a hash table implicit in DownValues, just by introducing some symbol (say, presentQ). The starting definition is presentQ[_] = False. Then, adding is as simple as presentQ[expr] = True, and presentQ itself tests for membership. This seems the easiest option. You can also use System`Utilities`HashTable as an alternative. – Leonid Shifrin Jun 07 '13 at 20:47
  • @LeonidShifrin Using downvalues is a simple and great idea! I should have realized this myself. – Vladimir Reshetnikov Jun 07 '13 at 21:46
  • @VladimirReshetnikov This is the standard and most common way to implement this sort of thing. Sometimes one can also use SubValues, although the difference is mostly syntactic. But I have not seen a clear exposition in the documentation explaining that hash table functionality in Mathematica is most easily achieved via DownValues. – Leonid Shifrin Jun 07 '13 at 22:10
  • @LeonidShifrin I implemented the approach you suggested, but later found a bug in my implementation: when expr is a pattern, the plain presentQ[expr] = True does not have the intended meaning. The fix is to use presentQ[Verbatim[expr]] = True instead. – Vladimir Reshetnikov Jun 25 '13 at 01:26
  • @VladimirReshetnikov Yes, sure. This is well known to me. It just did not cross my mind that you could have patterns among your expressions; this is rather atypical. – Leonid Shifrin Jun 25 '13 at 08:37

1 Answer

Per Leonid's comment:

You can use a hash table implicit in DownValues, just by introducing some symbol (say, presentQ). Starting definition is presentQ[_] = False. Then, adding is as simple as presentQ[expr] = True, and presentQ itself tests for membership. This seems the easiest option. You can also use System`Utilities`HashTable as an alternative.
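For concreteness, a minimal sketch of the DownValues approach described in the comment might look like the following. The symbol name presentQ comes from the comment; the sample expressions (a + b, f[x, 1], f[x, 2]) are made up for illustration. Note that expressions are evaluated before they are stored, which is usually what you want for a set of ordinary expressions.

```mathematica
(* Start with an empty set: by default nothing is a member *)
ClearAll[presentQ];
presentQ[_] = False;

(* Add elements: each one becomes a specific down value of presentQ *)
presentQ[a + b] = True;
presentQ[f[x, 1]] = True;

(* Membership test: just call presentQ *)
presentQ[a + b]     (* True *)
presentQ[f[x, 2]]   (* False, never added *)
```

Specific definitions are tried before the general presentQ[_] rule, so the catch-all only fires for expressions that were never added.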

However, Vladimir notes:

When expr is a pattern, the plain presentQ[expr] = True does not have the intended meaning. The fix is to use presentQ[Verbatim[expr]] = True instead.
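A small sketch of the problem and the fix, assuming the set is allowed to contain pattern expressions such as f[x_] (the expression f[x_] is only an illustrative choice):

```mathematica
(* Buggy version: f[x_] on the left-hand side is treated as a pattern *)
ClearAll[presentQ];
presentQ[_] = False;
presentQ[f[x_]] = True;
presentQ[f[1]]      (* True, although f[1] was never added *)

(* Fixed version: Verbatim makes the stored expression match only literally *)
ClearAll[presentQ];
presentQ[_] = False;
presentQ[Verbatim[f[x_]]] = True;
presentQ[f[x_]]     (* True *)
presentQ[f[1]]      (* False *)
```

If patterns can never occur among your expressions, the plain form without Verbatim is enough.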

I would also add that the new Association data structure in Mathematica 10 is likely to be a faster and perhaps more convenient approach than using either downvalues or the System`Utilities`HashTable.
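As a rough sketch of the Association variant (requires version 10 or later; the variable name set and the sample keys are only illustrative), the expressions are used as keys and the values are ignored:

```mathematica
(* Empty association used as a set: keys are the elements *)
set = <||>;

(* Add elements *)
set[a + b] = True;
AssociateTo[set, f[x_] -> True];

(* Membership test *)
KeyExistsQ[set, a + b]   (* True *)
KeyExistsQ[set, c]       (* False *)
KeyExistsQ[set, f[1]]    (* False: keys are looked up literally, not as patterns *)
```

Since keys are looked up literally rather than by pattern matching, the Verbatim workaround should not be needed here. I have not benchmarked this against the downvalues version, so it is worth timing both on your own data.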

Oleksandr R.