I want to know what Dispatch actually does to a list of rules. Why is it so fast?

The documentation says:

Dispatch generates a dispatch table which uses hash codes to specify which sets of rules need actually be scanned for a particular input expression.

What is a hash table? How does it specify which sets of rules need actually be scanned for a particular input expression?

What about DeleteCases? It is fast too, so does it also use a hash table?

matheorem
  • I think this question is likely off-topic as it is about a common data structure. See hash table. – Mr.Wizard Nov 22 '13 at 11:42
  • @Mr.Wizard You are right, but I think this question could attract good answers about Mma internals. Let's see. – Dr. belisarius Nov 22 '13 at 12:11
  • @Mr.Wizard Yeah, I admit it is a little off-topic. I was also wondering whether DeleteCases uses a hash table, because it is fast too. – matheorem Nov 22 '13 at 13:14
  • @rm-rf For the case where the r.h.s. of DeleteCases is an alternative pattern (Alternatives[elems]) and elems do not contain patterns, DeleteCases has been optimized since V8 to work really fast, even for a large number of elems. I strongly suspect that a hash table is used internally in this case. And this may well be what the OP means here, since I do recall some comments by the OP related to this particular issue. – Leonid Shifrin Nov 22 '13 at 15:18
  • @LeonidShifrin Wouldn't that then be an improvement to Alternatives? I thought DeleteCases by itself was just a tree traversal? – rm -rf Nov 22 '13 at 15:22
  • @rm-rf It would, if it were implemented consistently across all functions that work with Alternatives. Since this change was made specifically for DeleteCases, it is de facto an improvement to DeleteCases. – Leonid Shifrin Nov 22 '13 at 15:24
  • @LeonidShifrin Ah, ok then. I thought that it was quite random to pair up DeleteCases with hash tables, but what you say makes sense. Thanks. – rm -rf Nov 22 '13 at 15:26
  • @rm-rf No problem :) – Leonid Shifrin Nov 22 '13 at 16:28
  • As @LeonidShifrin (and the original poster and maybe others) surmise, that special case of DeleteCases uses a hash table. Cases likewise. – Daniel Lichtblau Nov 22 '13 at 19:17
  • @DanielLichtblau I should have probably mentioned that this optimization was your contribution. Very useful thing. – Leonid Shifrin Nov 22 '13 at 19:35
  • @DanielLichtblau Do you care to post that as an answer? – Dr. belisarius Nov 24 '13 at 23:27
  • @belisarius Okay, done. For what little I could say about the main part of the question, I might as well have deferred to Shirley... – Daniel Lichtblau Nov 26 '13 at 15:32

1 Answer

As @LeonidShifrin (and the original poster and maybe others) surmise, that special case of DeleteCases uses a hash table. Cases likewise.
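
For concreteness, the special case in question is the one where the pattern is an Alternatives of literal (pattern-free) elements. The following is only a usage sketch: the data sizes and the names data, elems, member, r1 and r2 are made up for illustration, and the Association-based reference version (which needs version 10 or later) is just one way to write an explicit hash lookup, not a description of what DeleteCases does internally.

```mathematica
(* Illustrative data; the sizes are arbitrary *)
SeedRandom[1];
data = RandomInteger[{1, 10^6}, 10^5];
elems = RandomInteger[{1, 10^6}, 10^4];

(* Literal alternatives: the special case discussed above *)
r1 = DeleteCases[data, Alternatives @@ elems];

(* Reference version with an explicit hash-based membership test;
   Association requires version 10 or later *)
member = Association[Thread[elems -> True]];
r2 = Select[data, ! KeyExistsQ[member, #] &];

(* Both approaches should remove exactly the same elements *)
r1 === r2
```

Wrapping either computation in AbsoluteTiming gives a rough idea of how the two compare on a given version.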

This does not really address the original question, to wit, how hashing is used behind the scenes with dispatch tables. In truth I do not know the answer to that. I will surmise that subexpressions are hashed and that only those dispatch table rules whose left-hand sides have the same hash value are tried. But that's just a wild guess.
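
To make that guess concrete, here is a minimal sketch of the general idea of bucketing rules by the hash of their left-hand sides. It is purely an illustration and not how Dispatch is actually implemented. The names rules, buckets, candidates and bucketReplace are hypothetical, GroupBy needs version 10 or later, and the approach only works for literal left-hand sides, since a pattern such as x_ does not hash to the same value as the expressions it matches.

```mathematica
(* Hypothetical illustration only; NOT the actual Dispatch implementation.
   Assumes a, b, c, f and x have no values assigned. *)
rules = {a -> 1, b -> 2, f[x] -> 3};

(* Group the rules into buckets keyed by the hash of each left-hand side *)
buckets = GroupBy[rules, Hash[First[#]] &];

(* Only rules whose left-hand side has the same hash as expr are candidates *)
candidates[expr_] := Lookup[buckets, Hash[expr], {}];

(* Try just the candidate rules on the whole expression *)
bucketReplace[expr_] := Replace[expr, candidates[expr]];

bucketReplace /@ {a, b, c, f[x]}
(* {1, 2, c, 3} *)
```

The point of the sketch is that each input is compared against the rules in one bucket only, rather than being scanned against the whole list, which is the kind of saving the documentation's phrase about hash codes suggests.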

Daniel Lichtblau