What is the difference between defining a function and specifying the type of argument, versus applying a test to that argument?

Question

Say I want to create a function that evaluates differently based on what type of argument is given. I've found two ways of doing this:

typefuncs := (f[x_List] := x~PadRight~3;
   f[x_Real] := x^2;
   f[x_Integer] := x - 2;);
testfuncs := (f[x_?ListQ] := x~PadRight~7;
   f[x_?IntegerQ] := x - 20;
   f[x_?NumberQ] := x^2.5;);

Interestingly, they both seem to work, but if you define the function one way first, trying to define it the other way will not stick:

ClearAll[f]
typefuncs
{f[{5}], f[5.7], f[5]}
testfuncs
{f[{5}], f[5.7], f[5]}
(* {{5, 0, 0}, 32.49, 3} *)
(* {{5, 0, 0}, 32.49, 3} *)

versus

ClearAll[f]
testfuncs
{f[{5}], f[5.7], f[5]}
typefuncs
{f[{5}], f[5.7], f[5]}
(* {{5, 0, 0, 0, 0, 0, 0}, 77.5688, -15} *)
(* {{5, 0, 0, 0, 0, 0, 0}, 77.5688, -15} *)

Which is the better practice for defining functions? What is the fundamental difference?

score 21 · Accepted Answer · edited Apr 13 '17 at 12:55

General

When you define a function whose argument patterns are based on a head, like

f[x_List, y_List]:=...

the test happens entirely in the pattern-matcher, without involving the main evaluator. I call such patterns "syntactic". Matching against them is usually faster, often much faster, because the pattern-matcher only needs to inspect the syntactic form of the expression (its FullForm) to establish the match. This can also be viewed as the strongest typing scheme available in Mathematica.

When you define patterns with ? (PatternTest) or /; (Condition), you invite the main evaluator to join the game. So when you define a function like

f[x_?ListQ, y_?NumericQ]:=...,

the pattern-matcher always calls the main evaluator in order to match these patterns. Because of this, such tests are more general, but they can also be significantly slower. They can also induce side effects through the predicate's code, which the main evaluator executes; this can't happen with the _h-style patterns.
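As a minimal sketch of the side-effect point (counter and countingListQ are hypothetical names introduced only for this illustration), a predicate that increments a counter makes it visible when the main evaluator gets called:

```mathematica
ClearAll[counter, countingListQ];
counter = 0;
(* hypothetical predicate: behaves like ListQ, but records every call *)
countingListQ[x_] := (counter++; ListQ[x]);

MatchQ[{1, 2, 3}, _List];            (* head-based test: counter is untouched *)
afterHeadTest = counter;

MatchQ[{1, 2, 3}, _?countingListQ];  (* predicate runs in the main evaluator once *)
afterPatternTest = counter;

{afterHeadTest, afterPatternTest}    (* {0, 1} *)
```

The head-based match leaves the counter alone, while the PatternTest version runs the predicate's code as part of matching.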

Performance

Here is a comparison with a built-in atomic type (strings):

chars = RandomChoice[CharacterRange["a", "z"], 100000];
MatchQ[chars, {___?StringQ}] // AbsoluteTiming
MatchQ[chars, {___String}] // AbsoluteTiming

(* {0.018108, True} *)

(* {0.001687, True} *)

We can see an order of magnitude speed difference. This becomes even worse when the testing function is a little more complex:

ClearAll[f];
symtest  = f /@ Range[100000];
MatchQ[symtest, {___f}] // AbsoluteTiming
MatchQ[symtest, {___?(Function[Head[#] === f])}] // AbsoluteTiming

(* {0.002725, True} *)

(* {0.087275, True} *)

The difference is most dramatic for packed arrays, where, in addition to the usual difference explained above, the pattern-matcher has been specially optimized for certain patterns (such as {___Integer}), so that it performs a constant-time check:

tst = Range[10000000];
MatchQ[tst, {___Integer}] // AbsoluteTiming
MatchQ[tst, {___?IntegerQ}] // AbsoluteTiming

(* {5.*10^-6, True} *)

(* {2.4889, True} *)
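Whether this fast path can apply depends on the list actually being packed. Here is a small sketch for checking that, using Developer`PackedArrayQ and Developer`FromPackedArray (packed and unpacked are illustrative names, not part of the timings above):

```mathematica
packed = Range[1000];                           (* Range over machine integers normally yields a packed array *)
unpacked = Developer`FromPackedArray[packed];   (* same elements, stored as an ordinary unpacked list *)

(* report the packing status of each list *)
Developer`PackedArrayQ /@ {packed, unpacked}

(* both lists match the pattern; only the matching time is expected to differ *)
{MatchQ[packed, {___Integer}], MatchQ[unpacked, {___Integer}]}
```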

Evaluation

The differences outlined above have implications for evaluation control. In particular, for functions which hold their argument, the "syntactic" patterns won't always work. For example:

ClearAll[a, f, g];
a = Range[10];
SetAttributes[{f,g}, HoldFirst];
f[l_List]:=l;
g[l_?ListQ]:=l;
{f[a], g[a]}

(* {f[a], {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}} *)

The point is that, even though a stores a List, f can't check that using only the pattern-matcher. In contrast, g calls the main evaluator, which confirms that a evaluates to a List.

This effect also has another side: when you use a pattern like a_List in a Hold*-function, you can be sure that there will be no evaluation leaks, whereas with tests based on Condition or PatternTest you have to make an extra effort to ensure that (if that is desired). For example:

ClearAll[a];
a := Print["Leak!"]
Cases[Unevaluated[{a, {a, {a}}}], s_List :> Hold[s], Infinity, Heads -> True]

(* {Hold[{a}], Hold[{a, {a}}]} *)

while

Cases[Unevaluated[{a, {a, {a}}}], s_?ListQ :> Hold[s], Infinity, Heads -> True]

(* prints "Leak!" six times during evaluation *)

(* {Hold[{a}], Hold[{a, {a}}]} *)

In this particular case, one would have to write something like this to avoid the leaks:

Cases[
   Unevaluated[{a, {a, {a}}}], 
   s_?(Function[Null, ListQ[Unevaluated[#]], HoldAll]) :> Hold[s], 
   Infinity, 
   Heads -> True
]

(* {Hold[{a}], Hold[{a, {a}}]} *)

Pre-filtering

It is often useful to use the _h patterns for pre-filtering purposes, even if they are not enough by themselves.

Here is an example - first, a simple custom test for a list:

ClearAll[smallListQ];
smallListQ[x_ /; ListQ[x] && Length[x] < 20 && Total[x^2] < 1000] := True;
smallListQ[_] := False

Now, the same test, but with part of the pattern using the _h idiom:

ClearAll[smallListQBetter];
smallListQBetter[x_List /; Length[x] < 20 && Total[x^2] < 1000] := True;
smallListQBetter[_] := False

Compare performance:

smallListQ /@ Range[100000]; // AbsoluteTiming

(* {0.153377, Null} *)

smallListQBetter /@ Range[100000]; // AbsoluteTiming

(* {0.058185, Null} *)

In some cases, the difference may be far greater.

When to use which

Whenever you can get away with purely "syntactic" tests, do use them.
Also, they can often be used when you define a custom data type based on some head, used as a container for data. In those cases, the _h test serves as a test for a given data type.
Often you can't avoid using the _?predicate or x_/;predicate[x] types of patterns, because the tests must call the main evaluator, for whatever reason. Just be aware of the performance and evaluation-control implications of this method.
Often one can combine both styles, using the _h - style patterns as a pre-filtering device. This can improve both the robustness of the code (stronger typing), and the performance.
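To illustrate the custom data type point, here is a minimal sketch in which point is a hypothetical inert head acting as a data container and pointNorm is a hypothetical function defined only for that type:

```mathematica
ClearAll[point, pointNorm];
(* point is a hypothetical inert head used purely as a data container *)
pointNorm[p_point] := Sqrt[p[[1]]^2 + p[[2]]^2];

pointNorm[point[3, 4]]   (* 5 *)
pointNorm[{3, 4}]        (* no rule matches; stays as pointNorm[{3, 4}] *)
```

The _point pattern acts as the type check: a plain list never reaches the function body, and no predicate code has to run to reject it.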

Related discussions

I thought that was you. +1, because you're obviously here just for the points. :) — rcollyer, Dec 10 '15 at 14:59
@Kuba, maybe he accidentally saved the files for his book here instead of his hard drive... :) — J. M.'s missing motivation, Dec 10 '15 at 15:03
@rcollyer I think there are only a handful of people here on the site who still remember my true motives, obviously you being one of them :) — Leonid Shifrin, Dec 10 '15 at 15:04
Yes, and some of them read ol' answers How do you set attributes on SubValues? and Comments, well. — , Dec 10 '15 at 16:46
@LeonidShifrin - is there a good Q&A that discusses the practical differences between using PatternTest and Condition for defining a series of DownValues? Both bring the evaluator into play, but I wonder if there is a reason to prefer f[x_ /; test[x]] over f[x_?test]. — Jason B., Mar 31 '18 at 01:50
@JasonB. [1] I can't recall such a discussion, although it might be hidden somewhere on this site. One clear difference is that Condition can be applied to a pattern involving several variables, like f[x_, y_] /; x < y, while with PatternTest, a similar thing would be more clunky. Another difference is that it is somewhat easier to control evaluation (leaks) with Condition, like e.g. Cases[Unevaluated[expr], s_Symbol /; Context[s] === "System`" :> Hold[s], Infinity], while for PatternTest one would have something like Function[sym, Context[sym] === "System`", HoldFirst], ... — Leonid Shifrin, Mar 31 '18 at 14:32
@JasonB. [2] ... and it is far easier to forget using this long Function syntax with PatternTest - while in Condition, this problem does not exist, because there is no extra parameter-passing stage, where evaluation leak may happen. In terms of speed, PatternTest tends to be slightly faster, but if that difference starts to really be important, then maybe pattern-matching is the wrong tool for that problem altogether. In terms of readability, I personally tend to use PatternTest a lot with predicates, like f[_?EvenQ], it is brief and clear. But this is largely a matter of taste. — Leonid Shifrin, Mar 31 '18 at 14:36
@JasonB. [3] A bit more on evaluation leaks: in my example above, if Context didn't have a HoldFirst attribute, I would have had to use Unevaluated in both examples: s_Symbol /; someFunction[Unevaluated[s]] and Function[sym, someFunction[Unevaluated @ sym], HoldFirst], which is what one generally needs to do to ensure safe passing of arguments to predicates without their premature evaluation (whenever that is important, which is in general not very often). Anyway, I personally tend to use PatternTest pretty much always for single-arg tests, and Condition for multi-arg tests. — Leonid Shifrin, Mar 31 '18 at 14:42
@JasonB.[4] Note also: Condition can be used in an extended rule / function definition syntax with so-called shared local variables, for example x_Integer :> With[{result = x ^2}, f[result] /; result > 100], in which case Condition is still considered a part of the pattern, even though it is used inside With (can also be Module or Block), and operates on a local variable. If the condition is not fulfilled, the whole rule is considered not applicable by the pattern-matcher, which goes on to try the next rule (if available), or returns an expression back, as if no evaluation ... — Leonid Shifrin, Mar 31 '18 at 14:47
@JasonB.[5] ... took place, even though there clearly were intermediate evaluations like result = x ^ 2 happening in the process of resolving the Condition. This is a powerful feature associated specifically with Condition and not PatternTest. — Leonid Shifrin, Mar 31 '18 at 15:19
@LeonidShifrin - thank you (again) for the detailed response. — Jason B., Mar 31 '18 at 20:11
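For reference, the shared-local-variable form of Condition described in the comments can be sketched as follows. The rule is the one quoted in the comment, with the hypothetical inert symbol h in place of f so that the result stays visible:

```mathematica
ClearAll[h];
(* h is a hypothetical inert symbol, used only to make the result visible *)
rule = x_Integer :> With[{result = x^2}, h[result] /; result > 100];

{5, 20} /. rule   (* {5, h[400]} *)
```

For 5 the condition fails, so the rule is treated as not applicable and 5 is returned unchanged, even though result = x^2 was computed along the way.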