With the introduction of Dataset in version 10, Mathematica acquired a static type-checking subsystem. A fair number of dataset-related questions here on MSE concern the operation of that type subsystem. Recent examples include (88810) and (88503). The type system is discussed at length in (87479).

It can be difficult to sort out type violations, ascending and descending operators, acceptable operator signatures, and so on. Is there a way to visualize the operation of the type system as an aid to understanding?

WReach

1 Answer

This response defines a function called traceTypes which provides a quick-and-dirty visualization of type system operation. The function is somewhat fragile as it depends upon undocumented implementation details in version 10.2. Despite this fragility, it might be useful for study purposes as it handles many common type system use cases.

The code for the function can be found at the bottom of this post.

Here is a basic usage example, where we ask the type system to determine what would happen if we started with any value with the same type as {1, 2, 3} and then applied the operator Sort /* Reverse /* First:

traceTypes[{1, 2, 3}][Sort /* Reverse /* First]

traceTypes screenshot

The screenshot shows the basic features. The Operator column shows the original operator, along with the various suboperators. The Sig column shows whether the operator has any argument type signature rules. If we hover over a Y, a tooltip shows those rules. The A/D column indicates whether the operator is an ascending or descending operator (it presently does not identify mixed operators). Arg Types describes the input argument types for each operator. Result Type shows the final type produced by each operator.

So, reading the last row, we see that the First operator has two valid signatures, is an ascending operator, and will convert a 3-vector of integers into a scalar integer.
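
For comparison, the same composite operator can be applied to a concrete list. This is only a small sketch for orientation (the name op is illustrative and not part of the code in this post): the trace predicts a scalar integer, and ordinary evaluation agrees.

```wolfram
(* op is a hypothetical name for the composite operator from the example *)
op = Sort /* Reverse /* First;   (* right composition: Sort, then Reverse, then First *)

op[{1, 2, 3}]             (* 3 *)
IntegerQ[op[{1, 2, 3}]]   (* True *)
```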

As a second example, consider what happens if we apply Map[#class&] to the Titanic data:

data = ExampleData[{"Dataset", "Titanic"}] // Normal;
traceTypes[data][Map[#class&]]

traceTypes screenshot

If we hover over the E data type for the class field, then we see that it is an enumeration of three values. The overall trace shows us that we reduce a list of 1309 "structs" to a list of 1309 enumeration values.
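
To relate the trace to actual values, the operator can also be evaluated on the data itself. The following is a sketch only (classes is an illustrative name); it assumes the Titanic example data is available through ExampleData, as in the example above.

```wolfram
data = ExampleData[{"Dataset", "Titanic"}] // Normal;

(* classes is a hypothetical name for the mapped result *)
classes = Map[#class &, data];

Length[classes]   (* the trace reports a list of 1309 elements *)
Union[classes]    (* the distinct values behind the enumeration type *)
```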

A trace can tell us if we make a type error. For example, let's say we try to extract the non-existent class field from the top-level list:

traceTypes[data][#class &]

traceTypes screenshot

We can see that the type system will not permit the operation, claiming a problem with "slot 1".

For more complex examples, this failure information combined with the detailed operator breakdown can be quite useful for diagnosing type errors. For example, consider the problem reported in (88810):

traceTypes[{<|"a" -> 1|>}][Merge[Identity] /* (#a &)]

traceTypes screenshot

The system accepted Merge[Identity], but the resulting type was unknown. Then, when the #a& was applied to that unknown type, the system rejected it as invalid. But a simple change of operator from #a& to #["a"]& makes the operation succeed:

traceTypes[{<|"a" -> 1|>}][Merge[Identity] /* (#["a"] &)]

traceTypes screenshot

The resultant data type remains unknown, however.
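
Outside of Dataset, on ordinary expressions, both slot forms evaluate without trouble, which suggests that the rejection seen in the trace comes from static type inference rather than from the operators themselves. A minimal sketch (merged is an illustrative name):

```wolfram
(* merged is a hypothetical name; Merge[Identity] collects the values for each key into a list *)
merged = Merge[Identity][{<|"a" -> 1|>}]   (* <|"a" -> {1}|> *)

(#a &)[merged]       (* {1} *)
(#["a"] &)[merged]   (* {1} *)
```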

A trace can reveal the operator-rewriting that occurs when Query is used:

traceTypes[{<|"a" -> 1|>}][Query[Transpose]]

traceTypes screenshot

The repetition of the AssociationTranspose line is an artifact of duplicate evaluation steps within the trace of type system code evaluation.
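
The rewritten operator can also be inspected without a trace. The traceTypes definition for datasets relies on applying Normal to a Query expression, so the same step can be tried on its own. This is a sketch; the exact form of the result depends on undocumented details and may differ between versions.

```wolfram
(* Normal converts the Query into the operator that is actually applied;
   the form of the output is version-dependent *)
Query[Transpose] // Normal
```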

Once again, the type system will catch our mistake if we attempt to transpose something that cannot be transposed:

traceTypes[<|"a" -> 1|>][Query[Transpose]]

traceTypes screenshot

traceTypes will operate upon datasets directly, in which case a sequence of operators can be applied, Query-style:

dataset = ExampleData[{"Dataset", "Titanic"}];
traceTypes[dataset][GroupBy["class"], Min, "age"]

traceTypes screenshot

Without further ado, the code follows...


Code

(* Caveat emptor:
   This code is designed to work with version 10.2.
   It uses undocumented functionality that could change at any time.
*)

Dataset; (* to force auto-loading *)
Needs["TypeSystem`"]
Needs["Dataset`"]

ClearAll[traceTypes]

traceTypes[ds_Dataset][op__] := traceTypes[ds // Normal][Query[op] // Normal]

traceTypes[args___][op_] :=
  With[{a0 = TypeSystem`Inference`PackagePrivate`apply0}
  , Module[{result = <||>, n = 0, s = {}, in, out, row}
    , SetAttributes[{in, out}, HoldAllComplete]
    ; in[HoldForm@a0[x_, a_]] :=
        ( ++n; s = {n, s}
        ; AppendTo[result, n -> <| "op" -> x, "args" -> a, "type" -> Null |>]
        )
    ; out[HoldForm@_a0, r_] :=
        (AppendTo[result[First@s], "type" -> HoldForm[r]]; s = Last@s)
    ; out[x:HoldForm@Catch[_a0, f:FailureType], FailureType[{_, r_}, ___]] :=
        AppendTo[result[First@s], "type" -> HoldForm[f[r]]]
    ; row =
        { #op
        , Signatures[#op] /. {s:_[{__}] :> Tooltip["Y", s], _ -> Null}
        , If[DescendingQ[#op], "Desc", "Asc"]
        , Row[#args, " \[Times] "]
        , #type
        }&
    ; ResetTypeApplyCache[]
    ; TraceScan[in, TypeApply[op, DeduceType /@ {args}], _a0 | _Throw | _Catch, out]
    ; result //
        Query[KeySort /* Values /* Map[row]] //
        Prepend[Style[#, Bold]& /@
          {"Operator", "Sig", "A/D", "Arg Types", "Result Type"}] //
        Grid[#, Frame -> All]&
    ]
  ]
WReach
  • This will be very useful. +1 – ciao Jul 25 '15 at 07:29
  • This is great, big +1. – Leonid Shifrin Jul 25 '15 at 14:55
  • @LeonidShifrin Can you really give bigger +1's than the rest of us? You make me a bit nervous. – Daniel Lichtblau Jul 26 '15 at 15:36
  • @DanielLichtblau Well, I just meant that I would give it more if the voting system allowed more detailed voting than a binary decision (well, ternary, if we include downvoting, but I don't use that). The "big" was only to compare to my own voting on some other q/a-s, not to how the others vote. So you can relax :). – Leonid Shifrin Jul 26 '15 at 19:34
  • @LeonidShifrin Thanks for the explanation. I was feeling early signs of upvote checkmark envy. – Daniel Lichtblau Jul 26 '15 at 22:35
  • Really great answer! I think this will be very useful. – Stefan R Jul 27 '15 at 18:47
  • By the way, while debugging some of the TypeSystem bugs, I wrote my own tool, also based on TraceScan. But it was more like a traditional debugger: it displayed more details, but was much less visual than yours. If / when time permits, I will publish that code too. – Leonid Shifrin Jul 28 '15 at 09:29
  • @Leonid I look forward to the prospect. – WReach Jul 28 '15 at 14:25
  • @WReach Off topic: do you indent the very first argument with a tab (whatever the default indent is) plus two spaces if it can't be kept next to the header? – Kuba Feb 09 '17 at 10:20
  • @Kuba see my comments in chat. – WReach Feb 09 '17 at 15:44
  • If I run the commands in 12.0.0, this still works, stably and fast. There are many differences in the output. Some are good, such as fewer UnknownType fields, but there are other differences: I get Vector instead of {}, and Atom rather than the E or I symbols. – Steffen Jaeschke May 17 '22 at 08:43