
Functions like Mean and RandomVariate clearly infer the dimension of the distribution passed to them. One can usually determine the dimension of a distribution by calling one of these functions and inspecting the result, but this is suboptimal: RandomVariate does not work if some of the distribution's parameters are symbolic, and long-tailed distributions may not have a well-defined mean. Even when this method works, it is overkill. Presumably Mean and RandomVariate themselves rely on some lower-level function that just determines the dimension, but I have not been able to find it.

Daniel Mahler

1 Answer


You can use DistributionDomain to find the domain of a distribution, which will also tell you the dimension.

I do not know where this is documented, but it does appear in some examples in the documentation.

Usage examples:

DistributionDomain[NormalDistribution[]]
(* Interval[{-∞, ∞}] *)

DistributionDomain[ParetoDistribution[xmin, alpha]]
(* Interval[{xmin, ∞}] *)

DistributionDomain[MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}]]
(* {Interval[{-∞, ∞}], Interval[{-∞, ∞}]} *)

data = RandomReal[1, 10]
(* {0.60996, 0.615194, 0.106301, 0.543126, 0.812796, 0.711574, 0.814802, 0.839422, 0.0528327, 0.40623} *)

DistributionDomain[EmpiricalDistribution[data]]
(* {0.0528327, 0.106301, 0.40623, 0.543126, 0.60996, 0.615194, 0.711574, 0.812796, 0.814802, 0.839422} *)

% == Sort[data]
(* True *)

DistributionDomain[ZipfDistribution[rho]]
(* Range[1, ∞] *)

DistributionDomain[ZipfDistribution[10, rho]]
(* {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} *)

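Notice that a simple Length@DistributionDomain[...] isn't sufficient to determine the dimension: for a discrete or empirical distribution the domain is itself a list of values, as in the EmpiricalDistribution and ZipfDistribution[10, rho] examples above. There are helper functions to determine whether a distribution (or its domain) is univariate.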

  • Statistics`Library`UnivariateDomainSpecificationQ can be applied to a domain specification

  • Statistics`Library`UnivariateDistributionQ can be applied to a distribution and is based on the function above.
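
To make the Length pitfall concrete, here is a small sketch that uses only DistributionDomain and the example distributions shown earlier. The values in the comments follow from the domains listed above; treat them as an illustration rather than a guarantee across versions.

```mathematica
(* Univariate continuous: the domain is a single Interval, so Length gives 1 *)
Length@DistributionDomain[NormalDistribution[]]

(* Bivariate: the domain is a list of two Intervals, so Length gives 2 *)
Length@DistributionDomain[MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}]]

(* Univariate discrete with finite support: the domain is the list of the
   10 support points, so Length gives 10 even though the dimension is 1 *)
Length@DistributionDomain[ZipfDistribution[10, rho]]
```

The third case is the one that breaks the naive approach: the result is indistinguishable from what a 10-dimensional distribution would give.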

There is also Statistics`Library`Dump`HeldDistributionDomain, which prevents Range from expanding in the domain of some discrete distributions, for example:

Statistics`Library`Dump`HeldDistributionDomain[ZipfDistribution[10, rho]]
(* Hold[Range][1, 10] *)

Looking at its definition, it simply uses Block to temporarily prevent Range from evaluating. You can do this manually yourself to reduce the reliance on private internal functions that might not even be loaded in a fresh kernel (until something else triggers loading them).
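
A minimal sketch of doing this by hand might look like the following. The name heldDomain is hypothetical, and this is a guess at the technique based on the description above, not a copy of the internal definition; it assumes the domain is built with Range, as it is for ZipfDistribution[10, rho].

```mathematica
(* heldDomain is a hypothetical helper, not a built-in function.
   Block temporarily clears the definition of Range, so DistributionDomain
   returns an unexpanded Range[...] expression. Wrapping the head in Hold
   before leaving the Block keeps it from expanding afterwards. *)
heldDomain[dist_] :=
 Block[{Range},
  DistributionDomain[dist] /. Range -> Hold[Range]
 ]

heldDomain[ZipfDistribution[10, rho]]
```

If the assumption holds, the result should have the same Hold[Range][...] form as the HeldDistributionDomain output shown above. Distributions whose domain does not involve Range pass through unchanged.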


As Andy Ross mentioned in the comments, Statistics`Library`DistributionDimensionality will directly return the dimensionality of the domain.


Warning: As with all undocumented functions that are not in the System` context there's no guarantee of reliability or that they'll work in future versions.

Szabolcs
  • I've never used this function before, I found it using some spelunking when I saw your question. – Szabolcs Apr 17 '14 at 18:43
  • There is also Statistics`Library`DistributionDimensionality but I make no promises that it is robust. – Andy Ross Apr 17 '14 at 19:54
  • @Andy I think you're in a much better position to answer this ... Is DistributionDomain robust? It's not documented but it is in the System context. – Szabolcs Apr 17 '14 at 20:03
  • Looks like @AndyRoss's DistributionDimensionality handles the corner cases that break DistributionDomain as @Szabolcs pointed out in the main answer – Daniel Mahler Apr 17 '14 at 20:25
  • Together with Andy's comment this answers my question. I guess if @AndyRoss had written a separate answer I would have accepted that since DistributionDimensionality is really what I had in mind, assuming it is robust. Gotta love undocumented functions in closed software ;) – Daniel Mahler Apr 17 '14 at 20:43
  • @AndyRoss & @Szabolcs How did you find these functions in the first place? How can I search/list all the functions in Statistics`Library`? – Daniel Mahler Apr 19 '14 at 22:28
  • @DanielMahler Evaluate ?Statistics`Library`*. Keep in mind that anything that's not documented and not in System` might go away/change in the next version, might crash your kernel, or might give you a wrong result. Yes, these things do actually happen. I found DistributionDomain by making a good guess and searching for ?*Domain*. – Szabolcs Apr 19 '14 at 23:21
  • Thanks @Szabolcs! I did not realize that ? supports pattern matching. That is very useful. – Daniel Mahler Apr 19 '14 at 23:36
  • @DanielMahler ?something* will search in all contexts that are in $ContextPath and ?*`something* will search in all contexts. The latter tends to return a lot of internal stuff that's not useful, so there's more to wade through. The warnings I gave you about undocumented/internal stuff are not meant to deter you, I sometimes use these too. But the problems can and do happen (I've been bitten several times.) – Szabolcs Apr 19 '14 at 23:56
  • Thanks again @Szabolcs. I appreciate your warnings about Statistics`Library`, but it looks like I will have to live in the land of undocumented functions if I actually want to get useful things done with Mathematica, as, by the sounds of it, do you. I have absolutely no objections to more functions getting tested, documented and made official though :) – Daniel Mahler Apr 20 '14 at 00:06
  • Also @Szabolcs, how did you figure out what parameters it takes once you found it? – Daniel Mahler Apr 20 '14 at 01:03
  • @Daniel Well, for DistributionDomain the first guess is pretty obvious and it worked. But you can always try a bit of spelunking to find interesting stuff ;-) Use the (newer) code from GitHub. – Szabolcs Apr 20 '14 at 01:07
  • @Szabolcs I believe DistributionDomain is fairly robust, but the representation of domains may change in the future. Some things are pretty wonky, like held ranges to infinity, and probably need different symbols to represent them. – Andy Ross Apr 20 '14 at 01:20