33

Is there a way, using some established Python package (e.g. SciPy) to define my own probability density function (without any prior data, just $f(x) = a x + b$), so I can then make calculations with it (such as obtaining the variance of the continuous random variable)? Of course I could take, say, SymPy or Sage, create a symbolic function and do the operations, but I'm wondering whether instead of doing all this work myself I can make use of an already-implemented package.

abcd
  • 107
  • 3
astrojuanlu
  • 1,187
  • 2
  • 12
  • 25
  • Thanks for an easy way! How do you generate a histogram of random numbers implementing this way of defining custom distribution function? – Ankur Agrawal Jul 31 '17 at 17:44

2 Answers2

35

You have to subclass the rv_continuous class in scipy.stats

import scipy.stats as st

class my_pdf(st.rv_continuous):
    def _pdf(self,x):
        return 3*x**2  # Normalized over its range, in this case [0,1]

my_cv = my_pdf(a=0, b=1, name='my_pdf')

now my_cv is a continuous random variable with the given PDF and range [0,1]

Note that in this example my_pdf and my_cv are arbitrary names (that could have been anything), but _pdf is not arbitrary; it and _cdf are methods in st.rv_continuous one of which must be overwritten in order for the subclassing to work.

abcd
  • 107
  • 3
GertVdE
  • 6,179
  • 1
  • 21
  • 36
  • @GertVdE: What does "self" in def _pdf do?? – Srivatsan Jun 13 '14 at 12:40
  • There is a problem with the normalization, here: you need to give a normalized probability distribution function (3*x**2, here), or the resulting random variable yields incorrect results (you can check my_cv.median(), for example). I fixed the code. – Eric O. Lebigot Feb 23 '16 at 17:32
  • @EOL i'm finding your use of the term "normalized" confusing. what's needed, i believe, is for the function to be centered at 0 and scaled to 1. but this answer seems to imply that the normalization needs to be over the range of x [0, 1]. can you clarify? – abcd Feb 24 '16 at 19:32
  • follow-up question: how would i draw random samples from my defined variable? – abcd Feb 24 '16 at 19:36
  • @dbliss: "normalized" in the sense of a normalized probability distribution function: its integral over the range of possible values ([0,1], in this case) should be 1. – Eric O. Lebigot Feb 24 '16 at 20:55
  • 1
    Maybe the standard way is to use my_cv.rvs() (which can take a size argument, for obtaining multiple samples in one go). This is what I guess from the documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous). – Eric O. Lebigot Feb 24 '16 at 22:05
  • @EOL gotcha. it wasn't clear to me whether the [0, 1] thing was meant to be specific to this case or a generally necessary thing. – abcd Feb 24 '16 at 23:25
19

You should check out sympy.stats. It provides an interface to deal with random variables. The following example provides a random variable X defined on the unit interval with density 2x

In [1]: from sympy.stats import *
In [2]: x = Symbol('x')
In [3]: X = ContinuousRV(x, 2*x, Interval(0, 1))

In [4]: P(X>.5) 
Out[4]: 0.750000000000000

In [5]: Var(X) # variance
Out[5]: 1/18

In [6]: E(2*cos(X)+X**2) # complex expressions are ok too
Out[6]: -7/2 + 4⋅cos(1) + 4⋅sin(1)

If you're interested this abstraction can handle some fairly complex manipulations.

MRocklin
  • 3,088
  • 20
  • 29
  • 1
    Wow... this is just awesome! Thank you very much for this contribution. I'll keep an eye on this and your blog – astrojuanlu Mar 20 '12 at 21:39