Define custom probability density function in Python

Question

Is there a way, using some established Python package (e.g. SciPy) to define my own probability density function (without any prior data, just $f(x) = a x + b$), so I can then make calculations with it (such as obtaining the variance of the continuous random variable)? Of course I could take, say, SymPy or Sage, create a symbolic function and do the operations, but I'm wondering whether instead of doing all this work myself I can make use of an already-implemented package.

Thanks for an easy way! How do you generate a histogram of random numbers implementing this way of defining custom distribution function? — Ankur Agrawal, Jul 31 '17 at 17:44

score 35 · Accepted Answer · edited Feb 25 '16 at 08:13

35

You have to subclass the rv_continuous class in scipy.stats

import scipy.stats as st

class my_pdf(st.rv_continuous):
    def _pdf(self,x):
        return 3*x**2  # Normalized over its range, in this case [0,1]

my_cv = my_pdf(a=0, b=1, name='my_pdf')

now my_cv is a continuous random variable with the given PDF and range [0,1]

Note that in this example my_pdf and my_cv are arbitrary names (that could have been anything), but _pdf is not arbitrary; it and _cdf are methods in st.rv_continuous one of which must be overwritten in order for the subclassing to work.

edited Feb 25 '16 at 08:13

abcd

107
3

answered Mar 16 '12 at 20:01

GertVdE

6,179
1
21
36

@GertVdE: What does "self" in def _pdf do?? – Srivatsan Jun 13 '14 at 12:40
@Srivatsan: see the answer on Stack Overflow – GertVdE Jun 13 '14 at 13:39
There is a problem with the normalization, here: you need to give a normalized probability distribution function (3*x**2, here), or the resulting random variable yields incorrect results (you can check my_cv.median(), for example). I fixed the code. – Eric O. Lebigot Feb 23 '16 at 17:32
@EOL i'm finding your use of the term "normalized" confusing. what's needed, i believe, is for the function to be centered at 0 and scaled to 1. but this answer seems to imply that the normalization needs to be over the range of x [0, 1]. can you clarify? – abcd Feb 24 '16 at 19:32
follow-up question: how would i draw random samples from my defined variable? – abcd Feb 24 '16 at 19:36
@dbliss: "normalized" in the sense of a normalized probability distribution function: its integral over the range of possible values ([0,1], in this case) should be 1. – Eric O. Lebigot Feb 24 '16 at 20:55
1

Maybe the standard way is to use my_cv.rvs() (which can take a size argument, for obtaining multiple samples in one go). This is what I guess from the documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous). – Eric O. Lebigot Feb 24 '16 at 22:05
@EOL gotcha. it wasn't clear to me whether the [0, 1] thing was meant to be specific to this case or a generally necessary thing. – abcd Feb 24 '16 at 23:25

MRocklin · Answer 2 · 2015-03-17T17:48:11.333

19

You should check out sympy.stats. It provides an interface to deal with random variables. The following example provides a random variable X defined on the unit interval with density 2x

In [1]: from sympy.stats import *
In [2]: x = Symbol('x')
In [3]: X = ContinuousRV(x, 2*x, Interval(0, 1))

In [4]: P(X>.5) 
Out[4]: 0.750000000000000

In [5]: Var(X) # variance
Out[5]: 1/18

In [6]: E(2*cos(X)+X**2) # complex expressions are ok too
Out[6]: -7/2 + 4⋅cos(1) + 4⋅sin(1)

If you're interested this abstraction can handle some fairly complex manipulations.

edited Mar 17 '15 at 17:48

answered Mar 20 '12 at 18:11

MRocklin

3,088
20
29

1

Wow... this is just awesome! Thank you very much for this contribution. I'll keep an eye on this and your blog – astrojuanlu Mar 20 '12 at 21:39

Define custom probability density function in Python

2 Answers2

Linked