Is there a way, using some established Python package (e.g. SciPy) to define my own probability density function (without any prior data, just $f(x) = a x + b$), so I can then make calculations with it (such as obtaining the variance of the continuous random variable)? Of course I could take, say, SymPy or Sage, create a symbolic function and do the operations, but I'm wondering whether instead of doing all this work myself I can make use of an already-implemented package.
Asked
Active
Viewed 3.5k times
33
-
Thanks for an easy way! How do you generate a histogram of random numbers implementing this way of defining custom distribution function? – Ankur Agrawal Jul 31 '17 at 17:44
2 Answers
35
You have to subclass the rv_continuous class in scipy.stats
import scipy.stats as st
class my_pdf(st.rv_continuous):
def _pdf(self,x):
return 3*x**2 # Normalized over its range, in this case [0,1]
my_cv = my_pdf(a=0, b=1, name='my_pdf')
now my_cv is a continuous random variable with the given PDF and range [0,1]
Note that in this example my_pdf and my_cv are arbitrary names (that could have been anything), but _pdf is not arbitrary; it and _cdf are methods in st.rv_continuous one of which must be overwritten in order for the subclassing to work.
-
-
-
There is a problem with the normalization, here: you need to give a normalized probability distribution function (
3*x**2, here), or the resulting random variable yields incorrect results (you can checkmy_cv.median(), for example). I fixed the code. – Eric O. Lebigot Feb 23 '16 at 17:32 -
@EOL i'm finding your use of the term "normalized" confusing. what's needed, i believe, is for the function to be centered at 0 and scaled to 1. but this answer seems to imply that the normalization needs to be over the range of
x[0, 1]. can you clarify? – abcd Feb 24 '16 at 19:32 -
follow-up question: how would i draw random samples from my defined variable? – abcd Feb 24 '16 at 19:36
-
@dbliss: "normalized" in the sense of a normalized probability distribution function: its integral over the range of possible values ([0,1], in this case) should be 1. – Eric O. Lebigot Feb 24 '16 at 20:55
-
1Maybe the standard way is to use
my_cv.rvs()(which can take asizeargument, for obtaining multiple samples in one go). This is what I guess from the documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous). – Eric O. Lebigot Feb 24 '16 at 22:05 -
@EOL gotcha. it wasn't clear to me whether the [0, 1] thing was meant to be specific to this case or a generally necessary thing. – abcd Feb 24 '16 at 23:25
19
You should check out sympy.stats. It provides an interface to deal with random variables. The following example provides a random variable X defined on the unit interval with density 2x
In [1]: from sympy.stats import *
In [2]: x = Symbol('x')
In [3]: X = ContinuousRV(x, 2*x, Interval(0, 1))
In [4]: P(X>.5)
Out[4]: 0.750000000000000
In [5]: Var(X) # variance
Out[5]: 1/18
In [6]: E(2*cos(X)+X**2) # complex expressions are ok too
Out[6]: -7/2 + 4⋅cos(1) + 4⋅sin(1)
If you're interested this abstraction can handle some fairly complex manipulations.
MRocklin
- 3,088
- 20
- 29
-
1Wow... this is just awesome! Thank you very much for this contribution. I'll keep an eye on this and your blog – astrojuanlu Mar 20 '12 at 21:39