Is there any way to compile and export LearnedDistribution?

Question

I use the LearnDistribution function to learn the distribution of some data. Then, I want to export some code to sample from that learned distribution. I tried to use Compile as follows

compFun=Compile[{}, RandomVariate[ld]]

where ld is a LearnedDistribution. However, this fails and the function cannot be compiled and exported. Is there any way to export a function to sample from a LearnedDistribution so that it can be used in another language, like Python?

I think you may be out of luck: Which Distributions can be Compiled using RandomVariate and List of compilable functions. — MarcoB, Mar 25 '21 at 20:25
Is the resulting LearnedDistribution a kernel density distribution? If so, getting the pieces to obtain random samples in an external program should be relatively straightforward. — JimB, Mar 25 '21 at 21:49
@JimB Yes, the result uses kernel density estimation. How would I go about exporting it? If the LearnedDistribution were a mixture of Gaussians, I thought about exporting the mean and cov. of each Gaussian distribution. — sepehr78, Mar 26 '21 at 06:29
@MarcoB Thanks for the links. It seems that the issue is with the preprocessing stage because that is what the error indicates cannot be compiled. It seems like in general the ML functions like Classify and Predict also cannot be compiled, right? — sepehr78, Mar 26 '21 at 06:31

JimB · Accepted Answer · 2021-03-27T23:38:12.597

If you use the option Method -> "KernelDensityEstimation" in LearnDistribution, then the estimated bandwidth can be extracted. Call that value bw. (And I'm assuming that a Gaussian kernel will be used.)

(* Generate some data *)
SeedRandom[12345];
data = RandomVariate[NormalDistribution[0, 1], 100];
(* Use LearnDistribution  *)
ld = LearnDistribution[data, Method -> "KernelDensityEstimation"]

Based on the answer from @MarcoB one can obtain the bandwidth of the Gaussian kernel with

bw = First[ld]["Model", "KernelSize"]
(* 0.510338 *)

With the bandwidth (i.e., "KernelSize") in hand, a random sample from the estimated distribution is obtained in two steps: first select an observation at random from the original data, then draw from a normal distribution whose mean is that observation and whose standard deviation is bw. (Equivalently, draw from a normal distribution with mean 0 and standard deviation bw and add a randomly selected observation from the original data.)

RandomVariate[NormalDistribution[data[[RandomInteger[{1, Length[data]}]]], bw]]

It might even make more sense to write code in the external application that uses only the data, the estimated bandwidth, and whatever random-number functions are available, especially if many random samples are needed.
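As a rough illustration of that idea, here is a minimal Python sketch that uses only the standard library. It assumes the data and bw have already been copied out of Mathematica by some means, that the kernel is Gaussian, and that bw is expressed in the same units as the data (if LearnDistribution rescaled the data internally, the bandwidth would need to be adjusted accordingly). The function name sample_kde and the short data list are hypothetical placeholders, not part of any library or of the original example.

```python
import random


def sample_kde(data, bw, n, seed=None):
    """Draw n samples from a Gaussian kernel density estimate.

    data: the original observations (hypothetical placeholder values below)
    bw:   kernel standard deviation, assumed to be in the same units as data
    """
    rng = random.Random(seed)
    # Pick an observation uniformly at random, then add N(0, bw) noise to it.
    return [rng.choice(data) + rng.gauss(0.0, bw) for _ in range(n)]


# Placeholder observations; in practice use the values exported from Mathematica.
data = [-1.2, 0.3, 0.8, 1.5]
bw = 0.510338  # the "KernelSize" value reported above

samples = sample_kde(data, bw, n=5, seed=12345)
```

Passing a fixed seed makes the draws reproducible; leaving it as None gives a different sample on each run.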

I'll ignore the use of Compile (another subject I know little about).

For a much, much better description of this process see @whuber's answer at CrossValidated.
