7

I want to have two groups of $n$ random numbers $u_i$ and $v_i$ drawn from $U(0,1)$, such that $\sum u_i = \sum v_i$.

What I tried is:

I can first generate the $u_i$ with RandomReal[{0,1}, n] and set $s=\sum u_i$.

Then I found it very difficult to generate another $n$ uniformly distributed random numbers $v_i$ from $U(0,1)$ that sum to $s$, where $s$ is a real value in $[0,n]$. I could rescale a fresh sample to the right total, but then I guess I would have to reject many cases where some $v_i$ ends up larger than 1.
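A minimal sketch of what I mean by this rescale-and-reject idea (here with $n=4$, just to illustrate; the rejection loop is the part that gets slow):

n = 4;
u = RandomReal[{0, 1}, n];
s = Total[u];
(* redraw a candidate list and rescale it to total s until every
   rescaled entry stays inside [0, 1] *)
v = Module[{w = RandomReal[{0, 1}, n]},
   While[Max[s w/Total[w]] > 1, w = RandomReal[{0, 1}, n]];
   s w/Total[w]];
{Total[u], Total[v]}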

To make the question clearer, my original problem is:

I have $8$ parameters $\kappa_i, i=1,\ldots,8$ from a system; each parameter $\kappa_i$ can take any value in $[0,1]$. But I have a constraint on the parameters: $\kappa_1+\kappa_2+\kappa_3+\kappa_4=\kappa_5+\kappa_6+\kappa_7+\kappa_8$. Now I want to sample the whole parameter space (does this count as Monte Carlo?) under this constraint. What should I do?

Update:

I have used @Coolwater's method, but the problem is that rejecting values larger than 1 costs a lot. Sampling 10,000 sets takes me hours; as I write this update, it is still running.

Any ideas about how to do this efficiently?

Further update: @JasonB's approach solved my problem perfectly. It makes sense to simply rescale the group with the larger sum by the ratio of the two sums! I should have come up with this idea myself; it is very intuitive and straightforward.

LifeWorks
  • this, and do a rescale maybe? – egwene sedai Feb 05 '16 at 13:09
  • Will they not have to have an expected sum of n/2 or they cannot be U[0,1] distributed? – Ymareth Feb 05 '16 at 13:47
  • I am not sure you realize that the question, as stated, makes no sense. If they are truly uniformly distributed then they (very likely) won't sum to a given value. If you put in a constraint, such as "they must sum to 1", then the question is: what do you mean by random? No, this is not nitpicking. It's a very common mistake when thinking about what "random" means, see e.g. Bertrand's paradox. Before the question can be answered you will need to decide what you really mean when you say "random numbers" and how the constraint impacts on that. – Szabolcs Feb 05 '16 at 16:26
  • Here's a related question where the answer shows how different interpretations of the question will lead to very different distributions. http://mathematica.stackexchange.com/q/33652/12 – Szabolcs Feb 05 '16 at 16:39
  • @Szabolcs: Well, they CAN both be uniformly distributed and equal, taking $u_i=v_{\pi(i)}$, for some permutation $\pi$, but then they are of course not independent. – Per Alexandersson Feb 05 '16 at 16:58
  • @Szabolcs: The unconditional distribution of the $x_i$ can be from independent $U(0,1)$ distributions. The joint density of the $x_i$ given that they sum to $s$ is not the same as the product of the uniform densities. In essence $x_i$ is not the same random variable as $x_i|x_1+x_2+\cdots+x_n=s$. So I don't see why you say the question makes no sense. – JimB Feb 06 '16 at 01:25
  • @Jim If you look at all the answers here, they all come to different interpretations of the question. See my comment on Per's answer. That shows that the question is unclear, and the OP very likely did not understand the subtleties involved with his three requirements and their interactions, namely: 1. the constraint that the $x_i$ sum to $s$; 2. that each $x_i$ is from $U(0,1)$, which supposedly means that they'd have flat histograms; and finally the most subtle notion, 3. that they be "random". Now about this I might be wrong (!), but it seems to me that 1. and 2. are contradictory for $n>2$. – Szabolcs Feb 06 '16 at 09:29
  • @Jim I'll try to come back to this later today (I must leave now). – Szabolcs Feb 06 '16 at 09:31
  • I have voted to close as unclear with the following reasoning: the posted answers all seem to interpret the question differently (i.e. propose different distributions). This is good evidence that before allowing more answers, the question should be put into a clearer form. How to do that is a good and interesting question in itself but as Jim said it is more suitable to Math.SE. – Szabolcs Feb 06 '16 at 09:32
  • I'll propose a cheap and dirty solution: generate the 1st set with RandomReal, then generate the 2nd by permuting the 1st one. – m_goldberg Feb 06 '16 at 10:06
  • @Szabolcs I updated my question to make things clearer. Is the question clearer now? Many thanks for your input. – LifeWorks Feb 06 '16 at 14:54
  • @Szabolcs I never said those random variables are independent, I just need them to be uniformly distributed in fixed range, in this case it is $U(0,1)$. – LifeWorks Feb 06 '16 at 15:10
  • @Szabolcs : You convinced me. Having the OP get an answer from Cross Validated or Mathematics is still probably the best approach (then back in Mathematica for any issues about implementation). But the OP's update does make the question clearer (for me). The common sum is now explicitly a random variable rather than a fixed and known quantity as I assumed in my original answer. – JimB Feb 06 '16 at 16:25
  • @LifeWorks, you mention that Coolwater's method takes hours, but you haven't said how my answer fails to solve the problem. It produces two sets of random numbers between zero and one that both sum to the same number. If this isn't what you want, what is? – Jason B. Feb 06 '16 at 19:35
  • @JasonB Sorry for late post, I will try to implement your approach in my code to check if it is OK and fast enough. Then I will report the result. Thanks and Happy Chinese New Year. – LifeWorks Feb 07 '16 at 09:30
  • The edit didn't help at all. The ambiguity lies in the requirement/meaning of randomness. Also, can you explicitly say the size of the set required? (10,000 sets of pairs of length 4?) – george2079 Feb 08 '16 at 16:30
  • @JasonB Thanks very much!!!!! This is perfect! – LifeWorks Feb 10 '16 at 15:01

4 Answers

6

If you want two lists to have the same Total, then you need to scale one of them by the right amount. The trick is to pick which one to scale so that both lists stay within $[0,1]$.

n = 2000;
(* two lists of n uniform reals in [0, 1] *)
lists = RandomReal[1, {n, 2}] // Transpose;
(* scale each list so that its total equals the smaller of the two totals *)
lists = lists (Min[Total /@ lists]/Total@# & /@ lists);

Now verify that both lists stay in $[0,1]$, have the same total, and still look uniformly distributed:

MinMax /@ lists
Total /@ lists
Histogram /@ lists
(* {{0.0000306034, 0.999652}, {0.0000765896, 0.992954}} *)
(* {999.074, 999.074} *)

(image: histograms of the two lists)

As Coolwater points out, this does skew the distribution of the sums, because we always choose the smaller sum. You can do away with this by replacing Min[Total /@ lists] with Total[lists[[1]]], but then some small portion of your lists will fall outside the range $[0,1]$. I'm no statistician, but it seems that generating a second list which is both uniformly distributed and has a given sum isn't a problem with a solution. The above is pretty close, though.
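For concreteness, here is a sketch of that alternative (the helper name makePair is mine): keep the first list as drawn and redraw the second until its rescaled version stays inside $[0,1]$.

(* keep the first list as drawn; redraw the second until rescaling it
   to the first list's total leaves every entry inside [0, 1] *)
makePair[n_] := Module[{a = RandomReal[1, n], b},
  b = RandomReal[1, n];
  b = b Total[a]/Total[b];
  While[Max[b] > 1,
   b = RandomReal[1, n];
   b = b Total[a]/Total[b]];
  {a, b}]

Total /@ makePair[2000]   (* the two totals agree *)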

Looking around on the web, a common recipe given to generate a uniform random list with a given sum is (quoting from here, but you can find the same procedure here and here):

Generate N-1 random numbers between 0 and 1, add the numbers 0 and 1 themselves to the list, sort them, and take the differences of adjacent numbers.

So let's say I make list1, which has 100 elements and a given sum:

list1 = RandomReal[1, 100];
sum = Total@list1
Histogram@list1
(* 48.1 *)

(image: histogram of list1)

Now I follow that recipe to make another list, again with 100 elements between 0 and 1, whose sum is the same as that of list1:

list2 = 
  sum Differences@Sort@Join[RandomReal[1, 99], {0, 1}];
Total@list2
Histogram@list2
(* 48.1 *)

(image: histogram of list2)

Clearly list2 is neither drawn from a uniform distribution nor confined to the interval $[0,1]$.
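One quick way to check this claim, for example:

(* how far outside [0, 1] does list2 go, and how many entries exceed 1? *)
{MinMax[list2], Count[list2, x_ /; x > 1]}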

Jason B.
  • Note that this approach doesn't cause the sums to follow the UniformSumDistribution[n], because the smallest sum is chosen every time. QuantilePlot[Sort[Table[(lists = RandomReal[1, {20, 2}] // Transpose; lists = lists (Min[Total /@ lists]/Total@# & /@ lists); Total[First[lists]]), {200}]], UniformSumDistribution[20], Method -> {"ReferenceLineMethod" -> "Diagonal"}] – Coolwater Feb 05 '16 at 13:08
  • That isn't what was asked for in the question. As I understood the question, OP needs two lists that have the same sum, both of which have a uniform distribution between 0 and 1. This does that. I literally cannot get your code to run so I can't even evaluate it. – Jason B. Feb 05 '16 at 13:18
  • @JasonB The question is not well stated (or rather: contradictory), see my comments above. – Szabolcs Feb 05 '16 at 16:41
  • Thanks very much! I think this is the best solution! Appreciate it! – LifeWorks Feb 10 '16 at 15:07
1

Here is an approach to produce a good approximation: brute-force generate lots of random lists until we achieve the desired total.

target = Total@RandomReal[1, {1000}]

511.315

(* repeatedly drop the first element and append a fresh uniform draw
   until the total is within 10^-4 of the target *)
set2 = NestWhile[Append[Rest@#, RandomReal[1]] &,
   RandomReal[1, {1000}],
   Abs[Total[#] - target] > .0001 &];
Total@set2

511.315

% - target

-0.0000451315

Alternately, if we want two sets with the same total, as opposed to generating one and trying to match it, we can do this:

(* sort 50,000 random rows by total, then take the adjacent pair
   whose totals are closest *)
m = SortBy[RandomReal[1, {50000, 1000}], Total];
sets = m[[# ;; # + 1]] &@First@Ordering[Abs[Differences[Total /@ m]]];
{Total@sets[[1]], Total@sets[[2]], Total@sets[[1]] - Total@sets[[2]]}

{493.849, 493.849, -1.66665*10^-9}

This of course biases the total toward the expected value.
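For comparison, an unconstrained total of 1000 uniforms follows UniformSumDistribution[1000], whose mean and spread are:

(* mean and standard deviation of an unconstrained total of 1000 uniforms *)
With[{d = UniformSumDistribution[1000]},
 N@{Mean[d], StandardDeviation[d]}]
(* {500., 9.12871} *)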

george2079
1

Let $U,V\sim U\left(0,\,1\right)$ be two iid standard uniform random variables. Sample $n$ times from $U$, denote the realizations by $x_{k}$, and denote their sum by $s_{x} := \sum_{k=1}^{n} x_{k}$. Note that $s_{x}$ has the Irwin–Hall distribution, which is defined as the distribution of a sum of iid standard uniform random variables.

I understand your question in the following way: you are asking for one realization of $n$ random variables $Y_{k}$ whose joint distribution is conditioned on summing to $s_{x}$, while unconditionally the $Y_{k}$ are iid $Y_{k}\sim U\left(0,\,1\right)$.

Some thoughts for the case $n=2$:

In expectation we have $s_{x}=1$.

Let us assume that $s_{x}\leq 1$, for instance $s_{x}=0.8$.

So we are searching for two numbers which sum to $0.8$. Note that this problem has only $n-1=1$ degree of freedom: after we are given the first realization $y_{1}$, the last realization $y_{2}$ is uniquely determined by the requirement $y_{1}+y_{2}=s_{x}$, i.e. $y_{2}=s_{x}-y_{1}$.

Because of the requirement that the sum is $s_{x}=0.8$, we can only sample $Y_{1}$ from $U\left(0,\,0.8\right)$.

Let us assume that $s_{x}\geq 1$, for instance $s_{x}=1.8$.

In the second and last step the maximal possible realization of $y_{2}$ is $1$. So in order to get to a sum of $s_{x}=1.8$ we have to sample $Y_{1}$ from $U\left(0.8,\,1\right)$.

Some thoughts for general $n$:

With the last $m$ realizations we can reach a maximal sum of $y_{n-m+1}+\ldots+y_{n}=m$ and a minimal sum of $0$. So if there are $m$ realizations left, the running sum $s_{y}^{\left(n-m+1\right)}:=\sum_{k=1}^{n-m}y_{k}$ has to be at least $s_{x}-m$ (and, since the remaining realizations are nonnegative, at most $s_{x}$).

Mathematica function

The following function implements this idea.

ClearAll[UnifCondOnSum]
UnifCondOnSum[sx_?NonNegative, n_?IntegerQ] :=
  Module[{sy = 0, y = ConstantArray[0, n], i},
   For[i = n, i >= 2, i--,
    (* feasible range so that the remaining i - 1 entries can still reach sx *)
    y[[i]] = RandomReal[{Max[0, sx - sy - (i - 1)], Min[1, sx - sy]}];
    sy = sy + y[[i]]];
   y[[1]] = sx - sy;
   y];

Generate the realizations:

SeedRandom[0]
Block[{n = 5}, 
 x = RandomVariate[UniformDistribution[], n];
 v = UnifCondOnSum[Total@x, n];
 {x, v}]
(* {{0.393562, 0.701033, 0.966231, 0.221456, 0.436768}, 
    {0.809425, 0.333722, 0.288053, 0.727646, 0.560204}} *)

Check their respective sums:

Total /@ {x, v}
(* {2.71905, 2.71905} *)

You could also generate the sum $s_{x}=s_{y}$ from the Irwin–Hall distribution and then generate both $\left(x_{k}\right)_{k=1}^{n}$ and $\left(y_{k}\right)_{k=1}^{n}$ with UnifCondOnSum.
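A small sketch of that variant, using UniformSumDistribution (Mathematica's built-in name for the Irwin–Hall distribution):

(* draw the common sum from the Irwin-Hall distribution, then build
   both lists with UnifCondOnSum *)
Block[{n = 8, s, a, b},
 s = RandomVariate[UniformSumDistribution[n]];
 a = UnifCondOnSum[s, n];
 b = UnifCondOnSum[s, n];
 Total /@ {a, b}]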

This method generates $10^{5}$ samples of $X$ and $Y$ nearly instantaneously because it is a direct method.

Disclaimer: I hope I did not mess up the indices.

Marco Breitig
0

Perhaps someone can implement the following idea:

It is enough to be able to produce a list of $n$ random numbers in the range $[0,1]$ with total sum $s$, where the random numbers are chosen in some uniform (fair) fashion.

Note that picking a random vector in $C=[0,1]^n$ is the same as sampling a hypercube. We can intersect this hypercube with the plane $L_s: x_1+\dotsb+x_n = s$, and sample $L_s \cap C$ with the uniform measure. Thus all variables are identically distributed, but they are not independent (I think)...

Now, once we know how to pick a vector in $C \cap L_s$ at random, we first pick $s$ with the same distribution as a sum of $n$ iid uniform random variables, and then pick two vectors from $C \cap L_s$. They will by construction have the same sum, and all variables are identically distributed random variables.

However, they are probably not independent (not inside the vector, not between vectors either).
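A possible sketch of this, assuming one concrete way to sample $C \cap L_s$ uniformly: draw the first $n-1$ coordinates uniformly in $[0,1]^{n-1}$, let the last coordinate be forced by the sum, and reject when it leaves $[0,1]$ (fine for small $n$ such as groups of 4; the names sliceSample and twoGroups are just placeholders).

(* sample the slice of the cube with a given total: draw n-1 coordinates,
   force the last one by the sum constraint, reject if it leaves [0, 1] *)
sliceSample[s_, n_] := Module[{y = RandomReal[1, n - 1]},
  While[! (0 <= s - Total[y] <= 1), y = RandomReal[1, n - 1]];
  Append[y, s - Total[y]]]

(* pick the common sum as a sum of n iid uniforms, then draw two vectors
   from the same slice *)
twoGroups[n_] := With[{s = RandomVariate[UniformSumDistribution[n]]},
  {sliceSample[s, n], sliceSample[s, n]}]

Total /@ twoGroups[4]   (* the two totals agree *)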

Per Alexandersson
  • To answer your comment, and others too: if you carry out this procedure and then look at the histogram of $x_1$, it won't be flat at all. What does the OP then mean by asking that "they be in $U(0,1)$"? We can reverse it too: if we enforce the flat histogram, and also enforce that they sum to $s$, what does it mean that they are "random" (see reference to Bertrand's paradox). It seems to me that the OP did not understand these issues well when asking the question, and thus ended up with an unclear phrasing. That's well demonstrated by all the answers here proposing different distributions. – Szabolcs Feb 06 '16 at 08:41