
Given a list of data points (normalized to the [0,1] range), I plot the histogram of the values and compute percentiles (shown as x ticks).

[Figure: value distribution of the data]

How can I find a transformation of the data values such that the histogram becomes approximately uniform? This would, in turn, make the percentile values uniformly distributed as well.

dsalaj

3 Answers


Hi: You can calculate the empirical cumulative distribution of the data. By this I mean: given some observation in the sample, $x_i$, estimate $P(X < x_{i})$ as the proportion of observations that are less than $x_{i}$ (i.e., the percentiles). Then do this for all of the $x_{i}$ so that you have the empirical cumulative distribution of the $x_{i}$.

Then the transformed values $P(X < x_{i})$ are approximately uniformly distributed on $[0, 1]$ (this is the probability integral transform).

In fact, it seems like you already did this but the percentile values should be on the vertical axis and the values of the data should be on the x-axis.

Note that page 14 of this PDF explains the concept more clearly than I have.
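To make the idea concrete, here is a minimal sketch (my own addition, not part of the original answer) in Python, assuming the data sits in a NumPy array: each observation is mapped to the fraction of observations that rank below it.

import numpy as np

# Sketch: map each observation to the proportion of observations below it.
def ecdf_values(x):
    x = np.asarray(x, dtype=float)
    # Double argsort gives the 0-based rank of each element,
    # i.e. how many observations are smaller than it (ignoring ties).
    ranks = np.argsort(np.argsort(x))
    return ranks / len(x)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.05, 1000)
u = ecdf_values(data)   # approximately uniform on [0, 1)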


Example Implementation

Below is a quick-and-dirty attempt to illustrate this answer. The image below shows the original histogram of the Gaussian, the empirical cumulative distribution function of that data, and then the histogram of the converted data.

[Figure: plots of the example implementation]

R Code Below

par(mfrow=c(3,1))
# First, generate some Gaussian numbers.
gaussian <- rnorm(1000, 0.0, 0.05)
gh <- hist(gaussian, breaks=1000)

# Empirical cumulative distribution: cumulative bin counts over the sample size.
empirical_cumulative_distribution <- cumsum(gh$counts) / sum(gh$counts)

plot(gh$mids, empirical_cumulative_distribution)

# Map each value to its empirical CDF value.
uniformize <- function(x) {
  ans_x <- x
  for (idx in seq(1, length(x))) {
    # Index of the last bin midpoint below this value; fall back to the
    # first bin when the value lies below all midpoints.
    max_idx <- max(which(gh$mids < x[idx]), 1)
    ans_x[idx] <- empirical_cumulative_distribution[max_idx]
  }
  return(ans_x)
}

uniform2 <- uniformize(gaussian)
hist(uniform2, breaks=100)
par(mfrow=c(1,1))
Peter K.
mark leeds
  • Hi Peter: you're making me look bad :). Thanks. – mark leeds Sep 14 '19 at 00:53
  • @Peter K: Is there something that shows you how to do links, so that I can do what you did and point to the actual PDF rather than typing the link? Or is there a way for me to see your LaTeX so that I can see what you did? I have been told that I should learn this and I should !!!!! Thanks. – mark leeds Sep 14 '19 at 00:58
  • To inline links, just select the text you want to add the link to and hit the link/chain icon in the top of the text editor. I tend to prefer linked text to raw links, so I change it when I see it (and can be bothered). – Peter K. Sep 14 '19 at 01:03
  • @Peter K: I forgot that that was "my answer" so I was able to just hit the edit button to look at what you did. I learned a lot and will use it in the future when I want to do that better way of linking. Thanks. – mark leeds Sep 14 '19 at 06:26
  • You're welcome! Yes, I figured if I put my answer in it'd be the same as yours, so I thought I'd just edit yours and add my $0.02. – Peter K. Sep 14 '19 at 12:32
  • you added like 10 bucks when there was one cent there. It's MUCH more clear with your example. Some questions are best explained by example and that was one of them. Looks like you're an R person so thanks again fellow R-er. – mark leeds Sep 14 '19 at 19:58
  • You’re too kind. Thank you. Not really very R proficient. I just decided I needed to learn it, and make most of my answers here in it if they require examples. – Peter K. Sep 14 '19 at 23:57
  • @Peter K: R is so vast that it has many "levels" of users and developers. If you want to up your R game, get Hadley Wickham's "Advanced R" book, but not the most recent edition. The recent edition, I think, focuses on Hadley's tidyverse (which I'm confident is fine if you're into that tidyverse material), but I have the first edition and its focus was on advanced language concepts. A really nice exposition. – mark leeds Sep 15 '19 at 04:33
  • Thanks, Mark! I've ordered the first edition. It was available second hand. :-) – Peter K. Sep 16 '19 at 13:16
  • great. it's a very nice book. I wish I had time to go through it carefully. – mark leeds Sep 16 '19 at 15:12

Python Version:

import matplotlib.pyplot as plt
import numpy as np


def uniformize(x, nbins=1000):
    which = lambda lst: list(np.where(lst)[0])

    # Histogram of the data and its empirical cumulative distribution.
    gh = np.histogram(x, bins=nbins)
    empirical_cumulative_distribution = np.cumsum(gh[0]) / gh[0].sum()

    # Work on a copy so the input array is not modified in place.
    ans_x = np.array(x, dtype=float)
    for idx in range(len(x)):
        # Index of the last bin edge below this value (0 if there is none).
        max_idx = max(which(gh[1] < x[idx]) + [0])
        ans_x[idx] = empirical_cumulative_distribution[max_idx]

    return ans_x


if __name__ == '__main__':
    # Number of samples (also used as the number of histogram bins).
    numb = 1000

    # Distribution you want to transform.
    dist_transform = np.random.normal(3, 5, numb)

    # Plot the original histogram, its empirical CDF, and the transformed histogram.
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1)
    n, bins, patches = ax1.hist(dist_transform, bins=numb)
    ax2.plot(bins[1:], np.cumsum(n) / n.sum())

    uniform_dist = uniformize(dist_transform)
    ax3.hist(uniform_dist, bins=100, alpha=0.5)
    plt.show()
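
As a quick sanity check (my own suggestion, not part of the original answer), one could compare the transformed sample against the uniform distribution with a Kolmogorov–Smirnov test:

from scipy import stats

# A large p-value suggests the transformed data is consistent with
# a uniform distribution on [0, 1].
statistic, p_value = stats.kstest(uniform_dist, 'uniform')
print(statistic, p_value)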
lennon310
ianus

Python version that ensures unique data points remain (most of the time) unique after "uniformization":

import numpy as np
import scipy.interpolate


def uniformize(x, nbins=1000):
    # Histogram of the data and its empirical CDF at the bin centres.
    hist = np.histogram(x, bins=nbins)
    cdf = np.cumsum(hist[0]) / hist[0].sum()
    bins = (hist[1][:-1] + hist[1][1:]) / 2

    # A smooth interpolation of the CDF keeps distinct inputs
    # (mostly) distinct after the transform.
    f = scipy.interpolate.interp1d(
        bins, cdf, kind='quadratic', fill_value="extrapolate"
    )

    return f(x)

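For example (my own usage sketch, assuming the uniformize function above), applying it to Gaussian data and histogramming the result should give a roughly flat histogram on [0, 1]:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(3, 5, 10000)
u = uniformize(data, nbins=1000)

# The quadratic interpolation of the CDF keeps distinct inputs
# (mostly) distinct, so ties in the output are rare.
plt.hist(u, bins=100)
plt.show()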

leoneu