First, the safer and simpler option:
Fitting with weights
The usual way to fit a line to data is to use a least-squares fit. That is, if you want to fit the line $y = Ax + B$ to some set of data $\{x_i, y_i\}$, then you choose $A$ and $B$ to minimize
$$\sum_i [y_i - (Ax_i + B)]^2$$
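As a quick sketch (with made-up data points), NumPy's `polyfit` performs exactly this minimization for a degree-1 polynomial:

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# np.polyfit(x, y, 1) minimizes sum_i [y_i - (A*x_i + B)]^2
# and returns the coefficients [A, B] (slope first).
A, B = np.polyfit(x, y, 1)
```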
For exponential data, you can transform the expected exponential equation $y = Ae^{Bx}$ to a linear equation by taking the logarithm of both sides: $\log y = \log A + Bx$. One would think that you could use the same procedure above to find values for $A$ and $B$ by minimizing
$$\sum_i [\log y_i - (\log A + Bx_i)]^2$$
However, because the logarithm magnifies differences between small values and shrinks differences between large values, the smallest, noisiest values end up dominating the fit. The solution is to weight each term in the sum by its $y$ value, so the sum to minimize becomes:
$$\sum_i y_i [\log y_i - (\log A + Bx_i)]^2$$
This way, the bins with the most hits (and thus the most data and the smallest relative errors) determine the fit.
The resulting values for $A$ and $B$ when you solve this minimization analytically can be found here.
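A minimal sketch of the weighted log fit, again using `polyfit` (the sample data are assumptions): note that `polyfit` multiplies each residual by its weight *before* squaring, so passing `w = sqrt(y)` makes the minimized quantity exactly $\sum_i y_i [\log y_i - (\log A + Bx_i)]^2$.

```python
import numpy as np

# Synthetic noiseless counts following y = A*exp(B*x) with A=100, B=-0.8.
x = np.linspace(0.0, 5.0, 50)
y = 100.0 * np.exp(-0.8 * x)

# polyfit squares (w_i * residual_i), so w = sqrt(y) yields the
# y_i-weighted sum of squared log residuals described above.
B_fit, logA_fit = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
A_fit = np.exp(logA_fit)
```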
Then again, you could just solve for $A$ and $B$ numerically with a computer program:
1. Make a guess at the initial values of $A$ and $B$.
2. Calculate $\sum_i [y_i - (Ae^{Bx_i})]^2$.
3. Make random small adjustments to $A$ and $B$.
4. Recompute the sum in step 2 with the new values.
5. If the sum is lower, keep the new values of $A$ and $B$. If not, go back to the previous values.
6. If the sum from step 2 is still decreasing, go to step 3.
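The steps above can be sketched as a greedy random search (the function name, step sizes, and iteration count are assumptions, not a tuned optimizer):

```python
import numpy as np

def fit_exponential(x, y, a, b, n_iter=20000, step=0.01, seed=0):
    """Greedy random-search fit of y = A*exp(B*x), following the
    numbered steps above. Illustrative only; step sizes are guesses."""
    rng = np.random.default_rng(seed)
    best = np.sum((y - a * np.exp(b * x)) ** 2)       # step 2
    for _ in range(n_iter):
        # step 3: small random adjustments to A and B
        a_new = a + step * a * rng.standard_normal()
        b_new = b + step * rng.standard_normal()
        # step 4: recompute the sum with the new values
        s = np.sum((y - a_new * np.exp(b_new * x)) ** 2)
        if s < best:                                  # step 5: keep if lower
            a, b, best = a_new, b_new, s
    return a, b, best

# Noiseless demo data: y = 50*exp(-0.5*x), starting from a poor guess.
x = np.linspace(0.0, 4.0, 40)
y = 50.0 * np.exp(-0.5 * x)
a_fit, b_fit, loss = fit_exponential(x, y, a=40.0, b=-0.3)
```

In practice a library routine such as `scipy.optimize.curve_fit` would be the usual choice, but the loop above matches the procedure described here.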
See this answer for details and a rant against fitting to logs of data.
Excluding data points
The danger with manipulating data directly is that you may be tempted (consciously or not) to discard data points not because they are true outliers, but because doing so makes your data look better. You should come up with a rule for discarding data points before you run your experiment, apply the rule to the data without looking at it, and see what you get. It's fine to experiment with several different rules, but each rule must be finalized before the experiment is run and applied blindly to the data. You do not pick which data points are discarded; the rule does.
I don't know your experimental setup, but let's say your experiment involves detecting decays from a sample of a radioactive source to measure the half-life. Before measuring the sample, you could run your experiment without the sample to count the number of spurious signals in the detectors. These spurious signals could be from noise in the detectors, Earth's natural background radiation, cosmic rays, or other sources of radiation. Then, after you have collected data from the experiment with the radioactive sample, you can subtract the counts from the null run (possibly scaled by some factor if the null run and the real run were not the same length of time) to get what should be data that only came from your sample.
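A sketch of that subtraction, assuming you have per-bin counts and the live times of the two runs (the function and variable names are hypothetical):

```python
import numpy as np

def background_subtract(sample_counts, sample_time, null_counts, null_time):
    """Scale the null-run counts to the live time of the sample run,
    then subtract them bin by bin. Illustrative helper only."""
    scale = sample_time / null_time
    return np.asarray(sample_counts, dtype=float) - scale * np.asarray(null_counts, dtype=float)

# Example: a 10 s sample run and a 20 s background-only run,
# so the background counts are scaled by 10/20 = 0.5 before subtracting.
net = background_subtract([110.0, 60.0], 10.0, [20.0, 20.0], 20.0)
```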