First, the safer and simpler option:
Fitting with weights
The usual way to fit a line to data is to use a least-squares fit. That is, if you want to fit the line $y = Ax + B$ to some set of data $\{x_i, y_i\}$, then you choose $A$ and $B$ to minimize
$$\sum_i [y_i - (Ax_i + B)]^2$$
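As a quick sketch (with made-up data points), NumPy's `polyfit` performs exactly this minimization for a degree-1 polynomial:

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# np.polyfit(x, y, 1) minimizes sum_i [y_i - (A*x_i + B)]^2
# and returns the coefficients [A, B] (slope first).
A, B = np.polyfit(x, y, 1)
```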
For exponential data, you can transform the expected exponential equation $y = Ae^{Bx}$ to a linear equation by taking the logarithm of both sides: $\log y = \log A + Bx$. One would think that you could use the same procedure above to find values for $A$ and $B$ by minimizing
$$\sum_i [\log y_i - (\log A + Bx_i)]^2$$
However, because the logarithm magnifies differences between small values and shrinks differences between large values, the smallest, noisiest values end up dominating the fit. The solution is to weight each term in the sum by its $y$ value, so the sum to minimize becomes:
$$\sum_i y_i [\log y_i - (\log A + Bx_i)]^2$$
This way, the bins with the most hits (and thus the most data and the smallest relative errors) determine the fit.
The resulting values for $A$ and $B$ when you solve this minimization analytically can be found here.
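A minimal sketch of the weighted log fit, again using `polyfit` (the sample data are assumptions): note that `polyfit` multiplies each residual by its weight *before* squaring, so passing `w = sqrt(y)` makes the minimized quantity exactly $\sum_i y_i [\log y_i - (\log A + Bx_i)]^2$.

```python
import numpy as np

# Synthetic noiseless counts following y = A*exp(B*x) with A=100, B=-0.8.
x = np.linspace(0.0, 5.0, 50)
y = 100.0 * np.exp(-0.8 * x)

# polyfit squares (w_i * residual_i), so w = sqrt(y) yields the
# y_i-weighted sum of squared log residuals described above.
B_fit, logA_fit = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
A_fit = np.exp(logA_fit)
```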
Then again, you could just solve for $A$ and $B$ numerically with a computer program:
1. Make a guess at the initial values of $A$ and $B$.
2. Calculate $\sum_i [y_i - (Ae^{Bx_i})]^2$.
3. Make random small adjustments to $A$ and $B$.
4. Recompute the sum in step 2 with the new values.
5. If the sum is lower, keep the new values of $A$ and $B$. If not, go back to the previous values.
6. If the sum from step 2 is still decreasing, go to step 3.
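The steps above can be sketched as a greedy random search (the function name, step sizes, and iteration count are assumptions, not a tuned optimizer):

```python
import numpy as np

def fit_exponential(x, y, a, b, n_iter=20000, step=0.01, seed=0):
    """Greedy random-search fit of y = A*exp(B*x), following the
    numbered steps above. Illustrative only; step sizes are guesses."""
    rng = np.random.default_rng(seed)
    best = np.sum((y - a * np.exp(b * x)) ** 2)       # step 2
    for _ in range(n_iter):
        # step 3: small random adjustments to A and B
        a_new = a + step * a * rng.standard_normal()
        b_new = b + step * rng.standard_normal()
        # step 4: recompute the sum with the new values
        s = np.sum((y - a_new * np.exp(b_new * x)) ** 2)
        if s < best:                                  # step 5: keep if lower
            a, b, best = a_new, b_new, s
    return a, b, best

# Noiseless demo data: y = 50*exp(-0.5*x), starting from a poor guess.
x = np.linspace(0.0, 4.0, 40)
y = 50.0 * np.exp(-0.5 * x)
a_fit, b_fit, loss = fit_exponential(x, y, a=40.0, b=-0.3)
```

In practice a library routine such as `scipy.optimize.curve_fit` would be the usual choice, but the loop above matches the procedure described here.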
See this answer for details and a rant against fitting to logs of data.
Excluding data points
The danger with manipulating data directly is that you may be tempted (consciously or not) to discard data points not because they are true outliers, but because doing so makes your data look better. You should come up with a rule for discarding data points before you run your experiment, apply the rule to the data without looking at it, and see what you get. It's fine to experiment with several different rules, but each rule must be finalized before the experiment is run and applied blindly to the data. You do not pick which data points are discarded; the rule does.
I don't know your experimental setup, but let's say your experiment involves detecting decays from a sample of a radioactive source to measure the half-life. Before measuring the sample, you could run your experiment without the sample to count the number of spurious signals in the detectors. These spurious signals could be from noise in the detectors, Earth's natural background radiation, cosmic rays, or other sources of radiation. Then, after you have collected data from the experiment with the radioactive sample, you can subtract the counts from the null run (possibly scaled by some factor if the null run and the real run were not the same length of time) to get what should be data that only came from your sample.
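A sketch of that subtraction, assuming you have per-bin counts and the live times of the two runs (the function and variable names are hypothetical):

```python
import numpy as np

def background_subtract(sample_counts, sample_time, null_counts, null_time):
    """Scale the null-run counts to the live time of the sample run,
    then subtract them bin by bin. Illustrative helper only."""
    scale = sample_time / null_time
    return np.asarray(sample_counts, dtype=float) - scale * np.asarray(null_counts, dtype=float)

# Example: a 10 s sample run and a 20 s background-only run,
# so the background counts are scaled by 10/20 = 0.5 before subtracting.
net = background_subtract([110.0, 60.0], 10.0, [20.0, 20.0], 20.0)
```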