6

I'm calculation the % difference in completion rate (before vs after) of Product A (0.3%), Product B (16.7%) and the combination of Product A and B (17.0%). I'm unable to explain why the combination of Product A and B doesn't have a % difference between the one of Product A and B. (Attached the image with specific calculation)

Can someone can explain why the % difference of combination of Product A and B isn't in between 0.3% and 16.7%, as I would expect?

Image Calculation

JMP
  • 21,771
  • I'm not using straight average of percentages as there's no even weight on each percentage as the sample/population size of product users in each percentage is not the same. The calculation is correct, however, I'm unable to explain myself why the combination % difference is higher than both of the product % differences – QueryStack77 Feb 01 '23 at 02:02
  • 2
    Looks similar or related to Simpson's paradox, which the combined trend may even be reversed (e.g. decrease in % completion). – peterwhy Feb 01 '23 at 03:38
  • The way to approach something like this when the match’s are within your reach: formulate what you expect as a general theorem, try to prove it, see why you fail. – Carsten S Feb 01 '23 at 10:11

5 Answers5

3

You need to account for the weightages of (apparently) two batches in each

When I do that, the weighted differences come to $\approx 49\% \;and\; 21.25\%$

and on aggregating, $\approx 25.21\%$ which is in between the two

3

Here's a simple reason why the overall percentage difference being between the two is too much to hope for. Suppose, for the moment, we only know all the percentages for the top four groups and let's see what we can say about the combined percentages.

The overall before value is a weighted average of the before values for product A (48.9%) and product B (20%). It will therefore be somewhere between 20% and 48.9%. However, the weighting depends on the relative amounts of product A and product B in the before column. For different values of these amounts, it could be anywhere between 20% and 48.9%.

Similarly, the overall after value could be anywhere between 23.3% and 49.1%. Where, exactly, depends on the relative amounts of product A and product B in the after column - and these proportions are not the same as they are for the before column, or even similar.

So just looking at the individual percentages, the overall %difference could be, at maximum the difference from 20% to 49.1%, i.e. 145.5%, and at minimum the difference from 48.9% to 23.3%, i.e. -52.4%.

Note that even though both individual differences are positive, it is possible for the overall difference to be negative. This is an example of Simpson's paradox.

2

First of all I show how the values have been calculated. Let $x^A_{ct}, x^B_{ct}$ the amounts of the products $A$ and $B$. The subscript $c$ is the index for the number of column 1 or 2. The subscript $t$ is the index for the time. It is $1$, if it is the amount at the start and it is $2$, if it is the completed amount. Then the relative change of product A is

$r^A=\Large{\frac{\frac{x^A_{22}}{x^A_{21}}-\frac{x^A_{12}}{x^A_{11}}}{\frac{x^A_{12}}{x^A_{11}}}}=\frac{\frac{39,400}{80,300}-\frac{46,000}{94,000}}{\frac{46,000}{94,000}}=\normalsize{0.00265...\approx 0.3\%} $ (rounded to one decimal place)

And the relative change of product B is

$r^B=\Large{\frac{\frac{x^B_{22}}{x^B_{21}}-\frac{x^B_{12}}{x^B_{11}}}{\frac{x^B_{12}}{x^B_{11}}}}=\frac{\frac{70,000}{300,000}-\frac{100,000}{500,000}}{\frac{100,000}{500,000}}=\normalsize{0.1666...\approx 16.7\%} $ (rounded to one decimal place)

And the overall relative change is

$r^{A+B}=\Large{\frac{\frac{x^A_{22}+x^B_{22}}{x^A_{21}+x^B_{21}}-\frac{x^A_{12}+x^B_{12}}{x^A_{11}+x^B_{11}}}{\frac{x^A_{12}+x^B_{12}}{x^A_{11}+x^B_{11}}}}=\frac{\frac{39,400+70,000}{80,300+300,000}-\frac{46,000+100,000}{94,000+500,000}}{\frac{46,000+100,000}{94,000+500,000}}=\normalsize{0.17037...\approx 17.0\%} $

(rounded to one decimal place)

So the overall index calculated in a way the relatives changes of the sum of product A and product B and not something like the average change. Therefore it is not surprisingly that it is not between $r^A$ and $r^B$.

It is worth to mention that $r^A$ and $r^B$ do not sum up to $r^{A+B}$ if we look at the exact numbers, but they do it approximately.

callculus42
  • 30,550
2

enter image description here

  1. The simple arithmetic mean of the Before-After percentage change of Product A and of Product B is $8.5\%.$

  2. The weighted arithmetic mean of the Before-After percentage change of Product A and of Product B, with Product A's and Product B's total starting amounts as the respective weights, is $13.7\%.$

  3. Finally, their overall Before-After percentage change, taking into account the relative weights of Product A's and Product B's 'Before' starting figures, as well as the relative weights of Product A's and Product B's 'After' starting figures, is $17.0\%.$

    (With reference to the breakdown in callculus42's answer: $\frac{4+2}{5+3}$ is the weighted average of $\frac45$ and $\frac23$.)

    This is neither an average, nor the sum, of the Before-After percentage change of Product A and of Product B.

ryang
  • 38,879
  • 14
  • 81
  • 179
2

This is a structure-effect. For Product B, %Completed is low (Before and After).

The weight of Product B(compared to product A) is decreasing (500000 Before, and only 300000 after). So the total percentage reflects something betwen A and B in first period ( with a big weight for B) and it reflects something different in 2nd period (lowest weight for B)

We decrease the weight for B, we decrease the weight for the product with low %Completed, so %Completed increase.

Play with this set of data, you will be surprised :

Product A

Started 94000 80300

Completed 47000 40150

Product B

Started 500000 300000

Completed 100000 60000

Difference is $0$ everywhere, in all detail-rows, but it is not $0$ for total.

Lourrran
  • 1,059