So I have a scatter plot and have generated a regression line for it. There are some outliers that heavily influence that line. I would like them to be ignored when the linear regression is calculated. The filtering should be based on their Y values.
There is already a similar question, but the answer there is to skip the first few X values. Unfortunately, that is not what I need.
This is my current code:
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
\usepackage{pgfplotstable}
\begin{document}
\begin{tikzpicture}
\begin{axis}[scatter/classes={a={mark=*,draw=black}}]
\pgfplotstableread{
a b
0 0.5
1 48
2 1.4
3 37
4 3.4
5 6.8
6 4.5
7 3.9
8 10
9 13
}\datatable
\addplot[scatter, only marks, scatter src=explicit symbolic]
table[
x=a,
y=b,
] {\datatable};
\addplot[
thick,
%% y filter/.expression={y<35 ? y : nan},
]
table [
x = a,
%% y expr = {(\thisrow{b} > 35 ? nan : \thisrow{b} )},
y = {create col/linear regression={y=b}},
] {\datatable};
\end{axis}
\end{tikzpicture}
\end{document}
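That code produces a plot in which the regression line is fitted to all ten points, including the two outliers (b = 48 and b = 37).
I already tried y filter and y expr (see the commented-out lines in the code), but neither gives the result I want.
I also thought about splitting the outliers off into separate files. But since my real graph has 4 regression lines, I would end up with ~8 files, which doesn't seem practical to me.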
So my question: How can I ignore Y values over 30 for the calculation of the linear regression?
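To pin down what I mean by "ignore", here is a sketch of the fit I am after. It assumes that the linear regression key computes an ordinary least-squares line; the symbols S, n, m and c are my own notation for illustration, not pgfplots names.

```latex
% Assumption: the regression is an ordinary least-squares fit,
% restricted to the set S of rows whose b value is at most 30.
\[
  S = \{\, i : b_i \le 30 \,\}, \qquad n = |S|
\]
\[
  m = \frac{n \sum_{i \in S} a_i b_i - \sum_{i \in S} a_i \sum_{i \in S} b_i}
           {n \sum_{i \in S} a_i^2 - \bigl(\sum_{i \in S} a_i\bigr)^2},
  \qquad
  c = \frac{\sum_{i \in S} b_i - m \sum_{i \in S} a_i}{n}
\]
% For the sample table above (rows a=1 and a=3 dropped): n = 8,
% sum a = 41, sum b = 43.5, sum a*b = 301.7, sum a^2 = 275,
% so m = 630.1/519, roughly 1.214, and c is roughly -0.785.
```

So for the sample data the line I would expect is roughly y = 1.214x - 0.785, instead of one fitted through all ten points.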

