
I know that I can use PGFPlots to calculate a linear regression with the create col/linear regression key.

I have a set of data that is curved for the first few points, but linear for the remainder of the data set. I want to find a linear regression but ignore the first n points from my data file in my regression. Is this possible with PGFPlots?

A contrived example of this data would be:

x     y
0     0
1     15
2     25
3     28
4     30
5     31
6     32
7     33
8     34
9     35
10    36

where the curve is linear starting at x = 4. A plot of this looks like:

Sample plot

One (not great) solution I can think of is to make a new data set without those first data points and run my regression on this new set.

2 Answers


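The first lines of the table can be skipped with the option skip first n. The header line counts as one of the skipped lines, so skip first n=5 together with header=false drops the header and the first four data points, and the regression uses only the points from x = 4 onward. The first plot of the following example shows the regression line over that range, from the fifth point to the last. The second plot draws the line over the full range from 0 to 10, using the calculated parameters \pgfplotstableregressiona (slope) and \pgfplotstableregressionb (intercept).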

\begin{filecontents*}{\jobname-plot.dat}
x     y
0     0
1     15
2     25
3     28
4     30
5     31
6     32
7     33
8     34
9     35
10    36
\end{filecontents*}

\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.12}
\usepackage{pgfplotstable}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}
      \addplot[only marks, mark=*, blue]
        table {\jobname-plot.dat};
      \addplot[]
        table[header=false,skip first n=5,
          y={create col/linear regression},
        ] {\jobname-plot.dat};
    \end{axis}
  \end{tikzpicture}

  \begin{tikzpicture}
    \begin{axis}
      \addplot[only marks, mark=*, blue]
        table {\jobname-plot.dat};
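      % Invisible plot: only needed to compute the regression parameters
      \addplot[draw=none]
        table[
          header=false,
          skip first n=5,
          y={create col/linear regression},
        ] {\jobname-plot.dat};
      % Draw the regression line over the full x range
      \addplot[domain=0:10, red]
        {\pgfplotstableregressiona*x + \pgfplotstableregressionb};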
    \end{axis}
  \end{tikzpicture}
\end{document}

Result
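
If you also want to see the fitted parameters, something along these lines might work. It is only a sketch based on the second plot above: the legend entries, the legend pos=south east placement and the forget plot key on the invisible plot are my additions, not part of the original example. Since the points from x = 4 onward lie exactly on y = x + 26, the legend should show a slope close to 1 and an intercept close to 26.

```latex
\begin{filecontents*}{\jobname-plot.dat}
x     y
0     0
1     15
2     25
3     28
4     30
5     31
6     32
7     33
8     34
9     35
10    36
\end{filecontents*}

\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.12}
\usepackage{pgfplotstable}

\begin{document}
  \begin{tikzpicture}
    \begin{axis}[legend pos=south east]
      \addplot[only marks, mark=*, blue]
        table {\jobname-plot.dat};
      \addlegendentry{data}
      % Invisible plot that only computes the regression;
      % "forget plot" keeps it out of the legend (my addition).
      \addplot[draw=none, forget plot]
        table[
          header=false,
          skip first n=5, % header line + first four data points
          y={create col/linear regression},
        ] {\jobname-plot.dat};
      % Regression line over the full x range
      \addplot[domain=0:10, red]
        {\pgfplotstableregressiona*x + \pgfplotstableregressionb};
      % Print slope and intercept in the legend
      \addlegendentry{$\pgfmathprintnumber{\pgfplotstableregressiona}\cdot x
        \pgfmathprintnumber[print sign]{\pgfplotstableregressionb}$}
    \end{axis}
  \end{tikzpicture}
\end{document}
```

The print sign option makes the intercept carry its own plus or minus sign, so the legend formula stays readable whichever sign the fit produces.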

Heiko Oberdiek

Here is a solution using R and knitr embedded in a LaTeX file. It assumes that you have R installed and know how to run the knitr package (http://yihui.name/knitr/).

\begin{filecontents}{data.txt}
x    y
0     0
1    15
2    25
3    28
4    30
5    31
6    32
7    33
8    34
9    35
10   36
\end{filecontents}
\documentclass[10pt,letterpaper]{article}

\begin{document}
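<<echo=FALSE,out.width="3in">>=
# skip=4 skips the four comment lines that the unstarred filecontents
# environment writes at the top of data.txt
dd <- read.table("data.txt", skip=4, header=TRUE)
# First plot: regression over all data points
with(dd, plot(x, y, pch=16, main="Regression line with all the data points"))
model <- with(dd, lm(y~x))
abline(model)
# Second plot: drop the first two points, then fit again
# (use 5:length(x) instead to start at x = 4, as in the question)
x1 <- with(dd, x[3:length(x)])
y1 <- with(dd, y[3:length(x)])
newdd <- data.frame(x1, y1)
with(newdd, plot(x1, y1, pch=16, main="Regression line with first two points removed"))
model1 <- with(newdd, lm(y1~x1))
abline(model1)
@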
\end{document}
