9

In a recent question, Yiannis Lazarides was asking about benchmarking loops. Strangely, his results were about 800 times faster than mine. I couldn't quite accept that my computer is that outdated (I only bought it a couple of months ago). The main difference seemed to be that I was compiling the file on Linux while he was compiling it on Windows, so I installed MiKTeX on my Windows partition to try it myself, and the result quite surprised me.

This is a slightly simplified version of Yiannis' code (to be run with pdflatex):

\documentclass{article}
\usepackage{ifthen}

\def\startTimer{\pdfresettimer}
\def\stopTimer{\the\pdfelapsedtime\,scaled seconds}

\begin{document}
\startTimer
\newcounter{acount}
\whiledo{\value{acount}<888}{%
\stepcounter{acount}% 
 \theacount, }
\stopTimer
\end{document}

On Linux (Ubuntu 10.10 64-bit, TeX Live 2010, both the 32-bit and 64-bit binaries) the timer reports about 1700 scaled seconds. On Windows (7, 64-bit, MiKTeX 2.9 (32bit?)) the result is 2 scaled seconds.

I am pretty sure that a factor of 850 is not solely attributable to the operating system (both tests are run on the same machine). Also, compiling a “real life” document seems to take about the same time on both operating systems. Where does the difference come from?

Caramdir
  • For such a short document, most of the time is spent on disk access or other uncontrollable things, so the result is worthless. First, increase the document size (i.e. the number 888) by a factor of 100 or so. Then run the test a hundred times and calculate the mean, standard deviation, minimum and maximum. Only then do you have something comparable. – Philipp Jan 17 '11 at 06:53
  • That only shows the superiority of Windows. :) I get 4581 scaled seconds on my two-year-old Mac notebook. – topskip Jan 17 '11 at 07:00
  • I agree with @Philipp: the timing is going to be dominated by things such as whether everything fits in memory, OS swap performance, and disk read/write speed. Run a series of tests and look for discontinuities in the graph: this should tell you where various performance thresholds kick in. – Charles Stewart Jan 17 '11 at 07:46
  • It is probably even worse than that: at this amount of running time, the results are likely to be dominated by the implementation differences between the Windows and Linux system function calls that are used by pdftex. Pdftex does not run the same code on both platforms, and the function implementation used on Windows (ftime) is infamous for being unreliable at small intervals. – Taco Hoekwater Jan 17 '11 at 09:10
  • Maybe this helps: On my Linux machine I get 0 (in words: zero) scaled seconds. – Hendrik Vogt Jan 17 '11 at 15:33
  • I should have added that I know this isn't a good benchmark. I was more interested in where this (reproducible) discrepancy comes from, as it might affect profiling real life code. – Caramdir Jan 17 '11 at 17:13
  • @Philipp, @Charles: This example is so small that everything should be loaded into memory (and probably into the CPU cache) before the timer is started. Anyway, on my computer, Linux has the faster hard drive. – Caramdir Jan 17 '11 at 17:15
  • @Taco: This (the ftime inaccuracy) is the sort of answer I was looking for. Additional testing shows that there is a huge jump in time at a certain (fixed!) amount of iterations, presumably when there is enough code executed between the two calls of ftime. Could you please add that to your answer? – Caramdir Jan 17 '11 at 17:19
  • @Caramdir: extended the answer – Taco Hoekwater Jan 18 '11 at 08:16

2 Answers

8

At this (very small) amount of running time, the results are likely to be dominated by the implementation differences between the Windows and Linux system function calls that are used by pdftex. Pdftex does not run the same code on both platforms, and the function implementation used on Windows (ftime) is infamous for being unreliable at small intervals.
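The ftime problem can be illustrated empirically: a timer's effective resolution is the smallest step you can observe between successive readings. Here is a Python sketch (`clock_granularity` is an illustrative helper, not anything pdftex itself uses); on older Windows systems `time.time()` is typically backed by a similarly coarse clock with a tick around 15–16 ms, which would swallow a 2-scaled-second interval entirely:

```python
import time

def clock_granularity(clock, samples=200):
    # Record the smallest nonzero step between successive readings:
    # a rough estimate of the clock's tick size.
    smallest = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:      # spin until the clock visibly advances
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest

print("time.time() tick: %.3g s" % clock_granularity(time.time))
```

Any interval shorter than the reported tick is essentially unmeasurable with that clock.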

Besides that, bear in mind that \pdfelapsedtime measures wall-clock time, not the CPU time actually consumed by pdftex. Lots of other things could (and probably will) be going on on the computer at the same time, and intervening task switches by the operating system can have a pretty large effect on the results.

As I already wrote in that other thread: you should not trust any benchmark value of less than a whole second (65536 scaled seconds). Also make sure you run the code multiple times on a machine that is doing as little else as possible.
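For reference, a scaled second is 1/65536 of a second (pdfTeX uses 2^16 units per second, like TeX's scaled points). Converting the numbers from the question shows how far below the trust threshold they are:

```python
SCALED_PER_SECOND = 0x10000  # pdfTeX uses 2**16 = 65536 scaled seconds per second

def to_seconds(scaled):
    # Convert a \pdfelapsedtime reading (in scaled seconds) to seconds.
    return scaled / SCALED_PER_SECOND

# Values from the question: ~1700 sp on Linux vs ~2 sp on Windows,
# both far below the one-second (65536 sp) threshold.
print(to_seconds(1700))  # about 0.026 seconds
print(to_seconds(2))     # about 0.00003 seconds
```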

Taco Hoekwater
  • I think one shouldn't trust the results on Windows at all. (On the other hand, on Linux I get pretty good linear scaling of the run time from 20 to 100000 iterations, so this method of profiling seems to be at least somewhat reliable.) – Caramdir Jan 17 '11 at 17:26
7

Responding to the comments: Here is a test that is a bit more realistic.

TeX file:

\documentclass{article}
\usepackage{ifthen}

\newwrite\BenchmarkStream
\def\startTimer{%
  \pdfresettimer
  \immediate\openout\BenchmarkStream=\jobname.dat
}
\def\stopTimer{%
  \immediate\write\BenchmarkStream{\number\pdfelapsedtime}%
  \immediate\closeout\BenchmarkStream
}

\begin{document}
\startTimer
\newcounter{acount}
\whiledo{\value{acount}<28888}{%
\stepcounter{acount}% 
 \theacount, }
\stopTimer
\end{document}

Python script:

#!/usr/bin/env python2.6

from __future__ import unicode_literals, division, print_function

import subprocess
import numpy

def run():
    args = ["pdflatex", "--interaction=batchmode", "test.tex"]
    subprocess.check_call(args)
    with open("test.dat", "rt") as stream:
        return int(stream.read()) / 0x10000

def main():
    # warm up
    for i in xrange(10):
        run()
    count = 50
    result = numpy.empty(count)
    for i in xrange(count):
        print("Iteration", i)
        result[i] = run()
    imin = result.argmin()
    imax = result.argmax()
    print("Count:", count)
    print("Minimum:", result[imin], "at", imin)
    print("Maximum:", result[imax], "at", imax)
    print("Mean:", result.mean())
    print("Median:", numpy.median(result))
    print("Standard deviation:", result.std())

if __name__ == "__main__":
    main()

Results on my Linux system are:

Minimum: 1.05038452148 at 0
Maximum: 1.20942687988 at 30
Mean: 1.07401489258
Median: 1.05834197998
Standard deviation: 0.039375040446
Philipp
  • Results on my OS X 10.6: Minimum: 1.4040222168 at 48, Maximum: 1.85803222656 at 4, Mean: 1.47822814941, Median: 1.4388885498, Standard deviation: 0.0861917498638 – Juan A. Navarro Jan 17 '11 at 13:44