Doing very little, quickly.

What did I do?

A few days ago, I wondered how long it took to launch the smallest program I could make in a few different languages. That was fun, but the results were very predictable.

However, once I had done that, I also made a version that did just a tiny bit more: each program prints ten million numbers, starting from 0. This exercises a simple loop, the conversion of each number to a string, and printing to the screen. The results surprised me a little, so I’ve written them down here.

I don’t really have a deep analysis of this, just a set of fun results.

The code

I picked C, Go, Python and Scala as a small selection of different languages. I predicted C would be the fastest, I like Python so it had to be included, I’d just finished the Go tutorial, and I wanted to include a language that runs on the JVM.

In C:


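#include <stdio.h>

int main()
{
    /* 7 digits for 9999999 plus the terminating NUL */
    char buffer[8];
    for (int v = 0; v < 10000000; v++) {
        sprintf(buffer, "%d", v);   /* number -> string */
        printf("%s\n", buffer);     /* string -> stdout */
    }
    return 0;
}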

update: It’s been pointed out to me that of course I don’t need to use both sprintf and printf there. While that is a bit of a bizarre hiccup, I think the overall results are still valid. Thank you Al and Pascal - I’ll have words with my brain later this evening.

In Python:

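#!/usr/bin/env python
# Python 2: print_function makes print() a function; xrange is lazy
from __future__ import print_function


def main(iterations=10000000):
    # Print each number from 0 to iterations - 1, one per line.
    for i in xrange(iterations):
        print(i)


if __name__ == '__main__':
    main()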

In Go:

package main

import "fmt"

func main() {
    // Print 0 through 9,999,999, one number per line.
    for i := 0; i < 10000000; i++ {
        fmt.Println(i)
    }
}

and finally, in Scala:

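#!/bin/sh
exec scala "$0" "$@"
!#
// `0 to 10000000` is inclusive (10,000,001 numbers) and `map` builds a
// collection; `0 until 10000000 foreach println` would be closer to the others.
object PerfTest extends App {
    0 to 10000000 map println
}
PerfTest.main(args)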

These aren’t all perfectly equivalent, but I think they’re close enough.

The measurement

I used perf stat -B <the code> > /dev/null to get a selection of measurements. I ran each program twice: once doing the full 10 million iterations, and once doing no iterations at all. I subtracted all the values in the second result set from the first, and divided the difference by ten million to get per-loop figures. The intent is to remove the startup overhead of the different languages from the comparison.
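The arithmetic behind the per-loop figures is a subtraction followed by a division. A minimal Go sketch of it follows; the name `perLoop` and the input numbers in `main` are made up for illustration and are not measurements from this post. The subtraction only removes startup cost on the assumption that it is the same in both runs.

```go
package main

import "fmt"

// perLoop estimates the cost of one loop iteration. The value from the
// zero-iteration run (startup and shutdown only) is subtracted from the
// full run, and the remainder is spread over the iterations. This assumes
// startup costs the same in both runs.
func perLoop(full, empty, iterations float64) float64 {
	return (full - empty) / iterations
}

func main() {
	const iterations = 10000000

	// Illustrative values only, NOT real perf stat output.
	fullCycles, emptyCycles := 5.0e9, 1.0e9
	fullInstructions, emptyInstructions := 9.0e9, 1.0e9
	fullSeconds, emptySeconds := 2.0, 0.5

	fmt.Printf("%.0f cycles / loop.\n", perLoop(fullCycles, emptyCycles, iterations))
	fmt.Printf("%.0f instructions / loop.\n", perLoop(fullInstructions, emptyInstructions, iterations))
	fmt.Printf("%.2e seconds / loop.\n", perLoop(fullSeconds, emptySeconds, iterations))
}
```

The same function is applied to each counter separately, since cycles, instructions and elapsed time are all subtracted and divided in the same way.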

Here’s what one result set looks like:

$ perf stat -B ./print_things > /dev/null

Performance counter stats for './print_things':

       710.502953 task-clock                #    0.999 CPUs utilized
              140 context-switches          #    0.197 K/sec
               24 cpu-migrations            #    0.034 K/sec
              315 page-faults               #    0.443 K/sec
    1,844,801,999 cycles                    #    2.596 GHz                     [50.11%]
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
    1,835,887,563 instructions              #    1.00  insns per cycle         [75.25%]
      329,645,651 branches                  #  463.961 M/sec                   [75.25%]
        2,241,916 branch-misses             #    0.68% of all branches         [74.85%]

      0.711187355 seconds time elapsed

Of these values, I only looked at the cycles, instructions and time elapsed.

The results

C:
448 cycles / loop.
877 instructions / loop.
1.62e-07 seconds / loop.

Python:
1222 cycles / loop.
2344 instructions / loop.
4.45e-07 seconds / loop.

Go:
1906 cycles / loop.
1831 instructions / loop.
8.37e-07 seconds / loop.

Scala:
8040 cycles / loop.
7697 instructions / loop.
3.26e-06 seconds / loop.
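Dividing instructions by cycles for each result gives instructions per cycle (IPC). This is relevant to Python executing more instructions per loop than Go while still finishing sooner: by this measure C and Python retire roughly 1.9 to 2 instructions per cycle, while Go and Scala manage roughly 1. The sketch below only redoes that division on the figures above; `result` and `ipc` are names made up for this example. IPC on its own does not say why the Go and Scala loops spend more cycles per instruction.

```go
package main

import "fmt"

// result holds the per-loop figures reported above.
type result struct {
	lang         string
	cycles       float64
	instructions float64
}

// ipc returns instructions retired per CPU cycle.
func ipc(r result) float64 {
	return r.instructions / r.cycles
}

func main() {
	results := []result{
		{"C", 448, 877},
		{"Python", 1222, 2344},
		{"Go", 1906, 1831},
		{"Scala", 8040, 7697},
	}
	for _, r := range results {
		fmt.Printf("%-6s %.2f instructions per cycle\n", r.lang, ipc(r))
	}
}
```

Run as-is, this prints about 1.96 for C, 1.92 for Python, 0.96 for Go and 0.96 for Scala.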

Pretty graphs of the above

Cycles, instructions, time per loop. Made with Vincent.

That C was the fastest isn’t a surprise, but that Python was faster than both Go and Scala certainly was to me. I’m also intrigued that the Python code executes more instructions per loop than the Go code, yet each loop still finishes sooner. The extra instructions could be namespace lookups, function call overhead or perhaps just opcode dispatch; I’m really not sure.

The deeper meaning

Possibly none at all; this is a really trivial set of code, after all. However, it was interesting to see how much is involved in doing such simple things as looping and printing to the screen.

That Scala was so slow compared to the others was a bit of a surprise. On the other hand, both Go and Python did a lot better than I thought they would. The next step would be to look at the actual code generated and run, but that’s for another time.