Saturday, November 3, 2007

RubyConf 2007: Profiling and Tuning Ruby 1.8

Ed Borasky

Slides

The project name on RubyForge is Cougar.

Things you will not hear Ed say:
  • Premature optimization is the root of all evil
  • Dynamic languages are inherently slow
Is Ruby 1.8 slow?

Benchmark and collect the results. Settle on a common unit. Take the geometric mean (n-th root of (A1 * A2 * ... * An)) of the numbers. Alioth Shootout Results.
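
As a rough illustration of the geometric mean step (not code from the talk), here is a minimal Ruby sketch. The method name geometric_mean and the sample ratios are made up for the example. It goes through logarithms instead of multiplying everything together, which gives the same n-th root while avoiding overflow on long lists.

```ruby
# Geometric mean of positive numbers: the n-th root of their product,
# computed as exp(mean(log(x))) so the running product cannot overflow.
def geometric_mean(values)
  raise ArgumentError, "need at least one value" if values.empty?
  raise ArgumentError, "values must be positive" unless values.all? { |v| v > 0 }

  log_sum = values.sum { |v| Math.log(v) }
  Math.exp(log_sum / values.size)
end

# Hypothetical slowdown ratios (time relative to a baseline) on three benchmarks.
ratios = [2.0, 8.0, 4.0]
puts geometric_mean(ratios) # close to 4, since 2 * 8 * 4 = 64 = 4**3
```

The ratios need to be in a common unit first (for example, "times slower than the fastest entry"), which is why the notes say to settle on one before averaging.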

Hemibel Thinking

Hemibel thinking is the art of caring only about ratios greater than a hemibel. A bel is one order of magnitude (a factor of 10), so a hemibel is half a bel on a log scale: sqrt(10) ~= 3. Modifying the Alioth Shootout results to deal in hemibels shows that, in practical terms, Perl, YARV, Python and PHP are about the same speed. "Ruby is sort of slow."
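
To make the hemibel idea concrete, here is a small Ruby sketch under my own assumptions. The names HEMIBEL, hemibels and same_speed? are invented for illustration, and the timings are hypothetical. A ratio r corresponds to 2 * log10(r) hemibels, and two timings count as "about the same" when they are less than one hemibel apart.

```ruby
# One hemibel is half an order of magnitude: sqrt(10), roughly 3.16.
HEMIBEL = Math.sqrt(10)

# Size of a ratio in hemibels. A ratio of 10 (one bel) is 2 hemibels.
def hemibels(ratio)
  2 * Math.log10(ratio)
end

# Treat two positive timings as equivalent unless they differ by a hemibel or more.
def same_speed?(a, b)
  slower, faster = [a, b].max, [a, b].min
  hemibels(slower.to_f / faster) < 1.0
end

# Hypothetical timings in seconds.
puts same_speed?(1.0, 2.5)  # ratio 2.5 is under a hemibel
puts same_speed?(1.0, 10.0) # ratio 10 is two hemibels
```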

But that's an oversimplified answer. Is there a "geometric standard deviation"? Yes, but box-and-whisker plots (boxplots) are more interesting. YARV shows very little variation, followed by Python and Ruby, although YARV has some outliers.
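
For reference, the geometric standard deviation is usually defined as the exponential of the standard deviation of the logarithms. The Ruby sketch below follows that definition, using the population form (dividing by n). The method name geometric_sd and the sample data are assumptions for illustration, not anything shown in the talk.

```ruby
# Geometric standard deviation of positive numbers:
# exp of the (population) standard deviation of their natural logs.
# The result is a multiplicative spread factor; 1.0 means no variation.
def geometric_sd(values)
  raise ArgumentError, "need at least one value" if values.empty?
  raise ArgumentError, "values must be positive" unless values.all? { |v| v > 0 }

  logs = values.map { |v| Math.log(v) }
  mean_log = logs.sum / logs.size
  variance = logs.sum { |l| (l - mean_log)**2 } / logs.size
  Math.exp(Math.sqrt(variance))
end

# Hypothetical slowdown ratios for one implementation across benchmarks.
puts geometric_sd([1.0, 100.0]) # the logs sit one factor of 10 from their mean
```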

Ok ... it's sort of slow ... now what?
  • Throw hardware at it?
  • Wait for 1.9?
  • JRuby?
  • IronRuby?
  • Rubinius?
  • Cardinal/Parrot?
  • Tune Ruby 1.8?
Let's tune it
  • Collect a benchmark suite
  • Profile the Ruby 1.8 interpreter on the benchmark suite
  • Identify the bottlenecks
  • Remove them
Ruby 1.9 includes a benchmark suite that lets you compare YARV with your installed Ruby. There is also Pet Store from ThoughtWorks, which the JRuby team is using.

gprof is one of the earliest profiling tools. The output of all the runs is on the RubyForge site: cougar/ProfilingAndTuningRuby/benchmarks_profiles. Surprisingly, most of the time spent executing Ruby goes to the interpreter itself and not to the actual work being done. Ed added a benchmark that does matrix operations in rational arithmetic rather than in floating point. The function that returns the remainder of bignum division was nowhere near the top of the list (sorted by relative time spent in each function).
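
To give a feel for the kind of rational-versus-float matrix benchmark mentioned above, here is a small Ruby sketch. It is not Ed's benchmark: the helpers mat_mul and time_it, the matrix size, and the 1/(i+j+1) entries are all my own choices for illustration, and it only measures wall-clock time from inside Ruby rather than profiling the interpreter with gprof.

```ruby
# Multiply two square matrices given as arrays of rows.
# Works for any numeric entries, including Rational and Float.
def mat_mul(a, b)
  n = a.size
  Array.new(n) do |i|
    Array.new(n) do |j|
      (0...n).reduce(0) { |sum, k| sum + a[i][k] * b[k][j] }
    end
  end
end

# Wall-clock seconds taken by the block, using a monotonic clock.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 20 # arbitrary size, small enough to run quickly
# The same matrix twice: once as exact rationals, once as floats.
rational = Array.new(n) { |i| Array.new(n) { |j| Rational(1, i + j + 1) } }
float    = rational.map { |row| row.map(&:to_f) }

t_rational = time_it { mat_mul(rational, rational) }
t_float    = time_it { mat_mul(float, float) }
puts format("rational: %.4fs  float: %.4fs", t_rational, t_float)
```

A single run like this is noisy, so in practice you would repeat it several times and then summarize the ratios across benchmarks.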

Using gcc/gcov, he does some lower-level analysis of the function where most of the time is spent. It contains a switch statement with 86 branches. The top six or seven branches account for 77% of the time spent in this function. The interesting thing is that these are not the first six or seven branches in the switch statement (they are actually near the end).

Acovea is a genetic optimizer over GCC compiler flags, although it is unlikely to do better than "-march=athlon64 -O3".

This is the second time I've been in an afternoon session where I've heard applause coming from the other room, making me wonder what I am missing. The other time was while I was in the Ropes talk.

He delves into some more advanced techniques.

This was an interesting talk from an academic perspective, but it was aimed more at the Ruby core team.

Q&A: Have you tried profiling it with Valgrind and Cachegrind? No. He mentions CodeAnalyst.
