DTracing a Ruby Application
11 February 2016

Though I’m far from an expert, I do enjoy programming in Ruby. Some time ago, I wrote a stupid little Sinatra app which generated agendas for pretentious, fictional DevOps meetups. The code is here, and you can see it’s extremely small. Because of its size, I thought it might make for a simple example of DTracing a Ruby application.

I’m interested in performance because, who isn’t? I can’t see much room for optimization in my own code (there’s only about thirty lines of it for a start) but as Ruby devs we tend to pull in gems like crazy, without much thought of the quality or efficiency of the code they contain.

As of 2.0, Ruby has proper DTrace support. If you build on a 21st century operating system, pardon me, an operating system which supports DTrace (Solaris, Illumos, OS X, FreeBSD) you get it by default. I’m doing this on SmartOS with Ruby 2.2.3 built as shown here.

As a first step, how about looking where the executing program spends most of its time?

The probes available are documented here, and it looks to me like method-entry is a good place to start. Here’s its signature.

ruby:::method-entry(classname, methodname, filename, lineno);

The two arguments are the class and method names, so how about running a script which counts calls to each class::method, and prints them out? I’m doing this on a dev zone and I’m going to run my D script, curl an HTTP request, and kill the script to get output. Crude, but good enough for this use-case.

ruby$1:::method-entry
{
  @time_arr[copyinstr(arg0)] = count();
}

The method-entry probe fires immediately before a method is entered.

The $1 in the provider name is required because DTrace dynamically creates a provider for every running Ruby process. That provider’s name is ruby<pid>, and DTrace will expand $1 to the first argument given to the script. Therefore, invoking the above script with the PID of the Ruby program we wish to trace will have the desired effect.

We dynamically create an associative array called time_arr, and add elements whose name is that of the class owning the called method. Prefixing the array name with an @ turns it into an aggregator, and in conjunction with the count() function, it will count up for us all the times a function in that class is called.

# dtrace -qs ruby\-example.d $(pgrep -f meetup) | tail -20 &; sleep 1; \
  curl http://meetup.localnet; kill -2 $!
  BasicObject                             1
  Logger                                  1
  Logger::Formatter                       1
  Logger::LogDevice                       1
  Rack::ContentLength                     1
  Rack::Head                              1
  Rack::Lint::ErrorWrapper                1
  Rack::Lint::InputWrapper                1
  Rack::Logger                            1
  Rack::Mime                              1
  Rack::Protection::HttpOrigin            1
  Rack::Protection::IPSpoofing            1
  Rack::Protection::XSSHeader             1
  Rack::ShowExceptions                    1
  Rack::TempfileReaper                    1
  Sinatra::CommonLogger                   1
  Sinatra::ShowExceptions                 1
  Sinatra::Wrapper                        1
  Thin::Server                            1
  URI                                     1
  URI::RFC2396_Parser                     1
  EventMachine::PeriodicTimer             2
  MonitorMixin                            2
  Rack::Chunked                           2
  Rack::MethodOverride                    2
  Rack::Protection::FrameOptions          2
  Rack::Protection::JsonCsrf              2
  Rack::Protection::PathTraversal         2
  Rack::Utils::KeySpaceConstrainedParams                            2
  Sinatra::ExtendedRack                   2
  EventMachine                            3
  Set                                     3
  Thin::Backends::Base                    3
  Tilt::Cache                             3
  Rack::CommonLogger                      4
  Rack::Response                          4
  Rack::Utils                             4
  Haml::Engine                            5
  Rack::Protection::Base                  5
  Tilt                                    5
  Tilt::HamlTemplate                      5
  EventMachine::Deferrable                6
  Sinatra::Helpers                        6
  Thin::Logging                           6
  Sinatra::Response                       7
  Sinatra::Templates                      7
  EventMachine::Connection                8
  Thin::Response                          8
  Tilt::Mapping                           9
  Object                                 10
  Thin::Headers                          11
  Rack::BodyProxy                        12
  Rack::Lint                             14
  Thin::Connection                       14
  Thin::Request                          15
  Sinatra::Application                   17
  Tilt::Template                         17
  Rack::Request                          25
  Rack::Utils::HeaderHash                34
  Haml::Helpers                          48
  Haml::Parser::ParseNode                57
  Haml::Buffer                           61
  Rack::Lint::Assertion                  84
  Sinatra::Base                          87
  Haml::Util                            215
  Haml::Compiler                        445
  Haml::Options                         593
  Haml::Parser::Line                    705
  Haml::Parser                         1085

Gosh, HAML is busy, isn’t it? Let’s change the D to record the method name as well as the parent class.

ruby$1:::method-entry
{
  @time_arr[copyinstr(arg0), copyinstr(arg1)] = count();
}
# dtrace -qs ruby\-example.d $(pgrep -f meetup) &; sleep 1; \
  curl http://meetup.localnet; kill -2 $
  Haml::Options             xhtml?                                  5
  Object                    generate_item                           5
  Object                    generate_job                            5
  Rack::Utils::HeaderHash   new                                     5
  Haml::Buffer              attributes                              6
  Haml::Buffer              html4?                                  6
  Haml::Buffer              html5?                                  6
  Haml::Buffer              html?                                   6
  Haml::Buffer              merge_attrs                             6
  Haml::Parser              parse_old_attributes                    6
  Haml::Parser              parse_static_hash                       6
  Rack::BodyProxy           respond_to?                             6
  Rack::Utils::HeaderHash   each                                    6
  Rack::Utils::HeaderHash   []=                                     7
  Thin::Headers             []=                                     7
  Haml::Options             defaults                                8
  Sinatra::Base             root                                    8
  Haml::Compiler            compile_plain                           9
  Haml::Parser              div                                     9
  Haml::Compiler            push_text                              10
  Haml::Util                handle_interpolation                   11
  Haml::Util                unescape_interpolation                 11
  Haml::Buffer              format_script_false_false_false_false_false_true_false               12
  Haml::Buffer              toplevel?                              12
  Haml::Compiler            compile_script                         12
  Haml::Compiler            push_script                            12
  Haml::Options             suppress_eval?                         12
  Haml::Parser              check_push_script_stack                12
  Haml::Parser              handle_ruby_multiline                  12
  Haml::Parser              is_ruby_multiline?                     12
  Haml::Parser              script                                 12
  Haml::Util                static_method_name                     12
  Rack::Utils::HeaderHash   []                                     12
  Haml::Compiler            filter_and_join                        14
  Haml::Parser              mid_block_keyword?                     15
  Sinatra::Application      app_file                               16
  Haml::Parser              balance                                17
  Haml::Util                balance                                17
  Haml::Compiler            concat_merged_text                     18
  Haml::Compiler            push_generated_script                  18
  Haml::Compiler            resolve_newlines                       18
  Haml::Parser              plain                                  20
  Haml::Helpers             html_escape                            24
  Haml::Helpers             preserve                               24
  Haml::Compiler            prerender_tag                          27
  Haml::Parser              block_keyword                          27
  Haml::Parser              close                                  27
  Haml::Compiler            build_attributes                       33
  Haml::Compiler            compile_tag                            33
  Haml::Parser              parse_class_and_id                     33
  Haml::Parser              parse_tag                              33
  Haml::Parser              tag                                    33
  Haml::Options             html4?                                 36
  Haml::Options             html?                                  36
  Sinatra::Base             settings                               36
  Haml::Options             html5?                                 37
  Haml::Util                contains_interpolation?                53
  Haml::Parser              process_indent                         56
  Haml::Parser              process_line                           56
  Haml::Parser              push                                   56
  Haml::Compiler            compile                                57
  Haml::Parser              handle_multiline                       57
  Haml::Parser              is_multiline?                          57
  Haml::Parser::ParseNode   initialize                             57
  Haml::Parser              next_line                              68
  Haml::Parser              raw_next_line                          68
  Haml::Options             format                                 74
  Haml::Compiler            push_merged_text                       82
  Rack::Lint::Assertion     assert                                 84
  Haml::Compiler            nuke_inner_whitespace?                 94
  Haml::Util                inspect_obj                           108
  Haml::Parser              flat?                                 180
  Haml::Parser              block_opened?                         208
  Haml::Options             []                                    380
  Haml::Parser::Line        tabs                                  705

We’re using the method_entry probe here, which means we are only looking at pure Ruby methods. Switching to the cmethod_entry probe shows you calls to core class methods written in C. Things like Kernel::is_a? or Hash::each.

But back to the output, and Look at all those HAML method calls. At this point, I had to get the big guns out. I pulled down rb_flowtime.d and made a minor modification, changing the ruby* providers to ruby$1, as there are a few Rubyish things running on this box that I don’t want to trace. I ended up with this:

#pragma D option quiet
#pragma D option switchrate=10
#pragma D option bufpolicy=fill
#pragma D option bufsize=1m

self int depth;

dtrace:::BEGIN
{
  printf("%3s %-16s %-16s %9s -- %s\n", "C", "TIME(us)", "FILE",
      "DELTA(us)", "CLASS::METHOD");
}

ruby$1:::method-entry,
ruby$1:::method-return
/self->last == 0/
{
  self->last = timestamp;
}

ruby$1:::method-entry
{
  this->delta = (timestamp - self->last) / 1000;
  printf("%3d %-16d %-16s %9d %*s-> %s::%s\n", cpu, timestamp / 1000,
      copyinstr(arg2), this->delta, self->depth * 2, "",
      copyinstr(arg0), copyinstr(arg1));
  self->depth++;
  self->last = timestamp;
}

ruby$1:::method-return
{
  this->delta = (timestamp - self->last) / 1000;
  self->depth -= self->depth > 0 ? 1 : 0;
  printf("%3d %-16d %-16s %9d %*s<- %s::%s\n", cpu, timestamp / 1000,
      basename(copyinstr(arg2)), this->delta, self->depth * 2, "",
      copyinstr(arg0), copyinstr(arg1));
  self->last = timestamp;
}

The comments on the script tell you to be wary of processes switching CPU. That’s surely of no concern at all in this exercise, but SmartOS has all Solaris’ lovely processor-set stuff, so we might as well use it. To bind the meetup-generator process to VCPU 2 is as easy as:

$ pbind -b 2 $(pgrep -f meetup)
process id 5636: was not bound, now 2

Now capture some output:

# ./rb_flowtime.d $(pgrep -f meetup) >out &; sleep 1; curl http://meetup.localnet; kill -2 $!
$ wc -l out
  987 out
$ sed -n 40,55p out
  2 54552225965   connection.rb      14  -> Thin::Connection::pre_process
  2 54552225980   connection.rb      14    -> Thin::Connection::remote_address
  2 54552225995   connection.rb      14      -> Thin::Connection::socket_address
  2 54552226010   connection.rb      15        -> EventMachine::Connection::get_peername
  2 54552226032   connection.rb      22        <- EventMachine::Connection::get_peername
  2 54552226063   connection.rb      31      <- Thin::Connection::socket_address
  2 54552226077   connection.rb      13    <- Thin::Connection::remote_address
  2 54552226091   request.rb         14    -> Thin::Request::remote_address=
  2 54552226107   request.rb         15    <- Thin::Request::remote_address=
  2 54552226125   request.rb         18    -> Thin::Request::async_callback=
  2 54552226145   request.rb         20    <- Thin::Request::async_callback=
  2 54552226161   base.rb            15    -> Thin::Backends::Base::ssl?
  2 54552226174   base.rb            13    <- Thin::Backends::Base::ssl?
  2 54552226195   content_length.rb  20    -> Rack::ContentLength::call
  2 54552226211   chunked.rb         15      -> Rack::Chunked::call
  2 54552226226   base.rb            14        -> Rack::CommonLogger::call

The first column is the VCPU the proces is on (showing the bind we just did, worked); the second is a timestamp; the third is the file containing the method being entered or exited; the fourth is the microsecond time delta between the current step and the previous one. Then we see an arrow denoting entry or return, and finally, the class and method.

I’m interested in the fourth and final columns. Let’s get awk to sum the time spent in each class.

# class_time.awk
{
  sub(/:.*/, "", $6);
  class_time[$6] += $4;
}
END {
  for (t in class_time)
    print class_time[t], t;
}
$ awk -f class_time.awk <out | sort -nr
4504 Haml
3721 Rack
2497 Sinatra
753 Thin
496 Object
472 Tilt
262 EventMachine
93 Logger
92 BasicObject
84 URI
63 MonitorMixin
0 CLASS

There you can see that almost as much is being spent rendering a very simple HTML page from a very simple template than is being spent by the whole Sinatra and Thin webserver framework.

I’ve heard that Slim is a faster templating system than HAML. I had a look at it, and I thought it looked nicer as well. A quick code change later, and re-running my DTrace and awk scripts tells me:

$ awk -f class_time.awk <out_slim | sort -nr
3759 Temple
3531 Rack
2458 Sinatra
729 Thin
465 Tilt
399 Object
327 Array
205 EventMachine
87 Logger
76 URI
59 MonitorMixin
0 CLASS

According to that one trace, it’s spending about 12% less time rendering. (Temple is the main dependency of Slim.)

As a token gesture to not just pulling numbers out of my arse, I ran several traces against both versions of the code, and got very similar results each time. Still, I’m not going to pretend this in any way resembles a meaningful benchmark.

But benchmarking wasn’t the point of the exercise. Hopefully I have succeeded in showing that DTracing Ruby is very easy, and pretty cool.

tags