Though I’m far from an expert, I do enjoy programming in Ruby. Some time ago, I wrote a stupid little Sinatra app which generated agendas for pretentious, fictional DevOps meetups. The code is here, and you can see it’s extremely small. Because of its size, I thought it might make for a simple example of DTracing a Ruby application.
I’m interested in performance because, who isn’t? I can’t see much room for optimization in my own code (there’s only about thirty lines of it for a start) but as Ruby devs we tend to pull in gems like crazy, without much thought of the quality or efficiency of the code they contain.
As of 2.0, Ruby has proper DTrace support. If you build on a 21st century operating system, pardon me, an operating system which supports DTrace (Solaris, Illumos, OS X, FreeBSD) you get it by default. I’m doing this on SmartOS with Ruby 2.2.3 built as shown here.
As a first step, how about looking where the executing program spends most of its time?
The probes available are documented
here,
and it looks to me like method-entry
is a good place to start.
Here’s its signature.
ruby:::method-entry(classname, methodname, filename, lineno);
The two arguments are the class and method names, so how about
running a script which counts calls to each class::method
, and
prints them out? I’m doing this on a dev zone and I’m going to run
my D script, curl
an HTTP request, and kill the script to get
output. Crude, but good enough for this use-case.
ruby$1:::method-entry
{
@time_arr[copyinstr(arg0)] = count();
}
The method-entry
probe fires immediately before a method is
entered.
The $1
in the provider name is required because DTrace
dynamically creates a provider for every running Ruby process. That
provider’s name is ruby<pid>
, and DTrace will expand $1
to the
first argument given to the script. Therefore, invoking the above
script with the PID of the Ruby program we wish to trace will have
the desired effect.
We dynamically create an associative array called time_arr
, and
add elements whose name is that of the class owning the called
method. Prefixing the array name with an @
turns it into an
aggregator, and in conjunction with the count()
function, it will
count up for us all the times a function in that class is called.
# dtrace -qs ruby\-example.d $(pgrep -f meetup) | tail -20 &; sleep 1; \
curl http://meetup.localnet; kill -2 $!
BasicObject 1
Logger 1
Logger::Formatter 1
Logger::LogDevice 1
Rack::ContentLength 1
Rack::Head 1
Rack::Lint::ErrorWrapper 1
Rack::Lint::InputWrapper 1
Rack::Logger 1
Rack::Mime 1
Rack::Protection::HttpOrigin 1
Rack::Protection::IPSpoofing 1
Rack::Protection::XSSHeader 1
Rack::ShowExceptions 1
Rack::TempfileReaper 1
Sinatra::CommonLogger 1
Sinatra::ShowExceptions 1
Sinatra::Wrapper 1
Thin::Server 1
URI 1
URI::RFC2396_Parser 1
EventMachine::PeriodicTimer 2
MonitorMixin 2
Rack::Chunked 2
Rack::MethodOverride 2
Rack::Protection::FrameOptions 2
Rack::Protection::JsonCsrf 2
Rack::Protection::PathTraversal 2
Rack::Utils::KeySpaceConstrainedParams 2
Sinatra::ExtendedRack 2
EventMachine 3
Set 3
Thin::Backends::Base 3
Tilt::Cache 3
Rack::CommonLogger 4
Rack::Response 4
Rack::Utils 4
Haml::Engine 5
Rack::Protection::Base 5
Tilt 5
Tilt::HamlTemplate 5
EventMachine::Deferrable 6
Sinatra::Helpers 6
Thin::Logging 6
Sinatra::Response 7
Sinatra::Templates 7
EventMachine::Connection 8
Thin::Response 8
Tilt::Mapping 9
Object 10
Thin::Headers 11
Rack::BodyProxy 12
Rack::Lint 14
Thin::Connection 14
Thin::Request 15
Sinatra::Application 17
Tilt::Template 17
Rack::Request 25
Rack::Utils::HeaderHash 34
Haml::Helpers 48
Haml::Parser::ParseNode 57
Haml::Buffer 61
Rack::Lint::Assertion 84
Sinatra::Base 87
Haml::Util 215
Haml::Compiler 445
Haml::Options 593
Haml::Parser::Line 705
Haml::Parser 1085
Gosh, HAML is busy, isn’t it? Let’s change the D to record the method name as well as the parent class.
ruby$1:::method-entry
{
@time_arr[copyinstr(arg0), copyinstr(arg1)] = count();
}
# dtrace -qs ruby\-example.d $(pgrep -f meetup) &; sleep 1; \
curl http://meetup.localnet; kill -2 $
Haml::Options xhtml? 5
Object generate_item 5
Object generate_job 5
Rack::Utils::HeaderHash new 5
Haml::Buffer attributes 6
Haml::Buffer html4? 6
Haml::Buffer html5? 6
Haml::Buffer html? 6
Haml::Buffer merge_attrs 6
Haml::Parser parse_old_attributes 6
Haml::Parser parse_static_hash 6
Rack::BodyProxy respond_to? 6
Rack::Utils::HeaderHash each 6
Rack::Utils::HeaderHash []= 7
Thin::Headers []= 7
Haml::Options defaults 8
Sinatra::Base root 8
Haml::Compiler compile_plain 9
Haml::Parser div 9
Haml::Compiler push_text 10
Haml::Util handle_interpolation 11
Haml::Util unescape_interpolation 11
Haml::Buffer format_script_false_false_false_false_false_true_false 12
Haml::Buffer toplevel? 12
Haml::Compiler compile_script 12
Haml::Compiler push_script 12
Haml::Options suppress_eval? 12
Haml::Parser check_push_script_stack 12
Haml::Parser handle_ruby_multiline 12
Haml::Parser is_ruby_multiline? 12
Haml::Parser script 12
Haml::Util static_method_name 12
Rack::Utils::HeaderHash [] 12
Haml::Compiler filter_and_join 14
Haml::Parser mid_block_keyword? 15
Sinatra::Application app_file 16
Haml::Parser balance 17
Haml::Util balance 17
Haml::Compiler concat_merged_text 18
Haml::Compiler push_generated_script 18
Haml::Compiler resolve_newlines 18
Haml::Parser plain 20
Haml::Helpers html_escape 24
Haml::Helpers preserve 24
Haml::Compiler prerender_tag 27
Haml::Parser block_keyword 27
Haml::Parser close 27
Haml::Compiler build_attributes 33
Haml::Compiler compile_tag 33
Haml::Parser parse_class_and_id 33
Haml::Parser parse_tag 33
Haml::Parser tag 33
Haml::Options html4? 36
Haml::Options html? 36
Sinatra::Base settings 36
Haml::Options html5? 37
Haml::Util contains_interpolation? 53
Haml::Parser process_indent 56
Haml::Parser process_line 56
Haml::Parser push 56
Haml::Compiler compile 57
Haml::Parser handle_multiline 57
Haml::Parser is_multiline? 57
Haml::Parser::ParseNode initialize 57
Haml::Parser next_line 68
Haml::Parser raw_next_line 68
Haml::Options format 74
Haml::Compiler push_merged_text 82
Rack::Lint::Assertion assert 84
Haml::Compiler nuke_inner_whitespace? 94
Haml::Util inspect_obj 108
Haml::Parser flat? 180
Haml::Parser block_opened? 208
Haml::Options [] 380
Haml::Parser::Line tabs 705
We’re using the method_entry
probe here, which means we are only
looking at pure Ruby methods. Switching to the cmethod_entry
probe
shows you calls to core class methods written in C. Things like
Kernel::is_a?
or Hash::each
.
But back to the output, and Look at all those HAML method calls. At
this point, I had to get the big guns out. I pulled down
rb_flowtime.d
and made a minor modification, changing the ruby*
providers to
ruby$1
, as there are a few Rubyish things running on this box that
I don’t want to trace. I ended up with this:
#pragma D option quiet
#pragma D option switchrate=10
#pragma D option bufpolicy=fill
#pragma D option bufsize=1m
self int depth;
dtrace:::BEGIN
{
printf("%3s %-16s %-16s %9s -- %s\n", "C", "TIME(us)", "FILE",
"DELTA(us)", "CLASS::METHOD");
}
ruby$1:::method-entry,
ruby$1:::method-return
/self->last == 0/
{
self->last = timestamp;
}
ruby$1:::method-entry
{
this->delta = (timestamp - self->last) / 1000;
printf("%3d %-16d %-16s %9d %*s-> %s::%s\n", cpu, timestamp / 1000,
copyinstr(arg2), this->delta, self->depth * 2, "",
copyinstr(arg0), copyinstr(arg1));
self->depth++;
self->last = timestamp;
}
ruby$1:::method-return
{
this->delta = (timestamp - self->last) / 1000;
self->depth -= self->depth > 0 ? 1 : 0;
printf("%3d %-16d %-16s %9d %*s<- %s::%s\n", cpu, timestamp / 1000,
basename(copyinstr(arg2)), this->delta, self->depth * 2, "",
copyinstr(arg0), copyinstr(arg1));
self->last = timestamp;
}
The comments on the script tell you to be wary of processes switching CPU. That’s surely of no concern at all in this exercise, but SmartOS has all Solaris’ lovely processor-set stuff, so we might as well use it. To bind the meetup-generator process to VCPU 2 is as easy as:
$ pbind -b 2 $(pgrep -f meetup)
process id 5636: was not bound, now 2
Now capture some output:
# ./rb_flowtime.d $(pgrep -f meetup) >out &; sleep 1; curl http://meetup.localnet; kill -2 $!
$ wc -l out
987 out
$ sed -n 40,55p out
2 54552225965 connection.rb 14 -> Thin::Connection::pre_process
2 54552225980 connection.rb 14 -> Thin::Connection::remote_address
2 54552225995 connection.rb 14 -> Thin::Connection::socket_address
2 54552226010 connection.rb 15 -> EventMachine::Connection::get_peername
2 54552226032 connection.rb 22 <- EventMachine::Connection::get_peername
2 54552226063 connection.rb 31 <- Thin::Connection::socket_address
2 54552226077 connection.rb 13 <- Thin::Connection::remote_address
2 54552226091 request.rb 14 -> Thin::Request::remote_address=
2 54552226107 request.rb 15 <- Thin::Request::remote_address=
2 54552226125 request.rb 18 -> Thin::Request::async_callback=
2 54552226145 request.rb 20 <- Thin::Request::async_callback=
2 54552226161 base.rb 15 -> Thin::Backends::Base::ssl?
2 54552226174 base.rb 13 <- Thin::Backends::Base::ssl?
2 54552226195 content_length.rb 20 -> Rack::ContentLength::call
2 54552226211 chunked.rb 15 -> Rack::Chunked::call
2 54552226226 base.rb 14 -> Rack::CommonLogger::call
The first column is the VCPU the proces is on (showing the bind we just did, worked); the second is a timestamp; the third is the file containing the method being entered or exited; the fourth is the microsecond time delta between the current step and the previous one. Then we see an arrow denoting entry or return, and finally, the class and method.
I’m interested in the fourth and final columns. Let’s get awk
to
sum the time spent in each class.
# class_time.awk
{
sub(/:.*/, "", $6);
class_time[$6] += $4;
}
END {
for (t in class_time)
print class_time[t], t;
}
$ awk -f class_time.awk <out | sort -nr
4504 Haml
3721 Rack
2497 Sinatra
753 Thin
496 Object
472 Tilt
262 EventMachine
93 Logger
92 BasicObject
84 URI
63 MonitorMixin
0 CLASS
There you can see that almost as much is being spent rendering a very simple HTML page from a very simple template than is being spent by the whole Sinatra and Thin webserver framework.
I’ve heard that Slim is a
faster templating system than HAML. I had a look at it, and I
thought it looked nicer as well. A quick code change later, and
re-running my DTrace and awk
scripts tells me:
$ awk -f class_time.awk <out_slim | sort -nr
3759 Temple
3531 Rack
2458 Sinatra
729 Thin
465 Tilt
399 Object
327 Array
205 EventMachine
87 Logger
76 URI
59 MonitorMixin
0 CLASS
According to that one trace, it’s spending about 12% less time rendering. (Temple is the main dependency of Slim.)
As a token gesture to not just pulling numbers out of my arse, I ran several traces against both versions of the code, and got very similar results each time. Still, I’m not going to pretend this in any way resembles a meaningful benchmark.
But benchmarking wasn’t the point of the exercise. Hopefully I have succeeded in showing that DTracing Ruby is very easy, and pretty cool.