— modern ops stuff —
DTracing Slow Puppet Runs
06 February 2016 // SmartOS

The site I’m working on builds everything on Ubuntu, with Puppet. Lately a lot of people have been complaining (they’re ops people, what else do you expect them to do?) that our Puppet runs are too slow. And they’re right. They’re incredibly, painfully slow. But why? What’s going on? We don’t do an awful lot to the boxes. We use a masterless setup, so the catalog has to be compiled every time. but my Wavefront Puppet reporter shows me that by far the bulk of the time being taken is on the actual run, not the compilation.

Puppet spends most of its time exec()ing external programs, getting them to do the real work. So I thought it might be interesting to see which externals were being called, and how much time was spent doing what.

I could have used sysdig, but I’m far more comfortable with DTrace, and I already do the vast majority of my dev work in LX branded zones on SmartOS. For those who don’t know, SmartOS is a Solaris-derived platform which can already run safe, secure containers, virtualize the network stack, use a parallel init system with restarters, allows safe dynamic tracing, and has a filesystem/volume manager not rooted in the early 1980s. Normally you get the SmartOS user environment, which is based on Solaris but incorporates most of the modern GNU toolchain. However, if you wish, you can run Linux (LX) branded zones, which are containers with the ability to map the Linux ABI to the SunOS one: this means most Linux stuff works perfectly and seamlessly. You also get /native, which lets you run proper, grown-up things like dtrace(1) and the proper versions of the ptools.

A DTrace Srcipt

I wanted this process to be completely external to the normal bootstrapping and Puppetizing process. That is, I didn’t want to have to change anything, or interact with the bootstrapping zone at all.

Though you can run DTrace in the local zone, I chose to run it from the global (hypervisor) zone. This meant I was looking in from the outside, not interfering at all with the standard bootstrapping and Puppet processes.

I played with a couple of ways of doing this. In the end I decided to grab the contract ID of the Puppet process when it started, and trace processes only with that same contract ID.

Contracts are groups of related processes. They’re used primarily by SMF (think systemd, but finished) as a way of knowing when a process has died and guaranteeing it is restarted. I’m a little murky on exactly how they work in LX zones, but poking around leads me to think that my bootstrapper and all its children will run under their own unique contract.

So, I wrote the following DTrace script:

#pragma D option bufpolicy=fill
#pragma D option bufsize=1m

BEGIN {
  contract = -1;
}

proc:::exec-success
/execname == "puppet-apply.sh" && contract == -1/
{
  puppet_contract = curpsinfo->pr_contract;
}

proc:::exec-success
/curpsinfo->pr_contract == puppet_contract/
{
  time_arr[curpsinfo->pr_pid] = timestamp;
}

proc:::exit
/time_arr[curpsinfo->pr_pid]/
{
  printf("%d\t%s\t%s\n", timestamp - time_arr[curpsinfo->pr_pid],
         curpsinfo->pr_fname, curpsinfo->pr_psargs);
}

proc:::exit
/execname == "puppet-apply.sh"/
{
  exit(0);
}

What does it do? Well, the #pragmas tell DTrace to buffer up the output rather than writing it to stdout. It will be flushed when the script exits.

The BEGIN block, like its awk namesake, runs once, right at the start of execution. I set the contract variable to -1 because a contract can never have that ID, but I need it to be set to something so as not to mess up the predicates later in the script.

The first proper block of code runs when the puppet-apply.sh process starts. proc:::exec-success refers to a probe, which fires each time a process is successfully kicked off by exec(), or execve(), or one of their friends. The block of code in { braces } is only run if the /predicate/ passes. In this case we only want to run our code if the probe was triggered by puppet-apply.sh running for the first time. If it was, the script remembers the contract ID which that puppet-apply.sh has. I knew how to access the contract ID from the psinfo struct in /usr/include/sys/procfs.h.

The next block of code is very similar, but the predicate is different. Whereas before we looked to match the execname of the process which triggered the probe, this time we compare the contract ID of the process with the value we stored in the earlier block. For any matching process (loosely speaking, any part of the Puppet run) we append to an array called time_arr, with the index being the process PID, and the value being the time at which the process started: i.e. now.

The next block runs when a process exits. We don’t care what the process exits, just that it does. The /predicate/ says to use the PID of the exiting process to make a lookup in the time_arr array. It then prints out the time for which the process existed. This is the simple delta between start time and stop time. I don’t care how long it was blocked, on CPU, or anything else. I could do this quite easily if I wished, by accessing other data in the psinfo struct.

I chose to use the pr_fname, which is simply the path the executable, as well as pr_psargs, which gives a (shortened) command-line, including options and arguments. I want a very broad-brush view: a general flavour of what’s happening, and in my preliminary experiments I could see that the same binaries were being referred to implicitly and explicitly, and I didn’t want to have to amalgamate things later. More data means more options when I come to analyze it later.

Finally, I catch the wrapper script exiting, and terminate my tracing at that point.

An LX Branded Zone

I ran the D script in one window, in the global zone:

# dtrace -qs puppet-apply.d >out

then from a second terminal, spun up an Ubuntu LX zone using this manifest

{
 "brand": "lx",
 "image_uuid": "5917ca96-c888-11e5-8da0-e785a1ad1185",
 "alias": "ubuntu",
 "kernel_version": "3.13.0",
 "hostname": "xxxx_base",
 "alias": "xxxx_base",
 "max_physical_memory": 2048,
 "resolvers": ["192.168.1.26", "8.8.8.8"],
 "nics": [
  {
    "nic_tag": "admin",
    "ip": "192.168.1.84",
    "netmask": "255.255.255.0",
    "gateway": "192.168.1.1",
    "primary": "1"
  }
 ],
 "customer_metadata": {
   "user-script": "wget -P /var/tmp http://httpserv.localnet/xxxx_bootstrap.sh;
   chmod 700 /var/tmp/xxxx_bootstrap.sh; /var/tmp/xxxx_bootstrap.sh
   puppet-ops/ops:dev:base >/var/log/xxxx_bootstrap.log 2>&1"
 }
}

and this command.

# time vmadm create -f xxxx_base.json
Successfully created VM ca42db41-161d-438b-94e7-ea8d2a8dd87a

real  0m3.411s
user  0m0.846s
sys   0m1.002s

Doesn’t take long, does it?

The customer_metadata fetches a special bootstrap script from a local HTTP server, and runs it. This script is particular to my SmartOS setup (no one else where I work uses SmartOS) and acts as a thin wrapper around our actual bootstrap script. Essentially, it makes a fresh zone look like one of our standard Vagrant boxes. It then downloads and runs our real bootstrapper, which sets up repos and some external facts, then runs Puppet via a wrapper script called puppet-apply.sh.

image_uuid refers to an Ubuntu 14.04 base image supplied by Joyent. Everything else should be self-explanatory.

I flipped back to my DTrace window, and waited for stuff to occur.

When the DTrace exited and gave me back my prompt, I knew the Puppet run was complete, and I had data.

Results

$ wc -l out
3741 out
$ sed 10q out
1408894 kmod  /sbin/lsmod
1543007 grep  grep -q vbox
1541005 uname /bin/uname -s
1405667 hostname  /bin/hostname
1858056 hostname  /bin/hostname -f
4196447 dash  sh -c /bin/hostname -f 2> /dev/null
1767484 hostname  /bin/dnsdomainname
4072433 dash  sh -c /bin/dnsdomainname 2> /dev/null
1770362 hostname  /bin/hostname -f
4078016 dash  sh -c /bin/hostname -f 2> /dev/null

So lots of programs ran, and we have recorded the PID, exec time, path and arguments of each one. To recap, the first column is the time the command took, in nanoseconds; the second is the program which ended up running, and the remainder is the command that was invoked. The first line illustrates that lsmod is a symlink to kmod, and later we see that Ubuntu’s sh is really dash.

I recently re-read The AWK Programming Language (if you haven’t read it, do, if you have, read it again, you’ve probably forgotten how good awk is) so there seems a very obvious next step. Let’s tot up all those exec times for actual program calls with a little script called sum.awk and find out what we spent all that time doing.

{ cmd[$2] = cmd[$2] + $1; }
END { for (c in cmd)
        print cmd[c] / 1e9, c;
}
$ awk -f sum.awk < out  | sort -nr | sed 20q
428.482 dash
401.509 puppet-apply.sh
401.497 flock
401.477 puppet
345.846 apt-get
213.512 dpkg
41.5535 https
41.0096 http
20.8534 dpkg-preconfigur
18.8725 sshd
16.02   apt-extracttempl
14.8066 gem1.9.1
10.6841 man-db.postinst
10.4272 mandb
8.32055 frontend
8.25165 dpkg-deb
6.274   gpgv
5.77478 copy
5.35583 postfix.postinst
.442013 apt-show-version

We can ignore the first four entries, which are puppet-apply.sh and the shell it used (plus the odd other sh command), the lock which wraps our Puppet runs, and the Puppet process itself. We can also disregard sshd. Not sure quite how that crept in. Possibly it ended up in the same contract as Puppet when it was restarted after a config change.

So, clearly, the reason our Puppet runs take a long time is because we’re pulling in a lot of packages. Let’s use the long command fields to find out a bit more.

{ full_cmd = substr($0, index($0, $3));
  cmd[full_cmd] = cmd[full_cmd] + $1;
}
END { for (c in cmd)
        print cmd[c] / 10e9, c;
}
$ awk -f sum.awk < out  | sort -nr | grep apt-get | sed 10q
116.858 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install td-agent
37.9265 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install bgch-complian
25.3062 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install mailutils
23.8725 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install awscli
14.5637 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install b
10.9446 /usr/bin/apt-get update
7.62643 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install apparmor-util
6.90385 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install gettext
5.84451 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install apt-show-vers
5.36272 /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install ssh

The fluentd (td-agent) pacakge takes a good while to install (unless you compare it to any package installed, ever, with IPS). I don’t know why we even install mailutils. We’d already trimmed out packages with heavy dependency lists (check out the dependencies on the Ubuntu collectd package), a couple of which took for ever to install (Aide was one). But, still the vast majority of time on the initial Puppet runs was downloading and installing packages.

Upshot: I re-baked our AMIs and Vagrant boxes so they already contained the packages installed by our base Puppet role. You could, I’m pretty sure, have guessed what was taking all the time, but it was fun to do a bit of DTracing for the first time in ages, and it’s always interesting to see what’s going on underneath the veneer.

Tags: