Monitoring cron with DTrace
14 February 2019

cron is a pain. It’s not that it doesn’t work – it’s worked great for thirty-odd years. The problem is knowing that it’s worked. Or not worked. Once upon a time I wrote a truly horrible thing called the Cron Reporting and Auditing Portal (named for its acronym) which tried to put together a view of cron jobs across a decent sized estate. That was a pain to make. cron’s logs are difficult read, and more difficult to parse. It’s designed to do nothing on success, and to send an email on failure.

Alas, no one these days will pay me to work on Solaris or SmartOS, but I still run both at home. They do all kinds of things, using many cron jobs in many zones. For ages, I’ve meant to put something in place to monitor them.

The “correct” way to do this, I suppose, would be to monitor the outcome, by having the jobs themselves emit telemetry. For some things, I already do this. I don’t want to know my backup job ran, I want to know that it worked. But most of my cron jobs are small scripts and one liners, and the tasks they perform often don’t lend themselves to proper telemetry.

In the dozen or so years I’ve been building zoned systems, I’ve learnt that the way to monitor zones is from the global. The global can see everything, and no one inside a zone can interfere with what’s happening there. Bearing in mind that a modern system can easily run many hundreds of zones, you’re better off, I think, with one smart job running globally than a dumb job in every zone. There’s much less overhead, and much less to go wrong.

Sometimes your monitoring has to go into the zones, but running zlogin, or peeking in /zones should be a last resort. Better to observe the global view, and there’s nothing better at doing that than DTrace. So, I decided to write a DTrace script which looked for cron jobs, reporting whatever it sees to Wavefront.

I thought the best place to start would be a simple notification that cron had started a job.

To shorten my iterative cycle I set up three test cron jobs, covering three cases of execution. One works (exits 0); one fails (exits 1); one doesn’t run because the command does not exist. I set all of them to fire every minute, (“5-star cron job. Will run again.”) Doing this reminded me of one of the worst design decisions in Unix: crontab -r. Maybe whoever decided that one used a keyboard where the r, wasn’t right next to the e.

The source told me tasks are launched via execl(), so I wrote

syscall::exece:entry
/execname == "cron"/
{
  printf("RUN pid=%d cmd='%s' zone=%s user=%d\n",
          pid, copyinstr(arg0), zonename, uid);
}

and saw

RUN pid=473 cmd='/bin/false' zone=cube-media user=264
RUN pid=468 cmd='/usr/bin/true' zone=cube-ws user=264
RUN pid=469 cmd='/bin/does-not-exist' zone=cube-ws user=264

Right. Fine.

This data is going to end up in Wavefront, so we need something to plot against wallclock time. Execution time seemed a sensible metric, so I recorded the start time of each job in a thread-local variable. Next I needed a block to fire at the end of the job. This needs to calculate the elapsed time and write information about the job.

syscall::exece:entry
/execname == "cron"/
{
  printf("RUN pid=%d cmd='%s' zone=%s user=%d\n",
          pid, copyinstr(arg0), zonename, uid);
  self->t_start = timestamp;
}

proc:::exit
/self->t_start/
{
  printf("END pid=%d prog=%s zone=%s user=%d duration=%d\n",
          pid, execname, zonename, uid, timestamp - self->t_start);

}

Now I see:

RUN pid=4021 cmd='/usr/bin/true' zone=cube-ws user=264
END pid=4021 prog=true zone=cube-ws user=264 duration=1688168
RUN pid=4020 cmd='/bin/false' zone=cube-media user=264
END pid=4020 prog=false zone=cube-media user=264 duration=1585124
RUN pid=4022 cmd='/bin/does-not-exist' zone=cube-ws user=264
END pid=4022 prog=cron zone=cube-ws user=264 duration=28568

Good, good. But the point of the exercise is to know whether or not a job has succeeded. I assumed this would be as easy as showing the arg0 of exit, and changed my proc block to

printf("END exit=%d pid=%d prog=%s zone=%s user=%d duration=%d\n",
       arg0, pid, execname, zonename, uid, timestamp - self->t_start);

which gave me

RUN pid=4538 cmd='/bin/does-not-exist' zone=cube-ws user=264
END exit=1 pid=4538 prog=cron zone=cube-ws user=264 duration=30673
RUN pid=4536 cmd='/bin/false' zone=cube-media user=264
END exit=1 pid=4536 prog=false zone=cube-media user=264 duration=1554937
RUN pid=4537 cmd='/usr/bin/true' zone=cube-ws user=264
END exit=1 pid=4537 prog=true zone=cube-ws user=264 duration=1702499

Everything exits 1? Even true? Of course not. The DTrace guide clearly states exit is a

Probe that fires when the current process is exiting. The reason for exit, which is expressed as one of the SIGCHLD siginfo.h(3HEAD) codes, is contained in args[0]

So that’s no good. I had no joy with proc::exec-failure either, which I now assume means that the exec() itself failed, not the thing that was exec()ed.

I started hunting through dtrace -l, and landed on fbt::proc_exit:entry. I’m a bit scared of the fbt probe. I remember back in the day it was a bit “here be dragons”, and I never properly learnt it. But I thought it might be what I need here. (Speaking of “back in the day”, I’m not sure I’ve ever felt quite so old as when I saw the right hand column of this page.)

I couldn’t find much on proc_exit() in the Illumos code, but I did learn

extern int proc_exit(int, int);

So I asked DTrace to look at both args for me. The first was always 1. Whether that’s the same thing I was seeing from the proc provider, I don’t know, but looking at arg1 was more illuminating. I changed my proc:::exit block to

fbt::proc_exit:entry
/self->t_start/
{
printf("END exit=%d pid=%d prog=%s zone=%s user=%d duration=%d\n",
       arg1, pid, self->job_name, zonename, uid, timestamp - self->t_start);
}

and got

RUN pid=16435 cmd='/bin/false' zone=cube-media user=264
END exit=255 pid=16435 prog=/bin/false zone=cube-media user=264 duration=1565076
RUN pid=16436 cmd='/usr/bin/true' zone=cube-ws user=264
END exit=0 pid=16436 prog=/usr/bin/true zone=cube-ws user=264 duration=1569024
RUN pid=16437 cmd='/bin/does-not-exist' zone=cube-ws user=264
END exit=127 pid=16437 prog=/bin/does-not-exist zone=cube-ws user=264 duration=32172

127 for the thing that can’t be run at all; 255 for the thing which ran with an error, and a big fat 0 for the thing that worked. I reckon that’s all I need to know. Just to be certain I checked with /var/cron/log in the various zones, and found matching exit codes for all my jobs.

On the home straight now, I just needed to change the output to something Wavefront understands. Normally I’d pipe this into the Wavefront CLI, but that’s written in Ruby, and my SmartOS global zones don’t have Ruby. But they do have nc(1), so I just had to construct a Wavefront wire format string.

That is:

<metricName> <metricValue> [<timestamp>] source=<source> [pointTags]

I dropped the printf() call from my syscall::exece: block, and put this one in my fbt:.

printf("cron.job %d %d source=cube cmd=\"%s\" zone=\"%s\" uid=%d exit_code=%d\n",
       walltimestamp - self->t_start,
       self->t_start / 1000000000,
       self->job_name,
       zonename,
       uid,
       arg1);

Because we need a real timestamp now, I changed the declaration of self->t_start from timestamp to walltimestamp. Running the script, and piping it into nc wavefront 2878 I had data. Here are the times of cron jobs running in my global zone.

If you hover over the points, you’ll see that the top one is zfs_snapshot. But the other two just say /usr/bin/sh.

When you have a call to a program as your cron task, like /bin/zfs_snapshot, that gets executed as-is. But anything “shell scripty” gets run through /usr/bin/sh. (Which on Solaris is ksh93, not that bash rubbish.) I looked up the /usr/bin/sh jobs in the relevant crontabs, and found they looked something like this. (Line breaks added for formatting and clarity.)

0,10,20,30,40,50 * * * * cd /home/rob/work/record-collector/tools && \
                         bundle exec value_to_wavefront \
                         >/var/log/cron_jobs/value_to_wavefront.log 2>&1

Everything the kernel knows about a process is available in DTrace via the curpsinfo structure. That has to be examined in the fbt:: block, because the active process in the exece: block is cron itself.

curpsinfo->pr_fname didn’t offer me anything of value. In the case of a short job, it’s the basename() of what I already have in self->job_name; in the case of the longer “shell scripty” things, it’s sh. So forget that. curpsinfo->pr_psargs is a little more variable. For short things, it’s the program basename again; for “shell scripty” things, it’s the whole command. For jobs which completely fail to run, it’s /usr/sbin/cron.

I decided my cmd tag only needed to hold enough information to identify a cron job. It doesn’t have to be perfect. Clearly I had to apply some basic logic to the two strings in self->job_name and curpsinfo->pr_psargs to make something sensible and consistent. The problem is, logic isn’t DTrace’s forte. In short, it has no if. (It also has no loops.) This, you may immediately think, is stupid. What kind of programming language doesn’t do conditional branching? Well, D isn’t really a programming language; at least not a Turing-complete, general-purpose one. It’s very domain-specific, and its top priority is not affecting the system on which it runs. If it gave you the ability to write branching code it would not be able to comprehensively check what you wanted to do and could guarantee the safety of the script and therefore the system. So it doesn’t.

But, DTrace does have a ternary operator, and using that I found a way to use a couple of clause-local variables (very fast, memory automatically freed) to pick out sufficiently specific information. I also made the source name dynamic by wrapping the whole thing in shell and passing it in as a macro (argument) variable. The final script is:

#!/bin/ksh

dtrace -qn '
syscall::exece:entry
/execname == "cron"/
{
  self->t_start = walltimestamp;
  self->job_name = copyinstr(arg0);
}

fbt::proc_exit:entry
/self->t_start/
{
  this->ocmd = self->job_name;
  this->cmd = (this->ocmd == "/usr/bin/sh") ? (string)curpsinfo->pr_psargs : this->ocmd;

  printf("cron.job %d %d source=%s cmd=\"%s\" zone=\"%s\" uid=%d exit_code=%d\n",
         walltimestamp - self->t_start,
         self->t_start / 1000000000,
         $$1,
         this->cmd,
         zonename,
         uid,
         arg1);
  self->t_start = 0;
}' $(uname -n)

Which gives me this. (I’ve selected just a few zones, for clarity.)

I wrapped everything up in an SMF manifest, running as a normal user but granting DTrace privileges.

<method_context>
  <method_credential
    group='daemon'
    privileges='basic,dtrace_kernel,dtrace_proc,dtrace_user,!file_link_any'
    user='cronmon'/>
</method_context>

Finally, I wrote a quick Puppet module for deployment, pushed it out, and it’s been running happily ever since on Solaris and SmartOS.

I fixed all the broken cron jobs, improved a couple that seemed slow, and now Wavefront tells me if anything fails or takes inordinately long. It’s good.

Imperfections? Most obviously, the user is only displayed by UID. It would be nicer if it could show the username, but I’m not aware of any way I could get that out of DTrace, at least not trivially. The command names aren’t perfect, but they’re good enough.

The connection between nc and Wavefront is brittle, so it would be better to have nc write to a local Telegraf socket, and have Telegraf pass everything on to Wavefront. Telegraf is smart enough to buffer points if the endpoint temporarily disappears. I’ve got a global zone Telegraf input about half written, and once that’s done, this will talk to it.

Back when I got paid to work on Solaris, I did all kinds of cute sysadmin type things like this with DTrace. I’m not a Real Programmer, so I don’t often trace my own code, but I’ve always loved the in-depth introspection dynamic tracing gives to the ops engineer, particularly combined with the zones model.

I spend my days now in the world of IAM and containers and Lambdas, and everything-as-a-service. cron seems a quaint relic as, I suppose, does Solaris. But I do feel there is still a place for “the old way” of doing things. Simple tools, well understood, their edges worn smooth from years of careful use.

I’d love to get paid to work on this kind of “real Unix” stuff again, instead of the the glorified plumbing of AWS. But at least I have a fun hobby.

tags