— modern ops stuff —
Solaris 11 zones part 1
01 May 2012 // Solaris

Throughout the lifespan of Solaris 10, zones got more flexible and more complicated, but without the fundamentals changing.

In Solaris 11, however, things are different. The commands are still zonecfg and zoneadm, and many of the subcommands are as they were, but things have changed at a lower level. ZFS and SMF integration are far tighter, Crossbow is used much more, and, of course, the packaging system has changed, which means things have to happen in a new way. In this article I’m going to try to explain the major changes. There’s a lot of automation in zone creation now, and I haven’t yet found a decent technical document which explains it, so most of what follows I’ve worked out myself, mostly by watching what happens with dtrace.

Networking

I’m starting with this because I think it’s the best change. You can still have shared IP instance zones, but now we have the power of Crossbow, there isn’t a persuasive case for using them. Zones are created with their own full IP stack, which implies a VNIC. But here’s the new thing: the VNIC doesn’t have to exist in the global zone. The idea behind this is that zones can be moved between systems, so the receiving host doesn’t need to have anything done to it: you just drop the new zone in, VNIC and all, and it works. It also has the nice side-effect of not leaving an unused VNIC hanging about when a zone is removed.

So if you just create a zone newzone with the ip-type=exclusive property, and no other information, you’ll find that once the zone is installed and configured, it has a single NIC called net0, and your global zone’s primary interface will have gained a VNIC called, sensibly, newzone/net0. You’ll only be able to see that through dladm - it’s not in ifconfig’s output, and you won’t find it via ipadm, as everything other than the virtual link belongs to the zone.

All this “magic” is done by a new zone resource, called anet. It has many properties, but the main ones are lower-link and linkname. lower-link specifies the (usually) physical link the VNIC will be created on top of. It defaults to auto, which so far as I can tell means the primary interface. As I said, the link is usually physical, but you can also use aggregates, which is nice. linkname specifies the name of the link inside the zone, and it defaults to the aforementioned net0.
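
As a rough sketch of how you might point the automatic VNIC at a different lower link, the commands below go into a zonecfg command file. The link name aggr0 is an assumption for illustration (an aggregate that would already exist in the global zone), and the zonecfg call is guarded so nothing happens on a box that doesn’t have it.

```shell
# Sketch only: aggr0 is a hypothetical aggregate in the global zone.
cmdfile=$(mktemp)
cat > "$cmdfile" <<'EOF'
select anet linkname=net0
set lower-link=aggr0
end
EOF

# Apply the command file only where zonecfg is available.
if command -v zonecfg >/dev/null 2>&1; then
    zonecfg -z newzone -f "$cmdfile"
    # Once the zone has booted, the VNIC should be listed as newzone/net0.
    dladm show-vnic
fi
```

Leaving lower-link alone keeps the auto behaviour described above, so this is only needed when the primary interface is not the one you want.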

Now, giving away a whole IP stack, particularly in a multi-tenant system, comes with risks. For instance, our guest has access to ipadm and may change his address to something which clashes with another system on our network. So, network security has been beefed up, and is managed with new zone properties. allowed-address is simply an address, list, or range of addresses to which that interface may be configured. If a user in the zone attempts to set any other address, they’ll be told

ipadm: cannot create address: Permission denied

If you set the anet’s configure-allowed-address property, then the interface will be reconfigured to that address on every reboot, whatever it may have been when the zone went down.

Further security is provided by the link-protection property. This works only if allowed-address is set, and can be set to one or more of the dladm protection properties - mac-nospoof, ip-nospoof, dhcp-nospoof, and restricted. The first two mean that any packets whose source MAC or IP address does not match those configured in the zone properties are dropped. restricted discards packets which aren’t IPv4, IPv6, or ARP. I’m not looking at DHCP today, since I don’t use it.

It’s possible to limit the bandwidth consumed by the zone with the maxbw property, and you can set a priority for the zone’s VNIC with the priority property. You may notice a lot of these properties have equivalents in dladm. That’s because whether we’re using dladm or a zone with an exclusive IP instance, we’re still referring to a VNIC. Like I said - Crossbow is everywhere now.
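
To pull the security and flow properties together, here is a minimal sketch of a command file that locks down the default anet. The address, bandwidth limit and priority are made-up example values, not recommendations, and I’ve quoted the comma-separated list to be on the safe side.

```shell
# Sketch only: the address and limits below are made-up example values.
cmdfile=$(mktemp)
cat > "$cmdfile" <<'EOF'
select anet linkname=net0
set allowed-address=192.168.1.50/24
set configure-allowed-address=true
set link-protection="mac-nospoof,ip-nospoof"
set maxbw=100M
set priority=low
end
EOF

# Apply the command file only where zonecfg is available.
if command -v zonecfg >/dev/null 2>&1; then
    zonecfg -z newzone -f "$cmdfile"
fi
```

The settings should take effect the next time the zone boots and its VNIC is recreated.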

The old net property still exists if you want to use it. It’s as it always was but for the addition of the allowed-address property. If you want the dladm security and flow management, you have to use anet. I don’t really see any reason other than legacy to continue using net.

Filesystems

Back when zones first appeared we used to have to put their roots in UFS filesystems because UFS was all we had. Then ZFS came along, we moved our zone roots to ZFS so we could snapshot them, then later found that some patches wouldn’t work unless the zones were on UFS roots. So we moved them back. Now UFS seems about as current as 5.25” floppies, and zone/ZFS integration is tight.

When you create a zone newzone, the following ZFS datasets are created:

NAME                                           MOUNTPOINT               ZONED
rpool/zoneroot/newzone                         /zones/newzone             off
rpool/zoneroot/newzone/rpool                   /rpool                      on
rpool/zoneroot/newzone/rpool/ROOT              legacy                      on
rpool/zoneroot/newzone/rpool/ROOT/solaris      /zones/newzone/root         on
rpool/zoneroot/newzone/rpool/ROOT/solaris/var  /zones/newzone/root/var     on
rpool/zoneroot/newzone/rpool/export            /export                     on
rpool/zoneroot/newzone/rpool/export/home       /export/home                on

Note the strong use of zoned datasets. This means the zone completely owns them, so snapshots and datasets can be created from the zone, and any other ZFS operation carried out there as well. The mountpoint shown above refers to the mountpoint inside the zone. Zoned datasets are still visible from the global zone, under the zone’s mountpoint, whenever the zone is running. (/ and /var are always mounted, even when the zone is halted.)
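
A quick way to see that ownership in action is to take a snapshot from inside the zone. This is a sketch to be run in the zone itself, on the assumption that the datasets appear there under rpool as the mountpoints above suggest. The snapshot name is arbitrary.

```shell
# Run inside the zone. The snapshot name is an arbitrary example.
snap="rpool/export/home@before-change"

if command -v zfs >/dev/null 2>&1; then
    # No help from the global zone is needed: the dataset is zoned.
    zfs snapshot "$snap"
    zfs list -t snapshot -r rpool/export/home
fi
```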

I’m not sure I’m entirely comfortable with something creating filesystems all over the place, and there seem rather a lot of them to me. I certainly don’t want /export/home created for each zone.

dataset has gained a new property, alias, which lets you make any old delegated dataset appear to the zone as a proper zpool. For instance, in the global zone:

# zfs create space/newzone_data
# zonecfg -z newzone
zonecfg:newzone> add dataset
zonecfg:newzone:dataset> set name=space/newzone_data
zonecfg:newzone:dataset> set alias=zdat
zonecfg:newzone:dataset> end

Then boot the zone, log into it and you’ll see:

root@newzone:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP    HEALTH  ALTROOT
rpool  24.9G  20.2G  4.72G  81%  1.29x    ONLINE  -
zdat   1.58T  1.44T   137G  91%  1.05x    ONLINE  -

Pretty nifty eh? In Solaris 10 the space/newzone_data would have appeared as a ZFS filesystem, not a pool.

Zone Administration

admin is a new property that integrates with RBAC to let you grant administrative rights to normal (i.e. non-root) users. For instance, to give myself the ability to log into newzone with zlogin, and to stop, start and reboot that zone, I would do the following in zonecfg.

zonecfg:newzone> add admin
zonecfg:newzone:admin> set user=rob
zonecfg:newzone:admin> set auths=login,manage
zonecfg:newzone:admin> end

As I said, RBAC is doing the magic, so we need to “Simon says” everything with pfexec.

$ zlogin newzone
zlogin: You lack sufficient privilege to run this command (all privs required)
$ pfexec zlogin newzone
[Connected to zone 'newzone' pts/12]
Oracle Corporation      SunOS 5.11      11.0    November 2011
#

And I’m in as root. It won’t work on other zones though:

$ pfexec zlogin tap-ws
zlogin: rob is not authorized  to login to tap-ws zone.

If you check /etc/user_attr you’ll see how this works (I’ve broken the line to preserve the page format.)

rob::::type=normal;auths=solaris.zone.login/newzone,
solaris.zone.manage/newzone;profiles=Zone Management;roles=root;
defaultpriv=basic,dtrace_kernel,dtrace_proc

I’m not sure how useful this is, as I wouldn’t generally want my zone tenants anywhere near my global zone, but if you’re a big RBAC shop, this may be right up your street. Someone needed it, or it wouldn’t be in.

Resource Control

I don’t think the original incarnation of zones had any resource control at all. Gradually things like CPU and memory caps crept in, and that was good. I’m fairly sure that pretty much every kind of resource control was covered in the later Solaris 10 releases, so there wasn’t a whole lot to add to this.

max-processes is new, however: it’s pretty much an extension of the old max-lwps global property and caps, as you would expect, the number of processes that can be in the zone’s process table at any one time.
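
Setting the cap is a one-liner. The sketch below uses an arbitrary limit of 2000, and assumes the property surfaces as the zone.max-processes resource control once the zone has been rebooted with the new setting.

```shell
# Sketch only: 2000 is an arbitrary example cap.
limit=2000

if command -v zonecfg >/dev/null 2>&1; then
    zonecfg -z newzone "set max-processes=$limit"
    # On a running zone, the matching resource control can be inspected with prctl.
    prctl -n zone.max-processes -i zone newzone
fi
```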

Observability

In Solaris 10 all we had to monitor zones was the -Z option to prstat. It was helpful, but given that observability is one of the things which sets Solaris apart from other, more hobbyist, operating systems, it wasn’t quite good enough. Now we have zonestat.

zonestat runs in intervals, like mpstat or vmstat, and by default it prints a row for each zone informing us how much CPU, physical memory, virtual memory, and network bandwidth that zone has consumed over the previous interval.

I won’t just regurgitate the man page (read it for yourself), but a couple of things I particularly like are the -p flag, which forces colon-separated parseable output for your scripting pleasure, and the fact that, unlike the traditional *stat tools, it waits to begin output, not bothering with that pointless, misleading first line.

Like prstat, zonestat is a tool which superficially looks simple, and defaults to far and away the most useful format, but has a lot of hidden depth. Read the man page.

It’s interesting to compare where Oracle have taken Zones with what Joyent are doing with SmartOS VMs. Joyent, as one might expect given their personnel, have developed phenomenal instrumentation far more powerful than zonestat, and they also give you the ability to properly DTrace in local zones. Solaris 11 can’t really do that. The DTrace functionality is pretty much the same as in the latest Solaris 10 releases, which is to say that by default, you get no DTrace at all in a zone:

# dtrace -l
 ID   PROVIDER            MODULE                          FUNCTION NAME

To get some kind of tracing capability, you need to give the zone special privileges by setting limitpriv via zonecfg. Let’s have a look what the DTrace privileges are:

$ ppriv -l | grep dtrace
dtrace_kernel
dtrace_proc
dtrace_user

Now, for (I hope) obvious reasons, you’re not going to get dtrace_kernel in a local zone. (Try it, it’ll just tell you it’s not permitted.) You can have the other two though. dtrace_proc gives you the fasttrap and pid providers; dtrace_user gives you profile and syscall. I think pid and syscall are the most useful providers anyway, so it’s much better than nothing.

Turn the privileges on through zonecfg.

zonecfg>set limitpriv=default,dtrace_proc,dtrace_user
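
Once the zone has been rebooted with those privileges, something like the following sketch should work from inside it. It counts system calls by process name using the syscall provider, and relies on a tick probe from the profile provider (which I’m assuming dtrace_user makes available) to stop after ten seconds.

```shell
# Run inside the zone after it has been rebooted with the new limitpriv.
# Counts system calls by process name, then exits after ten seconds.
probe='syscall:::entry { @[execname] = count(); } tick-10s { exit(0); }'

if command -v dtrace >/dev/null 2>&1; then
    dtrace -n "$probe"
fi
```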

Now let’s count the probes. These commands were run in the global zone and a local zone on the same physical host:

# zonename
global
# dtrace -l | wc -l
   97272
# zonename
tap-ws
# dtrace -l | wc -l
     620

The mismatch there is mostly down to not having the fbt provider in a zone, but even with the probes that are enabled, I’ve been continually frustrated trying to DTrace in zones. (Solaris 10 had fewer probes, around 400.)

Fortunately, I always have access to the global, so it’s not a big problem, but not everyone has that privilege.

DTrace works great in SmartOS zones though, and I have to say I’m not hugely confident about the future of DTrace in Solaris 11. I don’t think it’ll go, or shrink, but I can’t see it growing much. I wish Oracle would go back to an open Solaris, and start integrating some of the things the Illumos people are doing. (And vice-versa - I want ZFS crypto in Illumos!)

I’m digressing aren’t I? Probably time to wind things up.

Fin

That’ll have to do for now. I’m working on a piece about the way zones are installed and configured, the new solaris brand, zone boot environments, and the way the new packaging system works. Should be up soon.
