Moving to OmniOS 01: The Global Zone
25 September 2019

In the beginning was the word and the word was Solaris. And Sun did free its source, and Solaris begat OpenSolaris and happiness reigned. Yet a darkness spread over the world and Oracle seized Solaris and the people fled. OpenSolaris begat Illumos, and was imprisoned for its sin. Solaris lived in chains in a distant land, while Illumos begat SmartOS and OmniOS, and yet more progeny whose names are lost. Though their fathers lay slain by giants, these mighty works live on, revealing their secrets to the enlightened.

The Setup

I have computers at home, and for reasons I cannot explain, it’s somehow important to me that those computers print SunOS when I run uname -s.

It’s a kind of religious thing, and I don’t expect anyone to understand.

I no longer run a Solaris desktop, but it’s still on my fileserver. I have a lot of clients’ data on there and I need encryption at rest, so I had to stick with “proper” Oracle Solaris.

Crypto was added to ZFS after Oracle re-closed the Solaris source, so it’s never been available in any of the open source Illumos distributions. But now the OpenZFS project has it too, and it’s been merged into Illumos. So it looks like I can finally junk that last unpatched, unsupported Solaris box and migrate to something more libertarian.
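When the data does eventually move, creating an encrypted dataset should look something like this (the pool and dataset names are invented):

$ pfexec zfs create -o encryption=on -o keyformat=passphrase space/clients

encryption=on picks the default cipher, and keyformat=passphrase means you get prompted for a key rather than having to supply one in a file.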

If I had any sense I’d move to Linux or FreeBSD, but I don’t. I’m a glutton for punishment, and I have this inexplicable devotion to one particular kernel, so I’m installing the Community edition of OmniOS.

The Plan

cube is my Solaris server, named for its shape. It runs about twenty zones doing all manner of things, though its main job is serving files and media. It’s going to be rebuilt with OmniOS. It’s fully config-managed, and I have my own tooling to deploy and redeploy zones in seconds with a single command.

tornado (shark died) is my SmartOS box. SmartOS has been my development platform for a few years, as I deploy to the Joyent Public Cloud and like the homogeneity. But the JPC is being turned off next month, so tornado is redundant. Much as I like SmartOS, it’s very datacentre-focussed, and not the ideal home computer OS, so I’m not sticking with it.

I plan to install OmniOS on tornado, and duplicate as much of cube’s functionality as possible, building everything with the Puppet code which already builds the zones on cube. When that’s all done, I’ll reinstall cube with OmniOS and play the config into it, and tornado can go back on eBay, from whence it came.

The Snags

I expect a few things to be an issue: RBAC, which differs between Solaris and Illumos; packaging, which differs even more; and my Puppet code, which assumes “proper” Solaris in more places than I’d like.

The Installation

Couldn’t be easier.

OmniOS offers two release paths. Stable is, well, stable, including LTS releases. Bloody is the experimental “beta” release. I guess it’s not quite fully cooked.

I need ZFS crypto, which hasn’t hit stable yet, so I downloaded bloody, dd-ed it to a USB stick and booted tornado.
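Writing the image is the usual dd dance. The image name and output device below are placeholders: use whatever you actually downloaded, and whatever your USB stick shows up as.

$ pfexec dd if=omnios-bloody.usb-dd of=/dev/rdsk/c2t0d0p0 bs=2048k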

OmniOS’s Kayak is the nicest manual installer I’ve used. It’s super quick, console-based, and constantly offers you the option to drop out to a shell and do whatever you might need to do. I was particularly happy about the flexibility of where to install the OS, having been frustrated too many times by “whole disk or nothing” installers. Blatting the root zpool onto disk takes less than ten seconds, and post-install config of networking and a pfexec-enabled admin user didn’t take much longer. Five-minute job. Lovely.

By the time I’ve finished working all of this out, the crypto stuff will be in stable, and I’ll rebuild everything on that. I’m big on stability.

The Configuration

I had a nice Puppet setup for my Solaris and SmartOS boxes. So it should be simple to have Puppet configure OmniOS exactly the way it configured Solaris. BUT!

$ pkg search puppet
$

Looks like we’ll have to do this ourselves. Have we at least got Ruby?

$ pkg search mediator:ruby
INDEX      ACTION VALUE PACKAGE
mediator   link   ruby  pkg:/ooce/runtime/ruby-26@2.6.5-151033.0
mediator   link   ruby  pkg:/ooce/runtime/ruby-25@2.5.7-151033.0

Phew. If you didn’t know, “mediators” are how IPS lets you install different versions of a package side-by-side. I could install both those versions of Ruby, and flip between them by changing the mediator.
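Flipping looks something like this, assuming both versions are installed. (Check the pkg mediator output for the exact version strings.)

$ pkg mediator ruby
$ pfexec pkg set-mediator -V 2.5 ruby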

Even from the first couple of commands, pkg felt quicker on OmniOS than it does on Solaris. To stop everyday operations being annoyingly slow, OmniOS keeps only essential packages in its default repo. (Or, to be more correct, its default “publisher”.) Ruby is not in there, because most people probably won’t want it. It’s in “extra”. You can choose to have the extra publisher enabled as a post-installation option in Kayak, or add it at any time like this:

$ pfexec pkg set-publisher -g https://pkg.omniosce.org/bloody/extra/ extra.omnios

pkg publisher will show you which repos are in your search path.

$ pkg publisher
omnios                      origin   online F https://pkg.omniosce.org/bloody/core/
extra.omnios                origin   online F https://pkg.omniosce.org/bloody/extra/

There isn’t even that much stuff in extra, and I like that. OmniOS knows what it is: a server operating system. There’s more in the repo than core OS packages, but the tools have been chosen with care. What you need is likely there, but what you want might not be.

Anyway, I want Ruby, so let’s install it.

$ pfexec pkg install pkg:/ooce/runtime/ruby-26@2.6.5-151033.0
           Packages to install:  1
           Mediators to change:  1
       Create boot environment: No
Create backup boot environment: No
$ which ruby
/opt/ooce/bin/ruby

I’d have preferred it in /usr, but never mind. I stuck /opt/ooce/bin in the PATH by editing /etc/skel/.profile and moved on. We’ll make sure that gets Puppetted later.
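The change is as dull as you’d expect. This is the gist, if not necessarily the exact line:

PATH=$PATH:/opt/ooce/bin
export PATH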

To minimize friction, I’ll install the same version of Puppet that I have on Solaris. I’ll also tell gem not to install documentation, and to put its executables in the same place as the ruby binary. (I thought it would do this automatically, but it did not.)

$ pfexec gem install puppet -v 5.5.0 --no-document --bindir=/opt/ooce/bin
...
Successfully installed puppet-5.5.0
6 gems installed
$ puppet --version
5.5.0
$ facter operatingsystem
OmniOS

I already have a Puppet server, and my internal DNS resolves puppet to it. Let’s see what happens.

$ ping puppet
puppet is alive
$ pfexec puppet agent -t
Info: Creating a new SSL key for tornado.localnet
...
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER:
...

Not bad! It’s talked to the server, done key exchange, and at least tried to run the code. I didn’t expect it to work, because OmniOS is not Solaris, and though I have an os/Solaris.yaml in my Hiera config, I do not have an os/OmniOS.yaml. So I copied the former to the latter and ran Puppet again. It ran, with a few errors. First:

ERROR: Puppet Management is not a valid profile name.

I said there’d probably be RBAC problems, and here’s the first one. That profile only exists in Solaris. In fact, I think it only exists in Solaris 11.4. Let’s drop it. We can, I suppose, build it into OmniOS’s profile list should we require it later, but we probably won’t. For now, at least, I’ll be using the Primary Administrator profile, which lets me do anything as root simply by prefixing a command with pfexec. (Oracle have removed Primary Administrator from Solaris, a decision which initially annoyed me greatly, but which I now think was probably wise.)
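For the record, granting that profile to a user is a one-liner, should Kayak not have done it for you already. (The username is made up.)

$ pfexec usermod -P 'Primary Administrator' rob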

The next twenty or so errors were:

pkg install: The following packages all deliver file actions to usr/share/man/man1m/ntptime.1m:
pkg://omnios/service/network/ntp@4.2.8.13,5.11-151031.0:20190919T120003Z
pkg://omnios/service/network/ntpsec@1.1.7,5.11-151031.0:20190919T115848Z

I’ve seen this package conflict mentioned in the OmniOS release notes. It was coming from the Puppetlabs NTP module, which was making a best guess at this being a Solaris system. I made an OmniOS-specific call to the ntp class, setting the service and package explicitly:

class { 'ntp':
  service_name => 'svc:/network/ntp:default',
  package_name => ['pkg:/service/network/ntpsec'],
}

Fixt!

The next error reminds us that Puppet is only a fancy way to run shell scripts.

Error: Failed to apply catalog: Execution of '/usr/bin/svcprop -a -f *'
returned 2: /usr/bin/svcprop: illegal option -- a
Usage: svcprop [-fqtv] [-C | -c | -s snapshot] [-z zone] [-p [name/]name]...
         {FMRI | pattern}...
       svcprop -w [-fqtv] [-z zone] [-p [name/]name] {FMRI | pattern}

I use a fork of Oracle’s Solaris Puppet module, and it (not unreasonably) assumes “proper” Solaris. OmniOS’s svcprop, as you see, does not have the -a flag, because that relates to service templates, and they came from Oracle after the fork. (They’re rather nice, and it’s a shame to lose them.)

The fix for this is a bit of logic in lib/puppet/provider/svccfg/solaris.rb. I made a new method to quickly check the distribution:

def self.illumos?
  # Oracle Solaris identifies itself in /etc/release; Illumos distros do not.
  !IO.read('/etc/release').match?(/Solaris/)
end

Then I added a clause to call svcprop without the -a on Illumos.
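In spirit it’s something like this, though the helper name and shape are illustrative rather than the exact contents of my fork:

# Build the svcprop argument list, dropping -a where it isn't supported.
def svcprop_args
  self.class.illumos? ? ['-f', '*'] : ['-a', '-f', '*']
end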

The final error was:

Error: Could not find a suitable provider for process_scheduler

process_scheduler is a very simple provider, again from Oracle, which uses dispadmin to set the scheduler. I always set the scheduler to FSS on boxes with zones. The problem is in lib/puppet/provider/process_scheduler/solaris.rb:

confine :operatingsystem => [:solaris]

I changed that to

confine :operatingsystem => [:solaris, :omnios]
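For reference, everything the provider actually achieves boils down to one command:

$ pfexec dispadmin -d FSS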

With that, my run completed. Hurrah! All my internal networking, users, security policies, automatic snapshotting, and various sundries, configured just how I like them. Oh, and look at this:

$ svcs | grep sysdef
online         15:33:18 svc:/sysdef/puppet:default
online         15:33:23 svc:/sysdef/cron_monitor:default
online         15:33:26 svc:/sysdef/diamond:default

I’ve already been down a similar path on SmartOS, and my Puppet config contains an SMF manifest and puppet.conf suitable for that OS. So we have a running Puppet agent, with a correctly configured interval and splay. (I just had to change /opt/local for /opt/ooce in a couple of places.)
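The agent settings amount to something like this in puppet.conf. (Illustrative values; mine differ.)

[agent]
runinterval = 30m
splay       = true
splaylimit  = 10m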

We also have telemetry, from my Solaris Diamond fork and my cron-job monitor, and to prove it, here’s the CPU usage on this box and the one it will eventually replace. The spikes are Puppet runs.

I even started getting alerts from my telemetry system, because snapshots weren’t being created, on account of me not yet having migrated any ZFS datasets!

So that’s the global zone built and config-managed, with telemetry and alerting, all ready to be carved up into the zones which will host the applications. Didn’t take a lot of effort really, did it?

Next time: zones.
