I must begin this article by saying that, in all the Solaris virtualization work I’ve done, I haven’t found hypervisor-level virtualization particularly useful. That’s not a criticism of xVM or LDOMs; rather, it’s praise for zones.
If zones didn’t exist, or weren’t so powerful or flexible, I know I’d be using xVM and LDOMs for all kinds of things. I’ve certainly seen VMware used to virtualize Windows and Linux systems to brilliant effect. But I like the “thin separation” of zones. It’s so good to be able to manage multiple systems in one shot, and a single O/S pretending to be many O/Ses seems the right solution to almost all my virtualization problems. At least xVM has the advantage of letting you host multiple OSes. LDOMs just let you have multiple instances of Solaris 10, which has to be less useful. (Update: or Solaris 11, obviously.)
Sometimes though, LDOMs are the right way to go. Maybe two groups both want a cluster. Get them to buy one machine each, domain them, and cluster the domains. Nice.
In this article I’m going to give a bit of an outline of two kinds of domain: the primary (which some people call a ‘control’ domain), and the guest. You have to have one primary per physical host, because it’s what talks to the hypervisor and makes the domaining possible; the guest is where you run your applications. As guest domains don’t have dedicated I/O resources, you’d generally use them just for separation. Don’t put I/O-intensive things like databases in them! There are also I/O domains and service domains, which allow you to farm out the distribution of resources from the primary to dedicated domains. I’ll leave those for another day.
First, you’ve got to be aware that the LDOM management software is picky
about patches and machine firmware versions. Check the README, or you’ll
end up in a proper pickle! Also make sure you’ve got the standard
Solaris LDOM packages installed. These should be installed on
sun4v
machines even in a minimal install, but it never hurts to
check.
$ pkginfo | grep SUNWldom
system SUNWldomr Solaris Logical Domains (Root)
system SUNWldomu Solaris Logical Domains (Usr)
Installing the LDOM Management software couldn’t be much easier. Just unpack it (the version I’m using is 1.2), and do
# LDoms_Manager-1_2/Install/install-ldm -d none
Because my Jumpstart server already locks my boxes down as hard as
nails, and I dislike JASS (I’ll tell you why one day), I’m using the
-d
flag to tell the installer not to apply a JASS profile to the
system. If you do an interactive install, this is equivalent to taking
the “Standard Solaris configuration” option. If, unlike me, you’re a
conscientious sysadmin, you’ll choose “Hardened Solaris configuration
for LDoms”.
I told the installer not to launch the LDoms Configuration assistant. We’ll do things by hand to get a better feel for what’s really happening.
The LDOM software is installed in /opt/SUNWldm
, but if you do a
$ which ldm
you’ll see that the installer has kindly symlinked the binaries into the
normal superuser $PATH
.
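If you want to see exactly what the installer dropped, it also goes on as a package — SUNWldm, on my box, though check rather than take my word for it — which you can query in the usual way:
# pkginfo -l SUNWldm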
We’d better check that we have the services we need, and find out what they’re doing.
$ svcs "*ldom*"
STATE STIME FMRI
disabled 14:30:18 svc:/ldoms/vntsd:default
online 14:25:09 svc:/ldoms/ldmd:default
The vntsd
service shouldn’t be up yet. In fact, if you try to start
it, it will fall into maintenance state, and you’ll get a line in
messages
to the effect of
Error opening VCC device control port:
We’ll come back to that later. So what’s the ldmd service doing?
$ svcs -Hp ldoms/ldmd
online 13:48:43 svc:/ldoms/ldmd:default
13:48:42 4209 ldmd
Not a lot. Just running the ldmd
, which is the “management daemon”.
It’s the glue between the hardware hypervisor and the domains.
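Since the software is so fussy about firmware, it’s worth knowing that ldm can tell you both its own version and the hypervisor firmware it’s talking to. If memory serves, the flag is a capital V:
# ldm -V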
# ldm list
------------------------------------------------------------------------------
Notice: the LDom Manager is running in configuration mode. Configuration and
resource information is displayed for the configuration under construction;
not the current active configuration. The configuration being constructed
will only take effect after it is downloaded to the system controller and
the host is reset.
------------------------------------------------------------------------------
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-c-- SP 16 8064M 0.1% 1h 25m
From the top, the notice is a clue that, kind of like a cluster, you have to
go into “configuration mode” and later commit the configuration changes
to the system. You can see what configuration you’re using with the
ldm list-config
command.
$ ldm list-config
factory-default [current]
Back to the ldm list
output, and the bottom line shows us we have a
single domain, "primary", to which is currently assigned all of
the physical machine’s 16 VCPUs and 8GB of RAM. (I’m using a fairly
low-spec T2000.)
The primary domain is also known as the control domain. You have to have one of these, and it’s where you create and administer all your other domains - the ones that run your apps and do some work. It’s also the source of your all-important console access. That is, it’s the one that you get to through your ALOM, and it runs the terminal concentrator that lets you access the consoles of all the other domains.
Your primary domain should be minimal and as secure as you can make it. Strip it right down. Turn everything off. As I said, all it does is manage the other domains, so it won’t need much in the way of resources. All it needs is CPU and a couple of GB of RAM. Note that the following commands will all trigger the notice we saw above. That’s fine.
# ldm set-mem 2G primary
# ldm set-mau 1 primary
# ldm set-vcpu 4 primary
Because initially all hardware resources were assigned to the primary
domain, the commands above effectively remove them. If you try to
remove all but one of the CPU cores, ldm
will complain that that
leaves an orphaned MAU. This makes sense, as each core has its own MAU.
That’s why I had to do the set-mau
commands before I set the CPUs. A
single CPU core on a T2000 is four VCPUs, hence the 4
.
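If you want to see exactly what the primary has been left holding — which VCPU strands, which MAU, how much memory — the long listing breaks it all out:
# ldm list -l primary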
Returning to the ldm list
output above, the STATE
of the primary
domain is always going to be active. The n
and c
flags show that the
domain is "normal" and a control domain, respectively. CONS
shows us the domain’s console. In this case, the SP.
It’s the job of the control domain to virtualize all the system’s
resources and present them to the other domains. The first thing to
present is a console. This is done via the VCC, or Virtual Console
Concentrator. It’s kind of like a Cyclades in software. To make a VCC we
need to decide what ports to use, what to call it, and what domain it
will be in. Convention seems to be to use ports 5000 to 5100, and as
it’s going to be in the primary domain, we’ll call it primary-vcc
.
# ldm add-vcc port-range=5000-5100 primary-vcc primary
The domains will also need some network connections. I’ve cabled up two of
the e1000g ports on my T2000. e1000g0 is on the 10.10.8.0
subnet, and
e1000g1 is on 10.10.4.0. I’m going to put a virtual switch on each of
those NICs, named in a way that identifies them. The syntax is similar
to the add-vcc
command. Again, the virtual switches belong to the
primary
domain.
# ldm add-vsw net-dev=e1000g0 primary-vsw-10108 primary
# ldm add-vsw net-dev=e1000g1 primary-vsw-10104 primary
So we have a pool of spare CPUs, a pool of spare memory, a terminal server, and a couple of switches to offer our guest domains. The only other thing they’ll need is I/O.
When you carve up big iron into domains, you have to give each domain its own I/O channels so it can access disks or cards. My T2000 only has two channels, and it supposedly supports up to 32 LDOMs. Obviously they can’t all have their own dedicated I/O channel: they’re going to have to share. This sharing is done through a VDS, or Virtual Disk Server. You only have one of these in your primary domain, and the way in which it’s created is probably not a surprise.
# ldm add-vds primary-vds primary
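Before going any further, it doesn’t hurt to check that the concentrator, the switches and the disk server are all registered against the primary domain. list-services shows all three in one go:
# ldm list-services primary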
I also need some disk space to assign.
# zpool create space mirror c1t0d0s3 c1t1d0s3
# zfs set mountpoint=none space
# zfs set compression=on space
# zfs create space/ldoms
# zfs set mountpoint=/ldoms space/ldoms
Later we’ll create virtual disks in that ZFS dataset, and make them available to the guest domains via the VDS.
With the primary domain and all its virtualization interfaces in place, it’s time to store the configuration. You can give it any name you like.
# ldm add-config basic-conf
# ldm list-config
factory-default
basic-conf [current]
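If you ever save a configuration you later regret, you can take it off the system controller again with remove-config — though not, as far as I recall, the one you’re actually running. (old-conf below is just a stand-in name.)
# ldm remove-config old-conf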
Because of the close tie between the hardware and the primary domain, a change to the config requires a reboot. So reboot.
# init 6
Once we’re rebooted, let’s see how things look.
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 4 2G 0.4% 3m
Well, that all seems to be as we requested. Note that we’ve acquired a
v
in the flags. That denotes the domain is a "virtual I/O service
domain". That is, it provides the VCC, VSW and VDS we defined
earlier.
But is ldm
getting the VCPU and MEMORY values from the configuration
we created, or does the domain really look like that?
$ prtdiag | head -2
System Configuration: Sun Microsystems sun4v Sun Fire T200
Memory size: 2048 Megabytes
$ psrinfo
0 on-line since 11/10/2009 16:56:14
1 on-line since 11/10/2009 16:56:15
2 on-line since 11/10/2009 16:56:15
3 on-line since 11/10/2009 16:56:15
Yep. The hypervisor is only showing this domain the resources we reserved for it. Cool. How about the virtual switches?
# dladm show-dev
vsw0 link: up speed: 100 Mbps duplex: full
vsw1 link: up speed: 1000 Mbps duplex: full
...
(Don’t worry about the different speeds. One of the NICs is plugged into a pretty old switch.)
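One thing I’m not doing here, but which is worth knowing about: if you want the primary domain itself to talk through a virtual switch rather than straight over the physical NIC, you can plumb the vsw interface and move the IP address onto it. Do it from the console, because the link will drop while you swap them over. Roughly, with a made-up address:
# ifconfig vsw0 plumb
# ifconfig vsw0 10.10.8.20 netmask 255.255.255.0 broadcast + up
# ifconfig e1000g0 down unplumb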
If you forget how you defined what, ldm list-bindings
will remind
you.
Back to work. Remember that vntsd
service from earlier? Well now we
should be able to start it up. It runs the vntsd
process, which
acts as a terminal server for our LDOMs.
# svcadm enable vntsd
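If you jumped the gun and started it earlier, it will be sulking in maintenance state. Now that the VCC exists, clearing it should be all that’s needed:
# svcadm clear ldoms/vntsd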
Now we can create a guest domain. It’s pretty similar to creating the
primary domain, but you’re looking at the VCC, VSW and VDS from the
other side. My naming convention is to add l-something-num
on to the
hostname, where l
denotes a logical domain, something
denotes the
purpose of the domain, like ws or db, and num is an instance number.
This can get a bit of a mouthful, but once you know the code you can
instantly tell that ls-w-01-l02 is logical domain 02 on webserver
machine 01, running Solaris, in London. You may wish to call the same
domain "Kryten". That’s up to you, and I won’t judge.
Let’s make a webserver domain, with 4GB of RAM, 4 VCPUs and the attendant MAU. It’ll be on my 10.10.4 subnet, auto-booting, and I’m going to give it 10GB of disk space. The hardware resources are easy: they’re just like when we did the primary. The machine I’m working on is cs-dev-02.
# ldm add-domain cs-dev-02-lws01
# ldm add-mau 1 $_
# ldm add-vcpu 4 $_
# ldm add-mem 4G $_
Next we’ll make a virtual disk in our ZFS pool, and assign it as a VDS device. This bit’s a little more complicated. I’ll call the disk domainname-boot, so it’s clear what it’s for.
# zfs create -V 10g space/ldoms/lws01-boot
# ldm add-vdsdev /dev/zvol/dsk/space/ldoms/lws01-boot lws01-boot@primary-vds
Note that you’re referring to a device file, so you could just as easily map a real disk, or a multipathed SAN LUN, to a guest.
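Purely as an illustration — the device path here is made up — handing over a whole physical disk’s s2 slice looks no different. You’d then present it to a guest with add-vdisk, exactly as we’re about to do with the boot volume:
# ldm add-vdsdev /dev/dsk/c2t1d0s2 lws01-data@primary-vds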
Whatever you map, once the VDS device exists, we can assign it to the domain as a vdisk,
# ldm add-vdisk lws01-boot lws01-boot@primary-vds cs-dev-02-lws01
then tell the domain that’s the boot disk.
# ldm set-variable boot-device=lws01-boot cs-dev-02-lws01
If that looks familiar, good. Each LDOM has its own OBP, and ldm
set-variable
is analogous to eeprom
. While we’re about it, we can
also set the domain to auto-boot, just like we would with a physical
server.
# ldm set-variable auto-boot\?=true cs-dev-02-lws01
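If you want to read the variables back and check they took, list-variable works on the same names:
# ldm list-variable boot-device cs-dev-02-lws01
# ldm list-variable auto-boot\? cs-dev-02-lws01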
Now we virtually plug the domain into the virtual switch.
# ldm add-vnet lws01-vnet-10104 primary-vsw-10104 cs-dev-02-lws01
I’m planning to Jumpstart the domains on this server, so each domain is going to need its own MAC address.
# ldm set-variable local-mac-address\?=true cs-dev-02-lws01
Now we tie all the resources we’ve defined to the domain. ldm likes
to call this "binding".
# ldm bind-domain cs-dev-02-lws01
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 4 2G 0.3% 1h 5m
cs-dev-02-lws01 bound ------ 5000 4 4G
There it is. Let’s start it.
# ldm start-domain cs-dev-02-lws01
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 4 2G 0.4% 1h 6m
cs-dev-02-lws01 active -t---- 5000 4 4G 25% 20s
The t
flag means the domain is in a "transition" state,
and the 5000 in the CONS
column is the port through which we can
connect to the domain’s console. Let’s do that.
$ telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connecting to console "cs-dev-02-lws01" in group "cs-dev-02-lws01" ....
Press ~? for control options ..
~ ?
{0} ok
{0} ok
There you go. Just like being on a proper computer.
{0} ok banner
Sun Fire T200, No Keyboard
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.30.3, 4096 MB memory available, Serial #83503631.
Ethernet address 0:14:4f:fa:2a:f, Host ID: 84fa2a0f.
Now you can set up your Jumpstart server in the usual way, and install
Solaris. It’s important to know that a virtual disk won’t have a target
number in its name, and the first assigned disk will be disk 0 on
controller 0. So, your root partition would be c0d0s0
.
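One way to kick the install off from the ok prompt is the usual boot net - install incantation. I believe the vnet you added shows up as a devalias named after the vnet itself, but check with devalias before trusting me on that:
{0} ok devalias
{0} ok boot lws01-vnet-10104 - install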
It’s also vital to know that the MAC address shown in the banner doesn’t belong to the domain; it belongs to the virtual switch the domain is on. (Update: I don’t think that’s the case any more, but be aware of it just in case you’re using an old version of the software.) To get the domain’s real MAC, use
# ldm ls-bindings cs-dev-02-lws01
and get it from the network section.
Cheat
I’ve made a script that does all this for you, and I’ve even written documentation too.