Category Archives: Virtualization

Setting Up a vSphere Service Account for Pivotal BOSH Director using PowerCLI

BOSH Director requires a fairly powerful vCenter service account to do all of the things it does.

The list of permissions required is here, and it’s extensive.

You can always take the shortcut and make your account an Administrator of the vSphere environment, but that violates the whole “least privilege” principle and I don’t like that in production environments.

I wrote a PowerCLI function that automatically creates this vCenter role and adds the specified user or group to it.

It greatly reduces the time to set this up.  Hope this helps someone out.

 

vRealize Orchestrator HTTP-REST – Cannot execute the request; Read timed out

I recently stumbled upon an issue with the HTTP-REST plugin in vRO that took some experimentation to understand. I kept getting a “Read timed out” error whenever my workflows made a REST call that took more than 60 seconds to return a response. There is an operationTimeout property you can set to govern this, but I found it is ignored under certain circumstances. It’s very confusing, since you can examine the operationTimeout property and it *appears* correct. I had to do quite a bit of testing to get to the bottom of the behavior.

In my implementation, I was using vRO 7.2 and transient HTTP-REST host objects to make my REST calls. I favored that approach over using the HTTP-REST configuration workflows to add and remove every host and every combination of operations I could some day invent; the configuration-workflow approach seemed inflexible.

Here is my basic testing workflow:

Test Workflow

Here is the code in the script:

//  userName, password, and useTransientHost are input parameters.

var uri = "https://myrestapihost.domain.com/api/DoSomething/id44";
var method = "GET";
var body = "";  // For POST/PUT body content.  This has to be a JSON string, e.g.  body = '{ "p1": 1, "p2": 2 }';
var httpRestHost = null;

if ( useTransientHost )
{
  System.log("Using Transient host.");
  //  Create a dynamic REST host:

  var restHost = RESTHostManager.createHost("dynamicRequest");
  restHost.operationTimeout = 900;  //  This gets ignored!!!
  httpRestHost = RESTHostManager.createTransientHostFrom(restHost);
  httpRestHost.operationTimeout = 900;  //  Set it here too, just to be really really sure.
}
else
{
  System.log("Using NON-Transient host.");
  httpRestHost = RESTHostManager.getHost("71998784-d590-426d-8945-75ec0b1ad7b4");		//  Use the ID For your HTTP-REST host here
  httpRestHost.operationTimeout = 900;  //  This gets ignored!!!
}

System.log("OperationTimeout  is set to: " + httpRestHost.operationTimeout.toString());

//  Create the authentication:
var authParams = ['Shared Session', userName, password];
var authenticationObject = RESTAuthenticationManager.createAuthentication('Basic', authParams);
httpRestHost.authentication = authenticationObject;

//  Split the URI into the base URL and the final path segment (the endpoint):
var lastSlash = uri.lastIndexOf("/");
var urlEndpoint = uri.substring(lastSlash + 1);
uri = uri.substring(0, lastSlash + 1);

httpRestHost.url = uri;

//  REST client only accepts method in all UPPER CASE:
method = method.toUpperCase();

var request = httpRestHost.createRequest(method, urlEndpoint, body);
request.contentType = "application/json";

System.debug("REST request to URI: " + method + " " + request.fullUrl);

var response = request.execute();   //  Expected to honor the 900-second operationTimeout set above, but it does not
System.debug("Response status Code: " + response.statusCode);

if ( response.contentAsString )
{
  System.debug("Response: " + response.contentAsString);
}

I added three input parameters:

  • userName
  • password
  • useTransientHost

When I call it using a transient host, it always times out in 60s, no matter what I set the operationTimeout setting to. Here is the output from the run:

[2017-03-28 11:51:55.740] [I] Using Transient host.
[2017-03-28 11:51:55.747] [I] OperationTimeout is set to: 900
[2017-03-28 11:51:55.751] [D] REST request to URI: GET https://myrestapihost.domain.com/api/DoSomething/id44
[2017-03-28 11:52:55.902] [E] Error in (Workflow:Example REST API Call / HTTP Rest Call (item1)#43) Cannot execute the request: ; Read timed out

You can see the 60s timeout despite the fact the operationTimeout property was set to 900.

I ran it again and referenced a non-transient HTTP host:

[2017-03-28 12:00:24.618] [I] Using NON-Transient host.
[2017-03-28 12:00:24.629] [I] OperationTimeout is set to: 900
[2017-03-28 12:00:24.634] [D] REST request to URI: GET https://myrestapihost.domain.com/api/DoSomething/id44
[2017-03-28 12:02:24.740] [E] Error in (Workflow:Example REST API Call / HTTP Rest Call (item1)#46) Cannot execute the request: ; Read timed out

In this case, it timed out in 120 seconds, not 60 (or 900). The 120 came from the value I entered for operationTimeout when I created the host using the HTTP-REST/Configuration/Add a REST host workflow:

Test Host Settings
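As a quick check on the two runs above, the elapsed time between the request line and the error line can be worked out from the log timestamps. This is only a sketch: elapsed_seconds is a hypothetical helper name, and it assumes both timestamps fall on the same day.

```shell
# elapsed_seconds START END
# Whole seconds between two HH:MM:SS(.mmm) times on the same day.
elapsed_seconds() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    printf "%d\n", (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
  }'
}

elapsed_seconds 11:51:55.751 11:52:55.902   # transient host run: prints 60
elapsed_seconds 12:00:24.634 12:02:24.740   # non-transient host run: prints 120
```

Neither figure is anywhere near the 900 seconds set in the script.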

So, in the end, the following appears to be true:

  • OperationTimeout defaults to 60 seconds.
  • Though it appears you can, you CANNOT override it by setting the operationTimeout property in code (if that is the case, this should be a read-only property).
  • It instead uses the operationTimeout set on the HTTP-REST host object when you create (or update) it using the configuration workflows.
  • Transient hosts always use the 60-second default, and there is no way to override it.

However:

You CAN override just about everything else, including URI and authentication.

This means I can get around this by adding a dummy HTTP-REST host as per normal using the Add a REST host workflow:

[Screenshots: the dummy REST host created with the Add a REST host workflow]

The URL, authentication, and other settings do not matter; they can be overridden in your code as I did in my example above. The ONLY setting that matters is the operationTimeout (and perhaps the connectionTimeout, which may well have the same issue, but I never tested it).

Then reference the host ID as I did in the code above and override the URL, authentication, and whatever else you need to.

I’ve engaged VMware tech support to log this as a bug. I really think the operationTimeout property should either be settable in code or be read-only. We’ll see where that goes…

UPDATE 4/6/2017 – I just got final word back from VMware tech support engineering.  The behavior I noted above is normal behavior, and the workaround I proposed is the accepted workaround.  Nothing to see here…

I did ask for a feature request to either have the operationTimeout property be programmatically changeable or to be set as read-only to reduce confusion.

Updated FC Drivers on ESXi

I had an issue where I had to change the driver ESXi uses for its QLogic Fibre Channel HBAs. Below is the procedure I used to switch from the qlnativefc driver to the qla2xxx driver. I was running ESXi 5.5, but the procedure for other versions is likely similar.

Updating the QLogic Driver on an ESXi Host

ESXi hosts with QLogic adapters use the qlnativefc driver out of the box.

On some servers, including Dell PowerEdge R7xx servers, the adapter does not support NPIV without a different driver. Follow the procedure below to update the driver.

Update the QLogic Driver to an NPIV-enabled Driver

      1. Download the QLogic driver for VMware.

VMware ESXi5.x FC-FCoE Driver for QLogic and OEM-branded Fibre Channel and Converged Network Adapter

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-QLOGIC-QLA2XXX-9345320-1VMW&productId=285

Example:  qla2xxx-934.5.38.0-1872208.zip

      2. Extract the zip file and find the offline_bundle.zip file if it’s not already extracted.
      3. SCP the offline bundle file to a datastore on the ESXi host that all of the other hosts can see.

Example:   /vmfs/volumes/vmware_host_swap_03/qla2xxx-934.5.38.0-offline_bundle-1872208.zip

For each host:

      1. Log onto the VM host
      2. Put the host in maintenance mode.
      3. Do the following:

esxcli software vib install -d /vmfs/volumes/vmware_host_swap_03/qla2xxx-934.5.38.0-offline_bundle-1872208.zip

      4. Remove the old driver:

esxcli software vib remove -n qlnativefc

      5. Reboot the host.
      6. Verify after the host is back up:

esxcli software vib list | grep -i qla2xxx
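The per-host steps above can be collected into one small script. This is a minimal sketch, not a tested production tool: run and update_qlogic_driver are hypothetical helper names, the bundle path is the example path from this post, and it assumes the VMs on the host have already been moved or powered off. It defaults to a dry run that only prints the commands.

```shell
# DRY_RUN defaults to 1, so the commands are only printed.
# Set DRY_RUN=0 on the ESXi host itself to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Example bundle path from this post; adjust to your own datastore.
BUNDLE="${BUNDLE:-/vmfs/volumes/vmware_host_swap_03/qla2xxx-934.5.38.0-offline_bundle-1872208.zip}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

update_qlogic_driver() {
  run esxcli system maintenanceMode set -e true    # enter maintenance mode
  run esxcli software vib install -d "$BUNDLE"     # install the new driver
  run esxcli software vib remove -n qlnativefc     # remove the old driver
  run esxcli system shutdown reboot -r "QLogic driver change"
}

update_qlogic_driver
```

After the host comes back up, the verification command above still has to be run by hand.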

Rolling Back the Driver Upgrade

VMware qlnativefc driver:

https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI55-QLOGIC-QLNATIVEFC-11390-1&productId=353&download=true&fileId=838e130270df93ad2aca6bd64be27f06&secureParam=&uuId=08b248a1-29dc-4ce5-8da3-6353d2997bce&downloadType=

Example:  VMW-ESX-5.5.0-qlnativefc-1.1.39.0-2243137.zip

      1. Extract the zip file and find the offline_bundle.zip file if it’s not already extracted.
      2. SCP the offline bundle file to a datastore on the ESXi host that all of the other hosts can see.

Example:   /vmfs/volumes/ah-3270-2:vmware_host_swap_03/VMW-ESX-5.5.0-qlnativefc-1.1.39.0-offline_bundle-2243137.zip

For each host:

      1. Log onto the VM host
      2. Put the host in maintenance mode.
      3. Do the following:

esxcli software vib install -d  /vmfs/volumes/ah-3270-2:vmware_host_swap_03/VMW-ESX-5.5.0-qlnativefc-1.1.39.0-offline_bundle-2243137.zip

      4. Remove the old driver

esxcli software vib remove -n scsi-qla2xxx

      5. Reboot the host
      6. Verify after the host is back up:

esxcli software vib list | grep -i ql

Setting up and troubleshooting NPIV on vSphere

I recently set up NPIV for several of the VMs in my production environment. Let’s just say it was an adventure. I wanted to detail my experience below, including some step-by-step instructions for setting it up and some advanced troubleshooting I had to do to resolve my issues.

My environment consists of vSphere ESXi 5.5 hosts with Brocade DCX backbones. Though this post revolves around my particular setup, the steps here should work with most environments.

Definition

NPIV (N_Port ID Virtualization) allows a single Fibre Channel port to register multiple WWPNs over the same physical connection. This allows a single switch port to carry both the WWPN of an ESXi host port and a virtualized WWPN assigned directly to a VM. Cisco UCS does something similar, with the physical ports of the edge device representing the virtual WWPNs of the virtual HBAs on the service profiles provisioned on the Cisco hardware.

This is useful when you want to identify a particular VM’s Fibre Channel traffic by its own unique WWPN. Also, with Hitachi Server Priority Manager (QoS), you assign QoS limits by selecting a WWPN to assign a quota to. Without NPIV, the best you could do is limit the ESXi server HBA, thereby limiting ALL VMs on the ESXi host.

We use VirtualWisdom, a Fibre Channel performance and health monitoring software package made by Virtual Instruments. With NPIV, we can now trace IOPS, MB/s and latency right to the VM as opposed to the ESXi HBA port.

Requirements

NPIV has some strict requirements, and failing to adhere to them can give you headaches.

  • Your switch has to support NPIV (most modern switches, including DCX backbones, do).
  • NPIV needs to be enabled on your switch ports (enabled by default on DCX backbones).
  • ESXi needs to support NPIV (versions past 4.0 do for sure; I see references to it back to v3.5).
  • NPIV can only be used in conjunction with RDM devices. .vmdk disks from a datastore are still identified by the ESXi host’s WWPNs and cannot use NPIV.
  • The driver on the ESXi host has to support NPIV (the ESXi qlnativefc driver at version 1.1.39 is confirmed to work).

Zoning Notice

You have to make sure the NPIV wwpns are zoned to see ALL STORAGE the cluster hosts can see, even if the VM does not attach to the storage.

Example Configuration

Enabling NPIV

VM Name:  npiv_test_01

1) First, shut down the VM.
2) Log into the vSphere web client and find the VM in your inventory.
3) Edit the settings of the VM.
4) Go to the VM Options Tab, then to “Fibre Channel NPIV”
5)  Uncheck the “Temporarily disable NPIV for this Virtual Machine” option.
6)  Select “Generate new WWNs”.
7)  Select the appropriate number of WWPNs and WWNNs.
For one FC fabric, one WWNN and one WWPN should work. For two, I’d do one WWNN and two WWPNs to simulate a dual-port HBA, each port connected to one fabric for redundancy.
8)  Click OK and exit.
9)  Edit settings again and note the WWPN(s) generated for your VM. You will need these later.  For purposes of this exercise, we will assume the following:

NPIV1

Node WWN:
28:2c:00:0c:29:00:00:30

Port WWNs:
28:2c:00:0c:29:00:00:31, 28:2c:00:0c:29:00:00:32

Connect your RDM to it at this time. When we are done, the VM should be accessing your RDM via your NPIV-generated WWPNs. If this fails for some reason, it will fall back on the WWPNs of the ESXi host HBAs. Remember, it will ONLY see RDMs this way, not .vmdk disks sitting in a datastore. Those ALWAYS go through the ESXi host WWPNs.

Now, before you power up the VM, you have to set the zoning and LUN masking or you will have issues while booting or vMotioning it.

FC Fabric Zoning and Configuration

Verify NPIV is enabled

Run the following command from a Brocade backbone

portcfgshow 1/26
(Where 1/26 is the slot/port of your ESXi host HBA port)

NPIV2

Verify the NPIV Capability setting is set to ON (which should be the default for Brocade)

Zone the Fabric(s)

1)  Create a FC alias on each fabric.  Each one will contain one of the two NPIV wwpns you generated above.  On a Brocade backbone:

Fabric 1
alicreate npiv_test_01_hba0,28:2c:00:0c:29:00:00:31

Fabric 2
alicreate npiv_test_01_hba1,28:2c:00:0c:29:00:00:32

2) Create your zones.  Recall that you *must* zone the NPIV wwpn to ALL storage systems your ESXi cluster hosts can see, even if the VM will never see or use the storage!

Fabric 1
zonecreate "npiv_test_01_hba0_to_storage_array_01_1a","npiv_test_01_hba0; storage_array_01_1a"
cfgadd "mycfg","npiv_test_01_hba0_to_storage_array_01_1a"
.. add any additional zones for other storage systems…

Fabric 2
zonecreate "npiv_test_01_hba1_to_storage_array_01_1b","npiv_test_01_hba1; storage_array_01_1b"
cfgadd "mycfg","npiv_test_01_hba1_to_storage_array_01_1b"
.. add any additional zones for other storage systems…

3) Save and commit your config.

cfgsave
cfgenable mycfg
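Because every NPIV WWPN has to be zoned to every storage system the cluster can see, it can help to generate the zoning commands from a list of target aliases rather than typing them one at a time. The sketch below only prints text in the same form as the commands above; it does not talk to a switch. gen_zones is a hypothetical helper, and storage_array_02_1a is a made-up second target alias used purely for illustration.

```shell
# Zoning config name used in this post.
CFG="mycfg"

# gen_zones INITIATOR_ALIAS TARGET_ALIAS...
# Prints one zonecreate/cfgadd pair per storage target alias.
gen_zones() {
  initiator="$1"
  shift
  for target in "$@"; do
    zone="${initiator}_to_${target}"
    printf 'zonecreate "%s","%s; %s"\n' "$zone" "$initiator" "$target"
    printf 'cfgadd "%s","%s"\n' "$CFG" "$zone"
  done
}

# Fabric 1: list EVERY storage target alias the ESXi cluster is zoned to.
gen_zones npiv_test_01_hba0 storage_array_01_1a storage_array_02_1a
```

Review the output, paste it into the switch CLI for the matching fabric, then save and enable the config as shown above.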

Mask Your LUNs

Now you have to mask your LUNs. It’s best to add the WWPNs for your NPIV VM to the same host or initiator group as your ESXi hosts. In my environment, we create one group for all WWPNs for all ESXi HBAs in the given cluster. Here is an example:

esxi_host_01
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:01
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:02

esxi_host_02
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:03
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:04

Your existing host group will look like this:
esxi_cluster_01_target_1a:  59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03
esxi_cluster_01_target_1b:  59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04

So in this case, you’d add your fabric 1 NPIV WWPN to esxi_cluster_01_target_1a and your fabric 2 NPIV WWPN to esxi_cluster_01_target_1b:

esxi_cluster_01_target_1a:
59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03,28:2c:00:0c:29:00:00:31
esxi_cluster_01_target_1b:
59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04,28:2c:00:0c:29:00:00:32

There is some flexibility at this point.  Different storage systems mask differently, and I don’t think NPIV will freak out if it can’t see the LUN.  In this case, it just won’t work and will fall back to the ESXi host wwpn.

Finishing the Configuration

Now you can power up the VM.  I’d advise you to watch the vmkernel.log as the VM boots so you can see if it worked.  Log into the ESXi host using SSH and use the following command:

tail -f /var/log/vmkernel.log

Messages like below indicate success.

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1701: Physical Path : adapter=vmhba3, channel=0, target=3, lun=2

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1160: NPIV vport rescan complete, [2:2] (0x41092fd6c1c0) [0x41092fd42440] vport exists

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1848: Vport Create status for world:61526 num_wwpn=2, num_vports=4, paths=4, errors=0

If it posts within a minute or two, you should be OK.  If it takes a lot longer, you probably have an issue.

You can verify it worked by using this command:

cat /var/log/vmkernel.log | grep -i num_vport

…num_wwpn=2, num_vports=4, paths=4, errors=0

If you see errors=0, you should be OK.  You should see a vPort per path to the LUN, and no errors.
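The same check can be wrapped in a small function that looks at the most recent “Vport Create status” line and reports whether errors is zero. This is a sketch; check_vport_status is a hypothetical name, and it assumes the log line format shown above.

```shell
# check_vport_status: reads vmkernel log lines on stdin and inspects the
# most recent "Vport Create status" line.
# Returns 0 when errors=0, 1 when errors is non-zero, 2 when no line is found.
check_vport_status() {
  line="$(grep 'Vport Create status' | tail -n 1)"
  if [ -z "$line" ]; then
    echo "no Vport Create status line found"
    return 2
  fi
  # Pull out the number that follows "errors=".
  errors="$(printf '%s\n' "$line" | sed -n 's/.*errors=\([0-9][0-9]*\).*/\1/p')"
  if [ "$errors" = "0" ]; then
    echo "OK: errors=0"
    return 0
  fi
  echo "NPIV reported errors=$errors"
  return 1
}

# Usage on the ESXi host:
#   check_vport_status < /var/log/vmkernel.log
```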

Verification

There are a number of ways to verify NPIV is working.  I used my VirtualInstruments software to verify it could see the NPIV wwpn.  I created a Host entity and added the two NPIV wwpns as HBA_port objects to it, then graphed it:

NPIV3

This tells me without a doubt an external system can see the traffic from the VM through the NPIV port.

Troubleshooting

I had a number of issues getting this working, but I was able to figure them all out with some help from VMware tech support.  I’ll detail them below:

Error code bad00030 

Translated, this means VMK_NOT_FOUND. Basically, it means no LUN paths could be found via the NPIV WWPNs. In my case, this was due to a bad driver. On my Dell PowerEdge 710/720 servers, I had to install the qla2xxx driver in place of the qlnativefc driver to get NPIV to work. I have a separate post forthcoming that details this procedure.

Error code bad0040

Like bad00030, this indicates NPIV can’t reach the LUN via your NPIV WWPNs. I watched the log and noticed it would scan each path up to 5 times for each HBA, throwing this error each time until it timed out. This would take 15 or more minutes, then the VM would eventually post and come up. If you tried to vMotion it, it would try to enable NPIV on the host it was moving to, time out, then fail back to the original host, go through the process again, and time out all over again. This would essentially hang the VM for up to 30 minutes before vSphere would drop the VM on its face and power it off. The vmkernel messages look like this:

2015-03-13T18:15:35.770Z cpu2:330868)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2015-03-13T18:15:37.772Z cpu2:330868)ScsiNpiv: 1152: NPIV vport rescan complete, [3:1] (0x41092fd6ad40) [0x41092fd3fe40] status=0xbad0040
2015-03-13T18:15:37.772Z cpu2:330868)WARNING: ScsiNpiv: 1788: Failed to Create vport for world 330869, vmhba2, rescan failed, status=bad0001
2015-03-13T18:15:37.773Z cpu24:33516)ScsiAdapter: 2806: Unregistering adapter vmhba64

Basically, you can see it creates vmhba64 (This is your virtual NPIV adapter.  The number after vmhba varies).  It tries to scan for your RDM LUN (Target 3, LUN id 1) and fails.  After several retries, it gives up and deletes the vmhba.

This was caused by the fact that I did not zone my NPIV WWPNs to ALL storage systems my ESXi hosts were connected to. We had zoned our ESXi cluster to a NetApp storage system, but at some point disconnected all of the LUNs from the ESXi cluster. Even though the ESXi hosts didn’t need the zones, and the VM certainly did not have any LUNs from that storage system attached, not zoning the NPIV WWPNs to that storage system broke NPIV.

I figured this out by turning on verbose logging for the qlnativefc driver using this command:

esxcfg-module -s 'ql2xextended_error_logging=1' qlnativefc

Then reboot the host for it to take effect.

Disable it later with:

esxcfg-module -s 'ql2xextended_error_logging=0' qlnativefc

Then reboot again.

After that, I ran the following:

/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a | less

There is a section in this output for each vmhba, and it lists the target ports the physical ESXi hba sees and the target numbers it assigned to them:

vmhba2

FC Target-Port List:

scsi-qla0-target-0=500000000bbb1155:01ac00:0:Online;
scsi-qla0-target-1=500000000bbb1166:015400:1:Online;
scsi-qla0-target-2=500000000aaa1102:01d800:2:Online;
scsi-qla0-target-3=500000000aaa1103:019800:3:Online;

You can see the WWPNs of your storage system target ports, followed by the Fibre Channel port ID (01ac00 for the first one), the target number ESXi assigned to it, and its status. In my case, the first two target ports were from the storage system I didn’t realize was connected.

With the verbose logging turned on, I ran tail -f /var/log/vmkernel.log while the VM booted up and noted the following:

2015-05-04T19:44:08.112Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry - nn 282c000c29000030 pn 282c000c29000031 portid=012b01.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry - nn 500000000aaa1101 pn 500000000aaa1102 portid=012900.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry - nn 500000000aaa1101 pn 500000000aaa1103 portid=016900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): device wrap (016900)
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0008 for port 012900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): qla24xx_login_fabric(6): failed to complete IOCB -- completion status (31) ioparam=1b/fffc01.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 0 to fcport 0x410a6c407e00
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1102 (targetId = 0) ONLINE
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 1 to fcport 0x410a6c410500
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1103 (targetId = 1) ONLINE
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): LOOP READY

The lines that contain GID_PT show the target ports that the NPIV WWPN sees (this is a separate discovery process from the one the ESXi HBA performs). Notice it only sees two of the target ports.

If we concentrate on the first target port it sees, it has port ID 012900. Later, you see it assigns target ID 0 to that target port. However, the vmkmgmt_keyval output above shows the ESXi HBA sees WWPN 500000000aaa1102 as target ID 2. They don’t match, and thus NPIV fails with the bad0040 error.
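The comparison I did by eye can be sketched as a script: save the “FC Target-Port List” lines for the physical HBA to one file and the verbose qlnativefc log lines for the NPIV vmhba to another, then compare the target ID each side assigned to the same WWPN. compare_target_ids is a hypothetical helper, and it assumes the two line formats shown above. It is based only on the behavior I observed, not on any documented rule.

```shell
# compare_target_ids KEYVAL_FILE LOG_FILE
# KEYVAL_FILE: "FC Target-Port List" lines from vmkmgmt_keyval (physical HBA)
# LOG_FILE:    verbose qlnativefc lines from vmkernel.log (NPIV vmhba)
# Prints one line per NPIV-visible target; returns 1 if any target ID differs.
compare_target_ids() {
  awk '
    BEGIN { bad = 0 }
    # First file, e.g.: scsi-qla0-target-2=500000000aaa1102:01d800:2:Online;
    FNR == NR {
      if ($0 ~ /-target-[0-9]+=/) {
        split($0, kv, "=")
        split(kv[2], f, ":")
        phys[f[1]] = f[3]          # WWPN -> target ID on the physical HBA
      }
      next
    }
    # Second file, e.g.: ... fcport 500000000aaa1102 (targetId = 0) ONLINE
    /fcport [0-9a-f]+ \(targetId = [0-9]+\) ONLINE/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "fcport") { wwpn = $(i + 1); id = $(i + 4) }
      }
      sub(/\)/, "", id)            # "0)" -> "0"
      if (!(wwpn in phys)) {
        printf "%s: vport target %s, not seen by the physical HBA\n", wwpn, id
        bad = 1
      } else if (phys[wwpn] != id) {
        printf "%s: MISMATCH physical=%s vport=%s\n", wwpn, phys[wwpn], id
        bad = 1
      } else {
        printf "%s: match (target %s)\n", wwpn, id
      }
    }
    END { exit bad }
  ' "$1" "$2"
}

# Usage: compare_target_ids keyval_targets.txt npiv_log_lines.txt
```

With the sample data from this post, both NPIV-visible target ports are reported as mismatches (physical IDs 2 and 3 versus vport IDs 0 and 1).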

Once I saw this, I traced the extra WWPNs back to the storage system I had not realized I was connected to and removed it from the zoning of the ESXi HBA ports. After a rescan of the storage on the ESXi cluster and a reboot of the hosts to be safe, I booted the VM up again and voilà!

Summary

Make sure you are meeting all of the requirements for NPIV.  VERIFY you are zoning your NPIV wwpns to ALL STORAGE SYSTEMS the cluster can see, even if it will not use any storage from the storage systems.  Be aware that if you connect any new storage systems to a cluster with one or more NPIV-enabled  VMs, you will need to add the new zones so the NPIV wwpns can see the new storage system target ports, or it will probably start failing.

Setting Up Libvirt on an Ubuntu Machine to Power on ESXi VMs

I recently set up a MAAS cluster on Ubuntu Linux in my home lab so that I could play around with OpenStack and Juju. I wanted MAAS to be able to automatically power my newly provisioned VMs on and off, but I had a hard time finding an easy-to-follow set of instructions on how to get this working on VMware ESXi. I did eventually get it working and put together the procedure below. Hopefully it helps someone out.

My Environment

  • VMware ESXi (standalone, no vCenter server) version 5.5.0
  • Ubuntu MAAS on VMware VMs (no physical servers)
  • libvirt version 1.2.8
  • Ubuntu Linux 14.04

Procedure

Build and Install libvirt 1.2.8

cd /tmp
wget http://libvirt.org/sources/libvirt-1.2.8.tar.gz

  • Untar the file

tar -xzf /tmp/libvirt-1.2.8.tar.gz -C /tmp/

  • Uninstall any existing version of libvirt

sudo apt-get remove libvirt-bin

  • Install the prerequisites

sudo apt-get install gcc make libxml2-dev libgnutls-dev libdevmapper-dev libcurl4-gnutls-dev  libpciaccess-dev  libnl-dev

  • Make and Install the new version
    • Specify the /usr directory when calling the configure command to specify where the non-architecture specific stuff gets installed

cd /tmp/libvirt-1.2.8
./configure --prefix=/usr --with-esx
make
sudo make install

Set Up Authentication

You have to set up the /etc/libvirt/auth.conf file with credentials in order to use virsh without entering a password. This is necessary for MAAS to power nodes up automatically.

sudo nano /etc/libvirt/auth.conf

virsh1

  • The credentials-esx block defines the credentials
    • credentials-esx  #  The esx in this part defines the name of the credential set
  • The second block defines the service and hostname
    • auth-esx #  This defines which credential block to use.  In this case, the esx credentials
    • auth-esx-hostname #  This defines the host to use the esx credentials for
  • Restart libvirt-bin

sudo service libvirt-bin restart
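For reference, here is a sketch of a file matching the description above, following libvirt’s [credentials-NAME] and [auth-SERVICE-HOSTNAME] section layout. The username, password, and the host name esxhostname1 are placeholders, and AUTH_FILE is a hypothetical variable that defaults to a scratch location so nothing under /etc is touched until you have reviewed the result.

```shell
# Write an example auth.conf to a scratch file (placeholder values only).
AUTH_FILE="${AUTH_FILE:-${TMPDIR:-/tmp}/auth.conf.example}"

# [credentials-esx]        defines a credential set named "esx"
# [auth-esx-esxhostname1]  applies that set to the esx service on esxhostname1
cat > "$AUTH_FILE" <<'EOF'
[credentials-esx]
username=root
password=CHANGE_ME

[auth-esx-esxhostname1]
credentials=esx
EOF

# The file holds a plaintext password, so keep it private.
chmod 600 "$AUTH_FILE"
echo "wrote $AUTH_FILE"
```

Once the values are filled in, copy the file to /etc/libvirt/auth.conf with sudo.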

Test your Configuration

You can make a test connection to your ESXi host to test your virsh configuration with the command below:

virsh -c esx://root@esxhostname1?no_verify=1 list --all

This will prompt you for your root password and then list all VMs on the VM host.

3/25/2015 – UPDATE:  I found that this will only work with a paid or evaluation license for vSphere ESXi. If you have the eval version, it will work until your eval expires and reverts to the free license. MAAS will silently fail to power VMs on or off when this happens. You will get this error when trying to run the command manually from the CLI:

virsh -c esx://root@esxhostname?no_verify=1 start --domain myvmname

error:  Failed to start the domain myvmname
error:  internal error:  HTTP response code 500 for call to 'PowerOnVM_Task'.  Fault:  ServerFaultCode - Current license or ESXi version prohibits execution of the requested operation.

Pretty clear, I guess.

Configuring MAAS for Virsh

Now you have to configure your new MAAS nodes with virsh.  To do this, we’ll simulate adding a new node to a MAAS cluster.

  • First, use the vSphere Client to connect to your ESXi host and create a new VM.
  • Make sure to set the OS Type to Ubuntu Linux (I use 64-bit).
  • Once the VM is created, right-click it and choose Edit Settings.
  • Go to the Boot tab and force it into the BIOS on the next boot.

virsh3

  • Start the new VM and go into the BIOS.
  • Go to the boot tab in the BIOS and set the network card to be the first boot option (use SHIFT and + to move it up)

virsh4

  • Go to the Exit menu and choose Save.
  • Power off the VM.
  • Edit the settings again
  • Select the network card and copy the MAC address

virsh5

virsh2

  • Fill out the fields as specified:

virsh6

  • Power Type:  Select virsh from the drop down
  • Power Address:  esx://root@esxhost1/system?no_verify=1
    • Where esxhost1 is the name of your ESXi host
  • Power ID:  The name of the VM as displayed in the VM inventory in the vSphere client.
  • Now click “Add an additional MAC address”
  • Paste in the MAC address of the VM’s virtual network card you got above.

Done!  I’ve not found a way to get past manually booting the node for the first time.  After you manually power it on once, it will enlist and shut itself off.  From there on out, you can click the node in the MAAS web interface and click “Start Node” or “Stop Node” to power it on and off.
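Before relying on MAAS, the same Power Address and Power ID can be exercised by hand with virsh. The sketch below is a thin wrapper; virsh_power is a hypothetical name, myvmname is a placeholder VM name, and this is not necessarily how MAAS itself invokes virsh. It defaults to a dry run that only prints the commands.

```shell
# DRY_RUN defaults to 1: print the virsh command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Same value as the MAAS "Power Address" field above.
POWER_ADDRESS="${POWER_ADDRESS:-esx://root@esxhost1/system?no_verify=1}"

# virsh_power state|on VMNAME   (VMNAME is the MAAS "Power ID")
virsh_power() {
  case "$1" in
    state) set -- domstate "$2" ;;   # query the current power state
    on)    set -- start "$2" ;;      # power the VM on
    *)     echo "usage: virsh_power state|on VMNAME" >&2; return 2 ;;
  esac
  if [ "$DRY_RUN" = "1" ]; then
    echo "virsh -c $POWER_ADDRESS $*"
  else
    virsh -c "$POWER_ADDRESS" "$@"
  fi
}

virsh_power state myvmname
virsh_power on myvmname
```

If these work from the MAAS server with DRY_RUN=0 and without a password prompt, the node’s power settings should work too.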

References

http://www.gremwell.com/node/155
http://www.libvirt.org
http://manpages.ubuntu.com/manpages/lucid/man1/virsh.1.html

How to Set Up Free VMware VM Backups with Thinware vBackup

I run a VMware ESXi (free edition) host server with several VMs in my home lab.  I’ve been on the lookout for some time for a backup product for them.  Since  I’m not a company, I prefer the price of “free”.  There are some products out there, but most of them have onerous requirements unless you get the “pay” version, which is pretty far out of most of our price ranges.

I did find Thinware vBackup. It takes a bit to set up, but it’s free, works well, and has a CLI you can use to schedule backups so you can automatically back up one or more VMs without user intervention.

Below I’ve detailed my setup experience and how to get your own free backups running.

Prerequisites

–  Thinware vBackup is a Windows product.  You need a Windows machine to run the backup software from.
–  For the free version, you need VMware Converter Standalone installed on the Windows backup machine.
–  You also need the VMware Virtual Disk Development Kit from VMware’s website.

Licensing

You need a license to use the software, even for the free Standard edition.  To get the license, you have to do a few things first:

First, you need to visit Thinware’s website and register for an account.  Registering is free.  Once you’ve done that, you can download the latest vBackup product version (v 4.0 as of this writing).

Decide which Windows machine will run your backups, then install the product on that machine. Once installed, you can launch it and it will immediately complain about licensing. Go to the licensing screen (click “Tools” in the drop-down menu and select “Configure Licensing”). The licensing screen will show you the hardware ID of your machine. You *must* do this from the machine you will be running your backups from! I’m pretty confident the key you get will only work on that machine.


Once you’ve obtained your hardware ID, go to the thinware.net site again, click the Products tab, then click the Request a License link under the Standard Edition column. Fill out the form thoroughly. Do not forget to fill out your organization name (or your name if this is for personal use) and your FULL address (be sure to include city, state, and zip code). Also fill in the contact name and e-mail address. If you forget to include all of this, your license will be denied (then you have to do it again). They will email your license key in a few days. Plug the key into the licensing screen on the software and you are ready to rock.

Adding your VM Entities

Once you are licensed, you have to tell Thinware what VM hosts and VMs it will be dealing with.  For each VM host, click Inventory -> Add Host server.

Incidentally, the software supports connecting to vCenter but that’s outside the scope of what we’re doing here today.

Fill in the details of your server including hostname and login credentials.  The port defaults will suffice in most environments and Management Server only applies if you have a vCenter server running.

AddHost1

The next screens allow you to review your details, validate connectivity and add any VMs discovered on the host.  In my environment, it didn’t detect the VMs automatically.  I had to add them later.  Finally you assign a license to the host, review the settings again and finish.

Right-click the newly-added host and click “add virtual machine” to add new machines.

AddVM1

Create a Backup Job

Before you can run backups, you have to define a backup job for each VM.  Click a VM and on the left-side pane click the Jobs tab.  Right-click the empty pane below and click “Add Job”.

I recommend you make the job name [vmname]-backup.  This makes it easy to remember later, but you can name it whatever you want.  You have three backup types to choose:

  • Backup-Image-SSH
    • This is the only option you can choose if you are using the free version and are running ESXi 5.x.
    • NOTE:  The SSH service needs to be enabled on the host server.
  • Image-VADP
    • This only works with the “paid” version of VMware vSphere.
    • Uses the vStorage APIs to perform the backup.
  • Image-VCB
    • This uses VMware Consolidated Backup, which must be installed on your machine.
    • Used for vSphere 4.x and older.

I chose the Backup-Image-SSH, which works great for a small home environment with free ESXi.

BackupJob1

On the next screens, you specify the root directory you want your backups to go in (e.g. C:\thinware). It will create a subfolder for each VM automatically when you run the first backup. Specify the number of backups to keep before it automatically purges old backups. Disk Exclusion is cool, and allows you to exclude certain virtual disks from the backup process. However, this does not work with the standard (free) version of Thinware.

The next screen has you configure quiescing of the guest file system (recommended).  Compression can be configured here too at three levels:  none, basic and advanced.  Unfortunately, you can’t choose any compression if you have the free version of Thinware.

Once complete, you submit the job and it shows up in the jobs pane.  You can right-click it and “Execute Now” to run it on-demand.

Scheduling a Backup Job

Scheduling a backup job is pretty easy.  From your Windows station with Thinware installed open the task scheduler.  Add a new task.  For the action, you will use the following:

"C:\program files (x86)\Thinware\vBackup\vBackup.exe" -v vmname -j backup-job-name

Example:
"C:\program files (x86)\Thinware\vBackup\vBackup.exe" -v myvm01 -j myvm01-backup

Easy enough.  Each backup takes a full image backup of the VM.  Depending on how large your .vmdk files are, these can get rather large.  With the free version, there’s no differential/incremental type backup scheme.   For the price in a home lab though, I can’t complain.

Enjoy!