Setting up and troubleshooting NPIV on vSphere

I recently set up NPIV for several VMs in my production environment.  Let’s just say it was an adventure.  I wanted to detail my experience below, including step-by-step instructions for setting it up and some advanced troubleshooting I had to do to resolve my issues.

My environment consists of vSphere ESXi 5.5 hosts with Brocade DCX backbones.  Though this guide revolves around my particular setup, the steps should work with most environments.

Definition

NPIV (N_Port ID Virtualization) is the process whereby a single Fibre Channel switch port can carry multiple WWPNs over the same Fibre Channel connection.  This allows one switch port to represent the WWPN of an ESXi host HBA port as well as a virtualized WWPN assigned directly to a VM.  Cisco UCS does this as well, with the physical ports of the edge device representing the different virtual WWPNs of the different virtual HBAs on the service profiles provisioned on the Cisco hardware.

This is useful when you want to identify a particular VM’s Fibre Channel traffic by its own unique WWPN.  Also, with Hitachi Server Priority Manager (QoS), you have to assign QoS limits by selecting a WWPN to assign a quota to.  Without NPIV, the best you can do is put a limit on the ESXi host’s HBA, thereby limiting ALL VMs on that host.

We use VirtualWisdom, a Fibre Channel performance and health monitoring software package made by Virtual Instruments.  With NPIV, we can now trace IOPS, MB/s and latency right to the VM as opposed to the ESXi HBA port.

Requirements

NPIV has some strict requirements, and failing to adhere to them can give you headaches.

  • Your switch has to support NPIV (most modern switches, including DCX backbones, do)
  • NPIV needs to be enabled on your switch ports (enabled by default on DCX backbones)
  • ESXi needs to support NPIV (versions from 4.0 onward do for sure; I see references to it as far back as v3.5)
  • NPIV can only be used in conjunction with RDM devices.  .vmdk disks from a datastore are still identified by the ESXi host’s WWPNs and cannot use NPIV.
  • The driver on the ESXi host has to support NPIV (the ESXi qlnativefc driver 1.1.39 is confirmed to work; see the quick checks after this list)
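
If you are not sure which HBAs and which driver a host is running, a couple of quick checks from the ESXi shell will tell you.  This is just a sketch; the exact VIB names vary by vendor and driver package:

esxcli storage core adapter list
(lists your HBAs, e.g. vmhba2/vmhba3, along with the driver each one is using)

esxcli software vib list | grep -i qln
(shows the installed qlnativefc driver VIB and its version; try “grep -i qla” instead if you are on the legacy qla2xxx driver)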

Zoning Notice

You have to make sure the NPIV WWPNs are zoned to see ALL STORAGE the cluster hosts can see, even if the VM never attaches to that storage.

Example Configuration

Enabling NPIV

VM Name:  npiv_test_01

1) First, shut down the VM.
2) Log into the vSphere web client and find the VM in your inventory.
3) Edit the settings of the VM.
4) Go to the VM Options tab, then to “Fibre Channel NPIV”.
5) Uncheck the “Temporarily disable NPIV for this virtual machine” checkbox.
6) Select “Generate new WWNs”.
7) Select the appropriate number of WWPNs vs. WWNNs.
For one FC fabric, one WWNN and one WWPN should work.  For two fabrics, I’d do one WWNN and two WWPNs to simulate a dual-port HBA, with each port connected to one fabric for redundancy.
8) Click OK and exit.
9) Edit the settings again and note the WWPN(s) generated for your VM; you will need these later.  For the purposes of this exercise, we will assume the following:

[Screenshot NPIV1: the generated WWN assignments shown in the Fibre Channel NPIV dialog]

Node WWN:
28:2c:00:0c:29:00:00:30

Port WWNs:
28:2c:00:0c:29:00:00:31, 28:2c:00:0c:29:00:00:32
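
If you prefer to double-check from the command line, the generated WWNs also end up in the VM’s .vmx file.  A quick sanity check from the ESXi shell (the datastore path below is just an example; adjust it for your environment):

grep -i wwn /vmfs/volumes/datastore1/npiv_test_01/npiv_test_01.vmx
(you should see entries along the lines of wwn.node and wwn.port matching the values shown in the web client)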

Connect your RDM to the VM at this time.  When we are done, the VM should be accessing your RDM via the NPIV-generated WWPNs.  If this fails for some reason, it will fall back to the WWPNs of the ESXi host HBAs.  Remember, the VM will ONLY see RDMs this way, not .vmdk disks sitting in a datastore.  Those ALWAYS go through the ESXi host WWPNs.

Now, before you power up the VM, you have to set up the zoning and LUN masking, or you will have issues booting or vMotioning it.

FC Fabric Zoning and Configuration

Verify NPIV is enabled

Run the following command from a Brocade backbone

portcfgshow 1/26
(Where 1/26 is the slot/port of your ESXi host HBA port)

[Screenshot NPIV2: portcfgshow output for the ESXi host’s switch port]

Verify the NPIV Capability setting is set to ON (which should be the default for Brocade)
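
If the capability shows OFF, or you are not sure which slot/port your ESXi HBA is plugged into in the first place, a couple of extra commands help.  The exact portcfgnpivport syntax varies a bit between FOS releases, so treat this as a sketch and check your documentation:

nodefind 59:00:00:00:00:00:00:01
(looks up the switch port a given WWPN is logged in on; use your ESXi HBA’s WWPN)

portcfgnpivport 1/26, 1
(enables NPIV on the port; a value of 0 disables it.  Some FOS releases use an --enable/--disable form instead)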

Zone the Fabric(s)

1) Create an FC alias on each fabric.  Each one will contain one of the two NPIV WWPNs you generated above.  On a Brocade backbone:

Fabric 1
alicreate "npiv_test_01_hba0", "28:2c:00:0c:29:00:00:31"

Fabric 2
alicreate "npiv_test_01_hba1", "28:2c:00:0c:29:00:00:32"

2) Create your zones.  Recall that you *must* zone the NPIV WWPNs to ALL storage systems your ESXi cluster hosts can see, even if the VM will never see or use the storage!

Fabric 1
zonecreate "npiv_test_01_hba0_to_storage_array_01_1a", "npiv_test_01_hba0; storage_array_01_1a"
cfgadd "mycfg", "npiv_test_01_hba0_to_storage_array_01_1a"
.. add any additional zones for other storage systems…

Fabric 2
zonecreate "npiv_test_01_hba1_to_storage_array_01_1b", "npiv_test_01_hba1; storage_array_01_1b"
cfgadd "mycfg", "npiv_test_01_hba1_to_storage_array_01_1b"
.. add any additional zones for other storage systems…

3) Save and commit your config.

cfgsave
cfgenable mycfg
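
Before moving on, it is worth confirming the zones actually landed in the effective configuration.  On a Brocade backbone, something like the following works:

cfgactvshow
(displays the effective zone configuration; the new zones should appear in it)

zoneshow "npiv_test_01_hba0_to_storage_array_01_1a"
(confirms the zone contains the alias you created and the storage target)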

Mask Your LUNs

Now you have to mask your LUNs.  It’s best to add the WWPNs for your NPIV VM to the same host or initiator group as your ESXi hosts.  In my environment, we create a group containing the WWPNs of all ESXi HBAs in the given cluster.  Here is an example:

esxi_host_01
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:01
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:02

esxi_host_02
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:03
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:04

Your existing host groups will look like this:
esxi_cluster_01_target_1a:  59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03
esxi_cluster_01_target_1b:  59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04

So in this case, you’d add your fabric 1 NPIV WWPN to esxi_cluster_01_target_1a and your fabric 2 NPIV WWPN to esxi_cluster_01_target_1b:

esxi_cluster_01_target_1a:
59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03,28:2c:00:0c:29:00:00:31
esxi_cluster_01_target_1b:
59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04,28:2c:00:0c:29:00:00:32

There is some flexibility at this point.  Different storage systems mask differently, and I don’t think NPIV will freak out if it can’t see the LUN; in that case it just won’t work and will fall back to the ESXi host WWPNs.

Finishing the Configuration

Now you can power up the VM.  I’d advise watching vmkernel.log as the VM boots so you can see whether it worked.  Log into the ESXi host using SSH and run the following command:

tail -f /var/log/vmkernel.log

Messages like the ones below indicate success.

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1701: Physical Path : adapter=vmhba3, channel=0, target=3, lun=2

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1160: NPIV vport rescan complete, [2:2] (0x41092fd6c1c0) [0x41092fd42440] vport exists

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1848: Vport Create status for world:61526 num_wwpn=2, num_vports=4, paths=4, errors=0

If it posts within a minute or two, you should be OK.  If it takes a lot longer, you probably have an issue.

You can verify it worked by using this command:

cat /var/log/vmkernel.log | grep -i num_vport

…num_wwpn=2, num_vports=4, paths=4, errors=0

If you see errors=0, you should be OK.  You should see one vport per path to the LUN, and no errors.
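
If you want to see the whole NPIV conversation rather than just the summary line, filtering on the ScsiNpiv tag works well:

grep -i scsinpiv /var/log/vmkernel.log
(shows the vport creation, rescan and error messages for every NPIV-enabled VM that has booted on the host)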

Verification

There are a number of ways to verify NPIV is working.  I used VirtualWisdom to verify it could see the NPIV WWPNs.  I created a Host entity, added the two NPIV WWPNs to it as HBA_port objects, then graphed it:

[Screenshot NPIV3: VirtualWisdom graph of traffic on the NPIV WWPNs]

This tells me without a doubt an external system can see the traffic from the VM through the NPIV port.
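
If you don’t have a tool like VirtualWisdom, the fabric itself gives you a similar sanity check.  On a Brocade backbone, the name server should now show the NPIV WWPN logged in behind the ESXi host’s physical port:

nodefind 28:2c:00:0c:29:00:00:31
(should return an entry on the same switch port as the ESXi HBA it rides on)

portshow 1/26
(the port’s login list should now include the NPIV WWPN in addition to the physical HBA WWPN)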

Troubleshooting

I had a number of issues getting this working, but I was able to figure them all out with some help from VMware tech support.  I’ll detail them below:

Error code bad00030 

Translated, this means VMK_NOT_FOUND.  Basically, no LUN paths could be found via the NPIV WWPNs.  In my case, this was due to a bad driver: on my Dell PowerEdge 710/720 servers, I had to install the legacy qla2xxx driver instead of the qlnativefc driver to get NPIV to work.  I have a separate post forthcoming that details this procedure.

Error code bad0040

Like bad00030, this indicates NPIV can’t reach the LUN via your NPIV WWPNs.  I watched the log and noticed it would scan each path up to 5 times for each HBA, throwing this error each time until it timed out.  This would take 15 or more minutes, then the VM would eventually post and come up.  If you tried to vMotion it, it would try to enable NPIV on the destination host, time out, try to fail back to the original host, then go through the whole process and time out all over again.  This would essentially hang the VM for up to 30 minutes before vSphere would drop the VM on its face and power it off.  The vmkernel messages look like this:

2015-03-13T18:15:35.770Z cpu2:330868)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2015-03-13T18:15:37.772Z cpu2:330868)ScsiNpiv: 1152: NPIV vport rescan complete, [3:1] (0x41092fd6ad40) [0x41092fd3fe40] status=0xbad0040
2015-03-13T18:15:37.772Z cpu2:330868)WARNING: ScsiNpiv: 1788: Failed to Create vport for world 330869, vmhba2, rescan failed, status=bad0001
2015-03-13T18:15:37.773Z cpu24:33516)ScsiAdapter: 2806: Unregistering adapter vmhba64

Basically, you can see it creates vmhba64 (this is your virtual NPIV adapter; the number after vmhba varies).  It then tries to scan for your RDM LUN (target 3, LUN ID 1) and fails.  After several retries, it gives up and deletes the vmhba.
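
As a side note, while a VM with NPIV is up and working you may be able to see the vport adapter alongside your physical HBAs.  I would not treat this as authoritative, but it is a quick look:

esxcfg-scsidevs -a
(lists all storage adapters on the host; with NPIV active, the virtual adapter such as vmhba64 may appear in addition to the physical HBAs)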

This was caused by the fact that I had not zoned my NPIV WWPNs to ALL storage systems my ESXi hosts were connected to.  We had zoned our ESXi cluster to a NetApp storage system, but at some point disconnected all of its LUNs from the ESXi cluster.  Even though the ESXi hosts didn’t need the zones, and the VM certainly did not have any LUNs from that storage system attached, not zoning the NPIV WWPNs to that storage system broke NPIV.

I figured this out by turning on verbose logging for the qlnativefc driver using this command:

esxcfg-module -s 'ql2xextended_error_logging=1' qlnativefc

Then reboot the host for it to take effect.

Disable it later with:

esxcfg-module -s 'ql2xextended_error_logging=0' qlnativefc

Then reboot again.
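
You can confirm what options are currently set on the module with:

esxcfg-module -g qlnativefc
(prints the option string configured for the driver, so you can verify the logging flag was applied or cleared)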

After that, I ran the following:

/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a | less

There is a section in this output for each vmhba, listing the target ports the physical ESXi HBA sees and the target numbers assigned to them:

vmhba2

FC Target-Port List:

scsi-qla0-target-0=500000000bbb1155:01ac00:0:Online;
scsi-qla0-target-1=500000000bbb1166:015400:1:Online;
scsi-qla0-target-2=500000000aaa1102:01d800:2:Online;
scsi-qla0-target-3=500000000aaa1103:019800:3:Online;

You can see the WWPNs of your storage system target ports, each followed by an internal ID (e.g. 01ac00), the target number ESXi assigned to it, and its status.  In my case, the first two target ports were from the storage system I didn’t realize was connected.

With the verbose logging turned on, I ran tail -f /var/log/vmkernel.log while the VM booted up and noted the following:

2015-05-04T19:44:08.112Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 282c000c29000030 pn 282c000c29000031 portid=012b01.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 500000000aaa1101 pn 500000000aaa1102 portid=012900.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 500000000aaa1101 pn 500000000aaa1103 portid=016900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): device wrap (016900)
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0008 for port 012900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): qla24xx_login_fabric(6): failed to complete IOCB — completion status (31) ioparam=1b/fffc01.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 0 to fcport 0x410a6c407e00
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1102 (targetId = 0) ONLINE
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 1 to fcport 0x410a6c410500
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1103 (targetId = 1) ONLINE
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): LOOP READY

The lines that start with GID_PT show the target ports the NPIV WWPN sees (this is a separate discovery process from the one the physical ESXi HBA performs).  Notice it only sees two of the four target ports.

If we concentrate on the first target port the NPIV port sees, it has port ID 012900.  A little later, you can see the driver assigns it target ID 0.  However, from the vmkmgmt_keyval output above, the physical ESXi HBA sees that same WWPN (500000000aaa1102) as target ID 2.  The target IDs don’t match, and so NPIV fails with the bad0040 error.

Once I saw this, I traced the extra WWPNs back to the storage system I hadn’t realized we were still connected to and removed it from the zoning of the ESXi HBA ports.  After a rescan of the storage on the ESXi cluster and a reboot of the hosts to be safe, I booted the VM up again and voila!

Summary

Make sure you are meeting all of the requirements for NPIV.  VERIFY you are zoning your NPIV WWPNs to ALL STORAGE SYSTEMS the cluster can see, even if the VM will not use any storage from them.  Be aware that if you connect any new storage systems to a cluster with one or more NPIV-enabled VMs, you will need to add the new zones so the NPIV WWPNs can see the new storage system target ports, or NPIV will probably start failing.

9 thoughts on “Setting up and troubleshooting NPIV on vSphere”

  1. Brian Miller

    This is really great info. One question: does the VM involved need to *only* have RDM-mapped drives? Or can it be a mix of regular mapped storage through the hosts (say the OS drive) and then the needed drives (SQL data, index, log, and quorum drives) mapped as RDMs?

    1. Brandon Post author

      NPIV only affects RDM drives. You can still have other drives that are attached as .vmdk disks from a datastore. I’m pretty sure, however, that traffic to those drives will not show up under the NPIV-enabled WWPNs.

  2. Vadim

    One of the most detailed articles regarding NPIV on ESXi I’ve seen so far, thanks a lot.

    Are virtual WWNs somehow bound to a physical HBA, or can the WWNs jump from one HBA to another after a VM or host reboot?
    Can the WWNs be bound to a specific HBA, for example when you have 4 HBAs but want to use only a specific 2?

    1. Brandon Post author

      Hi there, from what I’ve seen the virtual WWPNs can be assigned to either (or both) physical HBAs. I never did this, however. What I did was assign a virtual WWPN to a physical HBA in my planning, then I zoned it that way. Meaning:

      vwwpn1 -> physical HBA1 -> Fabric 1
      vwwpn2 -> physical HBA2 -> Fabric 2

      When I zoned it that way, things looked good on the fabric switches: fabric 1 saw vwwpn1 and fabric 2 saw vwwpn2.

      Hope that helps!

      1. Niranjan

        On the VMware client, do you see the HBA adapters from its OS? Does the HBA WWPN get listed under /sys/class/fc_host/ .. ?

        Regards,
        Niranjan

      2. Brandon Post author

        You mean the VM’s virtual WWPNs, I presume. If so, then yes, the OS should reflect the virtual WWPNs. I can’t specifically verify for your flavor of Linux/Unix, but I recall that in Windows I could see the virtual WWPNs by looking in the OS. I have no reason to believe you won’t see them in Linux/Unix as well.

  3. Kevin

    Looking for clarification as this is the first time I am actually considering using NPIV.

    We are running vSphere 6.0u2 and I am connected to two fabrics, all current-model Cisco switches that are up to date. In this environment we are going 100% virtualized. Most of our Windows SQL clusters are being implemented via Always On, so they are not an issue; however, we have one cluster that needs to be implemented using traditional shared-disk clustering. I was going to implement this with RDMs; however, I was debating using NPIV instead. Please correct me if I am wrong, but using NPIV, can I not just bypass vSphere altogether and go straight to the SAN? I should be able to do the zoning using the WWPNs from the NPIV adapters and connect them directly to storage, thereby presenting LUNs to the VM without having to present them to vSphere first. If I have to first present them as RDMs to the VM, then that kind of defeats the purpose in my case.

    Thanks for your assistance in advance,
    Kevin

    1. Brandon Post author

      Hi Kevin,
      NPIV does not replace RDMs. NPIV can only be implemented *on* RDMs. Whether you use NPIV or not, you will always have to present the storage to the ESXi hosts. When you use NPIV, you also have to present it to your new virtualized HBAs. It’s somewhat more complicated.

      The benefit is not that you can bypass vSphere; quite the opposite. The benefit is that your VM can now be identified by its own unique HBA WWPNs. Normally, you can only see that traffic is going to or from the ESXi host’s HBA WWPNs, so you can’t tell which VM’s traffic is which.

      Given what you are describing, it sounds like NPIV is *not* the way to go for you. It will complicate your environment even more, which is likely not what you want.

      Hope that helps!

      1. Kevin

        Yeah, I was hoping it would be similar to iSCSI. I assumed that if I just zoned the LUN/SAN to both the physical HBA and the virtual HBA, I would then be able to present it to the VM and work with it as if it were zoned to a physical host. I may have to look into FCoE for possible future projects to see if it can be done that way.

        Thanks again for the quick reply!
        Kevin
