
Updated FC Drivers on ESXi

I had an issue where I had to change the driver ESXi uses for its QLogic Fibre Channel HBAs.  Below is the procedure I used to switch between the qlnativefc and qla2xxx driver sets.  I was running ESXi 5.5, but the procedure for other versions is likely similar.

Updating the QLogic Driver on an ESXi Host

ESXi hosts with QLogic adapters use the qlnativefc driver out of the box.
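Before making any changes, it's worth confirming which driver your HBAs are currently bound to.  A quick check from the ESXi shell (standard ESXi 5.x commands):

esxcfg-scsidevs -a
esxcli software vib list | grep -i ql

The first command lists each vmhba along with the driver that claimed it; the second shows which QLogic driver VIBs are installed.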

On some servers, including Dell PowerEdge R7xx servers, NPIV does not work with the stock qlnativefc driver.  Follow the procedure below to update the driver.

Update the QLogic Driver to an NPIV-enabled Driver

      1. Download the QLogic driver for VMware.

VMware ESXi5.x FC-FCoE Driver for QLogic and OEM-branded Fibre Channel and Converged Network Adapter

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI5X-QLOGIC-QLA2XXX-9345320-1VMW&productId=285

Example:  qla2xxx-934.5.38.0-1872208.zip

      2. Extract the zip file and find the offline_bundle.zip file if it's not already extracted.
      3. SCP the offline bundle file to a datastore on the ESXi host that all of the other hosts can see.

Example:   /vmfs/volumes/vmware_host_swap_03/qla2xxx-934.5.38.0-offline_bundle-1872208.zip

For each host:

      1. Log onto the VM host
      2. Put the host in maintenance mode.
      3. Do the following:

esxcli software vib install -d /vmfs/volumes/vmware_host_swap_01/qla2xxx-934.5.38.0-offline_bundle-1872208.zip
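You can also do a dry run first if you want to see what the install will change before running it for real; the --dry-run option just reports and makes no changes:

esxcli software vib install -d /vmfs/volumes/vmware_host_swap_01/qla2xxx-934.5.38.0-offline_bundle-1872208.zip --dry-run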

      4. Remove the old driver:

esxcli software vib remove -n qlnativefc

      5. Reboot the host
      6. Verify after the host is back up:

esxcli software vib list | grep -i qla2xxx
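You can also confirm which driver each FC adapter is actually bound to after the reboot; the Driver column should now show qla2xxx rather than qlnativefc:

esxcli storage core adapter list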

Rolling Back the Driver Upgrade

VMware qlnativefc Driver:

https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI55-QLOGIC-QLNATIVEFC-11390-1&productId=353&download=true&fileId=838e130270df93ad2aca6bd64be27f06&secureParam=&uuId=08b248a1-29dc-4ce5-8da3-6353d2997bce&downloadType=

Example:  VMW-ESX-5.5.0-qlnativefc-1.1.39.0-2243137.zip

      1. Extract the zip file and find the offline_bundle.zip file if it's not already extracted.
      2. SCP the offline bundle file to a datastore on the ESXi host that all of the other hosts can see.

Example:   /vmfs/volumes/ah-3270-2:vmware_host_swap_03/VMW-ESX-5.5.0-qlnativefc-1.1.39.0-offline_bundle-2243137.zip

For each host:

      1. Log onto the VM host
      2. Put the host in maintenance mode.
      3. Do the following:

esxcli software vib install -d  /vmfs/volumes/ah-3270-2:vmware_host_swap_03/VMW-ESX-5.5.0-qlnativefc-1.1.39.0-offline_bundle-2243137.zip

      4. Remove the old driver:

esxcli software vib remove -n scsi-qla2xxx

      5. Reboot the host
      6. Verify after the host is back up:

esxcli software vib list | grep -i ql

Setting up and troubleshooting NPIV on vSphere

I recently set up NPIV for several of my VMs in my production environment.  Let's just say it was an adventure.  I wanted to detail my experience below, including step-by-step instructions for setting it up and some advanced troubleshooting I had to do to resolve my issues.

My environment consists of vSphere ESXi 5.5 hosts with Brocade DCX backbones.  Though this walkthrough revolves around my particular environment, the steps here should work with most setups.

Definition

NPIV (N_Port ID Virtualization) allows a single Fibre Channel N_Port to register multiple Port WWPNs over the same physical connection.  This lets a single switch port carry both the WWPN of an ESXi host HBA port and a virtualized WWPN assigned directly to a VM.  Cisco UCS does something similar, with the physical ports of the edge device presenting the virtual WWPNs of the virtual HBAs on the service profiles provisioned on the Cisco hardware.

This is useful when you want to identify a particular VM's Fibre Channel traffic by its own unique WWPN.  Also, with Hitachi Server Priority Manager (QoS), you have to assign QoS limits by selecting a WWPN to apply a quota to.  Without NPIV, the best you could do is put a limit on the ESXi server's HBA, thereby limiting ALL VMs on the ESXi host.

We use VirtualWisdom, a Fibre Channel performance and health monitoring software package made by VirtualInstruments.  With NPIV, we can now trace IOPS, MB/s and latency right to the VM as opposed to the ESXi HBA port.

Requirements

NPIV has some strict requirements, and failing to adhere to them can give you headaches.

  • Your switch has to support NPIV (most modern switches, including DCX backbones, do)
  • NPIV needs to be enabled on your switch ports (enabled by default on DCX backbones)
  • ESXi needs to support NPIV (versions past 4.0 do for sure; I have seen references to support as far back as 3.5)
  • NPIV can only be used in conjunction with RDM devices.  .vmdk disks from a datastore are still identified by the ESXi host's WWPNs and cannot use NPIV.
  • The driver on the ESXi host has to support NPIV (the ESXi qlnativefc driver at version 1.1.39 is confirmed to work)

Zoning Notice

You have to make sure the NPIV WWPNs are zoned to ALL STORAGE the cluster hosts can see, even if the VM does not attach to that storage.

Example Configuration

Enabling NPIV

VM Name:  npiv_test_01

1) First, shut down the VM.
2) Log into the vSphere web client and find the VM in your inventory.
3) Edit the settings of the VM.
4) Go to the VM Options Tab, then to “Fibre Channel NPIV”
5)  Uncheck the “Temporarily disable NPIV for this Virtual Machine” checkbox.
6)  Select “Generate new WWNs”.
7)  Select the appropriate number of WWNNs and WWPNs.
For one FC fabric, one WWNN and one WWPN should work.  For two fabrics, I'd do one WWNN and two WWPNs to simulate a dual-port HBA, with each port connected to one fabric for redundancy.
8)  Click OK and exit.
9)  Edit settings again and note the WWPN(s) generated for your VM. You will need these later.  For purposes of this exercise, we will assume the following:


Node WWN:
28:2c:00:0c:29:00:00:30

Port WWNs:
28:2c:00:0c:29:00:00:31, 28:2c:00:0c:29:00:00:32

Connect your RDM to the VM at this time.  When we are done, the VM should be accessing your RDM via your NPIV-generated WWPNs.  If this fails for some reason, it will fall back to the WWPNs of the ESXi host HBAs.  Remember, only RDMs are accessed this way, not .vmdk disks sitting in a datastore.  Those ALWAYS go through the ESXi host WWPNs.
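If you want to double-check the generated values outside of the vSphere client, they are also written into the VM's .vmx file.  On my hosts they show up as wwn.node and wwn.port style entries, though the exact key names may vary by ESXi version, so treat this as a quick sanity check:

grep -i wwn /vmfs/volumes/<datastore>/npiv_test_01/npiv_test_01.vmx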

Now, before you power up the VM, you have to set up the zoning and LUN masking or you will have issues while booting or vMotioning it.

FC Fabric Zoning and Configuration

Verify NPIV is enabled

Run the following command from a Brocade backbone

portcfgshow 1/26
(Where 1/26 is the slot/port of your ESXi host HBA port)


Verify the NPIV Capability setting is set to ON (which should be the default for Brocade)
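If it ever shows up as OFF, the portcfgnpivport command is what toggles it.  From memory the form is something like the line below (1 enables, 0 disables), but the argument format varies between FOS releases, so check the help output on your switch first:

portcfgnpivport 1/26, 1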

Zone the Fabric(s)

1)  Create an FC alias on each fabric.  Each one will contain one of the two NPIV WWPNs you generated above.  On a Brocade backbone:

Fabric 1
alicreate "npiv_test_01_hba0", "28:2c:00:0c:29:00:00:31"

Fabric 2
alicreate "npiv_test_01_hba1", "28:2c:00:0c:29:00:00:32"

2) Create your zones.  Recall that you *must* zone the NPIV WWPNs to ALL storage systems your ESXi cluster hosts can see, even if the VM will never see or use that storage!

Fabric 1
zonecreate "npiv_test_01_hba0_to_storage_array_01_1a", "npiv_test_01_hba0; storage_array_01_1a"
cfgadd "mycfg", "npiv_test_01_hba0_to_storage_array_01_1a"
… add any additional zones for other storage systems …

Fabric 2
zonecreate "npiv_test_01_hba1_to_storage_array_01_1b", "npiv_test_01_hba1; storage_array_01_1b"
cfgadd "mycfg", "npiv_test_01_hba1_to_storage_array_01_1b"
… add any additional zones for other storage systems …

3) Save and commit your config.

cfgsave
cfgenable mycfg
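Before moving on, it doesn't hurt to confirm the new zones actually made it into the effective configuration:

zoneshow "npiv_test_01_hba0_to_storage_array_01_1a"
cfgactvshow | grep npiv_test_01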

Mask Your Luns

Now you have to mask your LUNs.  It's best to add the WWPNs for your NPIV VM to the same host or initiator group as your ESXi hosts.  In my environment, we create one group for the WWPNs of all ESXi HBAs in a given cluster.  Here is an example:

esxi_host_01
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:01
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:02

esxi_host_02
HBA1 (goes to fabric 1):  59:00:00:00:00:00:00:03
HBA2 (goes to fabric 2):  59:00:00:00:00:00:00:04

Your existing host groups will look like this:
esxi_cluster_01_target_1a:  59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03
esxi_cluster_01_target_1b:  59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04

So in this case, you'd add your fabric 1 NPIV WWPN to esxi_cluster_01_target_1a and your fabric 2 WWPN to esxi_cluster_01_target_1b:

esxi_cluster_01_target_1a:
59:00:00:00:00:00:00:01,59:00:00:00:00:00:00:03,28:2c:00:0c:29:00:00:31
esxi_cluster_01_target_1b:
59:00:00:00:00:00:00:02,59:00:00:00:00:00:00:04,28:2c:00:0c:29:00:00:32

There is some flexibility at this point.  Different storage systems mask differently, and I don't think NPIV will freak out if it can't see a LUN.  In that case, it just won't work and will fall back to the ESXi host WWPNs.

Finishing the Configuration

Now you can power up the VM.  I’d advise you to watch the vmkernel.log as the VM boots so you can see if it worked.  Log into the ESXi host using SSH and use the following command:

tail -f /var/log/vmkernel.log
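If the log is noisy, you can narrow the stream down to the NPIV-related messages:

tail -f /var/log/vmkernel.log | grep -iE 'npiv|vport'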

Messages like below indicate success.

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1701: Physical Path : adapter=vmhba3, channel=0, target=3, lun=2

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1160: NPIV vport rescan complete, [2:2] (0x41092fd6c1c0) [0x41092fd42440] vport exists

2015-03-20T23:13:32.838Z cpu30:61525)ScsiNpiv: 1848: Vport Create status for world:61526 num_wwpn=2, num_vports=4, paths=4, errors=0

If it posts within a minute or two, you should be OK.  If it takes a lot longer, you probably have an issue.

You can verify it worked by using this command:

cat /var/log/vmkernel.log | grep -i num_vport

…num_wwpn=2, num_vports=4, paths=4, errors=0

If you see errors=0, you should be OK.  You should see a vPort per path to the LUN, and no errors.

Verification

There are a number of ways to verify NPIV is working.  I used my VirtualInstruments software to verify it could see the NPIV WWPNs.  I created a Host entity, added the two NPIV WWPNs to it as HBA_port objects, then graphed it:

(Screenshot: VirtualWisdom graph of traffic on the VM's NPIV WWPNs)

This tells me without a doubt that an external system can see the VM's traffic through the NPIV port.
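You can also verify from the fabric side.  On a Brocade switch, both the name server and the F-Port itself should show the extra logins; something along these lines, where 1/26 is the ESXi host's switch port:

nsshow | grep -i 28:2c:00:0c:29
portshow 1/26

The NPIV WWPNs should appear in the name server output alongside the host HBA's WWPN, and portshow should list more than one device logged into that single port.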

Troubleshooting

I had a number of issues getting this working, but I was able to figure them all out with some help from VMware tech support.  I’ll detail them below:

Error code bad00030 

Translated, this means VMK_NOT_FOUND.  Basically, no LUN paths could be found via the NPIV WWPNs.  In my case, this was due to a bad driver.  On my Dell PowerEdge R710/R720 servers, I had to install the qla2xxx driver as opposed to the qlnativefc driver to get NPIV to work.  I have a separate post that details this procedure.

Error code bad0040

Like bad00030, this indicates NPIV can't reach the LUN via your NPIV WWPNs.  I watched the log and noticed it would scan each path up to 5 times for each HBA, throwing this error each time until it timed out.  This would take 15 or more minutes, and then the VM would eventually post and come up.  If you tried to vMotion it, it would try to enable NPIV on the destination host, time out, then try to fail back to the original host, go through the whole process again, and time out all over again.  This would essentially hang the VM for up to 30 minutes before vSphere dropped it on its face and powered it off.  The vmkernel messages look like this:

2015-03-13T18:15:35.770Z cpu2:330868)WARNING: ScsiPsaDriver: 1272: Failed adapter create path; vport:vmhba64 with error: bad0040
2015-03-13T18:15:37.772Z cpu2:330868)ScsiNpiv: 1152: NPIV vport rescan complete, [3:1] (0x41092fd6ad40) [0x41092fd3fe40] status=0xbad0040
2015-03-13T18:15:37.772Z cpu2:330868)WARNING: ScsiNpiv: 1788: Failed to Create vport for world 330869, vmhba2, rescan failed, status=bad0001
2015-03-13T18:15:37.773Z cpu24:33516)ScsiAdapter: 2806: Unregistering adapter vmhba64

Basically, you can see it creates vmhba64 (this is your virtual NPIV adapter; the number after vmhba varies).  It tries to scan for your RDM LUN (target 3, LUN ID 1) and fails.  After several retries, it gives up and deletes the vmhba.

This issue was caused by the fact that I had not zoned my NPIV WWPNs to ALL storage systems my ESXi hosts were connected to.  We had zoned our ESXi cluster to a NetApp storage system, but at some point disconnected all of its LUNs from the ESXi cluster.  Even though the ESXi hosts didn't need the zones, and the VM certainly did not have any LUNs from that storage system attached, not zoning the NPIV WWPNs to that storage system broke NPIV.

I figured this out by turning on verbose logging for the qlnativefc driver using this command:

esxcfg-module -s 'ql2xextended_error_logging=1' qlnativefc

Then reboot the host for it to take effect.

Disable it later with:

esxcfg-module -s 'ql2xextended_error_logging=0' qlnativefc

Then reboot again.
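You can confirm whether the option is currently set (and see any other options configured on the module) with:

esxcfg-module -g qlnativefc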

After that, I ran the following:

/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a | less

There is a section in this output for each vmhba, listing the target ports the physical ESXi HBA sees and the target numbers it assigned to them:

vmhba2

FC Target-Port List:

scsi-qla0-target-0=500000000bbb1155:01ac00:0:Online;
scsi-qla0-target-1=500000000bbb1166:015400:1:Online;
scsi-qla0-target-2=500000000aaa1102:01d800:2:Online;
scsi-qla0-target-3=500000000aaa1103:019800:3:Online;

You can see the WWPNs of your storage system target ports, followed by the internal ID ESXi assigns to each (01ac00, for example), the target number, and the status.  In my case, the first two target ports were from the storage system I didn't realize was still connected.

With the verbose logging turned on, I ran tail -f /var/log/vmkernel.log while the VM booted up and noted the following:

2015-05-04T19:44:08.112Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 282c000c29000030 pn 282c000c29000031 portid=012b01.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 500000000aaa1101 pn 500000000aaa1102 portid=012900.
2015-05-04T19:44:08.113Z cpu19:33504)qlnativefc: vmhba64(4:0.0): GID_PT entry – nn 500000000aaa1101 pn 500000000aaa1103 portid=016900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): device wrap (016900)
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0008 for port 012900.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): qla24xx_login_fabric(6): failed to complete IOCB — completion status (31) ioparam=1b/fffc01.
2015-05-04T19:44:08.132Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x0009 for port 012900.
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 0 to fcport 0x410a6c407e00
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1102 (targetId = 0) ONLINE
2015-05-04T19:44:08.135Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Trying Fabric Login w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Fabric Login successful w/loop id 0x000a for port 016900.
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): Assigning new target ID 1 to fcport 0x410a6c410500
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): fcport 500000000aaa1103 (targetId = 1) ONLINE
2015-05-04T19:44:08.138Z cpu19:33504)qlnativefc: vmhba64(4:0.0): LOOP READY

The lines that start with GID_PT show the target ports the NPIV WWPN sees (this is a separate discovery process from the one the ESXi HBA performs).  Notice it only sees two of the four target ports.

If we concentrate on the first target port it sees, its port ID is 012900.  Later, you can see the driver assign target ID 0 to that port.  However, from the vmkmgmt_keyval output above, the ESXi HBA sees WWPN 500000000aaa1102 as target ID 2.  They don't match, and so NPIV fails with the bad0040 error.

Once I saw this, I traced the extra WWPNs back to the storage system I hadn't realized was still connected and removed it from the zoning of the ESXi HBA ports.  After a rescan of the storage on the ESXi cluster and a reboot of the hosts to be safe, I booted the VM up again and voila!

Summary

Make sure you are meeting all of the requirements for NPIV.  VERIFY you are zoning your NPIV WWPNs to ALL STORAGE SYSTEMS the cluster can see, even if the VM will not use any storage from them.  Be aware that if you connect any new storage systems to a cluster with one or more NPIV-enabled VMs, you will need to add zones so the NPIV WWPNs can see the new storage system's target ports, or NPIV will probably start failing.

Fibre Channel Primer

Lately I've been spending a lot of time in a software product called VirtualWisdom (by VirtualInstruments).  This software dives into our Fibre Channel network down to the frame level and reports tons of information about performance and the status of our HBAs and Fibre Channel links.  It's a ton of data, and I've spent the better part of a week diving into Fibre Channel basics to try to understand it better.  Below are some notes I've put together from a number of sources (listed below).  There's a lot out there, but it's hard to find it all in one place in an easy-to-understand format, and I'm hoping this helps someone out someday.  I'm pretty sure the data is accurate, but please don't hesitate to contact me with any corrections.

In this first post, I’m just going over the basics.  In subsequent posts, I want to dive into error detection and handling.  This is one of the key areas VirtualWisdom gives you great insight into, and understanding the different error types and where they come from will help you make sense of  the data.

Fibre Channel Topology

Three basic types:

  • Fabric (Switched)
  • Point-to-point
  • Arbitrated Loop


Port Types

  • F-Port is a fabric port.  This port is on the switch side.
  • N-Port is a node port.  This is on the initiator or target side.
  • NL-Port is a node loop port.  This refers to a node when there are multiple nodes in an arbitrated loop.
  • FL-Port is a fabric loop port.  The switch side of an NL-Port connection.
  • Fx-Port refers to either an F-Port or an FL-Port.
  • Nx-Port refers to either an N-Port or an NL-Port.
  • E-Port refers to a port on either side of a switch ISL (inter-switch link).

FC Layers


FC-0 – The Physical Layer

Defines the physical link, including the fiber, connectors and optical and electrical parameters for data rates.  This layer is responsible for actually sending the bits of data across the link.

FC-1 – The Encode/Decode Layer

This layer takes character data and encodes it into link pulses.  It encodes data 8 bits at a time into 10-bit transmission characters (8b/10b encoding).  This layer prepares the stream of bits for the FC-0 layer to send.
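To put that overhead into numbers: 4 Gb/s FC actually signals at about 4.25 Gbaud, and because 8b/10b means only 8 of every 10 bits on the wire are data, the usable rate works out to roughly 4.25 x 0.8 ≈ 3.4 Gb/s, which is where the commonly quoted figure of about 400 MB/s per direction for 4GFC comes from.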

FC-2 – The Framing Layer

Serves as the transport mechanism of FC.  Orders the data into frames, and defines the service class to be used.  Manages the sequence of data transfers.

Responsible for breaking down data into frames.

Building blocks:

  • Ordered set
  • Frame
  • Sequence
  • Exchange
  • Protocol

Ordered Set

4-byte transmission words

  • Contains data and special characters that have specific meaning
  • Three major types of Ordered Sets are defined by the signaling protocol
    • Frame Delimiters
      • Start-of-Frame (SOF)
      • End-of-Frame (EOF)
    • Primitive Signals
      • Receiver Ready (R_RDY)
        • Indicates a port buffer can handle more frames
      • Idle
        • Transmitted to indicate a port is ready for Tx and Rx
    • Primitive Sequence
      • Transmitted continuously to indicate a certain status or condition of a port
      • Will be recognized when 3 consecutive sets of the same Primitive Sequence are received
      • Offline (OLS), Not Operational (NOS), Link Reset (LR) and Link Reset Response (LRR) are the supported Primitive Sequences

Frame


  • Frames are basic building blocks of an FC connection
  • Contains information:
    • Data to be transmitted (payload)
    • Source and destination ports
    • Link control information
  • Two basic types
    • Data Frames
      • Link_Data Frames
      • Device_Data Frames
    • Link Control Frames
      • Acknowledge (ACK)
      • Link_Response (Busy, Reject)

Sequence

  •  Is a set of one or more frames transmitted unidirectionally from one N-Port to another N-Port
  • Each frame in the sequence has a sequence count number
  • Error recovery is usually performed at the boundaries of sequences


Exchange

  •  Is one or more nonconcurrent sequences in a single operation.
  • Can be bidirectional
  • Think of it as a “Conversation”
  • In one exchange, only one sequence can be active at a time
  • Sequences of multiple exchanges can be concurrently active

Flow Control

  •  Depends on the class of service
  • Class 1 uses end-to-end flow control
    • Credits are tracked only at the start and end points of the transmission, not at the hops in between
  • Class 3 uses buffer-to-buffer flow control
    • Credits are tracked at each point of communication between the source and destination
  • Class 2 uses both
  • Managed by Sequence Initiator (Source) and Sequence Recipient (Destination) ports using two counters:
    • Credit
      • Number of credits issued by the recipient to the transmitter
    • Credit_CNT
      • Number of frames that have not been acknowledged by a R_RDY from the recipient
  • If there are no buffer credits available for incoming data, a “Busy” frame is sent back to the transmitter
  • If a frame error is received, a “Reject” frame is sent back
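To get a feel for the buffer-to-buffer numbers: a full-size frame (around 2 KB) takes roughly 10 microseconds to serialize at 2 Gb/s, and light needs about 5 microseconds per km in fiber, so the round trip over 1 km is about 10 microseconds – one frame time.  That is where the common rule of thumb of roughly one BB credit per km at 2 Gb/s comes from (scaling with link speed, so about four per km at 8 Gb/s); with fewer credits than that, the link sits idle waiting for R_RDYs.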

Service Classes

  • Class 1 is point-to-point (a dedicated connection)
    • Sends an ACK for every frame sent
    • Frame delivery is in-order
    • Locks the entire path between the sender and receiver; bandwidth is not shared
    • Good for sustained, high-bandwidth operations
  • Class 2 is frame-switched and multiplexed
    • Sends an ACK for every frame sent
    • In-order frame delivery is not guaranteed
    • Connectionless
    • Good when the time needed to set up a connection is longer than it takes to send the message
  • Class 3 is identical to class 2, but no ACK is sent on frame delivery
    • Ideal for real-time communication.  Data received too late is not of value anymore.
    • Frame delivery is not guaranteed

FC-3 – Common Services Layer

  • Striping – allows you to multiply bandwidth by striping data over multiple N_Ports in parallel
  • Hunt Groups – allow more than one N_Port to respond to a request using an alias address.  Reduces the chance of hitting a busy port.
  • Multicast – Allows broadcast of single signal to multiple destination ports.

FC-4 – Application Layer

  • Defines the interfaces that can transmit over FC
  • Supports both network and channel protocols
  • Supports:
    • SCSI
    • IPI – Intelligent Peripheral Interface
    • HIPPI – High Performance Parallel Interface
    • IP
    • AAL5
    • FC-LE – Link Encapsulation
    • SBCCS
    • IEEE 802.2

References

http://www.virtualinstruments.com/sites/default/files/documents/wp-top-10-brocade-problems_0.pdf

http://jeff.nieusma.com/docs/fibre_channel.html

http://hsi.web.cern.ch/HSI/fcs/spec/overview.htm

http://www.rfc-editor.org/rfc/rfc4983.txt

http://www.t11.org/ftp/t11/pub/fc/fs-4/13-113v0.pdf