Nexenta Build

*please note I started this post a while ago, so most of it refers to work from last summer

From my first experience of ZFS in FreeBSD 7 it was clear this was a game-changing approach to storage. Its mix of features, ease of use, scalability and robustness makes it an ideal platform for enterprise storage, and the best part is it’s open source!

When we were looking at options for a NAS/SAN on a limited budget (when aren’t they) that offered us something we could grow with, the features of OpenSolaris made a lot of sense. It would allow us to continue to use commodity hardware of our choice (within some limits) and let us customise the system for our needs. But we also wanted support and a real company behind the product: enter Nexenta.

I’m a big fan of choice, and when it comes to the world of storage the big boys still dominate. The “Big 5” primarily sell storage systems built on their own proprietary combination of hardware and software, which results in high prices, lock-in and a lack of flexibility. Increasingly, new OpenStorage companies have begun to emerge to take on the big boys and inject some more competition into the market.

Based on the solid core of OpenSolaris, with Debian’s APT package manager, Nexenta has built a storage-focused OS distribution which offers ZFS, COMSTAR and kernel-based CIFS/NFS, all wrapped in a web-based GUI (don’t worry, there is a CLI!).

Below I’ll talk through how we got on using Nexenta. I hope someone will find some of the tips and information useful if they are setting out on a similar journey.

Initial Kit

Head-Node

  • HP Proliant DL360G7
  • Intel Xeon E5649 2.53GHz
  • 48GB RAM
  • 2x 146GB syspool
  • LSI 9200-8e
  • Intel CX4 10GbE NIC

JBOD

  • DataON DNS1600 (Dual Controllers)
  • 8x 1TB Toshiba 6G SAS (Data)
  • 2x ZeusRAM 8GB 6G SAS (Zil)
  • 1x Talos C 230GB SSD (L2Arc)

Network

  • 2x HP Procurve 2910al-24G
  • 2x HP dual-port 10GbE CX4 al Module

Configuration

The pool configuration is 4 mirrored vdevs with a mirrored ZIL and a single L2ARC. We opted for mirrored vdevs rather than raidzX for performance, as our use-case was a write-heavy workload. Mirrored vdevs also make rebuild times more predictable and recovery easier should the worst happen.

We accelerated our pool using a mirrored pair of ZeusRAM 8GB drives. These ultra-fast RAM drives have large supercapacitors to protect data in case of power failure and suffer no write performance degradation over time like some SSDs can. We also added a 230GB OCZ Talos as an L2ARC (read cache).
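For reference, the shape of the pool can be expressed as a single zpool create. The sketch below uses made-up device names rather than our real IDs, and in practice we built the volume through NMC rather than running this by hand.

# Illustrative only: 4 mirrored data vdevs, a mirrored log (ZIL) and a single cache (L2ARC) device
zpool create tank \
  mirror c1t0d0 c1t1d0 \
  mirror c1t2d0 c1t3d0 \
  mirror c1t4d0 c1t5d0 \
  mirror c1t6d0 c1t7d0 \
  log mirror c2t0d0 c2t1d0 \
  cache c3t0d0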

This is the pool layout.

nmc@nexenta:/$ zpool status
pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t50000393B8C93020d0  ONLINE       0     0     0
            c0t50000393B8C93064d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t50000393B8C930A8d0  ONLINE       0     0     0
            c0t50000393B8C930ACd0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c0t50000393B8C930DCd0  ONLINE       0     0     0
            c0t50000393B8C930E0d0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c0t50000393B8C93104d0  ONLINE       0     0     0
            c0t50000393E8CAF744d0  ONLINE       0     0     0
        logs
          mirror-4                 ONLINE       0     0     0
            c0t5000A72030044473d0  ONLINE       0     0     0
            c0t5000A72030044478d0  ONLINE       0     0     0
        cache
          c0t5E83A970000020F1d0    ONLINE       0     0     0

errors: No known data errors

We created some thinly provisioned zvols which we made available to our hypervisors over iSCSI with COMSTAR. To take advantage of the MPIO support in COMSTAR and add extra resiliency, we used two HP 2910al-24G switches and created two completely separate paths from the hypervisors to the storage. In addition we configured our JBOD to be active/active by connecting each port of the LSI 9200-8e to a SAS port on each controller (see the tips on checking MPxIO is working).
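If you’re curious what that looks like under the hood, the rough COMSTAR steps from expert mode are along these lines; the zvol name, size and GUID placeholder are illustrative, and NMC wraps all of this for you.

# Create a sparse (thinly provisioned) zvol to export (name and size illustrative)
zfs create -s -V 500G tank/vm-store01

# Register the zvol with COMSTAR as a logical unit, then expose it
# (add-view with just the GUID exposes it to all hosts; in production
#  you would restrict access with host and target groups)
stmfadm create-lu /dev/zvol/rdsk/tank/vm-store01
stmfadm add-view <lu-guid-reported-by-create-lu>

# Make sure an iSCSI target exists for the initiators to log in to
itadm create-target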

Thoughts

Since the system went live we’ve had a few blips; most of the solutions to those are covered in the tips section below. The biggest piece of advice I would give anyone looking at a similar build is to find a partner who has experience of deploying such a system. Ours has been invaluable when little problems have occurred.

We have been very pleased with the performance of the system and look forward to having it expand over the coming years.

Tips

A few tips if you’re looking at a Nexenta build:

Use The HCL!

As NexentaStor is based on an older OpenSolaris build (4.0 will be based on Illumos), its driver support can be patchy as Nexenta has to backport drivers, so make sure you stick to kit on the HCL; Intel & LSI are good choices here.

We’ve had issues with our Intel CX4 NICs when turning on flow control. This bug is still being investigated by Nexenta and only happens with the mode set to bi or tx, not rx. (It should be noted this can only be changed from expert mode, so it’s probably debatable whether it’s supported.)

Don’t Upgrade the Community Edition

If you make use of the Community Edition and then choose to go Enterprise, make sure you reinstall from scratch; we had issues with missing features and strange performance problems on an upgraded node.

Configure your network correctly

Make sure you are using decent quality networking kit suitable for iSCSI. Enable jumbo frames, disable spanning tree and enable flow control (the latter is useful if you’re going to be mixing 1GbE with 10GbE).
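On the Nexenta side you can check and set the link properties with dladm from expert mode; ixgbe0 is just an example name for an Intel 10GbE interface, and exactly which flow-control modes work depends on the driver (see the Intel CX4 caveat above).

# Show the current MTU and flow control settings (interface name illustrative)
dladm show-linkprop -p mtu,flowctrl ixgbe0

# Enable jumbo frames and receive-side flow control
# (the link may need to be unplumbed before the MTU change is accepted)
dladm set-linkprop -p mtu=9000 ixgbe0
dladm set-linkprop -p flowctrl=rx ixgbe0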

Check your disks are using MPxIO

Assuming you’re using a JBOD that supports active/active controllers you’ll want to make use of MPxIO. However, drives often do not get automatically picked up as being multipath capable by the scsi_vhci driver. You can check whether it’s enabled by looking under the ‘Attach’ column of the disks view in NMC.

nmc@muninn:/$ show lun disk
LUN ID      Device    Type      Size      Volume     Mounted Attach GUID
c0t5*020d0  sd50      disk      1TB       tank       no      mpxio  50000393b8c93020
c0t5*064d0  sd59      disk      1TB       tank       no      mpxio  50000393b8c93064
c0t5*074d0  sd56      disk      1TB       tank       no      mpxio  50000393b8c93074
c0t5*0A8d0  sd64      disk      1TB       tank       no      mpxio  50000393b8c930a8

If you see mpxio, you’re set. If you see mpt_sas you’ll need to append your disk make and model to the /kernel/drv/scsi_vhci.conf file. This file has a specific format; to make my Toshiba, STEC and OCZ drives work I changed the default:

scsi-vhci-failover-override =
"3PARdataVV", "f_sym",
"COMPELNTCompellent Vol", "f_sym";

becomes

scsi-vhci-failover-override =
"3PARdataVV", "f_sym",
"COMPELNTCompellent Vol", "f_sym",
"OCZ     TALOS", "f_sym",
"STEC    ZeusRAM", "f_sym",
"TOSHIBA MK1001TRKB", "f_sym";

It’s important to note that if the manufacturer name is less than 8 characters you need to pad it with spaces before the model. Once updated you need to reboot for the changes to take effect. You can find your vendor and model by looking at the output of format or iostat -E.
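For example, something like this pulls the vendor and product strings out of the per-device error statistics, and after the reboot you can confirm the paths from expert mode; both are standard Solaris tools rather than anything Nexenta specific.

# The Vendor/Product strings needed for scsi_vhci.conf appear in the error statistics
iostat -En | grep -i vendor

# After the reboot, confirm each disk is claimed by scsi_vhci and shows both paths
mpathadm list lu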

Hosts file

We had some issues with the management interface appearing to lock up; these turned out to be due to stale information in the /etc/hosts file. Once we updated this to reflect the real primary interface IP, the NMC started working again!
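In other words, the hostname should resolve to the address actually configured on the primary interface; the name and address below are made up for illustration.

# /etc/hosts (illustrative entries)
127.0.0.1       localhost
192.168.10.20   nexenta01 nexenta01.example.local loghost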

XenServer Active/Active MPIO

We use XenServer as our primary hypervisor and found it works really well with NexentaStor when using iSCSI and MPIO. However, to get it working in active/active mode you have to tweak the /etc/multipath.conf file as follows:

...
device {
        vendor                  "NEXENTA"
        product                 "(COMSTAR|NEXENTASTOR)"
        path_grouping_policy    group_by_prio
        failback                immediate
        no_path_retry           queue
}
...
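After editing the file, restart multipathd in dom0 and check that every LUN shows both paths; multipath -ll is standard device-mapper tooling, though the exact commands and output will depend on your XenServer version.

# Pick up the new multipath configuration and list the paths for each LUN
service multipathd restart
multipath -ll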
