Nexenta Build
*Please note: I started this post a while ago, so most of it refers to work from last summer.
From my first experience of ZFS in FreeBSD 7 it was clear this was a game-changing approach to storage. Its mix of features, ease of use, scalability and robustness makes it an ideal platform for enterprise storage, and the best part is it’s open source!
When we were looking at options for a NAS/SAN on a limited budget (when aren’t they?) that offered us something we could grow with, the features of OpenSolaris made a lot of sense. It would allow us to continue to use commodity hardware of our choice (within some limits) and let us customise the system for our needs. But we also wanted support and a real company behind the product; enter Nexenta.
I’m a big fan of choice, but when it comes to the world of storage the big boys still dominate. The “Big 5” primarily sell storage systems built on their own proprietary combinations of hardware and software, which results in high prices, lock-in and a lack of flexibility. Increasingly, new OpenStorage companies have begun to emerge to take on the big boys and inject some more competition into the market.
Based on the solid core of OpenSolaris combined with Debian’s Apt package manager, Nexenta has built a storage-focused OS distribution offering ZFS, COMSTAR and kernel-based CIFS/NFS, all wrapped in a web-based GUI (don’t worry, there is a CLI!).
Below I’ll talk through how we got on using Nexenta. I hope some of the tips and information will be useful to anyone setting out on a similar journey.
Initial Kit
Head-Node
- HP Proliant DL360G7
- Intel Xeon E5649 2.53GHz
- 48GB RAM
- 2x 146GB syspool
- LSI 9200-8e
- Intel CX4 10GbE NIC
JBOD
- DataON DNS1600 (Dual Controllers)
- 8x 1TB Toshiba 6G SAS (Data)
- 2x ZeusRAM 8GB 6G SAS (Zil)
- 1x Talos C 230GB SSD (L2Arc)
Network
- 2x HP Procurve 2910al-24G
- 2x HP dual-port 10GbE CX4 al Module
Configuration
The pool configuration is four mirrored vdevs with a mirrored ZIL and a single L2ARC. We opted for mirrored vdevs rather than raidzX for performance, as our use case is write-heavy. Mirrored vdevs also make rebuild times more predictable and recovery easier should the worst happen.
We accelerated the pool with a mirrored pair of 8GB ZeusRAM drives. These ultra-fast RAM drives have large supercapacitors to protect data in the event of a power failure, and they suffer none of the write-performance degradation over time that some SSDs do. We also added a 230GB OCZ Talos as an L2ARC (read cache).
This is the pool layout.
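Roughly speaking, it looks like this in zpool status (the pool name and device names are illustrative, not our actual disks):

      pool: tank
     state: ONLINE
    config:

            NAME          STATE     READ WRITE CKSUM
            tank          ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t0d0    ONLINE       0     0     0
                c1t1d0    ONLINE       0     0     0
              mirror-1    ONLINE       0     0     0
                c1t2d0    ONLINE       0     0     0
                c1t3d0    ONLINE       0     0     0
              mirror-2    ONLINE       0     0     0
                c1t4d0    ONLINE       0     0     0
                c1t5d0    ONLINE       0     0     0
              mirror-3    ONLINE       0     0     0
                c1t6d0    ONLINE       0     0     0
                c1t7d0    ONLINE       0     0     0
            logs
              mirror-4    ONLINE       0     0     0
                c1t8d0    ONLINE       0     0     0
                c1t9d0    ONLINE       0     0     0
            cache
              c1t10d0     ONLINE       0     0     0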
We created some thinly provisioned zvols which we made available to our hypervisors over iSCSI with COMSTAR. Taking advantage of the MPIO support in COMSTAR to add extra resiliency, we used two HP 2910al-24G switches and created two completely separate paths from the hypervisors to the storage. In addition, we configured our JBOD to be active/active by connecting each port of the LSI 9200-8e to a SAS port on each controller (see the tips below on checking MPxIO is working).
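The NMV/NMC handles all of this for you, but for reference the underlying steps from a root shell look roughly like this; the pool name, zvol name and size are made-up examples:

    # Create a sparse (thin-provisioned) 500GB zvol on the pool
    zfs create -s -V 500g tank/xen-sr1

    # Register it as a COMSTAR logical unit and make it visible
    # (in production you would scope this with host/target groups)
    stmfadm create-lu /dev/zvol/rdsk/tank/xen-sr1
    stmfadm add-view <GUID printed by create-lu>

    # Create an iSCSI target and make sure the target service is running
    itadm create-target
    svcadm enable -r svc:/network/iscsi/target:default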
Thoughts
Since the system went live we’ve had a few blips; most of the solutions to those are covered in the tips section below. The biggest piece of advice I would give anyone looking at a similar build is to find a partner who has experience of deploying such a system; ours has been invaluable when little problems have occurred.
We have been very pleased with the performance of the system and look forward to having it expand over the coming years.
Tips
A few tips if you’re looking at a Nexenta build:
Use The HCL!
As NexentaStor is based on an older OpenSolaris build (4.0 will be based on Illumos), its driver support can be patchy because Nexenta has to backport drivers, so make sure you stick to kit on the HCL; Intel and LSI are good choices here.
We’ve had issues with our Intel CX4 NICs when turning on flow control. This bug is still being investigated by Nexenta and only happens with the mode set to bi or tx, not rx. (It should be noted this can only be changed from expert mode, so it’s probably debatable whether it’s supported.)
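For reference, from expert mode the setting in question is the flowctrl link property (assuming your build exposes it via dladm); the interface name here is just an example:

    # Show the current flow-control mode (no | rx | tx | bi)
    dladm show-linkprop -p flowctrl ixgbe0

    # rx was the only mode that didn't trigger the bug for us
    dladm set-linkprop -p flowctrl=rx ixgbe0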
Don’t Upgrade the Community Edition
If you make use of the Community Edition and then choose to go Enterprise, make sure you reinstall from scratch; we had issues with missing features and strange performance problems on an upgraded node.
Configure your network correctly
Make sure you are using decent quality networking kit suitable for iSCSI. Enable jumbo frames, disable spanning tree and enable flow control (the latter is particularly useful if you’re going to be mixing 1GbE with 10GbE).
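On the ProCurve 2910al side that boils down to something like the following from the CLI config context (the VLAN number and port range are examples; on ProCurve kit jumbo frames are enabled per VLAN and flow control per port):

    no spanning-tree
    vlan 20 jumbo
    interface 1-24 flow-control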
Check your disks are using MPxIO
Assuming you’re using a JBOD that supports active/active controllers you’ll want to make use of MPxIO. However, drives often do not get automatically picked up as multipath capable by the scsi_vhci driver. You can check whether it’s enabled by looking under the ‘Attach’ column in disks on NMC.
If you see mpxio, you’re set. If you see mpt_sas you’ll need to append your disk make and model to the /kernel/drv/scsi_vhci.conf file. This file has a specific format; to make my Toshiba, STEC and OCZ drives work I changed the default:
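On our build that meant adding a scsi-vhci-failover-override entry along these lines (the product IDs below are examples; use the exact vendor and product strings your drives report):

    # Added to /kernel/drv/scsi_vhci.conf -- the vendor field is a fixed
    # 8-character field, padded with spaces, followed by the product ID.
    scsi-vhci-failover-override =
            "TOSHIBA MK1001TRKB",  "f_sym",
            "STEC    ZeusRAM",     "f_sym",
            "OCZ     TALOS",       "f_sym";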
It’s important to note that if the manufacturer name is less than 8 characters you need to pad it with spaces before the model, as the vendor field is a fixed 8-character field. Once updated you need to reboot for the changes to take effect. You can find your vendor and model by looking at the output of format or iostat -E.
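For example, iostat -E reports the vendor and product for each disk in output like this (the values shown are illustrative):

    sd3       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Vendor: TOSHIBA  Product: MK1001TRKB       Revision: 0105 Serial No: XXXXXXXX
    Size: 1000.20GB <1000204886016 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 0 Predictive Failure Analysis: 0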
Hosts file
We had some issues with the management interface appearing to lock up; these turned out to be due to stale information in the /etc/hosts file. Once we updated this to reflect the real primary interface IP, the NMC started working again!
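For reference, the corrected file ended up looking something like this (the hostname and address are made up):

    ::1           localhost
    127.0.0.1     localhost loghost
    10.10.10.10   san01 san01.example.local   # must match the real primary interface IP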
XenServer Active/Active MPIO
We use XenServer as our primary hypervisor and found it works really well with NexentaStor when using iSCSI and MPIO. However, to get it working in active/active mode you have to tweak the /etc/multipath.conf file as follows:
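The stanza we ended up with looked roughly like the sketch below; check the vendor and product strings your LUs actually report (multipath -ll will show them), as these values are assumptions rather than a definitive reference:

    devices {
        device {
            vendor                "NEXENTA"
            product               "COMSTAR"
            # putting all paths in one group is what makes it active/active
            path_grouping_policy  multibus
            path_selector         "round-robin 0"
            path_checker          tur
            failback              immediate
            rr_min_io             100
        }
    }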