Installing Lustre with ZFS & RAID-Z2 on Dell PowerEdge servers

by Scott Nolin, Andrew Wagner, John Lalande -- last modified May 29, 2014 10:00 AM

Important Note: The definitive source for Lustre documentation is the Lustre Operations Manual available at https://wiki.hpdd.intel.com/display/PUB/Documentation.

These documents are copied from internal SSEC working documentation that may be useful to some, but we provide no guarantee of accuracy, correctness, or safety. Use at your own risk.

 

This guide applies to systems with JBODs where ZFS manages the disks directly, without a Dell RAID controller in between. This guide is very specific to SSEC; for example, we use Puppet to provide various software packages and configurations. However, it is included here because some of the information may be useful to others.

Outline

  1. Lustre Server Prep Work
    1. OS Installation (RHEL6)
      1. You must use the RHEL/CentOS 6.4 kernel, 2.6.32-358.
      2. Use the "lustre" kickstart option, which installs a 6.4 kernel.
      3. Define the host in Puppet so that it is not a default host. NOTE: We use Puppet at SSEC to distribute various required packages; other environments will vary!
    2. Lustre 2.4 installation 
      1. Puppet Modules Needed
        • zfs-repo
        • lustre-healthcheck
        • ib-mellanox
        • check_mk_agent-ssec
        • puppetConfigFile
        • lustre-shutdown
        • nagios_plugins
        • lustre24-server-zfs
        • selinux-disable
        • collectl-tmpfile  
  Example Puppet class declarations:

    class { 'ganglia-zara':
      gangliagroup => 'storage',
    }

    class { 'zfs-repo':
      stage => 'first',
    }
 
  1. Configure Metadata Controller
    1. Map metadata drives to enclosures (with scripts to help)
      1. For our example MDS system we made aliases for ssd0, ssd1, ssd2, and ssd3.
      2. Put these in /etc/zfs/vdev_id.conf - for example (a fuller sketch follows below):
        1. alias arch03e07s6 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b69199-lun-0
      3. Run udevadm trigger to load the drive aliases.
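      A minimal sketch of what /etc/zfs/vdev_id.conf could look like for the ssd0-ssd3 aliases mentioned above. The by-path addresses here are hypothetical placeholders; find the real ones with ls -l /dev/disk/by-path on your system:

        # /etc/zfs/vdev_id.conf (hypothetical by-path values; substitute your own)
        alias ssd0 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b691a0-lun-0
        alias ssd1 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b691a1-lun-0
        alias ssd2 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b691a2-lun-0
        alias ssd3 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b691a3-lun-0

      After running udevadm trigger, the aliases should appear as symlinks under /dev/disk/by-vdev/.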
    2. On the metadata controller, run mkfs.lustre to create the metadata targets. On our example system (a verification sketch follows these commands):
      1. Use a separate MGS if you will host multiple file systems on the same metadata server.
      2. Separate MGS: mkfs.lustre --mgs --backfstype=zfs lustre-meta/mgs mirror d2 d3 mirror d4 d5
      3. Separate MDT: mkfs.lustre --fsname=arcdata1 --mdt --mgsnode=172.16.23.14@o2ib  --backfstype=zfs lustre-meta/arcdata1-meta
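      A quick sanity check after the mkfs.lustre commands above, assuming the lustre-meta pool name from this example. zpool status should show the mirrored vdevs, and zfs list should show the mgs and MDT datasets:

        zpool status lustre-meta
        zfs list -r lustre-meta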
    3. Create /etc/ldev.conf and add the metadata targets. On our example system, we added:
      1. geoarc-2-15 - MGS             zfs:lustre-meta/mgs
         geoarc-2-15 - arcdata-MDT0000 zfs:lustre-meta/arcdata-meta
    4. Create /etc/modprobe.d/lustre.conf

      1. options lnet networks="o2ib" routes="tcp metadataip@o2ib0 172.16.24.[220-229]@o2ib0"
      2. NOTE: if you do not want routing, or if you are having trouble with the setup, the simpler options lnet networks="o2ib" is fine (a sketch of the full file follows below).
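      For reference, a sketch of the resulting /etc/modprobe.d/lustre.conf. The routed line is the example from above, where metadataip is a placeholder rather than a literal value; use one variant or the other, not both:

        # /etc/modprobe.d/lustre.conf
        # Simple, non-routed setup:
        options lnet networks="o2ib"
        # Routed setup (placeholder gateway from the example above):
        # options lnet networks="o2ib" routes="tcp metadataip@o2ib0 172.16.24.[220-229]@o2ib0"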
    5. Start Lustre. If you have multiple metadata mounts, you can just run service lustre start.
    6. Add the lnet service to chkconfig and ensure it starts on boot. We may want to leave the lustre service off at startup for metadata controllers (see the sketch below).
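    A minimal sketch of the start and chkconfig steps on the metadata controller, following the note above about possibly leaving the lustre service off at boot:

      service lustre start    # mounts the MGS/MDT targets listed in /etc/ldev.conf
      chkconfig lnet on       # bring LNet up at boot
      chkconfig lustre off    # optional: keep Lustre mounts manual on metadata controllers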
  2. Configure OSTs
    1. Map drives to enclosures (with scripts to help! See the sketch below.)
    2. Run udevadm trigger to load drive aliases.
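    A rough sketch of the kind of helper the "scripts to help" refers to; it simply lists the physical SAS paths so they can be turned into the eNNsM aliases (our local enclosure/slot naming convention) in /etc/zfs/vdev_id.conf:

      # List physical SAS paths to map each disk to an enclosure/slot alias
      ls -l /dev/disk/by-path/ | grep -i sas
      # After editing /etc/zfs/vdev_id.conf, reload and confirm the aliases
      udevadm trigger
      ls /dev/disk/by-vdev/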
    3. mkfs.lustre on MD1200s. 
      1. Example RAIDZ2 on one MD1200: mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e17s0 e17s1 e17s2 e17s3 e17s4 e17s5 e17s6 e17s7 e17s8 e17s9 e17s10 e17s11
      2. Example RAIDZ2 with 2 disks from each enclosure, 5 enclosures (our cove test example): mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e13s0 e13s1 e15s0 e15s1 e17s0 e17s1 e19s0 e19s1 e21s0 e21s1
      3.  Repeat as necessary for additional enclosures.
    4. Create /etc/ldev.conf
      1. Example on lustre2-8-11:
        lustre2-8-11 - cove-OST0000     zfs:lustre-ost0/ost0
        lustre2-8-11 - cove-OST0001     zfs:lustre-ost1/ost1
        lustre2-8-11 - cove-OST0002     zfs:lustre-ost2/ost2
    5. Start OSTs. Example: service lustre start. Repeat as necessary for additional enclosures.
    6. Add the services to chkconfig so they start on boot.
  3. Configure backup metadata controller (future)
  4. Mount the Lustre file system on clients
    1. Add entry to /etc/fstab. With our example system, our fstab entry is:
      172.16.24.12@o2ib:/cove         /cove            lustre  defaults,_netdev,user_xattr       0 0
    2. Create an empty directory for the mountpoint and mount the file system (e.g., mkdir /cove; mount /cove); see the sketch below.
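    A short sketch of the client-side steps, using the /cove example above; lfs df is a convenient way to confirm that the MDT and all OSTs are visible:

      mkdir /cove
      mount /cove     # uses the fstab entry above
      lfs df -h       # should list the MDT and each OST with its free space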
 

Helpful links

http://zfsonlinux.org/lustre-configure-single.html
http://www.ufb.rug.nl/ger/docs/lustre-zfs.txt
 
zfs and HA - https://github.com/chaos/lustre/commit/04a38ba7