Lustre ZFS Upgrade Procedure

by Andrew Wagner — last modified Jun 16, 2014 10:52 PM

 

Important Note: The definitive source for Lustre documentation is the Lustre Operations Manual available at https://wiki.hpdd.intel.com/display/PUB/Documentation.

These documents are copied from internal SSEC working documentation that may be useful to some, but we provide no guarantee of accuracy, correctness, or safety. Use at your own risk.

Notes on how to perform a Lustre upgrade for ZFS-based Lustre filesystems.

Upgrade Lustre on ZFS

This details the process for a point-release upgrade of Lustre on ZFS, for example from Lustre 2.3 to 2.4 or from 2.4.0 to 2.4.2.

These steps must be executed on the MDS and OSSs.

Unmount Filesystem from Clients and Unmount Lustre Volumes

Use lshowmount -l on the MDS to see which clients have the Lustre filesystem mounted, then unmount the volume from those clients.
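For example, on each client (the /mnt/lustre mount point here is only an illustration; use the actual mount point at your site):

umount /mnt/lustre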

After clients are unmounted, unmount the MDS followed by the OSSs.

service lustre stop
service lnet stop
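Before moving on, it is worth double checking that no Lustre targets are still mounted on the server; a quick sanity check that should come back empty is:

grep lustre /proc/mounts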

Uninstalling Lustre

The first step in this process is to uninstall the Lustre module from the system. This is done so that the new Lustre module can be built cleanly against the kernel that you will be using.

yum remove lustre lustre-dkms

An existing Lustre filesystem in a ZFS pool is untouched by this process. When the new Lustre is installed and the pool is mounted, the filesystem will automatically be upgraded to the new version of Lustre.

At this point, verify that the horrible Lustre weak-updates are gone from the system. If they are not, you will probably need to delete them to prevent conflicts when the new Lustre module is built.

ls -l /lib/modules/*/weak-updates
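If the listing still shows leftover Lustre module symlinks, they can be removed by hand; for example (the kernel version and module name below are placeholders, delete only what your own listing shows):

rm -f /lib/modules/<kernel-version>/weak-updates/lustre.ko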

Install The New Kernel

Lustre updates usually bring in support for newer kernels. We should use this opportunity to upgrade the kernel on the Lustre servers for performance improvements, bug fixes, and security fixes.

yum update kernel-VERSION kernel-devel-VERSION kernel-firmware-VERSION

You do not want the latest kernel from the CentOS repo; instead, you want the latest kernel supported by the version of Lustre you are installing. You will need to spell out the exact RPM names for the above updates with the correct VERSION.

For example:

yum install kernel*-2.6.32-358.23.2.el6.x86_64
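If you are not sure which kernel builds the repo offers, listing the available versions helps narrow down the right VERSION string:

yum --showduplicates list kernel kernel-devel kernel-firmware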

Now, ensure that the new kernel is selected in /etc/grub.conf and reboot.
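On CentOS 6 the default= line in /etc/grub.conf is a 0-based index into the title entries, so a quick way to check which entry will boot is:

grep -E '^default|^title' /etc/grub.conf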

Update ZFS

After booting into the new kernel, you can update ZFS.

yum update zfs

That should take care of all of the dependencies from the ZFS repo.
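To confirm what was pulled in and whether the spl/zfs modules were rebuilt for the running kernel, a quick check is:

rpm -qa | grep -E 'spl|zfs'
dkms status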

Reinstall Lustre

At this point, we should be ready to build the Lustre module on the new kernel.

yum install lustre lustre-dkms

This step takes a while, so let it chug away.
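When it finishes, a quick sanity check that the Lustre module actually built against the running kernel (the exact versions in the output will differ on your system):

dkms status
modinfo lustre | head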

Reinstall ZFS/SPL Modules

THIS STEP IS IMPORTANT!!! DUE TO DKMS STUPIDITY WE HAVE TO EXECUTE THE FOLLOWING COMMANDS:

dkms remove --all spl
dkms remove --all zfs
dkms install --force spl
dkms install --force zfs

Then reboot.

That is the only way to get the modules to load correctly as of 6/16/2014. Otherwise, the zpool import will not detect the filesystems.
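After the reboot, confirm that the modules loaded and that the pools are visible again; note that zpool import with no arguments only scans for importable pools, it does not import anything:

lsmod | grep -E 'spl|zfs'
zpool import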

Remount Lustre

Ensure that the appropriate configuration files are still there:

/etc/zfs/vdev_id.conf

/etc/ldev.conf

/etc/modprobe.d/lustre.conf

If they are, ensure that the ib0 network connection is up.
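A quick way to confirm the interface is up (ib0 matches the interface named above; adjust if your site uses a different one):

ip addr show ib0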

Start Lustre on the MDS first, followed by the OSSs.

service lnet start
service lustre start

The filesystem should go through recovery. Test remounting on a single client. Monitor /var/log/messages on the MDS and OSSs for errors.
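For the single-client test, the mount looks something like the following, where the MGS NID, filesystem name, and mount point are placeholders for your site's values (the @o2ib network type assumes an InfiniBand setup as above), and tail keeps an eye on the server logs:

mount -t lustre <mgs-nid>@o2ib:/<fsname> /mnt/lustre
tail -f /var/log/messages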