January 2011

Technical Computing

Filesystems and Performance - Part One: I/O Bound Systems.

by Scott Nolin

For science computing systems, system performance is often limited by the filesystem. This month I’ll talk about something that we’ve helped people work through on many systems at SSEC – the problem of the I/O bound system.

You have 100 quick and simple jobs that you run hourly that read some data, process it, and put it somewhere else. They run fine, and all is well. You know these jobs take very little CPU time and you have plenty of CPU power to spare. You want to take advantage of your extra processing power, so you add 1000 jobs that run hourly. Now none of your jobs ever seem to get done! Perhaps you even purchased a new, extremely high-performance computer and moved your jobs to it, but it still does not perform well!

Why might this be? In the case above and many others, it’s possible your system is I/O bound. In TC we often see that, especially if you do a lot of routine processing via cron, getting a new system and adding workload to it, or simply adding work to an existing system, may result in the system becoming I/O bound.

What this means is that your system is spending most of its time waiting for data to be written to or read from the disk drives. This can normally be detected by examining the system with tools such as iostat.
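
As a rough illustration of what such tools measure, here is a minimal Python sketch that estimates the share of CPU time spent waiting on I/O between two samples. It assumes a Linux system, where the first line of /proc/stat holds cumulative CPU tick counters and the fifth counter is iowait. The function names (parse_cpu_line, iowait_percent, sample_iowait) are made up for this example and are not part of iostat or any other tool.

```python
import time


def parse_cpu_line(line):
    """Return the integer tick counters from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Counters are: user nice system idle iowait irq softirq steal [guest ...].
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as stat_file:
        before = stat_file.readline()
    time.sleep(interval)
    with open(path) as stat_file:
        after = stat_file.readline()
    return iowait_percent(before, after)
```

Calling sample_iowait(5) on a Linux host returns the iowait percentage over a five-second window. A value that stays high while the CPUs are otherwise mostly idle is one hint that the system may be I/O bound, though a tool like iostat gives per-device detail that this sketch does not.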

How to deal with this?

First, come talk to us in Technical Computing. We can look carefully at your problem and likely provide some useful details about how to optimize it. Ideally, talk with us before you purchase a new system so we can help you plan ahead.

Some of the fixes that we have helped people implement:

Next month, in Filesystems and Performance - Part Two, I'll talk about filesystem metadata performance problems. In TC we have found some specific and hard-to-resolve problems related to filesystem metadata for some workloads, especially on Linux systems. This topic may not affect as many people as this month's, but the problem is fiendishly troublesome for high performance computing systems.

 
