As mentioned, our first instinct is to connect all 32 SSDs and let the FSA 200 fly, and we intend to do just that. However, there are architectural limitations that prevent us from utilizing all 32 SSDs simultaneously at their full native performance.
The FSA 200 features four canisters that each house up to 8 SSDs and an onboard PCIe switch. The switch aggregates the 80 available lanes inside the canister (two x16 and six x8 electrical slots) into a single PCIe 3.0 x16 connection, which creates an internal bandwidth restriction. For instance, the eight PCIe 3.0 Intel SSDs in each canister use four lanes apiece and thus would require 32 PCIe lanes to provide unrestricted performance, which is twice what the x16 uplink offers.
The FSA 200 transmits all canister traffic across the PCIe 3.0 x16 connection as the data moves into the rear backplane and then travels into the host server. The hosts only support a maximum of two PCIe 3.0 x16 adapters per node, or 32 lanes of PCIe 3.0 per host.
The appliance can present all four canisters (32 SSDs) to a single host through the magic of PCIe switching, but unfortunately the configuration still only employs a maximum of two PCIe adapters per host (which means the same PCIe 3.0 bandwidth limitations apply). We tested various configurations exhaustively and measured the best performance with 16 SSDs assigned to each host, and present test results accordingly.
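The lane arithmetic behind these restrictions can be sketched in a few lines of Python. This is a back-of-the-envelope illustration rather than a measurement: the function names are hypothetical, and the per-lane figure assumes the standard PCIe 3.0 signaling rate of 8 GT/s with 128b/130b encoding, before any protocol overhead.

```python
# Rough PCIe lane math for the canister and host limits described above.
# Assumption: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding.
GT_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130


def lane_gb_per_s() -> float:
    """Theoretical one-way bandwidth of a single PCIe 3.0 lane in GB/s."""
    return GT_PER_S * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_gb_per_s(lanes: int) -> float:
    """Theoretical one-way bandwidth of a link with the given lane count."""
    return lanes * lane_gb_per_s()


def oversubscription(ssds: int, lanes_per_ssd: int, uplink_lanes: int) -> float:
    """Ratio of lanes the SSDs could use to lanes the uplink provides."""
    return (ssds * lanes_per_ssd) / uplink_lanes


# Slots inside one canister: two x16 plus six x8.
canister_slot_lanes = 2 * 16 + 6 * 8

# Eight x4 SSDs behind one x16 canister uplink.
canister_ratio = oversubscription(ssds=8, lanes_per_ssd=4, uplink_lanes=16)

# Sixteen x4 SSDs behind two x16 host adapters (32 lanes per host).
host_16_ratio = oversubscription(ssds=16, lanes_per_ssd=4, uplink_lanes=32)

# All 32 SSDs presented to a single host with the same two adapters.
host_32_ratio = oversubscription(ssds=32, lanes_per_ssd=4, uplink_lanes=32)

if __name__ == "__main__":
    print(f"canister slot lanes: {canister_slot_lanes}")
    print(f"x16 uplink ceiling:  {link_gb_per_s(16):.2f} GB/s")
    print(f"canister ratio:      {canister_ratio:.1f}:1")
    print(f"host ratio (16 SSD): {host_16_ratio:.1f}:1")
    print(f"host ratio (32 SSD): {host_32_ratio:.1f}:1")
```

Under these assumptions, each canister and each 16-SSD host sits at roughly 2:1 oversubscription, and routing all 32 SSDs to one host doubles that ratio.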
The SSDs appear as local devices in Windows or Linux environments, and after installing the SSD drivers the user assembles the drives into an array with whatever software RAID flavor suits their taste. Software RAID comes with myriad eccentricities of its own that we had to navigate to extract the most performance possible. Software RAID also consumes CPU cycles, so to eliminate any extraneous host CPU overhead we tested a single large volume with the open-source fio load generator in CentOS 7.
We record performance measurements every second, and usually present latency metrics with the same per-second granularity. However, we disabled per-second latency recording due to excessive system time calls (which happens when you push the boundaries). We still recorded average latency measurements and utilized up to 32 threads (workers) to spread the workload across the CPU cores and increase parallelism (where applicable). We stop at 256 OIO (Outstanding I/O) under normal conditions, but for these tests, we pushed up to 512 and 1,024 OIO in a blatant attempt to extract the most performance possible from a single volume.
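As a rough illustration of how a total OIO target can be spread across workers, the Python sketch below splits the outstanding I/O count evenly over up to 32 fio jobs and builds a matching command line. It is not our actual job file: the even split (OIO = jobs x queue depth), the `/dev/md0` device path, the 4k random-read workload, and the 60-second runtime are all assumptions chosen for the example. The fio options used (`--numjobs`, `--iodepth`, `--ioengine`, `--direct`, and so on) are standard ones.

```python
from typing import List, Tuple


def split_oio(total_oio: int, max_workers: int = 32) -> Tuple[int, int]:
    """Split a total outstanding-I/O target into (workers, depth per worker).

    Assumes total OIO = workers * per-worker queue depth.
    """
    workers = min(total_oio, max_workers)
    if total_oio % workers != 0:
        raise ValueError("total OIO must divide evenly across workers")
    return workers, total_oio // workers


def fio_args(total_oio: int,
             device: str = "/dev/md0",   # hypothetical RAID volume
             rw: str = "randread",       # hypothetical workload
             bs: str = "4k",             # hypothetical block size
             runtime_s: int = 60) -> List[str]:
    """Build an fio argument list for one OIO level."""
    workers, depth = split_oio(total_oio)
    return [
        "fio",
        f"--name=oio{total_oio}",
        f"--filename={device}",
        "--ioengine=libaio",      # asynchronous I/O so iodepth takes effect
        "--direct=1",             # bypass the page cache
        f"--rw={rw}",
        f"--bs={bs}",
        f"--numjobs={workers}",
        f"--iodepth={depth}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",      # one aggregate report across all jobs
    ]


if __name__ == "__main__":
    for oio in (256, 512, 1024):
        print(" ".join(fio_args(oio)))
```

With this split, 256, 512, and 1,024 OIO map to 32 workers at queue depths of 8, 16, and 32, respectively. No per-second latency log is requested, so fio reports only aggregate latency at the end of the run.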
We tested with two hosts simultaneously and received nearly identical performance metrics from each. However, accurately combining the two high-granularity datasets is not possible, so we present the results of both 8 and 16 SSDs in RAID, but remind readers that this is only half of the possible performance provided by the FSA 200. Our intent is to test the performance boundaries of the FSA 200, and not the underlying SSDs. We present RAID 0 results, which only the brave would use in a production environment. We experimented with a dizzying number of configurations, but found the best overall performance to be at the rather dull 256-chunk size.
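To show why chunk size matters in RAID 0, the following minimal Python sketch maps a logical byte offset on the striped volume to a member drive and an offset on that drive. It is a generic striping model, not the code of any particular software RAID implementation, and it assumes the 256 figure refers to 256 KiB chunks; the function name is hypothetical.

```python
from typing import Tuple

# Assumption: the "256" chunk size is interpreted as 256 KiB.
CHUNK_BYTES = 256 * 1024


def raid0_locate(offset: int, n_disks: int,
                 chunk_bytes: int = CHUNK_BYTES) -> Tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Chunks are laid out round-robin: chunk 0 on disk 0, chunk 1 on disk 1,
    and so on, wrapping back to disk 0 after the last disk.
    """
    chunk_index = offset // chunk_bytes   # which chunk of the volume
    within_chunk = offset % chunk_bytes   # position inside that chunk
    disk = chunk_index % n_disks          # round-robin disk selection
    stripe = chunk_index // n_disks       # full stripes before this chunk
    return disk, stripe * chunk_bytes + within_chunk


if __name__ == "__main__":
    for logical in (0, CHUNK_BYTES, 16 * CHUNK_BYTES + 5):
        print(logical, raid0_locate(logical, n_disks=16))
```

In this model, small random requests land entirely on one drive, while a sequential transfer has to span 16 chunks before it touches every drive in a 16-SSD array.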