Facebook recently released the third generation of its internally developed, open-source caching system, Flashcache. The updated Flashcache version 3.0 reportedly makes better decisions about what data to cache, reducing wear on the thousands of SSDs used to store frequently consulted Facebook data. The social network claims the new version increases the average hit rate from 60 percent to 80 percent and cuts the disk operation rate by nearly 50 percent.
"Facebook first started using Flashcache in 2010. At the time, we were trying to find a way around having to choose between a SAS or SATA disk-based solution and a purely flash-based solution," explains Facebook's Domas Mituzas. "None of the options were ideal: The SAS and SATA options in 2010 were slow (SATA) or required many disks (SAS), and the cost per gigabyte for flash in 2010 was high."
Mituzas said one solution would have been to split the databases into multiple tiers, but that approach would have added complexity, which was undesirable at Facebook's scale. Instead, the company took a software approach, initially considering adding support for an L2 cache directly into InnoDB. Ultimately, Facebook implemented Flashcache as a Linux kernel device mapper target and deployed it into production at large scale.
For the latest Flashcache release, Mituzas breaks the update down into three areas: read-write distribution, cache eviction and write efficiency. For the last of these, he reports that the team implemented a straightforward dirty data eviction method that doesn't segregate write and read operations. All pages are treated equally in this new method: when the cache wants to reclaim a page, it simply looks at the oldest entries in the LRU. If the oldest entry is dirty, the cache schedules a background eviction of that entry, then reclaims the next clean one and uses it for new data.
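The reclaim behavior described above can be sketched as a toy model. This is an illustrative sketch of the policy as Mituzas describes it, not Flashcache's actual kernel code; the class and method names are invented for the example.

```python
from collections import OrderedDict

class Cache:
    """Toy model of the eviction scheme described above: reads and writes
    share one LRU, and reclaim walks from the oldest entry, scheduling
    dirty pages for background writeback and evicting the first clean
    page it finds."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()   # page -> dirty flag; oldest entry first
        self.writeback_queue = []  # dirty pages queued for background cleaning

    def access(self, page, write=False):
        if page in self.lru:
            self.lru[page] = self.lru[page] or write
            self.lru.move_to_end(page)      # mark most recently used
            return
        if len(self.lru) >= self.capacity:
            self._reclaim()
        self.lru[page] = write

    def _reclaim(self):
        for page, dirty in list(self.lru.items()):  # oldest first
            if dirty:
                # Don't reclaim dirty data synchronously: queue it for
                # background writeback and keep looking for a clean page.
                self.writeback_queue.append(page)
                self.lru[page] = False
            else:
                del self.lru[page]
                return
        # Every page was dirty: all are now queued, so evict the oldest.
        self.lru.popitem(last=False)
```

Because dirty pages are skipped rather than reclaimed in place, a write-heavy workload no longer forces synchronous flushes on the reclaim path.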
On the read-write distribution, Facebook chose to decrease the disk-side associativity size from 2 MB to 256 KB (using RAID stripe sized clustering), change the flash-side associativity size from 2 MB to 16 MB (4,096 pages per set instead of 512), and move to random hashing instead of linear mapping. These changes disperse the "hot data" over more of the cache, he writes. For cache eviction, Mituzas says that Flashcache is running with mid-point insertion (implemented as LRU-2Q) set at the 75th percentile, a conservative setting that allows for 25 percent old pages.
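The effect of hashed versus linear set mapping can be seen in a small toy model. The set count and the use of blake2b as the hash are assumptions for illustration only, not Flashcache's actual geometry or hash function: with linear mapping, an access pattern whose stride lines up with the set count hammers a single set, while hashing the address disperses it.

```python
import hashlib

NUM_SETS = 64  # illustrative figure, not Flashcache's actual set count

def linear_set(chunk):
    # Linear mapping: the set index follows the disk address directly,
    # so strided or clustered hot data piles into a few sets.
    return chunk % NUM_SETS

def hashed_set(chunk):
    # Hashed mapping: blake2b stands in for whatever hash the real
    # implementation uses; nearby or strided addresses scatter widely.
    digest = hashlib.blake2b(chunk.to_bytes(8, "little"),
                             digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SETS

# Hot data touched at a stride equal to the set count:
hot_chunks = range(0, 8 * NUM_SETS, NUM_SETS)
linear_sets = {linear_set(c) for c in hot_chunks}  # collapses to one set
hashed_sets = {hashed_set(c) for c in hot_chunks}  # spread over many sets
```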
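The mid-point insertion policy can also be sketched in miniature. This is a toy 2Q-style model of the idea, not Flashcache's implementation, and the class name and structure are invented for the example: the LRU is split into a "hot" 75 percent and an "old" 25 percent, and new pages must be touched a second time before they can displace proven hot pages.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy sketch of LRU mid-point insertion at the 75th percentile.
    New pages enter the cold (old) segment and are promoted to the hot
    segment only on a second access, so a burst of one-time reads
    cannot flush the proven working set."""

    def __init__(self, capacity, hot_fraction=0.75):
        self.capacity = capacity
        self.hot_cap = int(capacity * hot_fraction)
        self.hot = OrderedDict()   # proven pages, most recent last
        self.cold = OrderedDict()  # probationary and demoted pages

    def access(self, page):
        if page in self.hot:       # repeat hit on a hot page: refresh recency
            self.hot.move_to_end(page)
        elif page in self.cold:    # second touch: promote past the midpoint
            del self.cold[page]
            self._insert_hot(page)
        else:                      # first touch: insert at the midpoint
            if len(self.hot) + len(self.cold) >= self.capacity:
                victim = self.cold or self.hot
                victim.popitem(last=False)  # evict the oldest old page
            self.cold[page] = True

    def _insert_hot(self, page):
        self.hot[page] = True
        if len(self.hot) > self.hot_cap:
            demoted, _ = self.hot.popitem(last=False)
            self.cold[demoted] = True  # oldest hot page falls to the midpoint
```

With `hot_fraction=0.75`, a quarter of the cache is reserved for old and probationary pages, matching the conservative 25-percent setting Mituzas describes.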
"With these three changes implemented, we are now turning our attention to future work," he writes. "We've already spent some time restructuring metadata structures to allow for more efficient data access, but we may still look at some changes to support our next-generation systems built on top of multi-TB cache devices and spanning tens of TB of disk storage. We’re also working on fine-grained locking to support parallel data access by multiple CPU cores."
Mituzas points out that even though the price per gigabyte of flash is coming down, it's still not low enough, which makes capacity planning highly complex. Given the limited write cycles and premium prices of SSDs, Facebook also doesn't want them to wear out early from unnecessary writes. The company believes even small flash devices could be problematic.
"With these improvements, Flashcache has become a building block in the Facebook stack," he writes. "We're already running thousands of servers on the new branch, with much-improved performance over flashcache-1.x."
To read the full blog post on Flashcache 3.0, head here.
Kevin Parrish is a contributing editor and writer for Tom's Hardware, Tom's Games, Tom's Guide and Tom's IT Pro. He's also a graphic artist, CAD operator and network administrator.
See here for all of Kevin's Tom's IT Pro articles.