Sometimes we can work around the limitations of a data management system at a cost, but sometimes the costs become so high that they warrant considering an alternative data management system. Developers of large-scale Web applications, such as Facebook, can make relational databases work well at large scale, but few organizations have their resources. For others, a NoSQL database might offer the right mix of functionality and maintainability.
Unlike relational databases, NoSQL databases share no single underlying theoretical foundation. You have several data models to choose from, including key-value data stores, graph databases, column stores, and document databases.
Key-value data stores allow developers to store arbitrary types of data under a key, using attributes that are defined as needed. Unlike relational databases, key-value data stores do not require a predefined schema with all attributes declared in advance. This model suits applications that store simply structured data with a potentially large number of frequently changing attributes. Examples of key-value data stores include DynamoDB, Berkeley DB, and Redis. Even on low-end hardware, key-value data stores can achieve read and write rates on the order of 100,000 I/Os per second.
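To make the schema-free idea concrete, here is a minimal sketch of a key-value store in which each key maps to a bag of attributes. The `KeyValueStore` class and its `put`/`get`/`delete` methods are hypothetical and do not reflect the API of DynamoDB, Berkeley DB, or Redis; the point is only that two records can carry different attributes and that a new attribute can be added later without any schema change.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical; not a real product's API).

    Each key maps to a dict of attributes. No schema is declared up front,
    so different records may carry different attributes.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, **attributes: Any) -> None:
        # Merge the given attributes into the record, creating it if absent.
        self._data.setdefault(key, {}).update(attributes)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Return a copy so callers cannot mutate the stored record directly.
        record = self._data.get(key)
        return dict(record) if record is not None else None

    def delete(self, key: str) -> bool:
        # Return True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:1", name="Ada", email="ada@example.com")
store.put("user:2", name="Lin", last_login="2024-01-05", theme="dark")
store.put("user:1", theme="light")  # new attribute added later, no schema change
print(store.get("user:1"))
# {'name': 'Ada', 'email': 'ada@example.com', 'theme': 'light'}
```

Because lookups go straight to a key, a real store can partition and replicate records independently, which is part of why this model scales so readily.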
Graph databases such as Neo4j use nodes and the links between nodes as their basic building blocks. Networks are easily modeled with graph databases, making them suitable for social network analysis, workflow modeling, and other systems of linked or interacting entities. Graph databases make it easy to write queries about paths and relationships while providing read performance comparable to other NoSQL databases. Write performance on graph databases may not reach the same levels as other NoSQL databases, so they might not be the best option for write-intensive applications. InfiniteGraph and Sones are two other NoSQL graph database options.
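The kind of path query described above can be sketched with an adjacency list and a breadth-first search. This is an illustration of the data model only; the `Graph` class and `shortest_path` method are hypothetical and are not how Neo4j or any other graph database exposes or executes queries. Breadth-first search returns a path with the fewest links, which answers questions like "how is this person connected to that one?" in a social network.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class Graph:
    """Toy graph of nodes and undirected links (hypothetical; not a real product's API)."""

    def __init__(self) -> None:
        self._adj: Dict[str, Set[str]] = {}

    def add_link(self, a: str, b: str) -> None:
        # Create both nodes if needed and link them in both directions.
        self._adj.setdefault(a, set()).add(b)
        self._adj.setdefault(b, set()).add(a)

    def shortest_path(self, start: str, goal: str) -> Optional[List[str]]:
        # Breadth-first search: the first time goal is dequeued, the path
        # recorded in `parents` has the fewest possible links.
        if start not in self._adj or goal not in self._adj:
            return None
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[str] = []
                cur: Optional[str] = node
                while cur is not None:  # walk back to start via parents
                    path.append(cur)
                    cur = parents[cur]
                return path[::-1]
            for neighbor in sorted(self._adj[node]):  # sorted for deterministic output
                if neighbor not in parents:
                    parents[neighbor] = node
                    queue.append(neighbor)
        return None  # goal is not reachable from start


g = Graph()
for a, b in [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
             ("alice", "eve"), ("eve", "dave")]:
    g.add_link(a, b)
print(g.shortest_path("alice", "dave"))  # ['alice', 'eve', 'dave']
```

A graph database stores these links natively, so following a relationship is a direct hop rather than a join, which is what makes path and relationship queries cheap to express and run.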
Column store databases such as HBase and Cassandra work well for big data applications that can benefit from MapReduce analysis. HBase’s distributed architecture is designed for applications storing up to billions of rows and millions of columns, and it may be a good option to replace a relational database that cannot support such large data sets. Cassandra uses a column family structure that includes support for column indexes and materialized views. Cassandra is a distributed database that can achieve a linear increase in performance as nodes are added to the cluster. Despite the performance gains that come with additional hardware, column data stores are not known for supporting complex queries or rapid query response times. Riak is often listed alongside these systems, but it is more accurately described as a distributed key-value store.
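The column family structure can be pictured as a nested map: row key, then column family, then column qualifier, then value. The sketch below is loosely modeled on HBase's data model, where column families are declared with the table but individual columns are created at write time and rows are kept in row-key order. The `ColumnFamilyTable` class and its methods are hypothetical and are not the HBase or Cassandra client API. Rows are sparse, so in this sketch a column that a row does not use takes no space at all, which is how a table can nominally have millions of columns.

```python
from typing import Any, Dict, Iterator, Tuple

# row key -> column family -> column qualifier -> value
Row = Dict[str, Dict[str, Any]]


class ColumnFamilyTable:
    """Toy wide-column table (hypothetical; not a real client API)."""

    def __init__(self, families: Tuple[str, ...]) -> None:
        self._families = set(families)  # families are fixed when the table is created
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, family: str, qualifier: str, value: Any) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are created on write; only used cells are stored.
        self._rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Any:
        # Missing rows, families, or columns all read as None.
        return self._rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Row]]:
        # Yield rows with start <= row_key < stop, in sorted row-key order.
        for row_key in sorted(self._rows):
            if start <= row_key < stop:
                yield row_key, self._rows[row_key]


t = ColumnFamilyTable(("profile", "activity"))
t.put("user#001", "profile", "name", "Ada")
t.put("user#002", "profile", "name", "Lin")
t.put("user#002", "activity", "clicks:2024-01-05", 17)
t.put("user#003", "profile", "name", "Sam")
print(t.get("user#002", "activity", "clicks:2024-01-05"))  # 17
print([key for key, _ in t.scan("user#001", "user#003")])  # ['user#001', 'user#002']
```

Range scans over row keys and lookups of known cells are cheap in this layout, whereas an ad hoc query such as "all users with more than 10 clicks" would have to touch every row, which illustrates why column stores are not known for complex queries.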