Beyond Hadoop: Graph Databases for Big Data

By Dan Sullivan

Graph databases are a good solution for dealing with big data. Hadoop and MapReduce are often used to solve big data problems, but there are other data analysis models that lend themselves to big data.

Graph databases, for example, are well suited to problems that can be described as networks, such as social networks, workflows, transportation networks, and communication patterns. You can certainly solve many network problems with MapReduce (think of Google's PageRank algorithm), but it is often easier to solve a problem when the data structure directly represents key aspects of the problem description. Let's first take a look at graph databases in general before turning our attention to specific business applications.
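To make the MapReduce comparison concrete, here is a minimal single-process sketch of PageRank written as repeated map and reduce phases. It is not Hadoop code and not Google's implementation; the function names, the three-page example graph, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration. Note how the graph has to be flattened into (page, share) pairs on every pass, which is the awkwardness a native graph structure avoids.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map (hypothetical helper): each page emits an equal share of its rank
    to every page it links to."""
    for page, links in graph.items():
        if links:  # simplification: pages with no outgoing links emit nothing
            share = ranks[page] / len(links)
            for target in links:
                yield target, share


def reduce_phase(pairs, pages, damping=0.85):
    """Reduce (hypothetical helper): sum incoming shares per page, then apply
    the damping factor."""
    totals = defaultdict(float)
    for page, share in pairs:
        totals[page] += share
    n = len(pages)
    return {p: (1 - damping) / n + damping * totals[p] for p in pages}


def pagerank(graph, iterations=50):
    """Run a fixed number of map/reduce rounds starting from uniform ranks."""
    pages = list(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), pages)
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page web: A links to B and C, B to C, C back to A.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(web).items()):
        print(page, round(rank, 3))
```

In a real Hadoop job each round would be a separate MapReduce pass over data on disk; the in-memory loop here only mirrors the shape of that computation.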

Databases are typically designed around a fundamental data structure. Relational databases, like MySQL, Oracle and SQL Server, are based on tables, which in turn are collections of rows of attributes. Online analytical processing (OLAP) databases are built on multidimensional cubes. CouchDB and MongoDB are document-oriented databases that use JSON-like data structures. Graph databases, like Neo4j, have a simple underlying data structure that consists of two types of objects: nodes (also called vertices) and edges (called relationships in Neo4j). This simplicity lends itself to modeling a wide range of problems.
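The two-object model can be sketched in a few lines. The following is a minimal in-memory illustration, assuming plain string keys for nodes and a text label on each edge; the `Graph` and `Edge` classes and the `LINKS_TO` label are hypothetical and are not Neo4j's API or storage format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A link between two nodes, identified by the node keys it connects."""
    source: str
    target: str
    label: str


@dataclass
class Graph:
    """Minimal graph: a set of node keys plus a list of edges."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_node(self, key):
        self.nodes.add(key)

    def add_edge(self, source, target, label):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, label))

    def neighbors(self, key):
        """Nodes reachable from `key` over one outgoing edge."""
        return [e.target for e in self.edges if e.source == key]


if __name__ == "__main__":
    # Hypothetical Web pages as nodes, hyperlinks as edges.
    g = Graph()
    for page in ("home", "about", "blog"):
        g.add_node(page)
    g.add_edge("home", "about", "LINKS_TO")
    g.add_edge("home", "blog", "LINKS_TO")
    print(g.neighbors("home"))
```

The same two types cover social, transportation and Web models; only the meaning of the keys and labels changes.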

A node is typically used to represent an entity, such as a person in a social network, a location in a transportation network, or a Web page on the Internet. The relationships between nodes are represented by edges, which can be thought of as links between nodes. For example, to model the fact that Alice and Bob work together, we could create two nodes representing the employees and an edge between them indicating their “work together” relationship.
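The Alice and Bob example can be expressed as a small adjacency structure in which each link carries a relationship type. This is a sketch under stated assumptions: the `SocialGraph` class, the `WORKS_WITH` label and the third employee, Carol, are hypothetical, and "work together" is treated as symmetric so the link is stored in both directions. The `second_degree` query shows the kind of "colleague of a colleague" traversal that graph models express naturally.

```python
from collections import defaultdict


class SocialGraph:
    """Undirected graph of people; each edge carries a relationship type."""

    def __init__(self):
        # person -> set of (relationship type, other person)
        self._adj = defaultdict(set)

    def relate(self, a, b, rel):
        # "Work together" is symmetric, so store the edge in both directions.
        self._adj[a].add((rel, b))
        self._adj[b].add((rel, a))

    def related(self, person, rel):
        """People directly linked to `person` by relationship `rel`."""
        return {other for r, other in self._adj[person] if r == rel}

    def second_degree(self, person, rel):
        """People two hops away over `rel`, excluding direct contacts and the person."""
        direct = self.related(person, rel)
        two_hops = set()
        for other in direct:
            two_hops |= self.related(other, rel)
        return two_hops - direct - {person}


if __name__ == "__main__":
    g = SocialGraph()
    g.relate("Alice", "Bob", "WORKS_WITH")
    g.relate("Bob", "Carol", "WORKS_WITH")  # Carol is a hypothetical third employee
    print(g.related("Alice", "WORKS_WITH"))
    print(g.second_degree("Alice", "WORKS_WITH"))
```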

In the case of a transportation model, we could represent two locations as two nodes and the connection between them as an edge. Nodes and edges are useful by themselves, but the graph model becomes even more useful when properties are added to nodes and edges.
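With locations as nodes and connections as edges, a question such as "how do I get from one location to another in the fewest stops?" becomes a breadth-first search. The sketch below assumes two-way connections given as pairs of location names; the `fewest_hops` function and the depot, hub and store names are hypothetical examples, not part of any particular product.

```python
from collections import deque


def fewest_hops(connections, start, goal):
    """Breadth-first search: return a route with the fewest edges, or None."""
    graph = {}
    for a, b in connections:
        # Treat each connection as two-way.
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    previous = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        here = queue.popleft()
        if here == goal:
            # Walk the predecessor chain back to the start.
            route = []
            while here is not None:
                route.append(here)
                here = previous[here]
            return route[::-1]
        for nxt in graph.get(here, []):
            if nxt not in previous:
                previous[nxt] = here
                queue.append(nxt)
    return None


if __name__ == "__main__":
    links = [("Depot", "Hub"), ("Hub", "Store A"),
             ("Hub", "Store B"), ("Store B", "Store C")]
    print(fewest_hops(links, "Depot", "Store C"))
```

Counting hops ignores how long each connection is; that information belongs in edge properties.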

Node properties can describe an entity just as attributes in a relational database table describe an object. The employee nodes described above could have a property indicating the employee type, e.g., manager, analyst, or systems administrator. Edges similarly use properties to describe links. For example, the link between two locations in a transportation network can record the distance between them.
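Properties are what let a graph answer richer questions. The sketch below, assuming properties are stored as plain dictionaries, shows both kinds: a node-property filter on employee type, and Dijkstra's algorithm (a standard shortest-path technique, named here as our choice rather than anything the article prescribes) using a `distance` property on each edge. The employee data, the road network and its distance units are hypothetical, as are the `nodes_with` and `shortest_distance` helpers.

```python
import heapq

# Node properties: key -> dict of attributes (hypothetical example data).
employees = {
    "Alice": {"role": "manager"},
    "Bob": {"role": "analyst"},
    "Dave": {"role": "systems administrator"},
}


def nodes_with(nodes, **props):
    """Return node keys whose properties match every given key/value pair."""
    return sorted(k for k, attrs in nodes.items()
                  if all(attrs.get(p) == v for p, v in props.items()))


def shortest_distance(edges, start, goal):
    """Dijkstra's algorithm over the `distance` property stored on each edge."""
    graph = {}
    for a, b, props in edges:
        # Two-way roads: the same distance applies in both directions.
        graph.setdefault(a, []).append((b, props["distance"]))
        graph.setdefault(b, []).append((a, props["distance"]))
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        dist, here = heapq.heappop(heap)
        if here == goal:
            return dist
        if dist > best.get(here, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for nxt, weight in graph.get(here, []):
            candidate = dist + weight
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return None


if __name__ == "__main__":
    print(nodes_with(employees, role="manager"))
    roads = [
        ("Depot", "Hub", {"distance": 12}),
        ("Hub", "Store", {"distance": 5}),
        ("Depot", "Store", {"distance": 20}),
    ]
    print(shortest_distance(roads, "Depot", "Store"))
```

Here the two-leg route through the hub is preferred over the direct road because its summed `distance` is smaller, something a hop count alone could not reveal.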

Dan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience, with engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education. Dan has written 16 books and numerous articles and white papers on topics ranging from data warehousing, cloud computing and advanced analytics to security management, collaboration, and text mining.

See here for all of Dan's Tom's IT Pro articles.

