
Big Data Changes Business Intelligence: New Tools And Techniques


As you develop a strategy for big data analysis, consider the broader array of information big data captures and the new tools and techniques now available to business intelligence.

Business intelligence (BI) has long operated with well-established practices and tools, including dimensional modeling, extract, transform and load (ETL), ad hoc reporting and dashboards. These techniques emerged from the need to support management reporting that transaction processing systems did not generally provide. If you needed to compare this quarter's sales by region and product line to the previous quarter's, you were not going to get much help from your sales processing and order fulfillment systems; you needed to build a data warehouse, or at least a data mart.

Today, we still need these same kinds of management reports, but we also have the opportunity to tap into a broader array of information captured in big data. Here are some essential characteristics to keep in mind as you develop a strategy for big data analysis.

Big Data Is Different

Big data is often described in terms of the variety of data, the speed with which it changes and the volume of data.

In many ways, large volumes are the easiest of the three to address. Deploy enough commodity servers and storage along with a distributed file system, like the Hadoop Distributed File System (HDFS), and you can collect petabytes of data. If you do not need to store the data for long, you can take advantage of public clouds such as Amazon Web Services, Microsoft Azure and Rackspace, but watch out for long-term storage costs.
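
Once a cluster is running, getting raw data into HDFS is mostly a matter of copying files and keeping an eye on how much space they consume. The sketch below shows one way to do that from Python by calling the standard hdfs dfs command line tools; the paths and date-based layout are illustrative assumptions, not a prescription.

```python
# A minimal sketch of landing raw log files in HDFS from Python by shelling
# out to the standard "hdfs dfs" CLI. The paths and date-based layout are
# illustrative assumptions, not a prescribed convention.
import subprocess
from datetime import date

def put_into_hdfs(local_file: str, hdfs_dir: str) -> None:
    """Create the target directory (if needed) and copy one file into HDFS."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir], check=True)

if __name__ == "__main__":
    # Partitioning raw data by ingest date keeps later batch jobs simple.
    target = f"/data/raw/weblogs/{date.today():%Y/%m/%d}"
    put_into_hdfs("access.log", target)
    # Check how much space the raw data is consuming.
    subprocess.run(["hdfs", "dfs", "-du", "-s", "-h", "/data/raw/weblogs"], check=True)
```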

If you have rapidly changing data, consider a real-time big data analytics tool like Storm. Twitter developed it and released it as free, open source software. Storm processes data in real time while offering fault tolerance and guaranteed processing of your messages.
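
Storm topologies are most often written in Java, but teams that prefer Python can use a wrapper such as streamparse. As a rough sketch, assuming streamparse and a spout that emits one URL per page view, a bolt that keeps a running count of views per URL might look like this:

```python
# A minimal sketch of a Storm bolt written with the streamparse library
# (an assumption; Storm topologies are more commonly written in Java).
# The bolt keeps a running count of page views per URL as tuples stream in.
from collections import Counter
from streamparse import Bolt

class PageViewCountBolt(Bolt):
    outputs = ["url", "count"]

    def initialize(self, conf, ctx):
        self.counts = Counter()

    def process(self, tup):
        url = tup.values[0]                 # first field of the incoming tuple
        self.counts[url] += 1
        self.emit([url, self.counts[url]])  # downstream bolts see the running total
```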

Variety can be the most challenging aspect of big data. If you are dealing with a large number of different structured data types, such as sensor data or structured machine data, then designing programs to work with the data will be relatively straightforward. If you are working with unstructured data, such as social media content, you have a bigger problem. Natural language texts tend to have domain-specific characteristics. The tools you develop for analyzing customer sentiment in online reviews may not be much help in analyzing financial documents, like a publicly traded company's filings with the Securities and Exchange Commission.

Some techniques, such as recognizing the names of companies and geographic locations, work across many domains. There are, however, other operations required to extract structured data from free-form text that do not work as well across domains.
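
Named entity recognition is one of those portable techniques. A short sketch using the spaCy library (our choice for illustration; no particular tool is prescribed here) pulls company names and locations out of free-form text:

```python
# A short sketch of entity extraction that tends to carry across domains:
# pulling company names and locations out of free-form text with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

text = ("Acme Logistics announced a new distribution hub in Memphis, "
        "while rival Globex Corporation expanded its operations in Rotterdam.")

doc = nlp(text)
for ent in doc.ents:
    if ent.label_ in {"ORG", "GPE"}:   # ORG = organizations, GPE = countries/cities
        print(ent.text, "->", ent.label_)
```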

Analysis Techniques Are Different, Too

Working with big data is different from working with structured data extracted from transaction processing systems. When you start a business intelligence project, you have a clear idea of the types of data you will need to analyze. If you are building a sales support system, you will need data about products, sales transactions, sales personnel and costs. You will probably pull the data from a back office enterprise application, apply some transformations to summarize it and then load it into a highly denormalized database. Users will likely want standard reports that are sufficiently parameterized to let them compare sales by product, salesperson, region and the other dimensions supported by the data warehouse. A few users may be well enough versed in SQL, or comfortable enough with an ad hoc query tool, to build their own reports as well.
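
The quarter-over-quarter comparison described earlier is easy to sketch once the data is in that summarized, denormalized form. The example below uses pandas, with illustrative column names standing in for a real fact table:

```python
# A minimal sketch of the kind of summary a sales data mart serves up:
# quarter-over-quarter sales by region and product line. Column names and
# values are illustrative assumptions about the denormalized fact table.
import pandas as pd

sales = pd.DataFrame({
    "quarter":      ["2014Q1", "2014Q1", "2014Q2", "2014Q2", "2014Q2"],
    "region":       ["East", "West", "East", "West", "West"],
    "product_line": ["Widgets", "Widgets", "Widgets", "Gadgets", "Widgets"],
    "amount":       [120_000, 95_000, 140_000, 60_000, 99_000],
})

# Pivot so each row is a region/product line and each column is a quarter,
# which makes the quarter-over-quarter comparison a simple column difference.
summary = sales.pivot_table(index=["region", "product_line"],
                            columns="quarter", values="amount",
                            aggfunc="sum", fill_value=0)
summary["qoq_change"] = summary["2014Q2"] - summary["2014Q1"]
print(summary)
```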

Analyzing big data follows a different path. With traditional BI systems, we typically start with a driving reporting need. With big data analysis, we can start with a varied mixture of data types that together might give us some insight into an aspect of our business not captured by sales, inventory or human resources systems. Big data sources can include application log files, machine sensor data, and social media. 

What can this data tell us? To answer that question we have to take a more exploratory approach to big data and start with descriptive statistics and basic unsupervised machine learning techniques.

Descriptive statistics are familiar to most BI practitioners: commonly used calculations such as the mean, median and standard deviation that describe a population. You might be able to calculate the average dollar value of a sales transaction from a sales data mart, but you probably cannot find the average time a customer spent in your Web application prior to the sale. Application log data can answer that question. Descriptive statistics are especially useful with big data sets when you partition the data and compare different groups.
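
As a rough illustration, the sketch below derives time-on-site from application log events and then compares purchasers with non-purchasers using ordinary descriptive statistics; the event schema and values are assumptions made up for the example:

```python
# A hedged sketch: derive "time spent before the sale" from application log
# events and summarize it with descriptive statistics, partitioned into
# groups. The event schema and values are assumptions for illustration.
import pandas as pd

events = pd.DataFrame({
    "session_id": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "timestamp":  pd.to_datetime([
        "2014-06-01 10:00", "2014-06-01 10:12", "2014-06-01 10:30",
        "2014-06-01 11:00", "2014-06-01 11:03",
        "2014-06-01 12:00", "2014-06-01 12:40", "2014-06-01 12:55"]),
    "purchased":  [True, True, True, False, False, True, True, True],
})

# Session duration in minutes: last event minus first event per session.
durations = (events.groupby("session_id")["timestamp"]
                   .agg(lambda ts: (ts.max() - ts.min()).total_seconds() / 60))
purchased = events.groupby("session_id")["purchased"].any()

summary = pd.DataFrame({"minutes": durations, "purchased": purchased})
# Mean, median and spread of time on site, split by purchasers vs. non-purchasers.
print(summary.groupby("purchased")["minutes"].describe())
```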

You might have some preconceived notions about how to group your customers. An obvious way is by sales region, but that may not be the most informative way to organize and analyze your data. Clustering is a set of unsupervised machine learning techniques used to identify subsets of data with similar characteristics. Instead of querying your database to group customers by sales region, you could use clustering to let the data dictate the groups. For example, clustering might identify distinct groups of customers based on the time they spend in your Web application and the size of their sales transactions.
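
A minimal clustering sketch along those lines, using scikit-learn's k-means implementation on the two features just mentioned (the feature values are invented for illustration):

```python
# A minimal sketch of letting the data dictate customer groups with k-means
# clustering (scikit-learn), using time on site and transaction amount as
# the two features. The feature values are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: [minutes spent in the web application, sales transaction amount]
customers = np.array([
    [3, 20], [5, 25], [4, 18],        # quick visits, small purchases
    [45, 320], [50, 400], [38, 280],  # long visits, large purchases
    [40, 0], [55, 0], [60, 5],        # long visits, little or no purchase
])

# Scale features so minutes and dollars contribute comparably to distance.
scaled = StandardScaler().fit_transform(customers)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

for row, label in zip(customers, labels):
    print(f"minutes={row[0]:>3}  amount=${row[1]:>3}  cluster={label}")
```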

Once you have identified a group of customers who spend a substantial amount of time on your site but do not make a purchase, you can analyze their navigation patterns. Since they spend significant time on the site, you can reasonably hypothesize that they are interested in buying. Are there patterns common to visitors who do not make a purchase that are not seen among those who do? By exploring big data sets and testing hypotheses like these, you can complement the types of analysis you do with your traditional BI systems.
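
One simple way to test such a hypothesis is to compare how often a given navigation pattern occurs in each group, for example with a chi-squared test of independence; the counts below are placeholders, not real measurements:

```python
# A hedged sketch of testing whether a navigation pattern (say, visiting the
# shipping-cost page) occurs at a different rate among visitors who buy and
# those who do not. The counts are illustrative placeholders.
from scipy.stats import chi2_contingency

#                 saw shipping page   did not
contingency = [[420,                 1580],   # purchasers
               [910,                 1090]]   # non-purchasers

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi-squared = {chi2:.1f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The pattern's frequency differs between the two groups.")
```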

Ignore Doomsayers: BI Isn't Dead

One final note: do not believe headlines proclaiming that traditional BI is dead. Management reporting based on data from transaction processing systems will be useful as long as business fundamentals remain the same. Big data introduces new ways of understanding business operations that complement, not replace, existing management reporting systems.

MORE: Business Intelligence: Tools, Trends And Buying Guides
MORE: Big Data Analytics: Tools, Trends And Buying Guides
MORE: Best Big Data Certifications