
Next Generation Business Intelligence: Data Mining

By Dan Sullivan

Tools and techniques to extract even more information from your data.

Ad hoc query reporting and OLAP analysis are important elements of business intelligence (BI), but additional tools are making their way into BI infrastructure. In Part 1 of this series on next-generation business intelligence tools, we covered statistical analysis packages; in this article we turn our attention to data mining tools and techniques.

Business should have a corollary to the adage “those who cannot remember the past are condemned to repeat it.” In business, problems arise when we keep repeating past practices without understanding their effects. For example, assume you run a marketing campaign with advertising across multiple channels. You get fairly decent results, so you repeat the same activities in the next campaign.

You could keep doing this ad nauseam without improving the results. Unless all channels are equally effective, you could improve performance by analyzing sales by channel and investing more in the better-performing channels.
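The per-channel comparison described above can be sketched in a few lines of Python. This is only an illustration: the channel names, spend figures, and sales figures are invented, and the function names `sales_per_dollar_by_channel` and `rank_channels` are hypothetical helpers, not part of any BI product.

```python
from collections import defaultdict

# Hypothetical campaign records: (channel, ad spend, attributed sales).
# All figures are invented for illustration.
campaign_results = [
    ("email", 2000.0, 9000.0),
    ("search", 5000.0, 15000.0),
    ("print", 4000.0, 4400.0),
    ("email", 1000.0, 5100.0),
    ("print", 1000.0, 1100.0),
]


def sales_per_dollar_by_channel(records):
    """Return {channel: total sales / total spend} for each channel."""
    spend = defaultdict(float)
    sales = defaultdict(float)
    for channel, cost, revenue in records:
        spend[channel] += cost
        sales[channel] += revenue
    # Skip channels with no spend to avoid dividing by zero.
    return {ch: sales[ch] / spend[ch] for ch in spend if spend[ch] > 0}


def rank_channels(records):
    """Return (channel, sales per dollar) pairs, best channel first."""
    ratios = sales_per_dollar_by_channel(records)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for channel, ratio in rank_channels(campaign_results):
        print(f"{channel}: {ratio:.2f} in sales per dollar spent")
```

With this made-up data, email ranks ahead of search and print, which is the kind of signal that would justify shifting budget between channels.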

This is a simple case that can be readily analyzed with basic ad hoc query tools. If you want to expand the scope of your analysis from a single attribute, such as advertising channel, to thousands of different products, customer demographics and operational characteristics, you will need statistical analysis and data mining tools.

Data mining encompasses a number of broad methods, including classification, clustering, and market basket analysis.

Classification algorithms are used to predict how a person, object, transaction or event should be categorized. You might want to sort current customers into three categories: loyal, possibly leaving, and likely to leave. With data about customer characteristics and purchasing patterns, and a sufficient number of examples of each type of customer, you can use classification models to predict which category each current customer belongs in.
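As a rough illustration of the idea, the sketch below uses a k-nearest-neighbors vote, one of many possible classification techniques and not necessarily what any particular product uses. The two features (months since last purchase, purchases in the last year), the labeled examples, and the `classify` function are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical labeled examples:
# (months since last purchase, purchases in the last year) -> category.
labeled_customers = [
    ((1, 24), "loyal"),
    ((2, 18), "loyal"),
    ((1, 20), "loyal"),
    ((5, 8), "possibly leaving"),
    ((6, 6), "possibly leaving"),
    ((4, 9), "possibly leaving"),
    ((11, 1), "likely to leave"),
    ((10, 2), "likely to leave"),
    ((12, 0), "likely to leave"),
]


def classify(customer, examples, k=3):
    """Label a customer by majority vote among the k closest examples."""
    # Euclidean distance on raw features; a real model would normally
    # scale the features first so that no single one dominates.
    nearest = sorted(examples, key=lambda ex: math.dist(customer, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for customer in [(2, 21), (5, 7), (11, 2)]:
        print(customer, "->", classify(customer, labeled_customers))
```

The predefined categories come from the labeled examples, which is what distinguishes classification from the exploratory methods discussed next.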

Clustering techniques also group objects together but, unlike classification algorithms, they do not use a predefined set of categories. Clustering is useful for exploring your data and finding groupings implied by the data itself. Consider the sales transactions of a consumer electronics store. If you clustered customers by what they buy, several groups might emerge: customers who buy primarily home appliances and videos; customers who buy video games; customers who buy video games, videos, and computer accessories; and so on.

Again, this is a simple example. As you increase the number of customer characteristics, such as household income, family status, and location, and add details about sales transactions, it becomes increasingly difficult to detect groups without the aid of a clustering tool.
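To make the electronics store example concrete, here is a minimal k-means sketch in plain Python. K-means is just one common clustering technique, chosen here for brevity. The purchase counts are invented, the number of clusters is assumed to be three, and the function names are hypothetical.

```python
import math

# Hypothetical purchase counts per customer:
# (home appliances, videos, video games, computer accessories).
purchases = [
    (5, 4, 0, 0),
    (6, 5, 0, 1),
    (0, 0, 7, 0),
    (0, 1, 8, 0),
    (0, 4, 6, 5),
    (1, 5, 5, 6),
]


def mean(points):
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=20):
    """Return (cluster index per point, centroids)."""
    # Farthest-point initialization keeps this sketch deterministic;
    # library implementations usually use randomized starts instead.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


if __name__ == "__main__":
    labels, centroids = k_means(purchases, k=3)
    for customer, label in zip(purchases, labels):
        print(customer, "-> cluster", label)
```

No category names are supplied up front; the groups fall out of the purchase patterns, and it is left to the analyst to interpret what each cluster represents.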

Market basket analysis, also known as association analysis, is useful for finding groups of items that commonly appear together in the data. A supermarket manager would not be surprised to find bread, peanut butter and jam frequently purchased together in US stores. Less obvious groupings can be found using market basket analysis techniques. These can support cross-selling efforts, loss-leader campaigns, or product placement within a physical store.
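A simplified version of this analysis counts how often pairs of items share a basket and derives two standard measures: support (the share of baskets containing both items) and confidence (how often the second item appears given the first). The sketch below assumes invented transactions and a hypothetical `pair_rules` function; production tools typically use more scalable algorithms and handle larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets, invented for illustration.
transactions = [
    {"bread", "peanut butter", "jam"},
    {"bread", "peanut butter"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
    {"bread", "peanut butter", "jam", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence) rules for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with lhs that also contain rhs
    """
    total = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, support, count / item_counts[lhs]))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(transactions):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing peanut butter also contains bread, so that rule has a confidence of 1.0, while rare pairs such as milk and eggs are filtered out by the support threshold.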

Getting started in data mining has never been easier. If you want to analyze your data in-house, you can use one of the popular statistical analysis packages that include data mining functions, such as SAS, IBM SPSS, and R. Relational databases, including Oracle and Microsoft SQL Server, include support for data mining as well. If you prefer a data mining service, you have options there too, including Datameer, Yottamine Analytics and Google's Prediction API.

Dan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience with engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education. Dan has written 16 books and numerous articles and white papers on topics ranging from data warehousing, cloud computing and advanced analytics to security management, collaboration, and text mining.

See here for all of Dan's Tom's IT Pro articles.
