
Next Generation Business Intelligence: Statistical Analysis Tools


Now that you've built a data warehousing infrastructure and created an array of management reports, what will you do next?

If you’ve generated enough standard reports to meet the needs of your management users, and you’ve given advanced users ad hoc query tools for more hands-on work, it may be time to look at another set of tools. Statistical analysis tools such as SAS, IBM SPSS and R can help uncover insights in your data that may not be apparent in typical management reports. (See here for Part Two: Data Mining and Business Intelligence)

Statistical Analysis in BI

Both statistical analysis and business intelligence (BI) reporting tools can quickly help you get a handle on the general characteristics of a data set. For example, it’s a simple matter to find an average or get a sense of the shape of the distribution of a numeric attribute. Many BI reports are designed to summarize and aggregate data in different subgroups. With reports like those, you can readily find the top performing product lines, identify poorly performing sales staff, or get a sense of how much performance varies between regions.

As useful as these types of reports are, they do not show the full range of potentially useful information in a data set. 

Consider the problem of comparing sales between two regions. You would like to know whether one region is performing significantly better than the other, so you run one of your core BI reports to see the total sales and average profit margin in each region for the last quarter. Let’s assume that one region’s profit margin is better than the other’s. It may be that one region is genuinely outperforming the other, or the difference could reasonably be explained by chance. How can we tell whether a difference between two data sets is significant, and therefore useful for making business decisions, or whether it can be explained by chance events and insufficient data? A group of calculations known as statistical tests can help us judge how likely it is that a difference of that size would arise by chance alone.
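To make the idea of a statistical test concrete, here is a minimal sketch of one such test, a two-sided permutation test for a difference in means, written in Python with only the standard library. It is one of many possible tests, not necessarily the one SAS, SPSS or R would apply by default. The function name `permutation_test` and the weekly margin figures are hypothetical and used purely for illustration.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b.

    The observations are repeatedly shuffled between the two groups. The
    p-value is the share of shuffles whose difference in means is at least
    as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        group_a, group_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical weekly profit margins (percent) for two regions.
east = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2]
west = [11.7, 12.0, 11.5, 12.3, 11.9, 11.6, 12.1, 11.8]

p_value = permutation_test(east, west)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value suggests the gap between the regions is unlikely to be a product of chance alone. A large one suggests the report is showing noise. The usual 0.05 cutoff is a convention, not a rule.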

Another type of problem that can be solved with statistical analysis tools is predictive modeling. Imagine you have data on the sales of a particular product and want to predict what the sales for that product will be in the future. In a simple case, you could plot the value of sales over time and estimate a trend line. For example, if sales have been growing at 3% per quarter, you might project that they will continue to grow at the same rate. Things can get complicated pretty quickly, though.
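The simple case can be sketched in a few lines of Python. The example below shows two approaches under stated assumptions: projecting forward at a constant growth rate, and fitting a straight trend line to past quarters by ordinary least squares. The function names `project_compound` and `fit_trend_line` and all of the sales figures are hypothetical.

```python
def project_compound(current_sales, growth_rate, quarters):
    """Project sales forward, assuming the same growth rate every quarter."""
    return [current_sales * (1 + growth_rate) ** q
            for q in range(1, quarters + 1)]


def fit_trend_line(sales):
    """Fit sales = intercept + slope * quarter_index by least squares.

    Quarter indexes start at 0 for the first value in the list.
    """
    n = len(sales)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(sales) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical quarterly sales, in thousands of dollars.
history = [100.0, 104.0, 106.0, 110.0, 113.0]

slope, intercept = fit_trend_line(history)
next_quarter = intercept + slope * len(history)
print(f"Trend line estimate for next quarter: {next_quarter:.1f}")

# Alternative: assume a steady 3% growth rate for the next four quarters.
print([round(v, 1) for v in project_compound(history[-1], 0.03, 4)])
```

Both projections assume the past pattern continues unchanged, which is exactly the assumption that breaks down in the more complicated cases.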

Several factors can complicate simple trend line predictions. The sales of some products may follow a seasonal pattern, in which case you might see a cycle of increases and decreases instead of a straight line increase. In other cases, important variables may be influencing a trend without being apparent in management BI reports. For example, your sales may be growing at 3% per quarter when you look at all sales, but if you break down the data by multiple variables you may find that sales are growing at 10% for 20- to 35-year-old males and at much lower rates for other customer groups.
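A breakdown like that is easy to sketch in Python. The example below compares growth across all sales with growth within each customer segment. The segment labels, the sales figures and the function names `growth_by_segment` and `overall_growth` are hypothetical, chosen so that the overall rate works out to 3% while one segment grows at 10%.

```python
from collections import defaultdict


def growth_by_segment(rows):
    """Return quarter-over-quarter growth for each customer segment.

    rows is a list of (segment, previous_quarter_sales, current_quarter_sales).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for segment, previous, current in rows:
        totals[segment][0] += previous
        totals[segment][1] += current
    return {segment: (current - previous) / previous
            for segment, (previous, current) in totals.items()}


def overall_growth(rows):
    """Return quarter-over-quarter growth across all segments combined."""
    previous = sum(row[1] for row in rows)
    current = sum(row[2] for row in rows)
    return (current - previous) / previous


# Hypothetical sales by segment, in thousands of dollars.
sales = [
    ("males_20_35", 200.0, 220.0),
    ("all_other_customers", 800.0, 810.0),
]

print(f"Overall growth: {overall_growth(sales):.1%}")
for segment, rate in growth_by_segment(sales).items():
    print(f"{segment}: {rate:.1%}")
```

The aggregate figure hides the fact that nearly all of the growth comes from one group, which is the kind of detail a summary report tends to average away.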

Digging deeper into data with statistics can surface additional information that could influence your business decisions. It is important to note that statistical tools are not replacements for other BI reporting tools; they add capabilities to your existing set of reporting tools.

Dan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience, including engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education. Dan has written 16 books and numerous articles and white papers on topics ranging from data warehousing, cloud computing and advanced analytics to security management, collaboration, and text mining.

