Integrating Data with SQL Server Integration Services

Integrating Data with SQL Server Integration Services
By

Data integration a time consuming challenge for data managers. Data integration a time consuming challenge for data managers. One of the most time consuming challenges facing data managers is data integration. Developers and DBAs can take advantage of SQL Server Integration Services (SSIS) to streamline extraction, transformation and load operations while improving data quality. 

The first step to using SSIS in your data integration operations is to understand how SSIS addresses some of the challenges faced by developers integrating data from multiple sources.

Common data integration challenges include determining how to link logically related data that originates in multiple data sources, detecting and resolving data quality issues, identifying data that has changed since the last data integration operation, and troubleshooting workflows with potentially complex control flows.SQL Server Integration Services includes features to address these tasks and more.

Complex Data Transformations

When the data you are interested in is stored in a single database there will likely be logical constraints that enforce data integrity. For example, rows of related data in different tables can be linked, or joined in database parlance, by a shared key; constraints ensure that those keys are consistently applied.  In cases where you have to integrate data from multiple sources there is no guarantee that you will have consistently used keys to join. In these cases you have to use other pieces of data to determine how data is related.

You might, for example, match a customer database with a demographic data set by matching on address. Such operations can require pre-processing (e.g. standardizing address format).  With SQL Server Integration Services developers can use Microsoft Visual C# and Visual Basic.NET  to create modules that implement the preprocessing and matching logic. Extraction, transformation, and load (ETL) code can be stored in a catalog of projects and packages and this can help streamline reuse of existing packages.

Data Quality

Even though database designers and developers can take steps to ensure high quality data, problems can and do arise.  Users may not know the proper value to use in a data entry form, an application bug may not export data correctly, or legacy systems may not have sufficient data quality enforcement to meet your requirements.These kinds of data quality problems are common.

SSIS Data Quality Services (DQS) provide features to detect and correct data quality problems.

DQS includes an ETL component called the DQS Cleansing Transform which you can incorporate into your ETL workflows.  This transform uses a data quality knowledge base for validating data.A DQS knowledge base contains data and rules for identifying data quality problems which are organized into domains. A domain can include valid values, rules and relations, and third party reference data.  Domains are used to specify valid values for types of data, such as addresses and company names.

When you deploy the DQS transform you apply a domain to an input value, such as customer address.  The transform can generate suggestions and corrections or flag some data as invalid.  There may be cases where you need to track changes for audit purposes.  In those cases, DQS transform can generate reasons for changes, e.g.  applied formatting rule, applied data cleansing algorithm, rounded decimal value to integer value.

Dann SullivanDann SullivanDan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience with engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education.  Dan has written 16 books and numerous articles and white papers about topics ranging from data warehousing, Cloud Computing and advanced analytics to security management, collaboration, and text mining.

See here for all of Dan's Tom's IT Pro articles.

(Shutterstock image credit: Digital Numbers)

Take your big ideas off the back burner with Converged Infrastructure

Check Out These IT Videos

VIDEO: Big Data, Big Hardware, Big Software

VIDEO: The Cloud Is Inevitable - Get Training Now!

VIDEO: Cloud Services: No Buzz, No Bull

Video: The Business Case for Cloud Computing

Video: Cloud Contracts - What's In Them

Comments