Amazon Web Services (AWS) has introduced AWS Data Pipeline, a new service that manages the flow of data from disparate sources to the correct destinations. The service scales to any amount of data and controls data flow and processing through data-driven workflows with dependency checking.
Each pipeline is defined in a JSON text file and includes a set of data sources, preconditions, destinations, processing steps, and an operational schedule. Once the pipeline is defined, it can be run on a regular schedule. Amazon will also offer a drag-and-drop UI to schedule and run pipelines.
For example, Amazon said, a customer could "arrange to copy log files from a cluster of Amazon EC2 instances to an S3 bucket every day, and then launch a massively parallel data analysis job on an Elastic MapReduce cluster once a week."
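To make that scenario concrete, here is a rough sketch of what the JSON definition for the daily copy step might look like. AWS has not published a full pipeline schema while the service is in limited beta, so the object types and field names below (Schedule, S3DataNode, CopyActivity, the "ref" syntax, and the example bucket paths) are illustrative assumptions based on Amazon's description, not confirmed syntax.

```json
{
  "objects": [
    {
      "id": "DailySchedule",
      "type": "Schedule",
      "startDateTime": "2012-12-01T00:00:00",
      "period": "1 day"
    },
    {
      "id": "LogSource",
      "type": "S3DataNode",
      "schedule": { "ref": "DailySchedule" },
      "directoryPath": "s3://example-bucket/incoming-logs/"
    },
    {
      "id": "LogArchive",
      "type": "S3DataNode",
      "schedule": { "ref": "DailySchedule" },
      "directoryPath": "s3://example-bucket/archived-logs/"
    },
    {
      "id": "CopyLogs",
      "type": "CopyActivity",
      "schedule": { "ref": "DailySchedule" },
      "input": { "ref": "LogSource" },
      "output": { "ref": "LogArchive" }
    }
  ]
}
```

In this sketch, each object declares an id and a type and references other objects by id, which is presumably how the dependency checking Amazon describes would be expressed in a real definition.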
Optionally, Amazon allows users to set a precondition that must be satisfied before a pipeline can execute; a precondition could be, for example, the presence of an input file. A completed pipeline job results in a message sent via Amazon Simple Notification Service (SNS).
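Extending the hypothetical definition above, a file-exists precondition and a completion notification might be attached to an activity as follows; again, the type names (S3KeyExists, SnsAlarm) and fields (precondition, onSuccess, topicArn) are assumed for illustration rather than taken from published documentation.

```json
{
  "objects": [
    {
      "id": "InputFileReady",
      "type": "S3KeyExists",
      "s3Key": "s3://example-bucket/incoming-logs/_SUCCESS"
    },
    {
      "id": "NotifyOnSuccess",
      "type": "SnsAlarm",
      "topicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-notifications",
      "subject": "Daily log copy completed",
      "message": "The CopyLogs activity finished successfully."
    },
    {
      "id": "CopyLogs",
      "type": "CopyActivity",
      "precondition": { "ref": "InputFileReady" },
      "onSuccess": { "ref": "NotifyOnSuccess" }
    }
  ]
}
```

Under these assumptions, the copy activity would not run until the marker file appears in the bucket, and a message would be published to the SNS topic once the activity finishes.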
Amazon said that AWS Data Pipeline is currently in a limited beta. Users who are interested in participating should contact AWS sales.
Wolfgang Gruener is a contributor to Tom's IT Pro. He is currently principal analyst at Ndicio Research, a market analysis firm that focuses on cloud computing and disruptive technologies, and maintains the conceivablytech.com blog. An 18-year veteran of IT journalism and market research, he previously published TG Daily and was managing editor of Tom's Hardware news, which he grew from a link collection in the early 2000s into one of the most comprehensive and trusted technology news sources.