A Data Pipeline is a set of actions that collects streams of data from different channels (appropriately filtered) and directs them to a single collection point (a repository), where the data can be stored and analyzed.
Data pipelines eliminate many inherently error-prone manual processes by automating the extraction of data from source systems, its transformation, and its validation before upload to the target repository.
Types and operation
When we talk about Data Pipelines, one of the first concepts to come up is the:
- ETL pipeline (Extract, Transform, Load). This type of pipeline uses batch processing: data is extracted periodically, transformed, and loaded into a target repository.
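The batch ETL flow above can be sketched in a few lines. The source data, the validation rule, and the in-memory repository here are illustrative stand-ins, not a real connector:

```python
# Minimal batch ETL sketch: extract -> transform/validate -> load.
# Source, validation rule, and repository are hypothetical stand-ins.

def extract():
    # Periodic pull from a source system (here: hard-coded sample records)
    return [{"id": 1, "amount": "10.5"},
            {"id": 2, "amount": "bad"},
            {"id": 3, "amount": "7.0"}]

def transform(rows):
    # Clean and validate BEFORE loading; invalid records are dropped,
    # removing an error-prone manual cleanup step
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError:
            continue  # validation failed: skip the record
    return cleaned

def load(rows, repository):
    # Upload only validated data to the target repository
    repository.extend(rows)

repository = []
load(transform(extract()), repository)
print(repository)  # only the 2 valid records reach the target
```

The key property of ETL is that transformation and validation happen *before* the data reaches the repository, so the target only ever holds clean records.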
For organizations that need to manage large volumes of data, however, the option to follow is the:
- ELT pipeline (Extract, Load, Transform). Data is moved in real time from the source systems to the destination repository and transformed there. This allows users to analyze data and create reports without waiting for the IT department to extract it for them.
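The ELT pattern can be illustrated the same way: raw data lands in the destination first, and the transformation runs inside it on demand. This sketch uses an in-memory SQLite database as a stand-in for the destination repository; the table, view, and filter are hypothetical:

```python
import sqlite3

# Minimal ELT sketch: load raw data first, transform inside the destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (id INTEGER, amount TEXT)")

# Load: raw records land in the repository as-is, with no upfront transformation
conn.executemany("INSERT INTO raw_events VALUES (?, ?)",
                 [(1, "10.5"), (2, "bad"), (3, "7.0")])

# Transform: analysts clean and query inside the destination on demand,
# without waiting for an upstream transformation job
conn.execute("""
    CREATE VIEW clean_events AS
    SELECT id, CAST(amount AS REAL) AS amount
    FROM raw_events
    WHERE amount GLOB '[0-9]*'
""")
rows = conn.execute("SELECT id, amount FROM clean_events").fetchall()
print(rows)
```

Because the raw records are already in the repository, new transformations (a different view, a different filter) can be added later without re-extracting anything from the source systems.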
Data Pipeline: benefits
- Reduced risk of human error when processing complex data
- Constant mapping of data flows, which guarantees data quality
- Reduced costs and more reliable processes
- Data accessibility and rapid consultation
- Real-time data
- Quick response to changes