
Data Pipelines

ETL and ELT

ETL (extract, transform, load) and ELT (extract, load, transform) are two data-processing approaches for analytics. Large organizations have hundreds or even thousands of data sources from all aspects of their operations, such as applications, sensors, IT infrastructure, and third-party partners. They have to filter, sort, and clean this large volume of data to make it useful for analytics and business intelligence. The ETL approach applies a set of business rules to process data from several sources before centralized integration, while the ELT approach loads the raw data into the target system first and transforms it there.

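Both are composed of the same three steps, performed in a different order:

Extract: read raw data from the source systems.
Transform: apply business rules to filter, sort, and clean the data.
Load: write the data into the target database or data warehouse.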

Key Differences

ETL vs ELT
| Aspect | ETL | ELT |
| --- | --- | --- |
| Transform and load location | Data is transformed before loading, on a secondary processing server. The transformation stage ensures compliance with the target database's structural requirements. You only move the data once it is transformed and ready. | Raw data is loaded directly into the target data warehouse, then transformed. You can interact with and transform the raw data as many times as needed. |
| Data compatibility | Primarily structured, tabular data with rows and columns. It transforms one set of structured data into another structured format and then loads it. | Structured and unstructured data (e.g., images, documents). |
| Performance | Slower and harder to scale due to pre-load processing. | Faster; leverages cloud warehouse parallel processing. |
| Cost & complexity | Higher setup and infrastructure costs. | Simpler stack with lower setup and maintenance costs. |
| Security | Requires custom solutions for PII protection. | Built-in warehouse security (access control, MFA, etc.). |
| Use cases | Highly structured, stable reporting; strict data governance or compliance; legacy or on-prem systems; low-volume, predictable pipelines; complex transformations requiring custom logic. | Cloud-native analytics platforms; mixed or evolving data sources; exploratory analytics and data science; near real-time or high-volume data ingestion. |
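
The difference in where the transformation runs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the target warehouse, and the `RAW_ORDERS` records, the `clean` rule, and the table names are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, standing in for data extracted from a source system.
RAW_ORDERS = [
    {"id": 1, "customer": " Alice ", "amount": "19.99"},
    {"id": 2, "customer": "BOB", "amount": "5.00"},
]


def clean(row):
    """Example business rule: trim and lowercase names, parse amounts."""
    return (row["id"], row["customer"].strip().lower(), float(row["amount"]))


def run_etl(conn):
    # ETL: transform outside the warehouse, then load only the finished rows.
    conn.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)", [clean(r) for r in RAW_ORDERS]
    )


def run_elt(conn):
    # ELT: load the raw rows untouched, then transform with SQL in the warehouse.
    conn.execute("CREATE TABLE orders_raw (id INTEGER, customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO orders_raw VALUES (:id, :customer, :amount)", RAW_ORDERS
    )
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT id,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()
raw_rows = conn.execute("SELECT * FROM orders_raw ORDER BY id").fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned rows. In the ELT path the untouched `orders_raw` table stays in the warehouse, so it can be transformed again later with different rules.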

ETL has been around since the 1970s and became especially popular with the rise of data warehouses. However, traditional data warehouses required custom ETL processes for each data source. Cloud technologies changed what was possible: companies could now store very large volumes of raw data at scale and analyze it later as required. ELT became the modern data integration method for efficient analytics.


Idempotency

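Idempotency means that performing the same action multiple times produces the same result as performing it once. Why does it matter? Pipeline jobs are routinely retried after failures or re-run for backfills, and a re-run should not duplicate or corrupt data that was already loaded.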

Idempotency can be implemented with:
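
One technique often used for this is a keyed upsert: each row is written under a unique key, so re-running the load overwrites rows instead of appending duplicates. The sketch below is illustrative and not necessarily the list these notes intended. It assumes a hypothetical `daily_sales` table and `load_batch` helper, and uses SQLite's `INSERT OR REPLACE` (other databases spell upserts differently).

```python
import sqlite3


def load_batch(conn, rows):
    """Idempotent load: rows are keyed by day, so a re-run overwrites them."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "INSERT OR REPLACE INTO daily_sales (day, total) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
# The primary key is what makes the repeated write safe.
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL)")

batch = [("2024-01-01", 120.0), ("2024-01-02", 80.5)]  # hypothetical data

load_batch(conn, batch)
first = conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall()

load_batch(conn, batch)  # simulate a retry of the same job
second = conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall()

print(first == second, len(second))
```

A plain `INSERT` into a table without a key would leave four rows after the retry. With the key in place, the table holds the same two rows no matter how many times the batch is loaded.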