The project aims to create an ETL pipeline. ETL is a data pipeline that collects data from different sources, transforms it according to business requirements, and loads it into a destination data storage.

neeeringute/ETL-pipeline


ETL Pipeline v1

Introduction

This code contains the steps to build an ETL pipeline that carries out the following tasks:

  • Extracts 400k transactions from Redshift
  • Identifies and removes duplicates
  • Loads the transformed data to an S3 bucket
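The transform step above (removing duplicate transactions) can be sketched as follows. This is a minimal illustration, assuming each transaction carries a unique `transaction_id` field; the actual extract and load steps would use a Redshift client and boto3 for S3, which are omitted here.

```python
# Minimal sketch of the deduplication step. The field name
# "transaction_id" is an assumption, not taken from the repository;
# in the real pipeline the rows would come from a Redshift query.

def remove_duplicates(transactions):
    """Keep the first occurrence of each transaction_id."""
    seen = set()
    unique = []
    for row in transactions:
        key = row["transaction_id"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For example, a list containing two rows with the same `transaction_id` would be reduced to a single row, preserving the original order.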

Requirements

The minimum requirements:

  • Python 3+

Instructions on how to execute the code

  1. Clone the repository and go to the week19 folder.

  2. Install the libraries needed to run main.py:

pip3 install -r requirements.txt

  3. Copy the .env.copy file to .env and fill out the environment variables.

  4. Run the main.py script. Mac users:

python3 main.py

Windows users:

python main.py
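After the .env file is filled out, the script presumably reads those settings from the environment. The sketch below shows one way main.py might collect them; the variable names (REDSHIFT_HOST, S3_BUCKET, etc.) are assumptions for illustration, not taken from the repository's .env.copy.

```python
# Hypothetical sketch of reading the connection settings set in .env.
# The variable names below are assumptions, not the repository's actual
# environment variable names.
import os

def load_config():
    """Collect connection settings from the environment."""
    return {
        "redshift_host": os.environ.get("REDSHIFT_HOST"),
        "redshift_user": os.environ.get("REDSHIFT_USER"),
        "redshift_password": os.environ.get("REDSHIFT_PASSWORD"),
        "s3_bucket": os.environ.get("S3_BUCKET"),
    }
```

Using os.environ (rather than a dotenv loader) keeps the sketch dependency-free; the repository's requirements.txt may pull in a library such as python-dotenv to load the .env file automatically.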

