The project aims to create an ETL pipeline. ETL is a data pipeline that collects data from different sources, transforms it according to business requirements, and loads it into a destination data storage.

neeeringute/ETL-pipeline


ETL Pipeline v1

Introduction

This code contains the steps to build an ETL pipeline that carries out the following tasks:

  • Extracts 400k transactions from Redshift
  • Identifies and removes duplicates
  • Loads the transformed data to an S3 bucket
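The transform step above (removing duplicate transactions) can be sketched as follows. This is a minimal illustration, assuming each transaction carries a unique `transaction_id` field; the actual extract and load steps would use a Redshift client and boto3 for S3, which are omitted here.

```python
# Minimal sketch of the deduplication step. The field name
# "transaction_id" is an assumption, not taken from the repository;
# in the real pipeline the rows would come from a Redshift query.

def remove_duplicates(transactions):
    """Keep the first occurrence of each transaction_id."""
    seen = set()
    unique = []
    for row in transactions:
        key = row["transaction_id"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For example, a list containing two rows with the same `transaction_id` would be reduced to a single row, preserving the original order.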

Requirements

The minimum requirements:

  • Python 3+

Instructions on how to execute the code

  1. Clone the repository and go to the week19 folder.

  2. Install the libraries needed to run main.py:

pip3 install -r requirements.txt

  3. Copy the .env.copy file to .env and fill out the environment variables.

  4. Run the main.py script. Mac users:

python3 main.py

Windows users:

python main.py
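After the .env file is filled out, the script presumably reads those settings from the environment. The sketch below shows one way main.py might collect them; the variable names (REDSHIFT_HOST, S3_BUCKET, etc.) are assumptions for illustration, not taken from the repository's .env.copy.

```python
# Hypothetical sketch of reading the connection settings set in .env.
# The variable names below are assumptions, not the repository's actual
# environment variable names.
import os

def load_config():
    """Collect connection settings from the environment."""
    return {
        "redshift_host": os.environ.get("REDSHIFT_HOST"),
        "redshift_user": os.environ.get("REDSHIFT_USER"),
        "redshift_password": os.environ.get("REDSHIFT_PASSWORD"),
        "s3_bucket": os.environ.get("S3_BUCKET"),
    }
```

Using os.environ (rather than a dotenv loader) keeps the sketch dependency-free; the repository's requirements.txt may pull in a library such as python-dotenv to load the .env file automatically.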

