This code contains the steps to build an ETL pipeline that carries out the following tasks:
- Extracts 400k transactions from Redshift
- Identifies and removes duplicates
- Loads the transformed data to an S3 bucket
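
As a rough illustration of the deduplication and load-preparation steps listed above, here is a minimal sketch using only the Python standard library. It is not the code in main.py: the function names, the `transaction_id` key, and the sample rows are all hypothetical. The Redshift query and the S3 upload are left out because they depend on the credentials and client libraries the repository actually uses.

```python
import csv
import io


def remove_duplicates(rows, key_fields):
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from the chosen fields (hypothetical key choice).
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, e.g. as the body of an object to upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for rows extracted from Redshift.
sample = [
    {"transaction_id": "t1", "amount": "10.00"},
    {"transaction_id": "t2", "amount": "5.50"},
    {"transaction_id": "t1", "amount": "10.00"},
]

deduplicated = remove_duplicates(sample, key_fields=["transaction_id"])
csv_payload = to_csv(deduplicated, fieldnames=["transaction_id", "amount"])
```

Keeping the first occurrence and preserving order is one possible policy. The actual pipeline may define duplicates differently, for example by comparing every column.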
The minimum requirements:
- Python 3+
- Clone the repository, and go to the week19 folder
- Install the libraries needed to run main.py:
    pip3 install -r requirements.txt
- Copy the .env.copy file to .env and fill out the environment variables.
- Run the main.py script.
  Mac users:
    python3 main.py
  Otherwise:
    python main.py
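
Before running the script, it can help to confirm that the values from .env are actually set. The sketch below assumes the variables have already been loaded into the process environment, by whatever mechanism the repository uses. The variable names are placeholders, not the ones in .env.copy, so substitute the real keys.

```python
import os

# Placeholder names: replace with the keys listed in .env.copy.
REQUIRED_VARS = ["REDSHIFT_HOST", "REDSHIFT_USER", "REDSHIFT_PASSWORD", "S3_BUCKET"]


def missing_variables(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_variables(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.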