Getting Big Data From Redshift

28 August 2016

As part of a machine learning data collection task, I wanted to retrieve ~350 GB of data from Redshift onto an AWS server, so that I could load the data into an IPython notebook and test some machine learning models.

Get It All With Python (Attempt #1)

Using the psycopg2 library to work with Redshift, I tried to fetch the entire result ('fetchall') onto the server from a simple Python Docker container. I started the script, only to find that it exhausted the server's RAM, practically killing the Docker container and the host's storage. It made sense that the server couldn’t cope...
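
For illustration, the first attempt can be sketched roughly as follows. This is a minimal sketch rather than the original script: the helper names `connect_to_redshift` and `fetch_everything` are hypothetical, and the connection parameters are placeholders. It only shows why 'fetchall' is a poor fit for a result of this size: every row ends up in a single Python list in the client's memory.

```python
def connect_to_redshift(host, dbname, user, password, port=5439):
    """Open a psycopg2 connection (5439 is Redshift's default port).

    Hypothetical helper; all arguments are placeholders.
    """
    # Imported here so the rest of the sketch can be read and exercised
    # without psycopg2 installed.
    import psycopg2

    return psycopg2.connect(
        host=host, dbname=dbname, user=user, password=password, port=port
    )


def fetch_everything(conn, query):
    """Run `query` and return the whole result set as one list of tuples."""
    with conn.cursor() as cur:
        cur.execute(query)
        # fetchall() materializes every row in client memory at once,
        # so memory use grows with the size of the result set.
        return cur.fetchall()
```

Calling `fetch_everything(conn, "SELECT * FROM some_table")` (with `some_table` standing in for the real table) on a result of hundreds of gigabytes would need at least that much memory on the client, which matches the behavior described above.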
