How to Bulk Load Data into PostgreSQL with Python
- By : Mydatahack
- Category : Data Engineering, Data Ingestion
- Tags: Bulk Load, Postgres, psycopg2, Python
Bulk loading with the copy command from a CSV file is the fastest option to load a large table with Postgres. In fact, loading data from a flat file is the fastest option in any relational databases. When you have a large table and need to load it to another database, the fastest way is to unload it to a flat file and upload it to the database table.
Let’s first import all the necessary modules. The code here works for both Python 2.7 and 3. The psycopg2 module is pretty much your best friends to connect to Postgres with Python.
1 2 3 | import psycopg2 import pandas as pd import sys |
Exporting Table to CSV
Let’s export a table to a csv file. To export an entire table, you can use select * on the target table. Panda’s read_sql function will convert the query result into Pandas’ dataframe. To create a CSV file, you can use to_csv on the dataframe.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | def table_to_csv(sql, file_path, dbname, host, port, user, pwd): ''' This function creates a csv file from PostgreSQL with query ''' try: conn = psycopg2.connect(dbname=dbname, host=host, port=port,\ user=user, password=pwd) print("Connecting to Database") # Get data into pandas dataframe df = pd.read_sql(sql, conn) # Write to csv file df.to_csv(file_path, encoding='utf-8', header = True,\ doublequote = True, sep=',', index=False) print("CSV File has been created") conn.close() except Exception as e: print("Error: {}".format(str(e))) sys.exit(1) # Execution Example with transaction table sql = 'Select * From sf.transaction' file_path = '/tmp/transaction.csv' dbname = 'db name' host = 'host url' port = '5432' user = 'username' pwd = 'password' table_to_csv(sql, file_path, dbname, host, port, user, pwd) |
Importing CSV to Table
The copy_expert function will give you good option to copy a CSV file. The function takes the input file path, target table and connection details. Note that the column order should be the same between the CSV file and the table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | def pg_load_table(file_path, table_name, dbname, host, port, user, pwd): ''' This function upload csv to a target table ''' try: conn = psycopg2.connect(dbname=dbname, host=host, port=port,\ user=user, password=pwd) print("Connecting to Database") cur = conn.cursor() f = open(file_path, "r") # Truncate the table first cur.execute("Truncate {} Cascade;".format(table_name)) print("Truncated {}".format(table_name)) # Load table from the file with header cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), f) cur.execute("commit;") print("Loaded data into {}".format(table_name)) conn.close() print("DB connection closed.") except Exception as e: print("Error: {}".format(str(e))) sys.exit(1) # Execution Example file_path = '/tmp/restaurants.csv' table_name = 'usermanaged.restaurants' dbname = 'db name' host = 'host url' port = '5432' user = 'username' pwd = 'password' pg_load_table(file_path, table_name, dbname, host, port, user, pwd) |