Process Large Data Efficiently Using SQL OFFSET and Python - The Logic
If you have a large dataset in a database and want to process it in SQL itself, use the powerful OFFSET/LIMIT clause to split the records into batches.
Can 16 GB of memory hold 40 GB of data? The answer is no.
For example, when you need to manipulate large data while reading it, SQL is usually the best place to do so, because the database engine filters and aggregates faster than application code. But you can't simply run select * from mylargetable on a large dataset — it will exhaust memory and hang.
Instead, you can split the large dataset into N smaller datasets in the SQL query itself, by doing the following:
- Get the total count of records in the table and work out how many batches are needed.
- Generate SQL queries dynamically, one per batch, each with its own OFFSET. Having one query per batch is what makes multithreading possible.
- Store the generated queries in a Python list or dict, or in a SQL temp table.
- Execute the queries in a loop or with multithreading.
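The query-generation step above can be sketched in plain Python. The table name, batch size, and `id` ordering column here are hypothetical — note that an ORDER BY is needed, because OFFSET without a stable ordering can return overlapping or missing rows across batches:

```python
def build_batch_queries(table, total_rows, batch_size):
    """Generate one SELECT per batch using LIMIT/OFFSET paging."""
    queries = []
    for offset in range(0, total_rows, batch_size):
        # Each batch reads batch_size rows starting at its own offset.
        queries.append(
            f"SELECT * FROM {table} ORDER BY id LIMIT {batch_size} OFFSET {offset}"
        )
    return queries

# Example: 10 rows split into batches of 4 -> 3 queries (4 + 4 + 2 rows)
queries = build_batch_queries("mylargetable", 10, 4)
```

Each string in `queries` is independent of the others, which is exactly why they can later be handed to separate threads.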
# pip install psycopg2 pandas
import psycopg2
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

# Initiate the DB connection
connection = psycopg2.connect(
    host="localhost",
    database="mydatabase",
    user="user",
    password="password",
)
# Get the total count of records from the table
count_query =…
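The snippet is cut off above, but the remaining steps follow directly from the list earlier: count the rows, build one OFFSET query per batch, and run them in a thread pool. Below is a minimal self-contained sketch of that flow; it substitutes the standard library's sqlite3 for psycopg2 (so it runs without a PostgreSQL server), and the table name, batch size, and seeded data are all assumptions for illustration:

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Throwaway SQLite file database standing in for the PostgreSQL database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE mylargetable (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        "INSERT INTO mylargetable (id, val) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 101)],  # 100 demo rows
    )

BATCH_SIZE = 25  # assumed batch size

# Step 1: total count of records -> number of batches
with sqlite3.connect(db_path) as conn:
    total = conn.execute("SELECT COUNT(*) FROM mylargetable").fetchone()[0]

# Step 2: generate one SQL query per batch, each with its own OFFSET
queries = [
    f"SELECT * FROM mylargetable ORDER BY id LIMIT {BATCH_SIZE} OFFSET {off}"
    for off in range(0, total, BATCH_SIZE)
]

# Step 3: execute the queries with multithreading; each worker opens
# its own connection, since DB connections are not shared across threads.
def run_batch(query):
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(run_batch, queries))

# Flatten the per-batch results back into one list of rows.
rows = [row for batch in batches for row in batch]
```

With psycopg2 the shape is the same: each worker would call psycopg2.connect(...) itself (or draw from a connection pool) rather than sharing one connection object across threads.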