Process Large Data Efficiently Using SQL OFFSET and Python - The Logic
If you have a large dataset in a database and want to process it in SQL itself, use the powerful OFFSET/LIMIT clause to split the records into batches.
Can 16 GB of memory hold 40 GB of data? The answer is no.
For example, when you need to manipulate large data while reading it, SQL is usually the best place to do so, because the database engine filters and aggregates faster than application code. But you can't simply run select * from mylargetable on a large dataset — it will exhaust memory and hang.
Instead, you can split the large dataset into N smaller datasets in the SQL query itself, by doing the following:
- Get the total count of records in the table and work out how many batches are needed.
- Generate SQL queries dynamically, one per batch, each with its own OFFSET. Having one query per batch is what makes multithreading possible.
- Store the generated queries in a Python list or dict, or in a SQL temp table.
- Execute the queries in a loop or with multithreading.
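The query-generation step above can be sketched in plain Python. The table name, batch size, and `id` ordering column here are hypothetical — note that an ORDER BY is needed, because OFFSET without a stable ordering can return overlapping or missing rows across batches:

```python
def build_batch_queries(table, total_rows, batch_size):
    """Generate one SELECT per batch using LIMIT/OFFSET paging."""
    queries = []
    for offset in range(0, total_rows, batch_size):
        # Each batch reads batch_size rows starting at its own offset.
        queries.append(
            f"SELECT * FROM {table} ORDER BY id LIMIT {batch_size} OFFSET {offset}"
        )
    return queries

# Example: 10 rows split into batches of 4 -> 3 queries (4 + 4 + 2 rows)
queries = build_batch_queries("mylargetable", 10, 4)
```

Each string in `queries` is independent of the others, which is exactly why they can later be handed to separate threads.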
# pip install psycopg2 pandas
import psycopg2
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

# Initiate the DB connection
connection = psycopg2.connect(
    host="localhost",
    database="mydatabase",
    user="user",
    password="password",
)
# Get the total count of records from the table
count_query =…
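The snippet is cut off above, but the remaining steps follow directly from the list earlier: count the rows, build one OFFSET query per batch, and run them in a thread pool. Below is a minimal self-contained sketch of that flow; it substitutes the standard library's sqlite3 for psycopg2 (so it runs without a PostgreSQL server), and the table name, batch size, and seeded data are all assumptions for illustration:

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Throwaway SQLite file database standing in for the PostgreSQL database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE mylargetable (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany(
        "INSERT INTO mylargetable (id, val) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 101)],  # 100 demo rows
    )

BATCH_SIZE = 25  # assumed batch size

# Step 1: total count of records -> number of batches
with sqlite3.connect(db_path) as conn:
    total = conn.execute("SELECT COUNT(*) FROM mylargetable").fetchone()[0]

# Step 2: generate one SQL query per batch, each with its own OFFSET
queries = [
    f"SELECT * FROM mylargetable ORDER BY id LIMIT {BATCH_SIZE} OFFSET {off}"
    for off in range(0, total, BATCH_SIZE)
]

# Step 3: execute the queries with multithreading; each worker opens
# its own connection, since DB connections are not shared across threads.
def run_batch(query):
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(run_batch, queries))

# Flatten the per-batch results back into one list of rows.
rows = [row for batch in batches for row in batch]
```

With psycopg2 the shape is the same: each worker would call psycopg2.connect(...) itself (or draw from a connection pool) rather than sharing one connection object across threads.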