Member-only story

Query Any Data, Anywhere-Simplify Workflows with DuckDB

Ramesh Ponnusamy
3 min readJan 6, 2025

Engineering is all about simplicity. We often use lightweight technologies to solve problems instead of overly complex ones, wherever we can replace them.

DuckDB is a very good example. It doesn’t need any complex setup. It’s like SQLite’s big brother.

SQLite is used for OLTP, but DuckDB is designed for OLAP.

Choose Right Tool From Store

DuckDB is especially useful for data processing, particularly with file systems like CSV and Parquet. The best part is that we can directly query cloud files without downloading them. We can run SQL queries on these files for easy data transformations.

Official Documentation: https://duckdb.org/docs/api/python/overview.html

Usage of DuckDB

  • Embedded Analytics: Load a CSV and query directly using SQL.
  • Local Analytics: Run queries on Parquet files.
  • Data Science Workflows: Use SQL for data preprocessing and model preparation.
  • Ad-Hoc Queries: Perform quick analytics on large datasets.
import duckdb

# Query a CSV file directly
query = """
SELECT *
FROM 'sample.csv'
WHERE column1 > 100
"""
result = duckdb.query(query).to_df()
print(result)

# Query a Parquet file locally
query = """
SELECT *…

--

--

Ramesh Ponnusamy
Ramesh Ponnusamy

Written by Ramesh Ponnusamy

Data-Architect, SQL Master,Python ,Django, Flask dev, AI prompting, Linked-in: https://www.linkedin.com/in/ramesh-ponnusamy/ mail : ramramesh1374@gmail.com

Responses (1)