Mastodawn

MS SQL Arrow

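The mssql-python driver now supports Apache Arrow structures directly, so data from SQL Server can be fetched quickly and memory-efficiently into Arrow-native libraries such as Polars, Pandas, and DuckDB. The feature reduces Python object creation and garbage-collection overhead, which gives large performance gains, especially for temporal types such as DATETIMEOFFSET. It is compatible with the existing fetch API and can process data in batches or as a stream, which makes it well suited to large datasets. Work to improve NVARCHAR performance on Linux is currently in progress.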

https://devblogs.microsoft.com/python/introducing-apache-arrow-support-in-mssql-python/

#mssqlpython #apachearrow #python #sqlserver #dataframe

Introducing Apache Arrow Support in mssql-python - Microsoft for Python Developers Blog

Efficient Data Fetching from SQL Server via Apache Arrow

Jesus Castagnetto 🇵🇪 May 9

#TIL about a #graph #knowledge engine that supports versioning #git style: #Omnigraph

https://github.com/ModernRelay/omnigraph

#OpenSource #Rustlang #ApacheArrow

GitHub - ModernRelay/omnigraph: Lakehouse-native graph engine with git-style workflows

sayzard May 8

Show HN: Bundlebase – Docker for Data

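Bundlebase is a tool that provides versioned, self-describing data containers that can be accessed from Python, SQL, the CLI, and BI tools without a server or separate infrastructure. It shares datasets together with their schema, transformation history, and provenance, and embeds data-cleaning rules in the bundle to automate repetitive work. It builds on recent technologies such as Apache Arrow, DataFusion, and Parquet to handle large datasets efficiently, and is also suited to letting LLM agents maintain state. It is an innovative data-management solution that simplifies data pipelines and collaboration.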

https://nvoxland.github.io/bundlebase/

#dataengineering #python #sql #apachearrow #dataversioning

Bundlebase — Data Packaging - Bundlebase

Bundlebase packages data into versioned, self-describing containers. Attach CSV, Parquet, or JSON from S3, HTTP, or local files. Query with SQL via Python, CLI, or any BI tool. Share via a path. No database required.

Antoine Pitrou May 6

Announcing the first ever Apache Arrow and Parquet meetup in Paris, kindly hosted by @datadoghq.

If you’re using Arrow or Parquet, looking for insights, or wanting to meet other community members, this meetup is for you. Please register if you plan to attend!

https://luma.com/6ed1oko1

#apachearrow #apacheparquet

Apache Arrow / Parquet - June 2026 meetup in Paris · Luma

Details We’re excited to announce the first ever Apache Arrow and Parquet meetup in Paris! This meetup will be hosted on June 18th by Datadog, in their…

amoeba May 4

We're excited to announce the release of {arrow} 24.0.0 🏹📦

Here's a roundup of the new features and changes in a 🧵

Full details can be found at https://arrow.apache.org/docs/r/news/

#rstats #apachearrow

Recce - Trust, Verify, Ship Apr 7

"Arrow has the intricacy of a fine Swiss watch." The co-creator of Apache Arrow on why AI agents cannot replicate decade-long infrastructure design.

#ApacheArrow #DataRenegades

Recce - Trust, Verify, Ship Mar 3

Wes McKinney built pandas in a mouse-infested NYC apartment on founder hours. Now he runs parallel Claude Code sessions and says AI is forcing "radical accountability" on every software vendor shipping mediocre products. Full conversation: https://youtu.be/Uso8-yaERkE

#DataRenegades #pandas #ApacheArrow

Posit Mar 2

What happens after you outgrow your memory limits? 🤔
Creator of pandas and Apache Arrow, Wes McKinney, takes the stage at #positconf 2026 to discuss the next frontier of analytical computing and agentic software engineering. 🏗️
Don't just use the tools—meet the person building the foundation of the modern data stack.
👉 Grab your spot: pos.it/conf
#positconf #DataScience #ApacheArrow #Ibis #Python