#introduction
Hi! I'm Jeffrey, recently relocated from Seattle to South Surrey. Dual US/Canadian, genuine mutt, settling into BC life.
Data engineer & independent researcher building knowledge graphs — antebellum US politics, and supply chain chokepoints (fertilizer, oranges, cocoa, coffee).
Off the clock: sourdough, smoking meats, pizza, cheesemaking. US & Canadian history. Homelab K8s. Raspberry Pi wildlife cams.
Classics background. Former Mormon. Supertaster.
#DataEngineering #History #BC

In this #InfoQ podcast, Somtochi Onyekwere breaks down:
• Recent developments in #DistributedDataSystems
• How to achieve fast, eventually consistent replication across distributed nodes
• Using #CRDTs (Conflict-free Replicated Data Types) to resolve data conflicts seamlessly
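
For readers new to CRDTs, here is a minimal sketch of one textbook example, a grow-only counter (G-Counter). It is not taken from the podcast; the class name `GCounter` and its methods are illustrative assumptions. Each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order in which they sync.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, hypothetical names)."""

    node_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A G-Counter can only grow; each node writes to its own slot only.
        if amount < 0:
            raise ValueError("G-Counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all known node slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node max is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing (idempotent)
```

After the exchange both replicas hold the same per-node counts, with no conflict to resolve by hand.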

🎧 Listen here: https://bit.ly/49OiNaE

#DataEngineering #EventualConsistency

The best time to move off PowerBI was years ago. The second best time is now.
No-code tools have always held competent analysts back from basic engineering practice: version control, code review, anything you’d expect from a serious team.

https://christopherdillon.me/blog/2026/04/22/powerbi-is-a-tax-on-competence/

#dataengineering #analytics #powerbi

PowerBI Is a Tax on Competence

Why plain text, version-controlled analytics is a categorically different thing from drag-and-drop BI — and why LLMs have widened the gap permanently.

Christopher Dillon

Show HN: Bundlebase – Docker for Data

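Bundlebase is a tool that provides versioned, self-describing data containers, accessible from Python, SQL, the CLI, and BI tools without a server or separate infrastructure. It shares datasets together with their schema, transformation history, and provenance, and embeds data-cleaning rules in the bundle to automate repetitive work. It uses modern technologies such as Apache Arrow, DataFusion, and Parquet to process large datasets efficiently, and it is also suited to letting LLM agents maintain state. It is an innovative data management solution that simplifies data pipelines and collaboration.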

https://nvoxland.github.io/bundlebase/

#dataengineering #python #sql #apachearrow #dataversioning

Bundlebase — Data Packaging - Bundlebase

Bundlebase packages data into versioned, self-describing containers. Attach CSV, Parquet, or JSON from S3, HTTP, or local files. Query with SQL via Python, CLI, or any BI tool. Share via a path. No database required.

Most agent reliability problems are data engineering problems

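Reliability problems in AI agents stem mainly from data engineering problems and cannot be solved by prompt engineering alone. Running agents effectively means focusing on data pipeline and tool design: optimizing search API responses, cutting token costs by returning only the minimum fields, designing task-level tools that encode business rules, and providing schema introspection. This approach improves agent speed and accuracy, and it also makes audit logs easier to interpret and permissions easier to manage.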
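
As a rough illustration of two of the patterns in the summary above (returning only the minimum fields, and offering schema introspection), here is a small sketch. It is not code from the linked article; `RECORDS`, `search`, and `describe_schema` are hypothetical names, and the in-memory list stands in for a real search backend.

```python
from typing import Any

# Hypothetical in-memory "index" standing in for a real search backend.
RECORDS = [
    {"id": 1, "title": "Refund policy", "body": "Long text. " * 50,
     "owner": "support", "updated": "2026-01-10"},
    {"id": 2, "title": "Shipping policy", "body": "Long text. " * 50,
     "owner": "logistics", "updated": "2026-02-03"},
]

# Return only what the agent needs to decide its next step.
DEFAULT_FIELDS = ("id", "title")


def search(query: str, fields: tuple = DEFAULT_FIELDS, limit: int = 5) -> list:
    """Case-insensitive title search that projects each hit down to `fields`."""
    q = query.lower()
    hits = [r for r in RECORDS if q in r["title"].lower()]
    return [{k: r[k] for k in fields if k in r} for r in hits[:limit]]


def describe_schema() -> dict:
    """Let the agent discover available fields instead of guessing them."""
    return {name: type(value).__name__ for name, value in RECORDS[0].items()}


compact = search("refund")
detailed = search("refund", fields=("id", "title", "owner"))
schema = describe_schema()
```

The default call leaves the large `body` field out of the response, and the agent can ask for more fields only when it needs them.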

https://sderosiaux.substack.com/p/from-prompt-engineering-to-data-engineering

#aiagents #dataengineering #searchapi #mcp #schemaintrospection

Fixing the Agent Data Layer: Six Patterns

Tool design, schema discovery, search APIs, and the data layer agents need.

The Technical Executive

Will you be at AI Council in San Francisco next week? If so, come visit me at the Dremio booth!
#DataEngineering #AgenticAI #DataAnalytics

Data Engineering vs. Data Science: role, skills, tools, impact, salary. Criteria: coding vs. analyzing, technical vs. business challenges, math level. #DataEngineering #DataScience #Carrière #Tech #Débat ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-datascience-carriaeyre-share-7457759249834803202-FmFo
#dataengineering #datascience #carrière #tech #débat | Gabriel C.

🤼 "Data Engineering vs. Data Science: the debate that divides (and how to settle it based on your profile)"

**Data Engineering** or **Data Science**? The debate rages on, but **the two roles are complementary**. Here is how to **choose based on your profile**:

---

🔹 **📌 Data Engineering: The Builders**

| **Criterion** | **Data Engineering** |
|------------------|-----------------------------------------------|
| **Role** | Build and maintain **data pipelines**. |
| **Skills** | SQL, Python, Spark, Kafka, Airflow. |
| **Tools** | ETL, ELT, data warehouses (Snowflake, Redshift). |
| **Impact** | **Reliability**, **performance**, **scalability**. |
| **Salary** | 45k€ – 80k€ (France). |
| **Who is it for?** | You like to **code**, **optimize**, and **solve technical problems**. |

---

🔹 **📊 Data Science: The Explorers**

| **Criterion** | **Data Science** |
|------------------|-----------------------------------------------|
| **Role** | Extract **insights** and **predictions**. |
| **Skills** | Python, R, SQL, ML (Scikit-learn, TensorFlow). |
| **Tools** | Jupyter, Pandas, Tableau, Power BI. |
| **Impact** | **Predictions**, **recommendations**, **business optimization**. |
| **Salary** | 40k€ – 70k€ (France). |
| **Who is it for?** | You like **stats**, **models**, and **telling stories with data**. |

---

🔹 **💡 How to choose?**

1. **Do you prefer coding or analyzing?**
   - **Coding** → Data Engineering.
   - **Analyzing** → Data Science.
2. **Do you like technical or business challenges?**
   - **Technical** (pipelines, performance) → Data Engineering.
   - **Business** (impact, insights) → Data Science.
3. **What is your level in math?**
   - **Weak** → Data Engineering.
   - **Strong** → Data Science (ML requires **advanced math**).

---

💬 **And you, Data Engineer or Data Scientist? Why did you choose this path?** *(Like if you are a Data Engineer, comment if you are a Data Scientist!)* #DataEngineering #DataScience #Carrière #Tech #Débat

LinkedIn

Data pipeline = Rube Goldberg machine? 5 steps: remove duplicates, automate, optimize SQL, choose tools, document. Results: -50% costs. #DataEngineering #Tech #Optimisation #Pipeline #Data ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-tech-optimisation-share-7457758246821482497-AeAZ
#dataengineering #tech #optimisation #pipeline #data | Gabriel C.

🏭 "Is your data pipeline a Rube Goldberg machine? Here are 5 steps to simplify it (and cut costs by 50%)"

I have audited **dozens of data pipelines** in recent years. **80% were too complex**, costly, and **hard to maintain**. Here are **5 steps to simplify them**:

---

🔹 **Step 1: Remove duplicates**
- **Problem**: 3 pipelines that do **the same thing**.
- **Solution**: **Full audit** → remove the redundancies.
- **Example**: One client **cut costs by 30%** by removing 2 unnecessary pipelines.

🔹 **Step 2: Automate manual tasks**
- **Problem**: Reports generated **by hand** every week.
- **Solution**: **Python scripts + cron** or **Airflow**.
- **Example**: One client **saved 10 hours a week** by automating its reports.

🔹 **Step 3: Optimize your SQL queries**
- **Problem**: Queries that scan **millions of rows**.
- **Solution**: Add **indexes**, use **EXPLAIN ANALYZE** (PostgreSQL).
- **Example**: One client made its queries run **10 times faster**.

🔹 **Step 4: Choose the right tools**
- **Problem**: Using **Spark** to process **100 MB of data**.
- **Solution**:
  - **< 1 TB** → **Pandas** or **SQL**.
  - **> 10 TB** → **Spark** or **Dask**.
- **Example**: One client **cut its cloud costs by 50%** by moving from Spark to Pandas.

🔹 **Step 5: Document everything**
- **Problem**: *"Nobody understands how it works."*
- **Solution**: One **README.md** per pipeline with:
  - **Inputs/Outputs**.
  - **Dependencies**.
  - **Owner** (who to contact when something breaks?).
- **Example**: One client **reduced its bugs by 40%** thanks to clear documentation.

---

💬 **And you, what is the most over-engineered pipeline you have seen?** Share your worst example in the comments! #DataEngineering #Tech #Optimisation #Pipeline #Data
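
To make Step 3 of the post above concrete, here is a small sketch of checking a query plan before and after adding an index. One loud assumption: it uses SQLite's `EXPLAIN QUERY PLAN` from the Python standard library as a stand-in so it runs without a server, whereas the post names PostgreSQL's `EXPLAIN ANALYZE`. The `orders` table and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL setup in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def plan(sql: str, params: tuple) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY, (42,))   # the planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies by SQLite version, but the first plan reports a scan of the table and the second names the index. On PostgreSQL the same check would be done by prefixing the query with `EXPLAIN ANALYZE`.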

LinkedIn

BoxLang AI 3.0 Series · Part 6 of 7
A chatbot with no memory isn't a conversation — it's a series of isolated queries. Every message starts from scratch. The user has to re-explain who they are, what they're working on, and what was just said. It's...
#AIagents #BoxLang #DATAENGINEERING #Developertools #Embeddings #Java #JVM #LLM #MemorySystems #rag #RetrievalAugmentedGeneration #VectorSearch
https://foojay.io/today/boxlang-ai-deep-dive-part-6-of-7-memory-systems-rag-building-ai-that-remembers/
foojay – a place for friends of OpenJDK

foojay is the place for all OpenJDK Update Release Information.

foojay