Mastodawn

The best time to move off PowerBI was years ago. The second best time is now.
No-code tools have always held competent analysts back from basic engineering practice: version control, code review, anything you’d expect from a serious team.

https://christopherdillon.me/blog/2026/04/22/powerbi-is-a-tax-on-competence/

#dataengineering #analytics #powerbi

PowerBI Is a Tax on Competence

Why plain text, version-controlled analytics is a categorically different thing from drag-and-drop BI — and why LLMs have widened the gap permanently.

Christopher Dillon

sayzard 8h ago

Show HN: Bundlebase – Docker for Data

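Bundlebase is a tool that provides versioned, self-describing data containers. They can be accessed from Python, SQL, the CLI and BI tools without a server or separate infrastructure. Datasets are shared together with their schema, transformation history and provenance. Data-cleaning rules can be embedded in the bundle to automate repetitive work. It is built on modern technologies such as Apache Arrow, DataFusion and Parquet, so it handles large datasets efficiently. It is also well suited to LLM agents that need to persist state. Overall, it is an innovative data management solution that simplifies data pipelines and collaboration.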

https://nvoxland.github.io/bundlebase/

#dataengineering #python #sql #apachearrow #dataversioning

Bundlebase — Data Packaging - Bundlebase

Bundlebase packages data into versioned, self-describing containers. Attach CSV, Parquet, or JSON from S3, HTTP, or local files. Query with SQL via Python, CLI, or any BI tool. Share via a path. No database required.
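The idea of a versioned, self-describing container can be illustrated without Bundlebase itself. The sketch below is a hypothetical, stdlib-only Python model and is not Bundlebase's API. The names `Bundle`, `Version`, `attach_csv` and `manifest` are invented for illustration. It also uses CSV text and SHA-256 content hashes, whereas Bundlebase works with Arrow, DataFusion and Parquet. What it shows is the core property the description emphasizes: the schema, a content hash per version and the data's provenance all travel with the data.

```python
import csv
import hashlib
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Version:
    number: int   # 1-based, append-only version counter
    sha256: str   # content hash of the attached data
    source: str   # provenance: where the data came from
    note: str     # human-readable description of the change


@dataclass
class Bundle:
    """Hypothetical self-describing data container (NOT Bundlebase's API)."""

    name: str
    schema: list[str] = field(default_factory=list)
    versions: list[Version] = field(default_factory=list)

    def attach_csv(self, text: str, source: str, note: str) -> Version:
        # Infer the schema from the CSV header row.
        header = next(csv.reader(io.StringIO(text)))
        # Reject silent schema drift: every version must share the first header.
        if self.schema and header != self.schema:
            raise ValueError(f"schema mismatch: {header} != {self.schema}")
        self.schema = header
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        version = Version(len(self.versions) + 1, digest, source, note)
        self.versions.append(version)
        return version

    def manifest(self) -> str:
        # Serialize name, schema and full version history as JSON metadata.
        return json.dumps(asdict(self), indent=2)


b = Bundle("customers")
b.attach_csv("id,email\n1,a@example.com\n", "s3://example-bucket/customers.csv", "initial load")
b.attach_csv("id,email\n1,a@example.com\n2,b@example.com\n", "s3://example-bucket/customers.csv", "add customer 2")
print(b.manifest())
```

Each version records a content hash and a source, so two consumers can confirm they are reading the same data. A header change is rejected rather than silently breaking downstream queries. A real tool would support schema evolution, which this sketch deliberately refuses.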

sayzard 12h ago

Most agent reliability problems are data engineering problems

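Reliability problems in AI agents mostly stem from data engineering problems, and prompt engineering alone will not solve them. Running agents effectively means focusing on data pipelines and tool design, in four areas:
- optimizing search API responses;
- cutting token costs by returning only the minimal set of fields;
- designing task-level tools that encode business rules;
- providing schema discovery.

This approach makes agents faster and more accurate. It also makes audit logs easier to interpret and permissions easier to manage.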
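Three of the patterns in that summary can be sketched as plain functions that an agent framework might expose as tools: returning minimal fields, task-level tools that encode business rules, and schema discovery. This is a minimal illustration under stated assumptions. The in-memory `ORDERS` table, `SCHEMA`, `describe_schema` and `find_open_orders` are hypothetical names and do not come from the linked article or from any MCP SDK.

```python
# Hypothetical backing data; in practice this would be a database or search API.
ORDERS = [
    {"id": 1, "customer": "acme", "status": "open", "total": 120.0,
     "notes": "call before delivery", "internal_score": 0.7},
    {"id": 2, "customer": "acme", "status": "shipped", "total": 80.0,
     "notes": "", "internal_score": 0.2},
    {"id": 3, "customer": "globex", "status": "open", "total": 45.5,
     "notes": "", "internal_score": 0.9},
]

# Fields the agent is allowed to see, with short descriptions it can read.
SCHEMA = {
    "id": "int, order id",
    "customer": "str, customer slug",
    "status": "str, one of open|shipped|cancelled",
    "total": "float, order total",
}


def describe_schema() -> dict[str, str]:
    """Schema-discovery tool: the agent inspects fields instead of guessing."""
    return dict(SCHEMA)


def find_open_orders(customer: str, fields: tuple[str, ...] = ("id", "total"),
                     limit: int = 20) -> list[dict]:
    """Task-level tool: the 'open orders only' rule lives here, not in the prompt."""
    unknown = set(fields) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unknown or restricted fields: {sorted(unknown)}")
    matches = [o for o in ORDERS if o["customer"] == customer and o["status"] == "open"]
    # Project to the requested fields only, keeping the response (and token cost) small.
    return [{f: o[f] for f in fields} for o in matches[:limit]]


print(describe_schema())
print(find_open_orders("acme"))
```

`notes` and `internal_score` are not in `SCHEMA`, so the agent cannot request them. This keeps responses small, and it also acts as a simple permission boundary that shows up clearly in audit logs.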

https://sderosiaux.substack.com/p/from-prompt-engineering-to-data-engineering

#aiagents #dataengineering #searchapi #mcp #schemaintrospection

Fixing the Agent Data Layer: Six Patterns

Tool design, schema discovery, search APIs, and the data layer agents need.

The Technical Executive

Alex Merced 20h ago

Will you be at AI Council in San Francisco next week? If so, come visit me at the Dremio booth!
#DataEngineering #AgenticAI #DataAnalytics

gaby_wald 1d ago

Data Engineering vs. Data Science: role, skills, tools, impact, salary. Criteria: coding vs. analyzing, technical vs. business challenges, math level. #DataEngineering #DataScience #Carrière #Tech #Débat ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-datascience-carriaeyre-share-7457759249834803202-FmFo

#dataengineering #datascience #carrière #tech #débat | Gabriel C.

🤼 "Data Engineering vs. Data Science: the debate that divides (and how to decide based on your profile)"

**Data Engineering** or **Data Science**? The debate rages on, but **the two jobs are complementary**. Here is how to **choose based on your profile**:

---

🔹 **📌 Data Engineering: The Builders**

| **Criterion** | **Data Engineering** |
|---|---|
| **Role** | Build and maintain **data pipelines**. |
| **Skills** | SQL, Python, Spark, Kafka, Airflow. |
| **Tools** | ETL, ELT, data warehouses (Snowflake, Redshift). |
| **Impact** | **Reliability**, **performance**, **scalability**. |
| **Salary** | €45k – €80k (France). |
| **Who is it for?** | You like to **code**, **optimize** and **solve technical problems**. |

---

🔹 **📊 Data Science: The Explorers**

| **Criterion** | **Data Science** |
|---|---|
| **Role** | Extract **insights** and **predictions**. |
| **Skills** | Python, R, SQL, ML (scikit-learn, TensorFlow). |
| **Tools** | Jupyter, Pandas, Tableau, Power BI. |
| **Impact** | **Predictions**, **recommendations**, **business optimization**. |
| **Salary** | €40k – €70k (France). |
| **Who is it for?** | You like **stats**, **models** and **telling stories with data**. |

---

🔹 **💡 How to choose?**

1. **Do you prefer coding or analyzing?**
   - **Coding** → Data Engineering.
   - **Analyzing** → Data Science.
2. **Do you prefer technical or business challenges?**
   - **Technical** (pipelines, performance) → Data Engineering.
   - **Business** (impact, insights) → Data Science.
3. **What is your level in math?**
   - **Low** → Data Engineering.
   - **Strong** → Data Science (ML requires **advanced math**).

---

💬 **What about you: Data Engineer or Data Scientist? Why did you choose this path?** *(Like if you're a Data Engineer, comment if you're a Data Scientist!)* #DataEngineering #DataScience #Carrière #Tech #Débat

gaby_wald 1d ago

Is your data pipeline an overengineered mess? 5 steps: remove duplicates, automate, optimize SQL, choose tools, document. Result: 50% lower costs. #DataEngineering #Tech #Optimisation #Pipeline #Data ... https://www.linkedin.com/posts/gabriel-chandesris_dataengineering-tech-optimisation-share-7457758246821482497-AeAZ

#dataengineering #tech #optimisation #pipeline #data | Gabriel C.

🏭 "Is your data pipeline an overengineered mess? Here are 5 steps to simplify it (and cut costs by 50%)"

I've audited **dozens of data pipelines** in recent years. **80% were too complex**, expensive and **hard to maintain**. Here are **5 steps to simplify them**:

---

🔹 **Step 1: Remove duplicates**
- **Problem**: 3 pipelines that do **the same thing**.
- **Solution**: **Full audit** → remove the redundancies.
- **Example**: One client **cut costs by 30%** by removing 2 unnecessary pipelines.

🔹 **Step 2: Automate manual tasks**
- **Problem**: Reports generated **by hand** every week.
- **Solution**: **Python scripts + cron** or **Airflow**.
- **Example**: One client **saved 10 h/week** by automating its reports.

🔹 **Step 3: Optimize your SQL queries**
- **Problem**: Queries that scan **millions of rows**.
- **Solution**: Add **indexes** and use **EXPLAIN ANALYZE** (PostgreSQL).
- **Example**: One client **cut its query execution time tenfold**.

🔹 **Step 4: Choose the right tools**
- **Problem**: Using **Spark** to process **100 MB of data**.
- **Solution**:
  - **< 1 TB** → **Pandas** or **SQL**.
  - **> 10 TB** → **Spark** or **Dask**.
- **Example**: One client **cut its cloud costs by 50%** by switching from Spark to Pandas.

🔹 **Step 5: Document everything**
- **Problem**: *"Nobody understands how it works."*
- **Solution**: One **README.md** per pipeline with:
  - **Inputs/outputs**.
  - **Dependencies**.
  - **Owner** (who to contact when something breaks).
- **Example**: One client **cut its bugs by 40%** thanks to clear documentation.

---

💬 **What's the most overengineered pipeline you've ever seen?** Share your worst example in the comments! #DataEngineering #Tech #Optimisation #Pipeline #Data
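Step 3 can be tried locally. This sketch replaces PostgreSQL's `EXPLAIN ANALYZE` with SQLite's `EXPLAIN QUERY PLAN`, run through Python's built-in `sqlite3` module, so it works without a server. The `events` table and its data are hypothetical. It captures the query plan for the same filter before and after adding an index on the filtered column. It also checks that the query result is unchanged.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def build_demo_db(n_rows: int = 10_000) -> sqlite3.Connection:
    # Hypothetical events table: many rows, later filtered by customer_id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO events (customer_id, amount) VALUES (?, ?)",
        [(i % 1000, i * 0.5) for i in range(n_rows)],
    )
    return conn


SQL = "SELECT SUM(amount) FROM events WHERE customer_id = ?"

conn = build_demo_db()
plan_before = query_plan(conn, SQL, (42,))  # no index yet: full table scan
total_before = conn.execute(SQL, (42,)).fetchone()[0]

conn.execute("CREATE INDEX idx_events_customer ON events (customer_id)")
plan_after = query_plan(conn, SQL, (42,))   # planner can now look up matching rows
total_after = conn.execute(SQL, (42,)).fetchone()[0]

print(plan_before)
print(plan_after)
```

On recent SQLite versions, the first plan typically reports a full `SCAN` of `events`. The second typically reports a `SEARCH` using `idx_events_customer`. The exact wording varies by version. Unlike `EXPLAIN ANALYZE`, `EXPLAIN QUERY PLAN` does not execute the query or report timings, so it shows access paths only.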

Alex Merced 2d ago

Read here: https://substack.com/@alexmerced1985/note/p-196574481?r=h4f8p&utm_medium=ios&utm_source=notes-share-action

#DataLakehouse #DataEngineering

Foojay.io 2d ago

BoxLang AI 3.0 Series · Part 6 of 7 A chatbot with no memory isn't a conversation — it's a series of isolated queries. Every message starts from scratch. The user has to re-explain who they are, what they're working on, and what was just said. It's...
#AIagents #BoxLang #DATAENGINEERING #Developertools #Embeddings #Java #JVM #LLM #MemorySystems #rag #RetrievalAugmentedGeneration #VectorSearch
https://foojay.io/today/boxlang-ai-deep-dive-part-6-of-7-memory-systems-rag-building-ai-that-remembers/
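The simplest form of the memory this excerpt describes is a short-term buffer. It replays recent turns with each new message, so the user does not have to repeat who they are or what was just said. The sketch below uses a hypothetical Python `WindowMemory` class, not BoxLang AI's API. It covers only a sliding window; the retrieval-augmented (RAG) long-term memory that the article's title refers to is not shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str  # "user" or "assistant"
    text: str


class WindowMemory:
    """Hypothetical short-term chat memory (NOT BoxLang AI's API)."""

    def __init__(self, max_messages: int = 6) -> None:
        # deque(maxlen=...) drops the oldest message once the window is full.
        self._history: deque = deque(maxlen=max_messages)

    def add(self, role: str, text: str) -> None:
        self._history.append(Message(role, text))

    def build_prompt(self, user_text: str) -> str:
        # Replay remembered turns before the new message so the model has context.
        # The caller should add() the new user turn and the reply afterwards.
        lines = [f"{m.role}: {m.text}" for m in self._history]
        lines.append(f"user: {user_text}")
        return "\n".join(lines)


memory = WindowMemory(max_messages=4)
memory.add("user", "I'm Ana and I maintain our Parquet ingestion job.")
memory.add("assistant", "Understood. What's going wrong with it?")
memory.add("user", "It fails on rows with null dates.")
print(memory.build_prompt("What should I check first?"))
```

A fixed window keeps the prompt size bounded. However, it forgets anything older than `max_messages` turns, and vector-search-backed memory is meant to fill that gap.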

foojay – a place for friends of OpenJDK

foojay is the place for all OpenJDK Update Release Information.

foojay

Helmholtz Metadata Collab. 2d ago

🚨 We’re hiring: Knowledge Graph Developer (f/m/d)

📍 Forschungszentrum Jülich

🌐 Build the #Helmholtz Knowledge Graph – connect, model & improve research #metadata across infrastructures.

📅 Apply now
👉 https://www.fz-juelich.de/de/karriere/stellenangebote/2026-070

#MetadataMatters #DataEngineering
@fzj @helmholtz @HelmholtzOpenScienceOffice

Information Engineer / Knowledge Graph Developer

The Institute for Advanced Simulation – Materials Data Science and Informatics (IAS-9) at Forschungszentrum Jülich works at the intersection of data science, research software engineering and semantic technologies. The research focus area “Metadata and Information Systems” focuses on building practical, reusable information systems that support modern, data-driven science. Our work spans the full spectrum from conceptual metadata modeling and ontology development to hands-on software engineering for research data infrastructures. We are strongly involved in large-scale national initiatives such as the Helmholtz Metadata Collaboration (HMC), where we shape metadata services and semantic infrastructure to be embedded in scientific workflows and used across disciplines. A core pillar is the Helmholtz Knowledge Graph which connects metadata from infrastructures across Helmholtz and makes it usable for discovery, integration, and analysis. Our goal is not only to build a coherent knowledge graph, but to make it useful, maintainable, and extensible as part of a living research ecosystem.

gaby_wald 2d ago

Hiring a Data Engineer in 1 week: precise profile, targeted channels, accelerated process, selling the project. Result: 120 CVs → 1 hire. #Recrutement #DataEngineering #RH #Tech #Urgence ... https://www.linkedin.com/posts/gabriel-chandesris_recrutement-dataengineering-rh-share-7457371387930902528-e6ZA

#recrutement #dataengineering #rh #tech #urgence | Gabriel C.

⚡ "How I hired a Data Engineer in 1 week (without compromising on quality)"

In **2024**, hiring a **good Data Engineer** takes **3 months** on average. Here's how I cut that to **1 week** for a client, **without lowering the bar**:

🔹 **Step 1: Define an ultra-precise profile**
- **Example**:
  - **Technical skills**: Python, Spark, SQL, Airflow.
  - **Domain skills**: Experience with **ETL pipelines** and **data warehousing**.
  - **Soft skills**: Autonomy, team spirit, curiosity.

🔹 **Step 2: Use targeted channels**
- **LinkedIn**: Advanced filters (keywords: "Data Engineer", "Spark", "ETL").
- **Tech communities**: Slack (e.g. Data Engineering Community), Discord (e.g. r/dataengineering).
- **Specialized platforms**: **AngelList** (startups), **Hired** (senior profiles).

🔹 **Step 3: An accelerated (but rigorous) hiring process**
- **Day 1**: **Pre-screening** via a **short technical test** (e.g. solve a SQL problem in 30 min).
- **Day 3**: **Technical interview** (1 h) with a **case study** (e.g. optimize a slow pipeline).
- **Day 5**: **Culture-fit interview** (30 min) with the team.
- **Day 7**: **Offer signed**.

🔹 **Step 4: Sell the project (not just the salary)**
- **Key selling points**:
  - *"You'll work on a pipeline processing 10 TB of data per day."*
  - *"Direct impact on revenue (20% cost optimization)."*
  - *"A team of 5 Data Engineers + 2 Data Scientists."*

💡 **Result**:
- **120 applications** in 48 h.
- **3 qualified candidates** interviewed.
- **1 hire** in 7 days.

💬 **What's your best tip for hiring quickly without sacrificing quality?** #Recrutement #DataEngineering #RH #Tech #Urgence