Apache Spark vs. Apache Kafka: A Comprehensive Technical Comparison
https://www.automq.com/blog/apache-spark-vs-kafka-comparison-event-streaming-processing
Databricks is contributing the tech behind Delta Live Tables (DLT) to the #ApacheSpark project!
It will now be known as Spark Declarative Pipelines, making it easier to develop & maintain streaming pipelines for all Spark users.
🔗 Learn more: https://bit.ly/3IkaM3a
Today is DBA Appreciation Day!
Bring your DBAs a cake and a coffee, please. And don't drop any tables in production, pretty please. It's the weekend ...
#PostgreSQL #SQLServer #Oracle #DB2 #MySQL #MariaDB #Snowflake #SQLite #Neo4j #Teradata #SAPHana #Aerospike #ApacheSpark #Clickhouse #Informix #WarehousePG #Greenplum #Adabas
Training Requirement: Freelance Trainer – Big Data & Spark
Location: Pune, Mumbai | Duration: Project-Based / Part-Time
Experience: 10+ years
📩 Email: amritk1@overturerede.com
📞 Call/WhatsApp: +91 9289118667
You can also explore and apply to current openings here:
🔗 https://zurl.co/3fAbr
#BigData #ApacheSpark #FreelanceTrainer #DataEngineering #HiringNow #RemoteJobs #SparkSQL #DataFrames #KafkaStreaming #TechTraining #HadoopEcosystem #MLlib #RealTimeAnalytics
The question of why Apache Spark is "slow" is one of the most frequent questions I hear from junior engineers and people I mentor. While it is partially true, it deserves clarification. TL;DR: OSS Spark is a multi-purpose engine designed to handle different kinds of workloads. Under the hood, Spark uses data-centric code generation, but it also has some vectorization as well as the option to fall back to pure Volcano-style execution. Because of that, Spark can be considered a hybrid engine that can benefit from all of these approaches. But because of its multi-purpose nature, it will almost always be slower than purely vectorized engines like Trino on OLAP workloads over columnar data, except in rare cases such as a large share of nulls or deep branching in the query. In this blog post I try to explain the statement above.
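To see this hybrid behavior for yourself, here is a minimal PySpark sketch (the toy query is made up for illustration) that prints the Java code produced by whole-stage code generation and then flips the `spark.sql.codegen.wholeStage` flag to force the interpreted, Volcano-style path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("codegen-demo").getOrCreate()

# A toy aggregation query, just to give codegen something to compile.
df = spark.range(1_000_000).selectExpr("id", "id % 10 AS bucket")

# Print the Java source emitted by whole-stage code generation for this plan.
df.groupBy("bucket").count().explain(mode="codegen")

# Disable whole-stage codegen: subsequent queries fall back to the
# interpreted, iterator-based (Volcano) execution path.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
df.groupBy("bucket").count().explain()  # no WholeStageCodegen nodes in this plan
```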
Last week, the 2025 Edition of the “Current Data Science for Business Students Meet Alumni” event was held at the Faculty of Economics and Business Administration of Ghent University, Belgium. Four DS4B graduates (Elke, Karac, Lieselot, and Charles) shared their work experience in Data Analytics and
#ORMS #DataScience #DataAnalytics #Python #ApacheSpark #SQL
🚀 From 24h to 20min – A Small Change, Huge Impact!
A Spark query ran almost a full day on a large dataset. Stats showed 300GB traffic between worker nodes! 🔍 The Explain Plan revealed the culprit: a costly JOIN causing shuffles.
The fix? No JOIN needed! A simple filter replaced it—resulting in a 20-minute runtime instead of 24h.
💡 Lesson: Always check the Explain Plan!
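For anyone who wants to reproduce this kind of diagnosis, here is a hedged PySpark sketch (table names and the replacement predicate are hypothetical, not taken from the original story) showing how the physical plan exposes the shuffle a join introduces:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("explain-demo").getOrCreate()

events = spark.table("events")          # hypothetical large fact table
allowed = spark.table("allowed_ids")    # hypothetical lookup used only to filter

# Before: joining just to filter rows. The physical plan contains
# 'Exchange hashpartitioning' nodes, i.e. network shuffles between workers.
events.join(allowed, "id").explain()

# After: if the condition can be expressed directly, no join (and no shuffle)
# is needed. This hypothetical predicate stands in for whatever the join encoded.
events.filter(F.col("status") == "active").explain()
```

The telltale sign is the `Exchange` operator: a filter-only plan has none, which is exactly what turned 24 hours into 20 minutes in the story above.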
New post about how to write data from an Apache Spark DataFrame into an Elasticsearch/OpenSearch database #datascience #databricks #elasticsearch #opensearch #bigdata #apachespark #spark #tech #programming #python:
https://pedro-faria.netlify.app/posts/2025/2025-03-16-spark-elasticsearch/en/
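The post covers the details; as a rough sketch of the general approach using the elasticsearch-hadoop (elasticsearch-spark) connector, with placeholder host and index names:

```python
from pyspark.sql import SparkSession

# Assumes the connector is on the classpath, e.g. started with
#   spark-submit --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.11.0
spark = SparkSession.builder.appName("es-write-demo").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

(df.write
   .format("org.elasticsearch.spark.sql")  # the connector's Spark SQL source
   .option("es.nodes", "localhost")        # placeholder Elasticsearch/OpenSearch host
   .option("es.port", "9200")
   .option("es.resource", "demo-index")    # placeholder target index
   .mode("append")
   .save())
```

For OpenSearch, the separate opensearch-hadoop connector follows the same write pattern.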
Easier to use: DuckDB gets local web user interface
As of version 1.2.1, the DuckDB in-process database can be conveniently operated via a local UI, installed as an extension, as an alternative to the CLI.
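For those who prefer scripting to the CLI, a minimal sketch of launching the UI from Python (assuming duckdb >= 1.2.1; `start_ui()` is the function the ui extension registers, and `duckdb -ui` is the equivalent shell shortcut):

```python
import duckdb

con = duckdb.connect()            # in-memory database; a file path works too
con.execute("INSTALL ui")         # fetch the UI extension
con.execute("LOAD ui")
con.execute("CALL start_ui()")    # serve the web UI locally and open the browser
```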