Mastodawn

it should not be this difficult to learn data engineering. there are so many resources for learning SQL. and for learning Python. and courses for becoming an analyst or analytics engineer. but i find there’s a dearth of resources for mastering the best practices for data engineering: common patterns, pagination, working with APIs. i can’t be the only one

brittany bennett Jan 29, 2023

i’ve taken half a dozen “intro to python” courses. i know how to use virtual envs and the debugger and how to write tests with pytest. i’ve worked with APIs and i’ve written a few ELT scripts myself. but i find myself still at the beginner level!! it’s maddening

brittany bennett Jan 29, 2023

obviously my vacation is off to a great start

Jason Becker Jan 29, 2023

@thebbennett I think there’s a boundary between “scripting” and “developing” that’s really hard to bridge because there’s bad dialog between the two “camps” and little good work that understands one side while bridging to the other. (This is also at the heart of the R vs. Python language wars.)

brittany bennett Jan 29, 2023

@jsonbecker can you say more? i’m really interested in what you have to say here

Jason Becker Jan 29, 2023

@thebbennett I think there’s a world of scripts, where software largely has an input and an output and runs sequentially. We may organize in files or even classes and have tests, but there’s an entry and exit point with clear steps and goals. Then there’s a separate world where software exists unattended taking concurrent inputs and almost unknowable state. You can get surprisingly far on both sides without understanding the other. And each has valuable lessons for the other.

Jason Becker Jan 29, 2023

@thebbennett that may be the best I can do in 500 characters or less, but I’d love to discuss this at any time. I think a lot of folks on the data side start in one world and adopt and adapt many tools from the other, but it’s incredibly hard to find tutorials appropriate to people who are highly skilled on one side that help them bridge to the other. The beginner materials don’t help.

SirLeeJackwagon Jan 29, 2023

@thebbennett I've been a data engineer for several years and have shifted into full stack swe and am now doing some data science. It has been maddening throughout. I think @jsonbecker makes some good points. I think another piece that adds significant complexity here has to do with differing architectures and their associated costs. Docker + AWS ECR + AWS Lambda + AWS RDS (SQL) has dollar and learning costs, but that's only one way to build a pipeline.

Michael Wexler Jan 29, 2023

@thebbennett You are thinking that DE is part of DS. What you mention at the end is traditional "database" tech: think Kimball data arch; data access patterns for non-cached direct storage access, etc. It's not all that new, it's just new to DS and usually not part of DS training.

Look at some of the feature store platforms: what are they abstracting away? Look at how folks use caching in trad CRUD projects across many Medium posts. You'll discover a whole new world...

Andrew Meredith Jan 29, 2023

@thebbennett Definitely not the only one! I'm new(-ish) to data engineering proper, and I'm still trying to figure out what is and is not DE. It seems like it is a fairly broad discipline that is more focused on bringing together DBA, DevOps, software engineering, and analytics than on introducing brand new practices.
If you figure it out, let me know!