Mastodawn

Do you even need a database?

https://www.dbpro.app/blog/do-you-even-need-a-database

Do You Even Need a Database? - DB Pro Blog

We built the same HTTP server in Go, Bun, and Rust using two storage strategies: read the file on every request, or load everything into memory. Then we ran real benchmarks. The results are more interesting than you'd expect.

Show thread

z3ugma

At some point, don't you just end up making a low-quality, poorly-tested reinvention of SQLite by doing this and adding features?

Show thread

freedomben 7h ago

Sometimes yes, I've seen it. It even tends to happen on NoSQL databases as well. Three times I've seen apps start on top of Dynamo DB, and then end up re-implementing relational databases at the application level anyway. Starting with postgres would have been the right answer for all three of those. Initial dev went faster, but tech debt and complexity quickly started soaking up all those gains and left a hard-to-maintain mess.

Show thread

leafarlua 7h ago

This always confuses me because we have decades of SQL and all its issues as well. Hundreds of experienced devs talking about all the issues in SQL and the quirks of queries when your data is not trivial.

One would think that for a startup of sorts, where things changes fast and are unpredictable, NoSQL is the correct answer. And when things are stable and the shape of entities are known, going for SQL becomes a natural path.

There is also cases for having both, and there is cases for graph-oriented databases or even columnar-oriented ones such as duckdb.

Seems to me, with my very limited experience of course, everything leads to same boring fundamental issue: Rarely the issue lays on infrastructure, and is mostly bad design decisions and poor domain knowledge. Realistic, how many times the bottleneck is indeed the type of database versus the quality of the code and the imlementation of the system design?

Show thread

dalenw 6h ago

It's almost always a system design issue. Outside of a few specific use cases with big data, I struggle to imagine when I'd use NoSQL, especially in an application or data analytics scenario. At the end of the data, your data should be structured in a predictable manner, and it most likely relates to other data. So just use SQL.

Show thread

greenavocado 6h ago

System design issues are a product of culture, capabilities, and prototyping speed of the dev team

Show thread

mike_hearn 5h ago

Disclaimer: I work part time on the DB team.

You could also consider renting an Oracle DB. Yep! Consider some unintuitive facts:

• It can be cheaper to use Oracle than MongoDB. There are companies that have migrated away from Mongo to Oracle to save money. This idea violates some of HN's most sacred memes, but there you go. Cloud databases are things you always pay for, even if they're based on open source code.

• Oracle supports NoSQL features including the MongoDB protocol. You can use the Mongo GUI tools to view and edit your data. Starting with NoSQL is very easy as a consequence.

• But... it also has "JSON duality views". You start with a collection of JSON documents and the database not only works out your JSON schemas through data entropy analysis, but can also refactor your documents into relational tables behind the scenes whilst preserving the JSON/REST oriented view e.g. with optimistic locking using etags. Queries on JSON DVs become SQL queries that join tables behind the scenes so you get the benefits of both NoSQL and SQL worlds (i.e. updating a sub-object in one place updates it in all places cheaply).

• If your startup has viral growth you won't have db scaling issues because Oracle DBs scale horizontally, and have a bunch of other neat performance tricks like automatically adding indexes you forgot you needed, you can materialize views, there are high performance transactional message queues etc.

So you get a nice smooth scale-up and transition from ad hoc "stuff some json into the db and hope for the best" to well typed data with schemas and properly normalized forms that benefit from all the features of SQL.

Show thread

freedomben 5h ago

I wanted to hate you for suggesting Oracle, but you defend it well! I had no idea

Show thread

tracker1 5h ago

I generally limit Oracle to where you are in a position to have a dedicated team to the design, deployment and management of just database operations. I'm not really a fan of Oracle in general, but if you're in a position to spend upwards of $1m/yr or more for dedicated db staff, then it's probably worth considering.

Even then, PostgreSQL and even MS-SQL are often decent alternatives for most use cases.

Show thread

mike_hearn 4h ago

That was true years ago but these days there's the autonomous database offering, where DB operations are almost all automated. You can rent them in the cloud and you just get the connection strings/wallet and go. Examples of stuff it automates: backups, scaling up/down, (as mentioned) adding indexes automatically, query plan A/B testing to catch bad replans, you can pin plans if you need to, rolling upgrades without downtime, automated application of security patches (if you want that), etc.

So yeah running a relational DB used to be quite high effort but it got a lot better over time.

Show thread

tracker1 4h ago

At that point, you can say the same for PostgreSQL, which is more broadly supported across all major and minor cloud platforms with similar features and I'm assuming a lower cost and barrier of entry. This is without signing with Oracle, Inc... which tends to bring a lot of lock-in behaviors that come with those feature sets.

TBF, I haven't had to use Oracle in about a decade at this point... so I'm not sure how well it competes... My experiences with the corporate entity itself leave a lot to be desired, let alone just getting setup/started with local connectivity has always been what I considered extremely painful vs common alternatives. MS-SQL was always really nice to get setup, but more recently has had a lot of difficulties, in particular with docker/dev instances and more under arm (mac) than alternatives.

I'm a pretty big fan of PG, which is, again, very widely available and supported.

Show thread

mike_hearn 4h ago

Autonomous DB can run on-premises or in any cloud, not just Oracle's cloud. So it's not quite the same.

I think PG doesn't have most of the features I named, I'm pretty sure it doesn't have integrated queues for example (SELECT FOR UPDATE SKIP LOCKED isn't an MQ system), but also, bear in mind the "postgres" cloud vendors sell is often not actually Postgres. They've forked it and are exploiting the weak trademark protection, so people can end up more locked in than they think. In the past one cloud even shipped a transaction isolation bug in something they were calling managed Postgres, that didn't exist upstream! So then you're stuck with both a single DB and a single cloud.

Local dev is the same as other DBs:

    docker run -d --name <oracle-db> container-registry.oracle.com/database/free:latest

See https://container-registry.oracle.com

Works on Intel and ARM. I develop on an ARM Mac without issue. It starts up in a few seconds.

Cost isn't necessarily much lower. At one point I specced out a DB equivalent to what a managed Postgres would cost for OpenAI's reported workload:

> I knocked up an estimate using Azure's pricing calculator and the numbers they provide, assuming 5TB of data (under-estimate) and HA option. Even with a 1 year reservation @40% discount they'd be paying (list price) around $350k/month. For that amount you can rent a dedicated Oracle/ExaData cluster with 192 cores! That's got all kinds of fancy hardware optimizations like a dedicated intra-cluster replication network, RDMA between nodes, predicate pushdown etc. It's going to perform better, and have way more features that would relieve their operational headache.

Home

Show thread

alexisread 4h ago

Good points, but Postgres has all those, along with much better local testing story, easier and more reliable CDC, better UDFs (in Python, Go etc.), a huge ecosystem of extensions for eg. GIS data, no licencing issues ever, API compatability with DuckDB, Doris and other DBs, and (this is the big one) is not Oracle.

Show thread

marcosdumay 5h ago

No, when things change fast and unpredictably, NoSQL is worse than when they are well-known and stable.

NoSQL gains you no speed at all in redesigning your system. Instead, you trade a few hard to do tasks in data migration into an unsurmountable mess of data inconsistency bugs that you'll never actually get into the end of.

> is mostly bad design decisions and poor domain knowledge

Yes, using NoSQL to avoid data migrations is a bad design decision. Usually created by poor general knowledge.

Show thread

james_marks 4h ago

If the argument for NoSQL is, “we don’t know what our schema is going to be”, stop.

Stop and go ask more questions until you have a better understanding of the problem.

Show thread

tracker1 5h ago

I think part of it is the scale in terms of the past decade and a half... The hardware and vertical scale you could get in 2010 is dramatically different than today.

A lot of the bespoke no-sql data stores really started to come to the forefront around 2010 or so. At that time, having 8 cpu cores and 10k rpm SAS spinning drives was a high end server. Today, we have well over 100 cores, with TBs of RAM and PCIe Gen 4/5 NVME storage (u.x) that is thousands of times faster and has a total cost lower than the servers from 2010 or so that your average laptop can outclass today.

You can vertically scale a traditional RDBMS like PostgreSQL to an extreme degree... Not to mention utilizing features like JSONB where you can have denormalized tables within a structured world. This makes it even harder to really justify using NoSQL/NewSQL databases. The main bottlenecks are easier to overcome if you relax normalization where necessary.

There's also the consideration of specialized databases or alternative databases where data is echo'd to for the purposes of logging, metrics or reporting. Not to mention, certain layers of appropriate caching, which can still be less complex than some multi-database approaches.

Show thread

AlotOfReading 4h ago

There's plenty of middle ground between an unchanging SQL schema and the implicit schemas of "schemaless" databases. You can have completely fluid schemas with the full power of relational algebra (e.g. untyped datalog). You shouldn't be using NoSQL just because you want to easily change schemas.

Show thread

icedchai 5h ago

Same. DynamoDB is almost never a good default choice unless you've thought very carefully about your current and future use cases. That's not to say it's always bad! At previous startups we did some amazing things with Dynamo.

Show thread

tshaddox 4h ago

I've never used DynamoDB in production, but it always struck me as the type of thing where you'd want to start with a typical relational database, and only transition the critical read/write paths when you get to massive scale and have a very good understanding of your data access patterns.

Show thread

noveltyaccount 7h ago

As soon as you need to do a JOIN, you're either rewriting a database or replatforming on Sqlite.

Show thread

pgtan 5h ago

Here are two checks using joins, one with sqlite, one with the join builtin of ksh93:

  check_empty_vhosts () {
    # Check which vhost adapter doesn't have any VTD mapped
    start_sqlite
    tosql "SELECT l.vios_name,l.vadapter_name FROM vios_vadapter AS l
        LEFT OUTER JOIN vios_wwn_disk_vadapter_vtd AS r
    USING (vadapter_name,vios_name)
    WHERE r.vadapter_name IS NULL AND
      r.vios_name IS NULL AND
   l.vadapter_name LIKE 'vhost%';"
    endsql
    getsql
    stop_sqlite
  }

check_empty_vhosts_sh () { # same as above, but on the shell join -v 1 -t , -1 1 -2 1 \ <(while IFS=, read vio host slot; do if [[ $host == vhost* ]]; then print ${vio}_$host,$slot fi done < $VIO_ADAPTER_SLOT | sort -t , -k 1)\ <(while IFS=, read vio vhost vtd disk; do if [[ $vhost == vhost* ]]; then print ${vio}_$vhost fi done < $VIO_VHOST_VTD_DISK | sort -t , -k 1) }

Show thread

goerch 4h ago

a) Just heard today: JOINs are bad for performance
b) How many columns can (an Excel) table have: no need for JOINs

Show thread

bachmeier 5h ago

Based on what's in the article, it wouldn't take much to move these files to SQLite or any other database in the future.

Edit: I just submitted a link to Joe Armstrong's Minimum Viable Programs article from 2014. If the response to my comment is about the enterprise and imaginary scaling problems, realize that those situations don't apply to some programming problems.

Show thread

locknitpicker 5h ago

> Based on what's in the article, it wouldn't take much to move these files to SQLite or any other database in the future.

Why waste time screwing around with ad-hoc file reads, then?

I mean, what exactly are you buying by rolling your own?

Show thread

bachmeier 5h ago

You can avoid the overhead of working with the database. If you want to work with json data and prefer the advantages of text files, this solution will be better when you're starting out. I'm not going to argue in favor of a particular solution because that depends on what you're doing. One could turn the question around and ask what's special about SQLite.

Show thread

pythonaut_16 5h ago

If your language supports it, what is the overhead of working with SQLite?

What's special about SQLite is that it already solves most of the things you need for data persistence without adding the same kind of overhead or trade offs as Postgres or other persistence layers, and that it saves you from solving those problems yourself in your json text files...

Like by all means don't use SQLite in every project. I have projects where I just use files on the disk too. But it's kinda inane to pretend it's some kind of burdensome tool that adds so much overhead it's not worth it.

Show thread

locknitpicker 5h ago

> You can avoid the overhead of working with the database.

What overhead?

SQLite is literally more performant than fread/fwrite.

Show thread

cleversomething 5h ago

That's exactly what I was going to say. This seems more like a neat "look Ma, no database!" hobby project than an actual production recommendation.

Show thread

ablob 5h ago

So you trade the overhead of SQL with the overhead of JSON?

Show thread

cleversomething 5h ago

> what's special about SQLite

Battle-tested, extremely performant, easier to use than a homegrown alternative?

By all means, hack around and make your own pseudo-database file system. Sounds like a fun weekend project. It doesn't sound easier or better or less costly than using SQLite in a production app though.

Show thread

whalesalad 5h ago

Reminds me of the infamous Robert Virding quote:

“Virding's First Rule of Programming: Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.”

Show thread

mrec 4h ago

In case you weren't aware, that in itself is riffing on Greenspun's tenth rule:

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

Greenspun's tenth rule - Wikipedia

Show thread

hackingonempty 4h ago

Probably more like a low-quality, poorly-tested reinvention of BerkeleyDB.