Are there any #PostgreSQL / #Timescale specialists here? I wonder if it is a bad idea to define very long-running jobs. In particular, this is about a daily job that moves “old” hypertable chunks to another tablespace. Initially this takes a long time (several hours); later runs should be faster, as there will not always be something to do for every table.

@FrVaBe FWIW I'm not a timescale specialist, but I know a thing or two about Postgres ...

I think this very much depends on what you mean by a "long running job". Is that one long transaction? That might indeed be a problem, as it will block vacuum and might cause bloat.

But maybe that's OK. Maybe your application does not generate many dead tuples for this to matter. Or maybe it's worth it, if the following runs will be faster.

@tomasv I guess it is not wrapped in one overall transaction, as the data is moved to the other tablespace chunk by chunk. I guess you are right - the initial "effort" (a few hours of runtime) will be worth the benefit afterwards. I'm only not sure whether there is some kind of timeout on the server side for a procedure call.
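On the server-side timeout question: as far as I know, PostgreSQL does not time out procedure calls by default, but two settings can cut long-running work short if someone has changed them. A quick way to check (0 means disabled, which is the default for both):

```sql
-- maximum runtime of a single statement before it is cancelled
SHOW statement_timeout;

-- maximum time a session may sit idle inside an open transaction
SHOW idle_in_transaction_session_timeout;
```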

@FrVaBe @tomasv If it is just initial and gets better later, it's indeed something to consider.

If it takes too long, can you copy and delete some chunks of the data? And eventually just drop the original table?

@FrVaBe If it's not one long transaction for the whole process, this shouldn't be a problem from the DB point of view - it won't block any cleanup or cause similar issues. It's essentially a sequence of small transactions, the database is designed to handle that just fine.

Of course, I don't know if it's worth it - but I guess if the gains were not significant, you wouldn't be asking the question. I'm just saying the database should handle this OK.

@FrVaBe Please don't use plain select/insert/delete to shuffle data between TimescaleDB tables.
The data is already sorted into table partitions, so there is usually no advantage in moving data around manually.
The size of these partitions, which are called hypertable chunks, can be changed with set_chunk_time_interval.
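For example (the hypertable name 'conditions' and the interval value are just illustrations):

```sql
-- make future chunks of the hypothetical hypertable "conditions"
-- cover 24 hours each; already existing chunks keep their old interval
SELECT set_chunk_time_interval('conditions', INTERVAL '24 hours');
```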

If that doesn't fit your needs, maybe native PostgreSQL partitioning might solve the problem, as you can detach a partition, which is in fact just a table. Detaching is fast and doesn't block ongoing processes.
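A minimal sketch with native declarative partitioning (the table and partition names are made up):

```sql
-- detach one partition of a natively partitioned table;
-- the detached partition lives on as an ordinary table
ALTER TABLE measurements DETACH PARTITION measurements_2022;

-- PostgreSQL 14+: detach without blocking concurrent queries
-- (this form cannot run inside an explicit transaction block)
ALTER TABLE measurements DETACH PARTITION measurements_2022 CONCURRENTLY;
```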

But it all depends on the use case and the goal you want to achieve.

@sjstoelting It is about a daily job that will iterate over about 360 hypertables and move chunks with older data to another tablespace, as described here:

https://docs.timescale.com/use-timescale/latest/user-defined-actions/example-tiered-storage/

The first execution will be the crucial one, because a lot of chunks have to be moved and this will take hours.

As an alternative, I tried to run a dedicated job for each hypertable, but I ran into "not enough background workers" problems.

@FrVaBe Ah, the "other people's hardware is expensive" problem 😉.
To solve the initial-run problem, process the data in small chunks in separate transactions, preferably with some code in whatever language you prefer, for example Python or a shell script.
Once that's finished, you can have it run automatically every day.
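A sketch of that idea in plain psql, assuming a hypertable called 'conditions' and a target tablespace called 'history' (both names are placeholders). \gexec makes psql execute each generated statement separately, and in the default autocommit mode every move_chunk call is then its own transaction:

```sql
-- generate one move_chunk statement per chunk older than 90 days ...
SELECT format(
    'SELECT move_chunk(chunk => %L, destination_tablespace => %L)',
    c, 'history'
  )
FROM show_chunks('conditions', older_than => INTERVAL '90 days') AS c
-- ... and let psql run each result row as a separate statement
\gexec
```

Note that move_chunk is only available in the community edition of TimescaleDB, not in the Apache-2-only build.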
@sjstoelting Maybe a stupid question: does a user-defined action run in a single transaction scope? How would it be possible to roll back the move of lots of chunks?
@FrVaBe There's no such thing as a stupid question. But there are a lot of stupid answers.
You already faced it when you ran out of workers.