really scared to run tools like sort on this file
i'm verifying that it parses correctly by just running jq on it and it's taking minutes. actual whole minutes. several of them. and this is on a very powerful server, with an nvme drive, a lot of cache, and some very fast memory
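for the record, a cheap way to do that parse check is jq's `empty` filter: it parses every value and emits nothing, so it's pure validation with no output to buffer. sketched here on a tiny stand-in file since nobody wants to wait on the real 86 GB export2:

```shell
# tiny stand-in for export2: one json record per line
printf '%s\n' '{"did":"a","pds":"https://x"}' '{"did":"b","pds":"https://y"}' > sample.jsonl

# `empty` parses each value and prints nothing; jq exits nonzero on the
# first parse error, so success means the whole file parsed
jq empty sample.jsonl && echo "parses ok"
```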
86 gigabytes of json
; grep 'pds.trump.com' export2 | wc -l
18818347
; wc -l export2
87447507 export2
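the two counts above work out to about a fifth of the whole file on that one pds. quick sanity check of the arithmetic:

```shell
# fraction of rows matching pds.trump.com, from the two counts above
awk 'BEGIN { printf "%.1f%%\n", 100 * 18818347 / 87447507 }'
# prints 21.5%
```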
of course this observation is so dull that even a coding agent on bluesky made it. so we press on
great news. @107d is also following itself on bluesky, which i found out while screwing about with a bluesky follow suggestions tool. current count... two
i want to process as few records as possible, so now i am going to filter out all rows with an invalid pds. goodbye example.test. so long localhost. see you in hell, 'null'
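roughly what that filter looks like, assuming each record carries the pds as a string field named `pds` (a guess, the thread never shows the schema), with the junk hosts matched by one jq regex:

```shell
# stand-in rows; the real export2 has 87M of these and the field name is assumed
cat > sample.jsonl <<'EOF'
{"did":"a","pds":"https://pds.example.com"}
{"did":"b","pds":"https://example.test"}
{"did":"c","pds":"http://localhost"}
{"did":"d","pds":"null"}
EOF

# keep only rows whose pds survives the junk-host filter;
# `// "null"` folds a missing or json-null pds into the reject pile too
jq -c 'select((.pds // "null") | test("example\\.test|localhost|^null$") | not)' \
  sample.jsonl > filtered.jsonl
wc -l filtered.jsonl
```

one survivor out of four here, which is about the shape of carnage the real file deserves.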
@mothcompute i'm fairly sure i have myself blocked as well
@107d now that is very funny
@mothcompute still kinda cool that jq can handle it at all. nice when a tool actually works at scale.
@mothcompute this is one area where I love jsonl for array datasets. yeet that at 50 or so cores and it fliiiieees
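the reason jsonl parallelizes so nicely: one record per line means the file can be split at any newline boundary and each chunk handed to its own jq. a sketch on a toy file, with GNU parallel's zero-copy version in a comment (`-j50` being the 50-core yeet):

```shell
# the real-world version, if GNU parallel is installed (no copying, splits in place):
#   parallel --pipepart -a export2 --block 1G -j50 "jq -c '.n'"
# portable sketch with coreutils split + xargs -P:
printf '%s\n' '{"n":1}' '{"n":2}' '{"n":3}' '{"n":4}' > sample.jsonl
split -n l/2 sample.jsonl chunk.        # l/2 = split into 2 chunks on line boundaries
ls chunk.* | xargs -P 2 -I{} jq -c '.n' {} > out.txt   # 2 jq workers, one per chunk
sort -n out.txt
```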
@mothcompute sensible quantity of json to have