I need an Append only database. Okay...
Orange website post from 738 days ago tells me Kafka is a high speed append only RDBMS. I have doubts.
@valarauca1 uhhh... i've only used kafka as a message passing system. i guess you could use it as a relational database? it's pretty neat though, definitely heard of it used under high throughput
@iximeow I'm trying to store something like 140 million rows of data for analysis and I seem to have very few options.
@iximeow okay that's one scheme. There are a few others. It is actually only some ~10GiB of data. I'm thinking I just need to host it all in RAM and write my analysis tool.
@valarauca1 if you don't mind me asking, what are you analysis'ing?
@iximeow ALL of github
@valarauca1 did you grab the dataset from google's github query thingie? i need to scrape github still (and codeplex, before it goes down..)
@iximeow yeah reducing the bigquery dataset took about 4TB of cloud credits.

@iximeow (well not all) repoIDs, userIDs, and timestamps.

Everything is pretty much a uint64_t
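
(A back-of-envelope sketch of the host-it-all-in-RAM idea. It assumes each row is only the three fields mentioned above (repo ID, user ID, timestamp), each a uint64, packed flat. The three-column layout and the names `footprint_bytes` and `append_row` are hypothetical. The ~10GiB figure suggests the real data carries more than this.)

```python
from array import array

ROW_FIELDS = 3        # assumed layout: repo_id, user_id, timestamp
BYTES_PER_FIELD = 8   # each field is a uint64_t


def footprint_bytes(rows: int, fields: int = ROW_FIELDS) -> int:
    """Raw size of `rows` fixed-width records, ignoring container overhead."""
    return rows * fields * BYTES_PER_FIELD


def append_row(store: array, repo_id: int, user_id: int, ts: int) -> None:
    """Append-only write: records are only ever added at the end."""
    store.extend((repo_id, user_id, ts))


# 'Q' is an unsigned 64-bit integer on common platforms.
store = array("Q")
append_row(store, 1, 42, 1_500_000_000)  # made-up sample values

if __name__ == "__main__":
    gib = footprint_bytes(140_000_000) / 2**30
    print(f"140M rows x 3 uint64 fields = {gib:.2f} GiB")
```

Under those assumptions the raw triples come to a bit over 3 GiB, so even with several more uint64 columns the set fits in RAM on one ordinary machine.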

@valarauca1 aaahh. i need all the header files of all of the things :( fairly worried about impending size constraints
@iximeow grepping across all file contents in BigQuery is only ~3TiB, <$40. They give you a data size estimate when you type in a valid query, and you can throw that in the cost calculator.
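
(A minimal sketch of the arithmetic behind that estimate, assuming cost scales linearly with the data size estimate. The function names are hypothetical, and no real price is baked in: the per-TiB rate is a parameter you would take from the cost calculator.)

```python
def scan_cost_usd(tib_scanned: float, usd_per_tib: float) -> float:
    """Estimated cost of a query, assuming a flat per-TiB rate."""
    return tib_scanned * usd_per_tib


def implied_max_rate(tib_scanned: float, budget_usd: float) -> float:
    """Highest per-TiB rate at which the query still fits the budget."""
    return budget_usd / tib_scanned


if __name__ == "__main__":
    # Figures from the thread: ~3 TiB scanned, under $40 total.
    print(f"rate must be under ${implied_max_rate(3, 40):.2f} per TiB")
    # 5.0 is a placeholder rate for illustration, not a quoted price.
    print(f"at a placeholder $5.00/TiB: ${scan_cost_usd(3, 5.0):.2f}")
```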
@iximeow BQ charges for network transfer, not in-place filtering/joining/hashing. But its computation is STUPID SLOW. Doing a 1000 item cross join will take >20 hours.