In my experience it is obvious. Calling people on it also usually makes them feel embarrassed. I reply with something like: “I could just ask an LLM myself if I wanted this output. Please provide your own commentary.” If I were a manager and had an employee just copy-pasting that kind of output, I’d probably wonder whether that employee actually contributes anything.
I think this is the way. After enough instances of “[coworker] wasn’t asked because they only respond with LLM output, so I just ask the LLMs directly. I’m not sure what [coworker]’s expertise is anymore; I just don’t consult them,” I suspect the coworker may in fact stop responding with LLMs.
ITT: people surprised one of the most recognizable and prolific games in the history of video games has a CEO.
She sounds like a savvy business person:
“We’re half the population, and we bring in a lot of money into the industry, and so I always question when our licensing partners are developing a new Tetris game: how many women do you have on the team? Because our demographic is close to 50[%].”
Yeah that checks out. It’s pretty wild that “developing games for our demographic / population” is so hard for gaming companies to grasp as a winning concept.
I’m locked into Apple’s ecosystem for various reasons, but I’ll be buying this as a second phone, hands down, to try to wean myself off and also to convince family to switch over. It goes well with my self-hosting.
There probably is a path to female-only reproduction using artificial means. I suppose it would be possible with men too, if we could synthetically gestate a child, but it’s probably easier with women since they possess the evolved ability to gestate. Parthenogenesis has occurred in other organisms; perhaps it’s possible for humans too, with help.
They have guns, but they also have much much much stronger gun control laws.
This already happens intrinsically in the models. Tokens are abstracted in the internal layers and only translated back into next-token predictions at the output layer. Training visual models is slightly different because you’re outputting pixel values rather than tokens (or possibly bounding boxes or edges, though not usually; conversely, a non-generative model may predict labels, which could in theory live in token space).
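To make the point concrete, here is a minimal NumPy sketch of that flow (all sizes and weight matrices are toy stand-ins, not any real model): token ids are immediately converted to dense vectors, all internal computation happens on those vectors, and only the final unembedding projection maps back to a distribution over next tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 50, 8  # toy sizes for illustration

# Embedding matrix: tokens enter the model as dense vectors.
# The internal layers never see token ids again after this lookup.
W_embed = rng.normal(size=(vocab_size, d_model))

# Stand-in for the stack of internal layers (attention/MLP blocks);
# here just one random linear map plus a nonlinearity to keep it tiny.
W_hidden = rng.normal(size=(d_model, d_model))

# Output (unembedding) projection: only here are hidden states
# translated back into scores over the token vocabulary.
W_unembed = rng.normal(size=(d_model, vocab_size))

token_ids = np.array([3, 17, 42])   # a toy input sequence
h = W_embed[token_ids]              # (3, d_model): abstract vectors
h = np.tanh(h @ W_hidden)           # token-free internal computation
logits = h @ W_unembed              # (3, vocab_size)

# Softmax over the last position gives next-token probabilities.
p = np.exp(logits[-1] - logits[-1].max())
p /= p.sum()
print(p.shape)  # (50,)
```

A pixel- or label-predicting model would keep the same middle but swap the final projection for one into pixel space or label space.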
The field itself is actually fairly stagnant architecturally: it’s still just attention layers all the way down, with more context length, more layers, and wider layers, trained on more data. I personally think this approach will never achieve AGI or anything like it. It will get better at perfectly reciting its training data, but I don’t expect truly emergent phenomena from these architectures just because they’re very big. They’ll be decent chatbots, but we already have that, and they’ll just consume ever more resources for vanishingly small improvements. They won’t functionally improve any true logical capability beyond regurgitating, brittly, logical paths already trodden in their training data, because they do not actually understand the logic or why it is valid; they have no true state model of the objects described in the token space they traverse probabilistically.
Sorry, I’m not saying that’s a good thing. It’s not just the context that’s expanding, but the parameters of the base model. I’m saying at some point you’ve just saved a compressed version of the majority of the content (we’re already kind of there), and you’d be able to decompress it even more losslessly. This doesn’t make it more useful for anything other than recreating copyrighted works.
Agreed. Continue the momentum and soon perhaps Mexico will go with a 4 day workweek.
Current models are speculated to be 700 billion parameters or more. At 32-bit precision (single float, not half), that’s 2.8TB of RAM per model, or about 10 of these units. There are ways to lower that, but if you’re trying to run at full precision (say, for training) you’d use over 2x this, maybe 4x depending on how you store gradients and updates, and running at full precision I’d reckon means 32-bit. It’s possible I suppose they train at 32-bit, but I’d be kind of surprised.
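The arithmetic behind those figures is just parameters times bytes per parameter; here is a quick back-of-the-envelope check (the 700B count is the speculation above, and the 4x training multiplier assumes Adam-style optimizer state, which is one common convention, not a confirmed detail of any specific model):

```python
params = 700e9  # speculated parameter count from the comment above

# Bytes per parameter at common precisions.
bytes_per = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for name, b in bytes_per.items():
    tb = params * b / 1e12
    print(f"{name}: {tb:.1f} TB just for the weights")
# fp32 gives 2.8 TB, matching the figure in the comment.

# Training needs more than the weights alone: gradients plus
# optimizer state (Adam keeps two extra tensors per parameter),
# hence the rough 2-4x multiplier over inference memory.
adam_train_tb = params * (4 + 4 + 4 + 4) / 1e12  # weights + grads + 2 moments, all fp32
print(f"rough fp32 Adam training footprint: {adam_train_tb:.1f} TB")
```

At fp16/bf16 the weights alone drop to 1.4 TB, which is why half precision (and mixed-precision training) is so attractive at this scale.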