Good morning folks

Time for a #poll.

Reviews are adverts.

Yes: 0%
*makes a 'maybe' gesture*: 100%
No: 0%
Rebuilding ALL the derived files for the dataset and training from scratch without the headers to see if it's the 'date' field that's causing this 'I will reply with a number, 20% of the time' behaviour in the model.
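For anyone playing along at home, the rebuild step can be as dumb as rewriting the JSONL with the suspect field dropped. A minimal sketch, assuming the derived files are JSONL with a top-level 'date' key (the actual pipeline here may look nothing like this):

```python
import json

def strip_field(record: dict, field: str = "date") -> dict:
    """Return a copy of the record with the given field removed."""
    return {k: v for k, v in record.items() if k != field}

def rebuild(in_path: str, out_path: str, field: str = "date") -> None:
    """Rewrite a JSONL dataset, dropping `field` from every record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(strip_field(json.loads(line), field)) + "\n")
```

Then retrain from scratch on the stripped files and see if the number-babble goes away.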
I'm also wondering if, ultimately, my finetune dataset is way too broad. Perhaps I should pull it back to just the one Persona and see if that helps.
If it did, I'd do a training run on each persona, so I'd basically have one model per persona. That wouldn't be bad, since I *can* run 6 or 7 of these models concurrently.
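Splitting the dataset per persona could look something like this. A sketch only, assuming each record carries a 'persona' key (hypothetical field name, going by the `persona:` tag in the sample generations below):

```python
import json
from collections import defaultdict

def split_by_persona(records):
    """Group records by their 'persona' field (hypothetical key name)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec.get("persona", "unknown")].append(rec)
    return buckets

def write_splits(buckets: dict, prefix: str = "train") -> None:
    """One JSONL file per persona -> one finetune run per persona."""
    for persona, recs in buckets.items():
        safe = persona.replace(" ", "_")
        with open(f"{prefix}_{safe}.jsonl", "w") as dst:
            for rec in recs:
                dst.write(json.dumps(rec) + "\n")
```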

THIS IS NEW (and counts):

🗣️ Prompt: user: [The user is an older man named {user}], scenario: [{char} has spent the day reading a book, {user} has just arrived.], conversation: [{user}: good evening, {char} good to see you.
{char}: hello, {user}] External data: [] Prompt: [What's up?]

[temp: 0.8 | persona: sarcastic teen] 🤖: The next bookmark

🗣️ Prompt: Do you ever dream in color?

[temp: 0.2 | persona: hivequeen] 🤖: The swarm does.

... this... also counts!

On the plus side: yes, I 'fixed' the random number bullshit it was giving me.

On the weird side, it seems obsessed with the words 'anarchy' and 'animal'.

*reads*

*looks up at camera*

*reads again*

*crumples up paper and throws it over shoulder*

*picks up imaginary phone*

Hello, Irony? Yes, I'd like to direct your attention...

Lemme just hop on the OpenData subreddit, surely I can find some good datasets.

Nope, it's just people jerking off about how they believe this shit should be open. Not providing shit. Even the stuff they SAY is open and available, they're not linking.

Genuinely, I've been doing some pretty severe interrogating of various corners of the internet lately (lots of news sites, blogs, etc) outside my usual milieu, and all I'm fucking finding is people talking ABOUT things and not actually showing these things/linking to resources.

Now, to counterbalance my bitching:
https://exoplanetarchive.ipac.caltech.edu/

WOOO EXOPLANETS

#exoplanets #datasets

NASA Exoplanet Archive

Here's a hint: if you put this into your LLM dataset, alllllll you're gonna fuckin' get is random numbers as your output.
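To see why, a rough illustration. The row below is made up to look like a planet-parameters table (name, period, radius, mass, and so on), not real archive data, but the point stands: once you drop the name, the text is almost all digits, so next-token training on it mostly teaches the model to emit numbers:

```python
def digit_fraction(row: str) -> float:
    """Fraction of non-separator characters that are digits or decimal points."""
    chars = [c for c in row if c not in ", "]
    return sum(c.isdigit() or c == "." for c in chars) / len(chars)

# Made-up row shaped like a confirmed-planets table entry.
row = "Kepler-22 b,289.9623,2.38,9.1,0.849,587,2011"
print(f"{digit_fraction(row):.0%} of this row is numeric")
```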

No, this is NOT what I did, I did something ELSE, thank you.

But I can learn from my mistakes, kthx.

@DarkestKale When newspapers were on paper, there was some excuse for this. Not now that you can just put in "and here is a link to the publication we're summarising".
@RogerBW here's the fucking worst thing about news websites: they change, and links break, constantly and consistently.

@RogerBW Our ABC (not the yank one) will CONSTANTLY change the URLs of articles, the headlines, etc - up to eight times in one day, just republishing with slight (unflagged) 'updates'.

There's no such thing as being able to reference an ABC article. It's fucking quicksand.

@DarkestKale Even without human fuckery, there is the sort of CMS that makes the master index to an article a bit of text based on its title, and the sort that makes it an arbitrary number.
@DarkestKale The Swindon of datasets? (Supposedly it was so close to the mean demographic of all England that it was used for test marketing all sorts of new products. After a few years they noticed that lots of things were doing well there but failing nationwide, and eventually realised that word had got round and people who liked trying new products had moved to Swindon to do so.)

@RogerBW huh. Our 'proving ground' used to be Tasmania. Especially for telecoms.

Smaller population, bounded by sea, so you can basically run good tests over there, etc - while keeping the same currency, language and legal system.

@DarkestKale I've seen reviews that're such blatant adverts you can see the maker's fingers when the reviewer opens their mouth too wide.
I've also seen reviews that're warnings against toxic waste.