Good morning folks

Time for a #poll.

Reviews are adverts.

Yes
0%
*makes a 'maybe' gesture*
100%
No
0%
Rebuilding ALL the derived files for the dataset and training from scratch without the headers to see if it's the 'date' field that's causing this 'I will reply with a number, 20% of the time' behaviour in the model.
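For anyone wanting to try the same experiment, here's a rough sketch of stripping a header field before rebuilding. It assumes each training record is a dict with a 'date' key; `strip_fields` and the sample record are made up for illustration, not my actual pipeline.

```python
import json

def strip_fields(records, drop=("date",)):
    """Return copies of the records without the named header fields.

    The field name 'date' is an assumption about the record layout.
    """
    return [{k: v for k, v in rec.items() if k not in drop} for rec in records]

# Placeholder record, purely illustrative.
sample = [
    {
        "date": "2000-01-01",
        "persona": "hivequeen",
        "prompt": "Do you ever dream in color?",
        "reply": "The swarm does.",
    },
]

cleaned = strip_fields(sample)
print(json.dumps(cleaned[0]))
```

The originals are left untouched, so you can train on both versions and compare how often the number replies show up.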
I'm also wondering if, ultimately, my finetune dataset is way too broad. Perhaps I should pull it back to just the one Persona and see if that helps.
If it did, I'd do a training run on each persona, so I'd basically have one model per persona, and that wouldn't be bad since I *can* run 6 or 7 of these models concurrently.
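A minimal sketch of the one-dataset-per-persona split, assuming every record carries a 'persona' key. The function name, the 'unknown' fallback, and the sample records are hypothetical.

```python
from collections import defaultdict

def split_by_persona(records, key="persona"):
    """Group records into one list per persona.

    Records missing the key land in an 'unknown' bucket (an assumption,
    you might prefer to drop them instead).
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec.get(key, "unknown")].append(rec)
    return dict(buckets)

# Illustrative records only.
records = [
    {"persona": "hivequeen", "reply": "The swarm does."},
    {"persona": "sarcastic teen", "reply": "The next bookmark"},
    {"reply": "no persona tag on this one"},
]

per_persona = split_by_persona(records)
for name, subset in per_persona.items():
    print(name, len(subset))
```

Each bucket would then feed its own training run, so a broad dataset turns into several narrow ones without touching the records themselves.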

THIS IS NEW (and, counts):

🗣️ Prompt: user: [The user is an older man named {user}], scenario: [{char} has spent the day reading a book, {user} has just arrived.], conversation: [{user}: good evening, {char} good to see you.
{char}: hello, {user}] External data: [] Prompt: [What's up?]

[temp: 0.8 | persona: sarcastic teen] 🤖: The next bookmark

🗣️ Prompt: Do you ever dream in color?

[temp: 0.2 | persona: hivequeen] 🤖: The swarm does.

... this... also counts!

On the plus side: yes, I 'fixed' the random number bullshit it was giving me.

On the weird side, it seems obsessed with the words 'anarchy' and 'animal'.

*reads*

*looks up at camera*

*reads again*

*crumples up paper and throws it over shoulder*

*picks up imaginary phone*

Hello, Irony? Yes, I'd like to direct your attention...

Lemme just hop on the OpenData subreddit, surely I can find some good datasets.

Nope, it's just people jerking off about how they believe this shit should be open. Not providing shit. Even the stuff they SAY is open and available, they're not linking.

Genuinely, I've been doing some pretty severe interrogating of various corners of the internet lately (lots of news sites, blogs, etc) outside my usual milieu, and all I'm fucking finding is people talking ABOUT things and not actually showing these things/linking to resources.

Now, to counterbalance my bitching:
https://exoplanetarchive.ipac.caltech.edu/

WOOO EXOPLANETS

#exoplanets #datasets

NASA Exoplanet Archive

Here's a hint: if you put this into your LLM dataset, alllllll you're gonna fuckin' get is random numbers as your output.

No, this is NOT what I did, I did something ELSE, thank you.

But I can learn from my mistakes, kthx.
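One rough way to catch number-heavy records before they go into a finetune set is to measure how much of each record is numeric. This is a sketch under my own assumptions: `numeric_fraction` is a made-up helper, and any cutoff you pick is a guess to tune, not a known-good value.

```python
def numeric_fraction(text):
    """Fraction of whitespace-separated tokens that parse as numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0

    def is_number(tok):
        # Trim common punctuation so '3.14,' still counts as a number.
        try:
            float(tok.strip(",;:()[]"))
            return True
        except ValueError:
            return False

    return sum(is_number(t) for t in tokens) / len(tokens)

print(numeric_fraction("The swarm does."))
print(numeric_fraction("1 2 3 4"))
```

Records that score high are the ones most likely to teach the model to answer in numbers. Note that `float` also accepts strings like 'nan' and 'inf', so those count as numeric here.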