
They/Them, agender-leaning scalie.

ADHD software developer with far too many hobbies/trades: AI, gamedev, webdev, programming language design, audio/video/data compression, software 3D, mass spectrometry, genomics.

Learning German (B2), Chinese (HSK 3-4ish), French (A2).

I’m undiagnosed too. Trying to get medicated should be the first step, but gaslighting doctors can be awful, so I understand if that’s off the table.

I personally found these help:

  • Lots of caffeine. Like, double the amount most people would call “unhealthy”. It’s less efficient, but it’s still a stimulant that helps attention span, at least for me.
  • Keep learning. Get Duolingo or Brilliant or anything like that and just make sure you are always collecting and revising new knowledge. This raises the amount of dopamine in circulation in your brain, and makes it easier to focus even when you’re not learning.

A warning about podcasts, etc.: Don’t let your “work-time entertainment” be the same as your “free-time entertainment”, or else they’ll blend together and you’ll never feel like you can relax at home.

Wow, I just scrolled through the front page and it was 100% depressing/anxiety-inducing news.

I don’t think I want this in my life.

I had no idea Omeleto existed. Looks like I’ve got a few weekends of watching their vids ahead of me!
Good to see them learning LaTeX young. It’s one of those life skills that no one should need, but everybody does need at some point.
Why do I find “match-3” the most offensive part of that thought?

Note: For this guide, we’ll focus on functions that operate on the scalar preactivations at each neuron individually.

Very frustrating to see this, as large models have shown that the choice of scalar activation function makes only a tiny difference once your model is wide enough.

arxiv.org/abs/2002.05202v1 shows that GLU-based activation functions almost universally beat their scalar equivalents. IMO there needs to be more work in this area, as the potential gains are much bigger than from tweaking scalar functions.

E.g., even in cases where the network only needs static routing (tabular data), transformers sometimes perform magically better than MLPs. This suggests there’s something special about self-attention as an “activation function”. If that magic can be extracted and made sub-quadratic, it could be a paradigm shift in NN design.

GLU Variants Improve Transformer

Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid. We test these variants in the feed-forward sublayers of the Transformer (arXiv:1706.03762) sequence-to-sequence model, and find that some of them yield quality improvements over the typically-used ReLU or GELU activations.

arXiv.org
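
For concreteness, here’s a rough NumPy sketch of the GLU idea and the SwiGLU feed-forward variant. The weight names (W, V, W2) follow the papers, but the toy shapes and helper names are just illustrative assumptions, not anyone’s reference implementation.

```python
# Minimal sketch of GLU (arXiv:1612.08083) and a GLU-variant FFN (arXiv:2002.05202).
# Illustrative only: shapes and names are assumptions, not the papers' code.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)  # a.k.a. SiLU

def glu(x, W, V):
    # Classic GLU: gate one linear projection with a sigmoid of another.
    return sigmoid(x @ W) * (x @ V)

def swiglu(x, W, V):
    # SwiGLU: same structure, but the gate uses Swish instead of sigmoid.
    return swish(x @ W) * (x @ V)

def ffn_swiglu(x, W, V, W2):
    # Transformer feed-forward sublayer with SwiGLU in place of ReLU(x W1) W2.
    return swiglu(x, W, V) @ W2

# Toy usage: 4 tokens, d_model = 8, d_ff = 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W, V = rng.standard_normal((2, 8, 16))  # two parallel input projections
W2 = rng.standard_normal((16, 8))
print(ffn_swiglu(x, W, V, W2).shape)    # -> (4, 8)
```

The whole comparison in 2002.05202 boils down to swapping which nonlinearity sits in the gate; everything else in the feed-forward sublayer stays the same.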
Btw, there’s an in-depth annual survey at www.gendercensus.com. It’s not specifically about this community, and it excludes purely cis-binary people, but it has interesting data & trends if you’re into this sort of thing.
Gender Census

You’re right. Everything is suspiciously wordy, substance is sparse, and every headline is clickbaity. It’s like they tuned the content specifically for Google, not human readers…
Google is also responsible for the SEO industry. They made ads hugely profitable, then started directing traffic to sites that serve more of their ads, regardless of quality.
“zero commercial prospects”? That sounds exactly like the sort of movie I’d pay money for!