themineraria 

98 Followers
72 Following
178 Posts
~Am I a dev? 0x90~
------------------------------
🇫🇷 🇬🇧 🇪🇸
AI applied to IS / Pentest
Little content here, nothing but quality
: 🔟 (arium 10.7 LTS)
: 🦜 (parrot 5.0.1 security)
🛠️:
Here since 5 Nov 2022

Are you impressed by all the new developments in AI over the last few months? Between Meta's stylish glasses with integrated Llama 3, ChatGPT's new o1 model, and the rest, we're getting ever closer to a world where ultra-accessible AIs think for us via chains of thought, and we're slowly moving towards conscious models. All this may sound promising, dystopian or just plain alarmist, but the facts remain. Whether we like it or not, at this very moment, private companies with colossal resources, some of which have already committed moral or criminal offences in the past (sometimes even denounced by whistle-blowers), are demonstrating that the data infrastructures they have acquired over the years now let them monopolize a revolutionary technology they don't always fully control or comprehend.

Fortunately, the worst imaginable tragedies haven't occurred yet, and nobody wants to be the first to cause one. To avoid this, everyone has more or less relevant methods, but in case you've missed it, like too many people in my opinion, I'd like to highlight the work of the company Anthropic here. The company came to wider public attention fairly recently with the release of their newest AI model, "Claude", but that's not the most interesting part. The project was founded by former OpenAI researchers, who left around the time OpenAI should have been renamed "MicrosoftAI".

With this small thread, I'd like to try to shed some light on the hyper-complex but fascinating work of these researchers, who deserve a hundred times more attention than the trendy new feature of your favorite model or the image I put in as an eye-catcher. In October 2023, Anthropic published a first research paper entitled "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", followed by a second instalment in May 2024. In my opinion, the methodology and results presented in these papers are as major an advance in the field as the famous "Attention Is All You Need", if not more.

I'll spare you a full description of the complex beauty of this paper (which I invite you to go and consult if you like AI, or rather to read the explanation of it by the brilliant "Astral Codex Ten"). To simplify: it proposes a method that uses one AI as a tool to analyze another, in order to understand its inner workings, and to map what it has learned and how it "reasons". Among other things, the authors showed that an AI "creates" within itself lots of little models specialized in specific tasks, much like the specialized regions of a brain.
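To give a flavour of the "dictionary learning" idea at the heart of the paper, here's a toy sketch (my own illustration, not Anthropic's actual sparse-autoencoder pipeline; all names and numbers are invented). The goal is to express a dense activation vector as a sparse combination of a few "feature" directions from an overcomplete dictionary; a classic greedy decomposition looks like this:

```python
import numpy as np

def sparse_decompose(activation, dictionary, k=3):
    """Greedily express `activation` as a combination of at most k
    dictionary atoms (matching pursuit). Returns the coefficient vector."""
    residual = activation.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(k):
        # Pick the atom most correlated with what's left of the signal.
        scores = dictionary @ residual
        best = int(np.argmax(np.abs(scores)))
        coeffs[best] += scores[best]
        residual -= scores[best] * dictionary[best]
    return coeffs

rng = np.random.default_rng(0)
# Toy "dictionary" of 8 unit-norm feature directions in a 16-dim space.
atoms = rng.normal(size=(8, 16))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
# An activation that is genuinely a mix of two features.
x = 2.0 * atoms[1] + 0.5 * atoms[6]
code = sparse_decompose(x, atoms, k=3)
# Indices of the recovered features; 1 and 6 are among them.
print(np.flatnonzero(np.abs(code) > 0.1))
```

The real papers learn the dictionary itself from millions of activations with a sparse autoencoder, so that the recovered directions turn out to be human-interpretable "features"; the decomposition step above is just the intuition.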

These results are so credible, reliable and important that a similar methodology was applied very recently in an article published in Nature on 2 October 2024, entitled "Largest brain map ever reveals fruit fly's neurons in exquisite detail". Such maps make it easier for us to understand how a brain or an AI model works, and also to detect any malfunctions in it, by visualizing the inner relationships between concepts and the connections between areas.

Why is this so much more important than all the mainstream news about AI, or even that fascinating Nature paper on fly-brain mapping (which I also suggest you read), you might ask? Because without these open publications, available to all and enabling these major advances, each company would be condemned to do its own research in private, without external review, afraid of being robbed of its secrets. In the fields of security and learning, that is never a desirable practice: security through obscurity is the WORST of all. So it's very important to bring these advances to light, to popularize them, to spread them to as many people as possible, to finance them if possible, and to encourage private companies by every possible means to contribute publicly to the safety and knowledge of all, and to share their research so that everyone wins in the end.

Have I caught your attention with this image? Good, it means you're probably a scientist, an engineer, or a technology enthusiast in general.
If so, let me tell you a short story about AI and how it relates to this picture.
Click to read it below.

#artificialintelligence #ai #anthropic

The junk includes:

  • 453M 32-hex hashes
  • 444M digits-only strings of length 8-11 (easily bruteforced)
  • 415M lower-digit or digit-lower strings that are clearly just wordlist words with all possible 4-digit strings appended or prepended
  • 287M of length 6 or less (easily bruteforced)
  • 201M 40-hex hashes
  • 138M bcrypt hashes (plus 15M truncated bcrypts)
  • 71M strings longer than 100 characters
  • 51M 96-hex hashes
  • 50M Houzz __SEC__ (modified sha512crypt) hashes
  • 18M encrypted + base64 passwords from the 2013 Adobe leak (credit: Flagg)
  • 12M 32-hex prefixed with '0x'
  • 11M Google auth tokens (ya29 prefix)
  • 7M with at least 20 contiguous hex chars
  • 6.6M 128-hex hashes
  • 160K argon2 hashes

("Easily bruteforced" means that competent attackers are going to run the equivalent hybrid or brute-force attack anyway, much faster, on GPU. All these naively generated strings do is waste attack time ... and inflate the scary size of the compilation 🙄)
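For illustration, most of the categories above boil down to simple pattern filters. A minimal sketch (my own rough patterns, not the exact rules behind the counts in this post):

```python
import re

# Rough filters for a few of the junk categories listed above
# (illustrative only; the real counts used stricter rules).
JUNK_PATTERNS = [
    ("md5-style 32-hex", re.compile(r"^[0-9a-f]{32}$")),
    ("sha1-style 40-hex", re.compile(r"^[0-9a-f]{40}$")),
    ("digits-only 8-11", re.compile(r"^[0-9]{8,11}$")),
    ("bcrypt", re.compile(r"^\$2[aby]?\$")),
]

def classify(candidate):
    """Return a junk label for a candidate string, or None if it might
    be a genuine human-generated password worth keeping."""
    if len(candidate) <= 6 or len(candidate) > 100:
        return "bruteforceable/oversized length"
    for label, pattern in JUNK_PATTERNS:
        if pattern.match(candidate):
            return label
    return None

for s in ["5f4dcc3b5aa765d61d8327deb882cf99", "12345678", "hunter2!"]:
    print(s, "->", classify(s) or "keep")
```

Run over a wordlist line by line, filters like these are how you separate the "net new, maybe useful" strings from hash dumps and brute-force filler.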

If you remove all of this junk (which is useless for directly cracking a human-generated password), all of the RockYou2021 mashup (which was itself similarly problematic), and all the founds already available on Hashmob (1.2B) ...

... you're left with only 190M strings that are "net new, maybe useful".

So if you're a pentester or other "normal" password cracker, you can probably just skip RockYou2024. It's only going to be useful if you're a completionist who's trying to crack other mashups (like the long tail of junk in the Pwned Passwords corpus, etc.)

[will update post as I find more non-trivial junk]

#PasswordCracking #RockYou2024

To my fellow #AI architects/trainers/data scientists reading this toot one day:

Which tool / format do you use to store your data, qualify it, and create your training set?

#artificialintelligence #BigData

Who are your favourite security researchers, journalists, etc. on Mastodon?
Really need to fill up my feed :)

8/26/91

Linus Torvalds

How to find specific things with grep

find what looks like an IP Address

egrep -o "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b"

find what looks like an email address

egrep -o "\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}"

find what looks like a URL

egrep -o "(http|https)://[a-zA-Z0-9./?=_-]*"
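A quick way to sanity-check these patterns (sample data invented for the demo):

```shell
# One line containing an email, an IP address, and a URL.
sample='Contact admin@example.com via 192.168.0.1 or https://example.com/login'

echo "$sample" | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"  # 192.168.0.1
echo "$sample" | egrep -o "\w+([._-]\w+)*@\w+([._-]\w+)*\.\w{2,4}"          # admin@example.com
echo "$sample" | egrep -o "(http|https)://[a-zA-Z0-9./?=_-]*"               # https://example.com/login
```

Note the `-o` flag: it prints only the matching part of each line, one match per line, which is what makes these usable for extraction rather than just filtering.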

→ corrections/contributions welcome.

#Linux #bash #grep #regex

@ parrotos Just no. Don't. I know it's supposed to make things clearer, and Ctrl+O as "open a file" seems logical... but for more than 10 years it has been "save a file". You can't change that now, it's toooooo late...
If I find you ...