Marin T. Kael

@marintkael
Independent researcher & pseudonymous high-fantasy author. I measure how AI answer engines find, cite, and hallucinate brand-new entities — starting with myself. “Zero to Cited in Six Days.”
Website: https://marin-t-kael.de
Paper (DOI): https://doi.org/10.5281/zenodo.20549020
Code: https://github.com/marintkael/marin-research-tools
ORCID: https://orcid.org/0009-0006-2105-8190

Two findings I keep coming back to:

Structure moved the needle; reach didn't. The citation breakthrough landed BEFORE a 23× Reddit-karma jump (12→281) that gave zero citation lift. Wikidata/DOIs/site = AI-visible; virality = human readers.

Cloudflare also 403'd every AI crawler on 22 of 23 days (silent opt-out default). The entity went AI-visible anyway, through off-site sources, but with crawlers blocked I can say nothing about llms.txt.
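A minimal sketch of how a 403 block of this kind could be spot-checked, not the study's own tooling. The user-agent tokens are simplified stand-ins for the longer strings real crawlers send, the function names are hypothetical, and a block that depends on more than the user-agent header (for example verified bot IP ranges) would not be reproduced by a request from your own machine.

```python
import urllib.error
import urllib.request

# Assumption: simplified user-agent tokens; real crawlers send longer strings.
AI_CRAWLER_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def http_status(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return error.code


def blocked_agents(url, agents=AI_CRAWLER_AGENTS, fetch=http_status):
    """Return the agents that receive HTTP 403 for url.

    fetch is injectable so the logic can be exercised without network access.
    """
    return [agent for agent in agents if fetch(url, agent) == 403]
```

Calling `blocked_agents("https://marin-t-kael.de")` once a day and logging the result would give a per-day record of which agents were refused, under the caveats above.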

Report: https://marin-t-kael.de/en/research/zero-to-cited?utm_source=mastodon
ORCID: 0009-0006-2105-8190

5/5

Zero to Cited in Six Days — A 23-Day Measurement of AI Visibility · Marin T. Kael

A controlled, pre-registered single-subject study: a brand-new pseudonymous author entity, measured daily across five LLM surfaces and eleven channels, became AI-visible in six days — while Cloudflare blocked every AI crawler with HTTP 403 for 22 of 23 days. Six findings on how (and how fast) a net-new entity enters the AI knowledge layer, with open data.

Marin T. Kael

Headline results (precision = correct:hallucinated, 7-day window):

• First correct citation at T+6. Google KG entry at T+4.
• OpenAI GPT-5.4 web 4.7:1; Search-API 4.0:1; older GPT-5.2 web 1.9:1. Gemini 0.47:1 — net-negative, ~2× wrong-to-right.
• Mechanism is retrieval-source divergence, not model "smartness": OpenAI grounds on the entity's own domain 119×; Gemini grounds 0× and pulls it only from Reddit (17/17).
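A minimal sketch of how a correct:hallucinated ratio over a 7-day window could be computed from datapoints scored +1 (grounded), 0 (not found), or −1 (hallucinated). The row layout, the function name, and the sample values are assumptions for illustration; they are not taken from the published dataset or code.

```python
from collections import Counter
from datetime import date, timedelta


def precision_ratio(rows, end_day, window_days=7):
    """Return correct/hallucinated over the window ending on end_day.

    rows: iterable of (day, score) pairs with score in {+1, 0, -1}.
    Returns None when the window holds no hallucinated datapoints,
    because the ratio is undefined in that case.
    """
    start = end_day - timedelta(days=window_days - 1)
    counts = Counter(score for day, score in rows if start <= day <= end_day)
    correct, hallucinated = counts[1], counts[-1]
    if hallucinated == 0:
        return None
    return correct / hallucinated


# Made-up sample: 8 correct, 4 hallucinated, 5 not found inside the window,
# plus one hallucinated row that falls before the window and is ignored.
sample_rows = (
    [(date(2025, 1, 9), 1)] * 8
    + [(date(2025, 1, 8), -1)] * 4
    + [(date(2025, 1, 7), 0)] * 5
    + [(date(2025, 1, 3), -1)]
)
print(precision_ratio(sample_rows, end_day=date(2025, 1, 10)))  # 2.0
```

A ratio below 1, such as the 0.47:1 reported for Gemini, means hallucinated datapoints outnumber correct ones in the window.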

4/5

What the n=1 design is actually good at: refuting naive measurement.

A primary-vs-control channel caught echo bias. The instrumentation flagged a fabricated "Wikipedia" attribution 24× — for an entity with no Wikipedia page. Entity-collision controls handled the MARIN / Maritime Research Institute disambiguation.
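A minimal sketch of the kind of check that could flag a fabricated attribution: it reports sources an answer names even though the entity has no presence there. The function name and the source lists are hypothetical; the actual instrumentation may work differently.

```python
import re


def flag_fabricated_sources(answer_text, known_sources, watched_sources):
    """Return watched source names cited in answer_text that are not known.

    known_sources: places where the entity really has a record.
    watched_sources: source names to look for in model answers.
    """
    flagged = []
    for name in sorted(watched_sources):
        # Whole-word, case-insensitive match on the source name.
        mentioned = re.search(r"\b" + re.escape(name) + r"\b", answer_text, re.IGNORECASE)
        if mentioned and name not in known_sources:
            flagged.append(name)
    return flagged


# Illustrative values only.
answer = "According to Wikipedia and Wikidata, Marin T. Kael is a fantasy author."
known = {"Wikidata", "ORCID", "Zenodo"}
watched = {"Wikipedia", "Wikidata", "Reddit"}
print(flag_fabricated_sources(answer, known, watched))  # ['Wikipedia']
```

Counting such flags per answer and per day is one way a tally like the 24 fabricated "Wikipedia" attributions could be produced.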

If you measure AI citation naively, you measure your own contamination. That's the methodological point.

3/5

Method first, because that's the part that travels.

Everything is open and reproducible:
• Code (MIT): https://github.com/marintkael/marin-research-tools?utm_source=mastodon
• Dataset (16k datapoints): https://huggingface.co/datasets/marintkael/ai-citation-fidelity?utm_source=mastodon
• Public failure log + pre-registration

n=1 with investigator=subject is a real limitation. The mitigation is pre-registration and publishing the failures, so the design is auditable even where the sample isn't generalisable.

2/5

GitHub - marintkael/marin-research-tools: Open methodology tooling for the Marin T. Kael Research Programme — KI-Zitations-Feldlabor (AI Citation Behavior Lab). Public companion to https://marin-t-kael.de/research.

GitHub

New preprint, and it's a measurement, not a success story.

"Zero to Cited in Six Days." I built a brand-new pseudonymous author entity (me, no prior web presence, no published work) and watched 5 web-grounded LLM surfaces to see when — and whether — they would cite it. Pre-registered, single-subject (n=1), 23 days, ~16,000 scored datapoints (+1 grounded / 0 not-found / −1 hallucinated).

DOI (EN+DE, CC-BY): https://doi.org/10.5281/zenodo.20549020?utm_source=mastodon

#NLP #LLM #AISearch

🧵 1/5

Zero to Cited in Six Days: How a New Author Entity Became AI-Visible While Every AI Crawler Was Blocked

Bilingual research report (English + German PDFs included in this record).

We created a brand-new pseudonymous author identity from scratch (no published work, no reviews, no press) and measured daily, across five web-grounded LLM surfaces and eleven observation channels, when and how it became visible to AI answer engines. We report five results that the prevailing answer-engine / generative-engine optimization (AEO/GEO) playbook does not predict.

(i) Speed: the entity was correctly cited by a web-augmented frontier model within six days, and entered the Google Knowledge Graph by day four.
(ii) A locked door that did not matter: for 22 of 23 days, Cloudflare blocked every AI crawler with HTTP 403 (its opt-out default), yet the entity became visible anyway. This isolates knowledge-graph grounding and inference-time retrieval, not on-site crawling, as the operative mechanism.
(iii) A provider chasm, not a capability ladder: whether a model cites or hallucinates is decided by provider architecture, with precision ranging from 4.7:1 (correct:hallucinated) for an OpenAI frontier web model to net-negative for Gemini and Claude; newer model generations cite more reliably.
(iv) Where the entity lands, the description is complete and source-grounded.
(v) Structured identity moves the needle; social reach does not: a 23x Reddit-karma build produced no citation lift.

The study is single-subject, pre-registered, and fully open-data. It is offered not as a success story but as a controlled measurement of a layer mostly sold on anecdote.

German version (translated): From Zero to Cited in Six Days: How a New Author Identity Became AI-Visible While Every AI Crawler Was Blocked.

We created a brand-new pseudonymous author identity from scratch (no published work, no reviews, no press) and measured daily, across five web-grounded language-model surfaces and eleven observation channels, when and how it became visible to AI answer engines.

Five results that the prevailing AEO/GEO playbook does not predict:
(i) Speed: correctly cited within six days, in the Google Knowledge Graph after four.
(ii) A locked door that did not matter: on 22 of 23 days, Cloudflare rejected every AI crawler with HTTP 403, yet the entity became visible anyway (knowledge graph plus answer-time grounding, not crawling).
(iii) A gap between providers, not a capability ranking: precision from 4.7:1 (OpenAI frontier) to net-negative (Gemini, Claude); newer model generations cite more reliably.
(iv) Where the entity lands, the description is complete and source-grounded.
(v) Structured identity moves the needle; social reach does not (23x karma build, zero citation lift).

Single case, pre-registered, fully open-data: a measurement, not a success story. This record contains the English and the German versions.

Zenodo