This unicode fuckery isn't just for domain names: I've had assholes list my books for sale on Amazon and Apple Books under an account that uses a unicode glyph to replace one of the letters in my name—the title shows up in search, but any payments for sales go into the grifter's account.
https://chaos.social/@Emathion/114613267697396447
@cstross It's stunning that Amazon is too unwilling or too lazy to implement some simple filtering based on name and title similarities. The necessary code shouldn't take even a day to write.
@tsturm First ask yourself how many books have the same title. (Answer: lots.) Second, ask yourself how many authors have similar names. (Obligatory shout-out to tech author Randall Stross at this point.) Now ask yourself what lawsuits AMZN would open itself up for by auto-banning authors or book titles …

@cstross Well, not banning, just running any vague matches of author+title involving unusual unicode ranges through a manual queue during title setup.

Amazon is a big company, pretty sure they could handle this pretty smoothly IF they wanted to.
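
A minimal sketch of the review-queue idea above, not anything Amazon is known to run. It assumes a hypothetical `KNOWN_AUTHORS` catalogue and an arbitrary similarity threshold of 0.85, and it only queues a new author name for a human when the name contains non-ASCII characters and is a near match (but not an exact match) for an existing name. Two plain-ASCII authors with similar names pass straight through.

```python
import difflib

# Hypothetical catalogue of existing author names (an assumption for this sketch).
KNOWN_AUTHORS = ["Charles Stross", "Randall Stross"]

def needs_manual_review(new_name, known=KNOWN_AUTHORS, threshold=0.85):
    """Return True if new_name should go to a human review queue.

    Only names containing non-ASCII characters are considered, and only
    when they are close to, but not identical to, an existing name.
    """
    if new_name.isascii():
        return False  # plain ASCII names are never queued by this check
    for existing in known:
        if new_name == existing:
            continue  # an exact duplicate is a different problem
        ratio = difflib.SequenceMatcher(
            None, new_name.casefold(), existing.casefold()
        ).ratio()
        if ratio >= threshold:
            return True
    return False

# "Ch\u0430rles" uses CYRILLIC SMALL LETTER A in place of the Latin "a".
print(needs_manual_review("Ch\u0430rles Stross"))
print(needs_manual_review("Randall Stross"))
```

A check like this will also queue legitimate names that differ from an existing one only by an accent, which is why it feeds a queue rather than a ban.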

@tsturm @cstross My guess is that the problem is volume.

Not only would they need enough staff to process those manual queues (in multiple languages), but those staff would have to understand the material well enough to know which author is real but similar to some other author and which is a grifter fake.

Bearing in mind that Goodreads relies on a lot of volunteer librarians, I'd be surprised if it were practical at scale.

Of course, that *shouldn't* be an excuse, but enshittification…

@cstross @tsturm Checking for a match on covers isn't exactly outside their budget these days though - just not something you can always take automatic action in response to
@flippac @tsturm The particularly pernicious trick the scammer used was to add a big gold medallion saying 50% OFF in the middle of the cover. Which might well disrupt a simple image match.

@cstross @tsturm OTOH they could auto-ban names that use a suspicious mix of specific characters from different unicode blocks, as defined by the unicode consortium itself

unicode.org/reports/tr39/

(there are libraries that do all of the dirty work for you)

if they allowed for a manual override (after reasonable checks) for that one author who really wants to sell a book titled “don't go to aⅿazon.com” I'd think it would be a pretty reasonable restriction

UTS #39: Unicode Security Mechanisms
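
A rough illustration of the mixed-script idea, using only Python's standard library. This is not the UTS #39 algorithm: the standard library does not expose the Unicode Script property, so the sketch guesses a letter's script from the first word of its character name, which is an assumption that only works for simple cases. It also flags names that change under compatibility normalization, which catches the "ⅿ" (a Roman numeral character) in the example above. The function names are made up for this sketch.

```python
import unicodedata

def scripts_used(text: str) -> set[str]:
    """Crude script guess: the first word of each letter's Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts

def looks_suspicious(name: str) -> bool:
    """Flag a name for manual review (not an automatic ban)."""
    mixed = len(scripts_used(name)) > 1
    # Compatibility characters such as U+217F normalize to ordinary letters
    # under NFKC but are left alone by NFC.
    compat = unicodedata.normalize("NFKC", name) != unicodedata.normalize("NFC", name)
    return mixed or compat

print(looks_suspicious("Charles Stross"))
print(looks_suspicious("Ch\u0430rles Stross"))   # Cyrillic "a"
print(looks_suspicious("a\u217fazon.com"))       # Roman numeral "m"
```

The name-prefix heuristic would wrongly flag legitimate mixtures such as Japanese names that combine kana and kanji, which is one reason to reach for a library built on the real UTS #39 data instead.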

@cstross @tsturm
Doesn't the issue you're experiencing also expose them to lawsuits?

@cstross

That’s terrible

It’s strange that Amazon allowed these accounts to sell the books at all; when I self-publish titles that either are, or have been, also published by traditional publishers, I have to jump through all kinds of hoops to prove to Amazon that I have the rights for the territory and format in question — sending them scans of my publishing contracts and letters of reversion. I’ve sometimes had to argue for *months* with multiple different Amazon employees following their opaque procedures to convince them that I’ve proved my case. So I don’t know what these grifters are doing to get their wholly fraudulent authorisation with such ease.

@gregeganSF I just point Hachette or Macmillan's enforcement people at the item in question and it goes away Real Fast. (All my books are still in print with one Big Five imprint or another, except for a web book from 1996 and a short story collection from 2001 that is superseded by a much better one.)
@cstross Surprised that their search even returns them. I suppose their match distance is very permissive.

@lispi314 @cstross Multiple different author accounts can have books with perfectly overlapping names. Thus, someone could abuse the system by making a “C. Stross” account and listing books with all the same titles. A search for “Invisible Sun Stross” would return both. Scummy search engine gaming tricks could result in the fake being ranked higher. Doesn’t even have to be the same book. They just throw the cover on some LLM garbage or whatever.

It’s difficult for automated systems to catch this reliably, and book marketplaces like Amazon are unwilling to hire enough people to review what the automated systems don’t catch.

@cstross
Well, yes.

And Unicode has a number of even more advanced fun topics, like optional decomposition of diacritics. Yes, there are code points that serve as a suffix to add all kinds of marks to a character, so à and ä each have two Unicode representations and thus also two UTF-8 encodings.

The macOS filesystem is one place that uses decomposed Unicode, so zip files whose filenames contain diacritics look fine there but can break when used on, say, Linux.
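
To make the two-representations point concrete, here is a small sketch using Python's standard `unicodedata` module. The helper name `same_name` is invented for this example. It shows that the precomposed and decomposed forms of ä are different strings with different UTF-8 bytes, and that normalizing both sides to one form (NFC here, chosen arbitrarily) before comparing makes them match.

```python
import unicodedata

composed = "\u00e4"      # ä as a single precomposed code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)            # False
print(composed.encode("utf-8"))          # b'\xc3\xa4'
print(decomposed.encode("utf-8"))        # b'a\xcc\x88'
print(same_name(composed, decomposed))   # True
```

Tools that compare filenames byte for byte, without a normalization step like this, are where the cross-platform breakage tends to show up.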