
Cofounder at Phaselab (https://phaselab.co)

Former product/exec at Cortico, Speakable, Chartbeat. Board member at @wxyc. Spent some time at WFMU, Internet Archive. Into thru-hiking.

Website: https://dvd.qa
Twitter: https://twitter.com/dvd
Location: Chapel Hill, NC
getting angry/blind every time someone plays coldplay on wxyc. why???
more excited than i should be that /s/typo/typofix actually fixes your typo in Slack now. how long has that been a thing?!

What is wrong with these people?!

"The OAG sent an inquiry letter to a national cremation services company, after receiving a complaint from a Connecticut resident who received an advertisement in the mail for cremation services after recently completing chemotherapy."

From the just released CT AG report on their first enforcement actions under the new state data privacy law: https://portal.ct.gov/-/media/AG/Press_Releases/2024/CTDPA-Final-Report.pdf

"A spokeswoman for Intuit, Tania Mercado, criticized the direct file project as a “half-baked solution” and a waste of taxpayer money. “The direct file scheme is a solution in search of a problem,” she said. Intuit makes the TurboTax tax preparation software."

Oooookay Tania...

https://www.nytimes.com/2024/01/05/your-money/irs-tax-filing-free-online.html

I.R.S. to Begin Trial of Its Own Free Tax-Filing System

Residents of 12 states are eligible to participate if they meet certain criteria. But the agency’s plans have already met resistance from tax preparation companies.

The New York Times
'tis the season (to listen to pentangle all day)

This email from the CEO of Carta is a crazy self-own. No I didn't hear about the upsetting allegations against your company, but I guess I'll just delete this email and move on, as instructed?

To amplify one of my colleagues, "anything less than an apology is insane".

As an aside, simple watermarks have been used successfully in the past... Rap Genius alternated straight/curly apostrophes to catch Google scraping their song lyrics (though they ended up losing in court):

https://www.pcmag.com/news/genius-we-caught-google-red-handed-stealing-lyrics-data

Genius: We Caught Google 'Red Handed' Stealing Lyrics Data

Genius secretly watermarks certain songs with patterns of apostrophes, some of which translate to 'red-handed' in Morse code. Google blames it on third-party licensing partners.

PCMag
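
For anyone curious how an apostrophe watermark like that might work, here's a minimal Python sketch. To be clear, this is not Genius's actual code: the function names `embed` and `extract`, the sample lyric, and the choice of straight = dot, curly = dash are all assumptions for illustration.

```python
STRAIGHT = "'"    # U+0027, read here as a Morse dot (assumed mapping)
CURLY = "\u2019"  # U+2019, read here as a Morse dash (assumed mapping)


def embed(text: str, pattern: str) -> str:
    """Rewrite every apostrophe in `text` to follow `pattern`.

    `pattern` is a string of '.' and '-' and repeats if the text
    contains more apostrophes than the pattern has symbols.
    """
    out = []
    seen = 0  # number of apostrophes rewritten so far
    for ch in text:
        if ch in (STRAIGHT, CURLY):
            symbol = pattern[seen % len(pattern)]
            out.append(STRAIGHT if symbol == "." else CURLY)
            seen += 1
        else:
            out.append(ch)
    return "".join(out)


def extract(text: str) -> str:
    """Read the apostrophes in `text` back as a dot/dash string."""
    return "".join(
        "." if ch == STRAIGHT else "-"
        for ch in text
        if ch in (STRAIGHT, CURLY)
    )


# Hypothetical lyric with four apostrophes.
lyrics = "I'm sure it's fine, don't worry, we'll see"
marked = embed(lyrics, ".-.")
print(extract(marked))
```

The marked text reads the same to a person, but a copy that preserves the apostrophe pattern carries the mark with it. It's also trivially removable by normalizing apostrophes, which is the usual weakness of simple watermarks.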

There's so much interesting work happening on layering some form of rights protection into datasets, either to track their usage in LLMs or, in some cases, to spoil the model altogether. Two I learned about recently:

Easymark for text-based watermarking: https://arxiv.org/abs/2310.08920

Nightshade for images: https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/

Embarrassingly Simple Text Watermarks

We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject a watermark without changing the meaning of the text at all while a validator can detect if a text was generated from a system that adopted Easymark or not with high credibility. Easymark is extremely easy to implement so that it only requires a few lines of code. Easymark does not require access to LLMs, so it can be implemented on the user-side when the LLM providers do not offer watermarked LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than the state-of-the-art text watermarking methods. We also prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This theorem shows that no matter how sophisticated a watermark is, a malicious user could remove it from the text, which motivate us to use a simple watermark such as Easymark. We carry out experiments with LLM-generated texts and confirm that Easymark can be detected reliably without any degradation of BLEU and perplexity, and outperform state-of-the-art watermarks in terms of both quality and reliability.

arXiv.org

I can attribute about 80% of my success in life to one simple rule:

When someone asks you a random question, give them a random answer.

Large majorities of both Democrats and Republicans believe that there should be more regulation of what companies do with personal information.

We need comprehensive federal privacy legislation.