Andrew Drozdov

330 Followers
464 Following
774 Posts

PhD student at UMass Amherst. Prev: Google, IBM, eBay, Datadog, Okta

#ComputerScience #MachineLearning #ml #NLProc #nobot

personal site: https://mrdrozdov.github.io
umass: https://people.cs.umass.edu/~adrozdov
twitter: https://twitter.com/mrdrozdov

Do we think about vector storage completely wrong?

For search, we care about increasing the initial recall set, which can often contain hundreds or thousands of candidates.

Then we want to improve precision over that set with a smarter model that can combine many signals (BM25, numeric features of the data, cosine similarity, etc.).

So why do we obsess over perfect top-10 accuracy for vector DBs? Doesn't building to those benchmarks have externalities, producing data structures that are hard to update and manage?
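
A minimal sketch of this recall-then-rerank split, assuming NumPy, brute-force cosine similarity as a stand-in for an approximate index, and a hypothetical linear scorer in place of a learned reranker:

```python
import numpy as np

def retrieve_candidates(query_vec, index_vecs, k=1000):
    # Stage 1: recall-oriented lookup. Brute-force cosine similarity here;
    # a real system would use an approximate index. All that matters at
    # this stage is that the true answers land somewhere in the top-k.
    sims = index_vecs @ query_vec / (
        np.linalg.norm(index_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def rerank(candidate_ids, features, weights):
    # Stage 2: precision-oriented rerank. `features` holds one row per
    # document (e.g. BM25, cosine similarity, numeric metadata); a learned
    # model would replace this hypothetical linear scorer.
    scores = features[candidate_ids] @ weights
    return candidate_ids[np.argsort(-scores)][:10]

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 64))   # hypothetical corpus embeddings
feats = rng.normal(size=(10_000, 3))   # e.g. [bm25, cosine_sim, freshness]
cands = retrieve_candidates(rng.normal(size=64), docs, k=1000)
top10 = rerank(cands, feats, weights=np.array([0.5, 0.4, 0.1]))
```

The point of the split: the stage-1 index only needs decent recall at k=1000, a far looser target than exact top-10 accuracy.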

Mastodon is so complicated. Those 2 million active users must be the smartest people in the world.

Look at this ridiculous sweetheart.

#birds #birdphotography #birdwatching

In the age of Mastodon, a "following" score is woefully insufficient.

On Mastodon, your feed is who you follow. We need more guidance for finding people who do a good job of both posting and boosting.

I propose a "taste" rating that is a function of everything a user has posted and boosted. A "personalized taste" score would also show how a profile's taste compares with your own; perhaps you want to find people who are both similar and complementary to yourself.

User Taste > Post Recommendations
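
The post doesn't pin down a formula, so here is one speculative realization: pool content embeddings into a taste vector, then compare vectors with cosine similarity. The pooling scheme and boost_weight knob are assumptions, not a worked-out proposal:

```python
import numpy as np

def taste_vector(post_embs, boost_embs, boost_weight=0.5):
    # Pool everything a user has posted and boosted into one vector.
    # boost_weight is a made-up knob for how much boosts count vs. posts.
    return np.mean(post_embs, axis=0) + boost_weight * np.mean(boost_embs, axis=0)

def personalized_taste(mine, theirs):
    # Cosine similarity between two taste vectors: high values suggest a
    # similar profile; lower values may indicate a complementary one.
    return float(mine @ theirs /
                 (np.linalg.norm(mine) * np.linalg.norm(theirs) + 1e-9))
```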

Here's our latest German Word of the Week:
Sometimes I wonder if BuzzFeed could have made the Twitter alternative we deserve.
cemantle - Find the secret word!

Find the secret word by trying to get as semantically close as possible.

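For reference, the game's core loop is easy to sketch with word embeddings. The toy vocabulary and secret word below are invented for illustration; the real game presumably uses pretrained word vectors:

```python
import numpy as np

# Toy 2-d embeddings for illustration only; a real version would load
# pretrained vectors (e.g. word2vec or fastText).
vocab = {
    "tree":   np.array([0.9, 0.1]),
    "forest": np.array([0.8, 0.3]),
    "car":    np.array([0.1, 0.9]),
}

def closeness(guess, secret="forest"):
    # Cosine similarity to the secret word is the only feedback the
    # player gets, so winning means navigating embedding space.
    g, s = vocab[guess], vocab[secret]
    return float(g @ s / (np.linalg.norm(g) * np.linalg.norm(s)))

print(closeness("tree"))  # high: semantically close to the secret
print(closeness("car"))   # low: semantically far
```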

RT @gneubig
I had to travel 26 hours and spend $2000+ to join #ICLR2023 in Rwanda.
But people in Africa have to do this every time a conference is held in the US.

What happens when we make it easier to participate?

1530% higher registrations from Africa.

This is important and must continue.

RT @yoavgo
the amount of chatter and speculation based on a "leaked" document by a random person who works for google is kinda amazing.

RT @kalpeshk2011
Happy to share that LongEval received an outstanding paper award 🏆 at #EACL2023! Thanks to @eaclmeeting and our reviewers for the support!

Interested in improving human evaluation of long text? See our paper, code and thread 👇

https://arxiv.org/abs/2301.13298
https://github.com/martiansideofthemoon/longeval-summarization
https://twitter.com/kalpeshk2011/status/1620781282044297216

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

While human evaluation remains best practice for accurately judging the faithfulness of automatically generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers on long-form summarization, we first shed light on current human evaluation practices surrounding long-form summaries. We find that 73% of these papers do not perform any human evaluation on model-generated summaries, while other works face new difficulties that manifest when dealing with long documents (e.g., low inter-annotator agreement). Motivated by our survey, we present LongEval, a set of guidelines for human evaluation of faithfulness in long-form summaries that addresses the following challenges: (1) How can we achieve high inter-annotator agreement on faithfulness scores? (2) How can we minimize annotator workload while maintaining accurate faithfulness scores? and (3) Do humans benefit from automated alignment between summary and source snippets? We deploy LongEval in annotation studies on two long-form summarization datasets in different domains (SQuALITY and PubMed), and we find that switching to a finer granularity of judgment (e.g., clause-level) reduces inter-annotator variance in faithfulness scores (e.g., std-dev from 18.5 to 6.8). We also show that scores from a partial annotation of fine-grained units highly correlate with scores from a full annotation workload (0.89 Kendall's tau using 50% of the judgments). We release our human judgments, annotation templates, and our software as a Python library for future research.
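
To make the partial-annotation result concrete, here is a toy version of that comparison on synthetic judgments (the 0.89 tau figure comes from the paper's real data, not from this sketch):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

# Synthetic clause-level faithfulness judgments: 20 summaries, 10
# fine-grained units each (1 = faithful, 0 = unfaithful).
judgments = rng.integers(0, 2, size=(20, 10))

# Score each summary from the full workload vs. from annotating only a
# random 50% of its fine-grained units.
full_scores = judgments.mean(axis=1)
subset = rng.choice(10, size=5, replace=False)
partial_scores = judgments[:, subset].mean(axis=1)

tau, _ = kendalltau(full_scores, partial_scores)
print(f"Kendall's tau, partial vs. full: {tau:.2f}")
```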
