| Home page | https://victorzhong.com |
Our work on Reading to Learn, along with terrific work from
@anima_anandkumar and
Karthik Narasimhan, was recently featured in @QuantaMagazine!
https://quantamagazine.org/machines-learn-better-if-we-teach-them-the-basics-20230201/
By far the most professional interactions I've had with a news org - a lot of work put into fact checking+editing.
New paper 🚨
Can we rely solely on LLMs' memories (eg replace search with ChatGPT)? Probably not.
Is retrieval a silver bullet? Probably not either.
Our analysis reveals that LLMs' memorization is still limited, and scaling won't help much on long-tail distributions.
We show that adaptively incorporating non-parametric memories (eg retrieved chunks) can improve both performance and efficiency.
📜 http://tinyurl.com/2sdeuupn 💻 http://github.com/AlexTMallen/adaptive-retrieval
#PaperThread #newpaper
[1/N]
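The "adaptive" part of the thread above can be sketched in a few lines: retrieve only when parametric memory is likely to fail (eg on long-tail questions), and trust the LM alone otherwise. This is a minimal illustration, not the paper's implementation; the `popularity` signal, the threshold value, and the helper names are all assumptions.

```python
def answer(question, popularity, lm_answer, retrieve_and_answer, threshold=100):
    """Adaptive retrieval sketch: use the LM's parametric memory for
    popular (head) entities, where memorization tends to work, and fall
    back to retrieval augmentation on the long tail, where it doesn't.

    `popularity` is a stand-in for any per-question signal of how well
    the LM is likely to have memorized the answer (hypothetical here).
    """
    if popularity >= threshold:
        return lm_answer(question)           # head: memory likely suffices
    return retrieve_and_answer(question)     # tail: retrieve supporting chunks
```

Skipping retrieval on head questions is also where the efficiency claim comes from: no retrieval call, shorter prompts.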
"Using #Apple's CarPlay system slowed drivers' reaction times nearly five times as much as driving with a blood-alcohol level of 0.08 — but CarPlay is legal on U.S. vehicles, even as U.S. regulators spend millions on anti-distracted driving campaigns to politely request drivers not use it."
ADDED: Link to original study https://iamwebsite.blob.core.windows.net/media/docs/default-source/default-document-library/iam-roadsmart-trl-simulator-study_infotainment.pdf
Users engaged with natural language systems can provide feedback in realtime, and this feedback is a super duper learning signal! So: deploy, train, repeat!
https://arxiv.org/abs/2212.09710
Last PhD paper w/@alsuhr/[email protected] ... 🧵
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
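The abstract's core move is treating realtime binary user feedback as an immediate bandit reward. Here is a toy epsilon-greedy contextual bandit in that spirit; it is a hedged sketch under my own assumptions (tabular value estimates, epsilon-greedy exploration), not the paper's actual learning algorithm or agent.

```python
import random
from collections import defaultdict

class BinaryFeedbackBandit:
    """Toy contextual bandit: the user's +1/-1 feedback on an action
    taken in a given context is used directly as an immediate reward."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.value = defaultdict(float)  # (context, action) -> running mean reward
        self.count = defaultdict(int)    # (context, action) -> observations

    def act(self, context):
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[(context, a)])

    def learn(self, context, action, feedback):
        # Incremental mean of the binary feedback signal for this pair.
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (feedback - self.value[key]) / self.count[key]
```

The deploy-train-repeat loop from the tweet then amounts to: `act` while following a user's instruction, collect their thumbs-up/down as `feedback`, call `learn`, redeploy.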
Kind of frustrated with the state of Mastodon as an open source project... it seems pretty ad hoc and non-transparent process-wise. Please comment on / upvote this discussion if you care about this:
There’s a lot of conversation around the #TwitterFiles. Here’s my take, and thoughts on how to fix the issues identified. I’ll start with the principles I’ve come to believe…based on everything I’ve learned and experienced through my past actions as a Twitter co-founder and lead:
- Social media must be resilient to corporate and government control.
- Only the original author may remove content they produce.
- Moderation is best implemented by algorithmic choice.
The Twitter when I led it and the Twitter …