Joe Hawley

@mamador
6 Followers
48 Following
35 Posts

🧅 Tor is a building block for a free internet that protects your privacy. As a nonprofit, we rely on donations to power our tools that are trusted by millions. These tools are always free to use, without collecting, selling, trading, or renting any of your data for profit.

Donate today to keep Tor strong and build a better internet for tomorrow. Every donation will be matched by our supporters, Power Up Privacy. This means a $25 donation will have a $50 impact.

https://torproject.org/donate/donate-md-yec2025

Video Games The Ultimate Gaming Magazine Issue 65 (June 1994) : video game the ultimate gaming magazine : Free Download, Borrow, and Streaming : Internet Archive

video games the ultimate gaming magazine volume issue 65 june 1994.

Internet Archive
Should You Buy Roku Stock Right Now?

The Motley Fool

Remember Aaron♥️

Fuck #Meta 

@mintyfresh Thanks! I had a very handy grandfather growing up who taught me basic construction. A next-door neighbor taught me welding (hence the $100,000 in high-end welding, CNC plasma cutting, and machine tools in the barn!), and of course college for my electrical engineering. As a kid I just soaked it all up!

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Zachary S. Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan
https://arxiv.org/abs/2409.11363 https://arxiv.org/pdf/2409.11363 https://arxiv.org/html/2409.11363

arXiv:2409.11363v1 Announce Type: new
Abstract: AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.

arXiv.org
DEF CON 32 - Disenshittify or die! How hackers can seize the means of computation - Cory Doctorow

The enshittification of the internet wasn't inevitable. The old, good internet gave way to the enshitternet because we let our bosses enshittify it. We took ...

YouTube

If you installed the latest macOS 15.0 Sequoia on your primary Mac for work, how’s it going?

I'm particularly interested in the experiences of those working in computational and/or quantitative sciences, but the more input the better!

To learn more about the new OS: https://apple.com/macos/macos-sequoia/

#Apple #computer #Mac #OS #macOS #Sequoia #software #work #productivity #R #Jamovi #stats #cogSci #compSci #psychology #academia #higherEd #edu

It's great! (59.5%)
The bugs are *just* tolerable. (2.7%)
I regret installing it. (8.1%)
I haven't installed it. (29.7%)
Poll ended at .
macOS Sequoia

macOS Sequoia brings effortless window tiling, web browsing with fewer distractions, new iPhone Mirroring, and support for Apple Intelligence.

Apple

How we like to do science

#OpenScience

While I recognize this as a joke, Galileo, Newton, and Darwin/Wallace were all probably working on ideas as creative as Computing Ponds. So were Barry Marshall (who discovered that H. pylori causes ulcers and stomach cancer) and Katalin Karikó (who labored in obscurity on the work that led to the mRNA COVID vaccines and the almost limitless potential of mRNA medicine). Arguably in ecology, Lindeman and Odum (founders of ecosystem ecology), the original Brown & Maurer macroecology papers, and probably others were out in that zone too. I don't know the details, but I bet the people who suggested that an obscure relative of fungi was causing a global collapse of frogs were out on their own for a while too.

Society in general, and countries whose funding agencies only fund projects rather than researchers (e.g., NSF) in particular, have lost something by not sincerely pursuing funding of bold, out-there science. From your description, it certainly sounds like Stafford Beer had earned some rope.

Barry Marshall - Wikipedia