Joe Hawley

@mamador
6 Followers
48 Following
35 Posts

🧅 Tor is a building block for a free internet that protects your privacy. As a nonprofit, we rely on donations to power our tools that are trusted by millions. These tools are always free to use, without collecting, selling, trading, or renting any of your data for profit.

Donate today to keep Tor strong and build a better internet for tomorrow. Every donation will be matched by our supporters, Power Up Privacy. This means a $25 donation will have a $50 impact.

https://torproject.org/donate/donate-md-yec2025

Video Games The Ultimate Gaming Magazine Issue 65 (June 1994) : video game the ultimate gaming magazine : Free Download, Borrow, and Streaming : Internet Archive

video games the ultimate gaming magazine volume issue 65 june 1994.

Internet Archive
Should You Buy Roku Stock Right Now?

The Motley Fool

Remember Aaron♥️

Fuck #Meta 

@mintyfresh Thanks! I had a very handy grandfather growing up who taught me basic construction. A next-door neighbor taught me welding (hence the $100,000 in high-end welding, CNC plasma cutting, and machine tools in the barn!), and of course college for my electrical engineering. As a kid I just soaked it all up!

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Zachary S. Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan
https://arxiv.org/abs/2409.11363 https://arxiv.org/pdf/2409.11363 https://arxiv.org/html/2409.11363

arXiv:2409.11363v1 Announce Type: new
Abstract: AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.

arXiv.org
DEF CON 32 - Disenshittify or die! How hackers can seize the means of computation - Cory Doctorow

The enshittification of the internet wasn't inevitable. The old, good internet gave way to the enshitternet because we let our bosses enshittify it. We took ...

YouTube

If you installed the latest macOS 15.0 Sequoia on your primary Mac for work, how’s it going?

I'm particularly interested in the experiences of those working in computational and/or quantitative sciences, but the more input the better!

To learn more about the new OS: https://apple.com/macos/macos-sequoia/

#Apple #computer #Mac #OS #macOS #Sequoia #software #work #productivity #R #Jamovi #stats #cogSci #compSci #psychology #academia #higherEd #edu

It's great! (59.5%)
The bugs are *just* tolerable. (2.7%)
I regret installing it. (8.1%)
I haven't installed it. (29.7%)
Poll ended at .
macOS Sequoia

macOS Sequoia brings effortless window tiling, web browsing with fewer distractions, new iPhone Mirroring, and support for Apple Intelligence.

Apple

How we like to do science

#OpenScience

While I recognize this as a joke, Galileo, Newton, and Darwin/Wallace were all probably working on ideas as creative as Computing Ponds. So were Barry Marshall (who discovered that H. pylori causes ulcers and stomach cancer) and Katalin Karikó (who labored in obscurity on the work that led to the mRNA COVID vaccines and the almost limitless potential of mRNA medicine). Arguably in ecology, Lindeman and Odum (founders of ecosystem ecology), the original Brown & Maurer macroecology papers, and probably others were out in that zone too. I don't know the details, but I bet the people who suggested that an obscure relative of fungi was causing a global collapse of frogs were out on their own for a while too.

Society in general, and countries whose funding agencies only fund projects rather than researchers (e.g., NSF) in particular, have lost something by not sincerely pursuing funding of bold, out-there science. From your description, it certainly sounds like Stafford Beer had earned some rope.

Barry Marshall - Wikipedia