Mastodawn

Anthropic's Mythos AI Falls Short in Bug-Hunting Test

Anthropic's highly hyped Mythos AI failed to impress in a recent bug-hunting test against cURL's codebase, with results that were largely dismissed as overhyped marketing. The limited test, run by cURL developer Daniel Stenberg, revealed that Mythos fell short of expectations.

https://osintsights.com/anthropics-mythos-ai-falls-short-in-bug-hunting-test?utm_source=mastodon&utm_medium=social

#AiTesting #BugHunting #OpenSource #ProjectGlasswing #LinuxFoundation

Discover why Anthropic's Mythos AI falls short in bug-hunting tests and learn from Daniel Stenberg's experience - read the full story now and explore AI limitations.

OSINTSights

LBHuston May 5

Strong ethical justification throughout.

Evaluation Report: Qwen-3 1.7B in LMStudio on M1 Mac

I tested Qwen-3 1.7B in LMStudio 0.3.15 (Build 11) on an M1 Mac. Here are the ratings and findings: Final Grade: B+ Qwen-3 1.7B is a capable and well-balanced LLM that excels in clarity, ethics, an…

Not Quite Random

HackerNoon Apr 23

Most AI testing tools reset after every run. Learn how a unified data layer gives intelligent test automation the memory it needs to get smarter with every test https://hackernoon.com/your-ai-testing-tool-has-no-memory-heres-why-thats-a-problem #aitesting

Your AI Testing Tool Has No Memory: Here's Why That's a Problem | HackerNoon

Karakun Apr 15

Testing AI systems? The old rules no longer apply.

Non-deterministic systems cannot be assessed using deterministic methods. “Pass/Fail” is too narrow.

At #SwissTestingDay, Mike Mannion reflected on key themes:

➡️ probabilities over binary results
➡️ system behaviour over isolated outputs
➡️ testing as a strategic discipline

He also points to approaches like #PUnit for evaluating such systems.
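
The "probabilities over binary results" idea can be sketched in a few lines of Python. This is only an illustration of the general technique (repeat a check many times and assert on the observed pass rate), not PUnit's API. The names `pass_rate`, `meets_threshold`, and `make_flaky_system` are hypothetical, and the seeded random generator is an assumed stand-in for a real non-deterministic system.

```python
import random
from typing import Callable


def pass_rate(check: Callable[[], bool], runs: int) -> float:
    """Run a boolean check `runs` times and return the fraction that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check())
    return passed / runs


def meets_threshold(check: Callable[[], bool], runs: int, min_rate: float) -> bool:
    """Return True when the observed pass rate is at least `min_rate`."""
    return pass_rate(check, runs) >= min_rate


def make_flaky_system(success_probability: float, seed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a non-deterministic system under test.

    Each call succeeds with roughly `success_probability`; the seed only
    makes the illustration reproducible.
    """
    rng = random.Random(seed)
    return lambda: rng.random() < success_probability


if __name__ == "__main__":
    system = make_flaky_system(success_probability=0.9, seed=42)
    # Instead of one pass/fail verdict, report a rate and compare it to a bar.
    rate = pass_rate(system, runs=200)
    print(f"observed pass rate: {rate:.2f}")
    print("acceptable:", rate >= 0.8)
```

A single failing run does not fail the test here; only a pass rate below the chosen bar does. Picking the number of runs and the threshold is a judgment call that depends on how much variance the system shows.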

Conference insights: https://dev.karakun.com/2026/04/02/Swiss-Testing-Day.html

#AITesting #SoftwareEngineering #QualityEngineering

Markus Schlichting Apr 2

Very happy to share Mike’s conference report from Swiss Testing Day on the @Karakun #DevHub blog https://dev.karakun.com/2026/04/02/Swiss-Testing-Day.html

#testing #conference #probibalistic #aitesting

Swiss Testing Day 2026 – Reflections on Testing AI and Non-Deterministic Systems

Insights from Swiss Testing Day 2026 on AI testing, non-deterministic systems, and strategies for ensuring reliability in modern software engineering.

Karakun Developer Hub

kiteto Mar 26

The codeless testing market is booming. But most "no-code testing" tools are still record-and-playback with a fresh coat of paint.

The actual next step? Let the people who know the requirements describe tests in natural language — and have AI handle the automation.

That's what we're building with kiteto. Testing as a domain task, not a developer task.

https://www.kiteto.ai/?utm_source=mastodon.social&utm_medium=social&utm_campaign=progress_reports

#TestAutomation #CodelessTesting #AITesting #E2ETesting

kiteto | AI-Powered Test Automation from Natural Language

Transform text descriptions into automated E2E tests with kiteto. No coding required. Book a demo now and get Early Access.

kiteto Mar 4

AI makes you ship faster. It also makes your code buggier.

AI-generated code has 1.7× more issues than human-written code. (CodeRabbit, 2025)

E2E tests aren't a nice-to-have anymore. They're what makes the speed sustainable.

AI makes you ship faster — but only if the right tests have your back.

#AITesting #E2ETesting #QA

sayzard Mar 4

BOOTOSHI (@KingBootoshi)

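An account of an AI agent running tests and experiments directly against a codebase to uncover an undocumented feature. The author instructed Claude to carry out real experiments, and reports that the agent ran the verification itself instead of making assumptions and solved the problem. A practical example of putting agents to work.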

https://x.com/KingBootoshi/status/2028937773789659374

#agents #automation #claude #aitesting

BOOTOSHI 👑 (@KingBootoshi) on X

agents are AMAZING at experiments. docs didn't cover a feature i wanted. exa code/web search didn't cover it either. ai was left making assumptions. nah nah nah, NO assumptions. i told claude to go run actual tests and experiments against our codebase to figure it out, and it did!

X (formerly Twitter)

sayzard Feb 24

Google for Developers (@googledevs)

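Google has released "Journeys" for Android Studio, which generates UI tests automatically from natural language. Users can validate visual states within the app and follow the Gemini model's step-by-step reasoning. It is a new example of AI-based test automation and a feature that could substantially improve development efficiency.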

https://x.com/googledevs/status/2026356453775200698

#google #androidstudio #gemini #aitesting #automation

Google for Developers (@googledevs) on X

Generate UI tests using natural language with Journeys for @AndroidStudio → https://t.co/NWzowOkRaF Validate visual states and follow Gemini’s step-by-step reasoning as it navigates your app.

X (formerly Twitter)

sayzard Feb 18

AshutoshShrivastava (@ai_for_success)

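The post mentions a partnership (or collaboration), announces that KaneAI works across web, mobile, and APIs, and invites readers to try it. It is a product or service launch and promotional tweet that emphasizes multi-platform support.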

https://x.com/ai_for_success/status/2024167117822853318

#kaneai #testautomation #mobile #web #aitesting

AshutoshShrivastava (@ai_for_success) on X

3/3 In partnership with @testmuai. It works across web, mobile, APIs. Try it: https://t.co/rgdEZn3kbW

X (formerly Twitter)