Refact.ai

@refact_ai
0 Followers
1 Following
20 Posts
Open-source autonomous AI Agent that handles engineering tasks end-to-end.
Get it for VS Code and JetBrains: https://linktr.ee/refactai

Always inspired by what devs build with @refact_ai Agent. While autonomous AI does the coding, people can focus on creativity.

At yesterday’s Showcase:
🏋️Fitness app w/ AI motion tracking
🧠Explainer app turning dev topics into GIFs.

Watch on YouTube 👇
https://youtu.be/kTbNQsmfT08

Build Amazing Apps with AI: Fitness Tracker & Educational Tools Demo | Refact AI Agent

❌Prompt engineering isn’t about magic words.
✅It’s about giving your AI Agent the right setup: targeted context, clear intent, and an iteration-friendly workflow.

Watch our webinar:
Prompt Engineering for Developers Without 🐂$h!†

https://youtu.be/DP_yKoHeWI8?feature=shared

🔥 Refact.ai is now the #1 open-source AI Agent on the SWE-bench Leaderboard:
🔹 SWE-bench Verified → 70.4% solved
🔹 SWE-bench Lite → 60% solved

Soon: new score with Claude 4 Sonnet 🙂‍↕️

Full technical breakdown: https://refact.ai/blog/2025/open-source-sota-on-swe-bench-verified-refact-ai/

We also open-sourced our SWE-bench pipeline, check it on GitHub: https://github.com/smallcloudai/refact-bench

What makes Refact.ai special isn’t just the score — it’s our end-to-end approach.

We build for real-world results, not just leaderboards.

Delegate your everyday programming tasks to our AI Agent, preview every step, and guide the process whenever you like.

***

🖇Explore the technical details of our setup: https://refact.ai/blog/2025/open-source-sota-on-swe-bench-verified-refact-ai/

Try Refact.ai Agent — SOTA on SWE-bench Verified — in your IDE, today:

• VS Code: http://marketplace.visualstudio.com/items?itemName=smallcloud.codify
• JetBrains: http://plugins.jetbrains.com/plugin/20647-refact-ai

Refact.ai is the #1 open-source AI Agent on SWE-bench Verified with a 69.8% score

Before SWE-bench Verified, we applied lessons from our SOTA SWE-bench Lite run:

• Made tools more tolerant of the model’s uncertainty
• Renamed them for clarity: definition()→search_symbol_definition(), etc.
• Reduced chat compression
❌Dropped multi-step planning
• & more
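A minimal sketch of the "more tolerant tools" idea above: normalize whatever tool name the model emits and map old names to the clearer ones. The alias table and `search_symbol_usages` rename are illustrative assumptions, not Refact.ai's actual code.

```python
# Hypothetical tool-name normalization: tolerate case, whitespace, and
# stray parentheses in the model's tool call, and map legacy names to
# the renamed tools. Aliases here are assumptions for illustration.
TOOL_ALIASES = {
    "definition": "search_symbol_definition",
    "references": "search_symbol_usages",  # assumed rename, for illustration
}

def resolve_tool(name: str) -> str:
    """Return the canonical tool name for whatever the model produced."""
    clean = name.strip().strip("()").lower()
    return TOOL_ALIASES.get(clean, clean)
```

Small forgiveness like this avoids failing a whole pass@1 run over a cosmetic mistake in a tool call.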

🧠The strategic_planning() tool (powered by o3) stepped in when deeper reasoning was needed.

It analyzed the debug_script() report, brainstormed the solution, and applied fixes directly — no patches or diffs.

One mandatory call per task, lean and focused.
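The flow above can be sketched roughly as follows. Function bodies are placeholders (the real planner is o3-backed); only the shape — debug report in, direct edits out, no patches or diffs — comes from the post.

```python
# Illustrative sketch, not the real API: one mandatory planning call per
# task turns the debug_script() report into direct file edits.
def strategic_planning(report: dict) -> dict:
    """Placeholder for the o3-backed planner: report in, edit list out."""
    return {"edits": [{"file": f, "action": "fix"} for f in report["files"]]}

def solve_task(report: dict, workspace: dict) -> dict:
    """Apply the plan's edits directly to the workspace -- no patches/diffs."""
    plan = strategic_planning(report)
    for edit in plan["edits"]:
        workspace[edit["file"]] = edit["action"]  # edit files in place
    return workspace
```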

The AI Agent needed to be more reliable to solve SWE-bench tasks at pass@1.

🛡️We added automatic guardrails:
A script runs static checks on model outputs. If it detects the Agent going off track, it injects mid-run helper messages (as from a “user”) to nudge it back in the right direction.

These small actions make a big difference in stability.
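A toy sketch of the guardrail mechanism described above: run a static check on the model's latest output and, if it looks off track, append a corrective message attributed to the "user". The patterns and message shape are assumptions for illustration.

```python
# Hypothetical guardrail: static checks on model output, with a mid-run
# "user" nudge injected when an off-track pattern is detected.
OFF_TRACK_PATTERNS = [
    "i cannot",   # agent giving up instead of acting
    "as an ai",   # meta-talk instead of doing the task
    "git diff",   # emitting patches instead of editing files directly
]

def check_output(text: str):
    """Return a nudge message if the output matches an off-track pattern."""
    lowered = text.lower()
    for pattern in OFF_TRACK_PATTERNS:
        if pattern in lowered:
            return f"Please stay on task: avoid '{pattern}' and edit files directly."
    return None

def apply_guardrail(messages: list) -> list:
    """Inject a helper message (as from a 'user') when the check fires."""
    nudge = check_output(messages[-1]["content"])
    if nudge is not None:
        messages.append({"role": "user", "content": nudge})
    return messages
```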

We introduced a new sub-agent — debug_script().

It uses pdb to debug, modify, and generate scripts, gathering:
1. Which files are affected
2. What caused the failure
3. How it might be fixed

We forced at least 1 and up to 3 calls per task.
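One way to picture the kind of report debug_script() gathers — run a reproduction, capture the traceback, and summarize the three items above. This is an illustrative sketch, not Refact.ai's actual sub-agent (which drives pdb interactively).

```python
# Illustrative sketch of a debug report: run a reproduction callable,
# capture the failure, and summarize affected files, cause, and a fix hint.
import traceback

def gather_debug_report(repro) -> dict:
    """Run `repro` and summarize the failure for the agent."""
    try:
        repro()
        return {"files": [], "cause": None, "hint": "reproduction passed"}
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return {
            "files": sorted({f.filename for f in frames}),              # 1. affected files
            "cause": f"{type(exc).__name__}: {exc}",                    # 2. failure cause
            "hint": f"inspect {frames[-1].filename}:{frames[-1].lineno}",  # 3. where to fix
        }
```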

Model setup:

• Orchestration: Claude 3.7
• debug_script(): Claude 3.7 + o4-mini
• strategic_planning(): o3
• Temp: 0 for Claude
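The setup above could be captured in a config like this. Keys and structure are a hypothetical sketch, not Refact.ai's real schema; only the model assignments, temperature, and call counts come from the posts.

```python
# Hypothetical config mirroring the model setup listed above
# (illustrative schema; values taken from the posts).
MODEL_SETUP = {
    "orchestration": {"model": "claude-3.7", "temperature": 0},
    "debug_script": {"models": ["claude-3.7", "o4-mini"], "min_calls": 1, "max_calls": 3},
    "strategic_planning": {"model": "o3", "mandatory_calls": 1},
}
```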

For each benchmark task, our AI Agent made one multi-step run to produce a single, correct final solution.