Finding problems with C-Reduce (both bugs in your program and possible compiler bugs)

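Last week I came across "You can use C-Reduce for any language (bernsteinbear.com)". The original article, "You can use C-Reduce for any language", explains that C-Reduce can be used with many languages. But I wasn't even sure how C-Reduce is used on C itself, so I went looking for examples to first understand C-Reduc...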

Gea-Suan Lin's BLOG

You can use C-Reduce for any language | Max Bernstein

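Link📌 Summary: C-Reduce is a tool, originally developed by Regehr and his team, for shrinking the code that reproduces C compiler bugs. Although it was designed for C, it works just as well for other languages as long as a few conditions are met. The user only needs to provide a reproducible failure condition and one or more source files that can be modified, and C-Reduce shrinks the code automatically. The article uses RustPython as an example, showing how C-Reduce cut a file by nearly 50% in a short time. The whole process is fast and effective, and suits developers who are preparing software bug reports.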

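🎯 Key Points:
- C-Reduce's main job is to shrink code that triggers C compiler bugs.
- It works for many languages, not just C.
- It requires a deterministic failure condition and source files that can be modified.
- The example in the article shows how to shrink code quickly with C-Reduce.
- Passing --not-c disables C-Reduce's C-specific passes, which suits non-C code.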
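To make the two requirements concrete (a deterministic failure condition and an input that can be modified), here is a minimal Python sketch of a line-based reducer. It is not C-Reduce's actual algorithm, which runs many more passes; the names `reduce_lines` and `is_interesting` are made up for illustration. It only shows why nothing in the basic idea depends on the input being C.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 is_interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily delete chunks of lines while the predicate still holds.

    `is_interesting` is a hypothetical stand-in for the interestingness
    test: it must return True only if the candidate still shows the bug.
    """
    assert is_interesting(lines), "the starting input must be interesting"
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            # Candidate with lines[i : i + chunk] removed.
            candidate = lines[:i] + lines[i + chunk:]
            if is_interesting(candidate):
                lines = candidate   # keep the smaller input, retry at i
            else:
                i += chunk          # this chunk is needed, move on
        chunk //= 2                 # retry with finer-grained deletions
    return lines


# Toy predicate: the "bug" needs both the setup line and the boom line.
source = ["import a", "setup", "x = 1", "y = 2", "boom", "z = 3"]
reduced = reduce_lines(source, lambda ls: "setup" in ls and "boom" in ls)
```

The reducer never parses the input. It only proposes smaller candidates and asks the predicate, which is the language-agnostic part that a flag like --not-c leaves in place.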

🔖 Keywords: #C-Reduce #BugReproduction #MultiLanguageSupport #CodeReduction #DeveloperTools

You can use C-Reduce for any language

C-Reduce is a tool by Regehr and friends for minimizing C compiler bug reproducers. Imagine if you had a 10,000 line long C file that triggered a Clang bug. You don’t want to send a massive blob to the compiler developers because that’s unhelpful, but you also don’t want to cut it down to size by hand. The good news is that C-Reduce can do that for you. The bad news is that everyone thinks it only works for C.

Max Bernstein
Support Objective-C files · Issue #31 · csmith-project/creduce

clang_delta --query-instances=replace-function-def-with-decl file.mm Error: Unsupported file type! Seems that only C and C++ are supported right now.

GitHub
I tried creduce to reduce a segfaulting test case and it reduced it to this 👍 #creduce #screenshotsaturday

👎 using #cvise (#creduce replacement) to reduce the C code triggering a compiler problem

👍 using cvise to reduce the C code triggering a crash inside cvise

https://github.com/marxin/cvise/issues/116

clang_delta: […] clang::Expr::ClassifyImpl(…) const: Assertion `isLValue()' failed. · Issue #116 · marxin/cvise

While trying to reduce a C file, clang_delta, crashes with the following assertion: 00:00:00 INFO ===< ClangBinarySearchPass::replace-function-def-with-decl (30 T) >=== 00:00:00 WARNING clang_delta...

GitHub

👎 using #cvise (an alternative to #creduce) to reduce C code that triggers a compiler bug

👍 using cvise to reduce C code that makes cvise itself crash

https://github.com/marxin/cvise/issues/116


Challenge 2: #debugging. You spend a ton of time writing the compiler, fingers crossed for getting at least the hello world working, but instead you get table/memory out of bounds! Unreachable instruction executed! What on earth went wrong!?? It's totally not a shame to not get things right on the first try. Especially compilers, one small mistake may be amplified repeatedly at compile-time, making the output a pile of trash.

But you're unlucky if targeting #wasm. You may have searched the internet and found blog posts about source maps, dwarf, v8 inspector, or some wasm engine claiming to support debugging via lldb/gdb. My own experience as of today: they are extremely fragile, not to say non-existent. Give them a try anyway, but keep in mind they don't qualify as your lifeboat, and to get dwarf stuff working you need a ton of extra effort during code generation!

There are still some strategies you can follow.

First and foremost: crash early. Instrument your code aggressively: whenever you doubt whether a property holds at runtime, assert it. It's common that the runtime state is already corrupt but the module keeps running and trips in other, seemingly irrelevant places. You may also dump logs; they do help sometimes.
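A tiny Python sketch of what "crash early" can look like. The `LinearMemory` class is a made-up toy, not anything from a real wasm runtime. Without the assertions, a store to address -1 would silently overwrite the last cell (Python list indexing wraps around), and the failure would only show up later, somewhere unrelated.

```python
class LinearMemory:
    """Toy byte-addressed memory, used only to illustrate early asserts."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size

    def store(self, addr: int, value: int) -> None:
        # Fail at the faulty write, not wherever the corruption is read.
        # Note: plain `assert` is skipped when Python runs with -O.
        assert 0 <= addr < len(self.cells), f"store out of bounds: {addr}"
        assert 0 <= value <= 0xFF, f"value does not fit in a byte: {value}"
        self.cells[addr] = value


mem = LinearMemory(4)
mem.store(3, 9)          # fine
try:
    mem.store(-1, 7)     # caught here instead of corrupting cells[3]
    caught = False
except AssertionError:
    caught = True
```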

Next: shrink it. Use wasm-reduce in #binaryen to shrink the wasm module, or even better, use #creduce to shrink the miscompiled module's assembly source (if you know it's the crime scene), or the offending input that triggers the bug. Shrinking is an absolute must to minimize the debugging overhead. In the worst case you don't get additional insight, but at least you get some coffee breaks to relax :/

Sometimes you have an alternative compiler which emits correct wasm from the same input, which can be regarded as the source of truth. Luckily this was the case for the #ghc wasm backend! GHC has target-specific assembly generators, but also a target-independent C generator, which is meant to ease porting GHC to new platforms. It was tremendously useful when I debugged the wasm backend's code generator; I even spent extra effort to make callconv & symbol names coherent between the two codegens so that I could mix good/bad objects at link-time, which was super useful when narrowing down the actual crime scenes.
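The good/bad object mixing can be automated as a bisection. The Python sketch below assumes exactly one miscompiled object and a deterministic check; `find_bad_object` and the `works` callback are hypothetical names, and `works` stands in for "link these objects from the suspect codegen, take the rest from the reference codegen, run the result".

```python
from typing import Callable, FrozenSet, Sequence


def find_bad_object(names: Sequence[str],
                    works: Callable[[FrozenSet[str]], bool]) -> str:
    """Bisect to the single object that breaks the build.

    `works(from_suspect)` must return True if the program behaves
    correctly when only the objects in `from_suspect` come from the
    suspect compiler and all others come from the reference compiler.
    """
    suspects = list(names)
    assert not works(frozenset(suspects)), "all-suspect build must fail"
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        if works(frozenset(first_half)):
            suspects = suspects[mid:]    # bug lives in the other half
        else:
            suspects = first_half        # bug lives in this half
    return suspects[0]


# Toy stand-in for the link-and-run step: "d.o" is the miscompiled one.
objects = ["a.o", "b.o", "c.o", "d.o", "e.o"]
culprit = find_bad_object(objects, lambda chosen: "d.o" not in chosen)
```

With n objects this needs about log2(n) link-and-run cycles. If several objects are miscompiled, or the failure needs two of them together, the single-culprit assumption breaks and the result is not reliable.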

Another low effort thing to try, especially if your compiler piggybacks on other toolchains like #llvm or binaryen: turn off any optimization. If you're lucky, it's someone else's bug :)

#CReduce is an impressive piece of software. If you're not familiar with it: you give it a C file that exhibits a bug, or some other interesting property (you specify a shell script that returns 0 only if the file is interesting). It then proceeds to shorten the file considerably.
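The "returns 0 only if the file is interesting" contract can be sketched in Python as well, since the test only has to be an executable with the right exit status. The helper name `exit_code`, the analyzer command and the error text in the closing comment are placeholders, not part of any real tool.

```python
import subprocess
import sys
from typing import Sequence


def exit_code(command: Sequence[str], needle: str,
              timeout: float = 10.0) -> int:
    """Return 0 if `command` still reproduces the bug, 1 otherwise.

    The bug counts as reproduced when `needle` appears in the combined
    stdout/stderr of the command.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return 1  # treat hangs as uninteresting so reduction keeps moving
    return 0 if needle in (proc.stdout + proc.stderr) else 1


# In an actual test script the last line would be something like
# (hypothetical analyzer name and message):
#   sys.exit(exit_code(["my-analyzer", "input.c"], "internal error"))
```

Matching on a specific message rather than on "the command failed" matters: otherwise the reducer tends to drift toward inputs that fail for a different, boring reason such as a syntax error.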

I had a 10 kLOC file on which my analyzer showed buggy behaviour, CReduce shrank it by 99.6 % 😅 (so far)