Why is it impossible to reverse-engineer closed source software?
https://lemmy.world/post/17371738
The first programs were written in binary/hexadecimal, and only later did we
invent coding languages to convert between human readable code and binary
machine code. So why can’t we just do the same thing in reverse? I hear a lot
about devices from audio streaming
[https://arstechnica.com/gadgets/2024/05/spotify-wont-open-source-car-thing-but-starts-refund-process/]
to footwear
[https://arstechnica.com/gadgets/2024/07/immensely-disappointing-nike-killing-app-for-350-self-tying-sneakers/]
rendered useless by abandonware. Couldn’t a very smart person (or AI) just take
the existing program and turn it into code?
It is not. idk who told you it was.
Disassembling an executable is trivial to do. Everything is open source if you can read assembly.
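To give a feel for why disassembly is the easy part, here is a minimal sketch of a table-driven disassembler. It is a toy under stated assumptions: the `KNOWN_ENCODINGS` table and the `disassemble` helper are made up for illustration and cover only five fixed x86-64 byte sequences, whereas a real disassembler has to decode prefixes, operands, and variable-length instructions.

```python
# Toy disassembler: maps a few fixed x86-64 byte sequences to mnemonics.
# KNOWN_ENCODINGS and disassemble() are illustrative names, not a real tool's API.
KNOWN_ENCODINGS = {
    bytes([0x55]): "push rbp",
    bytes([0x48, 0x89, 0xE5]): "mov rbp, rsp",
    bytes([0x5D]): "pop rbp",
    bytes([0xC3]): "ret",
    bytes([0x90]): "nop",
}


def disassemble(code: bytes) -> list[str]:
    """Return one 'offset: mnemonic' line per recognized instruction."""
    listing = []
    offset = 0
    while offset < len(code):
        for encoding, mnemonic in KNOWN_ENCODINGS.items():
            if code.startswith(encoding, offset):
                listing.append(f"{offset:04x}: {mnemonic}")
                offset += len(encoding)
                break
        else:
            # Unknown byte: emit it as raw data and move on.
            listing.append(f"{offset:04x}: .byte 0x{code[offset]:02x}")
            offset += 1
    return listing


# A typical empty-function prologue and epilogue.
machine_code = bytes.fromhex("554889e55dc3")
for line in disassemble(machine_code):
    print(line)
```

The output is a faithful listing of what the CPU will execute, but nothing in it says what the function was called or why it exists. That gap, not the byte decoding, is where the real work is.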
I’ve used a decompiler to peek at the source code of an app written in Visual Basic I wanted to recreate as a browser addon. It was mostly successful but some variable and function names were messed up.
Variable names, class names, package structure, method names, etc. won’t normally be preserved in the disassembled code. They are meaningless to the CPU, which only sees a series of memory addresses. Where method names do show up, it’s most likely not the program’s own code but a call into an existing external library, which has to be looked up by name. I’m not familiar with VB, but at least in .NET and .NET Framework, this would be something like System.Collections.Generic providing the implementation for List<string>: when .Sort() is called, the call goes into that compiled .dll.
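The loss of names can be sketched with a toy "compiler" and "decompiler". Everything here is hypothetical and heavily simplified: `compile_assignments` only understands lines of the form `a = b + c` and replaces each name with a numbered slot (standing in for a register or memory address), and `decompile` can only hand back placeholder names.

```python
def compile_assignments(source_lines):
    """Turn 'dst = lhs + rhs' lines into ('add', dst, lhs, rhs) slot tuples."""
    slots = {}  # name -> slot number; discarded once compilation is done

    def slot(name):
        return slots.setdefault(name, len(slots))

    program = []
    for line in source_lines:
        dst, expr = [part.strip() for part in line.split("=")]
        lhs, rhs = [part.strip() for part in expr.split("+")]
        program.append(("add", slot(dst), slot(lhs), slot(rhs)))
    return program


def decompile(program):
    """Rebuild source text; only slot numbers are available, so names are generic."""
    return [f"var{dst} = var{lhs} + var{rhs}" for _, dst, lhs, rhs in program]


source = [
    "total = price + tax",
    "grand_total = total + tip",
]
compiled = compile_assignments(source)
for line in decompile(compiled):
    print(line)
```

The decompiled lines compute the same thing as the original, but `price`, `tax`, and `tip` are gone for good; a human (or a tool) has to guess what `var1` and `var2` meant from how they are used.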
You could chuck it at an AI to reverse compile it into something readable.
Instead of just leaving you with the downvotes, I’ll explain why that wouldn’t work.
The AI itself cannot decompile it without the same tools I would use. The AI would then end up with the same starting spot I have.
Current LLMs do not reliably interpret code logic, and would likely make mistakes with syscalls, registers, and instructions.
Assembly languages themselves offer nothing beyond the instruction set. I’m sure there are ways to organize assembly in the super rare case of actually writing it by hand, but nothing comparable to object-oriented or functional programming.
Lastly, other comments have pointed out decompiled code is extremely expensive to analyze. The output from whatever we decompile would easily exceed the input limits for all existing LLMs.