This paper looks promising: "SIGMADIFF: Semantics-Aware Deep Graph Matching for Pseudocode Diffing".

https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=9671&context=sis_research

#SigmaDiff #Pseudocode #Diffing #BinaryDiffing #BCSA

This paper is not bad. At the very least, it isn't yet another "novel" academic paper simply using assembly and neural networks. The authors focus on pseudocode diffing, which is the future of binary diffing for obvious reasons. However, I still don't understand why everyone and their dog in academia is focusing solely on deep learning: folks, there are alternatives to deep learning, please try to research some of them, because 99.99% of academia is researching the same thing.

Some small problems I can find with the approach described in the #SigmaDiff paper are the following:

* Adjacency matrices, even when using sparse structures, are huge for real-world functions (see the rough sketch after this list).
* Computing an inter-procedural data dependency graph for real-world binaries takes ages.
* I don't understand why they are using a symbolic analyser (I might need to re-read it).
* The Cisco Talos Datasets 1 and 2 don't contain binaries for anything other than Linux + GCC or Clang.
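To put the first point in perspective, here is a rough back-of-the-envelope sketch of how quickly an N x N adjacency (or pairwise similarity) matrix grows compared to a sparse edge-list layout. The node and edge counts below are hypothetical values I picked to resemble large decompiled functions; they are not figures taken from the SigmaDiff paper.

```python
# Back-of-the-envelope memory estimate for graph adjacency representations.
# Node/edge counts are hypothetical, chosen to look like large real-world
# functions; they are NOT numbers reported in the SigmaDiff paper.

def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Dense N x N matrix: one entry per node pair, grows quadratically."""
    return num_nodes * num_nodes * bytes_per_entry

def sparse_coo_bytes(num_edges: int, bytes_per_index: int = 8) -> int:
    """COO sparse layout: two index arrays (row, col), one entry per edge."""
    return num_edges * 2 * bytes_per_index

if __name__ == "__main__":
    for nodes, edges in [(1_000, 5_000), (10_000, 50_000), (50_000, 250_000)]:
        dense = dense_adjacency_bytes(nodes)
        sparse = sparse_coo_bytes(edges)
        print(f"{nodes:>6} nodes: dense ~{dense / 2**20:8.1f} MiB, "
              f"sparse COO ~{sparse / 2**20:6.2f} MiB")
```

The quadratic term is the killer: at 50,000 nodes a dense 32-bit matrix is already around 9 GiB, and anything that materialises dense pairwise scores between two graphs hits the same wall, while a sparse edge list stays small as long as the graph itself is sparse.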