Some small problems I see with the approach described in the #SigmaDiff paper:

* Adjacency matrices, even when using sparse structures, are huge for real world functions.
* Calculating an inter-procedural data dependency graph for real world binaries takes ages.
* I don't understand why they are using a symbolic analyser (I might need to re-read it).
* The Cisco Talos Datasets 1-2 don't contain binaries for anything that isn't Linux + GCC or Clang.

This paper looks promising: "SIGMADIFF: Semantics-Aware Deep Graph Matching for Pseudocode Diffing".

https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=9671&context=sis_research

#SigmaDiff #Pseudocode #Diffing #BinaryDiffing #BCSA