
Formally Verified Suffix Array Construction - Journal of Automated Reasoning
Suffix arrays are a data structure with numerous real-world applications. They are extensively used in text retrieval and data compression applications, including query suggestion mechanisms in web search, and in bioinformatics tools for DNA sequencing and matching. This wide applicability means that algorithms for constructing suffix arrays are of great practical importance. The SA-IS algorithm is an efficient but conceptually complex suffix array construction technique, and implementing it requires a deep understanding of its underlying theory. As a critical step towards developing a provably correct and efficient implementation, we have developed the SA-IS algorithm in Isabelle/HOL and formally verified that it is equivalent to a mathematical functional specification of suffix arrays, a task that required verifying a wide range of underlying properties of strings and suffixes. We also used Isabelle’s code extraction facilities to extract an executable Haskell implementation of SA-IS, which albeit is inefficient due to using lists and natural numbers rather than arrays and machine words, demonstrates that our verified HOL implementation of SA-IS can be refined to an executable implementation in its current form.