I’ve found a few papers, but I might be using the wrong search terms. I’m particularly interested in comparing implementations or code with their specifications, or more broadly, in using machine learning to verify code equivalence.

If you’re referring to “NASA-level mathematical correctness” for a software program, I’m not aware of any such work. Since machine learning is probabilistic, it seems like the opposite of a formal-verification approach, right?

Are you referring to unit testing, or what type of software verification do you mean?

The intersection of large language models (LLMs) and formal reasoning is a highly active area right now, with numerous efforts to enhance reasoning in LLMs through formal methods.

There is extensive research on using machine learning for theorem proving. AlphaProof, for example, tackles Olympiad problems by first formalizing them in Lean and then searching for a proof. Thanks to the Curry-Howard correspondence, formal verification in a dependently typed language reduces to producing a term (i.e., a program) of the required type: the proposition is the type, and any well-typed inhabitant of it is a proof.
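To make the Curry-Howard point concrete, here is a minimal Lean 4 sketch: the proposition `p ∧ q → q ∧ p` is a type, and “proving” it just means writing a function of that type, which is exactly the kind of artifact an ML system can be asked to generate.

```lean
-- Under Curry–Howard, a proposition is a type and a proof is a program.
-- Proving p ∧ q → q ∧ p amounts to defining a function of that type:
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.2, h.1⟩  -- swap the two components of the conjunction
```

Lean's kernel type-checks the term, so even if an ML model proposed `and_swap`, correctness is guaranteed deterministically by the checker, not by the model. That is why the probabilistic nature of LLMs is compatible with formal verification here: generation can be fallible as long as checking is not.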