This research paper – published by IEEE and co-authored by Joe Barr and Tyler Thatcher (Acronis SCS), Peter Shaw (Nanjing University), Faisal Abu-Khzam (Lebanese American University), and Sheng Yu and Heng Yin (University of California – Riverside) – uses an empirical analysis of the source code of the Android Fluoride Bluetooth stack to demonstrate a novel approach to classifying source code and rating it for vulnerability.
The paper presents a workflow that combines deep learning and combinatorial techniques with a straightforward random forest regression. Two kinds of embedding are used, code2vec and LSTM, resulting in a distance matrix that is interpreted as a (combinatorial) graph whose vertices represent code components (functions and methods). Cluster Editing is then applied to partition the vertices of the graph into subsets representing nearly complete subgraphs. Finally, the vectors representing the components are used as features to model the components for vulnerability risk.
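The graph-construction and clustering steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 2-D vectors stand in for code2vec or LSTM embeddings, the distance threshold is an assumed parameter, and the helper names (`distance_matrix`, `threshold_graph`, `pivot_clusters`, `edit_cost`) are hypothetical. Because exact Cluster Editing is NP-hard, the sketch swaps in a simple greedy pivot heuristic and then counts how many edge insertions and deletions the resulting partition implies. The final random forest regression is omitted because it would need a third-party library.

```python
import math
from itertools import combinations


def distance_matrix(vectors):
    """Pairwise Euclidean distances between embedding vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def threshold_graph(dist, threshold):
    """Interpret the distance matrix as a graph: connect two components
    when their distance is at most `threshold` (an assumed parameter)."""
    adj = {i: set() for i in range(len(dist))}
    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def pivot_clusters(adj):
    """Greedy pivot heuristic (NOT an exact Cluster Editing solver):
    take the lowest-numbered unassigned vertex as a pivot and group it
    with its still-unassigned neighbours."""
    unassigned = set(adj)
    clusters = []
    for pivot in sorted(adj):
        if pivot not in unassigned:
            continue
        cluster = {pivot} | (adj[pivot] & unassigned)
        unassigned -= cluster
        clusters.append(sorted(cluster))
    return clusters


def edit_cost(adj, clusters):
    """Number of edge insertions plus deletions needed to turn the graph
    into disjoint cliques matching `clusters`."""
    label = {v: k for k, cluster in enumerate(clusters) for v in cluster}
    cost = 0
    for i, j in combinations(sorted(adj), 2):
        same_cluster = label[i] == label[j]
        has_edge = j in adj[i]
        if same_cluster != has_edge:
            cost += 1
    return cost


# Toy stand-ins for component embeddings (hypothetical data).
vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
graph = threshold_graph(distance_matrix(vectors), threshold=1.2)
clusters = pivot_clusters(graph)
print(clusters, edit_cost(graph, clusters))
```

With these toy vectors the first three components form one nearly complete subgraph (one edge is missing between components 1 and 2), so the partition costs a single edit. In a fuller pipeline, the cluster labels or the embedding vectors themselves would then be passed as features to a regression model.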