Security Professional and Researcher

Georgios Nikitopoulos

Our Binary Executable Vulnerability Dataset: BinPool

BinPool: A Binary Executable Vulnerability Dataset for Security Research

We recently released BinPool, a dataset designed to advance vulnerability detection and binary security analysis. BinPool was published at the 33rd ACM International Conference on the Foundations of Software Engineering (FSE ’25). Unlike many existing datasets that rely on synthetic bugs or source code alone, BinPool focuses on real-world vulnerabilities in binary executables built from Debian packages, with matching vulnerable and patched versions.

Why BinPool?

The scarcity of publicly available, large-scale binary datasets with precise vulnerability annotations has long been a barrier to developing and benchmarking effective vulnerability detection tools at the binary level. BinPool addresses this by combining:

- 6,144 binaries compiled from 162 Debian packages
- Coverage of 603 distinct CVEs across 89 CWE categories
- Both vulnerable and patched binaries compiled at four optimization levels
- Detailed metadata linking vulnerabilities to exact source and binary functions and lines

How Was BinPool Built?

The dataset was curated automatically by combining data from the National Vulnerability Database (NVD), the Debian Security Tracker, and Debian Snapshots. Automated build and patching pipelines compiled the vulnerable and fixed binaries, and metadata was extracted by parsing source patches and debugging information. Because the process is automated, the dataset is reproducible and can be extended.
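To make the patch-parsing step more concrete, here is a minimal sketch of how the changed regions of a source patch could be located. It is not BinPool's pipeline code: the function name `changed_regions` and the sample patch are hypothetical, and it assumes the patch is in unified diff format with hunk headers that carry the enclosing function (as `git diff` and `diff -p` produce).

```python
import re
from collections import defaultdict

# Matches a unified-diff hunk header such as
# "@@ -120,3 +120,5 @@ static int parse_header(struct buf *b)"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def changed_regions(patch_text):
    """Map each patched file to a list of (old_start, old_len, context) hunks.

    old_start/old_len describe the region in the pre-patch (vulnerable) source;
    context is the text after the second "@@", often the enclosing function.
    Sketch only: an added line whose content begins with "++ " would be
    mistaken for a file header.
    """
    regions = defaultdict(list)
    current_file = None
    for line in patch_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":  # file deleted by the patch
                current_file = None
            else:
                # Drop the "b/" prefix that git-style diffs add.
                current_file = path[2:] if path.startswith("b/") else path
        elif current_file is not None:
            m = HUNK_RE.match(line)
            if m:
                old_start = int(m.group(1))
                # A missing length in a hunk header means one line.
                old_len = int(m.group(2)) if m.group(2) is not None else 1
                regions[current_file].append(
                    (old_start, old_len, m.group(5).strip())
                )
    return dict(regions)


# Hypothetical patch, invented for illustration only.
SAMPLE_PATCH = """\
--- a/src/parser.c
+++ b/src/parser.c
@@ -120,3 +120,5 @@ static int parse_header(struct buf *b)
     int len = read_len(b);
-    memcpy(dst, b->data, len);
+    if (len > MAX_LEN)
+        return -1;
+    memcpy(dst, b->data, len);
     return 0;
"""

print(changed_regions(SAMPLE_PATCH))
```

The source line ranges recovered this way are the kind of information that can then be matched against debugging information to find the corresponding functions in the compiled binary.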

What Can You Do With BinPool?

BinPool enables research and development in:

- Automated vulnerability detection in binaries
- Binary function similarity and code plagiarism detection
- Benchmarking static and dynamic analysis tools
- Understanding real-world vulnerability lifecycles through patch metadata
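As an illustration of the benchmarking use case, the sketch below scores a detector against paired vulnerable and patched builds, using the patched builds as negative controls. Everything here is an assumption made for the example: the `Sample` record layout is hypothetical and is not BinPool's metadata schema, the CVE identifier and paths are placeholders, and the optimization level names are invented labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Sample:
    """Hypothetical record layout, not BinPool's actual metadata schema."""
    cve: str          # placeholder identifier
    function: str     # function touched by the patch
    opt_level: str    # assumed label for the optimization level
    vulnerable: bool  # True for the vulnerable build, False for the patched one
    path: str         # where the binary would live on disk


def score(detector: Callable[[Sample], bool],
          samples: Iterable[Sample]) -> Dict[str, float]:
    """Count hits and misses, treating patched builds as negatives."""
    tp = fp = fn = tn = 0
    for s in samples:
        flagged = detector(s)
        if flagged and s.vulnerable:
            tp += 1
        elif flagged and not s.vulnerable:
            fp += 1
        elif not flagged and s.vulnerable:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Placeholder samples: one made-up CVE, two builds at each of two levels.
samples = [
    Sample("CVE-0000-0001", "parse_header", opt, vuln,
           f"bins/{opt}/{'vuln' if vuln else 'patched'}/parser")
    for opt in ("O0", "O2")
    for vuln in (True, False)
]


def toy_detector(s: Sample) -> bool:
    """Toy stand-in for a real tool: flags every unoptimized build."""
    return s.opt_level == "O0"


result = score(toy_detector, samples)
print(result)
```

Because every vulnerable binary has a patched counterpart, a tool that flags a function regardless of whether the fix is present is penalized in precision, which a dataset of vulnerable binaries alone could not reveal.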

Access BinPool

BinPool is publicly available, along with automation scripts and full documentation: https://github.com/SimaArasteh/binpool

Access the paper: https://dl.acm.org/doi/10.1145/3696630.3728606

Looking Ahead

With the increasing importance of binary-level security, BinPool provides a critical foundation for future research and tool development. We hope this dataset accelerates progress toward robust, scalable binary vulnerability detection.