Motivation
Selecting diverse and representative subsets is crucial for data-driven modeling and machine learning applications in many science and engineering disciplines, especially molecular design and drug discovery. Motivated by this, we developed the Selector package, a free and open-source Python library for selecting diverse subsets.
The Selector library implements a range of existing algorithms for subset sampling based on the distances and similarities between samples, as well as tools based on spatial partitioning. In addition, it includes seven diversity measures for quantifying the diversity of a given set, along with various mathematical formulations for converting similarities into dissimilarities.
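To illustrate what a distance-based selection algorithm looks like, the following is a minimal sketch of greedy MaxMin selection in plain Python, together with one common similarity-to-dissimilarity conversion (d = 1 - s, assuming similarities scaled to [0, 1]). The function names `maxmin_select` and `similarity_to_dissimilarity` are hypothetical and written only for this example; they are not Selector's API, and Selector's own algorithms and conversion formulas may differ.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(points: List[Sequence[float]], size: int, start: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the point farthest from its nearest selected point.

    Hypothetical helper for illustration; not part of the Selector library.
    """
    if not 0 < size <= len(points):
        raise ValueError("size must be between 1 and len(points)")
    selected = [start]
    chosen = {start}
    # min_dist[i] is the distance from point i to its nearest selected point
    min_dist = [euclidean(p, points[start]) for p in points]
    while len(selected) < size:
        # pick the unselected point with the largest nearest-selected distance
        nxt = max((i for i in range(len(points)) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # refresh nearest-selected distances using the newly added point
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], euclidean(p, points[nxt]))
    return selected


def similarity_to_dissimilarity(sim: float) -> float:
    """One common conversion (an assumption here), for similarities in [0, 1]: d = 1 - s."""
    return 1.0 - sim


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0), (0.0, 10.0)]
print(maxmin_select(points, 3))
```

Starting from index 0, the sketch picks indices [0, 3, 4]: the two distant corners are preferred over the near-duplicate point at (0.1, 0), which is the behavior a diversity-oriented picker is meant to have.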
Selector Library
Selector is a free, open-source, and cross-platform Python library designed to help you effortlessly identify the most diverse subset of molecules from your dataset.
Citation
Please use the following citation in any publication using the Selector library:
To be added
More Information
For more information about the Selector library, please visit our GitHub repository and documentation at https://selector.qcdevs.org.
Acknowledgments
This webserver is supported by the DRI EDIA Champions Pilot Program. We are grateful to the Digital Research Alliance for providing the computing resources.