This thesis examines two constraint-driven, machine-learning-based techniques for molecule generation. The first is BBO-SYN, a generative framework based on black-box optimization (BBO) that predicts diverse molecules with desired properties together with their corresponding synthesis pathways. BBO-SYN builds on recent advances in Monte Carlo Tree Search-based latent search to locate promising reactants that yield high-scoring products when fed to a pretrained language model for chemical reaction prediction. BBO-SYN is shown empirically to produce high-scoring and diverse synthesis trees while operating over a large continuous reactant space. The second, developed after exploring synthesizability constraints, is CoarsenConf, which generates low-energy 3D conformers in an SE(3)-equivariant fashion. CoarsenConf is a hierarchical graph variational autoencoder that coarsens input molecular graphs based on torsion angles to learn a subgraph-level latent distribution, which is then used for efficient autoregressive generation via aggregated attention. CoarsenConf predominantly outperforms state-of-the-art methods on more robust benchmarks while using significantly less data and fewer training iterations.



