TLDR: This research paper reviews the latest advancements and applications of diffusion models (DMs) in generating small molecules for drug discovery. It explains the theoretical principles of DMs, categorizes various methods based on their applications (target-free vs. target-aware), and evaluates their performance on benchmark datasets. The paper highlights the strengths of DMs in creating novel molecular structures, particularly 3D geometries, and discusses key challenges such as interpretability, the need for better integration of chemical constraints, and computational efficiency. It concludes by outlining future research directions to enhance the physical realism and biological relevance of DM-generated molecules.
Generative artificial intelligence is rapidly changing the landscape of drug discovery, offering new ways to design drugs and explore vast chemical possibilities. Among these AI tools, diffusion models (DMs) have gained significant attention, especially in the research and development of new medicines.
This paper provides a comprehensive overview of the latest advancements and applications of diffusion models in generating molecules. It starts by explaining the basic ideas behind these models and then categorizes different DM-based methods based on their mathematical approaches and chemical applications. The review also looks at how well these models perform on standard datasets, with a special focus on comparing methods that generate 3D molecular structures. Finally, it discusses the current challenges and suggests future research directions to fully harness the power of diffusion models in drug discovery.
Understanding Diffusion Models
At their core, diffusion models work in two stages. In the ‘forward diffusion’ stage, noise is added step by step to a clean data sample, such as an image or a molecule, until the sample is indistinguishable from random noise. In the ‘reverse diffusion’ stage, the model learns to remove that noise one step at a time and recover the clean data. Once trained, the model can start from pure noise and run the reverse process to generate entirely new, realistic structures. This approach has been highly successful in image generation, outperforming older methods such as variational autoencoders (VAEs) and generative adversarial networks (GANs), and researchers are now applying it to small-molecule generation.
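The forward stage can be made concrete with a minimal sketch. It assumes the standard DDPM closed form, in which a noised sample at step t is a weighted mix of the clean data and Gaussian noise; the linear noise schedule, the step count, and the three-atom coordinates are illustrative assumptions, not values taken from the paper.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed, commonly used choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t: the signal variance left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(coords, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [[signal * c + noise * rng.gauss(0.0, 1.0) for c in atom] for atom in coords]

# Toy "molecule": three atoms with made-up 3D coordinates (not a real geometry).
x0 = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.5, 0.9, 0.0]]
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)
rng = random.Random(0)
x_early = forward_noise(x0, alpha_bars[9], rng)   # step 10: still close to x0
x_late = forward_noise(x0, alpha_bars[-1], rng)   # final step: almost pure noise
```

In this sketch `alpha_bars` falls from just under 1 toward 0, so early steps barely perturb the coordinates while the final step retains almost none of the original signal. The reverse stage, which requires a trained network to predict the noise, is not shown.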
Traditional drug discovery is often slow, expensive, and has a high failure rate. AI, particularly generative models, offers a promising solution to make this process more efficient. For example, virtual screening, which can take up to half of the drug research and development cycle, is a major bottleneck. Diffusion models can generate diverse libraries of candidate molecules for virtual screening, significantly improving both the effectiveness and efficiency of this crucial step.
Representing Molecules for AI
Molecules can be represented in various ways for AI models. The Simplified Molecular Input Line Entry System (SMILES) uses text symbols to describe atoms and their chemical bonds, allowing for molecule generation using natural language processing techniques. Two-dimensional molecular graphs, where atoms are nodes and bonds are edges, are also common. However, these methods don’t fully capture the three-dimensional information of molecules. For 3D structures, point clouds or voxel grids are used to represent the spatial arrangement of atoms. Many molecular generation models are also designed for ‘rotational and translational equivariance,’ meaning that rotating or translating the input coordinates in 3D space transforms the model’s output in the same way, while scalar properties such as energy stay unchanged. This helps models better capture the physical properties of molecules.
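As a small illustration of these representations, the sketch below writes the same toy molecule as a SMILES string, a graph, and a point cloud, then checks that interatomic distances survive a rotation and a translation. The coordinates are illustrative placeholders rather than an optimized geometry, and the helper names are hypothetical.

```python
import math

# Three views of the heavy atoms of ethanol. Coordinates are placeholders.
smiles = "CCO"                                    # 1D string
graph = {"atoms": ["C", "C", "O"],                # 2D graph: nodes and bonded pairs
         "bonds": [(0, 1), (1, 2)]}
cloud = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.3, 0.0]]  # 3D point cloud

def rotate_z(coords, angle):
    """Rotate every atom about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in coords]

def translate(coords, shift):
    """Shift every atom by the same vector."""
    return [[a + b for a, b in zip(atom, shift)] for atom in coords]

def pairwise_distances(coords):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords)) for j in range(i + 1, len(coords))]

moved = translate(rotate_z(cloud, 0.7), [3.0, -2.0, 5.0])
before = pairwise_distances(cloud)
after = pairwise_distances(moved)
distances_match = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
```

The raw coordinates in `moved` differ from `cloud`, but the distances agree. Quantities like these are what an invariant or equivariant architecture is built around, so the model does not have to learn separately that a rotated molecule is the same molecule.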
A New Classification for Diffusion Models
The paper proposes a new way to categorize diffusion models for molecule generation, dividing them into ‘target-free’ and ‘target-aware’ categories. Target-free models explore chemical space and generate diverse molecular structures without reference to a specific biological target. They serve as a foundation for validating the core capabilities of new DM methods. In contrast, target-aware models condition on target information, such as the 3D structure of a protein, to generate molecules that are more likely to bind that target effectively and specifically, which is a major advantage in drug design.
Further distinctions are made based on whether the models generate molecular ‘conformations’ (refining the spatial arrangement of an existing molecule) or perform ‘de novo’ generation (creating entirely new molecular structures from scratch). The categorization also considers the data representation (1D, 2D, or 3D), the specific diffusion process formulation (DDPM or score-based DM), and whether they maintain ‘equivariance’ to preserve geometric properties.
Generating Molecules: Target-Free vs. Target-Aware
For **target-free generation**, models are trained on datasets of 3D molecular structures like QM9 and GEOM-Drugs. These models aim to produce valid, stable, and novel molecules. One notable model, MiDi, which generates 3D molecular graphs, showed superior performance in terms of stability, validity, and novelty. Other approaches include those focusing on conformation generation (like GeoDiff and Torsional Diffusion), molecular structure generation (like EDM and MDM), multi-stage generation (like HierDiff and GeoLDM), voxel-based generation (like VoxMol), and 2D graph-based or 1D SMILES-based models.
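Validity and novelty, together with the closely related uniqueness metric, can be computed roughly as in the sketch below. Exact definitions vary between papers, so treat this as one common convention rather than the paper's evaluation code. The `is_valid` callable is a hypothetical stand-in for a real chemistry check (in practice a cheminformatics toolkit would parse and sanitize each molecule), and the strings are assumed to be already canonicalized. Stability, which depends on atom valences, is not covered.

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty under one common convention.

    validity   = valid samples / all samples
    uniqueness = distinct valid samples / valid samples
    novelty    = distinct valid samples not in the training set / distinct valid samples
    """
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

# Toy data: None marks a sample that failed to decode into a molecule.
generated = ["CCO", "CCO", "CCN", None, "c1ccccc1"]
training_set = {"CCO"}
metrics = generation_metrics(generated, training_set, is_valid=lambda m: m is not None)
```

For this toy input, four of five samples are valid (0.8), three of the four valid samples are distinct (0.75), and two of the three distinct molecules are absent from the training set (about 0.67).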
For **target-aware generation**, the goal is to create molecules, specifically ligands, that can interact with a target protein, accelerating drug discovery. This is a highly active area of research. These models condition the diffusion process on the protein pocket. Key considerations include ensuring the chemical feasibility of generated molecules, capturing spatial interactions between protein and ligand atoms, and balancing chemical space exploration with existing knowledge of protein-ligand interactions. The CrossDocked2020 dataset is commonly used for benchmarking. Models like TargetDiff and DiffSBDD are foundational in this area. More advanced methods incorporate binding affinity predictors (like KGDiff and IPDiff) or focus on detailed protein-ligand interaction modeling (like PMDM and InterDiff). Fragment-based models, such as DiffLinker and AutoFragDiff, combine autoregressive and non-autoregressive approaches by assembling molecules from pre-defined fragments.
Benchmarking and Key Insights
The paper conducted extensive experiments, evaluating eighteen diffusion models across target-free and target-aware generation tasks. For target-free generation, MiDi emerged as the top performer. For target-aware generation, KGDiff and PMDM showed the best results after relaxation. The evaluation highlighted a crucial aspect: the need for ‘relaxation’ (optimizing ligand conformations using force fields) to improve the geometric validity of generated molecules. While relaxation often improves validity, it can sometimes weaken the molecule’s ability to bind to its target, indicating a trade-off between geometric validity and binding affinity. The study also noted significant differences in computational efficiency, with target-aware models generally requiring much longer processing times due to the complexity of encoding protein structures and modeling interactions.
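The idea behind relaxation can be illustrated with a deliberately simplified sketch: gradient descent on a harmonic bond-stretch energy. This is a toy stand-in, not the procedure used in the paper. A real pipeline would rely on a full force field (for example MMFF94 or UFF) with angle, torsion, and non-bonded terms, and every coordinate, bond length, and parameter below is an assumption chosen for illustration.

```python
import math

def bond_energy(coords, bonds, k=1.0):
    """Sum of 0.5 * k * (d - d0)^2 over bonds given as (i, j, ideal_length)."""
    return sum(0.5 * k * (math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in bonds)

def relax(coords, bonds, k=1.0, step=0.1, iterations=200):
    """Plain gradient descent on the bond energy; returns new coordinates."""
    coords = [list(atom) for atom in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in bonds:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident atoms
            scale = k * (d - d0) / d
            for axis in range(3):
                g = scale * (coords[i][axis] - coords[j][axis])
                grads[i][axis] += g   # dE/dx_i
                grads[j][axis] -= g   # dE/dx_j is equal and opposite
        for atom, grad in zip(coords, grads):
            for axis in range(3):
                atom[axis] -= step * grad[axis]
    return coords

# A "generated" three-atom chain whose bonds are too long (made-up numbers).
generated_coords = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.5, 1.8, 0.0]]
bonds = [(0, 1, 1.5), (1, 2, 1.4)]
energy_before = bond_energy(generated_coords, bonds)
relaxed_coords = relax(generated_coords, bonds)
energy_after = bond_energy(relaxed_coords, bonds)
```

Relaxation moves the atoms, which is why it can change how a ligand sits in a pocket: the geometry becomes more plausible, but contacts the generator placed deliberately may shift.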
Challenges and Future Directions
Despite their potential, current molecular diffusion models face several limitations. One major challenge is ‘interpretability’: it is often unclear how and why these models generate specific molecules, which hinders systematic improvement. The reliance on post-hoc geometric relaxation to achieve structural validity suggests that current training methods don’t fully incorporate chemical constraints. As a result, a generated molecule might look valid but have unfavorable energetic properties, making it biologically ineffective.
Future research should focus on making these models more physically realistic and biologically relevant. This includes integrating molecular mechanics or quantum chemistry energy constraints directly into the diffusion process to generate molecules that are both structurally valid and energetically favorable. The increasing availability of high-quality protein structure predictions, like those from AlphaFold3, opens new doors for developing multi-modal and target-aware molecular generation models. Jointly modeling molecules and their biological targets, combined with better evaluation frameworks that consider synthetic accessibility, binding relevance, and experimental feasibility, will help align model outcomes with real-world drug discovery needs.
AI-driven drug discovery is already showing real-world impact, with AI-designed drug candidates advancing to clinical trials. The ability of generative models to augment or even partially replace traditional experimental workflows will be critical for broader impact. Continued progress will depend on both methodological advancements and the availability of high-quality structural and biochemical data to support reliable model training and validation.
For more in-depth information, you can read the full research paper here: Unraveling the Potential of Diffusion Models in Small Molecule Generation.


