SDSC-housed Protein Data Bank Brings Molecules Up to Size
Published November 04, 2025
By Scott Paton

Accessible to all, the RCSB PDB is used every year by millions of scientists, doctors, researchers and students around the world. The RCSB PDB Molecule of the Month presents short accounts of selected molecules from the Protein Data Bank. Credit: RCSB PDB
Molecules are the unseen architects of everything from the DNA in your cells to the planets in distant galaxies, crafting the very fabric of reality while remaining far too small for the human eye to glimpse. But thanks to the Protein Data Bank (PDB), enabled by the San Diego Supercomputer Center (SDSC) at the University of California San Diego School of Computing, Information and Data Sciences, we can now “see” these hidden wonders, allowing scientists to “zoom in” on molecular mechanics, mapping molecules’ shapes and decoding their secrets. No longer confined to studying one molecule in isolation, researchers can compare many, test slight changes, predict behavior and design better molecules, with applications ranging from new pharmaceuticals and more effective vaccines to healthier food and a deeper understanding of disease. And with each new discovery, we are reminded that the universe’s greatest masterpieces not only surround us but reside within us.
Birth of the first global bio database
Founded in 1971 by a group of visionary structural biologists, the PDB was built on the idea of making the world’s growing collection of molecular structures publicly available and standardized, turning isolated discoveries into a shared global library of life’s molecular architecture. Revolutionary at the time, the PDB was the first open-access digital archive in the history of biology.
The PDB catalogs each molecule’s biological name and function along with its organism of origin (such as human, animal, bacterial or viral). At launch, the database held a modest seven structures. Half a century later, the archive houses more than 240,000 three-dimensional biomolecular structures from experimental studies, and it grows every year. Data are submitted by research labs around the world, then carefully curated and distributed by a global partnership called the Worldwide Protein Data Bank (wwPDB). Today, the U.S. data center for the wwPDB is operated by the Research Collaboratory for Structural Bioinformatics (RCSB), and since RCSB partnered with SDSC in 1999, that repository has been accessible to scientists, researchers and students worldwide.
Out of sight, not out of mind
Molecules live below the threshold of human sight and often elude even advanced microscopy. Dr. Stephen K. Burley, director of the RCSB PDB, explains how scientists determine the 3D structures stored in the Protein Data Bank.
“Several methods are currently used to determine the structure of a protein, including X-ray crystallography, NMR spectroscopy and 3D electron microscopy,” Burley said. “In each of these methods, the scientist combines many pieces of experimental data with related scientific data information to create the final atomic structure model available in the PDB.”
He explained that PDB structures laid the groundwork for modern molecular modeling, which now uses supercomputers and quantum mechanics to predict how biomolecules move and react over time. Researchers and students use sophisticated visualization tools, including virtual reality, to study PDB and related structures. The wealth of 3D structure data stored in the PDB has underpinned major advances in our understanding of protein architecture, culminating in recent breakthroughs in protein structure prediction driven by artificial intelligence, particularly deep learning.
Why molecules are the building blocks of everything
And why is 3D molecular imaging important? “The 3D shapes in biomolecules like proteins are very important because the shape determines how a molecule functions, interacts with other molecules and what biological role it plays,” said Yana Rose, a scientific software developer and data architect at RCSB PDB/SDSC. “Understanding its 3D structure allows us to explain how it works, design effective drugs and engineer new materials.”
When you zoom out from atoms and molecules, you see that:
- All materials – solids, liquids, gases – are collections of molecules (or ions) in different arrangements and motions.
- Macroscopic properties (melting points, hardness, color, solubility) derive from molecular-level structure and bonding.
- In living systems, nearly everything is molecular: proteins, lipids, nucleic acids, sugars, etc.
- Complex systems (cells, tissues, materials) emerge from molecular interactions, assemblies and dynamics.
Supercomputers and molecular modeling: SDSC’s role
In the lab, we can’t directly “see” every molecule or watch atoms shift over the tiny fractions of a second in which molecular motions occur. We need computational models, such as molecular dynamics simulations and docking studies, to help fill the gaps.
SDSC provides high-performance computing and data infrastructure that supports researchers doing molecular modeling and structural biology. SDSC is also a performance site for the RCSB PDB, helping host, analyze and distribute its structural data. 3D modeling renders the intangible tangible, and what once was invisible, visible.
How does this help researchers study so many different molecules?
- Researchers can fetch structures of thousands of different proteins, nucleic acids and complexes from the RCSB PDB — thanks in part to SDSC’s infrastructure.
- With a supercomputer, one can simulate how a molecule moves (molecular dynamics), how it binds a drug or ligand (docking), or how mutations might change structure or behavior — all at atomic resolution.
- A practical example: scientists used software running on or connected to SDSC resources to model how fungal enzymes degrade plastic at the molecular level, revealing how specific chemical bonds are recognized and cut.
In short, 3D models make chemistry and biology click. Students and scientists can explore molecules in virtual reality, rotate them in simulations and see dynamic processes that textbooks can only describe. These models help researchers worldwide test hypotheses, design therapies and visualize how life’s molecular machines truly work, atom by atom.
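To make the idea of fetching a structure from the RCSB PDB concrete, here is a minimal Python sketch that downloads one entry and counts its atoms per chain. It assumes the public download URL pattern https://files.rcsb.org/download/<ID>.pdb and the fixed-column legacy PDB text format (record name in columns 1-6, chain ID in column 22); the helper names are hypothetical illustrations, not part of any official RCSB library. Very large entries may be distributed only in mmCIF format, which this sketch does not handle.

```python
# A minimal sketch, not an official RCSB client. Assumptions: RCSB serves
# legacy-format files at https://files.rcsb.org/download/<ID>.pdb, and records
# follow the fixed-column PDB text format. Helper names are hypothetical.
from collections import Counter
from urllib.request import urlopen

DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def download_url(pdb_id: str) -> str:
    """Build the (assumed) download URL for a four-character PDB ID."""
    return DOWNLOAD_URL.format(pdb_id=pdb_id.strip().upper())


def fetch_pdb_text(pdb_id: str) -> str:
    """Download one entry as PDB-format text; requires network access."""
    with urlopen(download_url(pdb_id)) as response:
        return response.read().decode("utf-8")


def atoms_per_chain(pdb_text: str) -> Counter:
    """Count ATOM records (standard residues) per chain ID.

    In the PDB format the record name occupies columns 1-6 and the
    chain identifier column 22, i.e. line[0:6] and line[21] in Python.
    HETATM records (ligands, waters) are deliberately skipped.
    """
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  ") and len(line) > 21:
            counts[line[21]] += 1
    return counts
```

With network access, `atoms_per_chain(fetch_pdb_text("4HHB"))` should report four protein chains for this classic human hemoglobin entry. Parsing fixed columns keeps the sketch dependency-free; real projects would more likely use a dedicated structure library and the mmCIF format.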
Spotlight on molecules
Beginning in 2000, shortly after SDSC and RCSB PDB made the resources of the PDB more readily accessible, RCSB PDB launched a monthly column called Molecule of the Month. The column’s goal is to make complex molecular science accessible, engaging and visually meaningful for a broad audience, from students and teachers to researchers and the simply curious. Humanizing the molecular world, if you will, the column shows how invisible molecules are the real actors in the drama of life, underpinning human advances in medicine, nutrition, the environment and technology.
Molecule of the Month is created by Janet Iwasa, an assistant professor in the biochemistry department at the University of Utah. “Many of our articles focus on current events, but from a molecular perspective,” Iwasa said. “For example, our latest articles focus on the biology behind weight loss drugs like Ozempic and Wegovy.”
A million models and counting
The PDB archive contains nearly a quarter million experimentally determined 3D structures of proteins, nucleic acids and their complexes. “The PDB archive represents the accumulation of incredibly valuable knowledge,” said Jose Duarte, scientific software lead and UC San Diego manager at RCSB PDB/SDSC. “In many cases, a single structure can represent years of study and research by a team of interdisciplinary scientists.”
The PDB is the reason we know what hemoglobin, DNA polymerase, insulin and even the coronavirus spike protein look like at the atomic level — and why modern medicine, bioengineering and molecular research can operate with such precision. Additionally, RCSB PDB now integrates computed structure models (CSMs) — predicted via AI — alongside the experimentally derived ones.
So, in total, including both experimental and computed models, RCSB PDB offers access to well over a million 3D molecular structures, along with specialized tools for visualization and analysis at RCSB.org. The Protein Data Bank is an incomparable resource for scientific and medical research. And as one of the PDB’s global hosts, SDSC ensures that this unparalleled repository of human knowledge is available to all.
The RCSB PDB’s Molecule of the Month column highlights many fascinating biomolecules, describing their structure, function, biological role and relevance to health or technology.
Top Ten Molecules!
1. DNA – The Blueprint of Life: The elegant, spiral-ladder-shaped molecule that carries the genetic instructions for all living things.
2. Hemoglobin – The Oxygen Courier: This protein lives in your red blood cells and acts like a tiny delivery truck, carrying oxygen to your tissues and helping bring carbon dioxide back to be exhaled.
3. Myoglobin – The Muscle Oxygen Backup: If hemoglobin is the delivery truck, myoglobin is the storage tank. Found in muscles, it keeps an emergency supply of oxygen handy for when you’re physically active.
4. Insulin – The Sugar Regulator: Insulin is the tiny hormone that keeps blood sugar levels stable, helping your body use or store glucose from the food you eat. Before scientists understood its structure, insulin was harvested from animals; 3D modeling helped pave the way for synthetic human insulin.
5. Antibodies – The Body’s Bouncers: Antibodies are Y-shaped proteins that patrol your bloodstream, recognizing and tagging invaders like viruses and bacteria for destruction.
6. Coronavirus Spike Protein – The Viral Key: The spike protein is the part of the coronavirus that latches onto human cells, the key that unlocks the door to infection. Understanding its 3D structure helped scientists design vaccines that teach the immune system to jam that key. Once modeled, the spike became one of the most studied molecules in modern history.
7. Photosystem II – The Solar Panel of Nature: Plants use this giant molecular complex to split water and release oxygen during photosynthesis, turning sunlight into chemical energy.
8. Collagen – The Body’s Scaffolding: Collagen is the most abundant protein in your body, forming the sturdy framework for skin, bones, tendons and cartilage. It’s like rebar in concrete: without it, we’d be jelly!
9. DNA Polymerase – The Copier of Life: Before a cell divides, DNA polymerase copies the entire genome letter by letter, a job so precise that it makes only about one mistake per billion letters copied.
10. Amyloids – Proteins Gone Wrong: Amyloids are misfolded proteins that clump together in diseases like Alzheimer’s, where normally helpful molecules get tangled and jam the brain’s circuits. Some amyloids, however, are useful; certain bacteria use them to build protective coatings.
All these molecular tales and hundreds more are featured in the RCSB PDB’s Molecule of the Month collection at pdb101.rcsb.org. Each article includes colorful 3D visuals, simple explanations and even classroom materials, a true bridge between fundamental scientific research and public understanding.