Note 1: This post is part 2 of a three-part series on healthcare, knowledge graphs, and lessons for other industries. Part 1, “What Is a Knowledge Graph — and Why It Matters” is available here.
Note 2: All images by author
In Part 1, we described how structured knowledge enabled healthcare’s progress. This article examines why healthcare, more than any other industry, was able to build that structure at scale.
Healthcare is the most mature industry in the use of knowledge graphs for a few fundamental reasons. At its core, medicine is grounded in empirical science (biology, chemistry, pharmacology), which makes it possible to establish a shared understanding of the types of things that exist, how they interact, and causality. In other words, healthcare lends itself naturally to ontology.
The industry also benefits from a deep culture of shared controlled vocabularies. Scientists and clinicians are natural librarians. By necessity, they meticulously list and categorize everything they can find, from genes to diseases. This emphasis on classification is reinforced by a commitment to empirical, reproducible observation, where data must be comparable across institutions, studies, and time.
Finally, there are structural forces that have accelerated maturity: strict regulation; strong pre-competitive collaboration; sustained public funding; and open data standards. All of these factors incentivize shared standards and reusable knowledge rather than isolated, proprietary models.
Together, these factors created the conditions for healthcare to build durable, shared semantic infrastructure, allowing knowledge to accumulate across institutions, generations, and technologies.
Ontologies
Humans have always tried to understand how the world works. When we observe and report the same thing repeatedly, and agree that it is true, we develop a shared understanding of reality. This process is formalized in science using the scientific method. Scientists develop a hypothesis, conduct an experiment, and evaluate the results empirically. In this way, humans have been developing an implicit medical ontology for thousands of years.
Ötzi, the caveman discovered in 1991, who lived 5,300 years ago, was found with an antibacterial fungus in his leggings, likely to treat his whipworm infection (Kirsch and Ogas 4). Even cavemen had some understanding that plants could be used to treat ailments.

Eventually, scientists learned that it wasn’t the plant itself that was treating the ailment, but compounds inside the plant, and that they could mess with the molecular structure of these compounds in the lab and make them stronger or more effective. This was the beginning of organic chemistry and how Bayer invented Aspirin (by tweaking willow bark) and Heroin (by tweaking opium from poppies) (Hager 75; Kirsch and Ogas 69). This added a new class to the ontology: compounds. With each new scientific breakthrough, our understanding of the natural world evolved, and we updated our ontology accordingly.

Over time, medicine developed a layered ontology, where each new class didn’t replace the previous one but extended it. The ontology grew to include pathogens after scientists Fritz Schaudinn and Erich Hoffmann discovered that the underlying cause of syphilis was a bacterium called Treponema pallidum. We learned that microbes could be found almost everywhere and that some of them, like the mold that produces penicillin, could kill bacteria, so microbes were added to our ontology.

We learned that DNA contains genes, which encode proteins, which interact with biological processes and risk factors. Every major advance in medicine added new classes of things to our shared understanding of reality and forced us to reason about how those classes interact. Long before computers, healthcare had already built a layered ontology. Knowledge graphs didn’t introduce this way of thinking; they simply gave it a formal, computational substrate.
Today, we have ontologies for anatomy (Uberon), genes (Gene Ontology), chemical compounds (ChEBI), and hundreds of other domains. Repositories such as BioPortal and the OBO Foundry provide access to well over a thousand biomedical ontologies.
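The layered ontology described in this section can be sketched as a small set of class-level statements. The following is a minimal illustration in Python, not a real biomedical ontology: the class names and relationship names (such as `contains` and `encodes`) are simplified assumptions drawn loosely from the narrative above, and published ontologies such as Gene Ontology are far richer.

```python
from collections import defaultdict

# Hypothetical, simplified class-level statements:
# (subject class, relationship, object class).
# These names are illustrative assumptions, not terms from a published ontology.
ONTOLOGY = [
    ("Plant", "contains", "Compound"),
    ("Compound", "treats", "Ailment"),
    ("Pathogen", "causes", "Ailment"),
    ("DNA", "contains", "Gene"),
    ("Gene", "encodes", "Protein"),
    ("Protein", "participates_in", "BiologicalProcess"),
]


def build_index(triples):
    """Index triples by subject class so outgoing relationships are easy to look up."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def reachable_classes(index, start):
    """Return every class reachable from `start` by following relationships."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for _relation, obj in index.get(current, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


index = build_index(ONTOLOGY)
classes_from_dna = reachable_classes(index, "DNA")
classes_from_plant = reachable_classes(index, "Plant")
```

Each new layer is added by appending statements rather than rewriting old ones, which mirrors how each new class extended the ontology instead of replacing what came before.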
Controlled vocabularies
Once a class of things was defined, medicine immediately began naming and cataloging every instance it could find. Scientists are great at cataloging and defining instances of classes. De materia medica, the first pharmacopoeia, was completed in 70 CE. It was a book of about 600 plants and about 1000 medicines. When chemists began working with organic compounds in the lab, they created thousands of new molecules that needed to be cataloged. In response, the first volume of the Beilstein Handbook of Organic Chemistry was released in 1881. This handbook cataloged all known organic compounds, their reactions and properties, and grew to contain millions of entries.

This pattern repeats throughout the history of medicine. Every time our understanding of the natural world improved, and a new class was added to the ontology, scientists began cataloging all of the instances of that class. Following Louis Pasteur’s finding in 1861 that germs cause disease, people began cataloging all the pathogens they could find. In 1923, the first edition of Bergey’s Manual of Determinative Bacteriology was published, which contained about a thousand unique bacteria species.

The same pattern repeated with the discovery of genes, proteins, risk factors, and adverse effects. Today, we have rich controlled vocabularies for conditions and procedures (SNOMED CT), diseases (ICD-11), adverse effects (MedDRA), drugs (RxNorm), compounds (ChEBI and PubChem), proteins (UniProt), and genes (NCBI Gene). Most large pharma companies work with dozens of these third-party controlled vocabularies.
Somewhat confusingly, ontologies and controlled vocabularies are often blended in practice. Large controlled vocabularies frequently contain instances from multiple classes along with a lightweight semantic model (ontology) that relates them. SNOMED CT, for example, includes instances of diseases, symptoms, procedures, and clinical findings, as well as formally defined relationships such as “has intent” and “due to”. In doing so, it combines a controlled vocabulary with ontological structure, effectively functioning as a knowledge graph in its own right.
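To make the blend concrete, here is a minimal sketch in Python of a controlled vocabulary combined with a lightweight ontology. Everything in it is an assumption for illustration: the identifiers (`T001` and so on), the terms, and the rules about which classes each relationship may connect are invented and much simpler than SNOMED CT’s actual concept model.

```python
# Controlled vocabulary: each term has an identifier, a preferred label, and a class.
# Identifiers and entries are hypothetical; they are NOT real SNOMED CT codes.
VOCABULARY = {
    "T001": {"label": "Pneumonia", "class": "Disease"},
    "T002": {"label": "Bacterial infection", "class": "Disease"},
    "T003": {"label": "Chest X-ray", "class": "Procedure"},
    "T004": {"label": "Diagnostic intent", "class": "Intent"},
}

# Lightweight ontology: which (subject class, object class) pair each
# relationship may connect. These rules are simplified assumptions.
RELATIONSHIP_RULES = {
    "due_to": ("Disease", "Disease"),
    "has_intent": ("Procedure", "Intent"),
}

# Statements that link vocabulary terms through the defined relationships.
STATEMENTS = [
    ("T001", "due_to", "T002"),
    ("T003", "has_intent", "T004"),
]


def is_valid(statement):
    """Check that a statement uses known terms and respects the relationship rules."""
    subject_id, relation, object_id = statement
    if relation not in RELATIONSHIP_RULES:
        return False
    if subject_id not in VOCABULARY or object_id not in VOCABULARY:
        return False
    expected_subject, expected_object = RELATIONSHIP_RULES[relation]
    return (
        VOCABULARY[subject_id]["class"] == expected_subject
        and VOCABULARY[object_id]["class"] == expected_object
    )


def describe(statement):
    """Render a statement using preferred labels instead of identifiers."""
    subject_id, relation, object_id = statement
    return (
        f"{VOCABULARY[subject_id]['label']} {relation} "
        f"{VOCABULARY[object_id]['label']}"
    )


readable = [describe(s) for s in STATEMENTS if is_valid(s)]
```

The vocabulary supplies the agreed names, the rules supply the structure, and the statements that connect them form a small graph.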
Regulation
Following a mass poisoning that killed 107 people due to an improperly prepared “elixir” in 1937, the US government gave the Food and Drug Administration (FDA) increased regulatory powers (Kirsch and Ogas 97). The Federal Food, Drug, and Cosmetic Act of 1938 set requirements for how drugs should be labeled and required that drug manufacturers submit safety data and a statement of “intended use” to the FDA. This helped the US largely avoid the thalidomide tragedy of the late 1950s in Europe, where a tranquilizer was prescribed to pregnant women to treat anxiety, trouble sleeping, and morning sickness, despite never having been tested on pregnant women. This caused the “largest anthropogenic medical disaster ever”, during which thousands of women suffered miscarriages and more than 10,000 babies were born with severe deformities.
While the US largely avoided this thanks to FDA reviewer caution, the tragedy also exposed gaps in the system. The Kefauver-Harris Amendments to the Federal Food, Drug, and Cosmetic Act in 1962 required proof that drugs were both safe and effective. The increased power of the FDA in 1938, and again in 1962, forced healthcare to standardize on the meaning of words. Drug companies were forced to agree upon indications (what is the drug intended for), conditions (what does the drug treat), adverse effects (what other conditions have been associated with this drug), and clinical outcomes. Increased regulatory pressure also required replicable, well-controlled studies for all claims made about a drug. Regulation didn’t just demand safer drugs; it demanded shared meaning.
Observational data
These regulatory changes didn’t just affect approval processes; they fundamentally reshaped how medical observations were generated, structured, and compared. To make clinical evidence comparable, reviewable, and replicable, data standards for clinical trials became codified through organizations like the Clinical Data Interchange Standards Consortium (CDISC). CDISC defines how clinical observations, endpoints, and populations must be represented for regulatory review. Likewise, the FDA turned the shared terminologies cataloged in controlled vocabularies from best practice to mandatory.
Pre-competitive collaboration
One of the enabling factors that has led healthcare to dominate in knowledge graphs is pre-competitive collaboration. A lot of the work of healthcare is grounded in natural sciences like biology and chemistry that are treated as a public good. Companies still compete on products, but most consider a large portion of their research “pre-competitive.” Organizations like the Pistoia Alliance facilitate this collaboration by providing neutral forums to align on shared semantics and infrastructure (see data standards section below).
Public funding
Public funding has been essential to building healthcare’s knowledge infrastructure. Governments and public research institutions have invested heavily in the creation and maintenance of ontologies, controlled vocabularies, and large-scale observational data that no single company could afford to build alone. Agencies such as the National Institutes of Health (NIH) fund many of these assets as public goods, leaving healthcare with a rich, open knowledge base ready to be linked and reasoned over using knowledge graphs.
Data standards
Healthcare also embraced open data standards early, ensuring shared knowledge could be represented and reused across systems and vendors. Standards from the World Wide Web Consortium (W3C) made medical knowledge machine-readable and interoperable, allowing semantic models to be shared independently of any single system or vendor. By anchoring meaning in open standards rather than proprietary schemas, healthcare enabled knowledge graphs to function as shared, long-lived infrastructure rather than isolated implementations. Standards ensured that meaning could survive system upgrades, vendor changes, and decades of technological churn.
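As one illustration of what vendor-independent representation can look like, the sketch below writes a few statements in N-Triples, a plain-text syntax for the W3C’s RDF data model. The article does not say which W3C standards are meant, so the choice of RDF here is an assumption. The `http://example.org/health/` namespace and the `treats` property are hypothetical, while the `rdf:type` and `rdfs:label` IRIs are the standard ones. The literal escaping is deliberately minimal and handles only backslashes and double quotes.

```python
# Hypothetical namespace for illustration; not a real published vocabulary.
EX = "http://example.org/health/"

# Standard IRIs defined by the W3C RDF and RDF Schema vocabularies.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping backslashes and double quotes only."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (iri(EX + "aspirin"), iri(RDF_TYPE), iri(EX + "Compound")),
    (iri(EX + "aspirin"), iri(RDFS_LABEL), literal("Aspirin")),
    (iri(EX + "aspirin"), iri(EX + "treats"), iri(EX + "pain")),
]

document = to_ntriples(triples)
```

Because the output is plain text with globally unique identifiers, any system that understands the format can read it without knowing which tool produced it.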
Conclusion
None of these factors alone explains healthcare’s maturity; it is their interaction over decades (ontology shaping vocabularies, regulation enforcing evidence, funding sustaining shared infrastructure, and standards enabling reuse) that made knowledge graphs inevitable rather than optional. Long before modern AI, healthcare invested in agreeing on what things mean and how observations should be interpreted. In the final part of this series, we’ll explore why most other industries lack these conditions, and what they can realistically borrow from healthcare’s path.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
Bibliography
Hager, Thomas. Ten Drugs: How Plants, Powders, and Pills Have Shaped the History of Medicine. Harry N. Abrams, 2019.
Isaacson, Walter. The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race. Simon & Schuster, 2021.
Kirsch, Donald R., and Ogi Ogas. The Drug Hunters: The Improbable Quest to Discover New Drugs. Arcade, 2017.


