Predicting Cancer Risk with Digital Health Records

May, 2024

Every day we leave traces of our lives in digital systems. From online searches and location tracking to medical records and insurance claims, vast amounts of personal data are being collected and stored. While privacy concerns abound, this digitization of daily life also presents new opportunities for medical research. Scientists are now tapping into nationwide databases of electronic health records to gain insights that were previously impossible. One promising area is cancer risk prediction, which could help improve early detection and targeted screening. 
Cancer remains a leading cause of death worldwide. While screening and early detection have reduced mortality for some common cancers like breast and colon, many others still go unnoticed until they have reached an advanced stage. New liquid biopsy tests that can potentially detect multiple cancer types from a simple blood draw promise a more convenient screening approach. However, these multi-cancer screening tests also need to be implemented carefully for maximum benefit. Non-selective population screening could result in many false positive results and unnecessary invasive follow-ups. A more targeted approach based on individual cancer risk could make screening programs more cost-effective.
That’s where digital health data comes in. As more medical information is digitized, entire population health histories are accumulating in national databases. Researchers are now mining these vast troves of routine care data to better understand patterns of disease and develop predictive models. A recent large-scale study from researchers in Denmark and Germany took this approach to cancer risk prediction. They built models using national registry data covering over 6.7 million Danes and their lifetime medical histories dating back decades. The results suggest digital health records hold promise for personalized cancer screening based on individual risk profiles.
Data Power
The researchers harnessed five Danish health databases containing information on hospital visits, diagnoses, deaths, cancers, and free-text medical records from secondary care. Combined, these covered 60 million hospital visits, 90 million diagnoses, and 193 million life-years of follow-up for the Danish population from 1978 to 2018.
From this wealth of real-world data, the team distilled over 1,300 variables for each individual, including diagnoses, family cancer histories, and text-mined data on lifestyle factors. They then used statistical modeling techniques to determine how these different health and personal factors interacted and impacted risks for 20 major types of cancer. Crucially, the models were trained on registry information collected up to 2014 and validated on cancer incidence in subsequent years, allowing prospective prediction.
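The train-then-validate design described above can be sketched as a simple split by date. This is a minimal illustration, not the study's code: the cutoff date of 31 December 2014, the record layout, and the function name `temporal_split` are all assumptions made for the example.

```python
from datetime import date

# Assumed cutoff: the article says only "up to 2014", so the exact day is a guess.
CUTOFF = date(2014, 12, 31)

def temporal_split(records, cutoff=CUTOFF):
    """Split records into those dated on or before the cutoff (for training)
    and those dated after it (for validation)."""
    train = [r for r in records if r["date"] <= cutoff]
    validate = [r for r in records if r["date"] > cutoff]
    return train, validate

# Made-up records, for illustration only
records = [
    {"person": "A", "date": date(2009, 3, 1), "event": "diagnosis"},
    {"person": "B", "date": date(2014, 12, 31), "event": "diagnosis"},
    {"person": "A", "date": date(2016, 7, 15), "event": "cancer"},
]
train, validate = temporal_split(records)
print(len(train), len(validate))
```

Splitting by time rather than at random means the model is always judged on events that happened after everything it was trained on, which is closer to how it would be used in practice.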
The results demonstrated that these digital health databases can provide considerable insight into cancer risks. The prediction models achieved good discrimination, meaning they tended to assign higher predicted risk to people who went on to develop cancer than to those who did not. Performance was comparable to existing models designed for individual cancer types. Risks were associated not just with family history and known risk factors, but also with patterns of previous diagnoses, highlighting disease interconnections.
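Discrimination is often summarized as the probability that a randomly chosen person who developed the disease received a higher predicted risk than a randomly chosen person who did not. The sketch below computes that quantity (the area under the ROC curve) on made-up numbers. It is an illustration of the general idea only; the study may have used a different, time-aware measure, and the function name `auc` is chosen for this example.

```python
def auc(risks, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied pairs count as half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Made-up predicted risks and outcomes (1 = developed cancer), for illustration only
risks = [0.02, 0.10, 0.05, 0.30, 0.01, 0.20]
outcomes = [0, 1, 0, 1, 0, 0]
print(auc(risks, outcomes))  # 7 of 8 pairs ranked correctly -> 0.875
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to ranking every case above every non-case.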
Transferring Risks
Validating cancer predictions across different healthcare systems and populations is an important test. To examine if the Danish risk profiles could transfer internationally, the researchers evaluated their models on genetic and health data from the UK Biobank covering over 377,000 individuals.
Remarkably, the cancer risk predictions generalized well between the two countries despite differences in healthcare and population characteristics. Discrimination remained high and calibration – how closely predicted risks matched actual rates – was similar after controlling for demographic shifts. This suggests digital health records contain transferable risk information beyond any single system. With appropriate validations, models built from one population could potentially be applied to new settings.
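Calibration can be checked by grouping people by predicted risk and comparing the average prediction in each group with the share who actually developed the disease. The following is a minimal sketch on made-up data, not the study's procedure; the function name `calibration_table` and the equal-sized grouping are assumptions for illustration.

```python
def calibration_table(risks, outcomes, n_groups=2):
    """Sort people by predicted risk, cut them into n_groups groups of roughly
    equal size, and return (mean predicted risk, observed rate) per group."""
    pairs = sorted(zip(risks, outcomes))
    if len(pairs) < n_groups:
        raise ValueError("need at least one person per group")
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder
        end = len(pairs) if g == n_groups - 1 else start + size
        group = pairs[start:end]
        mean_predicted = sum(r for r, _ in group) / len(group)
        observed_rate = sum(y for _, y in group) / len(group)
        rows.append((mean_predicted, observed_rate))
    return rows

# Made-up data: a low-risk and a high-risk group of four people each
risks = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
outcomes = [0, 0, 0, 1, 1, 0, 1, 0]
for mean_predicted, observed_rate in calibration_table(risks, outcomes):
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}")
```

In this toy data the high-risk group is well calibrated (predicted 0.50, observed 0.50), while the low-risk group is under-predicted (predicted 0.10, observed 0.25).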
A key advantage is that the nationwide electronic datasets allowed quantifying cancer risks at a population scale without relying on self-reported or selectively collected information. The top influencing factors identified – such as alcohol use, reproductive history, height and weight – align well with established cancer risk knowledge. Being data-driven, this approach also surfaces unexpected links worth following up, like the role of immune-related conditions.
Improving Screening
While further validation is still needed, these digital risk prediction models could eventually support targeted cancer screening approaches. As multi-cancer blood tests move from research into real-world use, selectively applying them based on personal risk profiles could maximize their benefits. High-risk individuals could be screened more frequently or at younger ages, while low-risk groups may need less frequent testing to balance costs and patient burden.
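As a rough illustration of risk-based selection, the sketch below ranks people by predicted risk and offers screening to the top fraction. The names, the risk values, and the 40% cutoff are invented for the example; a real programme would set thresholds from clinical and cost considerations that this article does not specify.

```python
import math

def select_for_screening(risk_by_person, fraction):
    """Return the people in the top `fraction` of predicted risk, highest first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_selected = math.ceil(len(risk_by_person) * fraction)
    ranked = sorted(risk_by_person, key=risk_by_person.get, reverse=True)
    return ranked[:n_selected]

# Hypothetical predicted risks, for illustration only
predicted = {"A": 0.01, "B": 0.20, "C": 0.05, "D": 0.12, "E": 0.02}
print(select_for_screening(predicted, 0.4))  # ['B', 'D']
```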
The models may also augment existing screening programs. For cancers with established screening like breast and colon, risk scores could help guide which individuals might benefit most from starting screening earlier or getting tested more often. And for cancers currently not covered, like pancreatic or ovarian, these models may eventually help prioritize who to initially offer new screening modalities to.
Of course, such “precision screening” comes with challenges to address. Ensuring equitable access across populations will be important as not all groups are equally represented in routine health data. And risk predictions are not definitive diagnoses – false positives must still be carefully managed. With continuous model improvements over time, electronic health records show promise as a non-invasive source of “real-world” risk intelligence to supplement genetic and lifestyle information. As digitization transforms healthcare, big data approaches could personalize cancer screening in ways not previously possible.


  1. Alexander W Jung, Peter C Holm, Kumar Gaurav, Jessica Xin Hjaltelin, Davide Placido, Laust Hvas Mortensen, Ewan Birney, Søren Brunak, Moritz Gerstung. Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study. The Lancet Digital Health, 2024; 6 (6): e396. DOI: 10.1016/S2589-7500(24)00062-1


About the Author

  • Dilruwan Herath

    Dilruwan Herath is a British infectious disease physician and pharmaceutical medical executive with over 25 years of experience. As a doctor, he specialized in infectious diseases and immunology, developing a resolute focus on public health impact. Throughout his career, Dr. Herath has held several senior medical leadership roles in large global pharmaceutical companies, leading transformative clinical changes and ensuring access to innovative medicines. Currently, he serves as an expert member for the Faculty of Pharmaceutical Medicine on its Infectious Disease Committee and continues advising life sciences companies. When not practicing medicine, Dr. Herath enjoys painting landscapes, motorsports, computer programming, and spending time with his young family. He maintains an avid interest in science and technology. He is a founder of DarkDrug.
