
Bailey College of Science and Mathematics


Undergraduate Research Magazine


Fighting Cancer With Computation

From left: computer sciences Professor Paul Anderson, biological sciences students Belle Aduaka and McClain Kressman, and biological sciences Professor Jean Davidson. Photos by Nick Wilson.


APRIL 2023
By Nick Wilson

The evolving field of knowledge graphs coupled with advances in the computation of biological data could play an important role in improving diagnostic care for breast cancer patients, according to multidisciplinary Cal Poly research.

With more than 2.3 million people worldwide affected by breast cancer every year and 600,000 deaths annually, improved treatment methods could have widespread benefits.

A Cal Poly research paper published in June of 2022 studied the performance of state-of-the-art knowledge graph pipelines in biomedical research. “Comparisons of Knowledge Graphs and Entity Extraction in Breast Cancer Subtyping Biomedical Text Analysis” was coauthored by biological sciences Professor Jean Davidson, computer sciences Professor Paul Anderson, six undergraduates and one graduate student.

The research involves the development of a computational tool that examines gene expression to determine breast cancer subtypes, giving physicians a better picture of clinical outcomes and drug treatment responses.

The group also employed knowledge graphs, which use graph-structured models to integrate datasets. Researchers culled information from scholarly scientific articles and a dataset of 20,000 genes from 1,989 primary breast tumor samples and 144 normal breast tissue samples to analyze potential modeling graphs.
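To make the idea of a graph-structured model concrete, here is a minimal sketch of a knowledge graph stored as subject-relation-object triples. It is not the pipeline from the Cal Poly paper. The triples are a handful of textbook gene and receptor facts chosen for illustration, and the relation names and the helper functions `build_graph` and `genes_linked_to` are hypothetical.

```python
from collections import defaultdict

# Illustrative triples only; these are not drawn from the Cal Poly dataset.
# Each triple is (subject, relation, object).
TRIPLES = [
    ("ESR1", "encodes", "estrogen receptor"),
    ("PGR", "encodes", "progesterone receptor"),
    ("ERBB2", "encodes", "HER2"),
    ("Luminal A", "typically_expresses", "estrogen receptor"),
    ("triple-negative", "lacks", "estrogen receptor"),
    ("triple-negative", "lacks", "progesterone receptor"),
    ("triple-negative", "lacks", "HER2"),
]


def build_graph(triples):
    """Index triples by subject so each node maps to its outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def genes_linked_to(subtype, triples):
    """Return genes whose encoded product is connected to the given subtype."""
    # Step 1: collect every node the subtype points to (receptors here).
    products = {obj for subj, _, obj in triples if subj == subtype}
    # Step 2: find genes that encode one of those products.
    return sorted(
        subj for subj, rel, obj in triples
        if rel == "encodes" and obj in products
    )


graph = build_graph(TRIPLES)
print(graph["ERBB2"])
print(genes_linked_to("triple-negative", TRIPLES))
```

The two-step lookup in `genes_linked_to` shows why the graph form is useful for integrating datasets: facts from different sources (gene annotations on one side, subtype descriptions on the other) connect through shared nodes without a common table schema.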

Two additional research articles by Davidson, Anderson and their student teams also examined big data analysis of predictors of breast cancer subtyping.

“Reducing variability of breast cancer subtype predictors by grounding deep learning models in prior knowledge” was published in the November 2021 edition of Elsevier’s Computers in Biology and Medicine. The second, “PUMP: An Underspecification Analysis Tool” appeared in the February 2023 edition of the journal Bioinformatics.

Breast cancer subtypes are defined by gene expression profiles, cellular histology (microanatomy), and tissue of origin. Genes control how the cancer cells behave.

The possibility of identifying gene expression and breast cancer subtypes more quickly and cost-effectively is potentially groundbreaking.

“Once you know about the behavior of these cells, you know a lot more about the treatments that might work for them,” Davidson said. “To create this type of dataset analysis from scratch you would need a large team of people.”

Currently, getting from cell behavior to a subtype classification is prohibitively expensive and requires highly trained medical and bioinformatics professionals, putting costs out of reach for most patients.

“We would like it to be like $1,000 and computation that we really believe in,” Davidson said. “It’s slowly happening, and the march is on. The more we can get papers out, it’s going to happen.”

Student coauthors involved in the research included Belle Aduaka, a microbiology major of Pleasanton, California; McClain Kressman, a biological sciences major of Santa Cruz, California; Griffith “Grif” Hawblitzel (Engineering, ’22) of Bellevue, Washington; Andrew Doud, a computer science major of Belmont, California; Harsha Lakshmankumar, a biological sciences major of Pleasanton, California; Ella Thomas, a microbiology major of Sunnyvale, California; Paul Kim, a biological sciences major of Irvine, California; and Ava Jakusovszky, a computer science major of Colorado Springs, Colorado.

The research group discusses their breast cancer subtyping research project.

“The dream is to identify patterns without us interacting with the computer program to get accurate results,” said Kressman, who co-presented research at a conference in Spain in 2021. “The idea is for the classifier to iterate through numerous possibilities and identify the patterns that are responsible for the true biological patterns behind the distinct subtypes.”

Breast cancer is a remarkably heterogeneous disease, with a variety of molecular subtypes with different cell characteristics and a wide range of prognostic outcomes.

Subtypes such as Luminal A have much higher survivability rates than others, such as triple-negative (ER-, PR- and HER2-negative), which has among the lowest survivability rates. The team is working to program algorithms that integrate massive datasets for its diagnostic modeling.

“It is clinically important to identify the subtype as early as possible in order to identify the optimal treatment to target the specific cellular characteristics of each subtype,” Kressman said.

The Cal Poly study assesses how graphical representation can make sense of the seemingly chaotic tangles of molecules and interactions.

Under current testing methods, a breast cancer tumor biopsy sample is sent to a histologist (who turns tissue samples into microscope slides) and a pathologist; typically several medical specialists are involved in identifying the subtype.

“What we’re proposing is that instead you send it for a targeted transcriptome (a collection of all the gene readouts present in a cell) say looking at 1,000 genes, along with maybe some interesting metadata about that patient, like they’re over 45 and postmenopausal, and it will say it’s pretty darn sure ‘This is Luminal A. Here are some things to check to validate that,’” Davidson said. “That’s our big dream.”

Anderson added: “Our job is to try to make the job of an oncologist easier. The oncologist would be very happy to know the subtype quickly. They will know the clinical route, and the data informs their decision.”
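The workflow Davidson and Anderson describe, in which gene readouts go in and a subtype with a confidence level comes out, can be sketched with a toy nearest-centroid classifier. This is an assumption-laden illustration, not the group's model: the three-gene panel, the centroid values, and the `classify` function are all hypothetical, and a real tool would use far more genes plus patient metadata.

```python
import math

# Hypothetical, made-up centroids in arbitrary expression units.
# They exist only to show the mechanics and carry no clinical meaning.
CENTROIDS = {
    "Luminal A": {"ESR1": 8.0, "PGR": 7.0, "ERBB2": 3.0},
    "HER2-enriched": {"ESR1": 3.0, "PGR": 2.0, "ERBB2": 9.0},
    "triple-negative": {"ESR1": 2.0, "PGR": 2.0, "ERBB2": 2.0},
}


def distance(sample, centroid):
    """Euclidean distance between a sample and a centroid over the centroid's genes."""
    return math.sqrt(sum((sample[g] - centroid[g]) ** 2 for g in centroid))


def classify(sample):
    """Return (closest subtype, confidence) via a softmax over negative distances."""
    scores = {name: -distance(sample, c) for name, c in CENTROIDS.items()}
    top = max(scores.values())
    # Subtracting the top score before exponentiating keeps the math stable.
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# A made-up sample whose readouts sit near the Luminal A centroid.
sample = {"ESR1": 7.5, "PGR": 6.5, "ERBB2": 3.5}
subtype, confidence = classify(sample)
print(f"{subtype} (confidence {confidence:.2f})")
```

Reporting a confidence alongside the label mirrors the "pretty darn sure" framing: the output is meant to inform an oncologist's decision and point to what should be validated, not to replace it.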

Anderson said that fine-tuning the computational tool is analogous to how adjustments are made for programming a self-driving car, which needs to be able to identify objects in the roadway or discoloration on a stop sign that a human driver would easily distinguish. 

Because of general fear of glitches in artificial intelligence designed for patient care, physicians have been reluctant to use AI methods “to diagnose the patient because of this problem.”

But artificial intelligence has been used successfully to identify precancerous lesions in lung cancer, even better than trained pathologists because of a computer’s ability to factor in millions of images of lesions to spot similar characteristics.

“Our mission is to push that boundary (with computational breast cancer subtyping) closer to reality,” Anderson said.

Knowledge graphs are widely used in other fields but have so far seen comparatively limited application in biomedicine.

“I’ve learned a lot about gene expression data and how it comes together to make a story about a patient,” Aduaka said. “If you look at all of the points together, it is really fascinating.”

Aduaka said her role included helping to direct her student programming partners on what biomedical information to include for computational modeling, a collaborative process. 

“We had to learn to communicate with each other to explain biology and computer science terms,” Aduaka said. “Sometimes we’d have no idea about each other’s explanations, because the terms are unfamiliar, so we learned to improve how we explain some of the concepts to each other.”

Aduaka added, “Learning to communicate with people that aren’t in your field is a really important thing to do because in the workforce you’re not always surrounded by people that are your specialty field.”
