New machine learning tool detects mutational signature linking bladder cancer to tobacco smoking

Researchers at the University of California San Diego have for the first time discovered a pattern of DNA mutations that links bladder cancer to tobacco smoking. The discovery was made possible by a powerful new machine learning tool that the team developed to find patterns of mutations caused by carcinogens and other DNA-altering processes.

The work was published on September 23 in Cell Genomics. It could help researchers identify environmental factors, such as exposure to tobacco smoke and ultraviolet radiation, that cause cancer in certain patients.

Each of these environmental exposures alters DNA in a unique way, generating a specific pattern of mutations called a mutational signature. If a signature is found in the DNA of a patient’s cancer cells, the cancer can be traced back to the exposure that created that signature. Knowing which mutational signatures are present can also lead to more personalized therapies for a patient’s specific cancer.

In this study, researchers found a mutational signature in bladder cancer DNA that is associated with tobacco smoking. The finding is significant because a mutational signature from tobacco smoking had previously been detected in lung cancer, but not in bladder cancer.

“There is strong epidemiological evidence linking bladder cancer to tobacco smoking. We even see a specific mutational signature in other tissues – such as the mouth, esophagus and lungs – that are directly exposed to carcinogens from tobacco. The fact that we didn’t find this signature in the bladder was strange.”

Ludmil Alexandrov, senior author of the study and professor of bioengineering and cellular and molecular medicine, University of California San Diego

Alexandrov and colleagues have now shown that there is a mutational signature of tobacco smoking in bladder cancer, which is different from the signature found in lung cancer. Moreover, they showed that this signature is also present in normal bladder tissues of tobacco smokers who did not develop bladder cancer. The signature was not found in the bladder tissues of non-smokers.

“What this signature tells us is that some of the mutations in your DNA are caused by exposure to tobacco smoke,” said study co-first author Marcos Díaz-Gay, a postdoctoral researcher in Alexandrov’s lab. “This does not necessarily mean that you have cancer. But the more you smoke, the more mutations you accumulate in your cells, and the higher your risk of cancer.”

Made possible by the next generation of machine learning

The researchers found the tobacco signature with a next-generation machine learning tool developed by the Alexandrov Lab. The team says it’s the most advanced automated bioinformatics tool for extracting mutational signatures directly from large amounts of genetic data.

“This is a powerful machine learning approach to identify and separate patterns of mutations from genetic data,” Alexandrov said. “It takes these patterns and decodes them, so we can see the mutational signatures and match them to their meaning.”

Alexandrov compared the machine learning approach to picking out individual conversations at a cocktail party.

“You have multiple groups of people talking all around you, and you are only interested in hearing certain individuals talk,” he said. “Our tool basically helps you do that, but with the genetic data for cancer. You have many people around the world who are exposed to different environmental mutagens, and some of those exposures leave fingerprints on their genomes. This tool goes through all that data to pick out what processes are causing the mutations.”
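The cocktail party analogy describes what is often called a source separation problem. One common way to frame it for mutation data is non-negative matrix factorization (NMF): a table of mutation counts per sample is approximated as the product of a small set of signatures and each sample's exposure to them. The sketch below is a toy illustration under stated assumptions, not the team's implementation: the six mutation types, the two signatures, the exposure values and the function names are all invented for the example, and the real tool adds many steps that are not shown here.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Toy NMF using multiplicative updates: v is approximated by w @ h."""
    rng = random.Random(seed)
    n_types, n_samples = len(v), len(v[0])
    # Start from strictly positive random factors.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_types)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iterations):
        # Update exposures (h) while holding signatures (w) fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(rank)]
        # Update signatures (w) while holding exposures (h) fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_types)]
    return w, h

def frobenius_error(v, w, h):
    """Distance between the observed counts and the reconstruction w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb)) ** 0.5

# Hypothetical ground truth: 6 mutation types, 2 signatures, 5 samples.
true_signatures = [[0.6, 0.0], [0.3, 0.0], [0.1, 0.0],
                   [0.0, 0.2], [0.0, 0.3], [0.0, 0.5]]
true_exposures = [[100, 10, 50, 0, 80],
                  [5, 120, 50, 90, 20]]
catalog = matmul(true_signatures, true_exposures)  # observed mutation counts

signatures, exposures = nmf(catalog, rank=2)
print("reconstruction error:", frobenius_error(catalog, signatures, exposures))
```

In this framing, each column of `signatures` plays the role of one voice at the party, and each row of `exposures` says how loudly that voice speaks in each sample.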

The tool was used to analyze 23,827 sequenced human cancers. It found four mutational signatures – including one in bladder cancer associated with tobacco smoking – that had not been detected by any other tool. The other three signatures, found in cancers of the stomach, colon and liver, still warrant further study to determine the processes that caused them.

To show how powerful their tool is, the researchers tested it against 13 existing bioinformatics tools. The tools were evaluated for their ability to extract mutational signatures from more than 80,000 synthetic cancer samples. The tool developed by Alexandrov’s team outperformed all the others: it detected 20 to 50% more true positive signatures, with five times fewer false positive signatures. It even performed well when analyzing noisy data, while other tools failed.
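A benchmark on synthetic samples is possible because the signatures used to generate the data are known in advance, so each extracted signature can be checked against them. The sketch below shows one plausible way to count true and false positives, assuming a match is declared when cosine similarity reaches a threshold. The threshold of 0.9, the greedy matching rule, the function names and the example vectors are assumptions made for illustration and are not taken from the study.

```python
import math

def cosine(a, b):
    """Cosine similarity between two signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_extraction(extracted, truth, threshold=0.9):
    """Greedily match extracted signatures to known ones and count outcomes."""
    unmatched = set(range(len(truth)))
    tp = fp = 0
    for sig in extracted:
        # Closest ground-truth signature that has not been claimed yet.
        best = max(unmatched, key=lambda i: cosine(sig, truth[i]), default=None)
        if best is not None and cosine(sig, truth[best]) >= threshold:
            unmatched.remove(best)
            tp += 1  # recovered a signature that was really in the data
        else:
            fp += 1  # reported a signature that matches nothing real
    return {"true_positives": tp, "false_positives": fp, "missed": len(unmatched)}

# Hypothetical example: two known signatures and two signatures reported by a tool.
truth = [[0.6, 0.3, 0.1, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.2, 0.3, 0.5]]
extracted = [[0.58, 0.32, 0.10, 0.0, 0.0, 0.0],  # close to the first known signature
             [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]]     # resembles neither known signature

result = score_extraction(extracted, truth)
print(result)
```

Under this kind of scoring, a tool does well when it recovers many of the known signatures while reporting few that match nothing in the generated data.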

“In bioinformatics, this is the first time that such comprehensive benchmarking has been performed at this scale for mutational signature extraction,” Díaz-Gay said. “It’s a huge task, comparing many tools across many data sets.”

Alexandrov noted that this feat is also costly. “Thanks to funding from Cancer Research UK, we were able to perform this comprehensive technical evaluation, which is not commonly performed.”

Creating a more user-friendly and personalized tool

The team’s ultimate goal is to create a web-based tool that more researchers can use and, as a result, analyze more patients.

“Currently, this tool requires bioinformatics expertise to operate,” Alexandrov said. “What we want is to create a web-friendly version where researchers can just drop a patient’s mutations, and it immediately gives you the set of mutational signatures and the processes that caused them.”

“Our idea for the future is to take advantage of this tool to analyze patients at an individual level,” Díaz-Gay said.


Journal reference:

Islam, S. M. A., et al. (2022) Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics.
