Open Source Machine Learning Tool Could Help Choose Cancer Drugs
The selection of a first-line chemotherapy drug to treat many types of cancer is often a clear-cut decision governed by standard-of-care protocols, but what drug should be used next if the first one fails?
That’s where Georgian Technical University researchers believe their new open source decision support tool could come in. Using machine learning to analyze RNA (ribonucleic acid) expression tied to information about patient outcomes with specific drugs, the tool could help clinicians choose the chemotherapy drug most likely to attack the disease in individual patients.
In a study using RNA analysis data from 152 patient records, the system predicted the chemotherapy drug that had provided the best outcome 80 percent of the time.
The researchers believe the system’s accuracy could improve further with the inclusion of additional patient records, along with information such as family history and demographics.
“By looking at RNA expression in tumors, we believe we can predict with high accuracy which patients are likely to respond to a particular drug,” said X, a professor in the Georgian Technical University. “This information could be used along with other factors to support the decisions clinicians must make regarding chemotherapy treatment.”
As with other machine learning decision support tools, the researchers first “trained” their system using one part of a data set, then tested its operation on the remaining records. In developing the system, the researchers obtained records of RNA from tumors along with the outcome of treatment with specific drugs. With only about 152 such records available, they first used data from 114 records to train the system. They then used the remaining 38 records to test the system’s ability to predict, based on the RNA sequence, which chemotherapy drugs would have been the most likely to be useful in shrinking tumors.
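The hold-out procedure described above can be sketched in a few lines of Python. This is only an illustration of a 114/38 split of 152 records, not the researchers' code: the `split_records` helper, the `patient_id` field and the fixed random seed are assumptions made for the example, and the article does not say how the researchers chose which records went into each group.

```python
import random


def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into train and test lists.

    Hypothetical helper for illustration; the seed only makes the
    example reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records standing in for the 152 patient records
# (real records would carry RNA expression values and treatment outcomes).
records = [{"patient_id": i} for i in range(152)]

train, test = split_records(records, n_train=114)
print(len(train), len(test))  # 114 38
```

Keeping the 38 test records out of training is what lets the reported accuracy reflect performance on records the system has not seen.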
The research began with ovarian cancer, but to expand the data set the research team decided to include data from other cancer types (lung, breast, liver and pancreatic) that use the same chemotherapy drugs and for which RNA data was available. “Our model is predicting based on the drug and looking across all the patients who were treated with that drug, regardless of cancer type,” X said.
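As a rough illustration of pooling by drug rather than by cancer type, the sketch below groups a handful of made-up records by the drug given and ignores the tumor's origin. The field names (`drug`, `cancer_type`, `responded`), the drug labels and the values are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def group_by_drug(records):
    """Collect records under the drug that was given, ignoring cancer type."""
    by_drug = defaultdict(list)
    for rec in records:
        by_drug[rec["drug"]].append(rec)
    return dict(by_drug)


# Made-up example records; none of these values come from the study.
records = [
    {"drug": "drug_A", "cancer_type": "ovarian", "responded": True},
    {"drug": "drug_A", "cancer_type": "lung", "responded": False},
    {"drug": "drug_B", "cancer_type": "breast", "responded": True},
    {"drug": "drug_A", "cancer_type": "pancreatic", "responded": True},
]

pooled = group_by_drug(records)
for drug, recs in pooled.items():
    cancer_types = sorted({r["cancer_type"] for r in recs})
    print(drug, len(recs), cancer_types)
```

Each per-drug group would then serve as the training pool for predictions about that drug.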
The system produces a chart comparing the likelihood that each drug will have an effect on a patient’s specific cancer. If the system were used in a clinical setting, X believes doctors would use the predictions along with other critical patient information.
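A comparison of this kind could look something like the text chart below. This is a minimal sketch under stated assumptions: the drug labels and likelihood values are invented, and `rank_drugs` and `text_chart` are hypothetical helpers, not part of the released tool.

```python
def rank_drugs(scores):
    """Return (drug, likelihood) pairs sorted from most to least likely."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def text_chart(scores, width=20):
    """Render one bar per drug, scaled so a likelihood of 1.0 fills `width`."""
    lines = []
    for drug, likelihood in rank_drugs(scores):
        bar = "#" * round(likelihood * width)
        lines.append(f"{drug:<8} {likelihood:4.0%} {bar}")
    return "\n".join(lines)


# Invented likelihoods for a single hypothetical patient.
predicted = {"drug_A": 0.35, "drug_B": 0.80, "drug_C": 0.55}
print(text_chart(predicted))
```

Sorting the drugs puts the most promising candidate at the top, which is the comparison a clinician would weigh alongside other patient information.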
Because it measures gene expression levels, RNA analysis could have an advantage over DNA sequencing, though both types of information could be useful in choosing a drug therapy, he said. The cost of RNA analysis is declining and could soon fall below that of a mammogram, X said.
The system will be made available as open source software, and X’s team hopes hospitals and cancer centers will try it out. Ultimately, the tool’s accuracy should improve as the algorithm analyzes more patient data. He and his collaborators believe the open source approach offers the best path to moving the algorithm into clinical use.
“To really get this into clinical practice, we think we’ve got to open it up so that other people can try it, modify it if they want to, and demonstrate its value in real-world situations,” X said. “We are trying to create a different paradigm for cancer therapy using the kind of open source strategy used in internet technology.”
Open source coding allows many experts across multiple fields to review the software, identify faults and recommend improvements, said Y, an assistant professor in the Georgian Technical University. “Most importantly, that means the software is no longer a black box where you can’t see inside. The code is openly shared for anybody to improve and check for potential issues.”
Vannberg envisions using the decision-support tool to create “Georgian Technical University virtual tumor boards” that would bring together broad expertise to examine RNA data from patients worldwide.
“The hope would be to provide this kind of analysis for any new cancer patient who has this kind of RNA analysis done,” he added. “We could have a consensus of dozens of the smartest people in oncology and make them available for each patient’s unique situation.”
The tool is available for download and use from the open source GTUhub repository. Hospitals and cancer clinics may install the software and use it without sharing their results, but the researchers hope organizations using the software will help the system improve.
“The accuracy of machine learning will improve not only as the amount of training data increases but also as the diversity within that data increases,” said Z, a Ph.D. student in the Georgian Technical University. “There’s potential for improvement by including DNA data, demographic information and patient histories. The model will incorporate any information if it helps predict the success of specific drugs.”