Key points
- Understanding the complex biological processes in a cell is critical to understanding the biology of most diseases and to intervening optimally to mitigate them.
- The state of the cells involved in a disease can be characterized using multimodal imaging and next-generation sequencing technologies.
- Phenotype information about the disease is commonly stored in clinical records as unstructured text.
- Machine learning makes it possible to learn the relationship between the state of the cells and the phenotype of the disease.
- In this review, the authors introduce common machine learning paradigms and illustrate how they can be used to advance biology and medicine.
Introduction
Machine learning paradigm
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Text mining for inference from literature
Deep learning
Lifelong learning
Caveats in the application of machine learning
Summary
| Learning Category | ML Method | Characteristics | Scope of Applications | References |
|---|---|---|---|---|
| Supervised learning | K-Nearest Neighbor | Does not involve training (lazy learning) | Image classification, predicting the molecular subtype of cancers | Li et al [74], 2012 |
| | Naïve Bayes | Assumes features are independent (conditioned on class membership) | Cancer type predictions | Banu & Thirumalaikolundusubramanian [75], 2018 |
| | Decision Trees | Provides interpretable rules of classification | Prognostic markers, predictive markers | Geurts et al [76], 2009 |
| | Support Vector Machines | Linear and nonlinear classification. Maximum-margin criteria provide robust generalization ability | Prognostic markers, predictive markers | Wu et al [77], 2017 |
| | Neural Networks | Capable of learning complex problems with little fine-tuning; computationally intensive; tends to be difficult to interpret | Cancer risk prediction, identifying new chemotypes | Mueller et al [78], 2012 |
| | Deep Learning | Essentially, neural networks with many hidden layers | Well suited to computer vision applications (pathology images) and text mining (EHR) | Angermueller et al [55], 2016; Tang et al [79], 2019; Webb [80], 2018 |
| Unsupervised learning | K-means clustering | Partitions the observations into k predefined clusters, with each observation assigned to the cluster with the nearest mean | RNA-seq analysis, sequence clustering, image cytometry | Nugent & Meila [81], 2010 |
| | Hierarchical clustering | Organizes samples and clusters hierarchically, which enables better visualization of the structure in the data | Widely used in the biological domain because of its visualization capabilities | Ronan et al [82], 2016 |
| | Spectral clustering | Global and local structure of the similarities among samples determines the clustering | Single-cell RNA-sequencing data analysis | Zheng et al [14], 2019; Kiselev et al [42], 2017 |
| Reinforcement learning (RL) | Q-learning | Finds a policy that maximizes the expected value of the total reward over all successive steps starting from the current state | Behavioral ecology | Frankenhuis et al [83], 2019 |
| | Temporal difference | Learns by bootstrapping from the current estimate of the value function | Models for learning in biological systems | Neftci & Averbeck [84], 2019 |
| Others | Generative Adversarial Networks | Two neural networks contest with each other to generate new data with the same statistics as the training set | Understanding the organization of biological systems; generating realistic datasets | Wang et al [85], 2018 |
| | Text mining | A set of techniques that models and structures the information content of textual sources for obtaining information | Extracting information from EHRs | Ohno-Machado et al [86], 2013 |
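
To make two rows of the table concrete, the following is a minimal sketch in plain Python of k-nearest neighbor (no training step; the stored samples act as the model) and k-means (each observation assigned to the cluster with the nearest mean). It is an illustration only and is not taken from the cited works. The function names, the toy two-feature "expression" values, and the subtype labels are all hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """k-nearest neighbor: no training; classify by majority vote of the k closest samples."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans(points, k, n_iter=100, seed=0):
    """k-means (Lloyd's iterations): alternate nearest-mean assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct observations
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the means are stable
        assignment = new_assignment
        # Update step: recompute each mean from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical toy data: two features per sample, two well-separated groups.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
labels = ["subtype_A"] * 3 + ["subtype_B"] * 3

print(knn_predict(points, labels, (1.0, 1.0), k=3))
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

The supervised method needs the labels, whereas k-means recovers the two groups from the feature values alone. The cluster indices it returns are arbitrary, so only co-membership is meaningful.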
References
- Molecular mechanisms of muscular dystrophies: old and new players. Nat Rev Mol Cell Biol. 2006; 7: 762-773
- Regulation of cardiac hypertrophy by intracellular signalling pathways. Nat Rev Mol Cell Biol. 2006; 7: 589-600
- New insights into cystic fibrosis: molecular switches that regulate CFTR. Nat Rev Mol Cell Biol. 2006; 7: 426-436
- Werner and Hutchinson-Gilford progeria syndromes: mechanistic basis of human progeroid diseases. Nat Rev Mol Cell Biol. 2007; 8: 394-404
- Molecular and metabolic mechanisms of insulin resistance and β-cell failure in type 2 diabetes. Nat Rev Mol Cell Biol. 2008; 9: 193-205
- Molecular mechanisms of the preventable causes of cancer in the United States. Genes Dev. 2018; 32: 868-902
- NMR methods to dissect the molecular mechanisms of disease-related mutations (DRMs): understanding how DRMs remodel functional free energy landscapes. Methods. 2018; 148: 19-27
- Challenges of identifying clinically actionable genetic variants for precision medicine. J Healthc Eng. 2016; 2016
- A new initiative on precision medicine. N Engl J Med. 2015; 372: 793-795
- Decoding neuroproteomics: integrating the genome, translatome and functional anatomy. Nat Neurosci. 2014; 17: 1491-1499
- A collaborative approach to developing an electronic health record phenotyping algorithm for drug-induced liver injury. J Am Med Inform Assoc. 2013; 20: e243-e252
- The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med. 2013; 15: 761-771
- Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems. Clin Pharmacol Ther. 2014; 96: 482-489
- SinNLRR: a robust subspace clustering method for cell type detection by nonnegative and low rank representation. Bioinformatics. 2019; ([pii:btz139])
- Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018; 136: 803-810
- Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018; 29: 1836-1842
- Using smartphones and machine learning to quantify Parkinson disease severity: the mobile Parkinson disease score. JAMA Neurol. 2018; 75: 876-880
- Predicting readmission risk shortly after admission for CABG surgery. J Card Surg. 2018; 33: 163-170
- Prognostic value of an RNA expression signature derived from cell cycle proliferation genes in patients with prostate cancer: a retrospective study. Lancet Oncol. 2011; 12: 245-255
- Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res. 2014; 15: 3133-3181
- No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997; 1: 67-82
- Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005; 21: 3301-3307
- Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006; 7: 91
- On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010; 11: 2079-2107
- How to improve reliability and efficiency of research about molecular markers: roles of phases, guidelines, and study design. J Clin Epidemiol. 2007; 60: 1205-1219
- Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003; 95: 634-635
- A predictive model of rectal tumor response to preoperative radiotherapy using classification and regression tree methods. Clin Cancer Res. 2005; 11: 5440-5443
- An artificial neural network integrated pipeline for biomarker discovery using Alzheimer's disease as a case study. Comput Struct Biotechnol J. 2018; 16: 77-87
- An overview of the use of artificial neural networks in lung cancer research. J Thorac Dis. 2017; 9: 924-931
- 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N Engl J Med. 2016; 375: 717-729
- How does gene expression clustering work? Nat Biotechnol. 2005; 23: 1499-1501
- What are the true clusters? Pattern Recognit Lett. 2015; 64: 53-62
- Algorithms for association rule mining: a general survey and comparison. SIGKDD Explor. 2000; 2: 58-64
- Association rules analysis of comorbidity and multimorbidity: the Concord Health and Aging in Men Project. J Gerontol A Biol Sci Med Sci. 2016; 71: 625-631
- Molecular portraits of human breast tumours. Nature. 2000; 406: 747-752
- Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A. 2001; 98: 10869-10874
- Breast cancer intrinsic subtype classification, clinical use and future trends. Am J Cancer Res. 2015; 5: 2929-2943
- Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455: 1061-1068
- Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res. 2008; 14: 5198-5208
- Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012; 487: 330-337
- Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012; 489: 519-525
- SC3: consensus clustering of single-cell RNA-seq data. Nat Methods. 2017; 14: 483-486
- Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics. 2015; 31: 1974-1980
- Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018; 36: 411-420
- Combining kernel and model based learning for HIV therapy selection. AMIA Jt Summits Transl Sci Proc. 2017; 2017: 239-248
- The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018; 24: 1716-1720
- Evaluating reinforcement learning algorithms in observational health settings. arXiv. 2018
- Deep reinforcement learning for dynamic treatment regimes on medical registry data. Healthc Inform. 2017; 2017: 380-385
- Utility of the JAX Clinical Knowledgebase in capture and assessment of complex genomic cancer data. NPJ Precis Oncol. 2019; 3: 2
- Integrating electronic health record genotype and phenotype datasets to transform patient care. Clin Pharmacol Ther. 2016; 99: 298-305
- Machine learning in automated text categorization. ACM Comput Surv. 2002; 34: 1-47
- Contralateral breast cancer event detection using nature language processing. AMIA Annu Symp Proc. 2018; 2017: 1885-1892
- DeepPhe: a natural language processing system for extracting cancer phenotypes from clinical records. Cancer Res. 2017; 77: e115-e118
- Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer. Bioinformatics. 2017; 33: 3973-3981
- Deep learning for computational biology. Mol Syst Biol. 2016; 12: 878
- Deep learning. Nature. 2015; 521: 436-444
- Challenges in representation learning: a report on three machine learning contests. Neural Netw. 2015; 64: 59-63
- Editorial introduction to the neural networks special issue on deep learning of representations. Neural Netw. 2015; 64: 1-3
- Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989; 1: 541-551
- He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Artificial intelligence and digital pathology: challenges and opportunities. J Pathol Inform. 2018; 9: 38
- Deep convolutional neural networks for accurate somatic mutation detection. Nat Commun. 2019; 10: 1041
- A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018; 36: 983-987
- Learning from data. 2012
- Understanding neural networks through deep visualization. arXiv. 2015
- DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics. 2018; 34: 1538-1546
- Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017; 542: 115-118
- Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. NPJ Digit Med. 2019; 2
- Continual lifelong learning with neural networks: a review. arXiv. 2019
- How we analyzed the COMPAS recidivism algorithm. 2016 (Available at:)
- Gender bias in the diagnosis and treatment of coronary artery disease. Heart Lung. 1995; 24: 427-435
- Why is my classifier discriminatory? arXiv 2018. Advances in Neural Information Processing Systems 31, 2018: 3543-3554
- Amini A, Soleimany A, Schwarting W, et al. Uncovering and mitigating algorithmic bias through learned latent structure. In AAAI/ACM Conference on Artificial Intelligence, Ethics and Society. 2019. Honolulu, Hawaii.
- Using the K-nearest neighbor algorithm for the classification of lymph node metastasis in gastric cancer. Comput Math Methods Med. 2012; 2012: 876545
- Comparison of Bayes classifiers for breast cancer classification. Asian Pac J Cancer Prev. 2018; 19: 2917-2920
- Supervised learning with decision tree-based methods in computational and systems biology. Mol Biosyst. 2009; 5: 1593-1605
- A pathways-based prediction model for classifying breast cancer subtypes. Oncotarget. 2017; 8: 58809-58822
- Discovery of 2-(2-benzoxazoyl amino)-4-aryl-5-cyanopyrimidine as negative allosteric modulators (NAMs) of metabotropic glutamate receptor 5 (mGlu(5)): from an artificial neural network virtual screen to an in vivo tool compound. ChemMedChem. 2012; 7: 406-414
- Recent advances of deep learning in bioinformatics and computational biology. Front Genet. 2019; 10: 214
- Deep learning for biology. Nature. 2018; 554: 555-557
- An overview of clustering applied to molecular biology. Methods Mol Biol. 2010; 620: 369-404
- Avoiding common pitfalls when clustering biological data. Sci Signal. 2016; 9: re6
- Enriching behavioral ecology with reinforcement learning methods. Behav Processes. 2019; 161: 94-100
- Reinforcement learning in artificial and biological systems. Nat Mach Intell. 2019; 1: 133-143
- Conditional generative adversarial network for gene expression inference. Bioinformatics. 2018; 34: 1603-1611
- Natural language processing: algorithms and tools to extract computable information from EHRs and from the biomedical literature. J Am Med Inform Assoc. 2013; 20: 805
Footnotes
Disclosure Statement: The work was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196.
User license
Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)