Abstract
Detecting the patterns of DNA sequence variants across the human genome is a crucial step for unraveling the genetic basis of complex human diseases. The human HapMap constructed by single nucleotide polymorphisms (SNPs) provides efficient sequence variation information that can speed up the discovery of genes related to common diseases. In this article, we present a generalized linear model for identifying specific nucleotide variants that encode complex human diseases. A novel approach is derived to group haplotypes into composite diplotypes, which substantially reduces the model's degrees of freedom for an association test and hence increases power when multiple SNP markers are involved. An efficient two-stage estimation procedure based on the expectation-maximization (EM) algorithm is derived to estimate the parameters. Non-genetic environmental or clinical risk factors can also be fitted into the model. Computer simulations show that our model has reasonable power and type I error rate given an adequate sample size. The simulations also suggest that a balanced design, with approximately equal numbers of cases and controls, should be preferred to maintain small estimation bias and reasonable testing power. To illustrate its utility, we apply the method to a genetic association study of large-for-gestational-age (LGA) neonates. The model provides a powerful tool for elucidating the genetic basis of complex binary diseases.
Keywords: Nucleotide sequence, complex disease, EM algorithm, logistic regression, haplotype
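The abstract mentions EM-based estimation with haplotypes but gives no algorithmic detail. As a rough illustration of why EM is needed for this kind of data, the sketch below implements the classical two-SNP EM estimate of haplotype frequencies from unphased genotypes. It is not the authors' two-stage procedure, and it does not include the disease model or the composite diplotype grouping. The function name `em_haplotype_freqs`, the 0/1/2 genotype coding, and the starting values are assumptions made for this example.

```python
from collections import Counter


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Classical EM for two-SNP haplotype frequencies (illustrative sketch).

    genotypes: list of (g1, g2), where g1 and g2 are the counts (0, 1, 2)
    of allele '1' at SNP 1 and SNP 2 (a hypothetical coding for this example).
    Returns frequencies of haplotypes [00, 01, 10, 11].
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    base = [0.0] * 4            # haplotype (a, b) is stored at index 2*a + b
    n_dh = counts.get((1, 1), 0)  # double heterozygotes: phase is unknown

    # Genotypes with at most one heterozygous SNP have a unique haplotype pair.
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    for (g1, g2), c in counts.items():
        if (g1, g2) == (1, 1):
            continue
        for a, b in zip(alleles[g1], alleles[g2]):
            base[2 * a + b] += c

    p = [0.25] * 4  # assumed starting values: equal frequencies
    for _ in range(n_iter):
        # E-step: a double heterozygote is either 00/11 or 01/10.
        w_cis, w_trans = p[0] * p[3], p[1] * p[2]
        total = w_cis + w_trans
        frac = w_cis / total if total > 0 else 0.5

        # M-step: expected haplotype counts divided by 2n chromosomes.
        expected = list(base)
        expected[0] += n_dh * frac
        expected[3] += n_dh * frac
        expected[1] += n_dh * (1.0 - frac)
        expected[2] += n_dh * (1.0 - frac)
        new_p = [e / (2.0 * n) for e in expected]

        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    sample = [(0, 0)] * 3 + [(1, 1)]
    print(em_haplotype_freqs(sample))
```

Only double heterozygotes are phase-ambiguous in the two-SNP case, so the E-step reduces to a single fraction. With more SNPs the set of compatible haplotype pairs grows quickly, which is the setting where reducing the model's degrees of freedom becomes important.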
Current Genomics
Title: Mapping Nucleotide Sequences that Encode Complex Binary Disease Traits with HapMap
Volume: 8 Issue: 5
Author(s): Yuehua Cui, Wenjiang Fu, Kelian Sun, Roberto Romero and Rongling Wu
Cite this article as:
Cui Yuehua, Fu Wenjiang, Sun Kelian, Romero Roberto and Wu Rongling, Mapping Nucleotide Sequences that Encode Complex Binary Disease Traits with HapMap, Current Genomics 2007; 8 (5). https://dx.doi.org/10.2174/138920207782446188
DOI: https://dx.doi.org/10.2174/138920207782446188
Print ISSN: 1389-2029
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5488