Variant association tools for quality control and analysis of large-scale sequence and genotyping array data.

Title: Variant association tools for quality control and analysis of large-scale sequence and genotyping array data.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Wang, GT, Peng, B, Leal, SM
Journal: Am J Hum Genet
Volume: 94
Issue: 5
Pagination: 770-83
Date Published: 2014 May 01
ISSN: 1537-6605
Keywords: Genetic Association Studies, Genetic Variation, Genotyping Techniques, High-Throughput Nucleotide Sequencing, Humans, Multifactorial Inheritance, Quality Control, Software
Abstract

Currently there is great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting themes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation, adaptive permutation, and simultaneously conducting association analysis via multiple methods. Results are available in text or graphic file formats and additionally can be output to relational databases for further annotation and filtering. An interface to R language also facilitates user implementation of novel association methods. The VAT's data QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping array, e.g., "exome chip," data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes project.
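
As a rough illustration of two ideas named in the abstract, weighted collapsing of rare variants and adaptive permutation, the following Python sketch may help. It is not VAT's implementation or API: the function names, the mean-difference test statistic (used here in place of a regression-based statistic), and the early-stopping rule are all assumptions chosen to keep the example short.

```python
import random


def burden_scores(genotypes, weights):
    """Collapse each sample's variant genotypes into one weighted score.

    genotypes: one row per sample, one minor-allele count (0/1/2) per variant.
    weights:   one weight per variant (hypothetical, e.g. frequency based).
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def mean_difference(scores, phenotype):
    """Absolute difference in mean score between cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, phenotype) if y == 1]
    controls = [s for s, y in zip(scores, phenotype) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def adaptive_permutation_pvalue(scores, phenotype, max_perm=10000,
                                check_every=100, alpha=0.05, seed=0):
    """Permutation p-value with early stopping (an assumed stopping rule).

    Phenotype labels are shuffled repeatedly. Every `check_every`
    permutations, the running p-value estimate is checked; if a normal
    approximation puts its lower bound above `alpha`, the test is clearly
    non-significant and the remaining permutations are skipped.
    Returns (p_value, permutations_used).
    """
    rng = random.Random(seed)
    observed = mean_difference(scores, phenotype)
    labels = list(phenotype)
    exceed = 0
    for n in range(1, max_perm + 1):
        rng.shuffle(labels)
        if mean_difference(scores, labels) >= observed:
            exceed += 1
        if n % check_every == 0:
            p = (exceed + 1) / (n + 1)
            se = (p * (1 - p) / n) ** 0.5
            if p - 1.96 * se > alpha:
                return p, n
    return (exceed + 1) / (max_perm + 1), max_perm
```

Under these assumptions, a gene with no signal stops after the first check, while a gene with a strong signal uses the full permutation budget to resolve a small p-value, which is where the time saving of an adaptive scheme comes from.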

DOI: 10.1016/j.ajhg.2014.04.004
Alternate Journal: Am. J. Hum. Genet.
PubMed ID: 24791902
PubMed Central ID: PMC4067555
Grant List:
U54 HG006493 / HG / NHGRI NIH HHS / United States
UC2 HL102926 / HL / NHLBI NIH HHS / United States
P30 CA016672 / CA / NCI NIH HHS / United States
RC2 HL102926 / HL / NHLBI NIH HHS / United States
RC4 MD005964 / MD / NIMHD NIH HHS / United States
R01 HG005859 / HG / NHGRI NIH HHS / United States
1R01HG005859 / HG / NHGRI NIH HHS / United States
HG006493 / HG / NHGRI NIH HHS / United States
HL102926 / HL / NHLBI NIH HHS / United States
UM1 HG006493 / HG / NHGRI NIH HHS / United States
MD005964 / MD / NIMHD NIH HHS / United States