Title | PhaseME: Automatic rapid assessment of phasing quality and phasing improvement. |
Publication Type | Journal Article |
Year of Publication | 2020 |
Authors | Majidian, S, Sedlazeck, FJ |
Journal | Gigascience |
Volume | 9 |
Issue | 7 |
Date Published | 2020-07-01 |
ISSN | 2047-217X |
Abstract | BACKGROUND: The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess. FINDINGS: Here we present PhaseME, a versatile method to provide insights into and improvement of sample phasing results based on linkage data. We showcase the performance and the importance of PhaseME by comparing phasing information obtained from Pacific Biosciences including both continuous long reads and high-quality consensus reads, Oxford Nanopore Technologies, 10x Genomics, and Illumina sequencing technologies. We found that 10x Genomics and Oxford Nanopore phasing can be significantly improved while retaining a high N50 and completeness of phase blocks. PhaseME generates reports and summary plots to provide insights into phasing performance and correctness. We observed unique phasing issues for each of the sequencing technologies, highlighting the necessity of quality assessments. PhaseME is able to decrease the Hamming error rate significantly by 22.4% on average across all 5 technologies. Additionally, a significant improvement is obtained in the reduction of long switch errors. Especially for high-quality consensus reads, the improvement is 54.6% in return for only a 5% decrease in phase block N50 length. CONCLUSIONS: PhaseME is a universal method to assess the phasing quality and accuracy and improves the quality of phasing using linkage information. The package is freely available at https://github.com/smajidian/phaseme. |
DOI | 10.1093/gigascience/giaa078 |
Alternate Journal | Gigascience |
PubMed ID | 32706368 |
PubMed Central ID | PMC7379178 |
Grant List | UM1 HG008898 / HG / NHGRI NIH HHS / United States |
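
The abstract reports its results in terms of three metrics: phase block N50, Hamming error rate, and switch errors. The sketch below shows how these quantities are commonly defined, as a reading aid only. It is not taken from the PhaseME package and may differ from how the authors compute them. The function names are hypothetical, and the haplotype encoding (one string of `0`/`1` alleles over heterozygous sites within a single phase block) is an assumption made for illustration.

```python
def n50(block_lengths):
    """Length L such that blocks of length >= L cover at least half the total length."""
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def hamming_error_rate(predicted, truth):
    """Fraction of sites phased wrongly within one block.

    Haplotype labels are arbitrary, so the prediction is compared with the
    truth and with its complement, and the smaller mismatch count is used.
    Assumes biallelic sites encoded as '0'/'1' (illustrative encoding).
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if not predicted:
        return 0.0
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return min(mismatches, len(predicted) - mismatches) / len(predicted)


def switch_errors(predicted, truth):
    """Number of adjacent site pairs where the phase flips relative to the truth."""
    agree = [p == t for p, t in zip(predicted, truth)]
    return sum(a != b for a, b in zip(agree, agree[1:]))
```

Under these definitions, a single switch in the middle of a block produces one switch error but a large Hamming error, which is why the two metrics are usually reported together.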