Rqc: A Bioconductor Package for Quality Control of High-Throughput Sequencing Data
Abstract
As sequencing costs drop with constant improvements in the field, next-generation sequencing has become one of the most widely used technologies in biological research. Sequencing technology allows the detailed characterization of events at the molecular level, including gene expression, genomic sequence and structural variants. Such experiments result in billions of sequenced nucleotides, each of which is associated with a quality score. Several software tools allow the quality assessment of whole experiments. However, users need to switch between software environments to perform all steps of data analysis, which adds an extra layer of complexity to the workflow. We developed Rqc, a Bioconductor package designed to assist the analyst in assessing the quality of high-throughput sequencing data. The package uses parallel computing strategies to optimize the processing of large data sets, regardless of the sequencing platform. We created new data quality visualization strategies based on established analytical procedures, which improve the ability to identify patterns that may affect downstream procedures, including undesired sources of technical variability. The software provides a framework for writing customized reports that integrates seamlessly with the R/Bioconductor environment and produces publication-ready images. The package also offers an interactive tool to generate quality reports dynamically. Rqc is implemented in R and is freely available through the Bioconductor project (https://bioconductor.org/packages/Rqc/) for Windows, Linux and Mac OS X operating systems.
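To make the notion of a per-nucleotide quality score concrete, the following is a minimal sketch in Python of how Phred-encoded quality strings can be decoded and averaged per sequencing cycle. It is an illustration only, not Rqc's implementation (Rqc is written in R). The function names are hypothetical, and the offset of 33 assumes the Sanger / Illumina 1.8+ FASTQ encoding.

```python
from statistics import mean

# Assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding.
PHRED_OFFSET = 33


def phred_scores(quality_string):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q into a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality_per_cycle(quality_strings):
    """Average the Phred score at each cycle (read position) across reads.

    Reads may differ in length; a cycle is averaged over the reads that reach it.
    """
    per_read = [phred_scores(q) for q in quality_strings]
    longest = max(len(scores) for scores in per_read)
    return [
        mean(scores[i] for scores in per_read if i < len(scores))
        for i in range(longest)
    ]


if __name__ == "__main__":
    # Two toy reads of four cycles each (hypothetical data).
    reads = ["IIII", "II5!"]
    print(mean_quality_per_cycle(reads))
    print(error_probability(20))
```

A drop in the per-cycle average toward the end of the reads, as in the toy data above, is the kind of pattern a quality report is meant to expose before downstream analysis.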