Stronger, Faster, Smarter: ARUP’s New NGS Pipeline and Analytical Platform

January 18, 2018

A number of in-house ARUP teams brought Pipey—the new biocomputing NGS pipeline—to life over a period of 18 months. Each person played a critical role and brought an important perspective to the design.


“We’ve now moved from a system that processed one sample at a time to an elastic system that can process thousands of samples in the same time as a single sample,” says Dr. Elaine Gee, director of Bioinformatics. “This is a true demonstration of scalability.”

Gee is referring to Pipey, ARUP’s new bioinformatics pipeline and cloud-based compute infrastructure for next-generation sequencing (NGS) testing. Pipey went live in early January.

While ARUP has offered NGS testing for the past four years, demand has continually increased as more physicians turn to precision medicine and NGS costs decrease. NGS has grown in the areas of oncology, hereditary (germline) genetics, and infectious disease.

To handle the analytics for NGS testing at scale, ARUP needed a new platform design that could process large sample volumes efficiently while supporting short turnaround times. “The analytics platform required to transform the raw sequencing data into interpretable results was rebuilt—from the software, hardware, to infrastructure,” adds Gee.

Over the course of 18-plus jam-packed months, Gee led a talented team of bioinformaticians, data engineers, software programmers, and system administrators to bring Pipey online.

“Each person played a critical role in building Pipey by bringing an important perspective to the design; the saying that ‘the whole is greater than the sum of its parts’ truly applies here in synthesizing various expertise to create a high-quality analytic platform relevant for a national reference laboratory,” emphasizes Gee. This system was validated using recommendations from CAP and AMP’s newest validation guidelines.

What Does Cloud Power Have to Do With It?

In NGS testing, two key factors are involved in the analysis: human interpretation and software/instrumentation. Medical directors and clinical variant scientists (PhDs) use their training and knowledge from the medical literature to identify mutations that are of clinical significance.

The non-human analysis is done by Pipey. Raw sequencing data, generated from extracted specimens, runs through Pipey’s platform, which uses optimized alignment and variant calling algorithms to produce lists of genetic variants of interest with associated annotations and quality control metrics.

With this new bioinformatics data processing system, running 10,000 samples takes as long as running one sample, because the cloud supplies the computational power to process them all at once. For patients, this means receiving answers faster.
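The arithmetic behind that claim can be illustrated with a minimal sketch. It assumes each sample is processed independently and that one worker handles one sample at a time; the function name and the two-hour run time are hypothetical and are not taken from Pipey.

```python
import math


def wall_clock_hours(n_samples: int, hours_per_sample: float, workers: int) -> float:
    """Elapsed time when independent samples are processed in parallel batches."""
    if n_samples < 0 or workers < 1:
        raise ValueError("n_samples must be >= 0 and workers must be >= 1")
    # Each batch runs up to `workers` samples side by side.
    batches = math.ceil(n_samples / workers)
    return batches * hours_per_sample


HOURS_PER_SAMPLE = 2.0  # hypothetical run time for one sample

# One fixed machine works through the samples one after another.
fixed = wall_clock_hours(10_000, HOURS_PER_SAMPLE, workers=1)

# An elastic system rents one worker per sample, so everything runs at once.
elastic = wall_clock_hours(10_000, HOURS_PER_SAMPLE, workers=10_000)

# A single sample on a single worker, for comparison.
single = wall_clock_hours(1, HOURS_PER_SAMPLE, workers=1)

print(fixed, elastic, single)
```

Under these assumptions the elastic run finishes in the same time as a single sample, while the fixed machine takes 10,000 times longer.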

This is the first time ARUP has built a system using cloud resources to support internal computational needs. “The cloud provides access to vast amounts of compute power and data storage that allows for day-to-day flexibility,” says Gee. This cloud-based system includes key security measures to protect patient data.

Bringing such compute capabilities in-house would require a large upfront cost to purchase equipment and hire a full staff for support and maintenance, and it would limit the available compute resources to fixed specifications. In-house resources would also sit idle during downtimes. “With the click of one button, I can essentially rent 100 computers, knowing what our costs will be, and do it all faster,” says lead data engineer Mark Monroe.

Pipey standardizes the analytical workflows across many NGS tests, making it easier to maintain while providing high-quality analytic results. Prior to Pipey, each NGS test was supported by a custom analytical workflow with unique parameters.

“With Pipey, we are working on standardizing the coding so we have one workflow that we can use in more than 40 different ways,” says Monroe. This makes it easier to validate tests and fix bugs, and it improves efficiency and speed.
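The idea of one workflow used in many ways can be sketched as a single function driven by per-test settings. This is only an illustration under assumed names: `AssayConfig`, `run_workflow`, the panel names, file names, and thresholds are all hypothetical and do not describe Pipey's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AssayConfig:
    """Hypothetical per-test settings; the shared workflow itself never changes."""
    name: str
    target_regions: str          # assumed file listing the regions to analyze
    min_variant_fraction: float  # assumed reporting threshold


def run_workflow(sample_id: str, config: AssayConfig) -> List[str]:
    """Return the ordered steps the shared workflow would run for one sample."""
    return [
        f"align {sample_id} to reference",
        f"call variants in {config.target_regions} "
        f"(min fraction {config.min_variant_fraction})",
        f"annotate and compute QC metrics for {config.name}",
    ]


# Two hypothetical tests that reuse the same workflow with different parameters.
CONFIGS: Dict[str, AssayConfig] = {
    "hereditary_panel": AssayConfig("hereditary_panel", "panel_a.bed", 0.20),
    "oncology_panel": AssayConfig("oncology_panel", "panel_b.bed", 0.05),
}

steps = run_workflow("sample-001", CONFIGS["oncology_panel"])
for step in steps:
    print(step)
```

Because every test goes through the same function, a bug fix or validation applies to all tests at once, and only the configuration differs from test to test.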

The new pipeline also provides high sensitivity in variant calling across a wide range of variant classes, and because it is modular, it can be easily upgraded as needed to bring more testing functionality online.

Gee adds, “ARUP is not a software shop. We are a national reference laboratory that used best-in-class software engineering practices in building Pipey to improve the quality of the NGS pipeline while allowing for unlimited scalability.” 

Peta Owens-Liston