Next Generation Sequence Analysis

Data storage and return

All data collected from the HiSeq are saved locally on the Dell machine that controls the instrument. Our core is equipped with a file server with 50 TB of space dedicated solely to sequencing data. The data are backed up daily to external mirrored storage, which in turn is backed up monthly to off-site tape storage. The sequencing file server can be accessed only by designated personnel, who receive regular training in information security protocols. Currently, raw data are stored for at least 6 months (space permitting) after FASTQ generation and data processing. Upon request, we deliver the files (both raw and processed) to researchers on hard disks or transfer them via secure FTP.

Computational Pipeline

The raw data from the sequencer are transferred via a 10GB ultra-fast connection to a high-performance 3,000-CPU cluster on the campus of Arizona State University, where they are processed for downstream applications, from base calling through data quantification. For 8 flow cells of 2x100 paired-end runs, the typical turnaround time from raw data to FASTQ is about 2 days; alignment and quantification (for RNA-Seq, for example) take around 5 days.

Sequence Quality

An automated sample preparation pipeline and a thorough quality control protocol have consistently produced optimal cluster density and, as a result, sequence reads of excellent quality. For example, in a recent 2x100bp paired-end DNA-Seq run on human cancer samples, we obtained 144 million paired reads with an average Phred score above 30 across the entire read length. When these reads were aligned to the Ensembl 69 genome with Bowtie2, 85% were mapped. Similarly, a recent 1x50bp RNA-Seq run on rhesus monkey samples produced 150 million reads per lane and showed an outstanding alignment rate of 91%.
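
For readers less familiar with the Phred scale cited above, a Phred score Q corresponds to a base-call error probability of P = 10^(-Q/10), so a score of 30 means roughly one expected error per 1,000 bases. The short Python sketch below illustrates the conversion and how an average score can be computed from a FASTQ quality line. It is an illustration only, not part of the core's pipeline: the function names are hypothetical, and it assumes the Phred+33 ASCII encoding.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: qualities are ASCII-encoded with an offset of 33 (Phred+33).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Convert a Phred score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_phred(quality_line, offset=33):
    """Arithmetic mean of the Phred scores in one quality line."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up quality line for illustration: '5' -> Q20, '?' -> Q30, 'I' -> Q40
    example_quality = "5?I"
    print(phred_scores(example_quality))
    print(mean_phred(example_quality))
    print(error_probability(30))
```

In practice, per-base quality summaries are usually taken from dedicated quality control tools rather than computed by hand; the sketch only shows what the reported number means.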