The overall goal of the Bioinformatics Shared Resource is to provide comprehensive bioinformatics services to the CCSG research programs across the different phases of a cancer research project. Shared Resource personnel help Cancer Center researchers design and conduct high-throughput experiments guided by computational insight. This involvement ensures that appropriate bioinformatics analysis plans are developed during grant and protocol preparation and are ultimately carried through to manuscript preparation.
Resource personnel work closely with principal investigators and other members of the research team to define the nature and scope of the relevant bioinformatics needs, as well as the type(s) of omics data to be collected and analyzed to achieve the study objectives. This includes detailed discussion of the hypothesis of interest, preliminary data-mining strategies using public data repositories, the appropriate high-throughput techniques, the optimal experimental design and the anticipated limitations of the data. One example of this service is determining the optimal sequencing coverage for a targeted sequencing project, so that genetic variants of interest are detected accurately and cost-effectively.
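As a rough illustration of the coverage calculation mentioned above, the sketch below assumes a simple binomial read-sampling model: a variant present at allele fraction f is called only when at least k reads support it, and sequencing error, mapping bias and uneven capture are ignored. The function names, the 95% detection target and the read-count threshold are illustrative assumptions, not the Resource's actual method.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads carry the variant.

    Assumes each read independently samples the variant allele with probability
    `allele_fraction` (binomial model); sequencing error is ignored.
    """
    # Probability of seeing too few variant-supporting reads to make a call.
    miss = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - miss


def minimum_depth(
    allele_fraction: float,
    min_alt_reads: int,
    target_power: float = 0.95,
    max_depth: int = 10_000,
) -> int:
    """Smallest depth whose detection probability reaches `target_power`."""
    # Detection probability never decreases with depth, so the first hit is minimal.
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, allele_fraction, min_alt_reads) >= target_power:
            return depth
    raise ValueError("target power not reached within max_depth")


if __name__ == "__main__":
    # Hypothetical design question: a subclonal variant at 5% allele fraction,
    # called only when supported by at least 5 reads.
    print(minimum_depth(allele_fraction=0.05, min_alt_reads=5))
```

In a real design, the chosen depth would also be weighed against per-sample sequencing cost and the expected fraction of targets that reach the nominal coverage.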
Resource personnel provide exploratory data-mining support for the development of the preliminary data section. They review, and in most cases develop, the bioinformatics analysis plan for the majority of CCSG and RPCI grant applications.
Resource personnel perform bioinformatics analysis and data mining of various omics and other biological datasets. They assist investigators with the integration of omics data with clinical information and develop analytical models for these data based on the hypothesis of interest.
The Resource provides processing, visualization and analysis pipelines for various next-generation sequencing (NGS) applications. These include detection of single nucleotide variations (SNVs), structural variations (SVs), copy number variations (CNVs) and insertions and deletions (indels) from DNA-Seq; transcript abundance estimation, alternative splicing and gene fusion discovery, differential expression analysis and post-transcriptional variation detection from RNA-Seq and/or miRNA-Seq; and detection of epigenetic changes, transcription factor binding sites and RNA-binding sites from methyl-Seq, ChIP-Seq and CLIP-Seq, respectively.
Data analysis pipelines are also available for multiple microarray platforms (gene expression chips, SNP genotyping arrays, CGH arrays, methylation BeadChips, etc.), as well as several mass spectrometry–based proteomics platforms (LC-MS, MALDI-TOF and FTICR-MS). Bioinformatics personnel also perform in silico analyses, such as pathway and functional enrichment analysis and gene network analysis, and help investigators arrive at plausible biological and/or clinical interpretations of their results.
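Pathway enrichment of the kind mentioned above is often assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The minimal sketch below assumes that test; the function names, gene symbols and pathway definitions are hypothetical placeholders, and a real analysis would also correct the resulting p-values for multiple testing (for example, with the Benjamini-Hochberg procedure).

```python
from math import comb


def hypergeometric_upper_tail(
    overlap: int, pathway_size: int, list_size: int, background_size: int
) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background_size, pathway_size, list_size)."""
    upper = min(pathway_size, list_size)
    # Exact integer arithmetic for the numerator and denominator.
    numerator = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(background_size, list_size)


def pathway_enrichment(gene_list, pathways, background):
    """One-sided enrichment p-value per pathway (no multiple-testing correction)."""
    background = set(background)
    hits = set(gene_list) & background  # only count genes that could have been measured
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(hits & members)
        results[name] = hypergeometric_upper_tail(
            overlap, len(members), len(hits), len(background)
        )
    return results


if __name__ == "__main__":
    # Hypothetical toy example: four measured genes, two pathways.
    background_genes = ["TP53", "MDM2", "MYC", "MAX"]
    toy_pathways = {"pathway_1": ["TP53", "MDM2"], "pathway_2": ["MYC", "MAX"]}
    print(pathway_enrichment(["TP53", "MDM2"], toy_pathways, background_genes))
```

Restricting both the gene list and the pathways to the measured background is a deliberate choice: testing against all annotated genes, rather than those actually assayed, tends to inflate apparent enrichment.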
Resource personnel, serving as scientific collaborators, write or review the bioinformatics analysis section of a manuscript and interpret the analytical models as they relate to the conclusions the manuscript presents.
Education & Training
Resource personnel provide consultation, assistance and, when necessary, hands-on training to help investigators use the bioinformatics tools and resources needed to analyze their own data. Examples include helping investigators use online tools for protein and nucleotide sequence analysis and use Galaxy to perform genomic data analysis without programming. Services also include instructing users in the IGV and/or UCSC genome browsers for visualizing cancer genome features, and in methods for accessing 1000 Genomes Project data to retrieve genetic variants.
To give CCSG investigators access to data warehousing and computing resources for storing, managing, analyzing and sharing omics and other data, the Resource contributes to the development of the underlying informatics infrastructure in close collaboration with the information technology (IT) department. Resource personnel serve on the IT Advisory Board and make recommendations on key data management and computer system infrastructure. Their involvement in these decisions is critical to ensuring that the right data flow, in the correct format, into the appropriate data repositories.
Examples of these services include maintaining the high-performance computing facility required for various cancer genome sequencing applications, developing data repositories for querying and updating annotated omics analysis results, and implementing customized Laboratory Information Management Systems (LIMS) for project and sample tracking and management. The Resource also coordinates its efforts with other Resources, including the Genomics Shared Resource, Data Bank and BioRepository, Pathology Research Network, and Clinical Data Network. The primary focus of this coordination is centralized access to, and integration of, omics, epidemiologic, biospecimen and clinical data using Enterprise GeneTegra-based solutions deployed by the IT department.
Resource personnel collect and test available bioinformatics products to help investigators select the appropriate tools for their specific studies. Examples include comparing gene fusion detection algorithms for RNA-Seq data analysis and assessing CNV-calling methods for whole-genome and/or whole-exome sequencing data. The evaluation results are summarized and reported to investigators for their consideration.
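One common way to compare callers of the kind described above is to score each tool's output against a validated truth set. The sketch below assumes that approach, with fusions represented as ordered (5' partner, 3' partner) gene pairs; the function name, the caller names and the example calls are hypothetical, and a real comparison would also need rules for matching breakpoints and gene aliases.

```python
def benchmark_calls(calls, truth):
    """Precision, recall and F1 of a caller's output against a validated truth set."""
    calls, truth = set(calls), set(truth)
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical truth set and outputs from two fusion callers.
    truth_set = {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("EML4", "ALK")}
    caller_outputs = {
        "caller_a": {("BCR", "ABL1"), ("TMPRSS2", "ERG"), ("GENE1", "GENE2")},
        "caller_b": {("BCR", "ABL1")},
    }
    for tool, tool_calls in caller_outputs.items():
        print(tool, benchmark_calls(tool_calls, truth_set))
```

Keeping the partner order in each tuple means a reciprocal call such as ABL1-BCR is not counted as a match for BCR-ABL1; whether that is desirable depends on the study question.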
Tool, Database, and Web Development
Resource personnel develop customized bioinformatics tools, interactive web applications and the underlying back-end databases when existing products are unavailable or do not meet the needs of CCSG investigators for specific study objectives. Whenever possible, the Resource makes these tools available to the broader cancer research community.