Exceptional computing and laboratory resources are available in Pittsburgh and throughout the country. Access to some of these resources requires an account, which you can set up directly with the groups responsible for their maintenance.
Resources at the University of Pittsburgh
- Your large datasets might come from the Genomic and Proteomic Core Laboratories (GPCL) or from other laboratories throughout the university that conduct sequencing or other omics assays. GPCL services include specimen processing (DNA/RNA extraction, amplification), sequencing (Sanger, next-generation, Life Technologies Ion Torrent), genotyping (Illumina, Affymetrix, Sequenom), expression analysis (TaqMan, SABiosciences, Illumina, Affymetrix), epigenetic analysis, mutation detection, and NanoString platforms (mRNA, miRNA, CNV).
- In partnership with the Institute for Precision Medicine, the Molecular Biology Information Service of the Pitt Health Sciences Library provides University of Pittsburgh researchers with free access to QIAGEN's CLC Genomics Workbench, as well as the CLC BioMedical Workbench, Ingenuity Pathway Analysis, and Ingenuity Variant Analysis, for analyzing next-generation sequencing (NGS) datasets. Monthly workshops on these tools are offered, and analysts from the Department of Biomedical Informatics Genome Analysis Core (DBMI-GAC) are available to provide assistance.
- The Pittsburgh Genomic Resource Repository (PGRR), in partnership with the Pittsburgh Supercomputing Center (PSC) and UPMC, provides a framework through which researchers can access and analyze large national datasets. For UPMC patients who consented to have their clinical data re-linked with research data, these datasets are linked to complete patient data. Currently, PGRR mirrors The Cancer Genome Atlas (TCGA), along with de-identified UPMC clinical data for patients whose tumors were contributed to TCGA. Additional large omics datasets will be managed in the same way. Investigators who want access to these datasets and the computing infrastructure must request an account from PGRR. The Center for Simulation and Modeling (SaM), a PGRR partner, separately helps university researchers take advantage of the latest advances in parallel computing and work with Center faculty and staff to analyze and model large datasets.
- You can also directly contact the PSC, which offers university, government, and industry researchers access to exceptional high-performance computing hardware, software, and expertise. You can request supercomputing time and receive training through the PSC.
- The Clinical and Translational Science Institute (CTSI), in addition to maintaining the GPCL, offers investigators help with regulatory issues (including IRB protocols and consent forms), patient recruitment, use of electronic health record data, text-based searches of pathology and radiology reports (including identifying tissue available at the Health Sciences Tissue Bank), and ethics training (through the Responsible Conduct of Research Center).
Resources outside the University of Pittsburgh
- You can find extensive, diverse data about tens of thousands of human genes at GeneCards®. If you are looking for next-generation sequencing and library preparation services from top providers, or want to learn how to order them, you might start with Genohub.
- Once you have your sequencing data, you may wish to use supercomputing resources across the United States through the Extreme Science & Engineering Discovery Environment (XSEDE). Funded by the National Science Foundation (NSF), XSEDE allocates computational, visualization, and storage resources, hosted at partner sites such as the Pittsburgh Supercomputing Center, to investigators at US-based institutions.
- You can also use another NSF-supported effort, CyVerse, which provides computational infrastructure, platforms, and services for managing, analyzing, and sharing large, complex datasets.
- If you need bioinformatics software to analyze your omics data, CLCBio offers software with both user-friendly graphical and command-line interfaces that is compatible across sequencing platforms as well as with open-source and third-party tools. The Broad Institute also offers a wide range of software and data resources. The Encyclopedia of DNA Elements (ENCODE) can be used to examine the functional elements of your gene(s) of interest.
- In the cancer realm, the Broad Institute's Tumor Portal and the Pediatric Cancer Gene Database offer tools for exploring tumor genomics.
- If your sequencing data could have therapeutic implications, you can check the Druggable Genome, Online Mendelian Inheritance in Man (OMIM), MedGen, and the Pharmacogenomics Knowledgebase (PharmGKB) (see the Health Professionals page for more details and suggestions). You might also check the Human Protein Atlas to evaluate your genomic findings in the context of what is known about the human proteome.
- TCGA Expedition, the open-source software that the Institute for Precision Medicine (IPM) developed to create the PGRR, is freely available for download.
- Data Harmonization: Personalized medicine research will advance more rapidly if researchers use consistent, common nomenclature to describe phenotypes, drugs, procedures, outcomes, and so on. Using harmonized nomenclature and definitions when creating data collection forms or screens and database fields makes it possible to share and merge large datasets. BioPortal offers the most comprehensive repository of biomedical ontologies. To see how this approach supports common, shareable phenotype definitions, visit the Phenotype Knowledgebase.
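As a minimal illustration of why shared nomenclature makes merging possible, the Python sketch below maps two sites' differing local phenotype labels onto common term IDs before combining their records. The site names, local labels, and "ONT:..." codes are hypothetical placeholders, not real ontology identifiers; in practice the shared terms would be drawn from an ontology such as those available through BioPortal.

```python
from collections import defaultdict

# Hypothetical mapping from each site's local phenotype labels to shared
# term IDs. The "ONT:..." codes are illustrative placeholders only; real
# projects would use identifiers from a published ontology.
LOCAL_TO_SHARED = {
    "site_a": {"HTN": "ONT:0001", "T2DM": "ONT:0002"},
    "site_b": {"high blood pressure": "ONT:0001", "type 2 diabetes": "ONT:0002"},
}


def harmonize(site, records):
    """Replace each record's local phenotype label with the shared term ID.

    Patient IDs are prefixed with the site name so IDs from different
    sites cannot collide. An unmapped label raises KeyError, so gaps in
    the mapping are caught instead of silently dropped.
    """
    mapping = LOCAL_TO_SHARED[site]
    return [
        {"patient_id": f"{site}:{r['patient_id']}", "phenotype": mapping[r["phenotype"]]}
        for r in records
    ]


def count_by_phenotype(*datasets):
    """Merge harmonized datasets and count patients per shared term."""
    counts = defaultdict(int)
    for dataset in datasets:
        for record in dataset:
            counts[record["phenotype"]] += 1
    return dict(counts)


site_a = [{"patient_id": "1", "phenotype": "HTN"}, {"patient_id": "2", "phenotype": "T2DM"}]
site_b = [{"patient_id": "7", "phenotype": "high blood pressure"}]

merged = count_by_phenotype(harmonize("site_a", site_a), harmonize("site_b", site_b))
print(merged)  # {'ONT:0001': 2, 'ONT:0002': 1}
```

Without the shared codes, "HTN" and "high blood pressure" would be counted as two different phenotypes; after harmonization they pool into a single term.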