Abstract
Stem cell laboratories around the world routinely generate whole-genome expression data to study systems-level processes in stem cell biology, and computational clustering methods are critical for the genome-wide analysis of such large data sets. To address major limitations of commonly used clustering approaches, we developed a novel computational method called AutoSOME to automatically cluster large, high-dimensional data sets, such as whole-genome microarray expression data, without prior assumptions about cluster number or data structure. In previous work, we demonstrated that AutoSOME clustering is an effective method for studying genome-wide expression patterns in stem cells. Here we present a primer that describes how to use this method to perform comprehensive cluster analyses of stem cell gene expression data. We include two detailed protocols illustrating the identification of gene co-expression modules and clusters of cellular phenotypes in a single step (Protocol 1), and the visualization of transcriptome variation among stem cells using an intuitive network display (Protocol 2). The workflow described in this chapter is sufficiently general for use with a wide variety of in-house and publicly available genomics data sets.
Keywords: Gene clustering, whole-genome expression data, cellular phenotypes, AutoSOME, gene co-expression modules, machine learning, cartography, graph theory.