Description
Microarrays are part of a new class of biotechnologies that allow expression levels for thousands of genes to be monitored simultaneously. This paper describes statistical methods for the identification of differentially expressed genes in replicated cDNA microarray experiments. Although they are not the main focus of the paper, we stress the importance of issues such as image processing and normalization. Image processing is required to extract measures of transcript abundance for each gene spotted on the array from the laser scan images. Normalization is needed to identify and remove systematic sources of variation, such as differing dye labeling efficiencies and scanning properties. There can be many systematic sources of variation, and their effects can be large relative to the effects of interest. After a brief presentation of our image processing method, we describe a within-slide normalization approach that handles spatial and intensity-dependent effects on the measured expression levels. Given suitably normalized data, our proposed method for the identification of single differentially expressed genes is to consider a univariate testing problem for each gene and then correct for multiple testing using adjusted p-values. No specific parametric form is assumed for the distribution of the expression levels, and a permutation procedure is used to estimate the joint null distribution of the per-gene test statistics. Several data displays are suggested for the visual identification of genes with altered expression and of important features of these genes. The above methods are applied to microarray data from a study of gene expression in two mouse models with very low HDL cholesterol levels. The genes identified using data from replicated slides are compared to those obtained by applying recently published single-slide methods.
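The testing strategy summarized above (one test statistic per gene, a permutation estimate of the joint null distribution, and multiplicity-adjusted p-values) can be illustrated with a minimal sketch. The abstract does not name the test statistic or the adjustment procedure, so the choices below are assumptions made for illustration only: a two-sample Welch t-statistic per gene and a single-step maxT adjustment in the style of Westfall and Young. The function names `welch_t` and `maxt_adjusted_pvalues`, the two-group layout, and the simulated data are all hypothetical.

```python
import numpy as np


def welch_t(x, labels):
    """Welch t-statistic per gene; rows of x are genes, columns are slides.

    Assumes two groups coded 0 and 1 with at least two slides each.
    """
    a, b = x[:, labels == 0], x[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se


def maxt_adjusted_pvalues(x, labels, n_perm=2000, seed=0):
    """Single-step maxT adjusted p-values from random label permutations.

    Permuting the slide labels keeps each slide's genes together, so the
    dependence between genes is preserved in the estimated null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(x, labels))
    exceed = np.zeros(x.shape[0])
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        # Largest |t| over all genes under this relabeling of the slides.
        max_null = np.abs(welch_t(x, permuted)).max()
        exceed += max_null >= observed
    # Add-one convention: counts the observed labeling as one permutation,
    # so an adjusted p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Simulated example (not real data): 200 genes, 8 control and 8 treatment
# slides, with the first 5 genes shifted in the treatment group.
rng = np.random.default_rng(1)
labels = np.array([0] * 8 + [1] * 8)
expr = rng.normal(size=(200, 16))
expr[:5, labels == 1] += 4.0

adj_p = maxt_adjusted_pvalues(expr, labels, n_perm=500)
print("genes with adjusted p < 0.05:", np.flatnonzero(adj_p < 0.05))
```

Because every gene's statistic is compared against the permutation distribution of the maximum statistic over all genes, a gene is flagged only when it stands out against the most extreme value expected by chance anywhere on the array. A step-down variant of the same idea is generally less conservative but needs slightly more bookkeeping.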