1. P-value calculation method
Estimating Significance. We assess the significance of an observed enrichment score (ES) by comparing it with the set of scores ES_NULL computed with randomly assigned phenotypes.
- Randomly reassign the original phenotype labels to the samples, re-rank the genes, and recompute ES(S) for gene set S.
- Repeat step 1 for 1,000 permutations and build a histogram of the resulting enrichment scores, ES_NULL.
- Estimate the nominal P value for S from ES_NULL, using only the positive or negative portion of the distribution that matches the sign of the observed ES(S).
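The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the GSEA software's implementation: ES is assumed to be the weighted running-sum statistic (weight exponent p = 1), and genes are ranked by a simple difference of class means, a hypothetical stand-in for the real ranking metric. The helper names `enrichment_score`, `rank_genes`, and `nominal_p_value` and the synthetic data are illustrative only.

```python
import numpy as np


def enrichment_score(ranked_genes, metric, gene_set, p=1.0):
    """Weighted running-sum ES (assumed definition); genes sorted by decreasing metric."""
    in_set = np.isin(ranked_genes, list(gene_set))
    n_miss = len(ranked_genes) - in_set.sum()
    weights = np.abs(metric) ** p
    hit_step = np.where(in_set, weights, 0.0) / weights[in_set].sum()
    miss_step = np.where(in_set, 0.0, 1.0 / n_miss)
    running = np.cumsum(hit_step - miss_step)
    # ES is the running-sum value with the largest deviation from zero.
    return running[np.argmax(np.abs(running))]


def rank_genes(expr, labels, genes):
    """Hypothetical ranking metric: mean(class 1) - mean(class 0), sorted descending."""
    diff = expr[:, labels == 1].mean(axis=1) - expr[:, labels == 0].mean(axis=1)
    order = np.argsort(-diff)
    return genes[order], diff[order]


def nominal_p_value(expr, labels, genes, gene_set, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    ranked, metric = rank_genes(expr, labels, genes)
    es_obs = enrichment_score(ranked, metric, gene_set)
    es_null = np.empty(n_perm)
    for i in range(n_perm):
        # Step 1: shuffle phenotype labels, re-rank genes, recompute ES(S).
        ranked_p, metric_p = rank_genes(expr, rng.permutation(labels), genes)
        es_null[i] = enrichment_score(ranked_p, metric_p, gene_set)
    # Step 3: compare only against null scores with the same sign as ES(S).
    if es_obs >= 0:
        tail = es_null[es_null >= 0]
        p_value = np.mean(tail >= es_obs) if tail.size else 1.0
    else:
        tail = es_null[es_null < 0]
        p_value = np.mean(tail <= es_obs) if tail.size else 1.0
    return es_obs, es_null, p_value


# Synthetic example: genes g0..g19 are shifted upward in class 1.
rng = np.random.default_rng(42)
genes = np.array([f"g{i}" for i in range(200)])
labels = np.array([0] * 10 + [1] * 10)
expr = rng.normal(size=(200, 20))
expr[:20, labels == 1] += 1.5
gene_set = {f"g{i}" for i in range(20)}
es, es_null, p = nominal_p_value(expr, labels, genes, gene_set, n_perm=200)
print(f"ES = {es:.3f}, nominal p = {p:.3f}")
```

If no permuted score of the matching sign reaches the observed ES, this sketch returns 0; it is more honest to report such a result as p < 1/n_perm.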
2. FDR calculation method
Multiple Hypothesis Testing.
- Determine ES(S) for each gene set in the collection or database.
- For each S and 1,000 fixed permutations π of the phenotype labels, reorder the genes in the ranked gene list L and determine ES(S, π).
- Adjust for variation in gene set size. Normalize the ES(S, π) and the observed ES(S), rescaling positive and negative scores separately: each score is divided by the absolute value of the mean of the same-sign ES(S, π) for that gene set, yielding the normalized scores NES(S, π) and NES(S) (see Supporting Text).
- Compute the FDR. Control the ratio of false positives to the total number of gene sets attaining a fixed level of significance, separately for positive and negative NES(S) and NES(S, π).
Create a histogram of all NES(S, π) over all S and π, and use this null distribution to compute an FDR q value for a given NES(S) = NES* ≥ 0. The FDR is the percentage of all (S, π) with NES(S, π) ≥ 0 whose NES(S, π) ≥ NES*, divided by the percentage of observed S with NES(S) ≥ 0 whose NES(S) ≥ NES*. The case NES(S) = NES* ≤ 0 is handled symmetrically, using the negative scores and ≤ NES*.
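The normalization and FDR steps can be sketched the same way. Assumptions: `es_obs[s]` holds ES(S) for gene set s, row s of `es_null` holds ES(S, π) for the same fixed permutations, every row contains both positive and negative null scores, and q values are capped at 1. The function names and the synthetic scores in the usage lines are hypothetical; this is not the reference GSEA code.

```python
import numpy as np


def normalize(es_obs, es_null):
    """NES: divide each score by |mean| of the same-sign null scores of its gene set."""
    nes_obs = np.empty(len(es_obs))
    nes_null = np.empty(es_null.shape)
    for s, (obs, null) in enumerate(zip(es_obs, es_null)):
        pos, neg = null >= 0, null < 0
        pos_mean = null[pos].mean()   # assumes at least one positive null score
        neg_mean = -null[neg].mean()  # assumes at least one negative null score
        nes_null[s, pos] = null[pos] / pos_mean
        nes_null[s, neg] = null[neg] / neg_mean
        nes_obs[s] = obs / (pos_mean if obs >= 0 else neg_mean)
    return nes_obs, nes_null


def fdr_q_values(nes_obs, nes_null):
    """FDR q = (fraction of same-sign null NES at least as extreme)
    / (fraction of same-sign observed NES at least as extreme)."""
    null = nes_null.ravel()  # pool NES(S, pi) over all S and pi
    null_pos, null_neg = null[null >= 0], null[null < 0]
    obs_pos, obs_neg = nes_obs[nes_obs >= 0], nes_obs[nes_obs < 0]
    q = np.empty(len(nes_obs))
    for s, nes_star in enumerate(nes_obs):
        if nes_star >= 0:
            frac_null = np.mean(null_pos >= nes_star)
            frac_obs = np.mean(obs_pos >= nes_star)
        else:
            frac_null = np.mean(null_neg <= nes_star)
            frac_obs = np.mean(obs_neg <= nes_star)
        # frac_obs > 0 because S itself is counted; cap at 1 (an assumption).
        q[s] = min(frac_null / frac_obs, 1.0)
    return q


# Synthetic scores: 50 gene sets, 1,000 permutations, 3 planted strong sets.
rng = np.random.default_rng(0)
es_null = rng.normal(0.0, 0.2, size=(50, 1000))
es_obs = rng.normal(0.0, 0.2, size=50)
es_obs[:3] = [0.9, 0.85, -0.95]
nes_obs, nes_null = normalize(es_obs, es_null)
q = fdr_q_values(nes_obs, nes_null)
print(np.round(q[:5], 4))
```

Pooling NES(S, π) across all gene sets into one null only makes sense after the per-set rescaling, which is why normalization comes before the FDR step.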