
TPIEA User's Guide

Version 1.0
June 2016

Developers: Feng Zhang


License: GPL
URL: https://sourceforge.net/projects/tpieav1/files
Email: fzhxjtu@mail.xjtu.edu.cn

1. Installing and Running TPIEA
TPIEA is written in C and interfaces with R for efficient data analysis. Please make
sure that R (http://www.r-project.org/) is installed on your system. TPIEA is
a command-line program. Unzip the downloaded "TPIEA" package and run the
"TPIEA" program in a terminal window to start the analysis.
Example:
./TPIEA

2. Parameter Setting
TPIEA provides three choices for pathway association analysis: TPIEA, the
classic pathway enrichment analysis (PEA), and a combined approach of TPIEA
and PEA with user-assigned weighting parameters. TPIEA asks users to input a
set of parameters before starting the analysis:
[1] Type of GWAS summary data: TPIEA accepts two types of GWAS
summary data: (1) SNP level GWAS summary data; (2) Gene level
GWAS summary data. The “(2) Gene level GWAS summary data” option allows
researchers to use other SNP-gene scoring approaches (see [9] below).
[2] SNP-gene annotation file name: Please input the storage path and name of
the SNP-gene annotation file. Please prepare your SNP-gene annotation file in
the following format:
Example:
rs2101576 AJAP1
rs10915493 AJAP1

rs10799221 AJAP1
….

Note: Each line of this file records a SNP (the first column) and the
corresponding gene (the second column).
[3] Gene-pathway annotation file name: Please input the storage path and name of
the gene-pathway annotation file. Users can obtain pathway annotation information
from public databases, such as KEGG (http://www.genome.ad.jp/kegg/pathway.html),
Gene Ontology (http://www.geneontology.org), the GSEA
Molecular Signatures Database (http://www.broadinstitute.org/gsea/msigdb/index.jsp)
and Reactome (http://www.reactome.org/). Please prepare your
gene-pathway annotation file in the following format:
Example:
KEGG_GALACTOSE_METABOLISM LALBA
KEGG_GALACTOSE_METABOLISM HK2
KEGG_GALACTOSE_METABOLISM HK1
KEGG_GALACTOSE_METABOLISM GLB1
….
Note: Each line of this file records a pathway (the first column) and a
gene belonging to it (the second column).
[4] Gene interaction annotation file name: Please input the storage path and name
of the gene interaction network file. Please prepare your gene interaction
network file in the following format:
Example:
OR52N1 OR51S1
OR52N1 OR52E2
OR52N1 OR52E8
OR52N1 OR52H1
OR52N1 OR52L1
OR52N1 OR52M1
OR52N1 OR52N4
….
Note: Each line of this file records a pair of genes with interaction effects.
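The gene-pathway file in [3] and the gene interaction file in [4] share the same two-column layout, so they can be checked together before a run. The following Python sketch is not part of TPIEA; the function names are hypothetical and the input rows are illustrative. It assumes whitespace-separated columns and lists, for each pathway, the interacting pairs whose two genes both belong to that pathway.

```python
from collections import defaultdict

def read_pairs(lines):
    """Parse two-column, whitespace-separated lines into (first, second) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or incomplete lines
            pairs.append((fields[0], fields[1]))
    return pairs

def interacting_pairs_by_pathway(pathway_lines, interaction_lines):
    """Return {pathway: [(geneA, geneB), ...]} for pairs with both genes in the pathway."""
    members = defaultdict(set)
    for pathway, gene in read_pairs(pathway_lines):
        members[pathway].add(gene)
    interactions = read_pairs(interaction_lines)
    return {
        pathway: [(a, b) for a, b in interactions if a in genes and b in genes]
        for pathway, genes in members.items()
    }

# Illustrative rows (hypothetical input, not taken from a real annotation file).
pathway_lines = [
    "KEGG_ABC_TRANSPORTERS TAP1",
    "KEGG_ABC_TRANSPORTERS TAP2",
    "KEGG_ABC_TRANSPORTERS ABCC3",
]
interaction_lines = ["TAP2 TAP1", "TAP1 ABCC3", "OR52N1 OR51S1"]

pairs = interacting_pairs_by_pathway(pathway_lines, interaction_lines)
print(pairs["KEGG_ABC_TRANSPORTERS"])
```

A pathway with an empty list here has no annotated interactions among its genes, which may be worth knowing before interpreting its interaction-based score.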

[5] GWAS summary file name: Please input the storage path and name of the GWAS
summary data file (see [1] for details). Please prepare your GWAS summary
data file in one of the following formats:
(1) SNP level GWAS summary data:
Example:
rs10003405 4 111267403 0.23346
rs10003483 4 94458996 0.38401
rs1000499 12 39272339 0.46119
rs10005551 4 41326053 0.49407
….
Note: Each line of this file records SNP name, chromosome, position (bp) and
GWAS P value in order.
(2) Gene level GWAS summary data:
Example:
gene1 1 111267403 2.34678
gene2 2 94458996 1.02343
gene3 3 39272339 3.11790
gene4 4 41326053 0.22190
….
Note: Each line of this file records gene name, chromosome, start position (bp)
and gene score (generated by users) in order.
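A malformed summary file is easier to catch before launching a long permutation run. The Python sketch below is a hypothetical pre-check, not part of TPIEA. It assumes whitespace-separated columns in the SNP level layout described above (SNP name, chromosome, position, P value) and assumes that valid P values lie in (0, 1].

```python
def parse_snp_summary(lines):
    """Parse 'SNP chromosome position P' lines into a list of dicts."""
    records = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        p_value = float(fields[3])
        if not 0.0 < p_value <= 1.0:  # assumed valid range for a P value
            raise ValueError(f"line {number}: P value {p_value} outside (0, 1]")
        records.append({
            "snp": fields[0],
            "chrom": fields[1],
            "pos": int(fields[2]),
            "p": p_value,
        })
    return records

example = [
    "rs10003405 4 111267403 0.23346",
    "rs10003483 4 94458996 0.38401",
    "rs1000499 12 39272339 0.46119",
    "rs10005551 4 41326053 0.49407",
]
records = parse_snp_summary(example)
print(len(records), records[2]["chrom"], records[2]["pos"])
```

The range check applies only to SNP level data; gene scores in the gene level format are user-generated and need not lie between 0 and 1.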
[6] Number of permutations: Permutations are used for P value calculation in TPIEA.
More than 1,000 permutations are recommended for obtaining accurate P values.
Note that a very large number of permutations will make TPIEA take a long time to
complete the data analysis.
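To see why the number of permutations limits P value resolution, consider a common way of computing an empirical P value from permuted scores. The Python sketch below is an assumption for illustration; this guide does not state the exact formula TPIEA uses. The function name and the scores are hypothetical.

```python
def empirical_p(observed, null_scores):
    """Empirical P value: share of permuted scores at least as large as the observed one.

    The +1 in numerator and denominator counts the observed score itself,
    so the result is never exactly zero.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

# Hypothetical scores from 9 permutations; 2 of them reach the observed value.
null_scores = [0.1, 0.5, 1.2, 2.7, 3.0, 0.9, 1.1, 0.3, 0.2]
p_value = empirical_p(2.5, null_scores)
print(p_value)
```

Under this formula the smallest attainable P value with N permutations is 1/(N+1), so 1,000 permutations cannot resolve P values below roughly 0.001.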
[7] Max and min sizes of analyzed pathways: Users need to define the maximum
(default value = 200 genes) and minimum (default value = 5 genes) sizes of pathways
analyzed by TPIEA.
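The effect of the size limits can be previewed on a gene-pathway annotation before running TPIEA. The Python sketch below is hypothetical and not part of TPIEA. It assumes pathway size means the number of distinct genes annotated to the pathway and that both bounds are inclusive; neither detail is stated in this guide.

```python
from collections import defaultdict

def filter_pathways_by_size(pairs, min_size=5, max_size=200):
    """Keep pathways whose number of distinct genes lies in [min_size, max_size]."""
    members = defaultdict(set)
    for pathway, gene in pairs:
        members[pathway].add(gene)
    return {
        pathway: genes
        for pathway, genes in members.items()
        if min_size <= len(genes) <= max_size
    }

# Hypothetical annotation: PATHWAY_A has 4 genes, PATHWAY_B has 6.
pairs = [("PATHWAY_A", f"geneA{i}") for i in range(4)]
pairs += [("PATHWAY_B", f"geneB{i}") for i in range(6)]

kept = filter_pathways_by_size(pairs)
print(sorted(kept))
```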
[8] Weight parameter of TPIEA and PEA statistics: Please input the weighting
parameter (default value = 0.5) applied to the enrichment score statistics of TPIEA
and the classic pathway enrichment analysis approach.
[9] Type of SNP-gene scoring approach: TPIEA provides six choices for gene scoring
using SNP level GWAS summary data: (1) Maximum SNP statistics; (2)
PCA1-based pseudo-SNP statistics (requires the genotype data of a reference
population); (3) PCA2-based pseudo-SNP statistics (requires the genotype data of a
reference population); (4) Sidak's combination statistics; (5) Modified Sidak's
combination statistics; (6) Fisher's combination statistics.
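To give a feel for how SNP P values can be collapsed into one gene level value, the Python sketch below shows textbook forms of three of the listed approaches: the most significant SNP, Sidak's combination and Fisher's combination. These are standard formulas and are an assumption here; the exact statistics computed by TPIEA, including the modified Sidak and PCA-based variants, are not specified in this guide. The function names are hypothetical.

```python
import math

def max_snp_statistic(p_values):
    """Most significant SNP in the gene, i.e. the smallest P value (assumed meaning)."""
    return min(p_values)

def sidak_combination(p_values):
    """Sidak correction of the smallest P value for the number of SNPs in the gene."""
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)

def fisher_combination(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2n degrees of freedom.

    For even degrees of freedom the upper tail has a closed form:
    P(X > x) = exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    """
    n = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return math.exp(-half) * total

snp_p = [0.01, 0.5]  # hypothetical P values of two SNPs mapped to one gene
print(max_snp_statistic(snp_p), sidak_combination(snp_p), fisher_combination(snp_p))
```

Both Sidak's and Fisher's textbook forms assume independent P values, which SNPs in linkage disequilibrium do not satisfy; that is a usual motivation for modified or PCA-based alternatives.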

[10] Random permutation approach: Random permutations are used to obtain the
null distribution of TPIEA statistics. TPIEA provides two choices for random
permutation: (1) Random circular genome permutations conducted by
TPIEA; (2) Permutated GWAS summary data generated by users. The “(2)
Permutated GWAS summary data generated by users” option allows researchers
to use other permutation approaches, such as phenotype label permutations
(which require individual genotype and phenotype data).
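The idea behind a circular genome permutation can be sketched in a few lines: SNPs are ordered along the genome as if it were a circle, and the whole sequence of P values is rotated by a random offset, so neighbouring SNPs keep similar P values while the link between SNPs and genes is broken. The Python sketch below illustrates this general technique under those assumptions. It is not TPIEA's implementation, and the function name is hypothetical.

```python
import random

def circular_permutation(records, rng):
    """Rotate P values along genome order by a random non-zero offset.

    records: list of (snp, chrom, pos, p) tuples with integer chrom and pos.
    Returns a new list in genome order with the same SNPs and shifted P values.
    """
    if len(records) < 2:
        raise ValueError("need at least two SNPs to permute")
    ordered = sorted(records, key=lambda r: (r[1], r[2]))
    n = len(ordered)
    shift = rng.randrange(1, n)  # offset 0 would reproduce the original data
    return [
        (snp, chrom, pos, ordered[(i + shift) % n][3])
        for i, (snp, chrom, pos, _) in enumerate(ordered)
    ]

records = [
    ("rs10003405", 4, 111267403, 0.23346),
    ("rs10003483", 4, 94458996, 0.38401),
    ("rs1000499", 12, 39272339, 0.46119),
    ("rs10005551", 4, 41326053, 0.49407),
]
permuted = circular_permutation(records, random.Random(0))
for row in permuted:
    print(row)
```

Passing a seeded `random.Random` instance makes a permutation reproducible, which helps when permuted summary files are generated outside TPIEA and supplied through option (2).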
[11] Genotype file name of reference population: For “PCA1-based pseudo-SNP
statistics” or “PCA2-based pseudo-SNP statistics” (see [9]), please input
the name of the TXT file containing the genotype data of the reference population
(do not include the file suffix “.txt”). Please prepare your genotype data file in
the following format:
Example:

SNP1 1 3 0 1
SNP2 2 1 2 2
SNP3 1 3 0 3
SNP4 3 0 2 2
….
Note: Each line of this file records a SNP name and the corresponding genotype
scores (for instance, the number of minor alleles) of the reference population (four
individuals in the above example).
To speed up TPIEA, please make one genotype file for each
chromosome, such as Refgeno1.txt, Refgeno2.txt, Refgeno3.txt, and so on.
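Because the genotype format above carries no chromosome column, splitting one genotype file by chromosome needs a SNP-to-chromosome lookup, for example one built from the SNP level GWAS summary file in [5]. The Python sketch below is a hypothetical helper, not part of TPIEA. It groups genotype lines under per-chromosome file names that follow the Refgeno1.txt pattern; the mapping shown is illustrative.

```python
from collections import defaultdict

def group_genotypes_by_chromosome(genotype_lines, snp_to_chrom, prefix="Refgeno"):
    """Return {file name: [genotype lines]} with one entry per chromosome."""
    files = defaultdict(list)
    for line in genotype_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom = snp_to_chrom.get(fields[0])
        if chrom is None:
            continue  # SNP has no known chromosome; leave it out
        files[f"{prefix}{chrom}.txt"].append(line.strip())
    return dict(files)

genotype_lines = ["SNP1 1 3 0 1", "SNP2 2 1 2 2", "SNP3 1 3 0 3", "SNP4 3 0 2 2"]
# Hypothetical lookup; in practice it could come from columns 1 and 2 of the SNP summary file.
snp_to_chrom = {"SNP1": "1", "SNP2": "1", "SNP3": "2", "SNP4": "2"}

grouped = group_genotypes_by_chromosome(genotype_lines, snp_to_chrom)
for name, lines in grouped.items():
    print(name, lines)
    # To write the files to disk:
    # with open(name, "w") as handle:
    #     handle.write("\n".join(lines) + "\n")
```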

3. Output files
TPIEA will output the following result files:
[1] Pathway_result.txt: Pathway analysis results. Each line of this file records a
pathway and the corresponding analysis results.
Example:
Pathway name                 NIES  P     FDR   NES   P     FDR   COM   P     FDR
KEGG_ABC_TRANSPORTERS        0.09  0.52  0.97  0.48  0.31  1.00  0.32  0.37  1.00
KEGG_ACUTE_MYELOID_LEUKEMIA  1.57  0.06  0.39  1.17  0.13  0.68  1.43  0.08  0.47

NIES denotes the normalized interaction enrichment score statistic of TPIEA;
NES denotes the normalized enrichment score statistic of PEA; COM denotes
the weighted score statistic of TPIEA and PEA. Each score is followed by its own
P and FDR columns. The random circular genome permutation approach is used to
calculate the empirical P value and false discovery rate (FDR) of each pathway.
Pathways with FDR < 0.05 are considered significant.
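Selecting significant pathways from Pathway_result.txt can be scripted. The Python sketch below is a hypothetical helper, not part of TPIEA. It assumes whitespace-separated columns in the order shown in the example (a score, P and FDR for each of NIES, NES and COM) and introduces its own column keys such as NIES_FDR, because the file repeats the headers P and FDR.

```python
KEYS = ["NIES", "NIES_P", "NIES_FDR", "NES", "NES_P", "NES_FDR", "COM", "COM_P", "COM_FDR"]

def parse_result_line(line):
    """Return (pathway, {key: value}) or None for headers and malformed lines."""
    fields = line.split()
    if len(fields) < 10:
        return None
    try:
        values = [float(x) for x in fields[-9:]]
    except ValueError:
        return None  # header line: the last nine fields are not numbers
    return " ".join(fields[:-9]), dict(zip(KEYS, values))

def significant_pathways(lines, key="NIES_FDR", threshold=0.05):
    """List pathways whose chosen FDR column is below the threshold."""
    hits = []
    for line in lines:
        parsed = parse_result_line(line)
        if parsed is not None and parsed[1][key] < threshold:
            hits.append(parsed[0])
    return hits

lines = [
    "Pathway name NIES P FDR NES P FDR COM P FDR",
    "KEGG_ABC_TRANSPORTERS 0.09 0.52 0.97 0.48 0.31 1.00 0.32 0.37 1.00",
    "KEGG_ACUTE_MYELOID_LEUKEMIA 1.57 0.06 0.39 1.17 0.13 0.68 1.43 0.08 0.47",
]
strict = significant_pathways(lines)
relaxed = significant_pathways(lines, threshold=0.5)  # relaxed only to show a hit
print(strict, relaxed)
```

Neither example pathway passes the FDR < 0.05 criterion; the relaxed threshold of 0.5 is used only to demonstrate the filter.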
[2] NIESplot.pdf, NESplot.pdf and COMplot.pdf: Plots of the pathway analysis results
of TPIEA, PEA and the combined approach of TPIEA and PEA, respectively. In the
generated figures, each point denotes a pathway. The X-axis spans the pathways
analyzed by TPIEA (its range is the total number of pathways), and the Y-axis
shows the -log10(P value) calculated by TPIEA.

[3] Pathway-interacting genes.txt: List of interacting gene pairs within each
pathway.
Example:
KEGG_ABC_TRANSPORTERS
TAP2 TAP1
TAP1 ABCC3
TAP1 ABCC1
CFTR ABCC11
….
