Below is a collection of bioinformatics tools and resources developed by CGS researchers.


Combining ChIP-seq/chip transcription factor binding site (TFBS) data with expression profiling is an effective way for biologists to study the function of a transcription factor (TF). ChIP-Array analyzes ChIP-seq/chip and expression data together and constructs a regulatory network around a TF of interest in human, mouse, yeast, fly, and Arabidopsis.

Paper: Qin J, Li MJ, Wang P, Zhang MQ, Wang J. 2011. ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor. Nucleic Acids Res. 39 (Web Server issue):W430-6. Pubmed

Web tool: ChIP-Array
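
The idea of combining binding and expression evidence into a network around a TF can be illustrated with a minimal sketch. It is not the ChIP-Array implementation: the function name `build_tf_network`, the `known_regulation` lookup table, and all gene names are hypothetical, and the rule used here (a direct target is both bound and differentially expressed; an indirect target is differentially expressed and regulated by a direct target) is an assumption made for illustration.

```python
def build_tf_network(tf, bound_genes, de_genes, known_regulation):
    """Return (direct_edges, indirect_edges) around a transcription factor.

    bound_genes: genes with a binding site for `tf` (hypothetical input).
    de_genes: genes differentially expressed when `tf` is perturbed.
    known_regulation: dict mapping a regulator gene to the set of genes
        it is assumed to regulate (hypothetical lookup table).
    """
    # Assumed rule: direct targets are both bound and differentially expressed.
    direct = bound_genes & de_genes
    direct_edges = [(tf, gene) for gene in sorted(direct)]

    # Assumed rule: indirect targets are differentially expressed genes
    # that are not direct targets but are regulated by a direct target.
    indirect_edges = []
    for mediator in sorted(direct):
        downstream = known_regulation.get(mediator, set())
        for gene in sorted((downstream & de_genes) - direct):
            indirect_edges.append((mediator, gene))
    return direct_edges, indirect_edges


direct, indirect = build_tf_network(
    "TF_X",
    bound_genes={"geneA", "geneB", "geneC"},
    de_genes={"geneB", "geneC", "geneD"},
    known_regulation={"geneC": {"geneD", "geneE"}},
)
print(direct)
print(indirect)
```

In this toy input, `geneA` is bound but not differentially expressed, so it is left out of the network, while `geneD` is reached only through `geneC`.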


The dbPSHP database contains curated publications on positive selection in different human populations. It covers over 15,000 loci, drawn either from publications that study positively selected genomic loci and genes related to specific functions, traits, or diseases, or from publications that detect genome-wide selective signals with different statistical methods.

Paper: Li MJ, Wang LY, Xia Z, Wong MP, Sham PC, Wang J. 2014. dbPSHP: a database of recent positive selection across human populations. Nucleic Acids Res. 42(Database issue):D910-6. Pubmed

Web database: dbPSHP


DDGni (dynamic delay gene-network inference) is a novel gene-network-inference algorithm based on the gapped local alignment of gene-expression profiles. The local alignment can detect short-term gene regulations that are usually overlooked by traditional correlation-based and mutual-information-based methods.

Paper: Yalamanchili HK, Yan B, Li MJ, Qin J, Zhao Z, Chin FY, Wang J. 2014. DDGni: dynamic delay gene-network inference from high-temporal data using gapped local alignment. Bioinformatics. 30(3):377-83. doi: 10.1093/bioinformatics/btt692. Pubmed

Software: Download page
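
To make the notion of gapped local alignment of expression profiles concrete, here is a minimal sketch. It is not the DDGni algorithm: the discretization into up/down/no-change symbols, the threshold `eps`, and the scoring values are all assumptions chosen for illustration. The alignment itself is a standard Smith-Waterman score with a linear gap penalty.

```python
def discretize(profile, eps=0.1):
    """Turn an expression time series into a string of change symbols.

    U = up, D = down, N = no change (|delta| <= eps). Assumed encoding.
    """
    symbols = []
    for prev, cur in zip(profile, profile[1:]):
        delta = cur - prev
        symbols.append("U" if delta > eps else "D" if delta < -eps else "N")
    return "".join(symbols)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical profiles: the target repeats the regulator one step later.
regulator = [0, 1, 2, 1, 0, 0, 0]
target = [0, 0, 1, 2, 1, 0, 0]
print(local_alignment_score(discretize(regulator), discretize(target)))
```

Because the alignment is local, the shared rise-and-fall pattern is matched even though it starts at different time points in the two profiles, which is the kind of delayed, short-term relationship a whole-profile correlation can miss.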


FaSD is a novel SNP-detection program that calls SNPs from NGS data. Evaluated on two independent datasets from The Cancer Genome Atlas (TCGA) project, with Illumina and Affymetrix SNP arrays as gold standards, FaSD showed superior performance over current state-of-the-art SNP calling software.

Software: Download page


FaSD-somatic is a fast and accurate program for detecting somatic single-nucleotide variations. It combines joint genotype likelihoods with the FaSD score previously defined in FaSD.

Paper: Wang W, Wang P, Xu F, Luo R, Wong MP, Lam TW, Wang J. 2014. FaSD-somatic: a fast and accurate somatic SNV detection algorithm for cancer genome sequencing data. Bioinformatics. pii: btu338. Pubmed

Software: Download page


EpiRegNet aims to build a transcriptional regulatory network composed of histone modifications and transcription factor binding in promoters, together with the interactions between factors of these two kinds.

Paper: Wang LY, Wang P, Li MJ, Qin J, Wang X, Zhang MQ, Wang J. 2011. EpiRegNet: constructing epigenetic regulatory network from high throughput gene expression data for humans. Epigenetics. 6(12):1505-12. Pubmed

Web tool: EpiRegNet


FastPval is a two-stage p-value computation program. It computes empirical p-values with a two-stage ranking strategy and can produce very low p-values from huge datasets. This fast and powerful tool relies on a carefully chosen cutoff that separates out the significant region of the distribution. Compared with the traditional ranking method, it offers better time efficiency, lower memory consumption, and a very small storage footprint, while keeping high accuracy.

Paper: Li MJ, Sham PC, Wang J. 2010. FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution. Bioinformatics. 26(22):2897-9. Pubmed

Software: Download page
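
The two-stage ranking idea described for FastPval can be sketched as follows. This is an illustrative reading of the description, not the FastPval code: the names `build_index` and `empirical_pvalue`, the parameters `tail_size` and `sample_step`, and the `+1` correction in the p-value are assumptions. The sketch keeps the largest null scores exactly, thins the rest, and uses the smallest kept tail value as the cutoff between an exact and an approximate lookup.

```python
from bisect import bisect_left


def build_index(null_scores, tail_size, sample_step):
    """Split a null distribution into an exact tail and a thinned body.

    Assumes 1 <= tail_size < len(null_scores) and sample_step >= 1.
    """
    ordered = sorted(null_scores)
    tail = ordered[-tail_size:]               # largest scores, kept exactly
    body = ordered[:-tail_size:sample_step]   # every sample_step-th remaining score
    return {"n": len(ordered), "tail": tail, "body": body, "step": sample_step}


def empirical_pvalue(index, observed):
    """Estimate P(null >= observed) using the two-part index."""
    tail, body = index["tail"], index["body"]
    if observed >= tail[0]:
        # Above the cutoff: rank exactly within the stored tail.
        exceed = len(tail) - bisect_left(tail, observed)
    else:
        # Below the cutoff: approximate the rank from the thinned body,
        # then add the whole tail, which lies above the observed score.
        exceed = (len(body) - bisect_left(body, observed)) * index["step"] + len(tail)
    return (exceed + 1) / (index["n"] + 1)


# Hypothetical null distribution: the integers 1..1000.
index = build_index(range(1, 1001), tail_size=100, sample_step=10)
print(empirical_pvalue(index, 995))   # exact lookup in the tail
print(empirical_pvalue(index, 500))   # approximate lookup in the body
```

Only the tail and the thinned body need to be stored, so the memory and disk cost is a fraction of the full null distribution, while p-values in the significant region stay exact.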

GEC: Genetic Type 1 Error Calculator

GEC is a Java-based application developed to address the multiple-testing issue with dependent single-nucleotide polymorphisms (SNPs).

Paper: Li MX, Yeung JM, Cherny SS, Sham PC. 2012. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum Genet. 131(5):747-56. Pubmed

Software: Download page

Genetic Power Calculator

A website for performing power calculations for the design of linkage and association genetic mapping studies of complex traits.

Paper: Purcell S, Cherny SS, Sham PC. 2003. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 19(1):149-50. Pubmed

Software: Download page


Interpreting noncoding, phenotypically associated variants is an indispensable step in understanding the molecular mechanisms of complex traits. Given GWAS data or a variant list, GWAS3D systematically computes the probability of genetic variants affecting regulatory pathways and underlying disease/trait associations by integrating chromatin state, functional genomics, sequence motif, and conservation information.

Paper: Li MJ, Wang LY, Xia Z, Sham PC, Wang J. 2013. GWAS3D: Detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucleic Acids Res. 41(Web Server issue):W150-8. doi: 10.1093/nar/gkt456. Pubmed

Web database: GWAS3D


GWASdb is a one-stop shop that combines collections of trait/disease-associated SNPs (TASs) from current GWAS with comprehensive functional annotations and disease classifications. We aim to help researchers and clinicians maximize the utility of the most recent GWAS data and gain biological insights through an integrative, multi-dimensional functional annotation portal.

Paper: Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, Wong MP, Sham PC, Chanock SJ, Wang J. 2012. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 40(Database issue):D1047-54. Pubmed

Web database: GWASdb


GWASrap is a gateway for the representation, annotation, and prioritization of variants from genome-wide association studies. The framework offers convenience and good performance for post-GWAS analysis. As a one-stop solution, it lets users quickly fetch comprehensive annotations while browsing GWAS results in a highly interactive Circos-style graph or a dynamic Manhattan panel. The system can also prioritize variants independently, following an additive-effect principle that combines the original statistical value with a variant prioritization score.

Paper: Li MJ, Sham PC, Wang J. 2012. Genetic variant representation, annotation and prioritization in the post-GWAS era. Cell Res. 22(10):1505-8. Pubmed

Web database: GWASrap


IGG is an open-source Java package with a graphical interface that efficiently and consistently integrates genotypes across high-throughput genotyping platforms (e.g., Affymetrix and Illumina), the HapMap genotype repository (http://www.hapmap.org/), and even genotypes from collaborators’ projects.

Paper: Li MX, Jiang L, Kao PY, Sham PC, Song YQ. 2009. IGG3: a tool to rapidly integrate large genotype datasets for whole-genome imputation and individual-level meta-analysis. Bioinformatics. 25(11):1449-50. Pubmed

Software: Download page

KGG: A systematic biological Knowledge-based mining system for Genome-wide Genetic studies

KGG is a software tool to perform knowledge-based analysis for genome-wide association studies (GWAS).

Paper: Li MX, Sham PC, Cherny SS, Song YQ. 2010. A knowledge-based weighting framework to boost the power of genome-wide association studies. PLoS One. 5(12). Pubmed

Software: Download page

KGGSeq: A biological Knowledge-based mining platform for Genomic and Genetic studies using Sequence data

KGGSeq is a software platform comprising bioinformatics and statistical genetics functions that make use of valuable biological resources and knowledge for sequencing-based genetic mapping of variants and genes responsible for human diseases and traits. A comprehensive and efficient framework has recently been implemented in KGGSeq to filter and prioritize genetic variants from whole-exome sequencing data.

Paper: Li MX, Gui HS, Kwan JS, Bao SY, Sham PC. 2012. A comprehensive framework for prioritizing variants in exome sequencing studies of Mendelian diseases. Nucleic Acids Res. 40(7):e53. Pubmed

Software: Download page


MapIn is an interactive mapping tool for biological descriptors.

Web tool: MapIn


NRProF is a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. The main idea of this algorithm is to define a distance metric that corresponds to the similarity of the subsequences and reflects how the human brain can distinguish between different sequences.

Paper: Yalamanchili HK, Xiao QW, Wang J. 2012. A novel neural response algorithm for protein function prediction. BMC Syst Biol. 6 Suppl 1:S19. Pubmed

Software: Download page


OpenADAM is a web-based database management system for the large amount of genotype data generated from the Affymetrix GeneChip® Mapping Array and Genome-Wide Human SNP Array platforms.

Paper: Yeung JM, Sham PC, Chan AS, Cherny SS. 2008. OpenADAM: an open source genome-wide association data management system for Affymetrix SNP arrays. BMC Genomics. 9:636. Pubmed

Software: Download page


We developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish.

Paper: Wang P, Lai WF, Li MJ, Xu F, Yalamanchili HK, Lovell-Badge R, Wang J. 2013. Inference of gene-phenotype associations via protein-protein interaction and orthology. PLoS One. 8(10):e77478. doi: 10.1371/journal.pone.0077478. Pubmed

Web tool: PhenoPPIOrth

PI: Rapid and versatile imputation of p-values for genetic association

PI is a multi-threaded Java-based application developed to infer the p-values of untyped single-nucleotide polymorphisms (SNPs) from the p-values of SNPs in linkage disequilibrium (LD) with them.

Software: Download page


PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Paper: Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559-75. Pubmed

Software: Download page


PrimeIndel uses a four-prime-number genetic code for indel decryption and sequence read alignment.

Paper: Lam CW. 2014. PrimeIndel: Four-prime-number genetic code for indel decryption and sequence read alignment. Clin Chim Acta. 436C:1-4. Pubmed

Web tool: PrimeIndel


ProteoMirExpress integrates proteomic and mRNA expression data together to infer miRNA-centered regulatory networks. With both types of high-throughput data from the users, ProteoMirExpress is able to discover not only miRNA targets that have decreased mRNA, but also subgroups of targets with suppressed proteins whose mRNAs are not significantly changed or with decreased mRNA whose proteins are not significantly changed, which are usually ignored by most current methods.

Paper: Qin J, Li MJ, Wang P, Wong NS, Wong MP, Xia Z, Tsao GS, Zhang MQ, Wang J. 2013. ProteoMirExpress: inferring microRNA and protein-centered regulatory networks from high-throughput proteomic and mRNA expression data. Mol Cell Proteomics. 12(11):3379-87. doi: 10.1074/mcp.O112.019851. Pubmed

Web tool: ProteoMirExpress
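
The target subgroups named in the ProteoMirExpress description can be expressed as a small classification rule. The sketch below is only an illustration under stated assumptions: the function `classify_target`, the use of log2 fold changes, and the single threshold of 1.0 standing in for "significantly changed" are all hypothetical and are not taken from the tool.

```python
def classify_target(mrna_log2fc, protein_log2fc, threshold=1.0):
    """Assign a candidate miRNA target to a subgroup, or return None.

    A value <= -threshold counts as decreased; |value| < threshold counts
    as not significantly changed. Both rules are assumptions.
    """
    mrna_down = mrna_log2fc <= -threshold
    protein_down = protein_log2fc <= -threshold
    mrna_unchanged = abs(mrna_log2fc) < threshold
    protein_unchanged = abs(protein_log2fc) < threshold

    if protein_down and mrna_unchanged:
        return "protein suppressed, mRNA unchanged"
    if mrna_down and protein_unchanged:
        return "mRNA decreased, protein unchanged"
    if mrna_down:
        return "mRNA decreased"
    return None


# Hypothetical fold changes after miRNA overexpression.
print(classify_target(-1.5, -1.5))
print(classify_target(0.1, -2.0))
print(classify_target(-1.5, -0.2))
print(classify_target(0.0, 0.0))
```

A method that looks only at mRNA levels would report the first and third cases and miss the second, which is the subgroup the description highlights.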


SnpTracker is a Java-based tool that retrieves the latest rs ID and genomic coordinates of SNPs, given rs IDs of any version, using SNP tracking history and coordinate data.

Software: Download page
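
Resolving an old rs ID to its latest version amounts to following a chain of merge records. The sketch below shows that idea in isolation and is not SnpTracker's code or data format: the functions `latest_rs_id` and `track`, the dictionary layout, and the rs IDs and coordinates are all made up for illustration.

```python
def latest_rs_id(rs_id, merge_history):
    """Follow merge records (old ID -> newer ID) until no newer ID exists."""
    seen = set()
    current = rs_id
    while current in merge_history:
        if current in seen:
            raise ValueError(f"cycle in merge history at {current}")
        seen.add(current)
        current = merge_history[current]
    return current


def track(rs_ids, merge_history, coordinates):
    """Map each input rs ID to (latest rs ID, coordinates or None)."""
    result = {}
    for rs_id in rs_ids:
        latest = latest_rs_id(rs_id, merge_history)
        result[rs_id] = (latest, coordinates.get(latest))
    return result


# Hypothetical merge history and coordinates, not real dbSNP records.
merge_history = {"rs100": "rs200", "rs200": "rs300"}
coordinates = {"rs300": ("chr1", 12345), "rs400": ("chr2", 67890)}
print(track(["rs100", "rs300", "rs400", "rs999"], merge_history, coordinates))
```

An ID that was merged twice (`rs100`) resolves to the same record as the current ID (`rs300`), and an ID without coordinate data (`rs999`) is returned with `None` rather than dropped.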


SpliceNet goes beyond differentially expressed genes and abstracts isoform-specific co-expression networks from exon-level RNA-Seq data using Large Dimensional Trace. It gives a more comprehensive picture of complex diseases by inferring network rewiring between normal and cancer/diseased samples at isoform resolution. It can be applied to any exon-level RNA-Seq data and exon array data.

Paper: Yalamanchili HK, Li Z, Wang P, Wong MP, Yao J, Wang J. 2014. SpliceNet: recovering splicing isoform-specific differential gene networks from RNA-Seq data of normal and diseased samples. Nucleic Acids Res. pii: gku577. Pubmed

Software: Download page


The online tool wKGGSeq offers a strategy-based pipeline to facilitate the detection of disease-causal variants with various genetic inheritance patterns, supported by many versatile functions to control the quality of sequence variants and to prioritize potential disease-causal variants.

Web tool: wKGGSeq

