A Supervised Weighted Similarity Measure for Gene Expressions using Biological Knowledge

Authors: Shubhra Sankar Ray and Sampa Misra
shubhra@isical.ac.in and sampamisra1989@gmail.com

Welcome to the page for predicting the weights of Pearson correlations obtained from various types of microarray experiments, using functional annotations of genes.

The steps for estimating the weights of the weighted Pearson correlation (WPC), using Matlab on a machine with 16GB RAM, are as follows:

·        Download yeastallgoprocessvector.mat. When loaded in Matlab, the file creates a variable "vector" that holds the annotation profiles of Saccharomyces cerevisiae genes, based on the yeast GO-Slim process annotations in SGD.

·        Download the expression values (Expression.xls) for the All Yeast data set.

·        Download the Main Program (weightcalculationforWPC.m) for estimating the weights for different types of microarray experiments. Before running this program, place all the other downloadable files mentioned above in the same folder as the Main Program. The Main Program returns the weights in the variable index, and writes the various weight combinations and their corresponding PPVs to the file combination_PPV.txt.
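This page does not spell out how the weights and PPVs are computed, so the following Python sketch is only an illustration of the general idea and is not a port of weightcalculationforWPC.m. It assumes that WPC is a weighted average of the Pearson correlations computed separately for each experiment type, and that PPV (read here as positive predictive value) is the fraction of the top-scoring gene pairs that share at least one annotation term. All function names, the weight grid, and the toy data are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def weighted_pc(gene_a, gene_b, weights):
    """Assumed WPC: weighted average of per-experiment-type correlations.

    gene_a and gene_b each hold one expression list per experiment type.
    """
    total = sum(weights)
    return sum(w * pearson(a, b)
               for w, a, b in zip(weights, gene_a, gene_b)) / total


def ppv(expr, annot, weights, top_k):
    """Assumed PPV: share of the top_k highest-WPC gene pairs that
    have at least one annotation term in common."""
    genes = sorted(expr)
    pairs = [(weighted_pc(expr[g], expr[h], weights), g, h)
             for i, g in enumerate(genes) for h in genes[i + 1:]]
    pairs.sort(reverse=True)
    top = pairs[:top_k]
    if not top:
        return 0.0
    hits = sum(1 for _, g, h in top
               if any(a and b for a, b in zip(annot[g], annot[h])))
    return hits / len(top)


def search_weights(expr, annot, n_types, levels=(0.0, 0.5, 1.0), top_k=1):
    """Score every weight combination on a small grid; return the best
    (PPV, weights) pair together with the full list of results."""
    results = []
    for combo in product(levels, repeat=n_types):
        if sum(combo) == 0:
            continue  # all-zero weights give no usable score
        results.append((ppv(expr, annot, combo, top_k), combo))
    return max(results), results


# Made-up toy data: three genes, two experiment types, two annotation terms.
expr = {
    "g1": [[1, 2, 3, 4], [1, 3, 2, 4]],
    "g2": [[2, 4, 6, 8], [4, 2, 3, 1]],
    "g3": [[4, 3, 2, 1], [1, 3, 2, 5]],
}
annot = {"g1": [1, 0], "g2": [1, 0], "g3": [0, 1]}
best, all_results = search_weights(expr, annot, n_types=2)
```

In this toy setting, a weight combination scores well when the experiment types it emphasises rank functionally related gene pairs highest, which is the intuition behind learning the weights from annotations.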


For gene annotations, one can either use the existing yeastallgoprocessvector.mat or create new, up-to-date annotation profiles of the genes as follows:

·        Upload the gene names (genes.txt) to the SGD GO Slim Mapper website, select 'Yeast GO-Slim: Process' as the GO Slim terms, click 'SELECT ALL Terms' under GO Slim Terms, and download the result file. If the last two rows are the categories 'others' and 'not yet decided', delete them. Then replace the commas with tabs in the result file and save it.

·        Split the tab-delimited result file into two files: one containing the GO-Slim terms, named Go-ProcessCategory.txt, and the other containing only the genes belonging to the different categories, named yeast_GOslimprocessGene.xls.

·        Download the GenetoORF.txt file and the program genetoORFmapping.m to convert the gene names (in yeast_GOslimprocessGene.xls) to their corresponding ORFs. The program creates a file, yeast_GOslimprocessORF.txt, in tab-delimited format, which is used in the next step.

·        Run the program vectorconstruction.m. The program uses genes.txt and yeast_GOslimprocessORF.txt to construct yeastallgoprocessvector.mat.
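To make the data flow of these steps concrete, here is a minimal Python sketch of the same sequence: commas to tabs, gene names to ORFs, then one binary annotation profile per ORF. It is not the code of genetoORFmapping.m or vectorconstruction.m. The file layouts it assumes (one row of genes per GO-Slim category, and one 'gene TAB ORF' pair per line in the mapping file), the function names, and the placeholder gene and ORF names are all assumptions made for illustration.

```python
def commas_to_tabs(line):
    """Turn one comma-separated gene list into a tab-delimited row."""
    return "\t".join(field.strip() for field in line.split(","))


def load_gene_to_orf(text):
    """Parse 'gene<TAB>ORF' lines (assumed layout of the mapping file)."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2 and parts[0].strip():
            mapping[parts[0].strip()] = parts[1].strip()
    return mapping


def genes_to_orfs(rows, mapping):
    """Replace gene names by ORFs; names without a mapping are kept."""
    return [[mapping.get(name, name) for name in row.split("\t") if name]
            for row in rows]


def build_vectors(all_orfs, category_rows):
    """One binary profile per ORF: entry j is 1 if the ORF appears
    in category row j, else 0."""
    members = [set(row) for row in category_rows]
    return {orf: [1 if orf in m else 0 for m in members]
            for orf in all_orfs}


# Placeholder names, not real yeast genes or ORFs.
raw_rows = ["GENEA, GENEB", "GENEB, GENEC"]   # one row per category
mapping = load_gene_to_orf("GENEA\tORF_A\nGENEB\tORF_B\nGENEC\tORF_C")
orf_rows = genes_to_orfs([commas_to_tabs(r) for r in raw_rows], mapping)
vectors = build_vectors(["ORF_A", "ORF_B", "ORF_C"], orf_rows)
```

Each profile has one entry per GO-Slim category, so two genes can be compared by checking whether their profiles have a 1 in the same position.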

The steps to construct GenetoORF.txt are as follows:

1. On the SGD_YeastMine website, paste the gene names into the Analyse section and click Analyse.
2. Export the result in .tsv format, as (filename).tsv.
3. Copy only the columns representing genes and ORFs, and save the file as GenetoORF.txt (the downloadable GenetoORF.txt file shows the expected format).
4. Replace ' ' with N.
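Steps 3 and 4 can also be scripted. The Python sketch below keeps two columns of a tab-separated export and fills blank fields with N. The column positions are parameters because the layout of the export is not described on this page, and reading step 4 as "blank or whitespace-only fields become N" is an assumption. The function names and the sample rows are hypothetical.

```python
import csv
import io


def gene_to_orf_rows(tsv_text, gene_col, orf_col, blank="N"):
    """Keep only the gene and ORF columns of a TSV export.

    gene_col and orf_col are zero-based column positions chosen by the
    caller; blank or whitespace-only fields are replaced by `blank`.
    """
    rows = []
    for record in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(record) <= max(gene_col, orf_col):
            continue  # skip empty or too-short lines
        gene = record[gene_col].strip() or blank
        orf = record[orf_col].strip() or blank
        rows.append((gene, orf))
    return rows


def to_tab_delimited(rows):
    """Format (gene, ORF) pairs as tab-delimited lines."""
    return "\n".join(f"{gene}\t{orf}" for gene, orf in rows)


# Made-up export with four columns; the second row has a blank gene name.
export = "id1\tGENEA\tORF_A\tx\nid2\t \tORF_B\ty\n\n"
pairs = gene_to_orf_rows(export, gene_col=1, orf_col=2)
mapping_text = to_tab_delimited(pairs)
```

The resulting text can be written to a file with the standard open(..., "w") call once the column positions have been checked against the actual export.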