My Research

1.     Publications

2.     Ph.D thesis

3.     GO Charts: A novel GO-based approach for gene functional analyses

4.     Gene function prediction

5.     Identifying biological process representative genes from transcriptomics experiments.

6.     Microarray data analyses


                                            [scroll down for the individual topics]

1.      My publications

Bhat P, Yang H, Bogre L, Devoto A and Paccanaro A., Computational selection of transcriptomics experiments improves Guilt-by-Association analyses (2012), PLoS ONE.

Bhat P, Caniza-Vierci H, Yang H, Romero A, Garg P and Paccanaro A., Navigating transcriptomics experiments with GOcharts (2012), Manuscript in preparation.

Yang H, Bhat P, Ferrara E, Rigby-Jones C, Bogre L, Barak S and Paccanaro A., Detecting representative genes in biological networks through Graph Condensation (2012). Manuscript in preparation.

Yang H, Romero A, Bhat P and Paccanaro A., Protein Function Prediction from sequence alone through graph diffusion (2012), Manuscript in preparation.

Hatzimasoura E, Doczi R, Ditengou F, Bhat P, Magyar Z, Helfer A, Menke F, Hirt H, Lopez E, Paccanaro A, Palme K, Bogre L., Meristem outgrowth is repressed by a stress activated MAPK kinase pathway through regulation of auxin transport (2012), Manuscript in preparation.

Yang H, Bhat P, Shanahan H, & Paccanaro A., A maximal eigenvalue for detecting process representative genes by integrating data from multiple sources (2008), NIPS Workshop on learning from multiple sources, Vancouver, Canada.


 

2.      Doctoral thesis

 “On the effects of large-scale transcriptomics experiments on Guilt-by-Association Analyses”

My thesis is focussed on an area of bioinformatics known as gene functional analyses. Although the genomes of hundreds of organisms have been sequenced, the functions of most of their genes are largely unknown. Traditional approaches for understanding gene function are resource intensive and are unlikely to match the pace of the sequencing projects any time soon. In the past decade, efforts to elucidate gene function have gained new impetus with the emergence of large-scale transcriptomics and protein-protein interaction experiments.

These datasets are mined to identify groups of genes sharing similar features, which could imply that they share similar functions; this principle is known as Guilt-By-Association (GBA). The GBA principle is generally implemented using clustering techniques, which group genes with similar expression profiles based on a suitable measure of similarity. Recently, graph-theoretical techniques have also been used (here, nodes are genes and the edges between nodes are weighted by a similarity score).
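As a minimal illustration of the graph view described above, the sketch below computes Pearson correlations between expression profiles and keeps the strongly correlated pairs as weighted edges. The gene names, the toy values and the 0.8 threshold are assumptions made for the example; they are not taken from the thesis.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def coexpression_graph(profiles, threshold=0.8):
    """Return {(gene_a, gene_b): weight} for pairs with correlation >= threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


# Toy data: hypothetical genes measured in four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
edges = coexpression_graph(profiles)
```

Under the GBA principle, geneA and geneB end up connected and would be candidates for sharing a function, while geneC stays unconnected.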

The focus of my thesis, however, is on understanding the effects of the large collections of data used in these analyses. Generally, data are pooled from hundreds of experiments and similarities between genes are then analysed using the pooled dataset. I believe that this pooling may in fact mask many functional relationships between genes. It is more reasonable to use only those experiments that are relevant to the biological function one is interested in. But what makes an experiment functionally relevant or irrelevant? One obvious way to decide would be to analyse the literature. In most cases, however, this is not practical, and the relevance of an experiment to the functional category of interest may not be evident from the literature alone.

We developed a method that, given a functional category of interest, computationally identifies functionally relevant experiments from a large collection of data. We found that compared to using a large unselected collection of data:

  • Experiments selected by our method improve correlation among genes that belong to the same functional category.
  • Correlations calculated using the experiments selected by our method are a stronger feature.
  • Experiments selected by our method improve the performance of a GBA based classifier.
  • The performance of the selected experiments increases with annotation specificity. The more specific the annotation, the better the quality of the selection.
  • Experiments selected by our method also improve GBA based pathway reconstruction.
  • Importantly, our experiment selection method is applicable to various functional classification systems such as GO, MIPS FunCat and KEGG. Based on our results for yeast and Arabidopsis, the method also appears to be organism independent.
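To make the idea of selecting experiments concrete, here is a toy sketch. It is not the algorithm from the PLoS ONE paper. It is a simple greedy heuristic, assumed purely for illustration, that drops experiments one at a time for as long as doing so raises the mean pairwise correlation among the genes of one functional category. All names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_pairwise_correlation(profiles, genes, experiments):
    """Average correlation over all gene pairs, using only the given columns."""
    return mean(
        pearson([profiles[a][e] for e in experiments],
                [profiles[b][e] for e in experiments])
        for a, b in combinations(genes, 2)
    )


def select_experiments(profiles, genes, min_experiments=3, tol=1e-9):
    """Greedy backward elimination (illustrative heuristic only)."""
    n_experiments = len(profiles[genes[0]])
    selected = list(range(n_experiments))
    best = mean_pairwise_correlation(profiles, genes, selected)
    while len(selected) > min_experiments:
        # Score every possible single removal and keep the best one.
        score, drop = max(
            (mean_pairwise_correlation(
                profiles, genes, [e for e in selected if e != d]), d)
            for d in selected
        )
        if score <= best + tol:
            break  # no removal improves the category correlation
        best = score
        selected.remove(drop)
    return selected, best


# Toy category: three genes agree in experiments 0-3 and diverge in experiment 4.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0, 10.0],
    "g2": [1.0, 2.0, 3.0, 4.0, -10.0],
    "g3": [1.0, 2.0, 3.0, 4.0, 0.0],
}
selected, score = select_experiments(profiles, ["g1", "g2", "g3"])
```

In this toy case the heuristic discards experiment 4, the only one in which the category genes disagree, and keeps the other four.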

A paper on this work has been published in PLoS ONE (2012).

Bhat P, Yang H, Bogre L, Devoto A and Paccanaro A., Computational selection of transcriptomics experiments improves Guilt-by-Association analyses (2012), PLoS ONE.


More details regarding the method, results, datasets and MATLAB code can be found here. We are currently implementing the algorithm as a web-based tool.

 


3.      GO Charts: A novel GO-based approach for gene functional analyses

In microarray experiments, large lists or groups of genes are often identified as responsive to a given treatment or experimental condition, and it is then necessary to summarize the functions of the genes in the group. Traditionally, function labels such as GO terms are summarized using GO over-representation analysis (ORA). However, the results are only as good as the starting gene lists or groups.
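For readers unfamiliar with ORA, the usual test behind it is a one-sided hypergeometric test. The sketch below shows that standard calculation for a single GO term; the function name and the example counts are made up for illustration.

```python
from math import comb


def ora_p_value(population, annotated, list_size, hits):
    """P(X >= hits) when `list_size` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, list_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, list_size)


# Hypothetical counts: 20 genes on the array, 5 annotated with the term,
# 4 genes in the responsive list, 3 of which carry the term.
p = ora_p_value(population=20, annotated=5, list_size=4, hits=3)
```

The p-value depends directly on which genes were put in the list, which is why the quality of the starting gene list matters so much.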

 

We are currently developing a method that summarizes the functions of genes in a completely novel way that harnesses the structure of the Gene Ontology. The technique removes the need to identify groups of genes a priori and overcomes many of the limitations of traditional GO over-representation analysis techniques. We combine this approach with a powerful visualization scheme that enables the user to visually summarize the biological functions active in the dataset and their similarity to each other. We are currently implementing the technique as a web-based tool.

 

 

 

4.      Gene function prediction

 

Although the genomes of hundreds of organisms have been sequenced, the functions of most of their genes are largely unknown. Traditional approaches for understanding gene function are resource intensive and are unlikely to match the pace of the sequencing projects any time soon. The availability of large-scale data sources such as microarrays and protein-protein interactions has led to the development of techniques for computationally predicting gene function. This often means predicting the most probable GO label for a gene. The Paccanaro Lab has developed a graph diffusion based approach that predicts gene function by integrating data from various sources such as homology, orthology, gene expression, protein-protein interactions and several other data types. I was largely involved in the conceptualization and hypothesis development. We are currently preparing a manuscript for publication.
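The lab's algorithm is not described on this page, so the following is only a generic sketch of what label diffusion on a gene graph can look like. It uses a textbook update, f <- alpha * P f + (1 - alpha) * y, where P is the row-normalised adjacency and y marks the genes already annotated with a GO label. The graph, the seed gene and the parameter values are assumptions for the example.

```python
def diffuse_labels(adjacency, seeds, alpha=0.5, iterations=50):
    """Spread a label from seed genes over a weighted graph.

    adjacency: {gene: {neighbour: weight}}, assumed symmetric.
    seeds: set of genes already annotated with the label.
    """
    genes = sorted(adjacency)
    y = {g: 1.0 if g in seeds else 0.0 for g in genes}
    f = dict(y)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            total = sum(adjacency[g].values())
            # Weighted average of the neighbours' current scores.
            spread = (
                sum(w * f[h] for h, w in adjacency[g].items()) / total
                if total else 0.0
            )
            updated[g] = alpha * spread + (1 - alpha) * y[g]
        f = updated
    return f


# Toy graph: g1 - g2 - g3 form a chain and g4 is isolated; only g1 is annotated.
adjacency = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0},
    "g4": {},
}
scores = diffuse_labels(adjacency, seeds={"g1"})
```

Genes closer to the annotated gene receive higher scores, and an isolated gene receives none. In an integrative setting, the edge weights would combine evidence from the different data sources.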

 

5.     Identifying biological process representative genes in gene expression datasets

In gene expression analyses, it is often necessary to identify the biological processes activated in an experiment and to identify genes whose expression can be considered representative of each process. Along with colleagues at the Paccanaro Lab, I was involved in the development of a graph-based technique for solving this problem. Our approach has a wide range of applications in biology, such as discovering biomarkers or identifying candidates for qPCR experiments from a large list of putative candidates. We are currently preparing a manuscript for publication.

 

6.      Microarray data analyses

I routinely work with microarray pre-processing and data analysis pipelines, starting from raw data. I have worked with microarrays from a wide variety of technologies, such as Affymetrix, CATMA, Agilent and custom 2-colour arrays. My tools include MATLAB, R/Bioconductor, and the GeneSpring and GeneMaths suites. I routinely support other projects with microarray data analyses.