BCB420 - Computational Systems Biology

class: center, middle, inverse, title-slide

# BCB420 - Computational Systems Biology
## Lecture 11 - Enrichment Map and other Cytoscape Apps cont’d
### Ruth Isserlin
### 2020-03-22

---

## Assignment #3 
  * Data set Pathway and Network Analysis
  * <font size=5> Due April 3, 2020! @ 13:00 </font>

## What to hand in?
  * **HTML-rendered R Notebook**: submit this through Quercus.
  * Make sure the notebook and all associated code are checked into your GitHub repo. I will pull all the repos at the deadline and use them to compile your code, so your checked-in code must reproduce the notebook you hand in.
  * Document your work and your code directly in the notebook.
  * **Reference the paper associated with your data!**
  * **Introduce your paper and your data again.**
  * You may use helper functions or methods, but make sure that the paths you use to source those files are relative and that the files are checked into your repo as well.

---

## Outline for Today's lecture

* Review of Enrichment Map
  * Looking at pathways in depth: Reactome app, Pathway Commons, STRING and GeneMANIA
  * **Post analysis**
  * **Enrichment Map Dark Matter**
  
---

## Post Analysis Summary

---

## Dark Matter

Files needed in order to conduct a dark matter analysis:

1. Definitions of the genesets used in the analysis - gmt file.

```r
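library(GSA)

# Path to the gene set definitions (GMT) used in the enrichment analysis
gmt_file <- file.path(getwd(), "data",
                      "Human_GOBP_AllPathways_no_GO_iea_February_01_2020_symbol.gmt")

# Redirect anything GSA.read.gmt() prints while loading to a file
capture.output(genesets <- GSA.read.gmt(gmt_file), file = "gsa_load.out")

# Name each gene set so that it can be looked up by name
names(genesets$genesets) <- genesets$geneset.names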
```
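
For orientation, a GMT file is a plain tab-separated text file with one gene set per line: the set name, a description, then the member genes. The sketch below parses a tiny, made-up GMT file with base R only. The file contents and the `toy_` names are hypothetical and only illustrate the format; this is not a replacement for `GSA.read.gmt()`.

```r
# Write a tiny two-set GMT file (made-up gene sets) to a temporary path.
toy_gmt <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdescription of set A\tG1\tG2\tG3",
             "SET_B\tdescription of set B\tG3\tG4"),
           toy_gmt)

# Split each line on tabs: field 1 = name, field 2 = description, rest = genes.
fields <- strsplit(readLines(toy_gmt), "\t", fixed = TRUE)
toy_genesets <- lapply(fields, function(x) x[-(1:2)])
names(toy_genesets) <- vapply(fields, function(x) x[1], character(1))

# Look up one gene set by name.
toy_genesets$SET_B
```

The named list has the same shape as `genesets$genesets` after the `names()` assignment above, so the later steps would work on it in the same way.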

---

## Dark Matter

Files needed in order to conduct a dark matter analysis:
1. Definitions of the genesets used in the analysis - gmt file.
2. Expression file + rank file

```r
expression <- read.table(file.path(getwd(),"data",
                         "Supplementary_Table6_TCGA_OV_RNAseq_expression.txt"), 
                         header = TRUE, sep = "\t", quote="\"",  
                         stringsAsFactors = FALSE)
ranks <- read.table(file.path(getwd(),"data",
                    "Supplementary_Table2_MesenvsImmuno_RNASeq_ranks.rnk"), 
                    header = TRUE, sep = "\t", quote="\"",  
                    stringsAsFactors = FALSE)
```

---

## Dark Matter

Files needed in order to conduct a dark matter analysis:
1. Definitions of the genesets used in the analysis - gmt file.
2. Expression file + rank file
3. GSEA results files - the na_pos and na_neg spreadsheets in GSEA results directories

```r
# Get all the GSEA directories
gsea_directories <- list.files(path = file.path(getwd(), "data"),
                               pattern = "\\.GseaPreranked")
if (length(gsea_directories) == 1) {
  gsea_dir <- file.path(getwd(), "data", gsea_directories[1])
  # Get the GSEA result files; pattern is a regular expression, not a glob
  gsea_results_files <- list.files(path = gsea_dir,
                                   pattern = "^gsea_report_.*\\.xls$")
  # There should be 2 GSEA result files (na_pos and na_neg)
  enr_file1 <- read.table(file.path(gsea_dir, gsea_results_files[1]),
                          header = TRUE, sep = "\t", quote = "\"",
                          stringsAsFactors = FALSE, row.names = 1)
  enr_file2 <- read.table(file.path(gsea_dir, gsea_results_files[2]),
                          header = TRUE, sep = "\t", quote = "\"",
                          stringsAsFactors = FALSE, row.names = 1)
}
```
```
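
Picking the two reports by position depends on the order in which `list.files()` returns them. A minimal sketch of selecting them by name instead is shown below. The file names are hypothetical examples of what a GSEA results directory might contain (only the `na_pos` and `na_neg` parts come from the slide above), and `pos_file` and `neg_file` are illustrative names.

```r
# Hypothetical directory listing, as list.files() might return it.
toy_files <- c("gsea_report_for_na_neg_1584900000000.html",
               "gsea_report_for_na_neg_1584900000000.xls",
               "gsea_report_for_na_pos_1584900000000.html",
               "gsea_report_for_na_pos_1584900000000.xls",
               "ranked_gene_list_na_pos_versus_na_neg_1584900000000.xls")

# Anchored regular expressions: the name must start with the report prefix
# for the given phenotype and end in ".xls".
pos_file <- grep("^gsea_report_for_na_pos_.*\\.xls$", toy_files, value = TRUE)
neg_file <- grep("^gsea_report_for_na_neg_.*\\.xls$", toy_files, value = TRUE)

# Fail early if either report is missing or ambiguous.
stopifnot(length(pos_file) == 1, length(neg_file) == 1)
```

Selecting by name also makes it explicit which table holds the positively enriched sets and which holds the negatively enriched ones.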

---

## Dark Matter

Collect the data we need to calculate the dark matter from the above files:
1. all genes in the expression set - already loaded above
2. all genes in the enrichment results

```r
# Get the genes from the set of enriched pathways (no matter what threshold)
all_enr_genesets <- c(rownames(enr_file1), rownames(enr_file2))
genes_enr_gs <- c()
for (i in seq_along(all_enr_genesets)) {
  current_geneset <- unlist(
    genesets$genesets[which(genesets$geneset.names %in% all_enr_genesets[i])])
  genes_enr_gs <- union(genes_enr_gs, current_geneset)
}
```
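
Because the gene sets were named when the GMT file was loaded, the same union can also be taken without a loop by indexing the list by name. This is a sketch on made-up data, assuming the names in the enrichment results match the names in the GMT file exactly; all `toy_` objects are hypothetical.

```r
# Made-up gene set definitions and enrichment result names.
toy_genesets <- list(SET_A = c("G1", "G2"),
                     SET_B = c("G2", "G3"),
                     SET_C = c("G4"))
toy_enr_names <- c("SET_A", "SET_B", "SET_NOT_IN_GMT")

# Keep only names that exist in the GMT list, so that indexing never
# returns NULL entries for unknown sets.
found <- intersect(toy_enr_names, names(toy_genesets))

# Flatten the selected sets and drop duplicated genes.
toy_genes <- unique(unlist(toy_genesets[found], use.names = FALSE))
toy_genes
```

Comparing `length(found)` with `length(toy_enr_names)` is a quick check for result names that have no matching definition.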

---

## Dark Matter

Data we need to calculate the dark matter:
1. all genes in the expression set - the gene names in the first column of the expression table
2. all genes in the enrichment results
3. all genes in the **significant enrichment results** - define your thresholds

```r
FDR_threshold <- 0.001
# Get the genes from the significantly enriched pathways (FDR <= threshold)
all_sig_enr_genesets <- c(
  rownames(enr_file1)[which(enr_file1[, "FDR.q.val"] <= FDR_threshold)],
  rownames(enr_file2)[which(enr_file2[, "FDR.q.val"] <= FDR_threshold)])
genes_sig_enr_gs <- c()
# seq_along() skips the loop when no gene set passes the threshold
for (i in seq_along(all_sig_enr_genesets)) {
  current_geneset <- unlist(
    genesets$genesets[which(genesets$geneset.names %in% all_sig_enr_genesets[i])])
  genes_sig_enr_gs <- union(genes_sig_enr_gs, current_geneset)
}
```
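
The threshold step can be tried on a small made-up results table. The sketch below assumes only what the slide uses, namely gene set names as row names and an `FDR.q.val` column. The values and `toy_` names are hypothetical. It also shows why a strict threshold that leaves no gene sets needs care when looping.

```r
# Made-up enrichment results: row names are gene set names.
toy_enr <- data.frame(FDR.q.val = c(0.0005, 0.2, 0.001),
                      row.names = c("SET_A", "SET_B", "SET_C"))

# Gene sets at or below the threshold.
toy_sig <- rownames(toy_enr)[toy_enr$FDR.q.val <= 0.001]

# A much stricter threshold leaves nothing.
toy_none <- rownames(toy_enr)[toy_enr$FDR.q.val <= 1e-6]

# 1:length(x) counts down to c(1, 0) for an empty vector, so a loop body
# would still run twice; seq_along(x) is empty and the loop is skipped.
bad_index  <- 1:length(toy_none)
good_index <- seq_along(toy_none)
```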

---

## Dark Matter

Data we need to calculate the dark matter:
1. all genes in the expression set - the gene names in the first column of the expression table
2. all genes in the enrichment results
3. all genes in the significant enrichment results - define your thresholds
4. all genes in geneset file

```r
genes_all_gs <- unique(unlist(genesets$genesets))
```

---

## Dark Matter

Data we need to calculate the dark matter:
1. all genes in the expression set - the gene names in the first column of the expression table - There are 15196 unique genes in the expression file.
2. all genes in the enrichment results - There are 11267 unique genes in the enrichment results.
3. all genes in the significant enrichment results - There are 4773 unique genes in the significant enrichment results.
4. all genes in geneset file - There are 16475 unique genes in the geneset file.
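
Counts like these, and the size of the dark matter at each level, come down to `unique()` and `setdiff()` on the four gene vectors. Here is a minimal sketch on made-up gene names. The `toy_` vectors are hypothetical stand-ins for the expression genes, `genes_all_gs`, `genes_enr_gs` and `genes_sig_enr_gs`.

```r
# Made-up gene vectors standing in for the four collections.
toy_expressed <- c("G1", "G2", "G3", "G4", "G5", "G6")
toy_all_gs    <- c("G1", "G2", "G3", "G4", "G7")
toy_enr       <- c("G1", "G2", "G3")
toy_sig       <- c("G1")

# Number of unique genes in each collection.
toy_counts <- c(expression   = length(unique(toy_expressed)),
                enrichment   = length(unique(toy_enr)),
                significant  = length(unique(toy_sig)),
                geneset_file = length(unique(toy_all_gs)))

# Expressed genes missing from each collection: the dark matter at that level.
toy_dark <- c(no_annotation      = length(setdiff(toy_expressed, toy_all_gs)),
              not_in_enrichment  = length(setdiff(toy_expressed, toy_enr)),
              not_in_significant = length(setdiff(toy_expressed, toy_sig)))
toy_counts
toy_dark
```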

---

## Venn Diagram of Dark Matter Overlaps

```r
library(VennDiagram)

A <- genes_all_gs
B <- genes_enr_gs
# Gene names are in the first column; unique() counts each gene once
C <- unique(expression[,1])
png(file.path(getwd(),"data","dark_matter_overlaps.png"))
draw.triple.venn( area1=length(A), area2=length(B), area3 = length(C),
                  n12 = length(intersect(A,B)), n13=length(intersect(A,C)),
                  n23 = length(intersect(B,C)), 
                  n123 = length(intersect(A,intersect(B,C))),
                  category = c("all genesets","all enrichment results","expression"),
                  fill = c("red","green","blue"),
                  cat.col = c("red","green","blue")
)
```

```
## (polygon[GRID.polygon.1], polygon[GRID.polygon.2], polygon[GRID.polygon.3], polygon[GRID.polygon.4], polygon[GRID.polygon.5], polygon[GRID.polygon.6], text[GRID.text.7], text[GRID.text.8], text[GRID.text.9], text[GRID.text.10], text[GRID.text.11], text[GRID.text.12], text[GRID.text.13], text[GRID.text.14])
```

```r
dev.off()
```

```
## quartz_off_screen 
##                 2
```

---

## Dark matter - overlaps
![](./data/dark_matter_overlaps.png)

---

## Dark Matter

Get the set of genes that have no annotation

```r
genes_no_annotation <- setdiff(expression[,1], genes_all_gs)
```

Get the top ranked genes that have no annotation

```r
ranked_gene_no_annotation <- ranks[which(ranks[,1] %in% genes_no_annotation),]
```
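
The subsetting above keeps the genes in the order of the rank file, so the first rows are the top-ranked genes only if that file is already sorted by decreasing rank. A sketch that sorts explicitly is shown below on made-up data. The column names `GeneName` and `rank` follow the rank table in this lecture; the values and `toy_` names are hypothetical.

```r
# Made-up rank table, deliberately not sorted.
toy_ranks <- data.frame(GeneName = c("G1", "G2", "G3", "G4", "G5"),
                        rank = c(-3.2, 5.1, 0.4, 8.7, -6.0),
                        stringsAsFactors = FALSE)
toy_no_annotation <- c("G2", "G3", "G5")

# Keep only the genes without annotation.
toy_dark_ranks <- toy_ranks[toy_ranks$GeneName %in% toy_no_annotation, ]

# Sort from most positive to most negative rank.
toy_dark_ranks <- toy_dark_ranks[order(toy_dark_ranks$rank, decreasing = TRUE), ]

# Top of the list: strongest positive ranks; bottom: strongest negative ranks.
top_positive <- head(toy_dark_ranks, 2)
top_negative <- tail(toy_dark_ranks, 1)
```

The bottom of the sorted table gives the dark matter genes for the other phenotype in the comparison.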

---

## Top ten Mesenchymal Dark matter genes

```r
ranked_gene_no_annotation[1:10,]
```

```
##    GeneName     rank
## 1    IGDCC3 36.32958
## 14   ZNF469 28.83028
## 40   GLT8D2 24.77158
## 53 KIAA1644 23.58145
## 61  TSPAN18 22.71841
## 74     LHFP 21.54415
## 77    VGLL3 21.34833
## 86    MEIS3 20.77773
## 88  ZCCHC24 20.71234
## 90  FAM198B 20.49151
```

---

## IGDCC3 - Immunoglobulin superfamily DCC subclass member 3

[UniProt reference](https://www.uniprot.org/uniprot/Q8IVU1)

![](./images/img_lecture11/uniprot_igdcc3.png)

---