`CORUM.Rmd`

Get the latest version of the 293T PPI network …

`bp.293t.3 <- BioPlex::getBioPlex(cell.line = "293T", version = "3.0")`

`## Using cached version from 2022-08-18 13:22:31`

`head(bp.293t.3)`

```
## GeneA GeneB UniprotA UniprotB SymbolA SymbolB pW pNI
## 1 100 100 P00813 A5A3E0 ADA POTEF 6.881844e-10 0.0001176357
## 2 222389 222389 Q8N7W2-2 P26373 BEND7 RPL13 1.340380e-18 0.2256644741
## 3 222389 222389 Q8N7W2-2 Q09028-3 BEND7 RBBP4 7.221401e-21 0.0000641669
## 4 222389 222389 Q8N7W2-2 Q9Y3U8 BEND7 RPL36 7.058372e-17 0.1281827343
## 5 222389 222389 Q8N7W2-2 P36578 BEND7 RPL4 1.632313e-22 0.2006379109
## 6 222389 222389 Q8N7W2-2 P23396 BEND7 RPS3 3.986270e-26 0.0010264311
## pInt
## 1 0.9998824
## 2 0.7743355
## 3 0.9999358
## 4 0.8718173
## 5 0.7993621
## 6 0.9989736
```

… and turn into a graph:

`gr.293t.3 <- BioPlex::bioplex2graph(bp.293t.3)`

Obtain the complete set of human protein complexes from CORUM …

`corum.df <- BioPlex::getCorum(set = "core", organism = "Human")`

`## Using cached version from 2022-08-18 13:43:49`

… and turn into a `list`

:

```
corum.list <- BioPlex::corum2list(corum.df)
head(corum.list)
```

```
## $`CORUM1_BCL6-HDAC4_complex`
## [1] "P41182" "P56524"
##
## $`CORUM2_BCL6-HDAC5_complex`
## [1] "P41182" "Q9UQL6"
##
## $`CORUM3_BCL6-HDAC7_complex`
## [1] "P41182" "Q8WUI4"
##
## $CORUM4_Multisubunit_ACTR_coactivator_complex
## [1] "Q09472" "Q92793" "Q92831" "Q9Y6Q9"
##
## $`CORUM11_BLOC-3_(biogenesis_of_lysosome-related_organelles_complex_3)`
## [1] "Q92902" "Q9NQG7"
##
## $`CORUM12_BLOC-2_(biogenesis_of_lysosome-related_organelles_complex_2)`
## [1] "Q86YV9" "Q969F9" "Q9UPZ3"
```

Identify complexes with at least three subunits:

```
## has.subunits
## FALSE TRUE
## 787 1630
```

Identify complexes with at least one subunit targeted as bait:

```
hasBait <- function(s, df) any(s %in% df$UniprotA)
has.bait <- vapply(corum.list, hasBait, logical(1), df = bp.293t.3)
table(has.bait)
```

```
## has.bait
## FALSE TRUE
## 724 1693
```

Identify complexes with at least one PPI between subunits:

```
hasEdge <- function(s, gr)
{
s <- intersect(s, graph::nodes(gr))
graph::numEdges(graph::subGraph(s, gr)) > 0
}
has.edge <- vapply(corum.list, hasEdge, logical(1), gr = gr.293t.3)
table(has.edge)
```

```
## has.edge
## FALSE TRUE
## 1232 1185
```

The bioplexpy package implements a function for testing overlaps of PPIs with a complex of interest based on random sampling.

The function samples random subnetworks from the given PPI network, matching the number of subunits and the bait:prey ratio of the complex being tested. We then count the number of interactions in each replication, and compare against the observed number of interactions overlapping with the complex.

Here, we set up a python environment via basilisk that contains the bioplexpy package in order to invoke the function from within R, facilitating direct exchange of data and results between the BioPlex R and BioPlex Python packages:

```
bp.env <- basilisk::BasiliskEnvironment(envname="bp.env",
pkgname = "BioPlexAnalysis",
packages = "pandas==0.25.1",
pip = "bioplexpy==1.0.0")
proc <- basilisk::basiliskStart(bp.env)
bp <- reticulate::import("bioplexpy")
```

Here, we use a functionality from the BioPlex Python package to turn the BioPlex data into a NetworkX graph:

```
gr <- bp$bioplex2graph(bp.293t.3)
gr
```

`## <networkx.classes.digraph.DiGraph object at 0x7f456f2d6e10>`

We then subset the CORUM complexes to those having (i) at least three subunits, (ii) at least one subunit targeted as bait, and (iii) at least one PPI between subunits.

```
corum <- bp$getCorum(complex_set = "core", organism = "Human")
ind <- has.subunits & has.bait & has.edge
table(ind)
```

```
## ind
## FALSE TRUE
## 1477 940
```

`sub.corum <- corum[ind,] `

Now we carry out the overlap permutation test as implemented in the
`bioplexpy`

package for all complexes passing the filter
criteria:

```
ps <- vapply(sub.corum$ComplexID,
function(i) bp$permutation_test_for_CORUM_complex(gr, sub.corum, i, 100),
numeric(1))
summary(ps)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.009901 0.009901 0.009901 0.012039 0.009901 0.178218
```

We stop the process when ready with the analysis.

`basilisk::basiliskStop(proc)`

We can now take a peak at the p-value distribution from the overlap permutation test, observing a strong concentration near zero, confirming that, as expected, many complexes are enriched for PPIs - when compared to random subsets of the overall PPIs network matching the complexes in size and composition.

`hist(ps, breaks = 20, xlab = "p-value", ylab = "frequency", col = "#00AFBB")`