Setup

Data retrieval

Get the latest version of the 293T PPI network …

bp.293t.3 <- BioPlex::getBioPlex(cell.line = "293T", version = "3.0")
## Using cached version from 2022-08-18 13:22:31
head(bp.293t.3)
##    GeneA  GeneB UniprotA UniprotB SymbolA SymbolB           pW          pNI
## 1    100    100   P00813   A5A3E0     ADA   POTEF 6.881844e-10 0.0001176357
## 2 222389 222389 Q8N7W2-2   P26373   BEND7   RPL13 1.340380e-18 0.2256644741
## 3 222389 222389 Q8N7W2-2 Q09028-3   BEND7   RBBP4 7.221401e-21 0.0000641669
## 4 222389 222389 Q8N7W2-2   Q9Y3U8   BEND7   RPL36 7.058372e-17 0.1281827343
## 5 222389 222389 Q8N7W2-2   P36578   BEND7    RPL4 1.632313e-22 0.2006379109
## 6 222389 222389 Q8N7W2-2   P23396   BEND7    RPS3 3.986270e-26 0.0010264311
##        pInt
## 1 0.9998824
## 2 0.7743355
## 3 0.9999358
## 4 0.8718173
## 5 0.7993621
## 6 0.9989736

… and turn into a graph:

gr.293t.3 <- BioPlex::bioplex2graph(bp.293t.3)

Obtain the complete set of human protein complexes from CORUM …

corum.df <- BioPlex::getCorum(set = "core", organism = "Human")
## Using cached version from 2022-08-18 13:43:49

… and turn into a list:

corum.list <- BioPlex::corum2list(corum.df)
head(corum.list)
## $`CORUM1_BCL6-HDAC4_complex`
## [1] "P41182" "P56524"
## 
## $`CORUM2_BCL6-HDAC5_complex`
## [1] "P41182" "Q9UQL6"
## 
## $`CORUM3_BCL6-HDAC7_complex`
## [1] "P41182" "Q8WUI4"
## 
## $CORUM4_Multisubunit_ACTR_coactivator_complex
## [1] "Q09472" "Q92793" "Q92831" "Q9Y6Q9"
## 
## $`CORUM11_BLOC-3_(biogenesis_of_lysosome-related_organelles_complex_3)`
## [1] "Q92902" "Q9NQG7"
## 
## $`CORUM12_BLOC-2_(biogenesis_of_lysosome-related_organelles_complex_2)`
## [1] "Q86YV9" "Q969F9" "Q9UPZ3"

Identify complexes with at least three subunits:

has.subunits <- lengths(corum.list) > 2
table(has.subunits)
## has.subunits
## FALSE  TRUE 
##   787  1630

Identify complexes with at least one subunit targeted as bait:

hasBait <- function(s, df) any(s %in% df$UniprotA)
has.bait <- vapply(corum.list, hasBait, logical(1), df = bp.293t.3)
table(has.bait)
## has.bait
## FALSE  TRUE 
##   724  1693

Identify complexes with at least one PPI between subunits:

hasEdge <- function(s, gr)
{
    s <- intersect(s, graph::nodes(gr))
    graph::numEdges(graph::subGraph(s, gr)) > 0
}
has.edge <- vapply(corum.list, hasEdge, logical(1), gr = gr.293t.3)
table(has.edge)
## has.edge
## FALSE  TRUE 
##  1232  1185

Overlap analysis with protein complexes

The bioplexpy package implements a function for testing overlaps of PPIs with a complex of interest based on random sampling.

The function samples random subnetworks from the given PPI network, matching the number of subunits and the bait:prey ratio of the complex being tested. We then count the number of interactions in each replication, and compare against the observed number of interactions overlapping with the complex.

Here, we set up a python environment via basilisk that contains the bioplexpy package in order to invoke the function from within R, facilitating direct exchange of data and results between the BioPlex R and BioPlex Python packages:

bp.env <- basilisk::BasiliskEnvironment(envname="bp.env",
                                        pkgname = "BioPlexAnalysis",
                                        packages = "pandas==0.25.1",
                                        pip = "bioplexpy==1.0.0")
proc <- basilisk::basiliskStart(bp.env)
bp <- reticulate::import("bioplexpy")

Here, we use a functionality from the BioPlex Python package to turn the BioPlex data into a NetworkX graph:

gr <- bp$bioplex2graph(bp.293t.3)
gr
## <networkx.classes.digraph.DiGraph object at 0x7f456f2d6e10>

We then subset the CORUM complexes to those having (i) at least three subunits, (ii) at least one subunit targeted as bait, and (iii) at least one PPI between subunits.

corum <- bp$getCorum(complex_set = "core", organism = "Human")
ind <- has.subunits & has.bait & has.edge
table(ind)
## ind
## FALSE  TRUE 
##  1477   940
sub.corum <- corum[ind,] 

Now we carry out the overlap permutation test as implemented in the bioplexpy package for all complexes passing the filter criteria:

ps <- vapply(sub.corum$ComplexID,
             function(i) bp$permutation_test_for_CORUM_complex(gr, sub.corum, i, 100),
             numeric(1))
summary(ps)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.009901 0.009901 0.009901 0.012039 0.009901 0.178218

We stop the process when ready with the analysis.

basilisk::basiliskStop(proc)

Visualization

We can now take a peak at the p-value distribution from the overlap permutation test, observing a strong concentration near zero, confirming that, as expected, many complexes are enriched for PPIs - when compared to random subsets of the overall PPIs network matching the complexes in size and composition.

hist(ps, breaks = 20, xlab = "p-value", ylab = "frequency", col = "#00AFBB")