(A) Construction of a PI network based on protein copurification and detection by mass spectrometry. Confidence scores for the LCMS and MALDI networks were computed by logistic regression, using PIs from low-throughput studies curated in DIP, BIND, and IntAct as gold positives and pairs of proteins in different subcellular localizations as gold negatives. The two networks were then integrated using a probabilistic model [61] (Protocol S6). The resulting PI network, with edge weights corresponding to likelihood ratios, was clustered using MCL to delimit “multiprotein complexes.”
(B) Integration of four GC methods into a single functional interaction network using the same probabilistic model [61]. The resulting scores (edge weights) were input to MCL to delimit “functional modules.”
(C) Orphan function prediction using a “guilt-by-association” procedure. After integration of PI and GC interactions into a single probabilistic network [61], a machine learning algorithm newly developed for this study (StepPLR) was used to assign functions based on the binary associations of orphans with annotated proteins, the respective interaction edge weights, and the overall network topology. Correlations between the vectors of these function predictions (orphans) and the annotations were then clustered using MCL to delimit “functional neighborhoods.”
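
The guilt-by-association idea in panel (C) can be illustrated with a deliberately simple stand-in. The sketch below is not StepPLR and ignores overall network topology: it only lets each annotated neighbor of an orphan vote for its own functions, weighted by the edge weight. The function name `predict_functions`, the protein identifiers, the function labels, and the weights are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def predict_functions(orphan, edges, annotations):
    """Weighted neighbor voting (a simplified stand-in, NOT StepPLR).

    edges: dict mapping an undirected pair (u, v) to a positive edge weight.
    annotations: dict mapping annotated proteins to a set of function labels.
    Returns each function's share of the annotated edge weight around `orphan`.
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for (u, v), weight in edges.items():
        if orphan == u:
            neighbor = v
        elif orphan == v:
            neighbor = u
        else:
            continue  # edge does not touch the orphan
        functions = annotations.get(neighbor, set())
        if not functions:
            continue  # unannotated neighbors carry no vote
        total_weight += weight
        for function in functions:
            scores[function] += weight
    if total_weight == 0.0:
        return {}
    return {f: s / total_weight for f, s in scores.items()}


# Hypothetical toy network: identifiers, labels, and weights are made up.
edges = {
    ("orphan1", "annotA"): 3.0,
    ("annotB", "orphan1"): 1.0,
    ("annotA", "annotB"): 5.0,
    ("orphan1", "orphan2"): 4.0,
}
annotations = {"annotA": {"transport"}, "annotB": {"translation"}}

predictions = predict_functions("orphan1", edges, annotations)
print(predictions)
```

In this toy case the orphan's strongest annotated link dominates the vote. A regression-based method such as the one named in the caption would instead learn how much weight each kind of evidence deserves.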

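The integration and clustering steps shared by the three panels can be sketched as follows. Two assumptions are made here that the caption does not confirm: the probabilistic model [61] is approximated by a naive-Bayes-style product of per-method likelihood ratios, and MCL is re-implemented in its textbook form (alternating expansion and inflation of a column-stochastic matrix) rather than by calling the MCL software. The method names, protein identifiers, likelihood ratios, and the inflation value of 2.0 are hypothetical.

```python
def integrate_likelihood_ratios(evidence):
    """Combine per-method likelihood ratios by multiplication.

    ASSUMPTION: methods are treated as independent (naive Bayes); the
    actual model [61] may differ. evidence: {method: {(u, v): ratio}}.
    """
    combined = {}
    for scores in evidence.values():
        for (u, v), ratio in scores.items():
            key = tuple(sorted((u, v)))  # undirected edge, canonical order
            combined[key] = combined.get(key, 1.0) * ratio
    return combined


def _normalize_columns(m):
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def mcl(edges, inflation=2.0, max_iter=100, tol=1e-12, threshold=1e-5):
    """Minimal Markov clustering on positive, undirected edge weights."""
    nodes = sorted({x for pair in edges for x in pair})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        m[index[u]][index[v]] = m[index[v]][index[u]] = w
    for i in range(n):
        m[i][i] = max(m[i])  # self-loop as strong as the best incident edge
    m = _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: one step of random-walk flow (matrix squaring).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize each column.
        inflated = _normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each surviving (attractor) row lists the members of one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if row[j] > threshold)
                for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical evidence: two tight triangles joined by one weak edge.
triangle_pairs = [("p0", "p1"), ("p0", "p2"), ("p1", "p2"),
                  ("p3", "p4"), ("p3", "p5"), ("p4", "p5")]
evidence = {
    "method_1": {pair: 5.0 for pair in triangle_pairs},
    "method_2": {**{(v, u): 2.0 for (u, v) in triangle_pairs},
                 ("p2", "p3"): 0.1},
}
network = integrate_likelihood_ratios(evidence)
modules = mcl(network)
print(modules)
```

Higher inflation values generally yield finer clusters; the value used in the study is not stated in this caption.
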
[From P. Hu, S. C. Janga, M. Babu, J. J. Díaz-Mejía, G. Butland, W. Yang, O. Pogoutse, X. Guo, S. Phanse, P. Wong, S. Chandran, C. Christopoulos, A. Nazarians-Armavil, N. K. Nasseri, G. Musso, M. Ali, N. Nazemof, V. Eroukova, A. Golshani, A. Paccanaro, J. F. Greenblatt, G. Moreno-Hagelsieb, and A. Emili, “Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins,” PLoS Biology, vol. 7, iss. 4, e1000096, 2009.]