Document Type: Original Research Paper

Authors

1 Faculty of Computer Engineering , Shahid Rajaee Teacher Training University,Tehran, Iran

2 Faculty of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran

10.22061/jecei.2020.7135.359

Abstract

Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link prediction methods identify all path structures in a network and can determine the similarity degree between graph-extracted entities with high accuracy but are time-consuming instead. Most existing algorithms are only using one type of feature (global or local) to represent data, which not well described due to the large scale and heterogeneity of complex networks.
Methods: In this paper, a new method presented for Link Prediction using node embedding due to the high dimensions of real-world networks. The proposed method extracts a smaller model of the input network by getting help from the deep neural network and combining global and local nodes in a way to preserve the network's information and features to the desired extent. First, the feature vector is being extracted by an encoder-decoder for each node, which is a suitable tool for modeling complex nonlinear phenomena. Secondly, both global and local information concurrently used to improve the loss function. More obvious, the clustering similarity threshold considered as the local criterion and the transitive node similarity measure used to exploit the global features. To the end, the accuracy of the link prediction algorithm increased by designing the optimization operation accurately.
Results: The proposed method applied to 4 datasets named Cora, Wikipedia, Blog catalog, Drug-drug-interaction, and the results are compared with laplacian, Node2vec, and GAE methods. Experimental results show an average accuracy achievement of 0.620, 0.723, 0.875, and 0.845 on the mentioned datasets, and confirm that the link prediction can effectively improve the prediction performance using network embedding based on global similarity.

Keywords

Main Subjects

[1] G. Tsoumakas, I. Katakis, “Multi-label classification: An overview,” International Journal of Data Warehousing and Mining (IJDWM), 3(3): 1-13, 2007.

[2] L. Lü, T. Zhou, “Link prediction in complex networks: A survey,” Physica A: statistical mechanics and its applications, 390(6), 2011.

[3] S. Fortunato, Community detection in graphs. Physics reports, 2010.

[4]  L. Lü, M. Medo, C.H. Yeung, Y.C. Zhang, Z.K. Zhang, T. Zhou, Recommender systems. Physics reports, 519(1): 1-49, 2012.

[5] S. Wang, Q.Wang, M. Gong. "Multi-Task  Learning Based Network Embedding," Frontiers in Neuroscience, 2020.

[6] E.Airoldi “Mixed membership block models for relational data with application to protein-protein interactions in Proc. International Biometric Society-ENAR Annual Meetings, 2006.

[7] Z. Huang, X. Li, H. Chen, “Link prediction approach to   collaborative filtering,” in Proc.5th ACM/IEEE-CS Joint Conference on Digital Libraries, NK,: 141–142.

[8] M. Al Hasan, V. Chaoji, S. Salem, M. Zaki, Link prediction using supervised learning. In SDM06: workshop on link analysis, counter-terrorism and security 30: 798-805, 2006.

[9] Z. Huang, D.K. Lin, “The time-series link prediction problem with applications in communication surveillance,” INFORMS Journal on Computing, 21(2): 286-303, 2009.

[10] L. A. Adamic, E. Adar, Friends and neighbors on the web, Social 1networks, 25(3):230 , 2003.  

[11] P. Jaccard, Etude comparative de la distribution florale  dans une   portiondes Alpes et du  Jura, Impr. Corbaz, 1901.

[12]A, Clauset, C. Moore, M.E. Newman, “Hierarchical structure and the prediction of missing links in networks,” Nature, 453: 98-101, 2008.

[13] H. C. White, S. A. Boorman, R. L. Breiger, “Social structure from multiple networks,” i. blockmodels of roles and positions, America journal of sociology780.

[14] N. Friedman, L. Getoor, D. Koller, A. Pfeffer, “Learning probabilistic relational models,” In IJCAI, 99: 1300-1309, 1999.

[15] D. Heckerman, C. Meek, D. Koller, “Probabilistic entity-relationship models, PRMs, and plate models,” Introduction to statistical relational learning, 201-238, 2007.

[16] P. Goyal, E. Ferrara, "Graph embedding techniques, applications, and performance: A survey, “Knowledge-Based Systems, 151:78-94, 2018.

[17] W. L. Hamilton, R. Ying, J. Leskovec, Representation  learning on graphs: Methods and applications, “arXiv preprint,” arXiv:1709.05584, 2017.‏

[18] X. Yue, et al., "Graph embedding on biomedical networks: methods, applications and evaluations," Bioinformatics, 36(4): 1241-1251, 2020.‏

 [19] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, A. J.  Smola, “Distributed   large-scale natural graph factorization,” in: Proce 22nd international conference on World Wide Web, ACM, :37–48, 2013.

 [20] D. Wang, P. Cui, W. Zhu, “Structural deep network embedding,” in  Proce. 22nd International Conference on Knowledge Discovery and Data Mining, ACM,:1225–12, 2016.

[21] C. Hongyun, V.W. Zheng, K. Chen-Chuan Chang. "A comprehensive survey of graph embedding: Problems, techniques, and applications." IEEE Transactions on Knowledge and Data Engineering 3(2018):1616-1637, 2018.

[22] Ou, Mingdong, et al., "Asymmetric transitivity preserving graph   embedding,” in Proce. 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.‏

[23] B. Perozzi, R. Al-Rfou, S. Skiena, "August. Deepwalk: "Online Learning Of Social Representations,” In Proce 20th ACM SIGKDD international conference on Knowledge discovery and data mining,: 701-710, 2014. 

[24] A. Grover, J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proce 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 2016.‏

[25] C. Peng, et al., “A survey on network embedding,” IEEE Transactions on Knowledge and Data Engineering, 31(5): 833-852, 2018.

[26] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, “Line: Large-scale Information Network Embedding,” in Proce. 24th International Conference on World Wide Web,: 1067-1077, 2015.

[27] T.N. Kipf, M. Welling, "Semi-supervised Classification With Graph Convolutional networks" arXiv,: 1-11, 2016.

[28] Y. Wang, Y. Hongxun, S. Zhao, "Auto-encoder based dimensionality reduction," Neurocomputing, 184: 232-242, 2016.

[29] A. Papadimitriou, “Panagiotis Symeonidis, and Yannis  Manolopoulos,” Friendlink: link prediction in social networks via bounded local path traversal,” 2011 International Conference on Computational Aspects of Social Networks (CASoN). IEEE, 2011.

[30] C. Xing, et al., "The application of degree related clustering coefficient in estimating the link predictability and predicting missing links of networks," Chaos: An Interdisciplinary Journal of NonlinearScience, 29(5):053135, 2019.

[31] S. Panagiotis, E. Tiakas, Y. Manoulos, “Transitive node similarity for link prediction in social networks   with positive and negative links,” in Proce. The fourth ACM conference on Recommender systems, 2010.

[32] M. Belkin, P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” In Advances in neural information processing systems,: 585591, 2002.

[33] T.N. Kipf, M. Welling, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308, 2016.

[34] L. Tang, H. Liu, “Relational learning via latent social dimensions,” In Proce 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, :817–826,Paris,France,Jun.2009.

[35] http://www.mattmahoney.net/dc/textdata

[36] T. Yang, R. Jin, Y. Chi, Z. Shenghuo, “Combining link and content for community detection: a discriminative approach,” In 15th ACM SIGKDD International  Conference on Knowledge Discovery and Data Mining,: 927–936,Paris, France, Jun.2009.

[37] D.S. Wishart, Y.D. Feunang, A.C. Guo, et al., “Drugbank 5.0: a major update to the drugbank database for 2018,” Nucleic acids research, 46(D1), D1074–D1082, 2017.