Document Type: Original Research Paper


Department of Computer Systems Architecture, Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.


Background and Objectives: Graph processing is gaining increasing attention in the era of big data. However, graph processing applications are highly memory intensive due to the irregular nature of graphs. Processing-in-memory (PIM) is an old idea that has recently been revisited thanks to advances in technology, specifically the ability to manufacture 3D-stacked chips. PIM proposes enriching memory units with computational capabilities to reduce the cost of moving data between the processor and the memory system.
Given recent advances in the field, this approach appears to be a promising way to handle large-scale graph processing.
Methods: This paper explores a real-world PIM technology, the Hybrid Memory Cube (HMC), to improve graph processing efficiency by reducing irregular access patterns and improving temporal locality.
We propose NodeFetch, a new method for accessing nodes and their neighbors during graph processing, implemented by adding a new command to the HMC system.
Results: Simulation results on a set of real-world graphs show that the proposed method achieves an average speedup of 3.3x and a 69% reduction in energy consumption over the baseline PIM architecture (HMC).
Conclusion: Most processing-in-memory techniques aim to reduce data movement between the processor and memory. This paper proposes a method that reduces graph processing execution time and energy consumption by reducing cache misses while processing a graph.
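The abstract does not spell out the semantics of the NodeFetch command. Assuming, purely for illustration, that such a command lets the memory side resolve a vertex's CSR offsets and return its neighbor list in one exchange, the toy Python model that follows contrasts host-side request counts and dependent round trips with a conventional CSR traversal. The CSR layout, the 64-byte line size, the array base addresses, and the functions `baseline_cost` and `nodefetch_cost` are hypothetical and are not the paper's design.

```python
from typing import List, Set, Tuple

# Illustrative constants (assumptions, not taken from the paper or the HMC spec)
LINE_BYTES = 64            # assumed host request granularity: one cache line
WORD_BYTES = 4             # assumed 32-bit offsets and vertex IDs
OFFSETS_BASE = 0           # assumed byte address of the CSR offsets array
NEIGHBORS_BASE = 1 << 20   # assumed byte address of the CSR neighbors array


def lines_covered(base: int, first_word: int, n_words: int) -> Set[int]:
    """Indices of the cache lines spanned by n_words consecutive words."""
    if n_words <= 0:
        return set()
    lo = (base + first_word * WORD_BYTES) // LINE_BYTES
    hi = (base + (first_word + n_words) * WORD_BYTES - 1) // LINE_BYTES
    return set(range(lo, hi + 1))


def baseline_cost(offsets: List[int], v: int) -> Tuple[int, int]:
    """(host requests, dependent round trips) to read v's neighbor list.

    The host first reads offsets[v] and offsets[v + 1]; only then does it know
    which slice of the neighbors array to request, so the two phases serialize.
    """
    start, end = offsets[v], offsets[v + 1]
    requests = len(lines_covered(OFFSETS_BASE, v, 2))
    neighbor_lines = lines_covered(NEIGHBORS_BASE, start, end - start)
    requests += len(neighbor_lines)
    round_trips = 2 if neighbor_lines else 1
    return requests, round_trips


def nodefetch_cost(offsets: List[int], v: int) -> Tuple[int, int]:
    """Hypothetical single command: the memory side resolves the offsets and
    returns the neighbor list, so the host issues one request per vertex."""
    return 1, 1


if __name__ == "__main__":
    # Toy 3-vertex graph in CSR form with degrees 3, 0 and 17.
    offsets = [0, 3, 3, 20]
    for v in range(len(offsets) - 1):
        print(v, baseline_cost(offsets, v), nodefetch_cost(offsets, v))
```

Summed over the three vertices of this toy graph, the baseline issues 6 host requests against 3 under the assumed command. The model ignores caching, packet sizes, and the property reads of real graph kernels, so it only illustrates where an indirection-resolving memory command could save work and says nothing about the measured 3.3x speedup.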

©2021 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.




Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for post-publication discussion and corrections, allowing debate after publication on its site through the Letters to Editor section. Letters pertaining to manuscripts published in JECEI should be sent to the editorial office of JECEI within three months of online publication or before print publication, except for critiques of original research. The following points should be considered before sending letters (comments) to the editor.

[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although citing more than three is discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter indicating whether or not it is being submitted for publication.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.