Document Type: Original Research Paper

Authors

1 Department of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.

2 Omid Cancer Center, Ahvaz, Iran.

3 Department of Internal Medicine, School of Medicine, Cellular and Molecular Research Center, Valiasr Hospital, Birjand University of Medical Sciences, Birjand, Iran.

Abstract

Background and Objectives: Understanding the heterogeneity of breast cancer is crucial for improving treatment strategies. This study investigates the application of K-Means and Hierarchical Clustering to a local dataset of breast cancer patients from Iranmehr Hospital, Birjand, Iran, with the primary goal of identifying potential patient subgroups based on their clinical and treatment characteristics for knowledge discovery. The potential of these subgroups to inform future research on personalized treatment approaches is explored.
Methods: A retrospective dataset comprising pathological and clinical information was analyzed using K-Means and Agglomerative Hierarchical Clustering to identify patient subgroups. For both methods, the optimal number of clusters was determined to be two (k=2) using internal validation criteria: the Elbow Method, Silhouette Analysis, the Calinski-Harabasz Index, and, for Hierarchical Clustering, Largest Jump Analysis. Statistical tests (ANOVA and Chi-squared) were used to assess whether features differed significantly across the clusters identified by each method, providing insight into the key factors differentiating these groups. Internal validity of the final clusters was assessed using the Silhouette Score and the Calinski-Harabasz Index.
Results: The K-Means analysis identified two clusters exhibiting significant differences in characteristics such as age, chemotherapy session intensity, menopausal status, nodal involvement, and biomarker expression (ER, PR, HER2, Ki67). The Hierarchical Clustering also yielded two clusters with varying characteristics, and a comparison between the two methods highlighted both similarities and differences in the identified patient stratifications. The overall agreement between K-Means and Hierarchical Clustering was quantified by an Adjusted Rand Index (ARI) of 0.4697.
Conclusion: Both K-Means and Hierarchical Clustering revealed potential patient subgroups within the studied dataset, highlighting the heterogeneity of breast cancer presentation and treatment at a local level. These clusters exhibited statistically significant differences across key clinical and treatment features. Future research is needed to validate these findings in larger, multi-center studies, to explore the clinical significance of these subgroups in terms of treatment outcomes, and to compare the effectiveness of different clustering methodologies for this purpose.
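
For readers unfamiliar with the Adjusted Rand Index used above to compare the K-Means and Hierarchical partitions, the following is a minimal sketch of how such an agreement score can be computed from two label vectors. It is not the authors' code: the function name adjusted_rand_index and the example labels are hypothetical, and the handling of degenerate cases is an assumption. In practice, scikit-learn's adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as trivial agreement (assumption)

    # Contingency counts: how many samples share each (cluster_a, cluster_b) pair,
    # plus the size of every cluster in each labeling.
    joint = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    # Number of sample pairs placed together under both labelings, and under each one.
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_a = sum(comb(c, 2) for c in sizes_a.values())
    pairs_b = sum(comb(c, 2) for c in sizes_b.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everyone in one cluster
    return (pairs_joint - expected) / (maximum - expected)


# Hypothetical labels for eight patients, not taken from the study.
kmeans_labels = [0, 0, 0, 1, 1, 1, 1, 0]
hierarchical_labels = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(adjusted_rand_index(kmeans_labels, hierarchical_labels), 4))
```

A value of 1 indicates identical partitions up to a renaming of cluster labels, while values near 0 indicate agreement no better than chance, so the reported 0.4697 reflects partial overlap between the two stratifications.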

Keywords

Main Subjects

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

 

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

 

Publisher

Shahid Rajaee Teacher Training University


LETTERS TO EDITOR

Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for post-publication discussion and corrections, which allow debate on its site after publication. Letters pertaining to manuscripts published in JECEI should be sent to the JECEI editorial office within three months of online publication or before print publication, except for critiques of original research. The following points should be considered before sending a letter (comment) to the editor.


[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter stating that it is being submitted either for publication or not.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.
