Document Type : Original Research Paper


1 University of Waterloo, Waterloo, Canada and Chabahar Maritime University, Chabahar, Iran.

2 Systems Design Engineering Department, University of Waterloo, Canada.


Background and Objectives: Signage is everywhere, and a robot should be able to take advantage of signs to help it localize (including Visual Place Recognition (VPR)) and map. Robust text detection & recognition in the wild is challenging due to pose, irregular text instances, illumination variations, viewpoint changes, and occlusion factors.
Methods: This paper proposes an end-to-end scene text spotting model that simultaneously outputs the text string and bounding boxes. The proposed model leverages a pre-trained Vision Transformer based (ViT) architecture combined with a multi-task transformer-based text detector more suitable for the VPR task. Our central contribution is introducing an end-to-end scene text spotting framework to adequately capture the irregular and occluded text regions in different challenging places. We first equip the ViT backbone using a masked autoencoder (MAE) to capture partially occluded characters to address the occlusion problem. Then, we use a multi-task prediction head for the proposed model to handle arbitrary shapes of text instances with polygon bounding boxes.
Results: The evaluation of the proposed architecture's performance for VPR involved conducting several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset. The well-known evaluation metric, Precision-Recall, was employed to measure the performance of the proposed pipeline. The final model achieved the following performances, Recall = 0.93 and Precision = 0.8, upon testing on this benchmark.
Conclusion: The initial experimental results show that the proposed model outperforms the state-of-the-art (SOTA) methods in comparison to the SCTP dataset, which confirms the robustness of the proposed end-to-end scene text detection and recognition model.


Main Subjects

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit:


Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Shahid Rajaee Teacher Training University



Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for the post-publication discussions and corrections which allows debate post publication on its site, through the Letters to Editor. Letters pertaining to manuscript published in JECEI should be sent to the editorial office of JECEI within three months of either online publication or before printed publication, except for critiques of original research. Following points are to be considering before sending the letters (comments) to the editor.

[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter stating that it is being submitted either for publication or not.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.