Background and Objectives: Evaluating discourse coherence is a critical but challenging task for content analysis across Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods, such as entity-based and graph-based models, depend on the semantic and linguistic concepts of a text. As a result, they do not solve the problem well: they are limited to the word co-occurrence information available in consecutive sentences within a short span of text. One of the greatest challenges of these methods is that they perform poorly when evaluating the coherence of long documents and are suitable only for documents with a small number of sentences.
Methods: Our proposed method addresses both local and global coherence. It assesses the local topic integrity of a text at the paragraph level without relying on word meaning or handcrafted rules. Global coherence is evaluated through the dependency between paragraphs in sequence. By applying statistical approaches to results derived from word embeddings, the method incorporates external word-correlation knowledge into the analysis of short and long stories, so that local and global coherence are assessed simultaneously.
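As a rough illustration of the distinction between local and global coherence, the sketch below scores a paragraph by how closely its word vectors cluster around their centroid, and scores a document by the similarity of consecutive paragraph centroids. It is not the proposed method: the toy vectors, the function names, and the centroid-based scoring are all assumptions made for illustration, and the n-gram and statistical components are not modeled.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "pet": [0.8, 0.2, 0.0],
    "stock": [0.0, 1.0, 0.0],
    "market": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(tokens, vectors):
    """Mean vector of the tokens that have an embedding, or None if none do."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def local_coherence(tokens, vectors=TOY_VECTORS):
    """Assumed local score: mean similarity of each word to the paragraph centroid."""
    center = centroid(tokens, vectors)
    if center is None:
        return 0.0
    known = [vectors[t] for t in tokens if t in vectors]
    return sum(cosine(v, center) for v in known) / len(known)


def global_coherence(paragraphs, vectors=TOY_VECTORS):
    """Assumed global score: mean similarity of consecutive paragraph centroids."""
    centers = [centroid(p, vectors) for p in paragraphs]
    centers = [c for c in centers if c is not None]
    sims = [cosine(a, b) for a, b in zip(centers, centers[1:])]
    return sum(sims) / len(sims) if sims else 0.0
```

With these toy vectors, a document whose paragraphs stay on one topic, such as `[["cat", "dog"], ["pet", "dog"]]`, receives a higher global score than one that switches topic, such as `[["cat", "dog"], ["stock", "market"]]`.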
Results: Using the combined effect of word2vec vectors and the most likely n-grams, we show that the proposed method is independent of the language and its semantic concepts. The results indicate that the proposed method achieves higher accuracy than the other algorithms on long documents with a large number of sentences.
Conclusion: Compared with the BGSEG method, the proposed method improved the mean degree of coherence evaluation by 1.19 percent. The results also indicate that the improvement is greater for larger texts with more sentences.