In recent years, social media has grown exponentially, becoming a primary source of daily news and a platform where individuals discuss various issues and express personal opinions. These contributions can help policymakers make informed decisions. However, processing the vast amount of textual data generated on these platforms is time-consuming and costly, so efficient and accurate machine learning techniques are needed to handle it. While there is extensive research on high-resource languages, studies on low-resource languages remain limited; these languages are less widely used and often have complex phonetic structures, which makes research more challenging. Current Transformer models have shown promising results on stance detection tasks, but they are constrained by a maximum input token length, a limitation that is especially pronounced in their variant models. Recognizing this issue, I have conducted research and proposed methods that integrate Transformers with text summarization techniques to detect stances in Vietnamese, a low-resource language. The experimental results show that the CafeBERT model, combined with Py-rouge-based extractive summarization, achieves an accuracy of 77.44%, outperforming models such as ViSoBERT and PhoBERT. These findings highlight the potential of text summarization techniques to enhance the training and performance of text classification models.
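To make the idea concrete, the sketch below illustrates one way such a pipeline can be wired together: an extractive step scores sentences by n-gram overlap with the topic (a simple stand-in for the ROUGE-based selection described above) and keeps the most relevant sentences within a token budget, and the resulting summary is then classified by a pretrained Transformer. This is a minimal illustration, not the exact implementation evaluated here; the model id `uitnlp/CafeBERT`, the 512-token limit, the three-way label set, and all helper names are assumptions made for the example.

```python
# Minimal sketch (not the thesis implementation): extractive summarization to fit a long
# Vietnamese post into a Transformer's token budget, followed by stance classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "uitnlp/CafeBERT"   # assumed Hugging Face id; swap in the checkpoint you use
MAX_TOKENS = 256                 # example budget, well under an assumed 512-token limit

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

def ngram_overlap(sentence: str, reference: str, n: int = 1) -> float:
    """ROUGE-style recall: fraction of the reference's n-grams covered by the sentence."""
    def ngrams(text):
        toks = text.lower().split()
        return set(zip(*[toks[i:] for i in range(n)]))
    ref, sent = ngrams(reference), ngrams(sentence)
    return len(ref & sent) / max(len(ref), 1)

def extractive_summary(document: str, topic: str, max_tokens: int = MAX_TOKENS) -> str:
    """Greedily keep the sentences most relevant to the topic until the budget is full."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: ngram_overlap(s, topic), reverse=True)
    summary, used = [], 0
    for sent in ranked:
        n_tok = len(tokenizer.tokenize(sent))
        if used + n_tok > max_tokens:
            break
        summary.append(sent)
        used += n_tok
    return ". ".join(summary)

def predict_stance(document: str, topic: str) -> int:
    """Summarize the document, then classify stance (e.g. 0=against, 1=neutral, 2=favor)."""
    text = extractive_summary(document, topic)
    inputs = tokenizer(topic, text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```

In this sketch the summarizer acts as a pre-filter, so the classifier sees only the sentences most related to the topic instead of an arbitrarily truncated post; the actual experiments use ROUGE scoring via Py-rouge rather than the toy overlap function shown here.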