Abstract

Language plays an important role in human life: it lets people communicate and exchange ideas. Indonesia's ethnic diversity, however, has produced many regional languages, and this variety can make it difficult to convey information and communicate. This research aims to identify the Kedang and Lamaholot languages in text form, using computational techniques to determine which regional language a text belongs to. Identification is treated as a classification task and uses Naïve Bayes with TF-IDF features. The method identifies the language of the input text, and its accuracy is then calculated. The data comprise 2,000 sentences, which makes it possible to assess whether the method is effective for language identification. The results show that the method was quite effective, identifying Kedang with an accuracy of 0.93 (93%) and Lamaholot with an accuracy of 0.93 (93%). There are also 63 examples of Kedang and 58 examples of Lamaholot.


Keywords: Regional Language Classification, TF-IDF, Naïve Bayes
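
The approach named in the abstract, Naïve Bayes over TF-IDF features, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing value `alpha`, and all function names are assumptions made for illustration, and the training strings are placeholder tokens, not real Kedang or Lamaholot text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not stated.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen in training are ignored.
    counts = Counter(tokenize(doc))
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def train(docs, labels, alpha=1.0):
    # Multinomial Naive Bayes where TF-IDF weights stand in for term counts.
    idf = fit_idf(docs)
    vocab = list(idf)
    class_sizes = Counter(labels)
    weights = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w
    model = {"idf": idf, "log_prior": {}, "log_like": {}}
    for label, size in class_sizes.items():
        total = sum(weights[label].values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(size / len(docs))
        model["log_like"][label] = {
            term: math.log((weights[label][term] + alpha) / total) for term in vocab
        }
    return model


def predict(model, doc):
    # Pick the label with the highest log posterior score.
    feats = tfidf(doc, model["idf"])
    scores = {
        label: prior + sum(w * model["log_like"][label][t] for t, w in feats.items())
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    # Fraction of documents whose predicted label matches the true label.
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Placeholder tokens only (hypothetical), standing in for labeled sentences.
train_docs = ["ka1 ka2 ka3", "ka2 ka4 ka1", "la1 la2 la3", "la3 la4 la1"]
train_labels = ["kedang", "kedang", "lamaholot", "lamaholot"]
model = train(train_docs, train_labels)
```

With a real corpus, the labeled sentences would be split into training and held-out sets, and `accuracy(model, test_docs, test_labels)` would give a figure comparable in kind to the accuracy reported in the abstract.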

How to Cite
Ladopurab, B., Rahman, A. Y., & Pahlevi, R. (2025). Classification of Local Language Texts in Lembata Regency Using Naïve Bayes with TF-IDF Features. Edutran Computer Science and Information Technology, 3(1), 1-10. https://doi.org/10.59805/ecsit.v3i1.144
