UPSI Digital Repository (UDRep)

Type :article
Subject :P Language and Literature
Main Author :Karthik Puranik
Title :Attentive fine-tuning of transformers for translation of low-resourced languages @LoResMT 2021
Place of Production :Tanjung Malim
Publisher :Fakulti Bahasa dan Komunikasi
Year of Publication :2021
Notes :Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021
Corporate Name :Universiti Pendidikan Sultan Idris

Abstract :
This paper reports the machine translation (MT) systems submitted by the IIITT team for the English→Marathi and English-Irish language pairs of the LoResMT 2021 shared task. The task focuses on producing high-quality translations for low-resourced languages such as Irish and Marathi. For English→Marathi, we fine-tune IndicTrans, a pretrained multilingual NMT model, using an external parallel corpus as additional training data. For the English-Irish pair, we use a pretrained Helsinki-NLP Opus MT English-Irish model. Our approaches yield relatively promising BLEU scores. Under the team name IIITT, our systems ranked 1st, 1st, and 2nd in English→Marathi, Irish→English, and English→Irish, respectively. The code for our systems is published. © 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.

This material may be protected under the Copyright Act, which governs the making of photocopies or other reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.

