UPSI Digital Repository (UDRep)
Abstract : Universiti Pendidikan Sultan Idris
This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English→Marathi and English-Irish language pairs of the LoResMT 2021 shared task. The task focuses on producing high-quality translations for low-resource languages such as Irish and Marathi. For English→Marathi, we fine-tune IndicTrans, a pretrained multilingual NMT model, using an external parallel corpus as additional training data. For the English-Irish pair, we use a pretrained Helsinki-NLP Opus MT English-Irish model. Our approaches yield relatively promising results on the BLEU metric. Under the team name IIITT, our systems ranked first, first, and second in English→Marathi, Irish→English, and English→Irish, respectively. The code for our systems is published. © 2021 Proceedings of the 4th Workshop on Technologies for Machine Translation of Low-Resource Languages, LoResMT 2021. All rights reserved.
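Since the abstract reports its results in terms of BLEU, the following is a minimal sketch of how a single-reference, sentence-level BLEU score can be computed. It is an illustration only, not the shared task's scoring script: it assumes whitespace tokenization, one reference per hypothesis, and no smoothing, and the helper names `ngram_counts` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one sentence pair, in [0, 1].

    Assumption: both strings are tokenized by whitespace. Shared-task
    evaluations are normally corpus-level and use a standard tokenizer.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram is credited at most as
        # many times as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # Without smoothing, one empty n-gram order zeroes the score.
            return 0.0
        log_precision_sum += math.log(matches / total)

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", reference))
    print(sentence_bleu("the cat sat on a mat", reference))
    print(sentence_bleu("the cat sat on", reference))
```

The score is the geometric mean of the clipped 1- to 4-gram precisions multiplied by a brevity penalty, so a hypothesis identical to its reference scores 1.0 and a hypothesis sharing no words with it scores 0.0. Reported BLEU figures are usually this quantity computed over a whole test set and multiplied by 100.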