Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski
Open Access
- 1 January 2010
- journal article
- Published by Charles University in Prague, Karolinum Press in The Prague Bulletin of Mathematical Linguistics
- Vol. 93 (1), 37-46
- https://doi.org/10.2478/v10108-010-0004-8
Abstract
In this paper we present Chaski, an open-source machine translation toolkit capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline, including distributed word alignment, word clustering, and phrase extraction. It also provides an error-tolerance mechanism that extends the standard Hadoop error-tolerance framework. The paper describes the underlying methodology and the design of the system, together with instructions on how to run the system on Hadoop clusters.