A bootstrapping method for extracting bilingual text pairs

Abstract
This paper proposes a method for extracting bilingual text pairs from a comparable corpus. The basic idea is to apply bootstrapping to an existing corpus-based cross-language information retrieval (CLIR) approach. We conducted preliminary tests with English and Japanese bilingual corpora. In the task of extracting translation pairs, the bootstrapping method gave much better results than the corpus-based CLIR method without bootstrapping, and the extracted translation pairs could serve as useful training data for improving the corpus-based CLIR method.
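
The abstract does not spell out the procedure, so the following Python sketch only illustrates the general shape of a bootstrapping loop around a retrieval step. Everything in it is an assumption made for illustration: the lexicon-overlap `score` function stands in for the corpus-based CLIR method, the co-occurrence rule in `learn_entries` stands in for retraining on extracted pairs, and the names, the threshold, and the romanized toy data are hypothetical rather than taken from the paper.

```python
from collections import Counter


def score(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens whose lexicon translation occurs in the target."""
    tgt = set(tgt_tokens)
    hits = sum(1 for s in src_tokens if lexicon.get(s) in tgt)
    return hits / len(src_tokens) if src_tokens else 0.0


def learn_entries(pairs, src_docs, tgt_docs, lexicon):
    """Propose new lexicon entries from accepted pairs (hypothetical rule).

    An unknown source word is mapped to the unknown target word it co-occurs
    with most often, but only when that winner is unambiguous.
    """
    known_tgt = set(lexicon.values())
    cooc = {}
    for i, j in pairs.items():
        for s in set(src_docs[i]) - set(lexicon):
            cooc.setdefault(s, Counter()).update(set(tgt_docs[j]) - known_tgt)
    learned = {}
    for s, counts in cooc.items():
        ranked = counts.most_common(2)
        if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
            learned[s] = ranked[0][0]
    return learned


def bootstrap_pairs(src_docs, tgt_docs, seed_lexicon, threshold=0.6, max_rounds=5):
    """Alternate between extracting confident pairs and growing the lexicon."""
    lexicon = dict(seed_lexicon)
    pairs = {}
    if not tgt_docs:
        return pairs, lexicon
    for _ in range(max_rounds):
        new_pairs = {}
        for i, src in enumerate(src_docs):
            scores = [score(src, tgt, lexicon) for tgt in tgt_docs]
            best_j = max(range(len(tgt_docs)), key=scores.__getitem__)
            if scores[best_j] >= threshold:
                new_pairs[i] = best_j
        learned = learn_entries(new_pairs, src_docs, tgt_docs, lexicon)
        if new_pairs == pairs and not learned:
            break  # nothing changed, so further rounds would not help
        pairs = new_pairs
        lexicon.update(learned)
    return pairs, lexicon


# Toy comparable corpus (hypothetical, romanized for readability).
SRC_DOCS = [["cat", "eats", "fish"], ["fish", "swims", "river"]]
TGT_DOCS = [["neko", "taberu", "sakana"], ["sakana", "oyogu", "kawa"]]
SEED = {"cat": "neko", "eats": "taberu", "swims": "oyogu"}

if __name__ == "__main__":
    print(bootstrap_pairs(SRC_DOCS, TGT_DOCS, SEED, max_rounds=1)[0])
    print(bootstrap_pairs(SRC_DOCS, TGT_DOCS, SEED)[0])
```

On this toy data a single round pairs only the first document, because the seed lexicon covers too little of the second. Once a translation for "fish" is learned from the first accepted pair, a later round scores the second document above the threshold as well, which is the kind of gain over a non-bootstrapped baseline that the abstract reports.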
