Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web as a Corpus (RANLP 2009)

Scientific article: False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as bridges. Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously-proposed algorithms.
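The first family of measures mentioned above, statistics over word occurrences and cross-lingual co-occurrences in a sentence-aligned bi-text, can be illustrated with a small sketch. The Dice coefficient used here is an assumption chosen for illustration and is not necessarily the statistic used in the paper; the function names and the toy German-English sentences are likewise hypothetical. The idea is that a pair of similarly spelled words that rarely co-occur in aligned sentences is a candidate false friend.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(bitext):
    """Score word pairs by sentence-level co-occurrence in a bi-text.

    bitext: list of (source_tokens, target_tokens) aligned sentence pairs.
    Returns a dict mapping (source_word, target_word) to a Dice coefficient.
    The Dice coefficient is an illustrative choice, not the paper's formula.
    """
    src_count = Counter()   # number of sentence pairs containing each source word
    tgt_count = Counter()   # number of sentence pairs containing each target word
    pair_count = Counter()  # number of sentence pairs containing both words

    for src_tokens, tgt_tokens in bitext:
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))

    return {
        (s, t): 2.0 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def similarity(scores, src_word, tgt_word):
    """Pairs that never co-occur get a similarity of 0.0."""
    return scores.get((src_word, tgt_word), 0.0)


# Toy, hand-made bi-text (hypothetical data, lowercased tokens).
toy_bitext = [
    (["das", "gift", "wirkt"], ["the", "poison", "works"]),
    (["das", "geschenk"], ["the", "gift"]),
    (["gift", "toetet"], ["poison", "kills"]),
]

scores = cooccurrence_scores(toy_bitext)
print(similarity(scores, "gift", "poison"))  # German "gift" vs English "poison"
print(similarity(scores, "gift", "gift"))    # German "gift" vs English "gift"
```

On this toy data the identically spelled pair never appears in the same aligned sentence pair, so its score stays at zero, while the true translation pair scores high. A real bi-text would need far more data and some smoothing before such scores became reliable.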

Info about the book

Series:

Unknown

ISBN:

1405133791

Rating:

2.5/5 (1)

Language:

English
