Corpus construction

We automatically extract texts from the Internet and build corpora from them using tools developed by our team. The corpora can be monolingual or parallel.

Technical features

Tools developed at Elhuyar enable us to detect bilingual documents on the Internet and align them sentence by sentence.
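
To give an idea of what sentence-by-sentence alignment involves, here is a minimal sketch in Python. It is not Elhuyar's tool: it is a simplified length-based aligner in the spirit of classic approaches such as Gale and Church, assuming both documents are already split into sentences. The names `align`, `bead_cost`, `SKIP_PENALTY` and `MERGE_PENALTY`, as well as the penalty values, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Allowed alignment steps: (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 50.0   # hypothetical cost of leaving a sentence unaligned
MERGE_PENALTY = 5.0   # hypothetical cost of a 2-1 or 1-2 alignment


def bead_cost(src_chars: int, tgt_chars: int, ds: int, dt: int) -> float:
    """Cost of one step, based only on character lengths (an assumption)."""
    if ds == 0 or dt == 0:
        # One side is skipped: fixed penalty plus the length left unaligned.
        return SKIP_PENALTY + src_chars + tgt_chars
    penalty = 0.0 if (ds, dt) == (1, 1) else MERGE_PENALTY
    return abs(src_chars - tgt_chars) + penalty


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return aligned groups of sentences using dynamic programming."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for ds, dt in BEADS:
                ni, nj = i + ds, j + dt
                if ni > n or nj > m:
                    continue
                s = sum(len(x) for x in src[i:ni])
                t = sum(len(x) for x in tgt[j:nj])
                c = best[i][j] + bead_cost(s, t, ds, dt)
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (ds, dt)

    # Walk the back-pointers from the end to recover the alignment.
    pairs: List[Tuple[List[str], List[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        ds, dt = back[i][j]
        pairs.append((src[i - ds:i], tgt[j - dt:j]))
        i, j = i - ds, j - dt
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Toy Basque/English example, for illustration only.
    basque = ["Kaixo.", "Eskerrik asko zure laguntzagatik."]
    english = ["Hello.", "Thank you very much for your help."]
    for src_group, tgt_group in align(basque, english):
        print(src_group, "<->", tgt_group)
```

A length-only cost is a deliberate simplification; a production aligner would typically also use lexical evidence such as bilingual dictionaries or cognates.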

Success stories

Elhuyar’s web corpus.