Elhuyar intelligent machine translation.
Speech recognition service in Basque and Spanish
Spelling and grammar checker for Basque
Online platform for the creation of specialized dictionaries.
Online dictionaries: Basque<>Spanish, Basque<>French, Basque<>English
Website for consulting bilingual dictionaries built automatically using pivot techniques.
Website for querying two large corpora compiled automatically from the web: one in Basque and one parallel Spanish-Basque.
This website lets you search the web for words or terms in Basque, with the results shown in context, as in a corpus query.
Search engine for Basque, the only one that allows you to limit the results to Basque.
ElhPolar_es
Spanish polarity lexicon.
ElhPolar_eu
Basque polarity lexicon.
Basque Opinion Dataset
Polarity-annotated Basque sentences.
BEC2016 opinion dataset
Basque regional election campaign 2016 opinion dataset (BEC2016). 25,000 tweets with entity-level polarity annotations (pos|neg).
Behagunea Opinion dataset
Tweet collection about the DSS2016 Cultural capital project. Tweets are annotated with message-level polarity (pos|neg|neu) in Basque (3000) and Spanish (4754).
EliXa polarity classification models (EliXa 1.0.x)
Models for polarity classification, trained on cultural-domain (Behagunea) tweets.
Previous versions: v 0.9.x
EliXa resources (EliXa 1.0.x <=)
Language-specific resources: polarity lexicons and other resources for text normalization. We currently provide such resources for four languages: Basque (eu), Spanish (es), English (en) and French (fr). Also includes POS tagging models for the ixa-pipe-pos tool.
Previous versions: v 0.9.x (Ixa-pipes POS models not included)
Ixa-Pipes models for EliXa 0.9.x
Ixa-Pipes models used for lemmatization and POS tagging (1.5.0) by EliXa 0.9.x as default models.
Basque-English Parallel corpus
Basque-English parallel corpus automatically gathered using the PaCo2 tool.
Basque-Spanish Parallel corpus
Basque-Spanish parallel corpus automatically gathered using the PaCo2 tool. It contains 640K segments.
Elhuyar web corpus
Corpus of 186M tokens in Basque. Automatically crawled and cleaned from the Web.
Ref: Leturia, I. 2014. The Web as a Corpus of Basque. PhD Thesis. Faculty of Informatics, UPV/EHU. Donostia.
FMTODelh dataset
Basque version of the Facebook Multilingual Task Oriented Dataset (López de Lacalle et al., 2020). Train and Dev sets have been translated using NMT. Test set has been manually translated.
SNIPSeu dataset
Test set of the SNIPS dataset (Coucke et al., 2018), manually translated into Basque (López de Lacalle et al., 2021).
BHTC dataset
Basque Headlines Topic Classification (BHTC) dataset. Collection of 12,403 headlines extracted from the weekly newspaper Argia, with topic annotations. Used for the document classification task (Agerri et al., 2020).
GEC-elh-eu dataset
Grammatical Error Correction (GEC) dataset for Basque. The training set contains 9 million synthetic sentence pairs (incorrect - correct). For evaluation, synthetic examples (6,000) and manually revised examples (672) are provided. If you use it, please cite Beloki et al. (2020).
Here you will find all the open source software we publish.
© 2018, Elhuyar - ht@elhuyar.eus - 943363040