Open semantic analysis: The case of word level semantics in Danish

Finn Årup Nielsen, Lars Kai Hansen

Abstract: The present research is motivated by the need for accessible and efficient tools for automated semantic analysis in Danish. We are interested in tools that are completely open, so they can be used by a critical public, in public administration, non-governmental organizations and businesses. We describe data-driven models for Danish semantic relatedness, word intrusion and sentiment prediction. Open Danish corpora were assembled and unsupervised learning implemented for explicit semantic analysis and with Gensim’s Word2vec model. We evaluate the performance of the two models on three different annotated word datasets. We test the semantic representations’ alignment with single word sentiment using supervised learning. We find that logistic regression and large random forests perform well with Word2vec features.
Keywords: Danish, embedding, word2vec
Type: Conference paper [With referee]
Conference: 8th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics
Year: 2017
Month: November
IMM Group(s): Intelligent Signal Processing
