(2004) Gaustad, Tanja
The main research question of my thesis is which linguistic knowledge sources are most useful for word sense disambiguation (WSD), specifically WSD of Dutch. The goal of the project was to develop a word sense disambiguation system: a tool that automatically determines the meaning of a particular ambiguous word in context. To achieve this, I use the information contained in the context, namely the words surrounding the ambiguous word, together with additional underlying information (such as syntactic class and structure) to build a statistical language model. This model is then used to determine the meaning of that ambiguous word in new contexts.
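As a rough illustration of the idea of classifying an ambiguous word from its surrounding words, here is a minimal sketch of a maximum entropy (multinomial logistic regression) classifier over bag-of-words context features. It is not the system developed in the thesis: the toy Dutch sentences, the sense labels `furniture` and `finance`, the names `context_features` and `MaxEntWSD`, and the training settings are all hypothetical, and the sketch uses only context words, without the syntactic or lemma-based information the thesis investigates.

```python
import math


def context_features(tokens, index, window=3):
    """Return the set of context-word features around tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return {f"ctx={tokens[i].lower()}" for i in range(lo, hi) if i != index}


class MaxEntWSD:
    """Tiny maximum entropy classifier trained with stochastic gradient ascent."""

    def __init__(self):
        self.weights = {}  # (feature, sense) -> weight
        self.senses = []

    def _probs(self, feats):
        # Softmax over the summed feature weights of each sense.
        scores = {
            s: sum(self.weights.get((f, s), 0.0) for f in feats)
            for s in self.senses
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {s: math.exp(v - top) for s, v in scores.items()}
        total = sum(exps.values())
        return {s: e / total for s, e in exps.items()}

    def fit(self, examples, epochs=50, rate=0.5):
        # epochs and rate are arbitrary illustrative values.
        self.senses = sorted({sense for _, sense in examples})
        for _ in range(epochs):
            for feats, gold in examples:
                probs = self._probs(feats)
                for s in self.senses:
                    # Gradient of the log-likelihood: observed minus expected.
                    grad = (1.0 if s == gold else 0.0) - probs[s]
                    for f in feats:
                        key = (f, s)
                        self.weights[key] = self.weights.get(key, 0.0) + rate * grad
        return self

    def predict(self, feats):
        probs = self._probs(feats)
        return max(probs, key=probs.get)


# Hypothetical training data for the ambiguous Dutch word "bank":
# (sentence, position of the ambiguous word, sense label).
TRAIN = [
    ("ik zit op de bank in de tuin", 4, "furniture"),
    ("de bank verhoogt de rente", 1, "finance"),
    ("zij slaapt op de bank thuis", 4, "furniture"),
    ("zij leent geld bij de bank", 5, "finance"),
]

examples = [
    (context_features(sentence.split(), idx), sense)
    for sentence, idx, sense in TRAIN
]
model = MaxEntWSD().fit(examples)

# Disambiguate "bank" in a new (hypothetical) sentence.
new_tokens = "hij stort geld bij de bank".split()
predicted = model.predict(context_features(new_tokens, 5))
print(predicted)
```

In a fuller system, `context_features` would be the place to add further knowledge sources, for example part-of-speech tags or dependency relations of the context words.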
My results on the (unseen) Senseval-2 test data show that adding structural syntactic information in the form of dependency relations, instead of the part-of-speech (PoS) tags of the context, leads to an error rate reduction of 8% for the word form model. Furthermore, the lemma-based approach (introduced in this thesis) outperforms the word form-based approach regardless of the features included in the model. We observe an error rate reduction of 10% with regard to the lemma-based model including PoS in context, and an error rate reduction of 6% with regard to the best model based on word forms.
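For readers unfamiliar with the measure, the following sketch shows how an error rate reduction is commonly computed from two accuracy scores. It assumes the reductions reported here are relative reductions, which the abstract does not state explicitly. The function name `error_rate_reduction` and the accuracy values are invented for illustration and are not results from the thesis.

```python
def error_rate_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction in error rate when moving from baseline to new model."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical accuracies (NOT taken from the thesis):
# the error rate drops from 20% to 18%, a relative reduction of about 10%.
reduction = error_rate_reduction(0.80, 0.82)
print(f"{reduction:.0%}")
```

Note that under this definition a 10% error rate reduction corresponds to a smaller absolute gain in accuracy (two percentage points in the made-up example above).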
Compared with results obtained on the test data by a different system that uses Memory-Based Learning (MBL) as its classification algorithm, both the word form-based and the lemma-based classifiers from my system produce higher accuracy. The lemma-based model achieves an error rate reduction of 10% relative to the MBL WSD system.
In my maximum entropy system, the addition of deep linguistic knowledge in particular greatly improves accuracy. Combining it with the lemma-based approach, which takes advantage of morphological information, yields the best results for WSD of Dutch on the Senseval-2 data set. Our system achieves significantly higher disambiguation accuracy than any results for Dutch reported in the literature so far, and is thus state of the art for Dutch WSD.
Please use this link to refer to this document:
http://irs.ub.rug.nl/ppn/269125787
© 2003-2007 RUG: The University of Groningen holds the rights to this repository. All rights reserved.