ENCYCLEN wiki-bibliography

WIKINDX Resources

Proceedings Article: BibTeX citation key:  gabrilovich.576
Gabrilovich Evgeniy & Markovitch Shaul (2006). "Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge". In Proceedings of the 21st AAAI Conference on Artificial Intelligence, Boston, MA 2006 210, p. 1301–1306.
Added by: Laure Endrizzi 2006-11-22 23:03:23    Last edited by: Laure Endrizzi 2007-11-13 16:19:53
Categories: 4. interfaces and modes of consultation
Keywords: visualization, semantic web, Wikipedia
Creators: Gabrilovich, Markovitch
Publisher: (Boston, MA)
Collection: Proceedings of the 21st AAAI Conference on Artificial Intelligence


 
Abstract
When humans approach the task of text categorization, they interpret the specific wording of the document in the much larger context of their background knowledge and experience. On the other hand, state-of-the-art information retrieval systems are quite brittle: they traditionally represent documents as bags of words, and are restricted to learning from individual word occurrences in the (necessarily limited) training set. For instance, given the sentence "Wal-Mart supply chain goes real time", how can a text categorization system know that Wal-Mart manages its stock with RFID technology? And having read that "Ciprofloxacin belongs to the quinolones group", how on earth can a machine know that the drug mentioned is an antibiotic produced by Bayer? We propose to enrich document representation through automatic use of a vast compendium of human knowledge, an encyclopedia. We apply machine learning techniques to Wikipedia, the largest encyclopedia to date, which surpasses in scope many conventional encyclopedias and provides a cornucopia of world knowledge. Each Wikipedia article represents a concept, and documents to be categorized are represented in the rich feature space of words and relevant Wikipedia concepts. Empirical results confirm that this knowledge-intensive representation brings text categorization to a qualitatively new level of performance across a diverse collection of datasets.
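The idea of representing a document by its words plus related encyclopedia concepts can be illustrated with a toy sketch. This is not the authors' feature generator, which works over the full Wikipedia and is considerably more elaborate. It is a minimal illustration under stated assumptions: four hypothetical one-line "articles" stand in for Wikipedia, each article title is treated as a concept, an inverted index maps words to concepts with TF-IDF weights, and the top-scoring concepts are appended to the bag of words as extra features. All names (`ARTICLES`, `build_concept_index`, `generate_concepts`, `augmented_features`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia: concept title -> article text.
ARTICLES = {
    "Wal-Mart": "wal mart is a retailer whose supply chain tracks inventory",
    "RFID": "rfid tags let a retailer track inventory in its supply chain in real time",
    "Quinolones": "quinolones are a group of antibiotic drugs such as ciprofloxacin",
    "Bayer": "bayer is a drug company that produces ciprofloxacin",
}


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_concept_index(articles):
    """Inverted index: word -> {concept: tf-idf weight of the word in that article}."""
    n = len(articles)
    tfs = {title: Counter(tokenize(text)) for title, text in articles.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # document frequency of each word across articles
    index = {}
    for title, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words present in every article carry no signal
                index.setdefault(word, {})[title] = count * idf
    return index


def generate_concepts(text, index, k=2):
    """Score each concept by summing the index weights of the text's words; keep the top k."""
    scores = Counter()
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += count * weight
    return [concept for concept, score in scores.most_common(k) if score > 0]


def augmented_features(text, index, k=2):
    """Bag of words enriched with 'concept:<title>' features."""
    features = Counter(tokenize(text))
    for concept in generate_concepts(text, index, k):
        features["concept:" + concept] += 1
    return features


if __name__ == "__main__":
    index = build_concept_index(ARTICLES)
    for sentence in ("Wal-Mart supply chain goes real time",
                     "Ciprofloxacin belongs to the quinolones group"):
        print(sentence, "->", generate_concepts(sentence, index))
```

With this toy data, the quinolones sentence picks up the concepts `Quinolones` and `Bayer` even though the word "bayer" never occurs in it, because "ciprofloxacin" links the sentence to the Bayer article. A real system would also weight or select these generated features before training a classifier.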
Added by: Laure Endrizzi    Last edited by: Laure Endrizzi

 

 


 


École normale supérieure de Lyon
Institut français de l'Éducation
Veille et Analyses
15 parvis René-Descartes, BP 7000, 69342 Lyon cedex 07
Switchboard: +33 (0)4 72 76 61 00
Fax: +33 (0)4 72 76 61 93