ENCYCLEN Wikibibliography

WIKINDX Resources

Journal Article
ID no. (ISBN etc.): 0163-5840
BibTeX citation key: denoyer.680
Denoyer, Ludovic & Gallinari, Patrick (2006). "The Wikipedia XML corpus". SIGIR Forum, vol. 40, no. 1, pp. 64–69.
Added by: Laure Endrizzi 2007-11-25 15:27:33    Last edited by: Laure Endrizzi 2007-11-25 15:45:04
Categories: 4. interfaces and modes of consultation
Keywords: structured documents, information extraction, interface, information retrieval, Wikipedia
Creators: Denoyer, Gallinari
Collection: SIGIR Forum

Abstract
Wikipedia is a well-known free-content, multilingual encyclopedia written collaboratively by contributors around the world. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. The encyclopedia comprises millions of articles in many languages.
Content-oriented XML retrieval is an area of Information Retrieval (IR) research that is receiving increasing interest. A very active community already exists in the IR/XML domain, working on XML search engines and XML textual data. Since 2002, this community has been organized mainly around INEX (INitiative for the Evaluation of XML Retrieval), which is funded by the DELOS Network of Excellence on Digital Libraries.
In this article, we describe a set of XML collections based on Wikipedia. These collections can be used for a wide variety of XML IR and machine learning tasks, such as ad hoc retrieval, categorization, clustering, and structure mapping. These corpora are currently used for both INEX 2006 and the XML Document Mining Challenge. The article provides a description of the corpus.
Added by: Laure Endrizzi

 
Further information may be found at:
http://doi.acm.org/10.1145/1147197.1147210

École normale supérieure de Lyon
Institut français de l'Éducation
Veille et Analyses
15 parvis René-Descartes BP 7000 . 69342 Lyon cedex 07
Switchboard: +33 (0)4 72 76 61 00
Fax: +33 (0)4 72 76 61 93