Wikibibliographie ENCYCLEN

WIKINDX Resources

Journal Article: BibTeX citation key:  esser.260
Esser, Wolfram M. (2004). "Fault-tolerant Fulltext Search for Large Multilingual Scientific Text Corpora". Journal of Digital Information, vol. 6, no. 1.
Added by: Laure Endrizzi 2005-11-10 11:18:59    Last edited by: Laure Endrizzi 2005-11-10 11:18:59
Categories: 4. Interfaces and modes of consultation
Creators: Esser
Collection: Journal of Digital Information


In the work reported here, we present a new way of performing fault-tolerant fulltext retrieval on large text corpora, such as scientific encyclopedias. The weighted pattern morphing (WPM) technique introduced in this paper overcomes the disadvantages of both the popular edit distance measure and the Soundex code approach while retaining their flexibility. The algorithm handles phonetic similarities, common typing errors such as omission or transposition of letters, and inconsistent usage of abbreviations and hyphenation. After showing how WPM can be implemented efficiently, we present a novel method for automatically adjusting the weights of the internal penalty matrix to obtain even better results. Although the described technique can be applied without prior knowledge of actual user patterns, re-examination with a large number of online users' patterns proves the portability of this fine-tuning approach. We further show how the penalty matrix can be shifted from one language to another. The described WPM technique is integrated into a large commercial pharmaceutical encyclopedia CD-ROM, an online dermatological encyclopedia, and an online reference encyclopedia of parasitology research, thus also proving its "road capability".
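The abstract does not spell out the WPM algorithm, so the following is only a rough illustration of the general idea it builds on: an edit distance whose substitution costs come from a penalty table, so that phonetically similar letters and transposed letters cost less than arbitrary errors. It is not the author's method. The function name `weighted_distance`, the parameter names, and all penalty values are assumptions made for this sketch.

```python
def weighted_distance(query, term, sub_penalty=None, indel=1.0, swap=0.5):
    """Edit distance with per-pair substitution penalties and cheap transpositions.

    Illustrative only: this is a weighted optimal-string-alignment distance,
    not the WPM technique from the paper. All weights are hypothetical.
    """
    sub_penalty = sub_penalty or {}
    rows, cols = len(query) + 1, len(term) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel          # delete every character of the query prefix
    for j in range(1, cols):
        d[0][j] = j * indel          # insert every character of the term prefix
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = query[i - 1], term[j - 1]
            if a == b:
                cost = 0.0
            else:
                # Look the pair up in either order; default to a full penalty.
                cost = sub_penalty.get((a, b), sub_penalty.get((b, a), 1.0))
            d[i][j] = min(
                d[i - 1][j] + indel,      # omission in the term
                d[i][j - 1] + indel,      # omission in the query
                d[i - 1][j - 1] + cost,   # match or weighted substitution
            )
            # Two adjacent letters typed in the wrong order.
            if i > 1 and j > 1 and a == term[j - 2] and query[i - 2] == b:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)
    return d[-1][-1]


# Hypothetical penalty table: "f" and "v" sound alike, so swapping them is cheap.
penalties = {("f", "v"): 0.2}
print(weighted_distance("fat", "vat", penalties))
print(weighted_distance("abc", "acb"))
print(weighted_distance("encyclopdia", "encyclopedia"))
```

In a sketch like this, the penalty table is the part that would be tuned per language or from observed user queries, which is the role the abstract assigns to the internal penalty matrix.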



École normale supérieure de Lyon
Institut français de l'Éducation
Veille et Analyses
15 parvis René-Descartes BP 7000 . 69342 Lyon cedex 07
Switchboard: +33 (0)4 72 76 61 00
Fax: +33 (0)4 72 76 61 93