Theory and Application of Text-representing Centroids

Type: Fortschritt-Berichte VDI
Publication date: 17 April 2019
Series: 10
Volume number: 863
Authors: Herwig Unger, Mario M. Kubek
ISBN: 978-3-18-386310-5
ISSN: 0178-9627
Year of publication: 2019
Number of pages: 152
Product type: Book

Product description

Centroid terms are single, descriptive words that semantically and topically characterise text documents. They can therefore act as very compact representations of these documents in automated text processing tasks that strongly rely on the semantic similarity of texts, and algorithms that classify and cluster texts can make use of this information. In this book, the novel, brain- and physics-inspired concept of centroid terms is introduced and discussed in depth. Furthermore, their unique properties and practical use in major natural language processing and text mining tasks are covered. In this regard, a new graph-based method for their fast calculation is presented as well. In contrast to methods relying on the bag-of-words model, the derived centroid distance measure can uncover a topical relationship between texts even when their wording differs. As centroid terms can also represent short texts, they are used heavily in “WebEngine”, the first fully integrated, P2P-based web search engine presented here: it relies on them both to interpret queries and to forward those queries to peers holding matching documents, which are in turn represented by their own centroid terms.
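As a rough illustration of the centroid distance idea, the following Python sketch assumes that a text's centroid term is the node of a weighted co-occurrence graph with the smallest average shortest-path distance to the text's terms, and that the centroid distance of two texts is the graph distance between their centroid terms. It uses plain Dijkstra searches, not the spreading-activation method the book presents. The toy graph, its edge weights and all function names are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted graph as {term: {neighbour: distance}}."""
    graph = defaultdict(dict)
    for a, b, distance in edges:
        graph[a][b] = distance
        graph[b][a] = distance
    return dict(graph)


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable term."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def centroid_term(graph, text_terms):
    """Term with the smallest average distance to the text's terms (ties broken alphabetically)."""
    terms = [t for t in dict.fromkeys(text_terms) if t in graph]  # dedupe, keep order
    if not terms:
        return None
    tables = [shortest_distances(graph, t) for t in terms]
    best = None
    for node in graph:
        # Only nodes reachable from every text term are candidates.
        if all(node in table for table in tables):
            avg = sum(table[node] for table in tables) / len(tables)
            if best is None or (avg, node) < best:
                best = (avg, node)
    return best[1] if best else None


def centroid_distance(graph, terms_a, terms_b):
    """Graph distance between the centroid terms of two texts (inf if none exists)."""
    ca, cb = centroid_term(graph, terms_a), centroid_term(graph, terms_b)
    if ca is None or cb is None:
        return float("inf")
    return shortest_distances(graph, ca).get(cb, float("inf"))


# Hypothetical toy co-occurrence graph; weights are made-up distances
# (smaller = more strongly co-occurring).
GRAPH = build_graph([
    ("vehicle", "car", 1.0), ("vehicle", "automobile", 1.0),
    ("vehicle", "engine", 1.0), ("vehicle", "wheel", 1.0),
    ("vehicle", "tyre", 1.0), ("vehicle", "motor", 1.0),
    ("fruit", "apple", 1.0), ("fruit", "juice", 1.0), ("fruit", "banana", 1.0),
    ("vehicle", "fruit", 4.0),
])
text_a = ["car", "engine", "wheel"]
text_b = ["automobile", "tyre", "motor"]  # shares no word with text_a
text_c = ["apple", "juice", "banana"]

print(centroid_term(GRAPH, text_a), centroid_term(GRAPH, text_c))  # vehicle fruit
print(centroid_distance(GRAPH, text_a, text_b))  # 0.0
print(centroid_distance(GRAPH, text_a, text_c))  # 4.0
```

Texts A and B have no word in common, so a bag-of-words overlap between them is zero, yet both map to `vehicle` and their centroid distance is 0.0, while the fruit text lies 4.0 away. Running one Dijkstra search per text term is simple but becomes costly on large co-occurrence graphs, which is presumably the kind of cost a dedicated fast calculation method aims to reduce.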

Contents
Is a ‘Librarian of the Web’ really needed?
H. Unger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Centroid Terms as Text Representatives
M. M. Kubek and H. Unger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Spreading Activation: A Fast Calculation Method for Text Centroids
M. M. Kubek, T. Bohme and H. Unger . . . . . . . . . . . . . . . . . . . . . . . . 27
Empiric Experiments with Text-representing Centroids
M. M. Kubek, T. Bohme and H. Unger . . . . . . . . . . . . . . . . . . . . . . . . 39
Towards a Librarian of the Web
M. M. Kubek and H. Unger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
A Concept Supporting a Resilient, Fault-tolerant and Decentralised Search
H. Unger and M. M. Kubek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
An Associative Ring Memory to Support Decentralised Search
H. Unger and M. M. Kubek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
The WebEngine – A Fully Integrated, Decentralised Web Search Engine
M. M. Kubek and H. Unger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
On Evolving Text Centroids
H. Unger and M. M. Kubek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Addendum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

Keywords: Centroids, Application, Text Processing, Text Centroid, Co-occurrence Graph, Spreading Activation, Text Categorisation, Librarian of the Web, P2P-system, Decentralised Search, WebEngine, Web Search Engine

