Please use this identifier to cite or link to this item: doi:10.22028/D291-24950
Title: Using IR techniques for text classification in document analysis
Author(s): Hoch, Rainer
Language: English
Year of Publication: 1994
OPUS Source: Kaiserslautern ; Saarbrücken : DFKI, 1994
SWD key words: Künstliche Intelligenz
DDC notations: 004 Computer science, internet
Publikation type: Report
Abstract: This paper presents the INFOCLAS system applying statistical methods of information retrieval for the classification of German business letters into corresponding message types such as order, offer, enclosure, etc. INFOCLAS is a first step towards the understanding of documents proceeding to a classification-driven extraction of information. The system is composed of two main modules: the central indexer (extraction and weighting of indexing terms) and the classifier (classification of business letters into given types). The system employs several knowledge sources including a letter database, word frequency statistics for German, lists of message type specific words, morphological knowledge as well as the underlying document structure. As output, the system evaluates a set of weighted hypotheses about the type of the actual letter. Classification of documents allow the automatic distribution or archiving of letters and is also an excellent starting point for higher-level document analysis.
Link to this record: urn:nbn:de:bsz:291-scidok-37268
Series name: Research report / Deutsches Forschungszentrum für Künstliche Intelligenz [ISSN 0946-008x]
Series volume: 94-19
Date of registration: 30-Jun-2011
Faculty: SE - Sonstige Einrichtungen
Department: SE - DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Collections:SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File Description SizeFormat 
RR_94_19.pdf60,88 kBAdobe PDFView/Open

Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.