Please use this identifier to cite or link to this item: doi:10.22028/D291-37016
Title: Developing a Legal Form Classification Extraction Approach for Company Entity Matching : Benchmark of Rule-Based and Machine Learning Approaches
Author(s): Kruse, Felix
Awick, Jan-Philipp
Marx Gómez, Jorge
Loos, Peter
Language: English
Journal title: Business information systems
Volume: 1
Startpage: 13
Endpage: 26
Publisher/Platform: TIB Open Publishing
Year of Publication: 2021
Free key words: Record Linkage
Company Entity Matching
Data Integration
Data Quality
Data Preparation
DDC notations: 650 Management
Publication type: Journal Article
Abstract: This paper explores record linkage, a step in the data integration process, with a focus on the entity type company. For the integration of company data, the company name is a crucial attribute, and it often includes the legal form. This legal form is not represented concisely and consistently across different data sources, which leads to considerable data quality problems in the subsequent record linkage steps. To solve these problems, we classify and extract the legal form from the company name attribute. For this purpose, we iteratively developed four different approaches and compared them in a benchmark. The best approach is a hybrid that combines a rule set with a supervised machine learning model. With our hybrid approach, any company data set from research or business can be processed. Thus, the data quality for subsequent data processing steps such as record linkage can be improved. Furthermore, our approach can be adapted to solve the same data quality problems in other attributes.
DOI of the first publication: 10.52825/bis.v1i.44
URL of the first publication: https://www.tib-op.org/ojs/index.php/bis/article/view/44
Link to this record: urn:nbn:de:bsz:291--ds-370165
hdl:20.500.11880/33604
http://dx.doi.org/10.22028/D291-37016
ISSN: 2747-9986
Date of registration: 8-Aug-2022
Faculty: HW - Fakultät für Empirische Humanwissenschaften und Wirtschaftswissenschaft
Department: HW - Wirtschaftswissenschaft
Professorship: HW - Prof. Dr. Peter Loos
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
There are no files associated with this item.


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.