Please use this identifier to cite or link to this item:
doi:10.22028/D291-37016
Title: | Developing a Legal Form Classification Extraction Approach for Company Entity Matching : Benchmark of Rule-Based and Machine Learning Approaches |
Author(s): | Kruse, Felix; Awick, Jan-Philipp; Marx Gómez, Jorge; Loos, Peter |
Language: | English |
Source title: | Business information systems |
Volume: | 1 |
Start page: | 13 |
End page: | 26 |
Publisher/Platform: | TIB Open Publishing |
Year of publication: | 2021 |
Free keywords: | Record Linkage; Company Entity Matching; Data Integration; Data Quality; Data Preparation |
DDC subject group: | 650 Management |
Document type: | Journal article |
Abstract: | This paper explores record linkage as a step in the data integration process, focusing on the entity type company. For the integration of company data, the company name is a crucial attribute, and it often includes the legal form. The legal form is not represented concisely and consistently across data sources, which leads to considerable data quality problems in the subsequent steps of record linkage. To solve these problems, we classify the legal form and extract it from the company name attribute. For this purpose, we iteratively developed four different approaches and compared them in a benchmark. The best approach is a hybrid that combines a rule set with a supervised machine learning model. With this hybrid approach, any company data set from research or business can be processed, which improves data quality for subsequent processing steps such as record linkage. Furthermore, our approach can be adapted to solve the same data quality problems in other attributes. |
DOI of the original publication: | 10.52825/bis.v1i.44 |
URL of the original publication: | https://www.tib-op.org/ojs/index.php/bis/article/view/44 |
Link to this record: | urn:nbn:de:bsz:291--ds-370165; hdl:20.500.11880/33604; http://dx.doi.org/10.22028/D291-37016 |
ISSN: | 2747-9986 |
Date of entry: | 8-Aug-2022 |
Faculty: | HW - Faculty of Empirical Human Sciences and Economics |
Department: | HW - Economics |
Professorship: | HW - Prof. Dr. Peter Loos |
Collection: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Files in this item:
There are no files associated with this item.
All items in this repository are protected by copyright.
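The abstract describes classifying the legal form and extracting it from the company name, with a rule set as one half of the hybrid approach. The following is a minimal Python sketch of such a rule-based step, under stated assumptions: the rule table `LEGAL_FORM_RULES`, its regex patterns, and the function `extract_legal_form` are hypothetical illustrations, not the authors' rule set or code, and the supervised machine learning component is not shown.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule set: canonical legal form -> regex for its spelling variants.
# Illustrative assumptions only, not the rules used in the paper.
# Order matters: the more specific "GmbH & Co. KG" must be tried before "GmbH".
LEGAL_FORM_RULES = {
    "GmbH & Co. KG": r"gmbh\s*(?:&|und)\s*co\.?\s*kg",
    "GmbH": r"g\.?\s*m\.?\s*b\.?\s*h\.?|gesellschaft mit beschr(?:ä|ae)nkter haftung",
    "AG": r"ag|aktiengesellschaft",
    "Ltd": r"ltd\.?|limited",
    "Inc": r"inc\.?|incorporated",
}

# Each pattern must follow a space or comma and sit at the end of the name,
# so a word like "Bag" is not mistaken for the legal form "AG".
_COMPILED = [
    (form, re.compile(r"[\s,]+(?:" + pattern + r")\s*$", re.IGNORECASE))
    for form, pattern in LEGAL_FORM_RULES.items()
]


def extract_legal_form(company_name: str) -> Tuple[str, Optional[str]]:
    """Return (name without legal form, canonical legal form or None)."""
    name = company_name.strip()
    for form, regex in _COMPILED:
        match = regex.search(name)
        if match:
            # Cut the matched suffix and any trailing separators.
            return name[: match.start()].strip(" ,"), form
    # No rule fired: in a hybrid setup this name would go to a classifier.
    return name, None


if __name__ == "__main__":
    for raw in ["Müller GmbH", "Bar GmbH & Co. KG", "ACME, Inc.", "Foo Bag"]:
        print(raw, "->", extract_legal_form(raw))
```

Normalizing variants such as "G.m.b.H." and "GmbH" to one canonical class is what makes the legal form usable as a clean, separate attribute for matching. Names for which no rule fires (the `None` case) are where a trained model would take over in a hybrid design like the one the abstract reports.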