MyETL: a flexible and efficient data quality profiling framework
1 August 2022
Nianfeng Weng, Jianjun Cao, Guoquan Jiang
Proceedings Volume 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022); 122570S (2022) https://doi.org/10.1117/12.2640107
Event: 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 2022, Guangzhou, China
Abstract
Data quality is very important in data-centric environments. The task of data quality profiling is to filter out data records that violate domain semantic rules. A data quality profiling framework is therefore expected to be flexible enough to represent complex domain semantics and efficient enough to handle large numbers of data records. To fulfil these requirements, we propose a data quality profiling framework named MyETL, which is derived from the ETL (extract, transform, load) paradigm. In the design phase, a directed acyclic graph is used to represent the domain semantic rules. The graph is then optimized by a topology optimization procedure. Finally, the graph is mapped to threads and memory objects and scheduled for execution. Because MyETL is implemented on the OSGi framework, it is composed of bundles and can be conveniently extended.
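The execution model summarized in the abstract (rules arranged as a directed acyclic graph, then mapped to threads and scheduled) can be illustrated with a minimal Python sketch. This is not MyETL's implementation, which is built from OSGi (Java) bundles, and it does not model the topology optimization step. Every name below (`RECORDS`, `GRAPH`, `age_in_range`, `email_present`, `run`) is hypothetical. The sketch assumes each rule node returns the ids of records that violate it, and it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) so that independent rule nodes run concurrently on a thread pool once their predecessors finish.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical input records; field names are illustrative only.
RECORDS = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": -5, "email": "b@example.com"},
    {"id": 3, "age": 41, "email": ""},
    {"id": 4, "age": 200, "email": "d@example.com"},
]

# Hypothetical domain rules: each returns the ids of VIOLATING records.
def age_in_range(records):
    return {r["id"] for r in records if not 0 <= r["age"] <= 150}

def email_present(records):
    return {r["id"] for r in records if not r["email"]}

RULES = {"age_in_range": age_in_range, "email_present": email_present}

# The rule DAG: node -> set of predecessor nodes.
GRAPH = {
    "source": set(),
    "age_in_range": {"source"},
    "email_present": {"source"},
    "sink": {"age_in_range", "email_present"},
}

def run(graph, records, max_workers=4):
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    violations = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}  # running future -> rule node name
        while sorter.is_active():
            # Schedule every node whose predecessors have all completed.
            for node in sorter.get_ready():
                if node in RULES:
                    futures[pool.submit(RULES[node], records)] = node
                else:
                    sorter.done(node)  # source/sink do no work in this sketch
            if not futures:
                continue
            # Block until at least one rule finishes, then release its successors.
            finished, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in finished:
                node = futures.pop(fut)
                violations[node] = fut.result()
                sorter.done(node)
    bad = set().union(*violations.values())
    clean = [r for r in records if r["id"] not in bad]
    return clean, violations

if __name__ == "__main__":
    clean, violations = run(GRAPH, RECORDS)
    print(sorted(violations["age_in_range"]))  # [2, 4]
    print(sorted(violations["email_present"]))  # [3]
    print([r["id"] for r in clean])  # [1]
```

Scheduling from the ready set rather than from a fixed linear order lets sibling rules such as `age_in_range` and `email_present` execute in parallel, while the `sink` node runs only after both have finished.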
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Nianfeng Weng, Jianjun Cao, and Guoquan Jiang "MyETL: a flexible and efficient data quality profiling framework", Proc. SPIE 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 122570S (1 August 2022); https://doi.org/10.1117/12.2640107
KEYWORDS
Profiling
Optimization (mathematics)
Databases
Data processing