Data Quality Management for Big Data Applications

Khaleel, Majida; Hamad, Murtadha

Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/5313

Full metadata record

DC Field	Value	Language
dc.contributor.author	Khaleel, Majida	-
dc.contributor.author	Hamad, Murtadha	-
dc.date.accessioned	2022-10-22T20:34:10Z	-
dc.date.available	2022-10-22T20:34:10Z	-
dc.date.issued	2019-01-01	-
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/5313	-
dc.description.abstract	As data volumes continue to grow, a key challenge is developing systems and applications that can store, manage, and process large amounts of data, much of which is unstructured. Traditional data management approaches are inappropriate for data of this size and complexity. Hadoop is a suitable solution for continuously growing data sizes; its important characteristics are distributed processing, high storage capacity, and easy administration, and it is best known for its distributed file system. In this paper, we propose techniques and algorithms for handling big data, including data collection, data preprocessing, data cleaning algorithms, a technique for converting unstructured data to structured data using metadata, a distributed data file system (fragmentation algorithm), and quality assurance algorithms that use a statistical model to evaluate higher education institutions. We conclude that metadata accelerates query response and facilitates query execution, and that metadata provides the content for reports, fields, and descriptions. The total access time for three complex queries is 00:03:00 in distributed processing versus 00:15:77 in non-distributed processing; the average is approximately five minutes. For quality assurance, comparing the scientific and humanities sets gives a t-test value of 0.239 and a t-distribution (T-dis) value of 1.96. By the comparison rule, because the t-test value is smaller than the T-dis value, there is no difference between the means of the scientific and humanities samples. The C.V. values are 8.585 for the scientific set and 7.427 for the humanities set. By the law of homogeneity, the set with the smaller C.V. is the more homogeneous, so the humanities set is the more homogeneous.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Big data	en_US
dc.subject	Data quality	en_US
dc.subject	Unstructured data, distributed data file system	en_US
dc.subject	statistical model	en_US
dc.title	Data Quality Management for Big Data Applications	en_US
dc.type	Article	en_US
Appears in Collections:	Department of Computer Science
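
The quality assurance comparison described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that "C.V." means the coefficient of variation (sample standard deviation divided by the mean, as a percentage) and that "T-dis" is the critical value the t statistic is compared against. The function names are hypothetical; the numeric inputs are the values reported in the abstract.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Assumed meaning of C.V.: sample standard deviation / mean, in percent."""
    return stdev(values) / mean(values) * 100


def means_differ(t_stat, t_critical):
    """Comparison rule: the means differ only if |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


def more_homogeneous(cv_by_set):
    """Law of homogeneity: the set with the smallest C.V. is the most homogeneous."""
    return min(cv_by_set, key=cv_by_set.get)


# Values reported in the abstract.
differ = means_differ(t_stat=0.239, t_critical=1.96)
winner = more_homogeneous({"scientific": 8.585, "humanities": 7.427})

print("means differ:", differ)
print("more homogeneous set:", winner)
```

With the reported values, the t statistic falls below the critical value, so the rule finds no difference between the means, and the humanities set has the smaller C.V.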

Files in This Item:

File	Description	Size	Format
112.pdf		306.87 kB	Adobe PDF
