Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/8480
Full metadata record
DC Field | Value | Language
dc.contributor.author | Abd Al Lattif, Dalia | -
dc.contributor.author | Hamad, Murtadha | -
dc.date.accessioned | 2022-11-12T17:38:19Z | -
dc.date.available | 2022-11-12T17:38:19Z | -
dc.date.issued | 2021-01-01 | -
dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/8480 | -
dc.description.abstract | The continuous growth of data produced by online systems and applications has created a fundamental problem: how to manage and handle large volumes of data. The most important issue is the storage of unstructured data, which makes up most of the data on the internet; traditional management methods are unsuitable because the data is large and complex. Hadoop is therefore a suitable solution for continuously increasing data volume and complexity, since it can handle and analyze data as it is, regardless of source, speed, size, or quantity. In this thesis, a system for analyzing big data is proposed. The system can identify repeated words (keywords) in a large number of image-based files (PDF) and text-based files that have been scanned by an optical character recognition (OCR) device. The system supports decision-making and improves the archiving process by providing a component that responds to keyword-based queries. The goal of the project is to work on the "brain" of the OCR device using artificial intelligence (AI) techniques and mining techniques, making use of its ability to scan, read, analyze, and convert texts and paper images into ASCII codes, while also solving two of its problems: it cannot identify words one by one and instead reads full texts, and it cannot convert unstructured data into structured data. This develops its capabilities so that it is easier to use in a business environment (business intelligence) and brings its operation closer to that of Hadoop. This is done by making the device capable of identifying the required words by adding rectangles around each word, and by giving it the ability to convert unstructured data into structured data using the LSTM algorithm. | en_US
dc.language.iso | en | en_US
dc.publisher | University of Anbar | en_US
dc.subject | (OCR) | en_US
dc.subject | mining techniques | en_US
dc.subject | artificial intelligence (AI) techniques | en_US
dc.subject | ASCII codes | en_US
dc.subject | LSTM algorithm. | en_US
dc.title | OCR Improvement for Unstructured Big Data Integration | en_US
dc.type | Technical Report | en_US
Appears in Collections: Department of Computer Science

Files in This Item:
File | Description | Size | Format
داليا.pdf | | 4.25 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.