Title: OCR Improvement for Unstructured Big Data Integration
Authors: Abd Al Lattif, Dalia
Hamad, Murtadha
Keywords: Optical character recognition (OCR)
mining techniques
artificial intelligence (AI) techniques
ASCII codes
LSTM algorithm
Issue Date: 1-Jan-2021
Publisher: University of Anbar
Abstract: The continuous growth of data produced by online systems and applications has raised a fundamental problem: how to manage and handle such large volumes of data. The most important issue is how unstructured data is stored, since unstructured data makes up most of the data on the internet, and traditional management methods are not suitable for data of this size and complexity. Hadoop was therefore a suitable solution for the continuing growth in data volume and complexity, because it can process and analyze data as-is, regardless of its source, speed, size, or quantity. In this thesis, a system for analyzing big data is proposed. The system identifies repeated words (keywords) in a large number of image-based files (PDF) and text-based files that have been scanned by an optical character recognition (OCR) device. It supports decision-making and improves the archiving process by providing a component that answers keyword-based queries. The goal of the project is to work on the "brain" of the OCR device using artificial intelligence (AI) and mining techniques. The work builds on the device's ability to scan, read, analyze, and convert texts and paper images into ASCII codes, while addressing two of its limitations: it reads full texts rather than identifying words one by one, and it cannot convert unstructured data into structured data. Overcoming these limitations extends the device's capabilities so that it can be used more easily in the business environment (business intelligence) and so that its work comes closer to that of Hadoop. This is achieved by enabling the device to identify the required words by drawing a rectangle around each word, and by giving it the ability to convert unstructured data into structured data using the LSTM algorithm.
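The abstract says the improved OCR identifies words individually by drawing a rectangle around each word, but it does not describe how those rectangles are found. A common classical approach is vertical projection-profile segmentation: count the ink pixels in each column of a binarized text line and split words at sufficiently wide runs of empty columns. The sketch below assumes that technique, a 0/1 pixel grid as input, and a hypothetical `min_gap` threshold; it illustrates the idea and is not the thesis's own method.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) with inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def word_boxes(line: List[List[int]], min_gap: int = 2) -> List[Box]:
    """Return one bounding rectangle per word in a binarized text-line image.

    line[y][x] is 1 for ink and 0 for background. A run of at least
    `min_gap` empty columns (hypothetical threshold) separates two words;
    shorter runs are treated as spacing between letters of the same word.
    """
    height = len(line)
    width = len(line[0]) if height else 0

    # Vertical projection profile: amount of ink in each column.
    profile = [sum(line[y][x] for y in range(height)) for x in range(width)]

    # Group inked columns into horizontal spans, merging across short gaps.
    spans: List[List[int]] = []
    gap = 0
    for x, ink in enumerate(profile):
        if ink:
            if spans and gap < min_gap:
                spans[-1][1] = x      # gap too small: extend the current word
            else:
                spans.append([x, x])  # first word, or gap wide enough: new word
            gap = 0
        else:
            gap += 1

    # For each span, the vertical extent is the first and last row with ink.
    boxes: List[Box] = []
    for x0, x1 in spans:
        rows = [y for y in range(height) if any(line[y][x] for x in range(x0, x1 + 1))]
        boxes.append((x0, rows[0], x1, rows[-1]))
    return boxes
```

In practice the gap threshold would be tuned to the scan resolution and font size; a larger `min_gap` merges neighbouring words into one box, while a smaller one can split a word at wide letter spacing.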
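The keyword-identification step (finding repeated words across many scanned files and answering keyword-based queries) has the same shape as the classic MapReduce word count popularized by Hadoop. The following in-memory sketch imitates the map and reduce phases in plain Python without using Hadoop itself; the function names, the ASCII-only tokenizer, and the (document, position) record layout are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # (document id, word position within that document)

# Assumed tokenizer: lowercase ASCII letters and digits only.
WORD = re.compile(r"[a-z0-9]+")


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, Posting]]:
    """Map phase: emit (word, (doc_id, position)) for every token in one document."""
    for pos, word in enumerate(WORD.findall(text.lower())):
        yield word, (doc_id, pos)


def reduce_postings(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Reduce phase: group all postings that share the same word."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for word, posting in pairs:
        index[word].append(posting)
    return dict(index)


def build_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Run the map phase over every document, then reduce the combined output."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_words(doc_id, text))
    return reduce_postings(pairs)


def top_keywords(index: Dict[str, List[Posting]], n: int = 10) -> List[Tuple[str, int]]:
    """Most repeated words across the whole collection, with their counts."""
    return Counter({word: len(postings) for word, postings in index.items()}).most_common(n)


def query(index: Dict[str, List[Posting]], keyword: str) -> Dict[str, int]:
    """Keyword-based inquiry: number of occurrences of `keyword` in each document."""
    return dict(Counter(doc_id for doc_id, _ in index.get(keyword.lower(), [])))
```

Each posting is a small structured record, which is one simple way to turn free OCR text into queryable data. On a real cluster, the map and reduce functions would run as Hadoop jobs over the OCR output rather than over a Python dictionary, and a stop-word filter would usually be added so that words like "and" do not dominate the keyword list.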
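The abstract names the LSTM algorithm but gives no architecture or training details. For readers unfamiliar with it, the sketch below shows only the forward pass of a single standard LSTM cell in plain Python: input, forget, and output gates plus a candidate cell state. The layer sizes, the uniform random initialization, and the forget-gate bias of 1.0 are assumptions; the weights are untrained, so the outputs carry no meaning until the cell is trained, which is typically done with a deep-learning framework.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

GATES = "ifgo"  # input gate, forget gate, candidate state (g), output gate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class LSTMCell:
    """Forward pass of one standard LSTM cell (untrained, illustrative only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        self.hidden_size = hidden_size
        self.w: Dict[str, Matrix] = {
            k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for k in GATES
        }
        self.b: Dict[str, Vector] = {k: [0.0] * hidden_size for k in GATES}
        self.b["f"] = [1.0] * hidden_size  # assumed forget-gate bias initialization

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = {k: [s + b for s, b in zip(matvec(self.w[k], z), self.b[k])] for k in GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        cand = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        # New cell state: keep part of the old memory, add part of the candidate.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        # New hidden state: output gate applied to the squashed cell state.
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: List[Vector]) -> List[Vector]:
        """Feed a sequence of input vectors; return the hidden state after each step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs: List[Vector] = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs
```

The forget gate is what lets the cell carry information across long sequences, such as the characters of a scanned line, which is the usual reason an LSTM is chosen over a plain recurrent network for sequence tasks.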
URI: http://localhost:8080/xmlui/handle/123456789/8480
Appears in Collections: Department of Computer Science

Files in This Item:
File: داليا.pdf
Size: 4.25 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.