Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/8480
Title: | OCR Improvement for Unstructured Big Data Integration |
Authors: | Abd Al Lattif, Dalia; Hamad, Murtadha |
Keywords: | OCR; mining techniques; artificial intelligence (AI) techniques; ASCII codes; LSTM algorithm |
Issue Date: | 1-Jan-2021 |
Publisher: | University of Anbar |
Abstract: | The continuous increase in data produced by online systems and applications has created a fundamental problem: how to manage and handle large volumes of data. The most important issue is the storage of unstructured data, which makes up most of the data on the internet; traditional management methods are unsuitable because the data are large and complex. Hadoop has therefore been a suitable solution for continuously growing data volume and complexity, since it can process and analyze data as-is, regardless of source, speed, size, or quantity. In this thesis, a system for analyzing big data is proposed. The system can identify repeated words (keywords) in a large number of image-based files (PDF) and text-based files that have been scanned by an optical character recognition (OCR) device. It supports decision-making and improves the archiving process by providing a component that responds to keyword-based queries. The goal of the project is to improve the "brain" of the OCR device using artificial intelligence (AI) and mining techniques. The work builds on the device's ability to scan, read, analyze, and convert texts and paper images into ASCII codes, while addressing two of its limitations: it reads full texts rather than identifying words one by one, and it cannot convert unstructured data into structured data. Overcoming these limitations makes the device easier to use in business environments (business intelligence) and brings its operation closer to that of Hadoop. This is done by enabling the device to identify the required words by adding rectangles around each word, and by giving it the ability to convert unstructured data into structured data using the LSTM algorithm. |
URI: | http://localhost:8080/xmlui/handle/123456789/8480 |
Appears in Collections: | Department of Computer Science |
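The abstract describes identifying repeated keywords from per-word rectangles and turning unstructured OCR output into structured records. The following is only a minimal sketch of that idea, not the thesis's implementation: the `WordBox` record, the `keyword_counts` and `to_rows` helpers, and the sample data are all hypothetical, and the OCR and LSTM recognition stages are assumed to have already produced the word boxes.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One recognized word and its bounding rectangle (hypothetical layout)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def normalize(text):
    """Lowercase a word and strip punctuation so 'data.' matches 'Data'."""
    return re.sub(r"[^\w]", "", text).lower()


def keyword_counts(boxes, min_length=3):
    """Count how often each normalized word occurs across OCR word boxes."""
    counts = Counter()
    for box in boxes:
        word = normalize(box.text)
        if len(word) >= min_length:  # skip very short words such as 'of'
            counts[word] += 1
    return counts


def to_rows(boxes, keyword):
    """Return structured rows (text, x, y, width, height) for one keyword."""
    target = normalize(keyword)
    return [
        (b.text, b.x, b.y, b.width, b.height)
        for b in boxes
        if normalize(b.text) == target
    ]


# Assumed sample: word boxes as an OCR stage might emit them for one line.
sample = [
    WordBox("Data", 10, 10, 40, 12),
    WordBox("integration", 55, 10, 90, 12),
    WordBox("of", 150, 10, 15, 12),
    WordBox("big", 170, 10, 25, 12),
    WordBox("data.", 200, 10, 42, 12),
]
counts = keyword_counts(sample)
rows = to_rows(sample, "data")
```

Here `counts` ranks candidate keywords by frequency, and `rows` is the kind of tabular output a keyword-based query could return, with each hit tied to its rectangle on the page.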