USING THE METHOD OF INFORMATION COMPRESSION IN SEMANTIC INFORMATION RETRIEVAL

Author(s):  V.A. Kudinov, Dr., Prof., Kursk State Agricultural Academy named after Professor I.I. Ivanov, Kursk, Russia, kudinovva@yandex.ru

Nay Lin, Kursk State University, Kursk, Russia, naylynn16@gmail.com

Issue:  Volume 46, № 1

Rubric:  Infocommunication technologies

Annotation:  The number of people using the Internet keeps growing, and so does the volume of information available online. However, the information retrieved is often semantically ambiguous, because many words have more than one meaning. Ontologies (semantic databases) are used to address this problem. A second problem follows from the growth of information: more memory is needed to store it, and processing such data takes significantly longer, which creates difficulties for information retrieval (IR). To address this, data compression is applied in IR tasks. This article proposes a new model of semantic information retrieval that uses compression based on End-Tagged Dense Code (ETDC). Applied to a Wiki article, ETDC achieves a compression ratio of 25%. Building the terminological dictionary file requires only plain text. The encoding process is very simple and easy to implement programmatically; as a result, encoding and decoding are faster than with the Huffman method.
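The annotation states that ETDC encoding is simple to implement. As a hedged illustration, the following minimal sketch follows the standard End-Tagged Dense Code construction as commonly described in the literature, not necessarily the authors' exact implementation: words are ranked by descending frequency, and rank i is written in base 128 with the high bit set only on the final byte of each codeword. The whitespace tokenizer, the alphabetical tie-breaking, and all function names (etdc_encode_rank, build_vocabulary, and so on) are illustrative assumptions. Separators are simply discarded, so decompression returns the word sequence rather than the exact original text.

```python
from collections import Counter


def etdc_encode_rank(rank: int) -> bytes:
    """Return the ETDC codeword for a 0-based frequency rank.

    The final byte of every codeword has its high bit set (value >= 128);
    all preceding bytes are < 128, so codeword boundaries mark themselves.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [128 + rank % 128]      # terminating (tagged) byte
    x = rank // 128
    while x > 0:
        x -= 1                    # dense code: shift so no code space is wasted
        out.append(x % 128)       # continuer byte, high bit clear
        x //= 128
    return bytes(reversed(out))   # most significant byte first, tag byte last


def etdc_decode_rank(code: bytes) -> int:
    """Invert etdc_encode_rank for a single complete codeword."""
    x = 0
    for b in code[:-1]:           # continuer bytes
        x = x * 128 + b + 1
    return x * 128 + (code[-1] - 128)


def build_vocabulary(words: list) -> dict:
    """Map each word to a codeword; more frequent words get shorter codes.

    Ties are broken alphabetically (an assumption, for determinism).
    """
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: etdc_encode_rank(i) for i, w in enumerate(ranked)}


def compress(words: list, vocab: dict) -> bytes:
    """Concatenate the codeword of every word in order."""
    return b"".join(vocab[w] for w in words)


def decompress(data: bytes, vocab: dict) -> list:
    """Split the stream at tagged bytes and look each codeword up."""
    inverse = {code: w for w, code in vocab.items()}
    words, start = [], 0
    for pos, b in enumerate(data):
        if b >= 128:              # a tagged byte closes the current codeword
            words.append(inverse[data[start:pos + 1]])
            start = pos + 1
    return words


if __name__ == "__main__":
    words = "the cat and the dog and the bird".split()
    vocab = build_vocabulary(words)
    packed = compress(words, vocab)
    print(len(packed), "bytes for", len(words), "words")  # prints: 8 bytes for 8 words
    assert decompress(packed, vocab) == words
```

Because every codeword ends in a byte of value 128 or more, a word's codeword can in principle be searched for directly in the compressed stream with a byte-level matcher such as Boyer–Moore, provided a match is accepted only when it starts the stream or the byte before it is itself a tagged byte. That check is needed because a short codeword such as `[128]` is also the tail of the longer codeword `[0, 128]`.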

Keywords:  tagged Huffman code, ETDC, WordNet, concept extension, ontology, text compression, Boyer–Moore algorithm
