
Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/15177

Title: A pattern matching approach for redundancy detection in bi-lingual and mono-lingual Corpora
Authors: Muneer Ahmad
Hassan Mathkour
Keywords: Bi-Chains, Corpora, DSDR, Mono-Chains, Sequences
Issue Date: 2009
Publisher: IMECS
Abstract: Information in bi-lingual and mono-lingual corpora covering numerous languages may be duplicated. This duplication leads to slow and inaccurate search results from bi-lingual and mono-lingual databases. It is essential to structure the sequences so that redundant sequence structure is reduced; accurate analysis of the structure of bi-lingual and mono-lingual corpora then helps in analyzing the features of certain complex and subjective languages. Detecting the redundancy supports the selection of the right solution from large corpora. In this paper, we present an algorithm, which we call DSDR, that operates on a set of bi-lingual and mono-lingual corpora and iterates over that set to find all duplications present in it. Once the duplications are found, DSDR removes the duplicated chains and refreshes the databases, which markedly reduces their size. In addition, searches for particular chains in bi-lingual and mono-lingual corpora become faster and more accurate.
URI: http://hdl.handle.net/123456789/15177
Appears in Collections: College of Computer and Information Sciences
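
The abstract describes DSDR only at a high level: scan a set of chains, find the duplicates, remove them, and refresh the database. The record does not give the algorithm itself, so the following is a minimal sketch of the general idea and not the authors' method. It swaps in a plain hash-set lookup for whatever pattern matching DSDR actually performs. The names `normalize` and `remove_duplicate_chains`, the representation of a chain as a tuple of strings (one element for a mono-chain, two for a bi-chain), and the whitespace and case normalization are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumption: a chain is a tuple of strings, e.g. ("the cat",) for a
# mono-lingual chain or ("the cat", "le chat") for a bi-lingual one.
Chain = Tuple[str, ...]


def normalize(chain: Chain) -> Chain:
    """Collapse runs of whitespace and fold case in each part of a chain,
    so that trivially different copies compare equal (hypothetical rule)."""
    return tuple(" ".join(part.split()).casefold() for part in chain)


def remove_duplicate_chains(chains: Iterable[Chain]) -> Tuple[List[Chain], int]:
    """Return the chains with duplicates dropped (first occurrence kept,
    original order preserved) and the number of chains removed."""
    seen = set()
    unique: List[Chain] = []
    removed = 0
    for chain in chains:
        key = normalize(chain)
        if key in seen:
            removed += 1      # duplicate of an earlier chain: drop it
            continue
        seen.add(key)
        unique.append(chain)  # first time this chain has been seen
    return unique, removed


if __name__ == "__main__":
    corpus = [
        ("the cat",),
        ("The  cat",),
        ("the cat", "le chat"),
        ("the cat", "le chat"),
        ("a dog",),
    ]
    unique, removed = remove_duplicate_chains(corpus)
    print(unique, removed)
```

Keeping the first occurrence and preserving order is a design choice of this sketch; a real system would also need a policy for near-duplicates, which the abstract does not specify.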

Files in This Item:

File: Dr.Hassan mathkour-5-conf.docx
Size: 14.92 kB
Format: Microsoft Word XML


