Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/15177
| Title: | A pattern matching approach for redundancy detection in bi-lingual and mono-lingual Corpora |
| Authors: | Muneer Ahmad; Hassan Mathkour |
| Keywords: | Bi-Chains, Corpora, DSDR, Mono-Chains, Sequences |
| Issue Date: | 2009 |
| Publisher: | IMECS |
| Abstract: | Information in Bi-Lingual and Mono-Lingual Corpora covering numerous languages may be duplicated, which makes searches of Bi-Lingual and Mono-Lingual databases slow and inaccurate. The Sequences should therefore be structured so that redundant Sequences are reduced and analysis of the Corpora is accurate enough to help in analyzing the features of certain complex and subjective languages. Detecting the redundancy helps in selecting the right solution from large Corpora. In this paper, we present an algorithm, which we call DSDR, that operates on a set of Bi-Lingual and Mono-Lingual Corpora and iterates over that set to find all duplications present in it. Once the duplications are found, DSDR removes the duplicated Chains and refreshes the databases, markedly reducing their size. In addition, searches for particular Chains in Bi-Lingual and Mono-Lingual Corpora become faster and more accurate. |
| URI: | http://hdl.handle.net/123456789/15177 |
| Appears in Collections: | College of Computer and Information Sciences |
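The abstract describes iterating over a set of Chains, finding duplicates, and removing them. This record does not give the details of DSDR, so the following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes each Chain is a plain string and that two Chains are duplicates when they match after whitespace and case normalization; the names `normalize` and `remove_duplicate_chains` are hypothetical.

```python
from typing import Iterable, List, Tuple


def normalize(chain: str) -> str:
    """Collapse runs of whitespace and fold case (an assumed notion of 'same Chain')."""
    return " ".join(chain.split()).casefold()


def remove_duplicate_chains(chains: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, removed): first occurrences in order, and the later duplicates."""
    seen = set()
    kept: List[str] = []
    removed: List[str] = []
    for chain in chains:
        key = normalize(chain)
        if key in seen:
            removed.append(chain)   # a Chain with this key was already kept
        else:
            seen.add(key)
            kept.append(chain)      # first occurrence is retained
    return kept, removed


if __name__ == "__main__":
    corpus = ["the cat", "The  cat", "a dog", "the cat"]
    kept, removed = remove_duplicate_chains(corpus)
    print(f"kept {len(kept)} of {len(corpus)} Chains")
```

Hashing the normalized Chain means each entry is examined once, which is where the reduction in database size and the faster lookups mentioned in the abstract would come from in a scheme of this kind.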