Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/15761
| Title: | Web 2.0 Content Extraction |
| Authors: | Mohannmad Waqar |
| Issue Date: | 2010 |
| Publisher: | International Conference for Internet Technology and Secured Transactions (ICITST-2010) in London, UK |
| Abstract: | This paper presents a simple, efficient and extendable solution for content extraction from Web 2.0. Web 2.0 is perceived as the second generation of web technologies. It has undoubtedly had a significant impact in enriching the end-user experience and allowing programmers to write more interactive, desktop-like applications for the web. However, it has also introduced new issues for researchers in the field of information retrieval and has made information retrieval from the web more difficult, time-consuming and challenging. Web pages contain a lot of clutter besides the original article. Several methods have been developed to extract the main content. However, these methods were originally designed for the traditional model of the web and would fail to work on Web 2.0 content. Given the evident popularity of Web 2.0, the volume of Web 2.0 content on the web will rise sharply in the coming years. In this paper we propose a new solution to this problem, based on open source components, which will make Web 2.0 content extraction more efficient and reduce the utilization of precious system resources. The paper also presents a high-level logical design for implementing such a system through available open source components. |
| URI: | http://hdl.handle.net/123456789/15761 |
| Appears in Collections: | College of Computer and Information Sciences |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.