SURVEY ON WEB SPAM AND ITS UNDERLYING PRINCIPLES
DOI:
https://doi.org/10.53555/eijse.v5i4.34
Keywords:
Click spam, Search engine, Spamming, Fraudulent clicks, URL
Abstract
Search engines have become the de facto starting point for information acquisition on the Internet. Because of web spam, however, search results are not always as good as expected. Moreover, spam evolves, which makes providing high-quality search even more challenging. Over the last decade, research on information retrieval has attracted considerable interest from both academia and industry. This paper presents a systematic review of web spam detection techniques and their underlying principles. Existing algorithms are grouped into three categories according to the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behavior (e.g., clicks) and image spam. A brief survey of the various forms of spam is provided. Finally, the underlying principles are summarized.
License
Copyright (c) 2019 EPH - International Journal of Science And Engineering (ISSN: 2454 - 2016)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.