Académique Documents
Professionnel Documents
Culture Documents
Intelligent phishing detection system for e-banking using fuzzy data mining
Maher Aburrous a,*, M.A. Hossain a, Keshav Dahal a, Fadi Thabtah b
a
Department of Computing University of Bradford, Bradford, UK
b
MIS Department Philadelphia University Amman, Jordan
a r t i c l e i n f o a b s t r a c t
Keywords: Detecting and identifying any phishing websites in real-time, particularly for e-banking, is really a com-
Phishing plex and dynamic problem involving many factors and criteria. Because of the subjective considerations
Fuzzy logic and the ambiguities involved in the detection, fuzzy data mining techniques can be an effective tool in
Data mining assessing and identifying phishing websites for e-banking since it offers a more natural way of dealing
Classification
with quality factors rather than exact values. In this paper, we present novel approach to overcome
Association
Apriori
the ‘fuzziness’ in the e-banking phishing website assessment and propose an intelligent resilient and
E-banking risk assessment effective model for detecting e-banking phishing websites. The proposed model is based on fuzzy logic
combined with data mining algorithms to characterize the e-banking phishing website factors and to
investigate its techniques by classifying the phishing types and defining six e-banking phishing website
attack criteria’s with a layer structure. Our experimental results showed the significance and importance
of the e-banking phishing website criteria (URL & Domain Identity) represented by layer one and the var-
ious influence of the phishing characteristic on the final e-banking phishing website rate.
Ó 2010 Elsevier Ltd. All rights reserved.
1. Introduction lem with each other for which there is no known single silver
bullet to entirely solve it. The motivation behind this study is to
E-banking phishing websites are forged websites that are cre- create a resilient and effective method that uses fuzzy data mining
ated by malicious people to mimic real e-banking websites. Most algorithms and tools to detect e-banking phishing websites in an
of these kinds of web pages have high visual similarities to scam automated manner.
their victims. Some of these web pages look exactly like the real DM approaches such as neural networks, rule induction, and
ones. Unwary Internet users may be easily deceived by this kind decision trees can be a useful addition to the fuzzy logic model.
of scam. Victims of e-banking phishing websites may expose their It can deliver answers to business questions that traditionally were
bank account, password, credit card number, or other important too time-consuming to resolve such as, ‘‘Which are most important
information to the phishing web page owners. The impact is the e-banking phishing website characteristic indicators and why?” by
breach of information security through the compromise of confi- analyzing massive databases and historical data for training
dential data, and the victims may finally suffer losses of money purposes.
or other kinds. Phishing is a relatively new Internet crime in com- The paper is organized as follows: Section 2 presents the litera-
parison with other forms, e.g., virus and hacking. More and more ture review and related work, Section 3 shows the theory and
phishing web pages have been found in recent years in an acceler- methodology of the proposed fuzzy based data mining approach
ative way (Fu, Wenyin, & Deng, 2006). The word phishing from the for the phishing website risk assessment model. Section 4 intro-
phrase ‘‘website phishing” is a variation on the word ‘‘fishing”. The duces the system design and implementation with the overall fuz-
idea is that bait is thrown out with the hopes that a user will grab it zy data mining inference rules. Section 5 reveals the experiments
and bite into it just like the fish. In most cases, bait is either an e- and results of the fuzzy data mining e-banking phishing website
mail or an instant messaging site, which will take the user to hos- risk assessment model and then conclusions and future work are
tile phishing websites (James, 2006). given in Section 6.
E-banking phishing website is a very complex issue to under-
stand and to analyze, since it is joining technical and social prob- 2. Literature review and related work
0957-4174/$ - see front matter Ó 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2010.04.044
7914 M. Aburrous et al. / Expert Systems with Applications 37 (2010) 7913–7921
preventing such attacks is an important step towards defending automatic way rather than involving too many human efforts.
against e-banking phishing website attacks, there are several Their method first decomposes the web pages (in HTML) into sali-
promising defending approaches to this problem reported earlier. ent (visually distinguishable) block regions. The visual similarity
In this section, we briefly survey existing anti-phishing solutions between two web pages is then evaluated in three metrics: block
and list of the related works. One approach is to stop phishing at level similarity, layout similarity, and overall style similarity,
the e-mail level (Adida, Hohenberge, & Rivest, 2005), since most which are based on the matching of the salient block regions (Fu
current phishing attacks use broadcast e-mail (spam) to lure vic- et al., 2006).
tims to a phishing website (Wu, Miller, & Garfinkel, 2006a). An-
other approach is to use security toolbars. The phishing filter in 2.2. Main characteristics of e-banking phishing websites
IE7 (Sharif, 2006) is a toolbar approach with more features such
as blocking the user’s activity with a detected phishing site. A third Evolving with the anti-phishing techniques, various phishing
approach is to visually differentiate the phishing sites from the techniques and more complicated and hard-to-detect methods
spoofed legitimate sites. Dynamic Security Skins (Dhamija & Tygar, are used by phishers. The most straightforward way for a phisher
2005) proposes to use a randomly generated visual hash to cus- to defraud people is to make the phishing web pages similar to
tomize the browser window or web form elements to indicate their targets. Actually, there are many characteristics and factors
the successfully authenticated sites. A fourth approach is two-fac- that can distinguish the original legitimate website from the forged
tor authentication, which ensures that the user not only knows a e-banking phishing website like Spelling errors, Long URL address
secret but also presents a security token (FDIC, 2004). However, and Abnormal DNS record. The full list is shown in Table 1 which
this approach is a server-side solution. Phishing can still happen will be used later on our analysis and methodology study.
at sites that do not support two-factor authentication. Sensitive
information that is not related to a specific site, e.g., credit card
2.3. Why using fuzzy logic and data mining?
information and SSN (Social Security Number), cannot be protected
by this approach either (Wu, Miller, & Little, 2006b).
FL has been used for decades in the engineering sciences to
Many industrial anti-phishing products use toolbars in web
embed expert input into computer models for a broad range of
browsers, but some researchers have shown that security tool bars
do not effectively prevent phishing attacks. Bridges and Vaughn Table 1
(2001), Dhamija and Tygar (2005) proposed a scheme that utilises Components and layers of e-banking phishing website criteria.
a cryptographic identity-verification method that lets remote web
Criteria N Component Layer No.
servers prove their identities. However, this proposal requires
changes to the entire web infrastructure (both servers and clients), URL & Domain 1 Using the IP address Layer 1
Identity 2 Abnormal request URL
so it can succeed only if the entire industry supports it. In Liu, (Weight = 0.3) 3 Abnormal URL of Anchor
Deng, Huang, & Fu (2006), the authors proposed a tool to model 4 Abnormal DNS record Sub
and describe phishing by visualizing and quantifying a given site’s weight = 0.3
threat, but this method still would not provide an anti-phishing 5 Abnormal URL
solution. Another approach is to employ certification, e.g., Micro- Security & 1 Using SSL certificate
soft spam privacy (Microsoft, 2004; Microsoft Corp., 2005; Olsen, Encryption
2 Certification authority Layer 2
2004; Perez, 2003; WholeSecurity Web Caller, 2005). A recent
(Weight = 0.2) 3 Abnormal Cookie
and particularly promising solution was proposed in Herzberg 4 Distinguished Names
and Gbara (2004), which combines the technique of standard cer- Certificate (DN)
tificates with a visual indication of correct certification; a site- Source Code & 1 Redirect pages
dependent logo indicating that the certificate was valid would be Java script
displayed in a trusted credentials area of the browser. A variant 2 Straddling attack
of web credential is to use a database or list published by a trusted (Weight = 0.2) 3 Pharming attack Sub
weight = 0.4
party, where known phishing websites are blacklisted. For exam- 4 Using onMouseOver to hide
ple, Netcraft anti-phishing toolbar (Netcraft, 2004) prevents phish- the Link
ing attacks by utilising a centralized blacklist of current phishing 5 Server Form Handler (SFH)
URLs. Other examples include websense, McAfee’s anti-phishing Page Style & 1 Spelling errors
filter, Netcraft anti-phishing system, Cloudmark SafetyBar, and Contents
Microsoft Phishing Filter (Pan & Ding, 2006). The weaknesses of 2 Copying website
(Weight = 0.1) 3 Using forms with ‘‘Submit” Layer 3
this approach are its poor scalability and its timeliness. Note that
button
phishing sites are cheap and easy to build and their average life- 4 Using Pop-Ups windows
time is only a few days. APWG provides a solution directory at 5 Disabling Right-Click
Anti-Phishing Working Group (2007) which contains most of the Web Address Bar 1 Long URL address
major anti-phishing companies in the world. However, an auto- 2 Replacing similar characters
matic anti-phishing method is seldom reported. The typical tech- for URL
nologies of anti-phishing from the user interface aspect are done (Weight = 0.1) 3 Adding a prefix or suffix
4 Using the @ Symbol to Sub
by Dhamija and Tygar (2005) and Wu et al., 2006b. They proposed Confuse weight = 0.3
methods that need web page creators to follow certain rules to cre- 5 Using Hexadecimal Character
ate web pages, either by adding dynamic skin to web pages or add- Codes
ing sensitive information location attributes to HTML code. Social Human 1 Much emphasis on security
However, it is difficult to convince all web page creators to follow Factor and response
the rules (Fu et al., 2006). In Fu et al. (2006), Liu, Huang, Liu, Zhang, (Weight = 0.1) 2 Public generic salutation
3 Buying Time to Access
& Deng, 2005; Liu et al., 2006), the DOM-based (Wood, 2005) visual
Accounts
similarity of web pages is oriented, and the concept of visual ap-
Total 1
proach to phishing detection was first introduced. Through this ap-
weight
proach, a phishing web page can be detected and reported in an
M. Aburrous et al. / Expert Systems with Applications 37 (2010) 7913–7921 7915
and the output is a number. This step was done using Centroid substantial fraction of users and suggest that alternative ap-
technique (Han & Karypis, 2000) since it is a commonly used proaches are needed (Dhamija, Tygar, & Hearst, 2006).
method. The output is e-banking phishing website risk rate
and is defined in fuzzy sets like ‘very phishy’ to ‘Very legitimate’. 3.3. Mining e-banking phishing websites challenges
The fuzzy output set is then defuzzified to arrive at a scalar
value. There are a number of challenges posed by doing post hoc clas-
sification of e-banking phishing websites. Most of these challenges
only apply to the e-banking phishing websites data and materialize
3.2. Data sets and experimental results
as a form of information, which has the net effect of increasing the
false negative rate. The age of the data set is the most significant
Two publicly available data sets were used to test our imple-
problem, which is particularly relevant with the phishing corpus.
mentation: the ‘‘phishtank” from the phishtank.com (http://
E-banking phishing websites are short lived, often lasting only in
www.phishtank.com/phish_archive.php, 2008) which is consid-
the order of 48 h. Some of our features can therefore not be ex-
ered one of the primary phishing-report collators both the 2007
tracted from older websites, making our tests difficult. The average
and 2008 collections, for a total of approximately 606 e-banking
phishing site stays live for approximately 2.25 days (Putting an end
phishing websites. The PhishTank database records the URL for
to account-hijacking identity theft, 2004). Furthermore, the pro-
the suspected website that has been reported, the time of that re-
cess of transforming the original e-banking phishing website ar-
port, and sometimes further detail such as the screenshots of the
chives into record feature data sets is not without error. It
website and is publicly available. The Anti-Phishing Working
requires the use of heuristics at several steps. Thus, high accuracy
Group (APWG) that maintains a ‘‘Phishing Archive” describing
from the data mining algorithms cannot be expected. However, the
phishing attacks dating back to September 2007 (Anti-Phishing
evidence supporting the golden nuggets comes from a number dif-
Working Group, 2007). We performed a cognitive walkthrough
ferent algorithms and feature sets, and we believe it is compelling
on the approximately 1006 sample attacks within this archive.
(Fette, Sadeh, & Tomasic, 2006).
We used a series of short scripts to programmatically extract the
above features and store these in an excel sheet for quick reference
3.4. Utilisation of different DM classification algorithms
as shown in Fig. 3.
Our goal is to gather information about the strategies that are
The practical part of this study utilises five different common
used by attackers and to formulate hypotheses about classifying
DM algorithms (C4.5, Ripper, Part, Prism, and CBA). Our choice of
and categorizing of all different e-banking phishing attacks tech-
these methods is based on the different strategies they use in
niques. By thoroughly investigating these phishing attacks, we
learning rules from data sets (Misch, 2006). The C4.5 algorithm
have created a data set containing information regarding what dif-
(Quinlan, 1996) employs divide and conquer approach, and the
ferent techniques have been used and how the usage of these tech-
RIPPER algorithm uses separate and conquer approach. The choice
niques has changed over time. By investigating these information,
of PART algorithm is based on the fact that it combines both ap-
we have found some interesting techniques depending on the main
proaches to generate a set of rules. It adapts separate and conquer
perception that phishers know that most users do not know how to
to generate a set of rules and uses divide and conquer to build par-
check the security and often assume that sites requesting sensitive
tial decision trees. The way PART builds and prunes partial decision
information are secure which makes very difficult for them to see
tree is similar to the C4.5 implementation with a difference which
the difference between authentic security and mimicked security
can be explained as follows: C4.5 generates one decision tree and
features (Persson, 2007). We also found that some visual deception
uses pruning techniques to simplify it; each path from the root
attacks can fool even the most sophisticated users. These results
node to one of the leaves in the tree represents a rule. On the other
illustrate that standard security indicators are not effective for a
hand, PART avoids the simplification process by building up partial
decision trees and choosing only one path in each one of them to
derive a rule. Once the rule is generated, all instances associated
with it and the partial tree will be discarded. PRISM is a classifica-
tion rule which can only deal with nominal attributes and does not
do any pruning. It implements a top–down (general to specific)
sequential-covering algorithm that employs a simple accuracy-
based metric to pick an appropriate rule antecedent during rule
construction. Finally, CBA algorithm employs association rule min-
ing (Liu et al., 1998) to learn the classifier and then adds a pruning
and prediction steps. This results in a classification approach
named associative classification (Fadi, Peter, & Peng (2005); Thab-
tah, Cowling, & Peng, 2004).
We recorded the prediction accuracy of the considered classifi-
cation approaches we used in this study in Tables 2 and 3. The
overall summary output can be interpreted as: Web Address Bar
and URL Domain Identity are the major important criteria for iden-
tifying and detecting e-banking phishing website. Such as if one or
both of them are genuine, then most likely the website is legiti-
mate, and on the other hand if it is Fraud, then the website is most
likely phishing. The classification rules did not just showed the sig-
nificance roll of the Web Address Bar criteria and URL Domain
Identity criteria but showed also the magnitude value of some
other e-banking phishing website criteria like Security & Encryp-
tion criteria comparing to the others. We have used 10-fold-
Fig. 3. Excel sheet of the e-banking phishing main extracted features. cross-validation as a testing mode which evaluating the derived
M. Aburrous et al. / Expert Systems with Applications 37 (2010) 7913–7921 7917
Table 4
Sample of the Rule base 1 structure and entries for URL & Domain Identity criteria.
Rule (Component 1) Using (Component 2) (Component 3) (Component 4) (Component 5) URL & Domain identity criteria
# the IP address Abnormal Req. URL Abnormal URL anchor Abnormal DNS record Abnormal URL phishing risk (layer 1)
1 Low Low Low Low Low Genuine
2 Low Low Low Low Mod. Genuine
3 Low Low Mod. Mod. Mod. Doubtful
4 Low Low Low Mod. high Doubtful
5 Low Low Mod. Mod. high Fraud
6 Mod. Mod. Mod. Low high Fraud
7 Mod. Low high Mod. high Fraud
8 high Mod. Low Mod. Low Doubtful
9 Low Mod. Low Low Mod. Fraud
10 high Mod. high high Low Fraud
Contents, Web Address Bar, and Social Human Factor which is Rule Page Style & Web Social Human Phishing risk
the output from layer 3, and one output. The system structure Contents Address Bar Factor (layer 3)
for Page Style & Contents criteria is the joining of its five compo- 1 Genuine Genuine Doubtful Legal
nents (Spelling errors, Copying website, Using forms with ‘‘Sub- 2 Genuine Doubtful Fraud Uncertain
mit” button, Using Pop-Ups windows, and Disabling Right-Click) 3 Genuine Fraud Doubtful Uncertain
4 Doubtful Doubtful Genuine Uncertain
using Rule base 1, which produces Page Style & Contents criteria.
5 Doubtful Doubtful Doubtful Uncertain
The system structure for Web Address Bar criteria is the joining of 6 Doubtful Fraud Doubtful Fake
its five components (Long URL address, Replacing similar charac- 7 Doubtful Genuine Genuine Legal
ters for URL, Adding a prefix or suffix, Using the @ Symbol to Con- 8 Fraud Doubtful Doubtful Uncertain
fuse, and Using Hexadecimal Character Codes) using Rule base 1, 9 Fraud Fraud Fraud Fake
Table 8
Five highest (10) for layer 1 and all others lowest (0).
Table 9
Five middle (5) inputs for layer 1 and layer 2 and highest (10) inputs for layer 3.
Table 10
Five middle (5) inputs for layer 1 and all others lowest (0) inputs.
result clearly shows that even if some of the e-banking phishing Bridges, S. M., & Vaughn, R. B. (2001). Fuzzy data mining and genetic algorithms
applied to intrusion detection. Department of Computer Science Mississippi State
website characteristics or layers are not very clear or not definite,
University, White Paper.
the website can still be phishy and forged, and users should be Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules.
aware when dealing with it especially when other phishing charac- International Journal of Man–Machine Studies, 27(4), 349–370.
teristics or layers are obvious and clear. Ciesielski Vic, & Lalani Anand (in press). Data mining of web access logs from an
academic web site. In Proceedings of the third international conference on hybrid
Table 10 shows that when the layer 1 of the e-banking phishing intelligent systems (HIS’03): Design and Application of Hybrid Intelligent Systems
website risk criteria (URL & Domain Identity) has middle (5) input (pp. 1034–1043). IOS Press.
values which indicate Mod. phishing indicator and all other Layers Dhamija, R., & Tygar, J. D. (2005). The battle against phishing: Dynamic security
skins. In Proc. Symp. Usable Privacy and Security.
have the value of zero input values which indicate Low phishing Dhamija Rachna, Tygar, J. D., & Hearst Marti (2006). Why phishing works. Harvard
indicator, the final e-banking phishing website risk rating will be University, White Paper.
reasonably low (39%) representing a [legitimate website], which Fadi, T., Peter, C. & Peng, Y. (2005). MCAR: Multi-class classification based on
association rule. In IEEE international conference on computer systems and
means that there is Good guarantee that the website is legitimate applications (pp. 127–133).
website. This result clearly shows that even if some of the e-bank- Fayyad, U. M. (1998). Mining databases: Towards algorithms for knowledge
ing phishing website characteristics or layers are noticed or ob- discovery. Data Engineering Bulletin, 21(1), 39–48.
FDIC (2004). Putting an end to account – hijacking identity theft. Available from
served that does not mean at all that the website is phishy or http://www.fdic.gov/consumers/consumer/idtheftstudy/identity_theft.pdf.
forged, but it can be safe and secured especially when other phish- Fette Ian, Sadeh Norman, & Tomasic Anthony (2006). Learning to detect phishing e-
ing characteristics or layers are not noticeable, visible or detect- mails. Institute for Software Research International. CMU-ISRI-06–112,
Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual
able. The results also indicate that the worst e-banking phishing
similarity assessment based on earth mover’s distance (EMD). IEEE Transactions
website rate (all three layers have 10 input value) equals 83.7% on Dependable and Secure Computing, 3(4).
representing [Very phishy Website], and the best e-banking phish- Han, E., & Karypis, G. (2000). Centroid-based document classification: Analysis and
ing website rate (all three layers have 0 input value) is 16.4% experimental results. Principles of Data Mining and Knowledge Discovery,
424–431.
representing [Very legitimate website] rather than a full range, Herzberg, A., & Gbara, A. (2004). Protecting naive web users. Draft of July, 18.
i.e. 0–100, because of the fuzzification process. Ho, C. Y., Ling, B. W., & Reiss, J. D. (2006). Fuzzy impulsive control of high-order
interpolative low-pass sigma–delta modulators. IEEE Transactions on Circuits
and Systems–I: Regular Papers, 53(10).
Available from http://www.phishtank.com/phish_archive.php (2008).
6. Conclusion and future work James, L. (2006). Phishing exposed. Tech target article sponsored by: Sunbelt
software. Available from searchexchange.com.
The fuzzy data mining e-banking phishing website model Kantardzic, & Mehmed (2003). Data mining: Concepts, models, methods, and
algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336.
showed significance and importance of the e-banking phishing Liu, M., Chen, D., & Wu, C. (2002). The continuity of Mamdani method. International
website criteria (URL & Domain Identity) represented by layer 1. Conference on Machine Learning and Cybernetics, 3, 1680–1682.
It also showed that even if some of the e-banking phishing website Liu Bing, Hsu Wynne, & Ma Yiming (1998). Integrating classification and
association rule mining. In Proceedings of the fourth international conference on
characteristics or layers are not very clear or not definite, the web- knowledge discovery and data mining (KDD-98, Plenary Presentation). New York,
site can still be phishy especially when other phishing characteris- USA.
tics or layers are obvious and clear. On the other hand, even if some Liu, W., Huang, G., Liu, X., Zhang, M., & Deng, X. (2005). Phishing Web Page
Detection. In Proc. eighth int’l conf. documents analysis and recognition (pp. 560–
of the e-banking phishing website characteristics or layers are no-
564).
ticed or observed, it does not mean at all that the website is phishy, Liu, W., Deng, X., Huang, G., & Fu, A. Y. (2006). An anti-phishing strategy based on
but it can be safe and secured especially when other phishing char- visual similarity assessment, published by the IEEE computer society 1089–
acteristics or layers are not noticeable, visible, or detectable. 7801/06 IEEE. Internet Computing IEEE.
Microsoft (2004). Available from http://www.microsoft.com/mscorp/twc/privacy/
Our first goal was to determine whether we could find any gold- spam.
en nuggets in the e-banking phishing website archive data using Microsoft Corp. (2005). Microsoft phishing filter: A new approach to building trust
classification algorithms. In this, major rules discovered were in- in e-commerce content. White Paper.
Misch Sebastian, (2006). Content negotiatio in internet mail. Diploma thesis.
serted into the fuzzy rule engine to help giving exact phishing rate University of Applied Sciences Cologne, Mat. No.: 7042524.
output. A major issue in using data mining algorithms is the prep- Netcraft (2004). Available from http://toolbar.netcraft.com.
aration of the feature sets to be used. Finding the ‘‘right” feature set Olsen, S. (2004). AOL tests caller ID for e-mail. CNET News.com.
Pan, Y., & Ding, X. (2006). Anomaly based web phishing page detection. In
is a difficult problem and requires some intuition regarding the Proceedings of the 22nd annual computer security applications conference
goal of data mining exercise. We are not convinced that we have (ACSAC’06). Computer Society.
used the best feature sets and we think that there is more work Perez, J. C. (2003). Yahoo airs antispam initiative. Available from ComputerWeekly.
com.
to be done in this area. Moreover, there are number of emerging Persson Anders (2007). Exploring phishing attacks and countermeasures. Master
technologies that could greatly assist phishing classification that Thesis in Computer Science, Thesis No: MCS-2007:18.
we have not considered. However, we believe that using features Peter, Lyman, & Varian, Hal R. (2003). How much information. Available from http://
www.sims.berkeley.edu/how-much-info-2003. Retrieved on 2008-12-17.
such as those presented here can significantly help with detecting
Putting an end to account-hijacking identity theft, FDIC, Tech. Rep. (2004). [Online].
this class of e-banking phishing websites. The classification Available from http://www.fdic.gov/consumers/consumer/idtheftstudy/
approaches are promising. The training and classification experi- identitytheft.pdf.
ments have proven that it is possible to improve the categorization Quinlan, J. R. (1996). Improved use of continuous attributes in c4.5. Journal of
Artificial Intelligence Research, 4, 77–90.
process. The progress of detecting e-banking phishing websites is Scheffer, T. (2001). Finding association rules that trade support optimally against
really very interesting with a never ending possibility of algo- confidence. In Proc. of the 5th European conf. on principles and practice of
rithms variations when it is combined with each other. knowledge discovery in databases (PKDD’01) (pp. 424–435). Springer-Verlag:
Freiburg, Germany.
Shah, S. (2003). Measuring operational risks using fuzzy logic modeling. Article, Towers
Perrin.
References Sharif, T. (2006). Phishing filter in IE7. Available from http://blogs.msdn.com/ie/
archive/2005/09/09/463204.aspx.
Adida, B., Hohenberger, S., & Rivest, R. (2005). Lightweight encryption for e-mail. In Thabtah, F., Cowling, P., & Peng, Y. (2004). A new multi-class, multi-label associative
USENIX steps to reducing unwanted traffic on the internet workshop (SRUTI). classification approach . In The 4th international conference on data mining
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in (IVFM’04). Brighton, UK.
large databases. In Proc. international conference on very large databases (pp. WEKA – University of Waikato, New Zealand, EN (2006). Weka – Data Mining with
478–499). Morgan Kaufmann, Los Altos, CA: Santiage, Chile. Open Source Machine Learning Software in Java. Available from http://
Anti-Phishing Working Group (2007). Phishing Activity Trends Report. Available www.cs.waikato.ac.nz/ml/weka (2006/01/31).
from http://antiphishing.org/reports/apwg_report_sep2007_final.pdf. WholeSecurity Web Caller-ID (2005). Available from www.wholesecurity.com.
M. Aburrous et al. / Expert Systems with Applications 37 (2010) 7913–7921 7921
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and Wu, M., Miller, R. C., & Garfinkel, S. L. (2006a). Do Security Toolbars Actually Prevent
techniques (3rd ed.). San Francisco, CA: Morgan Kaufmann. Phishing Attacks?. CHI
Wood, L. (2005). Document object model level 1 specification. Available from http:// Wu, M., Miller, R. C., & Little, G. (2006b). Web wallet: Preventing phishing attacks by
www.w3.org. revealing user intentions. MIT Computer Science and Artificial Intelligence Lab.