Semester: VIII

Branch: I.T.

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

1. INTRODUCTION
In phishing, an automated form of social engineering, criminals use the Internet to fraudulently extract sensitive information from businesses and individuals, often by impersonating legitimate web sites. The potential for high rewards (e.g., through access to bank accounts and credit card numbers), the ease of sending forged email messages impersonating legitimate authorities, and the difficulty law enforcement has in pursuing the criminals have resulted in a surge of phishing attacks: estimates suggest that phishing affected 1.2 million U.S. citizens and cost businesses billions of dollars in 2004 alone. Phishing also leads to additional business losses due to consumer fear. Anecdotal evidence suggests that an increasing number of people shy away from Internet commerce due to the threat of identity fraud, despite the tendency of U.S. companies to assume the risk for fraud. Also, many users now default to distrusting any email they receive from financial institutions.

Current phishing attacks are still relatively modest in sophistication and have substantial room for improvement, as we discuss in Section 2.2. Thus, the research community and corporations need to make a concentrated effort to combat the increasingly severe economic consequences of phishing. Unfortunately, as we discuss in Section 8, current anti-phishing techniques do not offer adequate safeguards for ordinary users. We present three main contributions in this paper. First, we propose several design principles needed to counter phishing attacks: 1) sidestep the arms race, 2) provide mutual authentication.

Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time. Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to
perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.

We advocate the following set of design principles for anti-phishing tools.

Sidestep the arms race. Many anti-phishing approaches face the same problem as anti-spam solutions: incremental solutions only provoke an ongoing arms race between researchers and adversaries. This typically gives the advantage to the attackers, since researchers are permanently stuck on the defensive. Instead, we need to research fundamental approaches for preventing phishing.

Provide mutual authentication. Most anti-phishing techniques strive to prevent phishing attacks by providing better authentication of the server. However, phishing actually exploits authentication failures on both the client and the server side. Initially, a phishing attack exploits the user's inability to properly authenticate a server before transmitting sensitive data. A second authentication failure then occurs when the server allows the phisher to use the captured data to log in as the victim. A complete anti-phishing solution must address both of these failures: clients should have strong guarantees that they are communicating with the intended recipient, and servers should have similarly strong guarantees that the client requesting service has a legitimate claim to the accounts it attempts to access.

Reduce reliance on users. The majority of current phishing countermeasures rely on users to assist in the detection of phishing sites and to decide whether to continue. However, users are in many ways unsuited to authenticating others or themselves to others. As a result, we must move towards protocols that reduce human involvement or introduce additional information that cannot readily be revealed. These mechanisms add security without relying on perfectly correct user behavior, thus bringing security to a larger audience.

Avoid dependence on the browser's interface. The majority of current anti-phishing approaches propose modifications to the browser interface. Unfortunately, the browser interface is inherently insecure and can be easily circumvented by embedded JavaScript applications that mimic the trusted browser elements.

2. LITERATURE SURVEY
Recently, there has been a dramatic increase in phishing, a kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information. Phishing is a rapidly growing problem, with 9,255 unique phishing sites reported in June of 2006 alone [1]. It is unknown precisely how much phishing costs each year since impacted industries are reluctant to release figures; estimates range from $1 billion to $2.8 billion per year. To respond to this threat, software vendors and companies have released a variety of anti-phishing toolbars. For example, eBay offers a free toolbar that can positively identify eBay-owned sites, and Google offers a free toolbar aimed at identifying any fraudulent site [2]. As of September 2006, the free software download site download.com listed 84 anti-phishing toolbars. However, when we conducted an evaluation of ten anti-phishing tools for a previous study, we found that only one tool could consistently detect more than 60% of phishing web sites without a high rate of false positives [3]. Thus, we argue that there is a strong need for better automated detection algorithms.

In this paper, we present the design, implementation, and evaluation of CANTINA, a novel content-based approach for detecting phishing web sites. CANTINA examines the content of a web page to determine whether it is legitimate or not, in contrast to other approaches that look at surface characteristics of a web page, for example the URL and its domain name. CANTINA makes use of the well-known TF-IDF (term frequency/inverse document frequency) algorithm used in information retrieval [4], and more specifically, the Robust Hyperlinks algorithm previously developed by Phelps and Wilensky for overcoming broken hyperlinks. Our results show that CANTINA is quite good at detecting phishing sites, detecting 94-97% of them.

We also show that we can use CANTINA in conjunction with heuristics used by other tools to reduce false positives (incorrectly labeling legitimate web sites as phishing), while lowering phish detection rates only slightly. We present a summary evaluation, comparing CANTINA to two popular anti-phishing toolbars that are representative of the most effective tools for detecting phishing sites currently available. Our experiments show that CANTINA has comparable or better
performance to SpoofGuard (a heuristic-based anti-phishing tool) with far fewer false positives, and does about as well as NetCraft (a blacklist and heuristic-based anti-phishing toolbar). Finally, we show that CANTINA combined with heuristics is effective at detecting phishing URLs in users' actual email, and that its most frequent mistake is labeling spam-related URLs as phishing.

A number of studies have examined the reasons that people fall for phishing attacks. For example, Downs et al. have described the results of an interview and role-playing study aimed at understanding why people fall for phishing emails and what cues they look for to avoid such attacks. In a different study, Dhamija et al. showed that a large number of people cannot differentiate between legitimate and phishing web sites, even when they are made aware that their ability to identify phishing attacks is being tested. Finally, Wu et al. studied three simulated anti-phishing toolbars to determine how effective they were at preventing users from visiting web sites the toolbars had determined to be fraudulent. They found that many study participants ignored the toolbar security indicators and instead used the site's content to decide whether or not it was a scam.

Educating People about Phishing Attacks


Anti-phishing education has focused on online training materials, testing, and situated learning. Online training materials have been published by government organizations, non-profits, and businesses. These materials explain what phishing is and provide tips to prevent users from falling for phishing attacks. Testing is used to demonstrate how susceptible people are to phishing attacks and to educate them on how to avoid them. For example, Mail Frontier has a web site containing screenshots of potential phishing emails. Users are scored based on how well they can identify which emails are legitimate and which are not. A third approach uses situated learning, where users are sent phishing emails to test their vulnerability to such attacks. At the end of the study, users are given materials that inform them about phishing attacks. This approach has been used in studies conducted by Indiana University in training students, West Point in instructing cadets, and a New York State Office in educating employees. The New York study showed that these participants improved at identifying phishing compared with those who were given a pamphlet containing the
information on how to combat phishing. In previous work, we developed an email-based approach to train people to identify and avoid phishing attacks, demonstrating that the existing practice of sending security notices is ineffective, while a story-based approach using a comic strip format was surprisingly effective in teaching people about phishing.

Anti-Phishing User Interfaces


Other research has focused on the development of better user interfaces for anti-phishing tools. Some work looks at helping users determine if they are interacting with a trusted site. For example, Ye et al. and Dhamija and Tygar have developed prototype user interfaces showing trusted paths that help users verify that their browser has made a secure connection to a trusted site. Herzberg and Gbara have developed TrustBar, a browser add-on that uses logos and warnings to help users distinguish trusted and untrusted web sites.

Other work has looked at how to facilitate logins, eliminating the need for end-users to identify whether a site is legitimate or not. For example, PwdHash transparently converts a user's password into a domain-specific password by sending only a one-way hash of the password and domain name. Thus, even if a user falls for a phishing site, the phishers would not see the correct password. The Lucent Personal Web Assistant and Password Multiplier used similar approaches to protect people. PassPet is a browser extension that makes it easier to log in to known web sites, simply by pressing a single button. PassPet requires people to memorize only one password and, like PwdHash, generates a unique password for each site. Web Wallet is a web browser extension designed to prevent users from sending personal data to a fake page. Web Wallet prevents people from typing personal information directly into a web site, instead requiring them to type a special keystroke to log into Web Wallet and then select their intended web site.

Our work in this paper is orthogonal to this previous work, in that our algorithms could be used in conjunction with better user interfaces to provide a more effective solution. As Wu and Miller demonstrated, an anti-phishing toolbar could identify all fraudulent web sites without any false positives, but if it has usability problems, users might still fall victim to fraud.
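
The domain-specific password idea mentioned above for PwdHash can be illustrated with a short sketch. This is not PwdHash's actual algorithm: the use of HMAC-SHA256, the base64 encoding, the output length, and the function name domain_password are all assumptions chosen for illustration. It only shows why a password derived from the domain is useless to a phishing site hosted on a different domain.

```python
import base64
import hashlib
import hmac


def domain_password(master_password, domain, length=12):
    """Derive a per-domain password from a master password (illustrative only)."""
    # Key the hash with the master password and hash the domain name,
    # so each domain receives a different, non-reversible value.
    digest = hmac.new(
        master_password.encode("utf-8"),
        domain.lower().encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Encode to printable characters and truncate to the requested length.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


real = domain_password("correct horse", "paypal.com")
fake = domain_password("correct horse", "paypal.example.net")
```

Because the value sent to paypal.example.net differs from the one registered with paypal.com, the phisher in this sketch never sees a password that works on the real site.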

Automated Detection of Phishing


Anti-phishing services are now provided by Internet service providers, built into mail servers and clients, built into web browsers, and available as web browser toolbars. However, these services and tools do not effectively protect against all phishing attacks, as attackers and tool developers are engaged in a continuous arms race.

Anti-phishing tools use two major methods for detecting phishing sites. The first is to use heuristics to judge whether a page has phishing characteristics. For example, some heuristics used by the SpoofGuard toolbar include checking the host name, checking the URL for common spoofing techniques, and checking against previously seen images. The second method is to use a blacklist that lists reported phishing URLs. For example, Cloudmark [5] relies on user ratings to maintain its blacklist. Some toolbars, such as Netcraft [6], seem to use a combination of heuristics plus a blacklist with URLs that are verified by paid employees.

Both methods have pros and cons. For example, heuristics can detect phishing attacks as soon as they are launched, without the need to wait for blacklists to be updated. However, attackers may be able to design their attacks to avoid heuristic detection. In addition, heuristic approaches often produce false positives (incorrectly labeling a legitimate site as phishing). Blacklists may have a higher level of accuracy, but generally require human intervention and verification, which may consume a great deal of resources. At a recent Anti-Phishing Working Group meeting, it was reported that phishers are starting to use one-time URLs, which direct someone to a phishing site the first time the URL is used, but direct people to the legitimate site afterwards. This and other new phishing tactics significantly complicate the process of compiling a blacklist, and can reduce blacklists' effectiveness.

Our work with CANTINA focuses on developing and evaluating a new heuristic based on TF-IDF, a popular information retrieval algorithm. CANTINA not only makes use of surface-level characteristics (as is done by other toolbars), but also analyzes the text-based content of a page itself. The surface-level heuristics it uses were drawn primarily from SpoofGuard and from PILFER, an algorithm for detecting phishing emails [7].


A Content-based approach for detecting phishing websites


CANTINA makes use of TF-IDF for detecting phishing sites. TF-IDF is a well-known information retrieval algorithm that can be used for comparing and classifying documents, as well as retrieving documents from a large corpus. In this section, we first review how TF-IDF works. We then introduce an application of TF-IDF called Robust Hyperlinks. Finally, we describe how we adapted Robust Hyperlinks for detecting phishing web sites.

How TF-IDF Works


TF-IDF is an algorithm often used in information retrieval and text mining. TF-IDF yields a weight that measures how important a word is to a document in a corpus. The importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus. The term frequency (TF) is simply the number of times a given term appears in a specific document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term frequency regardless of the actual importance of that term in the document), giving a measure of the importance of the term within the particular document. The inverse document frequency (IDF) is a measure of the general importance of the term. Roughly speaking, the IDF measures how rare a term is across an entire collection of documents. Thus, a term has a high TF-IDF weight when it has a high term frequency in a given document (i.e., the word is common in that document) and a low document frequency in the whole collection of documents (i.e., it is relatively uncommon in other documents) [8].
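
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by CANTINA: the tokenizer, the length-normalized TF, the logarithmic IDF log(N / df), and the three-document toy corpus are all choices made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(document, corpus):
    """Return {term: weight} for one document against a list of documents."""
    terms = tokenize(document)
    counts = Counter(terms)
    corpus_terms = [set(tokenize(doc)) for doc in corpus]
    weights = {}
    for term, count in counts.items():
        # TF: term count normalized by document length.
        tf = count / len(terms)
        # DF: number of corpus documents containing the term (at least 1,
        # so a document outside the corpus cannot cause division by zero).
        df = max(1, sum(1 for doc in corpus_terms if term in doc))
        # IDF: rarer terms across the corpus receive a larger factor.
        idf = math.log(len(corpus) / df)
        weights[term] = tf * idf
    return weights


# Toy corpus, for illustration only.
corpus = [
    "sign in to your paypal account paypal",
    "sign in to read the news",
    "your account for the news site",
]
weights = tf_idf(corpus[0], corpus)
```

In this toy corpus the brand name "paypal" appears twice in the first document and nowhere else, so it receives the highest weight, while words shared with other documents such as "sign" score lower.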

Robust Hyperlinks
Phelps and Wilensky developed the idea of Robust Hyperlinks to overcome the problem of broken links. The basic idea is to provide a number of alternative, independent descriptions of networked resources, that is, URLs. Specifically, Phelps and Wilensky proposed adding a small number of well-chosen terms, which they called a lexical signature, to URLs. An example of such a modified signature might be: When locating a web page, one could first try the basic URL. If the resource
cannot be found, one could then supply the signature terms to a search engine to locate the document whose signature most closely matches that in the robust hyperlink.

A key issue here is how to create signatures that have appropriate properties. First, signatures should be effective in picking out only a few documents. Second, subsequent changes to a document should have minimal impact on signature effectiveness. Third, the addition of new documents should have minimal impact on previous signature effectiveness. Finally, the effectiveness of the signature should be largely search-engine-independent.

To meet these requirements, Phelps and Wilensky proposed using TF-IDF to generate lexical signatures. Specifically, they proposed calculating the TF-IDF value for each word in a document, and then selecting the words with the highest values. The rationale here is that term frequency provides robustness (repeated words are less likely to all be deleted), while inverse document frequency provides rarity across a set of documents, minimizing the chance that another document will be added with the same terms. Their preliminary empirical results suggest that lexical signatures of about five terms are sufficient to determine a web resource virtually uniquely, out of the more than one billion pages on the web [9]. Their experiments also showed that searching on lexical signatures often yielded a unique document, namely the desired document. In those few cases in which more than one document is returned, the desired document is among the highest ranked.
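
Selecting a lexical signature from TF-IDF weights, as described above, amounts to a top-k selection. The sketch below assumes the weights have already been computed; the function name lexical_signature and the example weights are hypothetical and serve only as illustration.

```python
def lexical_signature(weights, size=5):
    """Return the `size` terms with the highest TF-IDF weight."""
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


# Hypothetical TF-IDF weights for one page, for illustration only.
example_weights = {
    "ebay": 0.42, "auction": 0.31, "bid": 0.27, "seller": 0.22,
    "feedback": 0.18, "sign": 0.05, "the": 0.0, "and": 0.0,
}
signature = lexical_signature(example_weights)
query = " ".join(signature)  # terms to submit to a search engine
```

Common words such as "the" have weights near zero and therefore never enter the signature, which is what keeps the resulting query specific to the page.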

Adapting TF-IDF for detecting Phishing


Our adaptation rests on two observations. The first is that criminals often create phishing sites by copying and then modifying a legitimate site's web pages so that personal information is redirected to the criminals rather than to the legitimate site. The second observation is that phishing sites often contain brand names and other terms that are common on a given web page but relatively rare across the web, leading us to hypothesize that, again, Robust Hyperlinks could be applied to find the owner of those brands [10].

Roughly, CANTINA works as follows:

1. Given a web page, calculate the TF-IDF scores of each term on that web page.
2. Generate a lexical signature by taking the five terms with the highest TF-IDF weights.
3. Feed this lexical signature to a search engine, which in our case is Google.
4. If the domain name of the current web page matches the domain name of any of the N top search results, we consider it to be a legitimate web site. Otherwise, we consider it a phishing site.
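
The final matching step can be sketched as follows. This is a simplified illustration, not CANTINA's code: the search itself is not performed (the result URLs are passed in as a list), host names are compared in full rather than by registered domain, and the names domain_of and is_legitimate as well as the value of top_n used in the example are assumptions.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host name of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_legitimate(page_url, result_urls, top_n):
    """True if the page's domain appears among the top N search results."""
    page_domain = domain_of(page_url)
    return any(domain_of(url) == page_domain for url in result_urls[:top_n])


# Hypothetical search results for a signature taken from a PayPal-like page.
results = [
    "https://www.paypal.com/signin",
    "https://en.wikipedia.org/wiki/PayPal",
]
genuine = is_legitimate("https://www.paypal.com/us/home", results, top_n=2)
spoofed = is_legitimate("http://paypal.example.net/login", results, top_n=2)
```

A copied page hosted on another domain produces the same signature as the original, so the search returns the original site and the copy fails the domain comparison.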

It is also worth pointing out that, according to the Anti-Phishing Working Group (APWG), the average time that a phishing site stays online is 4.5 days. Our experience shows that sometimes it is on the order of hours. Furthermore, we argue that phishing web pages will have a low Google PageRank due to a lack of links pointing to the scam. These two factors combined suggest that a phishing scam will rarely, if ever, be highly ranked. At the end of this paper, however, we discuss some ways of possibly subverting CANTINA.

In an earlier implementation, we discovered that TF-IDF alone yields a fair number of false positives, labeling legitimate sites as phishing. To address this problem, we also add the current domain name to the lexical signature. For example, if the page is at http://www.ebay.com/xxxxx, then we add the term eBay to the lexical signature (even if it is already there). The rationale here is that if a page is legitimate, the domain name itself is usually the term that best identifies it (e.g., ebay.com, paypal.com, bankofamerica.com). On the other hand, if the suspected page is phishing, no matter what we add to its content, Google will not return it.

Another design decision was what to do if Google returns zero search results. This sometimes happens because added domain names are sometimes meaningless (for example, u-s-j.be). To address this problem, if Google fails to return any result, we now label the suspected site as phishing (initially we labeled it as unknown). We refer to this as the Zero results Means Phishing (ZMP) heuristic. This heuristic has the potential to increase false positives (incorrectly labeling a legitimate site as phishing), but our early experiments strongly suggest that when combined with adding the domain name to the lexical signature, this approach can reduce false positives while not impacting true positives.
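
The two refinements described above, adding the domain name to the signature and treating an empty result list as phishing, might look like the sketch below. The helper names, the string labels, and the simple rule for extracting the domain term (the label before a single-label top-level domain) are assumptions made for illustration.

```python
from urllib.parse import urlparse


def domain_term(page_url):
    """Return the label before the top-level domain, e.g. 'ebay' for www.ebay.com."""
    host = (urlparse(page_url).hostname or "").lower()
    labels = host.split(".")
    # Simplification: assumes a single-label TLD such as .com or .be.
    return labels[-2] if len(labels) >= 2 else host


def build_query(signature, page_url):
    """Append the domain term to the signature, even if it is already present."""
    return " ".join(list(signature) + [domain_term(page_url)])


def classify(page_url, result_urls, zmp=True):
    """Label a page from its search results, applying ZMP when enabled."""
    if not result_urls:
        # Zero results Means Phishing; the earlier behavior was "unknown".
        return "phishing" if zmp else "unknown"
    page_host = (urlparse(page_url).hostname or "").lower()
    hosts = {(urlparse(url).hostname or "").lower() for url in result_urls}
    return "legitimate" if page_host in hosts else "phishing"


query = build_query(["ebay", "auction"], "http://www.ebay.com/xxxxx")
```

With zmp=False the function reproduces the earlier behavior of labeling a page with no search results as unknown, which makes the effect of the design decision easy to compare.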

We developed our larger set of heuristics based on related work, drawing primarily from SpoofGuard and PILFER. We implemented each heuristic to return either -1 if it looks like a phishing page or +1 otherwise. Heuristics include:

Age of Domain
This heuristic checks the age of the domain name. Many phishing sites have domains that are registered only a few days before phishing emails are sent out. We use a WHOIS search to implement this heuristic, which measures the number of months since the domain name was first registered. If the domain has been registered for longer than 12 months, the heuristic returns +1, deeming the page legitimate; otherwise it returns -1, deeming it phishing. If the WHOIS server cannot find the domain, the heuristic simply returns -1, deeming it a phishing page. The Netcraft and SpoofGuard toolbars use a similar heuristic based on the time since a domain name was registered. Note that this heuristic does not account for phishing sites based on existing web sites where criminals have broken into the web server, nor does it account for phishing sites hosted on otherwise legitimate domains, for example in space provided by an ISP for personal homepages.
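
The decision rule of this heuristic can be sketched as below. The WHOIS lookup itself is left out: the sketch assumes the registration date has already been obtained and is passed in, with None standing for a domain the WHOIS server could not find. The function names are hypothetical.

```python
from datetime import date


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def age_of_domain(registered_on, today):
    """Return +1 if registered longer than 12 months, otherwise -1."""
    if registered_on is None:
        return -1  # WHOIS found no record for the domain
    return 1 if months_between(registered_on, today) > 12 else -1


old_domain = age_of_domain(date(1995, 8, 4), date(2007, 1, 1))
new_domain = age_of_domain(date(2006, 12, 20), date(2007, 1, 1))
missing = age_of_domain(None, date(2007, 1, 1))
```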

Known Images
This heuristic checks whether a page contains well-known logos that are inconsistent with its domain. For example, if a page contains eBay logos but is not on an eBay domain, then this heuristic labels the site as a probable phishing page. Currently we store nine popular logos locally: eBay, PayPal, Citibank, Bank of America, Fifth Third Bank, Barclays Bank, ANZ Bank, Chase Bank, and Wells Fargo Bank. Eight of these nine legitimate sites are included in the PhishTank.com list of Top 10 Identified Targets. A similar heuristic is used by the SpoofGuard toolbar.

Suspicious URL
This heuristic checks if a page's URL contains an at sign (@) or a dash (-) in the domain name. An @ symbol in a URL causes the string to the left to be disregarded, with the string on the right treated as the actual URL for retrieving the page.
Combined with the limited size of the browser address bar, this makes it possible to write URLs that appear legitimate within the address bar, but actually cause the browser to retrieve a different page. This heuristic is used by Mozilla Firefox. Dashes are also rarely used by legitimate sites, so we use this as another heuristic. SpoofGuard checks for both at symbols and dashes in URLs.

Suspicious Links
This heuristic applies the URL check above to all the links on the page. If any link on a page fails this URL check, then the page is labeled as a possible phishing scam. This heuristic is also used by SpoofGuard.
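
The URL check and its application to every link on a page, as described in the two heuristics above, can be sketched with the Python standard library. The function and class names are hypothetical, only href attributes of <a> tags are treated as links, and the +1/-1 return convention follows the one stated earlier for all heuristics.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def suspicious_url(url):
    """Return -1 if the URL has an '@' in its authority or a '-' in its host."""
    parts = urlparse(url)
    host = parts.hostname or ""
    return -1 if "@" in parts.netloc or "-" in host else 1


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def suspicious_links(html):
    """Return -1 if any link on the page fails the URL check."""
    collector = LinkCollector()
    collector.feed(html)
    failed = any(suspicious_url(link) == -1 for link in collector.links)
    return -1 if failed else 1
```

Relative links such as /my-account have no host part, so a dash in the path alone does not trigger the check in this sketch.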

IP Address
This heuristic checks if a page's domain name is an IP address. This heuristic is also used in PILFER [16].

Dots in URL
This heuristic checks the number of dots in a page's URL. We found that phishing pages tend to use many dots in their URLs but legitimate sites usually do not. Currently, this heuristic labels a page as phishing if there are 5 or more dots. This heuristic is also used in PILFER [16].
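
Both URL-based checks above are simple enough to sketch directly. The function names are hypothetical, and counting dots over the whole URL string (rather than the host name only) is an assumption based on the wording of the description.

```python
import ipaddress
from urllib.parse import urlparse


def ip_address_heuristic(url):
    """Return -1 if the host part of the URL is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return 1  # the host is a name, not an IP address
    return -1


def dots_heuristic(url, threshold=5):
    """Return -1 if the URL contains `threshold` or more dots."""
    return -1 if url.count(".") >= threshold else 1


long_url = "http://www.paypal.com.secure.login.example.net/a.html"
```

Here long_url contains seven dots and is flagged, while an ordinary address such as http://www.paypal.com/ contains two and passes.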

Forms
This heuristic checks if a page contains any HTML text entry forms asking for personal data from people, such as passwords and credit card numbers. We scan the HTML for <input> tags that accept text and are accompanied by labels such as "credit card" and "password". Most phishing pages contain such forms asking for personal data; otherwise the criminals risk not getting the personal information they want.
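
A rough sketch of this scan is shown below, using the standard library HTML parser. It simplifies the description in two ways that should be read as assumptions: "accompanied by labels" is approximated by the keyword appearing anywhere in the page text, and only inputs of type text or password (or with no type attribute) count as accepting text. The names KEYWORDS, FormScanner, and forms_heuristic are hypothetical.

```python
from html.parser import HTMLParser

KEYWORDS = ("password", "credit card")  # labels named in the text above
TEXT_TYPES = {"text", "password"}  # assumption: input types that accept text


class FormScanner(HTMLParser):
    """Record text-accepting <input> tags and the page's text content."""

    def __init__(self):
        super().__init__()
        self.has_text_input = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # An <input> without a type attribute defaults to a text field.
            input_type = (dict(attrs).get("type") or "text").lower()
            if input_type in TEXT_TYPES:
                self.has_text_input = True

    def handle_data(self, data):
        self.text.append(data.lower())


def forms_heuristic(html):
    """Return -1 if the page has a text input and asks for personal data."""
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    page_text = " ".join(scanner.text)
    asks_for_data = any(keyword in page_text for keyword in KEYWORDS)
    return -1 if scanner.has_text_input and asks_for_data else 1
```

On its own this check also fires on the login pages of legitimate sites, which is one reason it is only useful in combination with the other heuristics.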

3. PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT


Objectives of the proposed system:
We enumerate the goals of an anti-phishing technique, arranged in decreasing order of protection and generality:
1. Ensure that a user's data only goes to the intended recipient.
2. Prevent a user's data from reaching an untrustworthy recipient.
3. Prevent an attacker from abusing a user's data.
4. Prevent an attacker from modifying a user's account.
5. Prevent an attacker from viewing a user's account.

1. Ensure that a user's data only goes to the intended recipient


One of the main objectives is authentication. Authentication means that only authenticated parties may send or receive the data. In our project, it must be ensured that user data is received only by the authenticated recipient.

2. Prevent a user's data from reaching an untrustworthy recipient


Along with authentication, it is equally important to ensure that user data does not reach an untrustworthy recipient. Because personal data is highly confidential, it must not fall into the hands of an untrustworthy recipient.

3. Prevent an attacker from abusing a user's data


The attacker must be prevented from exploiting the user's data, since misuse of that data can cause many problems.

4. Prevent an attacker from modifying a user's account


The attacker must be prevented from modifying the user's account, since unauthorized changes to the account can cause many problems.

5. Prevent an attacker from viewing a user's account


The attacker must be prevented from viewing the user's account. Most of the user's confidential correspondence and data is stored in the account, so an attacker who can view it gains access to all of that information.

4. PROBLEM ANALYSIS AND DESIGN

4.1. Existing System


Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time. Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.

4.2. Proposed System


We advocate the following set of design principles for anti-phishing tools. Many anti-phishing approaches face the same problem as anti-spam solutions: incremental solutions only provoke an ongoing arms race between researchers and adversaries. This typically gives the advantage to the attackers, since researchers are permanently stuck on the defensive. As soon as researchers introduce an improvement, attackers analyze it and develop a new twist on their current attacks that allows them to evade the new defenses. Instead, we need to research fundamental approaches for preventing phishing.

Most anti-phishing techniques strive to prevent phishing attacks by providing better authentication of the server. However, phishing actually exploits authentication failures on both the client and the server side. Initially, a phishing attack exploits the user's inability to properly authenticate a server before transmitting sensitive data. A second authentication failure then occurs when the server allows the phisher to use the captured data to log in as the victim. A complete anti-phishing
solution must address both of these failures: clients should have strong guarantees that they are communicating with the intended recipient, and servers should have similarly strong guarantees that the client requesting service has a legitimate claim to the accounts it attempts to access.

Reduce reliance on users. The majority of current phishing countermeasures rely on users to assist in the detection of phishing sites and to decide whether to continue. However, users are in many ways unsuited to authenticating others or themselves to others. As a result, we must move towards protocols that reduce human involvement or introduce additional information that cannot readily be revealed. These mechanisms add security without relying on perfectly correct user behavior, thus bringing security to a larger audience.

Avoid dependence on the browser's interface. The majority of current anti-phishing approaches propose modifications to the browser interface. Unfortunately, the browser interface is inherently insecure and can be easily circumvented by embedded JavaScript applications that mimic the trusted browser elements.

Software Specification:

Front end        : Java
Back end         : SQL Server
Operating System : Windows 7
IDE              : Visual Studio

Hardware Specification:

Processor  : Intel i3
System Bus : 32 bit
RAM        : 3 GB
HDD        : 40 GB
Display    : SVGA Color

4.3. MODULES

Proxy Server
A proxy server acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server and requests some service, such as a file, connection, web page, or other resource, available from a different server. The proxy server evaluates the request according to its filtering rules.
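As a rough illustration of how a proxy might evaluate a request against filtering rules, the sketch below looks a host name up in a white list and a black list and sends everything else for a further check. The class name, the Decision values and the example host names are hypothetical; the report does not show the project's actual filtering code.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: list-based filtering as a proxy might apply it.
class ProxyFilterSketch {
    enum Decision { ALLOW, BLOCK, CHECK }

    private final Set<String> whitelist;
    private final Set<String> blacklist;

    ProxyFilterSketch(Set<String> whitelist, Set<String> blacklist) {
        this.whitelist = whitelist;
        this.blacklist = blacklist;
    }

    // Known-bad hosts are blocked, known-good hosts pass,
    // and unknown hosts are handed to the anti-phishing check.
    Decision evaluate(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (blacklist.contains(h)) {
            return Decision.BLOCK;
        }
        if (whitelist.contains(h)) {
            return Decision.ALLOW;
        }
        return Decision.CHECK;
    }

    public static void main(String[] args) {
        ProxyFilterSketch filter =
                new ProxyFilterSketch(Set.of("bank.example"), Set.of("phish.example"));
        System.out.println(filter.evaluate("bank.example"));
        System.out.println(filter.evaluate("phish.example"));
        System.out.println(filter.evaluate("unknown.example"));
    }
}
```

Checking the black list first is a design choice made here so that a host present in both lists is still blocked.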

Server (Web Application)
This system consists of a server application and a client application.

4.3.1. User Management And login


This module deals with the registration and management of local users. Management covers deleting, viewing and editing the details of local users. The administrator has the privilege to delete and view users. Registration and editing of details are the privileges of local users. System authentication is achieved through the login process: only a registered user can log in, so outsiders cannot access the system. While signing in, the user must provide the username and password chosen during registration. If the username and password do not exist, the user is not registered and cannot log in. The administrator's username and password are already saved in the database. After logging in, users (administrator or local user) can change their password.

Report
This module is handled by the administrator of the system. When a user tries to access a site, the request first goes to the administrator. After receiving the URL, the administrator checks whether the site is a phishing site. If it is, the administrator alerts the user and offers the option to continue to the page or to leave it.
4.3.2. Anti-Phishing System


This is the main module of the system. Using the following techniques, the administrator can determine whether a site is a phishing site.

Age of Domain
This heuristic checks the age of the domain name. Many phishing sites have domains that are registered only a few days before phishing emails are sent out. We use a WHOIS search to implement this heuristic. It measures the number of months since the domain name was first registered. If the domain has been registered for longer than 12 months, the heuristic returns +1, deeming the page legitimate; otherwise it returns -1, deeming it phishing. If the WHOIS server cannot find the domain, the heuristic simply returns -1, deeming it a phishing page.
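The scoring rule of the age-of-domain heuristic can be sketched as follows. This is a minimal sketch, not the project's code: the WHOIS lookup itself is left out, the registration date is passed in as a parameter (null when WHOIS has no record), and the class and method names are hypothetical. Treating a domain registered exactly 12 months ago as not yet legitimate is also an assumption.

```java
import java.time.LocalDate;

// Hypothetical sketch of the age-of-domain scoring rule.
class AgeOfDomainSketch {
    // Threshold from the text: registered longer than 12 months counts as legitimate.
    static final int LEGITIMATE_AGE_MONTHS = 12;

    // Returns +1 (legitimate) or -1 (phishing).
    // registeredOn is null when the WHOIS lookup found no record for the domain.
    static int score(LocalDate registeredOn, LocalDate today) {
        if (registeredOn == null) {
            return -1;
        }
        boolean olderThanThreshold =
                registeredOn.plusMonths(LEGITIMATE_AGE_MONTHS).isBefore(today);
        return olderThanThreshold ? 1 : -1;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2011, 6, 1);
        System.out.println(score(LocalDate.of(2005, 1, 15), today)); // prints 1
        System.out.println(score(LocalDate.of(2011, 5, 20), today)); // prints -1
        System.out.println(score(null, today));                      // prints -1
    }
}
```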

Suspicious URL
This heuristic checks whether a page's URL contains an at sign (@) or a dash (-) in the domain name. An @ symbol in a URL causes the string to its left to be disregarded, with the string to its right treated as the actual URL for retrieving the page.

Suspicious Links
This heuristic applies the URL check above to all the links on the page. If any link on a page fails this URL check, the page is labeled as a possible phishing scam. This heuristic is also used by SpoofGuard.

IP Address
This heuristic checks whether a page's domain name is an IP address. This heuristic is also used in PILFER. A domain name that is an IP address indicates that the site is a phishing page.

Dots in URL
This heuristic checks the number of dots in a page's URL. We found that phishing pages tend to use many dots in their URLs but legitimate sites usually do not. Currently, this heuristic labels a page as phish if there are 5 or more dots.
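The four URL-based checks above could be sketched as below. This is an illustrative sketch under stated assumptions rather than the project's implementation: the class and method names are hypothetical, only IPv4 literals are recognised, the links of a page are assumed to be already extracted into a list, and a URL that cannot be parsed is simply treated as having no host.

```java
import java.net.URI;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the URL-based heuristics.
class UrlHeuristicsSketch {
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");
    static final int MAX_DOTS = 5; // threshold from the text: 5 or more dots

    private static URI parse(String url) {
        try {
            return URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // unparsable URL
        }
    }

    static String host(String url) {
        URI uri = parse(url);
        return (uri == null || uri.getHost() == null) ? "" : uri.getHost();
    }

    // Suspicious URL: '@' in the authority part, or '-' in the host name.
    static boolean isSuspiciousUrl(String url) {
        URI uri = parse(url);
        String authority = (uri == null || uri.getRawAuthority() == null)
                ? "" : uri.getRawAuthority();
        return authority.contains("@") || host(url).contains("-");
    }

    // Suspicious Links: any link on the page fails the URL check.
    static boolean hasSuspiciousLink(List<String> links) {
        return links.stream().anyMatch(UrlHeuristicsSketch::isSuspiciousUrl);
    }

    // IP Address: the host is a dotted IPv4 literal instead of a name.
    static boolean hasIpAddressHost(String url) {
        return IPV4.matcher(host(url)).matches();
    }

    // Dots in URL: MAX_DOTS or more dots anywhere in the URL.
    static boolean hasTooManyDots(String url) {
        return url.chars().filter(c -> c == '.').count() >= MAX_DOTS;
    }

    public static void main(String[] args) {
        System.out.println(isSuspiciousUrl("http://www.paypal.com@evil.example/login"));
        System.out.println(hasIpAddressHost("http://192.168.10.5/bank"));
        System.out.println(hasTooManyDots("http://a.b.c.d.e.example/"));
    }
}
```

Each method returns true when the page would be labeled as phishing by that heuristic; how the individual results are combined into a final verdict is not covered here.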
4.4 SYSTEM DESIGN

Figure 4.4 System Architecture

4.5 USE CASE DIAGRAM

Figure 4.5 Use Case Diagram

The above use case diagram shows how the client and the server connect to the modules. Both the client and the server are connected to all the modules, and each has separate functions for each module.

4.6. HIGH LEVEL DESIGN

Figure 4.6 High level design

4.7. TABLES

Login:

Field      Data type     Constraints
Uname      Varchar(50)   Not_Null
Password   Varchar(50)   Not_Null
Status     Int           Not_Null
Usertype   Varchar(50)   Not_Null

Fig: 4.7.1. Login table
This table is used to log in to the homepage.

Register:

Field      Data type     Constraints
FName      Varchar(50)   Not_Null
LName      Varchar(50)   Not_Null
ID         Int           Not_Null

Fig: 4.7.2. Registration table
This table is used for registration.

URL category:

Field      Data type     Constraints
url        Varchar(50)   Not_Null
Ip adres   Int           Not_Null
ID         Int           Not_Null

Fig: 4.7.3. URL Category table
This table is used to store the IP address of the system and the URL. Based on this, the sites accessed are categorized into the gray list and the white list.
4.8. ER-DIAGRAM

Figure 4.8 ER-Diagram

The ER diagram contains three entities: login, register and url. Each entity has its own attributes.

4.9. DATA FLOW DIAGRAM

Figure 4.9.Data flow diagram

IMPLEMENTATION

5. SOURCE CODES
5.1. Login form

<center>
<div>
<form action="../LoginServlet" method="post">
  <table border="0">
    <tr>
      <th colspan="2">Login</th>
    </tr>
    <%-- "f" carries the result of the previous attempt: 1 = login failed, 2 = account blocked --%>
    <%
        String f = request.getParameter("f");
        if (f == null) {
            f = "";
        }
    %>
    <tr>
      <td colspan="2">&nbsp;
        <% if (f.equals("1")) { %>
        <font color="red">Login Failed.</font>
        <% } else if (f.equals("2")) { %>
        <font color="red">Blocked Account.</font>
        <% } %>
      </td>
    </tr>
    <tr>
      <td>UserName:</td>
      <td><input type="text" name="username"/></td>
    </tr>
    <tr>
      <td>Password:</td>
      <td><input type="password" name="password"/></td>
    </tr>
    <tr>
      <td colspan="2" style="text-align: center;"><input type="submit" value="Login"/></td>
    </tr>
  </table>
</form>
</div>
</center>
</li>
</ul>
</div>
<!-- end #sidebar -->
<div style="clear: both;">&nbsp;</div>
</div>
<!-- end #page -->
<div id="footer">
  <p>Copyright (c) 2011 www.antiphishing.com.</p>
</div>
<!-- end #footer -->
</body>
</html>

5.2. Registration

<%
    // Values from a previous, rejected submission are kept in the session
    // so the form can be shown again with the fields already filled in.
    String firstName = "";
    String lastName = "";
    String userName = "";
    String Email = "";
    String Mobile = "";
    String Address = "";
    Object firstNameObj = session.getAttribute("fname");
    Object lastNameObj = session.getAttribute("lname");
    Object userNameObj = session.getAttribute("uname");
    Object EmailObj = session.getAttribute("email");
    Object MobileObj = session.getAttribute("mobile");
    Object AddressObj = session.getAttribute("address");
    if (firstNameObj != null) { firstName = firstNameObj.toString(); }
    if (lastNameObj != null) { lastName = lastNameObj.toString(); }
    if (userNameObj != null) { userName = userNameObj.toString(); }
    if (EmailObj != null) { Email = EmailObj.toString(); }
    if (MobileObj != null) { Mobile = MobileObj.toString(); }
    if (AddressObj != null) { Address = AddressObj.toString(); }
%>
<form action="../RegistrationActionServlet" method="post">
  <table border="0">
    <tr>
      <th colspan="2"><h3><b>Registration</b></h3></th>
    </tr>
    <%-- "f" carries the validation result: 1 = missing fields, 2 = password mismatch, 3 = username taken --%>
    <%
        String f = request.getParameter("f");
        if (f == null) {
            f = "";
        }
    %>
    <tr>
      <td colspan="2">&nbsp;
        <% if (f.equals("1")) { %>
        <font color="red">Enter All Fields!!!!</font>
        <% } else if (f.equals("2")) { %>
        <font color="red">Password and confirm Password should be same!!!</font>
        <% } else if (f.equals("3")) { %>
        <font color="red">Username not available!!!</font>
        <% } %>
      </td>
    </tr>
    <tr>
      <td>First Name</td>
      <td><input type="text" name="firstname" style="width: 170px;" value="<%=firstName%>"/></td>
    </tr>
    <tr>
      <td>Last Name</td>
      <td><input type="text" name="lastname" style="width: 170px;" value="<%=lastName%>"/></td>
    </tr>
    <tr>
      <td>Login Name</td>
      <td><input type="text" name="loginname" style="width: 170px;" value="<%=userName%>"/></td>
    </tr>
    <tr>
      <td>Password</td>
      <td><input type="password" name="password" style="width: 170px;"/></td>
    </tr>
    <tr>
      <td>Confirm Password</td>
      <td><input type="password" name="confirmpassword" style="width: 170px;"/></td>
    </tr>
    <tr>
      <td>E-Mail</td>
      <td><input type="text" name="email" style="width: 170px;" value="<%=Email%>"/></td>
    </tr>
    <tr>
      <td>Mobile</td>
      <td><input type="text" name="mobile" style="width: 170px;" value="<%=Mobile%>"/></td>
    </tr>
    <tr>
      <td>Address</td>
      <%-- A textarea has no value attribute; its content goes between the tags. --%>
      <td><textarea name="address" style="width: 170px;" rows="5"><%=Address%></textarea></td>
    </tr>
    <tr>
      <td colspan="2" style="text-align: center;"><input type="submit" value="Save"/></td>
    </tr>
  </table>
</form>
</div>
</center>
</div>
</div>
</div><!-- end #content -->
<div id="sidebar"> </div>
<!-- end #sidebar -->
<div style="clear: both;">&nbsp;</div>
</div>
<!-- end #page -->
<div id="footer">
  <p>Copyright (c) 2011 www.antiphishing.com.</p>
</div>
<!-- end #footer -->
</body>
</html>

5.3. Client application

import java.awt.Image;
import java.awt.MenuItem;
import java.awt.PopupMenu;
import java.awt.SystemTray;
import java.awt.Toolkit;
import java.awt.TrayIcon;
import java.awt.event.ActionListener;

// Puts the client application into the system tray and shows balloon messages.
public class SystemTraySupport {
    private String icon = "icon/icon.png";
    private String applicationName;
    private PopupMenu popup;
    private TrayIcon trayIcon;

    public SystemTraySupport(String applicationName) {
        this.applicationName = applicationName;
        popup = new PopupMenu();
        initSystemTray();
    }

    // Note: the tray icon is created in the constructor, so values set here
    // do not change an icon that is already showing.
    public void setPopup(PopupMenu popup) {
        this.popup = popup;
    }

    public void setIcon(String icon) {
        this.icon = icon;
    }

    // Adds an entry to the tray icon's popup menu.
    public void addMenuListener(String menuItemName, ActionListener listener) {
        MenuItem menuItem = new MenuItem(menuItemName);
        menuItem.addActionListener(listener);
        popup.add(menuItem);
    }

    public void showInfoMessage(String message) {
        display(message, TrayIcon.MessageType.INFO);
    }

    public void showErrorMessage(String message) {
        display(message, TrayIcon.MessageType.ERROR);
    }

    public void showMessage(String message) {
        display(message, TrayIcon.MessageType.NONE);
    }

    public void showWarningMessage(String message) {
        display(message, TrayIcon.MessageType.WARNING);
    }

    // trayIcon stays null when the system tray is not supported.
    private void display(String message, TrayIcon.MessageType type) {
        if (trayIcon != null) {
            trayIcon.displayMessage(applicationName, message, type);
        }
    }

    private void initSystemTray() {
        if (SystemTray.isSupported()) {
            SystemTray tray = SystemTray.getSystemTray();
            Image image = Toolkit.getDefaultToolkit().getImage(icon);
            trayIcon = new TrayIcon(image, applicationName + " Running...", popup);
            trayIcon.setImageAutoSize(true);
            try {
                tray.add(trayIcon);
            } catch (Exception e) {
                System.err.println("TrayIcon could not be added.");
            }
        } else {
            System.out.println("System Tray is not supported");
        }
    }
}

5.4. Server application

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Holds the gateway and web server addresses and persists them as an XML properties file.
public class NetworkProperties {
    private final String DIRECTORY = "setting";
    private final String FILE_NAME = "network_config.xml";
    private static NetworkProperties networkProperties;
    private String internetGatewayIP;
    private int internetGatewayPort;
    private String webServerIP;
    private int webServerPort;

    public NetworkProperties() {
        // Create a blank configuration file when none can be loaded.
        if (!loadPropertiesFromXMLFile()) {
            createBlankPropertyXMLFile();
        }
    }

    // Lazily created shared instance.
    public static NetworkProperties getInstance() {
        if (networkProperties == null) {
            networkProperties = new NetworkProperties();
        }
        return networkProperties;
    }

    public String getInternetGatewayIP() { return internetGatewayIP; }
    public void setInternetGatewayIP(String internetGatewayIP) { this.internetGatewayIP = internetGatewayIP; }
    public int getInternetGatewayPort() { return internetGatewayPort; }
    public void setInternetGatewayPort(int internetGatewayPort) { this.internetGatewayPort = internetGatewayPort; }
    public String getWebServerIP() { return webServerIP; }
    public void setWebServerIP(String webServerIP) { this.webServerIP = webServerIP; }
    public int getWebServerPort() { return webServerPort; }
    public void setWebServerPort(int webServerPort) { this.webServerPort = webServerPort; }

    // Writes the current values to setting/network_config.xml.
    public void save() {
        Properties properties = new Properties();
        properties.setProperty("InternetGatewayIP", internetGatewayIP);
        properties.setProperty("InternetGatewayPort", Integer.toString(internetGatewayPort));
        properties.setProperty("WebServerIP", webServerIP);
        properties.setProperty("WebServerPort", Integer.toString(webServerPort));
        File directory = new File(DIRECTORY);
        if (!directory.exists()) {
            directory.mkdirs();
        }
        // try-with-resources closes the stream even when writing fails.
        try (FileOutputStream out = new FileOutputStream(new File(directory, FILE_NAME))) {
            properties.storeToXML(out, null);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Returns false when the file is missing or cannot be read.
    private boolean loadPropertiesFromXMLFile() {
        try (FileInputStream in = new FileInputStream(new File(DIRECTORY, FILE_NAME))) {
            Properties properties = new Properties();
            properties.loadFromXML(in);
            internetGatewayIP = properties.getProperty("InternetGatewayIP").trim();
            internetGatewayPort = Integer.parseInt(properties.getProperty("InternetGatewayPort").trim());
            webServerPort = Integer.parseInt(properties.getProperty("WebServerPort").trim());
            webServerIP = properties.getProperty("WebServerIP").trim();
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    private void createBlankPropertyXMLFile() {
        internetGatewayIP = "0.0.0.0";
        internetGatewayPort = 0;
        webServerPort = 0;
        webServerIP = "0.0.0.0";
        save();
    }

    public static void main(String[] args) {
        NetworkProperties pro = NetworkProperties.getInstance();
        System.out.println(pro.getInternetGatewayPort());
        System.out.println(pro.getWebServerPort());
        System.out.println(pro.getInternetGatewayIP());
        System.out.println(pro.getWebServerIP());
        pro.save();
    }
}

TESTING

6. TESTING
Testing plays a critical role in quality assurance and in the reliability of the software. It is the first step in finding errors in the program. Different levels of testing are used in the testing process, and each level aims to test a different aspect of the software. The levels used in this system are:

Unit testing
Integration testing

6.1 UNIT TESTING

Unit testing is the first level of testing. It is a software verification and validation method in which a programmer tests whether individual units of source code are fit for use. Unit testing is performed by the programmer before the individual units or modules are integrated into a larger system. The testing was carried out during the coding stage itself, and each module was found to work satisfactorily with regard to its expected output.

6.2 INTEGRATION TESTING

In this testing we test the project as a whole, combining all the modules. Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with the interfacing between modules. The individual units are combined with other units to make sure that the necessary communication, links and data sharing occur properly.

6.1.1 TEST CASE ADMIN LOGIN

Tc No           : 1
Unit            : Login form
Input           : Username and password of the administrator.
Expected output : Checks the database and redirects to the admin home. Else shows error message box.
Output obtained : Error message
Status          : Failure
Reason          : Incorrect username or password
Remedy          : Enter the correct username and password

Tc No           : 2
Unit            : Login form
Input           : Username and password
Expected output : Checks the database and redirects to the admin home. Else shows error message box.
Output obtained : Redirected to the admin home
Status          : Success

Table: 6.1.1: Admin Login

6.1.2 TEST CASE CANTINA WEB SIDE

Tc No           : 1
Unit            : Login form
Input           : No input required.
Expected output : Redirects to the blind home page within ten seconds. Else remains in the login form.
Output obtained : No redirection.
Status          : Failure
Reason          : Code error.
Remedy          : Verify and correct the code.

Tc No           : 2
Unit            : Login form
Input           : No input required.
Expected output : Redirects to the blind home page within ten seconds. Else remains in the login form.
Output obtained : Redirected to the blind home page.
Status          : Success

Table: 6.1.2: Cantina Web side

6.2 INTEGRATION TESTING

6.2.1. TEST CASE CLIENT APPLICATION

Tc No           : 1
Unit            : Download access
Input           : Enter the username and password.
Expected output : The contents pertaining to the option are made available to the user.
Output obtained : No further navigation.
Status          : Failure
Reason          : The user has been blocked by administrator.
Remedy          : Unblock the user.

Tc No           : 2
Unit            : File Setup
Input           : Click the link to download the software.
Expected output : The software should be downloaded successfully.
Output obtained : Jar file could not be downloaded.
Status          : Failure
Reason          : Setup fail error.
Remedy          : Proper setup of jar file.

Table: 6.2.1: Client Application

7. FUTURE SCOPE
In the coming months, the Anti-Phishing Working Group will be working with technology partners, government regulators, and leaders in industries being victimized by phishing attacks to sculpt the most efficient and elegant digital signing and authentication solution that can be employed by large user communities. We are sending calls to action to all stakeholders in the hope that, by year end, a consensus will be formed among them as to how the affected industries can establish a digital signing and authentication regime that can be deployed posthaste to end phishing attacks, preclude regulatory adventurism and, ultimately, establish a technological foundation for a broader spam-proof electronic mailing infrastructure.

CONCLUSIONS

8. CONCLUSIONS
CANTINA is a novel content-based approach for detecting phishing web sites. CANTINA takes Robust Hyperlinks, an idea for overcoming page-not-found problems using the well-known Term Frequency / Inverse Document Frequency (TF-IDF) algorithm, and applies it to anti-phishing. We described our implementation of CANTINA and discussed some simple heuristics that can be applied to reduce false positives. We also presented an evaluation of CANTINA, showing that the pure TF-IDF approach can catch about 97% of phishing sites with about 6% false positives, and that after combining some simple heuristics we are able to catch about 90% of phishing sites with only 1% false positives.
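To make the TF-IDF step concrete, the sketch below scores the terms of one page against a small collection of documents and keeps the highest-scoring terms. It is only an illustration under stated assumptions: the class name, the tiny in-memory corpus and the plain log(N / df) form of the inverse document frequency are choices made here, and pages are assumed to be already split into lower-case words.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of TF-IDF term ranking.
class TfIdfSketch {
    // Term frequency: share of the document's words equal to the term.
    static double tf(List<String> doc, String term) {
        long count = doc.stream().filter(term::equals).count();
        return doc.isEmpty() ? 0.0 : (double) count / doc.size();
    }

    // Inverse document frequency: log(N / df); 0 when no document has the term.
    static double idf(List<List<String>> corpus, String term) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    static double tfIdf(List<String> doc, List<List<String>> corpus, String term) {
        return tf(doc, term) * idf(corpus, term);
    }

    // The k distinct terms of the document with the highest TF-IDF score.
    static List<String> topTerms(List<String> doc, List<List<String>> corpus, int k) {
        return new LinkedHashSet<String>(doc).stream()
                .sorted(Comparator.comparingDouble(
                        (String t) -> tfIdf(doc, corpus, t)).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> page = List.of("paypal", "paypal", "login", "the");
        List<List<String>> corpus = List.of(
                page,
                List.of("the", "weather", "today"),
                List.of("the", "news", "today"));
        System.out.println(topTerms(page, corpus, 2)); // prints [paypal, login]
    }
}
```

A word such as "the" that occurs in every document gets a score of zero, so the terms that survive are the ones that characterise the page.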

SCREENSHOTS

Login Form

Admin Home Page

Users List

Whitelist

Blacklist

REFERENCES

[1] 3Sharp, 3Sharp Study finds Internet Explorer 7 Edges Out Netcraft As Most Accurate for Anti-Phishing Protection. 2006. http://www.3sharp.com/projects/antiphishing/

[2] Anti-Phishing Working Group, Phishing Activity Trends Report. 2006. http://www.antiphishing.org/reports/apwg_report_june_06.pdf

[3] Anti-Phishing Working Group (APWG). Visited: Nov 20, 2006. http://www.antiphishing.org/

[4] Chou, N., R. Ledesma, Y. Teraguchi, D. Boneh, and J.C. Mitchell. Client-Side Defense against Web-Based Identity Theft. In Proceedings of The 11th Annual Network and Distributed System Security Symposium (NDSS '04). http://crypto.stanford.edu/SpoofGuard/webspoof.pdf

[5] Cloudmark Inc. Visited: Nov 20, 06. http://www.cloudmark.com/desktop/download/

[6] Robert T. Morris, A Weakness in the 4.2BSD UNIX TCP/IP Software. Computing Science Technical Report 117, AT&T Bell Laboratories, February 1985.

[7] Rod Rasmussen, Phishing Prevention: Making Yourself a Hard Target. Internet Identity / APWG (April 5, 2004).

[8] Blake Ross, Nick Miyake, Robert Ledesma, Dan Boneh and John C. Mitchell, A Simple Solution to the Unique Password Problem.

[9] Z. Ye. Building Trusted Paths for Web Browsers. Master's Thesis. Department of Computer Science, Dartmouth College. May 2002.

[10] Zishuang (Eileen) Ye and Sean W. Smith, Trusted Paths for Browsers, 11th Usenix Security Symposium, August 2002.