Cantina Antiphishing

Semester: VIII
Branch: I.T.
Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology Arakkunnam 682 313
Page No.1
1. INTRODUCTION
In phishing, an automated form of social engineering, criminals use the Internet to fraudulently extract sensitive information from businesses and individuals, often by impersonating legitimate web sites. The potential for high rewards (e.g., through access to bank accounts and credit card numbers), the ease of sending forged email messages impersonating legitimate authorities, and the difficulty law enforcement has in pursuing the criminals have resulted in a surge of phishing attacks: estimates suggest that phishing affected 1.2 million U.S. citizens and cost businesses billions of dollars in 2004 alone. Phishing also leads to additional business losses due to consumer fear. Anecdotal evidence suggests that an increasing number of people shy away from Internet commerce due to the threat of identity fraud, despite the tendency of US companies to assume the risk for fraud. Also, many users now default to distrusting any email they receive from financial institutions. Current phishing attacks are still relatively modest in sophistication and have substantial room for improvement. Thus, the research community and corporations need to make a concentrated effort to combat the increasingly severe economic consequences of phishing. Unfortunately, current anti-phishing techniques do not offer adequate safeguards for ordinary users. We propose several design principles needed to counter phishing attacks: 1) sidestep the arms race, 2) provide mutual authentication, 3) reduce reliance on users, and 4) avoid dependence on the browser's interface. Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time.
Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to
perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.
We advocate the following set of design principles for anti-phishing tools.
Sidestep the arms race. Many anti-phishing approaches face the same problem as anti-spam solutions: incremental solutions only provoke an ongoing arms race between researchers and adversaries. This typically gives the advantage to the attackers, since researchers are permanently stuck on the defensive. Instead, we need to research fundamental approaches for preventing phishing.
Provide mutual authentication. Most anti-phishing techniques strive to prevent phishing attacks by providing better authentication of the server. However, phishing actually exploits authentication failures on both the client and the server side. Initially, a phishing attack exploits the user's inability to properly authenticate a server before transmitting sensitive data. However, a second authentication failure occurs when the server allows the phisher to use the captured data to log in as the victim. A complete anti-phishing solution must address both of these failures: clients should have strong guarantees that they are communicating with the intended recipient, and servers should have similarly strong guarantees that the client requesting service has a legitimate claim to the accounts it attempts to access.
Reduce reliance on users. The majority of current phishing countermeasures rely on users to assist in the detection of phishing sites and to decide whether to continue. However, humans are in many ways unsuited to authenticating others or authenticating themselves to others. As a result, we must move towards protocols that reduce human involvement or introduce additional information that cannot readily be revealed. These mechanisms add security without relying on perfectly correct user behavior, thus bringing security to a larger audience.
Avoid dependence on the browser's interface. The majority of current anti-phishing approaches propose modifications to the browser interface. Unfortunately, the browser interface is inherently insecure and can be easily circumvented by embedded JavaScript applications that mimic the trusted browser elements.
2. LITERATURE SURVEY
Recently, there has been a dramatic increase in phishing, a kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information. Phishing is a rapidly growing problem, with 9,255 unique phishing sites reported in June of 2006 alone [1]. It is unknown precisely how much phishing costs each year since impacted industries are reluctant to release figures; estimates range from $1 billion to $2.8 billion per year. To respond to this threat, software vendors and companies have released a variety of anti-phishing toolbars. For example, eBay offers a free toolbar that can positively identify eBay-owned sites, and Google offers a free toolbar aimed at identifying any fraudulent site [2]. As of September 2006, the free software download site download.com listed 84 anti-phishing toolbars. However, when we conducted an evaluation of ten anti-phishing tools for a previous study, we found that only one tool could consistently detect more than 60% of phishing web sites without a high rate of false positives [3]. Thus, we argue that there is a strong need for better automated detection algorithms. In this paper, we present the design, implementation, and evaluation of CANTINA, a novel content-based approach for detecting phishing web sites. CANTINA examines the content of a web page to determine whether it is legitimate or not, in contrast to other approaches that look at surface characteristics of a web page, for example the URL and its domain name. CANTINA makes use of the well-known TF-IDF (term frequency/inverse document frequency) algorithm used in information retrieval [4], and more specifically, the Robust Hyperlinks algorithm previously developed by Phelps and Wilensky for overcoming broken hyperlinks. Our results show that CANTINA is quite good at detecting phishing sites, detecting 94-97% of phishing sites.
We also show that we can use CANTINA in conjunction with heuristics used by other tools to reduce false positives (incorrectly labeling legitimate web sites as phishing), while lowering phish detection rates only slightly. We present a summary evaluation, comparing CANTINA to two popular anti-phishing toolbars that are representative of the most effective tools for detecting phishing sites currently available. Our experiments show that CANTINA has comparable or better
performance to SpoofGuard (a heuristic-based anti-phishing tool) with far fewer false positives, and does about as well as Netcraft (a blacklist and heuristic-based anti-phishing toolbar). Finally, we show that CANTINA combined with heuristics is effective at detecting phishing URLs in users' actual email, and that its most frequent mistake is labeling spam-related URLs as phishing. A number of studies have examined the reasons that people fall for phishing attacks. For example, Downs et al. have described the results of an interview and role-playing study aimed at understanding why people fall for phishing emails and what cues they look for to avoid such attacks. In a different study, Dhamija et al. showed that a large number of people cannot differentiate between legitimate and phishing web sites, even when they are made aware that their ability to identify phishing attacks is being tested. Finally, Wu et al. studied three simulated anti-phishing toolbars to determine how effective they were at preventing users from visiting web sites the toolbars had determined to be fraudulent. They found that many study participants ignored the toolbar security indicators and instead used the site's content to decide whether or not it was a scam.
Educating People about Phishing Attacks

Anti-phishing education has focused on online training materials, testing, and situated learning. Online training materials have been published by government organizations, non-profits, and businesses. These materials explain what phishing is and provide tips to prevent users from falling for phishing attacks. Testing is used to demonstrate how susceptible people are to phishing attacks and to educate them on how to avoid them. For example, Mail Frontier has a web site containing screenshots of potential phishing emails; users are scored based on how well they can identify which emails are legitimate and which are not. A third approach uses situated learning, where users are sent phishing emails to test their vulnerability to attacks. At the end of the study, users are given materials that inform them about phishing attacks. This approach has been used in studies conducted by Indiana University in training students, West Point in instructing cadets, and a New York State Office in educating employees. The New York study showed an improvement in participants' behavior in identifying phishing over those who were given a pamphlet containing the
information on how to combat phishing. In previous work, we developed an email-based approach to train people how to identify and avoid phishing attacks, demonstrating that the existing practice of sending security notices is ineffective, while a story-based approach using a comic strip format was surprisingly effective in teaching people about phishing.
Anti-Phishing User Interfaces

Other research has focused on the development of better user interfaces for anti-phishing tools. Some work looks at helping users determine if they are interacting with a trusted site. For example, Ye et al. and Dhamija and Tygar have developed prototype user interfaces showing trusted paths that help users verify that their browser has made a secure connection to a trusted site. Herzberg and Gbara have developed TrustBar, a browser add-on that uses logos and warnings to help users distinguish trusted and untrusted web sites. Other work has looked at how to facilitate logins, eliminating the need for end-users to identify whether a site is legitimate or not. For example, PwdHash transparently converts a user's password into a domain-specific password by sending only a one-way hash of the password and the domain name. Thus, even if a user falls for a phishing site, the phishers would not see the correct password. The Lucent Personal Web Assistant and Password Multiplier used similar approaches to protect people. PassPet is a browser extension that makes it easier to log in to known web sites, simply by pressing a single button. PassPet requires people to memorize only one password and, like PwdHash, generates a unique password for each site. Web Wallet is a web browser extension designed to prevent users from sending personal data to fake pages. Web Wallet prevents people from typing personal information directly into a web site, instead requiring them to type a special keystroke to log into Web Wallet and then select their intended web site. Our work in this paper is orthogonal to this previous work, in that our algorithms could be used in conjunction with better user interfaces to provide a more effective solution. As Wu and Miller demonstrated, even if an anti-phishing toolbar could identify all fraudulent web sites without any false positives, users might still fall victim to fraud if the toolbar has usability problems.
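The idea behind a domain-specific password can be illustrated with a small Java sketch. This is a simplified illustration, not PwdHash's actual algorithm or output encoding: it assumes HMAC-SHA256 keyed with the master password over the lowercased domain name, with the result Base64-encoded and truncated to 16 characters. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical PwdHash-style sketch; HMAC-SHA256 and the 16-character output are assumptions.
final class DomainPassword {
    private DomainPassword() {}

    static String derive(String masterPassword, String domain) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            // The master password is the key and the domain is the message,
            // so every domain yields a different site password.
            mac.init(new SecretKeySpec(masterPassword.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(domain.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 without padding, truncated to a typical password length.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest).substring(0, 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }
}
```

Because the derived value depends on the domain, a phishing site on a different domain receives a string that is useless at the legitimate site, which is the property the paragraph above describes.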
Automated Detection of Phishing

Anti-phishing services are now provided by Internet service providers, built into mail servers and clients, built into web browsers, and available as web browser toolbars. However, these services and tools do not effectively protect against all phishing attacks, as attackers and tool developers are engaged in a continuous arms race. Anti-phishing tools use two major methods for detecting phishing sites. The first is to use heuristics to judge whether a page has phishing characteristics. For example, some heuristics used by the SpoofGuard toolbar include checking the host name, checking the URL for common spoofing techniques, and checking against previously seen images. The second method is to use a blacklist that lists reported phishing URLs. For example, Cloudmark [5] relies on user ratings to maintain its blacklist. Some toolbars, such as Netcraft [6], seem to use a combination of heuristics plus a blacklist with URLs that are verified by paid employees. Both methods have pros and cons. For example, heuristics can detect phishing attacks as soon as they are launched, without the need to wait for blacklists to be updated. However, attackers may be able to design their attacks to avoid heuristic detection. In addition, heuristic approaches often produce false positives (incorrectly labeling a legitimate site as phishing). Blacklists may have a higher level of accuracy, but generally require human intervention and verification, which may consume a great deal of resources. At a recent Anti-Phishing Working Group meeting, it was reported that phishers are starting to use one-time URLs, which direct someone to a phishing site the first time the URL is used, but direct people to the legitimate site afterwards. This and other new phishing tactics significantly complicate the process of compiling a blacklist, and can reduce blacklists' effectiveness.
Our work with CANTINA focuses on developing and evaluating a new heuristic based on TF-IDF, a popular information retrieval algorithm. CANTINA not only makes use of surface-level characteristics (as is done by other toolbars), but also analyzes the text-based content of a page itself. The surface-level heuristics were drawn primarily from SpoofGuard and from PILFER, an algorithm for detecting phishing emails [7].
A Content-based approach for detecting phishing websites

CANTINA makes use of TF-IDF for detecting phishing sites. TF-IDF is a well-known information retrieval algorithm that can be used for comparing and classifying documents, as well as retrieving documents from a large corpus. In this section, we first review how TF-IDF works. We then introduce an application of TF-IDF called Robust Hyperlinks. Finally, we describe how we adapted Robust Hyperlinks for detecting phishing web sites.
How TF-IDF works

TF-IDF is an algorithm often used in information retrieval and text mining. TF-IDF yields a weight that measures how important a word is to a document in a corpus. The importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus. The term frequency (TF) is simply the number of times a given term appears in a specific document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term frequency regardless of the actual importance of that term in the document) to give a measure of the importance of the term within the particular document. The inverse document frequency (IDF) is a measure of the general importance of the term. Roughly speaking, the IDF measures how rare a term is across an entire collection of documents: the more documents contain the term, the lower its IDF. Thus, a term has a high TF-IDF weight by having a high term frequency in a given document (i.e. the word is common in that document) and a low document frequency in the whole collection of documents (i.e. it is relatively uncommon in other documents) [8].
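The weighting described above can be sketched in Java. The sketch assumes one common variant among several: term frequency normalized by document length, tf(t, d) = count(t, d) / |d|, and a smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))), where N is the number of corpus documents and df(t) the number of them containing t. The tokenizer (lowercase, split on anything that is not a letter or digit) and the class name are illustrative assumptions, not CANTINA's documented choices.

```java
import java.util.*;

// Illustrative TF-IDF sketch: length-normalized TF and smoothed IDF (assumed variant).
final class TfIdf {
    private TfIdf() {}

    // Lowercases the text and splits on any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns a TF-IDF weight for every distinct term of `document`, using `corpus` for document frequencies.
    static Map<String, Double> weights(String document, List<String> corpus) {
        List<String> tokens = tokenize(document);
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);

        // The set of distinct terms of each corpus document, for df(t).
        List<Set<String>> corpusTerms = new ArrayList<>();
        for (String doc : corpus) corpusTerms.add(new HashSet<>(tokenize(doc)));
        int n = corpusTerms.size();

        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double tf = (double) e.getValue() / tokens.size();
            long df = corpusTerms.stream().filter(s -> s.contains(e.getKey())).count();
            double idf = Math.log((1.0 + n) / (1.0 + df)); // 0 when the term occurs in every corpus document
            result.put(e.getKey(), tf * idf);
        }
        return result;
    }
}
```

For example, with the three-document corpus "ebay buy sell", "news sport weather", "buy cheap now", the page "eBay eBay login buy" gives "ebay" and "login" equal weights of 0.5 ln 2, while "buy", which appears in two corpus documents, scores lower.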
Robust Hyperlinks
Phelps and Wilensky developed the idea of Robust Hyperlinks to overcome the problem of broken links. The basic idea is to provide a number of alternative, independent descriptions of networked resources, that is, URLs. Specifically, Phelps and Wilensky proposed adding a small number of well-chosen terms, which they called a lexical signature, to URLs. An example of such a modified signature might be: When locating a web page, one could first try the basic URL. If the resource
cannot be found, one could then supply the signature terms to a search engine to locate the document whose signature most closely matches that in the robust hyperlink. A key issue here is how to create signatures that have appropriate properties. First, signatures should be effective in picking out a small number of documents. Second, subsequent changes to a document should have minimal impact on signature effectiveness. Third, the addition of new documents should have minimal impact on previous signature effectiveness. Finally, the effectiveness of the signature should be largely search-engine-independent. To meet these requirements, Phelps and Wilensky proposed using TF-IDF to generate lexical signatures. Specifically, they proposed calculating the TF-IDF value for each word in a document, and then selecting the words with the highest values. The rationale here is that term frequency provides robustness (repeated words are less likely to all be deleted), while inverse document frequency provides rarity across a set of documents, minimizing the chance that another document will be added with the same terms. Their preliminary empirical results suggest that lexical signatures of about five terms are sufficient to determine a web resource virtually uniquely, out of the more than one billion pages on the web [9]. Their experiments also showed that searching on lexical signatures often yielded a unique document, namely the desired document. In those few cases in which more than one document is returned, the desired document is among the highest ranked.
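Selecting a lexical signature from TF-IDF weights, and attaching it to a URL in the manner of a robust hyperlink, can be sketched as follows. The weight map is assumed to come from a TF-IDF computation such as the one described in the previous subsection; the query-parameter name `lexical-signature`, the `+` separator, and the alphabetical tie-breaking are illustrative assumptions, and the class name is hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical helper: top-k TF-IDF terms as a lexical signature.
final class LexicalSignature {
    private LexicalSignature() {}

    // Returns the k terms with the highest weight; ties are broken alphabetically for determinism.
    static List<String> topTerms(Map<String, Double> weights, int k) {
        Comparator<Map.Entry<String, Double>> byWeightDesc = (a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        return weights.entrySet().stream()
                .sorted(byWeightDesc)
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Appends the signature to a URL as a query parameter (assumed format).
    static String robustHyperlink(String url, List<String> signature) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "lexical-signature=" + String.join("+", signature);
    }
}
```

With k = 5, this matches the "about five terms" that Phelps and Wilensky found sufficient; if the URL later breaks, the same terms can be submitted to a search engine.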
Adapting TF-IDF for detecting Phishing

Our adaptation is based on two observations. The first is that criminals often create phishing sites by copying and then modifying a legitimate site's web pages so that personal information is redirected to the criminals rather than to the legitimate site. The second observation is that phishing sites often contain brand names and other terms that are common on a given web page but relatively rare across the web, leading us to hypothesize that Robust Hyperlinks could be applied to find the owner of those brands [10]. Roughly, CANTINA works as follows: Given a web page, calculate the TF-IDF scores of each term on that web page.
Generate a lexical signature by taking the five terms with highest TF-IDF weights. Feed this lexical signature to a search engine, which in our case is Google. If the domain name of the current web page matches the domain name of the N top search results, we consider it to be a legitimate web site. Otherwise, we consider it a phishing site.
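The final decision step can be sketched as below. The search engine is hidden behind a hypothetical `SearchEngine` interface (no particular Google API is assumed). "Matches the domain name of the N top search results" is interpreted here as the page's host matching the host of at least one of the top N results, and hosts are compared after stripping a leading "www."; both are simplifying assumptions.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

final class CantinaCheck {
    // Hypothetical abstraction: returns result URLs for a query, best first.
    interface SearchEngine {
        List<String> search(List<String> terms);
    }

    private CantinaCheck() {}

    // Lowercased host with a leading "www." removed; null if the URL has no host.
    static String normalizedHost(String url) {
        String host = URI.create(url).getHost();
        if (host == null) return null;
        host = host.toLowerCase(Locale.ROOT);
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    // Legitimate if the page's host appears among the top n results for the lexical signature.
    static boolean isLegitimate(String pageUrl, List<String> signature, SearchEngine engine, int n) {
        String pageHost = normalizedHost(pageUrl);
        List<String> results = engine.search(signature);
        for (String r : results.subList(0, Math.min(n, results.size()))) {
            if (pageHost != null && pageHost.equals(normalizedHost(r))) return true;
        }
        return false; // otherwise treated as a phishing site
    }
}
```

In a test or demo, the interface can be implemented with a lambda that returns canned results, so the decision logic can be checked without any network access.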
It is also worth pointing out that, according to the Anti-Phishing Working Group (APWG), the average time that a phishing site stays online is 4.5 days. Our experience shows that sometimes it is on the order of hours. Furthermore, we argue that phishing web pages will have a low Google PageRank due to a lack of links pointing to the scam. These two factors combined suggest that a phishing scam will rarely, if ever, be highly ranked. At the end of this paper, however, we discuss some ways of possibly subverting CANTINA. In an earlier implementation, we discovered that TF-IDF alone yields a fair number of false positives, labeling legitimate sites as phishing. To address this problem, we also add the current domain name to the lexical signature. For example, if the page is at http://www.ebay.com/xxxxx, then we add the term eBay to the lexical signature (even if it is already there). The rationale here is that if a page is legitimate, the domain name itself usually identifies it best (e.g., ebay.com, paypal.com, bankofamerica.com). On the other hand, if the suspected page is phishing, no matter what we add to its content, Google will not return it. Another design decision was what to do if Google returns zero search results. This sometimes happens because the added domain names are sometimes meaningless (for example, u-s-j.be). To address this problem, if Google fails to return any result, we now label the suspected site as phishing (initially we labeled it as unknown). We refer to this as the Zero results Means Phishing (ZMP) heuristic. This heuristic has the potential to increase false positives (incorrectly labeling a legitimate site as phishing), but our early experiments strongly suggest that when combined with adding the domain name to the lexical signature, this approach can reduce false positives while not impacting true positives.
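The two refinements in this paragraph, adding the domain name to the lexical signature and the ZMP rule, can be sketched together. The search function is a hypothetical stand-in for the real search engine. Taking the label just before the top-level domain as the added term (for example `ebay` from `www.ebay.com`) is a naive assumption that ignores suffixes such as `.co.uk`. The result uses the -1 (phishing) / +1 (legitimate) convention of the other heuristics.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class CantinaWithZmp {
    private CantinaWithZmp() {}

    // Label just before the TLD, e.g. "ebay.com" -> "ebay" (naive, assumed rule).
    static String domainTerm(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        return labels.length >= 2 ? labels[labels.length - 2] : labels[0];
    }

    // Lowercased host without a leading "www."; empty string if the URL has no host.
    static String hostOf(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase(Locale.ROOT).replaceFirst("^www\\.", "");
    }

    // search: hypothetical function from query terms to result URLs, best first.
    static int classify(String pageUrl, List<String> signature,
                        Function<List<String>, List<String>> search, int n) {
        String pageHost = hostOf(pageUrl);
        List<String> query = new ArrayList<>(signature);
        query.add(domainTerm(pageHost));      // added even if the term is already in the signature
        List<String> results = search.apply(query);
        if (results.isEmpty()) return -1;     // ZMP: zero results means phishing
        for (String r : results.subList(0, Math.min(n, results.size()))) {
            if (!pageHost.isEmpty() && pageHost.equals(hostOf(r))) return 1;
        }
        return -1;
    }
}
```

For a page on `u-s-j.be`, the added term `u-s-j` is unlikely to produce any results, so the ZMP rule labels the page as phishing instead of leaving it unknown.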
We developed our larger set of heuristics based on related work, drawing primarily from SpoofGuard and PILFER. We implemented each heuristic to return either -1 if it looks like a phishing page or +1 otherwise. Heuristics include:
Age of Domain
This heuristic checks the age of the domain name. Many phishing sites have domains that are registered only a few days before phishing emails are sent out. We use a WHOIS search to implement this heuristic. This heuristic measures the number of months since the domain name was first registered. If the domain has been registered for longer than 12 months, the heuristic returns +1, deeming the page legitimate; otherwise it returns -1, deeming it phishing. If the WHOIS server cannot find the domain, the heuristic simply returns -1, deeming it a phishing page. The Netcraft and SpoofGuard toolbars use a similar heuristic based on the time since a domain name was registered. Note that this heuristic does not account for phishing sites based on existing web sites where criminals have broken into the web server, nor does it account for phishing sites hosted on otherwise legitimate domains, for example in space provided by an ISP for personal homepages.
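The decision rule can be sketched in Java. The WHOIS lookup itself is not shown: the sketch assumes the registration date has already been retrieved and is passed as an `Optional<LocalDate>`, which is empty when the WHOIS server cannot find the domain. The class name is hypothetical.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical Age of Domain heuristic; the WHOIS query is assumed to happen elsewhere.
final class DomainAge {
    private DomainAge() {}

    // +1 (legitimate) if the domain was registered more than 12 months before `today`, otherwise -1.
    static int score(Optional<LocalDate> registered, LocalDate today) {
        if (!registered.isPresent()) return -1;   // WHOIS could not find the domain
        return registered.get().plusMonths(12).isBefore(today) ? 1 : -1;
    }
}
```

Passing `today` as a parameter, rather than calling `LocalDate.now()` inside, keeps the rule deterministic and easy to test.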
Known Images
This heuristic checks whether a page contains inconsistent well-known logos. For example, if a page contains eBay logos but is not on an eBay domain, then this heuristic labels the site as a probable phishing page. Currently we store nine popular logos locally, including eBay, PayPal, Citibank, Bank of America, Fifth Third Bank, Barclays Bank, ANZ Bank, Chase Bank, and Wells Fargo Bank. Eight of these nine legitimate sites are included in the PhishTank.com list of Top 10 Identified Targets. A similar heuristic is used by the SpoofGuard toolbar.
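A minimal sketch of this check, assuming logos are matched by an exact SHA-256 hash of the image bytes and that each stored logo is mapped to the domain that legitimately owns it. Exact hashing is a simplification: a resized or re-encoded logo would not match. The class name and the subdomain rule (the page host must equal the owner domain or end with "." plus it) are illustrative assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical Known Images heuristic using exact-hash logo matching.
final class KnownImages {
    // Logo hash -> domain that owns the logo, e.g. hash(eBay logo) -> "ebay.com".
    private final Map<String, String> logoOwners = new HashMap<>();

    static String sha256(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addLogo(byte[] logo, String ownerDomain) {
        logoOwners.put(sha256(logo), ownerDomain.toLowerCase(Locale.ROOT));
    }

    // -1 if any page image is a known logo whose owner domain does not match the page host, otherwise +1.
    int score(String pageHost, List<byte[]> pageImages) {
        String host = pageHost.toLowerCase(Locale.ROOT);
        for (byte[] img : pageImages) {
            String owner = logoOwners.get(sha256(img));
            if (owner != null && !(host.equals(owner) || host.endsWith("." + owner))) return -1;
        }
        return 1;
    }
}
```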
Suspicious URL
This heuristic checks if a page's URL contains an at (@) or a dash (-) in the domain name. An @ symbol in a URL causes the string to the left to be disregarded, with the string on the right treated as the actual URL for retrieving the page.
Combined with the limited size of the browser address bar, this makes it possible to write URLs that appear legitimate within the address bar but actually cause the browser to retrieve a different page. This heuristic is used by Mozilla Firefox. Dashes are also rarely used by legitimate sites, so we use this as another heuristic. SpoofGuard checks for both at symbols and dashes in URLs.
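The check can be sketched with `java.net.URI`, which parses the text before an `@` in the authority as user information, so the real host is the part after it. Treating an unparseable URL as suspicious is an assumption of this sketch, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical Suspicious URL heuristic.
final class SuspiciousUrl {
    private SuspiciousUrl() {}

    // -1 if the URL's authority contains '@' or its host contains '-', otherwise +1.
    static int score(String url) {
        try {
            URI uri = new URI(url);
            String authority = uri.getRawAuthority();
            String host = uri.getHost();
            if (authority != null && authority.contains("@")) return -1; // the part before '@' is not the host
            if (host != null && host.contains("-")) return -1;          // dashes are rare in legitimate domains
            return 1;
        } catch (URISyntaxException e) {
            return -1; // assumption: an unparseable URL is treated as suspicious
        }
    }
}
```

For `http://www.ebay.com@203.0.113.5/login`, the address bar shows "www.ebay.com" first, but the browser actually contacts 203.0.113.5.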
Suspicious Links
This heuristic applies the URL check above to all the links on the page. If any link on a page fails this URL check, then the page is labeled as a possible phishing scam. This heuristic is also used by SpoofGuard.
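Applying the URL check to every link can be sketched as follows. A regular expression over `href` attributes is used as a rough stand-in for a real HTML parser, and malformed links are skipped rather than flagged; both choices, and the class name, are assumptions of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Suspicious Links heuristic.
final class SuspiciousLinks {
    private SuspiciousLinks() {}

    // Matches href="..." or href='...'.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Same rule as the Suspicious URL check: '@' in the authority or '-' in the host.
    static boolean urlLooksSuspicious(String url) {
        try {
            URI uri = new URI(url.trim());
            String authority = uri.getRawAuthority();
            String host = uri.getHost();
            return (authority != null && authority.contains("@")) || (host != null && host.contains("-"));
        } catch (URISyntaxException e) {
            return false; // assumption: malformed links are skipped
        }
    }

    // -1 if any link on the page fails the URL check, otherwise +1.
    static int score(String html) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            if (urlLooksSuspicious(m.group(1))) return -1;
        }
        return 1;
    }
}
```

Relative links such as `/help` and `mailto:` links have no network authority, so they pass the check.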
IP Address
This heuristic checks if a page's domain name is an IP address. This heuristic is also used in PILFER [16].
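A sketch of the IP address check, assuming that dotted-quad IPv4 hosts (octets 0-255) and bracketed IPv6 literals both count as IP addresses. Other encodings, such as decimal or hexadecimal IPs, are not handled. The class name is hypothetical.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical IP Address heuristic.
final class IpAddressHeuristic {
    private IpAddressHeuristic() {}

    private static final Pattern IPV4 =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // True for dotted-quad IPv4 hosts with octets 0-255, or IPv6 literals.
    static boolean isIpLiteral(String host) {
        if (host.startsWith("[")) return true; // URI.getHost() keeps the brackets around IPv6 literals
        Matcher m = IPV4.matcher(host);
        if (!m.matches()) return false;
        for (int i = 1; i <= 4; i++) {
            if (Integer.parseInt(m.group(i)) > 255) return false;
        }
        return true;
    }

    // -1 if the page's host is an IP address, otherwise +1.
    static int score(String url) {
        String host = URI.create(url).getHost();
        return host != null && isIpLiteral(host) ? -1 : 1;
    }
}
```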
Dots in URL
This heuristic checks the number of dots in a page's URL. We found that phishing pages tend to use many dots in their URLs but legitimate sites usually do not. Currently, this heuristic labels a page as phishing if there are 5 or more dots. This heuristic is also used in PILFER [16].
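The rule is simple enough to state directly in code. This sketch counts dots across the whole URL string, path included. That is one reading of "dots in a page's URL"; counting only the host would be another. The class name is hypothetical.

```java
// Hypothetical Dots in URL heuristic; counts '.' over the entire URL string (an assumption).
final class DotsInUrl {
    private DotsInUrl() {}

    static final int THRESHOLD = 5; // "5 or more dots"

    // -1 if the URL contains THRESHOLD or more dots, otherwise +1.
    static int score(String url) {
        long dots = url.chars().filter(c -> c == '.').count();
        return dots >= THRESHOLD ? -1 : 1;
    }
}
```

For example, `http://www.paypal.com.secure.login.example.com/` has six dots and is flagged, while `https://www.ebay.com/index.html` has three and passes.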
Forms
This heuristic checks if a page contains any HTML text entry forms asking for personal data, such as passwords and credit card numbers. We scan the HTML for <input> tags that accept text and are accompanied by labels such as "credit card" and "password". Most phishing pages contain such forms asking for personal data; otherwise the criminals risk not getting the personal information they want.
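A rough sketch of the Forms heuristic follows. It flags a page that has at least one text-accepting `<input>` and also mentions one of the sensitive labels anywhere in its HTML. Using regular expressions instead of an HTML parser, the set of text-accepting input types, and the page-wide label search (rather than matching labels to specific fields) are all simplifying assumptions. The class name is hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Forms heuristic.
final class FormsHeuristic {
    private FormsHeuristic() {}

    private static final Pattern INPUT_TAG = Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TYPE_ATTR =
            Pattern.compile("\\stype\\s*=\\s*[\"']?([a-z-]+)", Pattern.CASE_INSENSITIVE);
    // Input types that accept typed text; a missing type attribute defaults to "text" in HTML.
    private static final Set<String> TEXT_TYPES = Set.of("text", "password", "email", "tel", "number");
    // Labels named in the text; a real deployment would need a longer list.
    private static final List<String> SENSITIVE_LABELS = List.of("password", "credit card");

    // -1 if the page has a text input and mentions a sensitive label, otherwise +1.
    static int score(String html) {
        boolean hasTextInput = false;
        Matcher tags = INPUT_TAG.matcher(html);
        while (!hasTextInput && tags.find()) {
            Matcher type = TYPE_ATTR.matcher(tags.group());
            String t = type.find() ? type.group(1).toLowerCase(Locale.ROOT) : "text";
            hasTextInput = TEXT_TYPES.contains(t);
        }
        if (!hasTextInput) return 1;
        String lower = html.toLowerCase(Locale.ROOT);
        for (String label : SENSITIVE_LABELS) {
            if (lower.contains(label)) return -1;
        }
        return 1;
    }
}
```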
3. PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT

Objectives of the proposed system:
We enumerate the goals of an anti-phishing technique, arranged in decreasing order of protection and generality:
1. Ensure that a user's data only goes to the intended recipient.
2. Prevent a user's data from reaching an untrustworthy recipient.
3. Prevent an attacker from abusing a user's data.
4. Prevent an attacker from modifying a user's account.
5. Prevent an attacker from viewing a user's account.
1. Ensure that a user's data only goes to the intended recipient

One of the main objectives is authentication: only authenticated users should be able to send or receive data. In our project, this means ensuring that the user's data is received only by the authenticated recipient.
2. Prevent a user's data from reaching an untrustworthy recipient

Along with authentication, it is equally important to ensure that the user's data does not reach an untrustworthy recipient. Since personal data is highly confidential, it must not fall into the hands of an untrustworthy recipient.
3. Prevent an attacker from abusing a user's data

The attacker must be prevented from exploiting the user's data, since misuse of that data can cause many problems.
4. Prevent an attacker from modifying a user's account

The attacker must be prevented from modifying the user's account, since unauthorized changes to the account can cause many problems.
5. Prevent an attacker from viewing a user's account

The attacker must be prevented from viewing the user's account. Most of the user's confidential messages and data are stored in his or her account, so exposing them can cause many problems.
4. PROBLEM ANALYSIS AND DESIGN
4.1. Existing System

Phishing attacks succeed by exploiting a user's inability to distinguish legitimate sites from spoofed sites. Most prior research focuses on assisting the user in making this distinction; however, users must make the right security decision every time. Unfortunately, humans are ill-suited for performing the security checks necessary for secure site identification, and a single mistake may result in a total compromise of the user's online account. Fundamentally, users should be authenticated using information that they cannot readily reveal to malicious parties. Placing less reliance on the user during the authentication process will enhance security and eliminate many forms of fraud. We propose using a trusted device to perform mutual authentication that eliminates reliance on perfect user behavior, thwarts Man-in-the-Middle attacks after setup, and protects a user's account even in the presence of keyloggers and most forms of spyware.
4.2. Proposed System

We advocate the following set of design principles for anti-phishing tools.
Sidestep the arms race. Many anti-phishing approaches face the same problem as anti-spam solutions: incremental solutions only provoke an ongoing arms race between researchers and adversaries. This typically gives the advantage to the attackers, since researchers are permanently stuck on the defensive. As soon as researchers introduce an improvement, attackers analyze it and develop a new twist on their current attacks that allows them to evade the new defenses. Instead, we need to research fundamental approaches for preventing phishing.
Provide mutual authentication. Most anti-phishing techniques strive to prevent phishing attacks by providing better authentication of the server. However, phishing actually exploits authentication failures on both the client and the server side. Initially, a phishing attack exploits the user's inability to properly authenticate a server before transmitting sensitive data. However, a second authentication failure occurs when the server allows the phisher to use the captured data to log in as the victim. A complete anti-phishing
solution must address both of these failures: clients should have strong guarantees that they are communicating with the intended recipient, and servers should have similarly strong guarantees that the client requesting service has a legitimate claim to the accounts it attempts to access.
Reduce reliance on users. The majority of current phishing countermeasures rely on users to assist in the detection of phishing sites and to decide whether to continue. However, humans are in many ways unsuited to authenticating others or authenticating themselves to others. As a result, we must move towards protocols that reduce human involvement or introduce additional information that cannot readily be revealed. These mechanisms add security without relying on perfectly correct user behavior, thus bringing security to a larger audience.
Avoid dependence on the browser's interface. The majority of current anti-phishing approaches propose modifications to the browser interface. Unfortunately, the browser interface is inherently insecure and can be easily circumvented by embedded JavaScript applications that mimic the trusted browser elements.
Software Specification:
Front end : Java
Back end : SQL Server
Operating System : Windows 7
IDE : Visual Studio
Hardware Specification:
Processor : Intel i3
System Bus : 32 bit
RAM : 3 GB
HDD : 40 GB
Display : SVGA Color
4.3. MODULES
Proxy Server
A proxy server is a server that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource, available from a different server. The proxy server evaluates the request according to its filtering rules.
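The proxy's evaluation step can be sketched as a list of named filtering rules applied to each requested URL. The rule names, the `Verdict` type, and the choice to return the names of all failed rules are illustrative assumptions; listing the failed rules lets the user be warned and offered the continue/discontinue choice described for the Report module. Networking and request forwarding are not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical rule-based evaluation step of the proxy.
final class ProxyFilter {
    // Outcome for one request: allowed, or a warning listing the rules that fired.
    static final class Verdict {
        final boolean allowed;
        final List<String> failedRules;

        Verdict(boolean allowed, List<String> failedRules) {
            this.allowed = allowed;
            this.failedRules = failedRules;
        }
    }

    // Rule name -> predicate that returns true when the URL looks like phishing (insertion order kept).
    private final Map<String, Predicate<String>> rules = new LinkedHashMap<>();

    ProxyFilter addRule(String name, Predicate<String> flagsPhishing) {
        rules.put(name, flagsPhishing);
        return this;
    }

    Verdict evaluate(String url) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> rule : rules.entrySet()) {
            if (rule.getValue().test(url)) failed.add(rule.getKey());
        }
        return new Verdict(failed.isEmpty(), failed);
    }
}
```

Each heuristic from the Anti-Phishing System module could be registered as one rule, for example `filter.addRule("dots", u -> u.chars().filter(c -> c == '.').count() >= 5)`.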
Server (Web Application)
The system consists of a server application and a client application.
4.3.1. User Management and Login

This module deals with the registration and management of local users. Management includes deleting, viewing, and editing the details of local users. The administrator has the privilege of deleting and viewing users, while registering and editing their own details are privileges of local users. System authentication is achieved through the login process: only registered users can log in to the system, so outsiders cannot access it. When signing in, users must provide the username and password they chose during registration. If the username and password do not exist, the user cannot log in, which means the user is not registered. The administrator's username and password are already saved in the database. After logging in, users (both the administrator and local users) can change their passwords.
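The registration, login, password-change, and delete behavior described above can be sketched in Java. The sketch keeps accounts in an in-memory map instead of the SQL Server back end, and stores salted PBKDF2 hashes rather than plain passwords. Both are assumptions, since the text does not say how passwords are stored, and so are the class name and the iteration count.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical in-memory user store; a real deployment would persist to the database.
final class UserStore {
    private static final int ITERATIONS = 100_000;
    private static final int KEY_BITS = 256;

    // Salt and derived hash for one account; the plain password is never stored.
    private static final class Credential {
        final byte[] salt;
        final byte[] hash;
        Credential(byte[] salt, byte[] hash) { this.salt = salt; this.hash = hash; }
    }

    private final Map<String, Credential> accounts = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    private static byte[] derive(String password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        }
    }

    // Registration fails if the username is already taken.
    boolean register(String username, String password) {
        if (accounts.containsKey(username)) return false;
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        accounts.put(username, new Credential(salt, derive(password, salt)));
        return true;
    }

    // Login succeeds only for a registered username with the matching password.
    boolean login(String username, String password) {
        Credential c = accounts.get(username);
        if (c == null) return false;                                     // user is not registered
        return MessageDigest.isEqual(c.hash, derive(password, c.salt));  // constant-time comparison
    }

    // Changing a password requires the current password.
    boolean changePassword(String username, String oldPassword, String newPassword) {
        if (!login(username, oldPassword)) return false;
        accounts.remove(username);
        return register(username, newPassword);
    }

    // Administrator operation: delete a local user.
    boolean delete(String username) {
        return accounts.remove(username) != null;
    }
}
```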
Report
This module is handled by the administrator of the system. When a user tries to access (browse) a site, the request is first received by the administrator. After getting the URL, the administrator checks whether the site is a phishing site. If the site is found to be a phishing site, the user is alerted and given the option to continue to the page or to leave it.
4.3.2. Anti-Phishing System

This is the main module of the system. Using the following techniques, the administrator can determine whether a site is a phishing site.
Age of Domain
This heuristic checks the age of the domain name. Many phishing sites use domains that were registered only a few days before the phishing emails were sent out. We use a WHOIS search to implement this heuristic, measuring the number of months since the domain name was first registered. If the domain has been registered for longer than 12 months, the heuristic returns +1, deeming the page legitimate; otherwise it returns -1, deeming it phishing. If the WHOIS server cannot find the domain, the heuristic also returns -1, deeming the page phishing.
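The scoring rule of the Age of Domain heuristic can be sketched as follows. This minimal sketch assumes that the WHOIS creation date has already been retrieved and parsed; the lookup itself is not shown. `DomainAgeHeuristic` and `score` are hypothetical names. An empty `Optional` represents a domain that the WHOIS server could not find.

```java
import java.time.LocalDate;
import java.util.Optional;

public class DomainAgeHeuristic {

    static final int LEGITIMATE = 1;
    static final int PHISHING = -1;
    static final long MIN_AGE_MONTHS = 12;

    // registrationDate: creation date from the WHOIS record (lookup not shown),
    // or empty if the WHOIS server could not find the domain
    public static int score(Optional<LocalDate> registrationDate, LocalDate today) {
        if (registrationDate.isEmpty()) {
            return PHISHING; // domain unknown to WHOIS
        }
        // Legitimate only if registered for longer than 12 months before today
        boolean oldEnough = registrationDate.get().plusMonths(MIN_AGE_MONTHS).isBefore(today);
        return oldEnough ? LEGITIMATE : PHISHING;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2011, 3, 1);
        System.out.println(score(Optional.of(LocalDate.of(2005, 6, 15)), today)); // 1
        System.out.println(score(Optional.of(LocalDate.of(2011, 2, 20)), today)); // -1
        System.out.println(score(Optional.empty(), today));                       // -1
    }
}
```

The source says "longer than 12 months", so in this sketch a domain registered exactly 12 months ago still scores -1.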
Suspicious URL
This heuristic checks whether a page's URL contains an at sign (@), or a dash (-) in the domain name. An @ symbol in a URL causes the string to its left to be disregarded; the string to its right is treated as the actual URL for retrieving the page.
Suspicious Links
This heuristic applies the URL check above to all the links on the page. If any link on the page fails this check, the page is labeled as a possible phishing scam. This heuristic is also used by SpoofGuard.
IP Address
This heuristic checks whether a page's domain name is an IP address. This heuristic is also used in PILFER. If the domain name is an IP address, the site is labeled as a phishing page.
Dots in URL
This heuristic checks the number of dots in a page's URL. We found that phishing pages tend to use many dots in their URLs, whereas legitimate sites usually do not. Currently, this heuristic labels a page as phishing if its URL contains 5 or more dots.
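The four URL-based heuristics above can be sketched with the standard `java.net.URI` parser and a regular expression. This is a sketch under assumptions, not the project's implementation. The class and method names are hypothetical, and each method returns `true` when the heuristic flags the page as possible phishing. The sketch reads the Suspicious URL rule as "an @ anywhere in the URL, or a dash in the host name". The IP check recognizes only dotted-decimal IPv4 hosts.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.regex.Pattern;

public class UrlHeuristics {

    // Dotted-decimal IPv4 host such as 192.168.10.5 (approximate; IPv6 not handled)
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");
    private static final int MAX_DOTS = 5;

    // Host part of the URL, or "" if the URL cannot be parsed or has no host
    static String host(String url) {
        try {
            String h = new URI(url).getHost();
            return h == null ? "" : h;
        } catch (URISyntaxException e) {
            return "";
        }
    }

    // Suspicious URL: an '@' anywhere in the URL, or a '-' in the host name
    static boolean suspiciousUrl(String url) {
        return url.contains("@") || host(url).contains("-");
    }

    // Suspicious Links: any link on the page fails the suspicious-URL check
    static boolean suspiciousLinks(List<String> links) {
        return links.stream().anyMatch(UrlHeuristics::suspiciousUrl);
    }

    // IP Address: the host is a literal IPv4 address
    static boolean ipAddress(String url) {
        return IPV4.matcher(host(url)).matches();
    }

    // Dots in URL: 5 or more dots anywhere in the URL
    static boolean tooManyDots(String url) {
        return url.chars().filter(c -> c == '.').count() >= MAX_DOTS;
    }

    public static void main(String[] args) {
        String phish = "http://www.paypal.com@192.168.10.5/login.php";
        System.out.println(suspiciousUrl(phish)); // true
        System.out.println(ipAddress(phish));     // true
        System.out.println(tooManyDots(phish));   // true (6 dots)
        String normal = "http://www.example.com/index.html";
        System.out.println(suspiciousUrl(normal) || ipAddress(normal) || tooManyDots(normal)); // false
        System.out.println(suspiciousLinks(List.of(normal, "http://secure-login.example.net/"))); // true
    }
}
```

In the example URL, `java.net.URI` treats `www.paypal.com` as user information and `192.168.10.5` as the host. This is exactly the @ trick that the Suspicious URL heuristic targets.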
4.4 SYSTEM DESIGN
Figure 4.4 System Architecture
4.5 USE CASE DIAGRAM
Figure 4.5 Use Case Diagram
The use case diagram above shows how the client and the server connect to the modules. Both the client and the server are connected to every module, and each has its own functions within each module.
4.6. HIGH LEVEL DESIGN
Figure 4.6 High level design
4.7. TABLES
Login:
Field      Data type     Constraints
Uname      Varchar(50)   Not Null
Password   Varchar(50)   Not Null
Status     Int           Not Null
Usertype   Varchar(50)   Not Null
Fig: 4.7.1. Login table. This table is used to log in to the home page.
Register:
Field        Data type     Constraints
FName        Varchar(50)   Not Null
LName        Varchar(50)   Not Null
ID           Int           Not Null
Fig: 4.7.2. Registration table. This table is used for registration.
URL category:
Field        Data type     Constraints
url          Varchar(50)   Not Null
IP address   Int           Not Null
ID           Int           Not Null
Fig: 4.7.3. URL Category table. This table stores the URL and the IP address of the system. Based on this table, the sites accessed are categorized into the gray list and the white list.
4.8. ER-DIAGRAM
Figure 4.8 ER-Diagram
The ER diagram contains three entities: login, register and url. Each entity has its own attributes.
4.9. DATA FLOW DIAGRAM
Figure 4.9 Data flow diagram
IMPLEMENTATION
5. SOURCE CODES
5.1. Login form

<center>
  <div>
    <form action="../LoginServlet" method="post">
      <table border="0">
        <tr>
          <th colspan="2">Login</th>
        </tr>
        <%
          // "f" carries the failure code passed back after a failed login
          String f = request.getParameter("f");
          if (f == null) {
              f = "";
          }
        %>
        <tr>
          <td colspan="2">
            <% if (f.equals("1")) { %>
              <font color="red">Login Failed.</font>
            <% } else if (f.equals("2")) { %>
              <font color="red">Blocked Account.</font>
            <% } %>
          </td>
        </tr>
        <tr>
          <td>UserName:</td>
          <td><input type="text" name="username"/></td>
        </tr>
        <tr>
          <td>Password:</td>
          <td><input type="password" name="password"/></td>
        </tr>
        <tr>
          <td colspan="2" style="text-align: center;"><input type="submit" value="Login"/></td>
        </tr>
      </table>
    </form>
  </div>
</center>
</li>
</ul>
</div>
<div style="clear: both;"> </div>
</div>
<div id="footer">
  <p>Copyright (c) 2011 www.antiphishing.com.</p>
</div>
</body>
</html>
5.2. Registration

<%
    // Restore previously entered values from the session so that the user
    // does not have to retype them after a validation error
    String firstName = "";
    String lastName = "";
    String userName = "";
    String Email = "";
    String Mobile = "";
    String Address = "";
    Object firstNameObj = session.getAttribute("fname");
    Object lastNameObj = session.getAttribute("lname");
    Object userNameObj = session.getAttribute("uname");
    Object EmailObj = session.getAttribute("email");
    Object MobileObj = session.getAttribute("mobile");
    Object AddressObj = session.getAttribute("address");
    if (firstNameObj != null) { firstName = firstNameObj.toString(); }
    if (lastNameObj != null) { lastName = lastNameObj.toString(); }
    if (userNameObj != null) { userName = userNameObj.toString(); }
    if (EmailObj != null) { Email = EmailObj.toString(); }
    if (MobileObj != null) { Mobile = MobileObj.toString(); }
    if (AddressObj != null) { Address = AddressObj.toString(); }
%>
<form action="../RegistrationActionServlet" method="post">
  <table border="0">
    <tr>
      <th colspan="2"><h3><b>Registration</b></h3></th>
    </tr>
    <%
      // "f" carries the validation error code passed back by the servlet
      String f = request.getParameter("f");
      if (f == null) {
          f = "";
      }
    %>
    <tr>
      <td colspan="2">
        <% if (f.equals("1")) { %>
          <font color="red">Enter All Fields!!!!</font>
        <% } else if (f.equals("2")) { %>
          <font color="red">Password and confirm Password should be same!!!</font>
        <% } else if (f.equals("3")) { %>
          <font color="red">Username not available!!!</font>
        <% } %>
      </td>
    </tr>
    <tr>
      <td>First Name</td>
      <td><input type="text" name="firstname" style="width: 170px;" value="<%=firstName%>"/></td>
    </tr>
    <tr>
      <td>Last Name</td>
      <td><input type="text" name="lastname" style="width: 170px;" value="<%=lastName%>"/></td>
    </tr>
    <tr>
      <td>Login Name</td>
      <td><input type="text" name="loginname" style="width: 170px;" value="<%=userName%>"/></td>
    </tr>
    <tr>
      <td>Password</td>
      <td><input type="password" name="password" style="width: 170px;"/></td>
    </tr>
    <tr>
      <td>Confirm Password</td>
      <td><input type="password" name="confirmpassword" style="width: 170px;"/></td>
    </tr>
    <tr>
      <td>E-Mail</td>
      <td><input type="text" name="email" style="width: 170px;" value="<%=Email%>"/></td>
    </tr>
    <tr>
      <td>Mobile</td>
      <td><input type="text" name="mobile" style="width: 170px;" value="<%=Mobile%>"/></td>
    </tr>
    <tr>
      <td>Address</td>
      <%-- textarea has no value attribute; its initial text goes between the tags --%>
      <td><textarea name="address" style="width: 170px;" rows="5"><%=Address%></textarea></td>
    </tr>
    <tr>
      <td colspan="2" style="text-align: center;"><input type="submit" value="Save"/></td>
    </tr>
  </table>
</form>
</div>
</center>
</div>
</div>
</div>
<div id="sidebar"> </div>
<div style="clear: both;"> </div>
</div>
<div id="footer">
  <p>Copyright (c) 2011 www.antiphishing.com.</p>
</div>
</body>
</html>

5.3. Client application
import java.awt.AWTException;
import java.awt.Image;
import java.awt.MenuItem;
import java.awt.PopupMenu;
import java.awt.SystemTray;
import java.awt.Toolkit;
import java.awt.TrayIcon;
import java.awt.event.ActionListener;

// Shows the client application as a system tray icon with a popup menu
// and balloon notifications
public class SystemTraySupport {

    private String icon = "icon/icon.png";
    private String applicationName;
    private PopupMenu popup;
    private TrayIcon trayIcon;

    // Note: the tray icon is built in the constructor, so calling setPopup or
    // setIcon afterwards does not change the icon that is already displayed
    public void setPopup(PopupMenu popup) {
        this.popup = popup;
    }

    public void setIcon(String icon) {
        this.icon = icon;
    }

    public SystemTraySupport(String applicationName) {
        this.applicationName = applicationName;
        popup = new PopupMenu();
        initSystemTray();
    }

    // Adds an item with the given label and click handler to the tray menu
    public void addMenuListener(String menuItemName, ActionListener listener) {
        MenuItem menuItem = new MenuItem(menuItemName);
        menuItem.addActionListener(listener);
        popup.add(menuItem);
    }

    public void showInfoMessage(String message) {
        display(message, TrayIcon.MessageType.INFO);
    }

    public void showErrorMessage(String message) {
        display(message, TrayIcon.MessageType.ERROR);
    }

    public void showMessage(String message) {
        display(message, TrayIcon.MessageType.NONE);
    }

    public void showWarningMessage(String message) {
        display(message, TrayIcon.MessageType.WARNING);
    }

    // Shows a balloon notification; does nothing if the tray icon was not created
    private void display(String message, TrayIcon.MessageType type) {
        if (trayIcon != null) {
            trayIcon.displayMessage(applicationName, message, type);
        }
    }

    private void initSystemTray() {
        if (SystemTray.isSupported()) {
            SystemTray tray = SystemTray.getSystemTray();
            Image image = Toolkit.getDefaultToolkit().getImage(icon);
            trayIcon = new TrayIcon(image, applicationName + " Running...", popup);
            trayIcon.setImageAutoSize(true);
            try {
                tray.add(trayIcon);
            } catch (AWTException e) {
                System.err.println("TrayIcon could not be added.");
            }
        } else {
            System.out.println("System Tray is not supported");
        }
    }
}
5.4. Server application

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Loads and saves the Internet gateway and web server addresses in setting/network_config.xml
public class NetworkProperties {

    private final String DIRECTORY = "setting";
    private final String FILE_NAME = "network_config.xml";
    private static NetworkProperties networkProperties;
    private String internetGatewayIP;
    private int internetGatewayPort;
    private String webServerIP;
    private int webServerPort;

    public String getInternetGatewayIP() { return internetGatewayIP; }
    public void setInternetGatewayIP(String internetGatewayIP) { this.internetGatewayIP = internetGatewayIP; }
    public int getInternetGatewayPort() { return internetGatewayPort; }
    public void setInternetGatewayPort(int internetGatewayPort) { this.internetGatewayPort = internetGatewayPort; }
    public String getWebServerIP() { return webServerIP; }
    public void setWebServerIP(String webServerIP) { this.webServerIP = webServerIP; }
    public int getWebServerPort() { return webServerPort; }
    public void setWebServerPort(int webServerPort) { this.webServerPort = webServerPort; }

    public NetworkProperties() {
        // Fall back to a blank configuration file if none can be loaded
        if (!loadPropertiesFromXMLFile()) {
            createBlankPropertyXMLFile();
        }
    }

    // Lazily created shared instance
    public static synchronized NetworkProperties getInstance() {
        if (networkProperties == null) {
            networkProperties = new NetworkProperties();
        }
        return networkProperties;
    }

    // Writes the current settings to setting/network_config.xml
    public void save() {
        Properties properties = new Properties();
        properties.setProperty("InternetGatewayIP", internetGatewayIP);
        properties.setProperty("InternetGatewayPort", Integer.toString(internetGatewayPort));
        properties.setProperty("WebServerIP", webServerIP);
        properties.setProperty("WebServerPort", Integer.toString(webServerPort));
        File dir = new File(DIRECTORY);
        if (!dir.exists()) {
            dir.mkdirs();
        }
        try (FileOutputStream out = new FileOutputStream(new File(dir, FILE_NAME))) {
            properties.storeToXML(out, null);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Returns false if the file is missing, unreadable, incomplete or has a non-numeric port
    private boolean loadPropertiesFromXMLFile() {
        File file = new File(DIRECTORY, FILE_NAME);
        try (FileInputStream in = new FileInputStream(file)) {
            Properties properties = new Properties();
            properties.loadFromXML(in);
            internetGatewayIP = require(properties, "InternetGatewayIP");
            internetGatewayPort = Integer.parseInt(require(properties, "InternetGatewayPort"));
            webServerPort = Integer.parseInt(require(properties, "WebServerPort"));
            webServerIP = require(properties, "WebServerIP");
            return true;
        } catch (IOException | NumberFormatException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Reads a required key; a missing key is reported as an IOException
    private static String require(Properties properties, String key) throws IOException {
        String value = properties.getProperty(key);
        if (value == null) {
            throw new IOException("Missing property: " + key);
        }
        return value.trim();
    }

    private void createBlankPropertyXMLFile() {
        internetGatewayIP = "0.0.0.0";
        internetGatewayPort = 0;
        webServerPort = 0;
        webServerIP = "0.0.0.0";
        save();
    }

    public static void main(String[] args) {
        NetworkProperties pro = NetworkProperties.getInstance();
        System.out.println(pro.getInternetGatewayPort());
        System.out.println(pro.getWebServerPort());
        System.out.println(pro.getInternetGatewayIP());
        System.out.println(pro.getWebServerIP());
        pro.save();
    }
}
TESTING
6. TESTING
Testing plays a critical role in quality assurance and in ensuring the reliability of the software. Testing is the first step in finding the errors in the program. The testing process uses different levels of testing, and each level aims to test a different aspect of the software. The two levels used in this system are unit testing and integration testing.
6.1 UNIT TESTING

Unit testing is the first level of testing. It is a software verification and validation method in which a programmer tests whether individual units of source code are fit for use. The programmer performs unit testing before individual units or modules are integrated into a larger system. In this project, unit testing was carried out during the coding stage itself, and each module was found to work satisfactorily with respect to its expected output.
6.2 INTEGRATION TESTING

In this testing, we test the project as a whole by combining all the modules. Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with the interfaces between modules. The individual units are combined with other units to make sure that the necessary communication, links and data sharing occur properly.
6.1.1 TEST CASE ADMIN LOGIN

Tc No. 1
Unit: Login form
Input: Username and password of the administrator
Expected output: Checks the database and redirects to the admin home; otherwise shows an error message box
Output obtained: Error message
Status: Failure
Reason: Incorrect username or password
Remedy: Enter the correct username and password

Tc No. 2
Unit: Login form
Input: Username and password
Expected output: Checks the database and redirects to the admin home; otherwise shows an error message box
Output obtained: Redirected to the admin home
Status: Success
Table: 6.1.1: Admin Login
6.1.2 TEST CASE CANTINA WEBSITE

Tc No. 1
Unit: Login form
Input: No input required
Expected output: Redirects to the blind home page within ten seconds; otherwise remains in the login form
Output obtained: No redirection
Status: Failure
Reason: Code error
Remedy: Verify and correct the code

Tc No. 2
Unit: Login form
Input: No input required
Expected output: Redirects to the blind home page within ten seconds; otherwise remains in the login form
Output obtained: Redirected to the blind home page
Status: Success
Table: 6.1.2: Cantina Website
6.2 INTEGRATION TESTING
6.2.1. TEST CASE CLIENT APPLICATION

Tc No. 1
Unit: Download access
Input: Enter the username and password
Expected output: The contents pertaining to the option are made available to the user
Output obtained: No further navigation
Status: Failure
Reason: The user has been blocked by the administrator
Remedy: Unblock the user

Tc No. 2
Unit: File setup
Input: Click the link to download the software
Expected output: The software should be downloaded successfully
Output obtained: Jar file could not be downloaded
Status: Failure
Reason: Setup fail error
Remedy: Proper setup of the jar file
Table: 6.2.1: Client Application
7. FUTURE SCOPE
In the coming months, the Anti-Phishing Working Group will be working with technology partners, government regulators, and leaders in the industries victimized by phishing attacks. Together they aim to shape the most efficient and elegant digital signing and authentication solution that large user communities can employ. We are sending calls to action to all stakeholders in the hope that, by year end, they will reach a consensus on how the affected industries can establish a digital signing and authentication regime. Such a regime could be deployed posthaste to end phishing attacks and preclude regulatory adventurism. Ultimately, it would establish a technological foundation for a broader spam-proof electronic mail infrastructure.
CONCLUSIONS
8. CONCLUSIONS
We presented CANTINA, a novel content-based approach for detecting phishing web sites. CANTINA takes Robust Hyperlinks, an idea for overcoming page-not-found problems using the well-known Term Frequency / Inverse Document Frequency (TF-IDF) algorithm, and applies it to anti-phishing. We described our implementation of CANTINA and discussed some simple heuristics that can be applied to reduce false positives. We also presented an evaluation of CANTINA. It shows that the pure TF-IDF approach catches about 97% of phishing sites with about 6% false positives. After combining some simple heuristics, CANTINA catches about 90% of phishing sites with only 1% false positives.
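The TF-IDF step at the core of CANTINA can be sketched as follows. This is a minimal sketch under assumptions. The tokenizer (lower-case alphabetic words) is an assumption. So are the small in-memory document collection used for document frequencies and the smoothed IDF `log((1 + N) / (1 + df))`. The class and method names are hypothetical. In CANTINA, the highest-scoring terms form a lexical signature that is submitted to a search engine, and the page is treated as legitimate if its domain appears among the top results. That search step is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class TfIdf {

    // Splits text into lower-case alphabetic tokens (a deliberately simple tokenizer)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the k terms of `page` with the highest TF-IDF weight, using
    // document frequencies counted over `corpus`
    static List<String> topTerms(String page, List<String> corpus, int k) {
        List<String> pageTokens = tokenize(page);
        Map<String, Integer> tf = new HashMap<>();
        for (String t : pageTokens) {
            tf.merge(t, 1, Integer::sum);
        }
        // Document frequency: number of corpus documents containing each term
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String t : new HashSet<>(tokenize(doc))) {
                df.merge(t, 1, Integer::sum);
            }
        }
        int n = corpus.size();
        Map<String, Double> weight = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double termFreq = (double) e.getValue() / pageTokens.size();
            // Smoothed IDF: terms absent from the corpus do not divide by zero
            double idf = Math.log((1.0 + n) / (1.0 + df.getOrDefault(e.getKey(), 0)));
            weight.put(e.getKey(), termFreq * idf);
        }
        // Highest weight first; ties broken alphabetically for a stable result
        return weight.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "welcome to our online store",
                "sign in to your account",
                "the weather today is sunny");
        String page = "ebay sign in ebay account ebay buy sell";
        System.out.println(topTerms(page, corpus, 5)); // [ebay, buy, sell, account, in]
    }
}
```

Terms that are frequent on the page but rare in the collection, such as "ebay" here, rise to the top. Because of this, a lexical signature built from them tends to identify the brand that the page is about.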
SCREENSHOTS
Login Form
Admin Home Page
Users List
Whitelist
Blacklist
REFERENCES
[1] 3Sharp. 3Sharp Study Finds Internet Explorer 7 Edges Out Netcraft As Most Accurate for Anti-Phishing Protection. 6. http://www.3sharp.com/projects/antiphishing/
[2] Anti-Phishing Working Group. Phishing Activity Trends Report. 2006. http://www.antiphishing.org/reports/apwg_report_june_06.pdf
[3] Anti-Phishing Working Group (APWG). Visited: Nov 20, 2006. http://www.antiphishing.org/
[4] N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and J.C. Mitchell. Client-Side Defense against Web-Based Identity Theft. In Proceedings of the 11th Annual Network and Distributed System Security Symposium (NDSS '04). http://crypto.stanford.edu/SpoofGuard/webspoof.pdf
[5] Cloudmark Inc. Visited: Nov 20, 06. http://www.cloudmark.com/desktop/download/
[6] Robert T. Morris. A Weakness in the 4.2BSD UNIX TCP/IP Software. Computing Science Technical Report 117, AT&T Bell Laboratories, February 1985.
[7] Rod Rasmussen. Phishing Prevention: Making Yourself a Hard Target. Internet Identity / APWG, April 5, 2004.
[8] Blake Ross, Nick Miyake, Robert Ledesma, Dan Boneh and John C. Mitchell. A Simple Solution to the Unique Password Problem.
[9] Z. Ye. Building Trusted Paths for Web Browsers. Master's Thesis, Department of Computer Science, Dartmouth College, May 2002.
[10] Zishuang (Eileen) Ye and Sean W. Smith. Trusted Paths for Browsers. 11th USENIX Security Symposium, August 2002.

Semester: VIII

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

Project Title: Cantina Antiphishing

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

Educating People about Phishing Attacks

Project Title: Cantina Antiphishing

Anti-Phishing User Interfaces

Project Title: Cantina Antiphishing

Automated Detection of Phishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

A Content-based approach for detecting phishing websites

How TDF/IF works

Project Title: Cantina Antiphishing

Adapting TF-IDF for detecting Phishing

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

3. PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT

1. Ensure that a users data only goes to the intended recipient

2. Prevent a users data from reaching an untrustworthy recipient

3. Prevent an attacker from abusing a users data

4. Prevent an attacker from modifying a users account

Project Title: Cantina Antiphishing

5. Prevent an attacker from viewing a users account

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

PROBLEM ANALYSIS AND DESIGN

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

4. PROBLEM ANALYSIS AND DESIGN

4.1. Existing System

4.2. Proposed System

Project Title: Cantina Antiphishing

Java SQL Server Windows7 Visual Studio

Processor System Bus RAM HDD Display

intel i3 32 BIT 3 GB 40 GB SVGA Color

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

4.3.1. User Management And login

Project Title: Cantina Antiphishing

4.3.2. Anti-Phishing System

Project Title: Cantina Antiphishing

4.4 SYSTEM DESIGN

Figure 4.4 System Architecture

4.5 USE CASE DIAGRAM

Figure 4.5 Use Case Diagram

Toc H Institute of Science & Technology Arakkunnam 682 313

Project Title: Cantina Antiphishing

4.6. HIGH LEVEL DESIGN

Figure 4.6 High level design

Toc H Institute of Science & Technology Arakkunnam 682 313