
Intelligent Phishing Website Detection System Using Fuzzy Techniques

A PROJECT REPORT
Submitted by in the partial fulfillment for the award of the degree Of

BACHELOR OF TECHNOLOGY
In

ANNA UNIVERSITY: CHENNAI 600025 APRIL 2009

ANNA UNIVERSITY: CHENNAI 600025

BONAFIDE CERTIFICATE

Certified that this project report "Intelligent Phishing Website Detection System using Fuzzy Techniques" is the bonafide work of who carried out the project work under my supervision.

Submitted for the project viva-voce held on ______________________

(INTERNAL EXAMINER)

(EXTERNAL EXAMINER)

CERTIFICATE OF EVALUATION
And prevention of phishing attacks

The reports of the project work submitted by the above students in partial fulfillment for the award of the Bachelor of Technology degree in Information Technology of Anna University were evaluated and confirmed to be reports of the work done by the above students.

(INTERNAL EXAMINER)

(EXTERNAL EXAMINER)

ACKNOWLEDGEMENT

ABSTRACT

Phishing websites are forged web pages created by malicious people to mimic the web pages of real websites. Most of these pages bear a high visual similarity to the pages they imitate in order to scam their victims, and some look exactly like the real ones. Unwary Internet users may be easily deceived by this kind of scam. Victims of phishing web pages may expose their bank account, password, credit card number, or other important information to the owners of the phishing pages.

TABLE OF CONTENTS

CHAPTER    TITLE

           LIST OF FIGURES
           LIST OF ABBREVIATIONS

1          INTRODUCTION
           1.1 About the Project

2          SYSTEM ANALYSIS
           2.1 Existing System
           2.2 Proposed System
               2.2.1 Classification of Hyperlinks in the Phishing E-mails
               2.2.2 Fuzzy Logic
           2.3 Feasibility Study

3          REQUIREMENTS SPECIFICATION
           3.1 Introduction
           3.2 Hardware and Software Specification
               3.2.1 Hardware Requirements
               3.2.2 Software Requirements
               3.2.3 Technologies Used
               3.2.4 Database
           3.3 Technologies Used
               3.3.1 Java
                   3.3.1.1 Working of Java
                   3.3.1.2 The Java Programming Language
                   3.3.1.3 The Java Platform
               3.3.2 Java Servlets
                   3.3.2.1 Advantages of Java Servlets
               3.3.3 Java Server Pages
               3.3.4 Apache Tomcat Server
               3.3.5 Java Mail API

4          SYSTEM DESIGN
           4.1 Data Flow Diagram
           4.2 Sequence Flow Diagram
           4.3 Activity Diagram
           4.4 Use Case Diagram

5          SYSTEM DESIGN DETAILED
           5.1 Modules

6          CODING AND TESTING
           6.1 Coding
           6.2 Coding Standards
               6.2.1 Naming Convention
               6.2.2 Value Conventions
               6.2.3 Script Writing and Commenting Standard
               6.2.4 Message Box Format
           6.3 Test Procedure
           6.4 Test Data and Output
               6.4.1 Unit Testing
               6.4.2 Functional Tests
               6.4.3 Performance Tests
               6.4.4 Stress Test
               6.4.5 Structured Test
               6.4.6 Integration Testing
           6.5 Testing Techniques / Testing Strategies
               6.5.1 Testing
                   6.5.1.1 White Box Testing
                   6.5.1.2 Black Box Testing
               6.5.2 Software Testing Strategies
                   6.5.2.1 Integration Testing
                   6.5.2.2 Program Testing
                   6.5.2.3 Security Testing
                   6.5.2.4 Validation Testing
                   6.5.2.5 User Acceptance Testing

7          CONCLUSION AND FUTURE ENHANCEMENTS
           7.1 Conclusion

           SNAP SHOTS
           REFERENCES

LIST OF FIGURES

2.1 Categories of hyperlinks in Phishing E-mails
3.1 Working of Java
3.2 Compilation and running procedure of Java
3.3 Java Platform
4.1 Data Flow Diagram
5.1 Module 1
5.2 Module 2
5.3 Module 3

LIST OF ABBREVIATIONS

DNS     Domain Name System
PIN     Personal Identification Number
SMTP    Simple Mail Transfer Protocol
URL     Uniform Resource Locator
URI     Uniform Resource Identifier
APWG    Anti-Phishing Working Group
CSS     Cross Site Scripting
JSP     Java Server Pages
JVM     Java Virtual Machine
JMX     Java Management Extensions
HTML    Hypertext Markup Language

CHAPTER 1

INTRODUCTION
1.1 ABOUT THE PROJECT

Detecting and identifying phishing websites is a complex and dynamic problem involving many factors and criteria. Because of the subjective considerations and ambiguities involved in detection, a Fuzzy Logic (FL) model can be a more effective tool for assessing and identifying phishing websites than traditional tools, since it offers a more natural way of dealing with quality factors than exact values do. In this report, we present a novel approach to overcome the fuzziness in traditional website phishing risk assessment and propose an intelligent, resilient, and effective model for detecting phishing websites. The proposed model is based on FL operators, which are used to characterize the website phishing factors and indicators as fuzzy variables, and it produces six measures and criteria of website phishing attack dimensions with a layered structure. Our experimental results showed the significance of the phishing website criteria (URL & Domain Identity) represented by layer one, and the varying influence of the phishing characteristic layers on the final phishing website rate.

The word 'phishing' emerged in the 1990s. Early hackers often used 'ph' to replace 'f' to produce new words in the hacker community, since they usually hacked by phone. Phishing is a word derived from 'fishing'. It refers to the act in which an attacker lures users to visit a faked Web site by sending them faked e-mails (or instant messages), and stealthily obtains the victim's personal information such as user name, password, and national security ID. This information can then be used for targeted advertisements or even identity theft attacks (e.g., transferring money from victims' bank accounts). The most frequently used attack method is to send e-mails to potential victims that appear to have been sent by banks, online organizations, or ISPs.

In these e-mails, the attackers make up some pretext, e.g. that the password of your credit card has been mis-entered many times, or that they are providing upgrade services, to lure you to visit their Web site and confirm or modify your account number and password through the hyperlink provided in the e-mail. After clicking such a link you are taken to a counterfeit Web site. The style, the functions performed, and sometimes even the URL of these faked Web sites are similar to those of the real Web site, so it is very difficult to know that you are actually visiting a malicious site. If you enter the account number and password, the attackers collect the information at the server side and are able to perform their next actions with it (e.g., withdraw money from your account).

Phishing itself is not a new concept, but in recent years it has been increasingly used by phishers to steal user information and commit business crime. Within one to two years, the number of phishing attacks increased dramatically. According to Gartner Inc., for the 12 months ending April 2004, "there were 1.8 million phishing attack victims, and the fraud incurred by phishing victims totaled $1.2 billion".

According to the statistics provided by the Anti-Phishing Working Group (APWG), in March 2006 the total number of unique phishing reports submitted to the APWG was 18,480, and the top three phishing site hosting countries were the United States (35.13%), China (11.93%), and the Republic of Korea (8.85%). Infamous phishing attacks in China in recent years include attempts to counterfeit the Bank of China (real Web site www.bank-of-china.com, counterfeit Web site www.bank-off-china.com), the Industrial and Commercial Bank of China (real Web site www.icbc.com.cn, faked Web site www.lcbc.com.cn), and the Agricultural Bank of China (real Web site www.95599.com, faked Web site www.965555.com).

In this project, we study the common procedure of phishing attacks and review possible anti-phishing approaches. We then focus on the end-host based anti-phishing approach. We first analyze the common characteristics of the hyperlinks in phishing e-mails. Our analysis shows that phishing hyperlinks share one or more of the characteristics listed below:

1) The visual link and the actual link are not the same;
2) The attackers often use a dotted decimal IP address instead of a DNS name;
3) Special tricks are used to encode the hyperlinks maliciously;
4) The attackers often use fake DNS names that are similar (but not identical) to that of the target Web site.
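Characteristics 1 and 2 above lend themselves to a direct check. The following is a minimal sketch, not the project's actual implementation: the class name LinkCheck and its method names are hypothetical, the host is extracted with the standard java.net.URI class, and an anchor text is assumed to name a host only when it contains a dot.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative only.
class LinkCheck {
    // Four dot-separated groups of 1-3 digits, e.g. 61.129.33.105.
    private static final Pattern DOTTED_DECIMAL =
            Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");

    // Returns the lower-cased host of a link, or null if it cannot be parsed.
    static String hostOf(String link) {
        String s = link.trim();
        // Anchor text such as "www.citibank.com" has no scheme; assume http.
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        try {
            String host = new URI(s).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (URISyntaxException e) {
            return null; // e.g. anchor text with spaces, such as "Click here"
        }
    }

    // Characteristic 1: the visual link names a different host than the actual link.
    static boolean hostsDiffer(String visualLink, String actualLink) {
        String v = hostOf(visualLink);
        String a = hostOf(actualLink);
        // Assumption: anchor text without a dot is not treated as a DNS name.
        if (v == null || a == null || !v.contains(".")) {
            return false;
        }
        return !v.equals(a);
    }

    // Characteristic 2: the actual link uses a dotted decimal IP address as host.
    static boolean usesDottedDecimalIp(String actualLink) {
        String host = hostOf(actualLink);
        return host != null && DOTTED_DECIMAL.matcher(host).matches();
    }

    public static void main(String[] args) {
        String actual = "http://www.profusenet.net/checksession.php";
        String visual = "https://secure.regionset.com/EBanking/logon/";
        System.out.println("hosts differ: " + hostsDiffer(visual, actual));
        System.out.println("dotted decimal: "
                + usesDottedDecimalIp("http://61.129.33.105/secured-site/index.html"));
    }
}
```

Comparing whole host names is a deliberately strict choice; a real tool would probably need to tolerate legitimate differences such as a missing "www." prefix.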

CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

We briefly review the existing approaches to anti-phishing.

1) Detect and block the phishing Web sites in time: If we can detect phishing Web sites in time, we can block them and prevent phishing attacks. It is relatively easy to determine (manually) whether a site is a phishing site, but it is difficult to find phishing sites in time. Here we list two methods for phishing site detection. A) The Web master of a legitimate Web site periodically scans the root DNS for suspicious sites (e.g. www.1cbc.com.cn vs. www.icbc.com.cn). B) Since the phisher must duplicate the content of the target site, he must use tools to (automatically) download the Web pages from the target site. It is therefore possible to detect this kind of download at the Web server and trace it back to the phisher. Both approaches have shortcomings. DNS scanning increases the overhead of the DNS systems and may cause problems for normal DNS queries; furthermore, many phishing attacks simply do not require a DNS name. As for download detection, clever phishers can easily write tools that mimic the behavior of human beings to defeat the detection.

2) Enhance the security of the Web sites: Business Web sites, such as those of banks, can adopt new methods to guarantee the security of users' personal information. One method is to use hardware devices. For example, Barclays bank provides a hand-held card reader to its users. Before shopping online, users need to insert their credit card into the card reader and enter their PIN (personal identification number). The card reader then produces a one-time security password, and users can perform transactions only after the right password is entered. Another method is to use biometric characteristics (e.g. voice, fingerprint, iris) for user authentication. For example, PayPal tried to replace single password verification with voice recognition to enhance the security of its Web site. With these methods, phishers cannot accomplish their tasks even after they have obtained part of the victims' information. However, all these techniques need additional hardware to realize the authentication between the users and the Web sites, and hence increase cost and bring certain inconvenience. It will therefore take time for these techniques to be widely adopted.

3) Block the phishing e-mails by various spam filters: Phishers generally use e-mails as 'bait' to lure potential victims. SMTP (Simple Mail Transfer Protocol) is the protocol used to deliver e-mails on the Internet. It is a very simple protocol that lacks the necessary authentication mechanisms. Information related to the sender, such as the sender's name and e-mail address and the route of the message, can be counterfeited in SMTP. Attackers can thus send out large numbers of spoofed e-mails that appear to come from legitimate organizations. Phishers hide their identities when sending spoofed e-mails; therefore, if anti-spam systems can determine whether an e-mail was sent by the announced sender (Am I Whom I Say I Am?), phishing attacks will decrease dramatically. From this point of view, techniques that prevent senders from counterfeiting their sender ID (e.g. SIDF of Microsoft) can defeat phishing attacks efficiently. SIDF is a combination of Microsoft's Caller ID for E-mail and the SPF (Sender Policy Framework) developed by Meng Weng Wong. Both Caller ID and SPF check the e-mail sender's domain name to verify that the e-mail was sent from a server authorized to send e-mails for that domain, and from that determine whether the e-mail uses a spoofed address. If the address is faked, the Internet service provider can classify the e-mail as spam.

The spoofed e-mails used by phishers are one type of spam. From this point of view, spam filters can also be used to filter phishing e-mails. For example, blacklists, whitelists, keyword filters, Bayesian filters with self-learning abilities, and E-Mail Stamp can all be used at the e-mail server or on client systems. Most of these anti-spam techniques filter at the receiving side by scanning the contents and the addresses of the received e-mails, and they all have pros and cons. Blacklists and whitelists cannot work if the names of the spammers are not known in advance. Keyword filters and Bayesian filters detect spam based on content and can therefore detect unknown spam, but they can also produce false positives and false negatives. Furthermore, spam filters are designed for general spam e-mails and may not be well suited to filtering phishing e-mails, since they generally do not consider the specific characteristics of phishing attacks.

4) Install online anti-phishing software on users' computers: Despite all the above efforts, it is still possible for users to visit spoofed Web sites. As a last defense, users can install anti-phishing tools on their computers. The anti-phishing tools in use today can be divided into two categories: blacklist/whitelist based and rule based.

Category I: When a user visits a Web site, the anti-phishing tool searches for the address of that site in a blacklist stored in a database. If the visited site is on the list, the tool warns the user. Tools in this category include ScamBlocker from EarthLink, PhishGuard, and Netcraft. Although the developers of these tools all state that they update the blacklist in time, the tools cannot prevent attacks from newly emerged (unknown) phishing sites.

Category II: Tools in this category build certain rules into their software and check the security of a Web site according to these rules. Examples include SpoofGuard, developed at Stanford, and TrustWatch of GeoTrust. SpoofGuard checks the domain name and the URL (including the port number) of a Web site, and also checks whether the browser was directed to the current URL via a link in the contents of an e-mail. If it finds that the domain name of the visited Web site is similar to a well-known domain name, or that the site is not using the standard port, SpoofGuard warns the user. In TrustWatch, the security of a Web site is determined by whether it has been reviewed by an independent trusted third-party organization. Both SpoofGuard and TrustWatch provide a browser toolbar that tells users whether the Web site is verified and trusted.

All of the above defense methods are useful and complementary to each other, but none of them is perfect at the current stage.
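The report does not say how the similarity between a visited domain and a well-known domain is measured. As one possible reading, the sketch below uses the Levenshtein edit distance with a small threshold; the metric, the threshold, the list of known hosts, and the class name SimilarDomainCheck are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical look-alike domain check based on edit distance (an assumed metric).
class SimilarDomainCheck {

    // Classic Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the known host that 'host' imitates, or null if none is close.
    // An exact match (distance 0) is the genuine site and is not reported.
    static String lookalikeOf(String host, List<String> knownHosts, int maxDistance) {
        for (String known : knownHosts) {
            int d = editDistance(host.toLowerCase(), known.toLowerCase());
            if (d > 0 && d <= maxDistance) {
                return known;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("www.bank-of-china.com", "www.icbc.com.cn");
        System.out.println(lookalikeOf("www.lcbc.com.cn", known, 2));
        System.out.println(lookalikeOf("www.icbc.com.cn", known, 2));
    }
}
```

A fixed threshold is crude: short domain names can be within two edits of each other by coincidence, so a practical rule would likely scale the threshold with the length of the name.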

2.2 PROPOSED SYSTEM

2.2.1 Classification of the hyperlinks in the phishing e-mails

In order to (illegally) collect useful information from potential victims, phishers generally try to convince users to click the hyperlink embedded in the phishing e-mail. A hyperlink has the following structure:

<a href="URI">Anchor text</a>

where 'URI' (uniform resource identifier) provides the information needed for the user to access the networked resource, and 'Anchor text' is the text that will be displayed in the user's Web browser. Examples of URIs are:

http://www.google.com
https://www.icbc.com.cn/login.html
ftp://61.112.1.90:2345
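The two parts of a hyperlink can be separated programmatically before any analysis. The sketch below is an illustration only: the class name HyperlinkParser is hypothetical, and a regular expression is used for brevity even though it handles only simple, double-quoted href attributes and is no substitute for a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that splits a hyperlink into its URI and anchor text.
class HyperlinkParser {
    // Matches <a ... href="URI" ...>Anchor text</a>, case-insensitively.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns {uri, anchorText} for the first hyperlink found, or null if none.
    static String[] firstLink(String html) {
        Matcher m = ANCHOR.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.antiphishing.org/phishing-archive.html\">"
                + "Phishing Archive </a>";
        String[] link = firstLink(html);
        System.out.println("URI:         " + link[0]);
        System.out.println("Anchor text: " + link[1]);
    }
}
```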

'Anchor text' is generally used to display information related to the URI, to help the user better understand the resource the hyperlink provides. In the following hyperlink, the URI links to the phishing archive provided by the APWG, and its anchor text "Phishing Archive" tells the user what the hyperlink is about.

<a href="http://www.antiphishing.org/phishing-archive.html">Phishing Archive</a>

Note that the content of the URI is not displayed in the user's Web browser. Phishers can therefore exploit this fact to play tricks in their 'bait' e-mails. In the rest of this report, we call the URI in the hyperlink the actual link and the anchor text the visual link.

We analyzed 203 phishing e-mail archives, dated from Sep. 21st 2003 to July 4th 2005, provided by the APWG (there are 210 phishing e-mails altogether; 7 of them have incomplete information or carry malware attachments and contain no hyperlinks). We classified the hyperlinks used in the phishing e-mails into the following categories:

1) The hyperlink provides a DNS domain name in the anchor text, but the destination DNS name in the visible link does not match that in the actual link. For instance, the following hyperlink

<a href="http://www.profusenet.net/checksession.php">https://secure.regionset.com/EBanking/logon/</a>

appears to be linked to secure.regionset.com, which is the portal of a bank, but it is actually linked to the phishing site www.profusenet.net.

2) A dotted decimal IP address is used directly in the URI or the anchor text instead of a DNS name. See below for an example.

<a href="http://61.129.33.105/secured-site/www.skyfi.com/index.html?MfclSAPICommand=SignInFPP&UsingSSL=1">SIGN IN</a>

3) The hyperlink is counterfeited maliciously by using certain encoding schemes. There are two cases:

a) The link is formed by encoding characters into their corresponding ASCII codes. See below for such a hyperlink.

<a href="http://034%02E%0333%34%2E%311%39%355%2E%o340o31:%34%39%30%33/%6C/%69%6E%64%65%78%2E%68%74%6D">www.citibank.com</a>

While this link appears to point to www.citibank.com, it actually points to http://4.34.195.41:34/l/index.htm.

b) Special characters (e.g. @ in the visible link) are used to fool the user into believing that the e-mail is from a trusted sender. For instance, the following link seems to be linked to Amazon, but it is actually linked to the IP address 69.10.142.34.

http://www.amazon.com:fvthsgbljhfcs83infoupdate@69.10.142.34

4) The hyperlink does not provide destination information in its anchor text and uses a DNS name in its URI. The DNS name in the URI is usually similar to that of a famous company or organization. For instance, the following link seems to be sent from PayPal, but it actually is not: paypal-cgi was registered by the phisher to make users believe that it has something to do with PayPal.

<a href="http://www.paypal-cgi.us/webscr.php?cmd=LogIn">Click here to confirm your account</a>

5) The attackers exploit vulnerabilities of the target Web site to redirect users to their phishing sites or to launch CSS (cross site scripting) attacks. For example, the following link

<a href="http://usa.visa.com/track/dyredirjsp?rDirl=http://200.251.251.10/.verified/">Click here</a>

once clicked, redirects the user to the phishing site 200.251.251.10 due to a vulnerability of usa.visa.com.

Table 1 summarizes the number of hyperlinks and their percentages for all the categories. Most of the phishing e-mails use faked DNS names (category 1, 44.33%) or dotted decimal IP addresses (category 2, 41.87%). Encoding tricks are also frequently used (categories 3a and 3b, 17.24%). Phishing attackers also often try to fool users by setting up DNS names that are very similar to those of real e-commerce sites, or by not providing destination information in the anchor text (category 4). Phishing attacks that exploit the vulnerabilities of Web sites (category 5) are few in number (2%), and we leave this type of attack for future study. Note that a phishing hyperlink can belong to several categories at the same time. For instance, an attacker may use tricks from both categories 1 and 3 at the same time to increase his chance of success. Hence the percentages sum to more than 100%.

Once the characteristics of phishing hyperlinks are understood, we are able to design anti-phishing algorithms that can detect known or unknown phishing attacks in real time. We present our LinkGuard algorithm in the next subsection.

2.2.2 Fuzzy Logic

The proposed model is based on FL operators, which are used to characterize the website phishing factors and indicators as fuzzy variables, and it produces six measures and criteria of website phishing attack dimensions with a layered structure.

2.3 FEASIBILITY STUDY

The Phishing Website Risk Assessment Model

1) Fuzzification. The approach described here applies fuzzy logic modeling to assess website phishing risk on the 27 characteristics and factors that mark a forged website. The essential advantage offered by fuzzy logic techniques is the use of linguistic variables to represent the Key Phishing Characteristic Indicators and to relate them to website phishing probability. In this step, linguistic descriptors such as High, Low, and Medium are assigned to a range of values for each Key Phishing Characteristic Indicator. Valid ranges of the inputs are considered and divided into classes, or fuzzy sets. For example, the length of a URL address can range from low to high with other values in between. We cannot specify clear boundaries between classes. The degree to which a value of a variable belongs to a selected class is called the degree of membership. A membership function is designed for each phishing characteristic indicator; it is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The linguistic values assigned to each phishing indicator are Low, Moderate, and High, while those for the phishing website risk rate are Very legitimate, Legitimate, Suspicious, Phishy, and Very phishy (triangular and trapezoidal membership functions). The values of each input range from 0 to 10, while the output ranges from 0 to 100. An example of the linguistic descriptors used to represent one of the Key Phishing Characteristic Indicators (URL Address Long) and a plot of the fuzzy membership functions are shown in figure 1. The fuzzy representation more closely matches human cognition, thereby facilitating expert input and more reliably representing experts' understanding of the underlying dynamics. The same approach is used to calibrate the other 26 Key Phishing Characteristic Indicators.

2) Rule Evaluation. Having specified the risk of website phishing and its Key Phishing Characteristic Indicators, the logical next step is to specify how the website phishing probability varies as a function of the Key Phishing Characteristic Indicators. Experts provide fuzzy rules in the form of if-then statements that relate website phishing probability to various levels of the Key Phishing Characteristic Indicators, based on their knowledge and experience. Website phishing experiments, anti-phishing tool analysis, web surveys, phishing quizzes, and a detailed questionnaire were used to assess the factors that collectively characterize website phishing. A detailed checklist table is based on the types of phishing source and style, with weights assigned to them according to their effectiveness and influence.

3) Aggregation of the rule outputs. This is the process of unifying the outputs of all rules: the membership functions of all the rule consequents, previously scaled, are combined into a single fuzzy set (the output).

4) Defuzzification. This is the process of transforming the fuzzy output of a fuzzy inference system into a crisp output. Fuzziness helps to evaluate the rules, but the final output of this system has to be a crisp number. The input to the defuzzification process is the aggregate output fuzzy set, and the output is a number. This step was done using the Centroid technique because it is the most commonly used method of defuzzification. The output is the website phishing risk rate, defined over fuzzy sets ranging from very phishy to very legitimate. The fuzzy output set is then defuzzified to arrive at a scalar value.
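The four steps above can be sketched end to end on a reduced problem. The code below is a minimal Mamdani-style illustration, not the report's 27-indicator model: it takes only two indicator scores in the range 0 to 10, and every breakpoint of the membership functions, the rule table (the consequent index is the sum of the two input indices), and the class name FuzzyPhishingSketch are assumptions chosen for illustration. AND is taken as the minimum, aggregation as the maximum, and the centroid is approximated by sampling the output range at integer points.

```java
// Mamdani-style fuzzy inference sketch. The breakpoints, the two indicators,
// and the rule table are all assumptions made for illustration.
class FuzzyPhishingSketch {

    // Triangular membership: 0 at a and c, 1 at the peak b.
    static double tri(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    // Trapezoidal membership: rises on a..b, equals 1 on b..c, falls on c..d.
    static double trap(double x, double a, double b, double c, double d) {
        if (x < a || x > d) return 0.0;
        if (x < b) return (x - a) / (b - a);
        if (x <= c) return 1.0;
        return (d - x) / (d - c);
    }

    // Step 1, fuzzification: degrees of {Low, Moderate, High} for an input in 0..10.
    static double[] fuzzify(double x) {
        return new double[] {
            trap(x, 0, 0, 2, 5),    // Low
            tri(x, 2, 5, 8),        // Moderate
            trap(x, 5, 8, 10, 10)   // High
        };
    }

    // Output sets on 0..100, indexed 0 = Very legitimate ... 4 = Very phishy.
    static double outputSet(int k, double y) {
        switch (k) {
            case 0:  return trap(y, 0, 0, 10, 30);     // Very legitimate
            case 1:  return tri(y, 10, 30, 50);        // Legitimate
            case 2:  return tri(y, 30, 50, 70);        // Suspicious
            case 3:  return tri(y, 50, 70, 90);        // Phishy
            default: return trap(y, 70, 90, 100, 100); // Very phishy
        }
    }

    // Returns a crisp phishing risk rate in 0..100 for two indicator scores.
    static double riskRate(double indicatorA, double indicatorB) {
        double[] a = fuzzify(indicatorA);
        double[] b = fuzzify(indicatorB);

        // Step 2, rule evaluation: "IF A is i AND B is j THEN risk is (i + j)".
        double[] strength = new double[5];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                double fire = Math.min(a[i], b[j]);  // AND as minimum
                strength[i + j] = Math.max(strength[i + j], fire);
            }
        }

        // Steps 3 and 4: clip each output set by its rule strength, aggregate
        // with the maximum, and take the centroid of the aggregated set.
        double weighted = 0.0;
        double total = 0.0;
        for (int y = 0; y <= 100; y++) {
            double mu = 0.0;
            for (int k = 0; k < 5; k++) {
                mu = Math.max(mu, Math.min(strength[k], outputSet(k, y)));
            }
            weighted += y * mu;
            total += mu;
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        System.out.println("risk(0, 0)   = " + riskRate(0, 0));
        System.out.println("risk(5, 5)   = " + riskRate(5, 5));
        System.out.println("risk(10, 10) = " + riskRate(10, 10));
    }
}
```

With these assumed shapes, two low scores yield a rate near the legitimate end of the scale and two high scores a rate near the phishy end; extending the sketch to more indicators mainly means enlarging the rule table.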

CHAPTER 3
REQUIREMENT SPECIFICATIONS

3.1 INTRODUCTION

The requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process. It lists the requirements of a particular software system, including functional, performance, and security requirements. The requirements also provide usage scenarios from a user, an operational, and an administrative perspective. The purpose of the software requirements specification is to provide a detailed overview of the software project, its parameters, and its goals. It describes the project's target audience and its user interface, hardware, and software requirements. It defines how the client, team, and audience see the project and its functionality.

3.2 HARDWARE AND SOFTWARE SPECIFICATION

3.2.1 HARDWARE REQUIREMENTS

Hard disk          : 20 GB and above
RAM                : 256 MB and above
Processor speed    : 1.6 GHz and above

3.2.2 SOFTWARE REQUIREMENTS

Operating System   : Windows 2000/XP
Documentation Tool : MS Word 2000

3.2.3 TECHNOLOGIES USED

JSP
Servlets
Apache Tomcat 5.5

3.2.4 DATABASE

Oracle XE

3.3 TECHNOLOGIES USED

3.3.1 JAVA

Java is platform independent. It is an object-oriented programming language developed initially by James Gosling and colleagues at Sun Microsystems. The language, initially called Oak (named after the oak trees outside Gosling's office), was intended to replace C++, although its feature set more closely resembles that of Objective C.

3.3.1.1 WORKING OF JAVA

For those who are new to object-oriented programming, the concept of a class will be new. Put simply, a class is the definition of a segment of code that can contain both data (called attributes) and functions (called methods). When the interpreter executes a class, it looks for a particular method named main, which will sound familiar to C programmers. The main method is passed an array of strings as a parameter (similar to the argv[] of C) and is declared as a static method. To output text from the program, we call the println method of System.out, which is Java's output stream. UNIX users will appreciate the theory behind such a stream, as it is actually standard output. For those who are instead used to the Wintel platform, it writes the string passed to it to the console window.

Java consists of two things: a programming language and a platform.

3.3.1.2 THE JAVA PROGRAMMING LANGUAGE

Java is a high-level programming language that is all of the following:

Simple
Object-oriented
Distributed
Interpreted
Robust
Secure
Architecture-neutral
Portable
High-performance
Multithreaded
Dynamic

Java is unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes, the platform-independent codes interpreted by the Java interpreter. With an interpreter, each Java byte code instruction is parsed and run on the computer. Compilation happens just once; interpretation occurs each time the program is executed. Figure 3.1 illustrates how this works.

Fig. 3.1 Working of Java

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (JVM). Every Java interpreter, whether it is a Java development tool or a Web browser that can run Java applets, is an implementation of the JVM. The JVM can also be implemented in hardware. Java byte codes help make "write once, run anywhere" possible. You can compile your Java program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the JVM. For example, the same Java program can be run on Windows NT, Solaris, and Macintosh.

Fig. 3.2 Compilation and running procedure of Java (a Java program is passed through the compiler once; separate interpreters then run it on a PC-compatible under Windows NT, a Sun Ultra under Solaris, and a Power Macintosh under System 8)
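The main method and the println call described in Section 3.3.1.1, together with the compile-once, interpret-on-each-run cycle, can be seen in a minimal program. The class name HelloWorld and the helper method greeting() are illustrative choices, and the file is assumed to be saved as HelloWorld.java.

```java
// Minimal Java program: one class, one static main method.
class HelloWorld {
    // Hypothetical helper so the text can be produced separately from printing.
    static String greeting() {
        return "Hello, World!";
    }

    // The interpreter starts here; args plays the role of argv[] in C.
    public static void main(String[] args) {
        // System.out is the standard output stream.
        System.out.println(greeting());
    }
}
```

Running "javac HelloWorld.java" compiles the source once into the byte code file HelloWorld.class; "java HelloWorld" then asks the JVM to interpret that byte code, and can be repeated on any platform with a JVM without recompiling.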

3.3.1.3 THE JAVA PLATFORM

A platform is the hardware or software environment in which a program runs. The Java platform differs from most other platforms in that it is a software-only platform that runs on top of other, hardware-based platforms. Most other platforms are described as a combination of hardware and operating system.

The Java platform has two components:

The Java Virtual Machine (JVM)
The Java Application Programming Interface (Java API)

You have already been introduced to the JVM. It is the base for the Java platform and is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries (packages) of related components. Figure 3.3 depicts a Java program, such as an application or applet, that is running on the Java platform. As the figure shows, the Java API and the Virtual Machine insulate the Java program from hardware dependencies.

Fig. 3.3 Java Platform

As a platform-independent environment, Java can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring Java's performance close to that of native code without threatening portability.

3.3.2 JAVA SERVLETS

A Java Servlet is a server-side program that is called by the user interface or another J2EE component and contains the business logic to process a request. Implicit and explicit data are sent from a client to the server-side program in the form of a request, which is processed, and another set of explicit and implicit data is returned.
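To make the two kinds of request data concrete, the sketch below separates them in a raw HTTP GET request using only the standard library. It is an illustration rather than servlet code: explicit data is taken to be the name=value pairs of the query string and implicit data the HTTP header lines, the class name RequestDataSketch is hypothetical, and URL decoding is omitted. In an actual servlet the container does this parsing, and the values are read through HttpServletRequest.getParameter and HttpServletRequest.getHeader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for what a servlet container extracts from a request.
class RequestDataSketch {

    // Explicit data: name=value pairs from the query string of the request line.
    static Map<String, String> explicitData(String rawRequest) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        // First line looks like: GET /login?user=alice HTTP/1.1
        String requestLine = rawRequest.split("\r\n", 2)[0];
        String[] parts = requestLine.split(" ");
        if (parts.length < 2) {
            return params;
        }
        int q = parts[1].indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : parts[1].substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Implicit data: header lines added by the client software, not the user.
    static Map<String, String> implicitData(String rawRequest) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        String[] lines = rawRequest.split("\r\n");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "GET /login?user=alice&lang=en HTTP/1.1\r\n"
                + "Host: www.example.com\r\n"
                + "User-Agent: TestBrowser\r\n\r\n";
        System.out.println("explicit: " + explicitData(raw));
        System.out.println("implicit: " + implicitData(raw));
    }
}
```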

Explicit data is information received from the client that is typically either entered by the user into the user interface or generated by the user interface itself. Implicit data is HTTP information that is generated by the client rather than the user.

3.3.2.1 Advantages of Java servlets

Only one copy of a Java servlet is loaded into the JVM, no matter how many simultaneous requests there are.
A Java servlet has persistence. This means that the servlet remains alive after the request.

3.3.3 Java Server Pages

A JSP is similar in design and functionality to a Java servlet. It is called by the client to provide a web service, the nature of which depends on the J2EE application. However, a JSP differs from a servlet in the way it is written. A Java servlet is written in the Java programming language, and responses are encoded as an output string object that is passed to the println() method. In contrast, a JSP is written in HTML, XML, or the client's format, interspersed with scripting elements, directives, and actions composed of Java programming language and JSP syntax.

Three methods are called automatically during the life of a JSP, from the time it is requested until it terminates normally: the jspInit() method, the jspDestroy() method, and the service() method. The jspInit() method is called first when the JSP is requested and is used to initialize objects and variables that are used throughout the life of the JSP. The jspDestroy() method is called automatically when the JSP terminates normally. It is not called when the JSP terminates abruptly. The service() method is called automatically and retrieves a connection to HTTP.

JSP programs are executed by a JSP virtual machine that runs on a web server. You therefore need access to a JSP virtual machine to run your JSP program. One of the most popular JSP virtual machines is Tomcat.

3.3.4 APACHE TOMCAT SERVER

Apache Tomcat (formerly under the Apache Jakarta Project; Tomcat is now a top-level project) is a web container developed at the Apache Software Foundation. Tomcat implements the servlet and JavaServer Pages (JSP) specifications from Sun Microsystems, providing an environment for Java code to run in cooperation with a web server. It adds tools for configuration and management but can also be configured by editing configuration files, which are normally XML-formatted. Because Tomcat includes its own HTTP server internally, it is also considered a standalone web server.

Environment

Tomcat is a web server that supports servlets and JSPs. Tomcat comes with the Jasper compiler, which compiles JSPs into servlets. The Tomcat servlet engine is often used in combination with an Apache web server or other web servers. Tomcat can also function as an independent web server. Earlier in its development, the perception existed that standalone Tomcat was only suitable for development environments and other environments with minimal requirements for speed and transaction handling. That perception no longer exists; Tomcat is increasingly used as a standalone web server in high-traffic, high-availability environments.

Since its developers wrote Tomcat in Java, it runs on any operating system that has a JVM.

Product features

Tomcat 3.x (initial release)
* implements the Servlet 2.2 and JSP 1.1 specifications
* servlet reloading
* basic HTTP functionality

Tomcat 4.x
* implements the Servlet 2.3 and JSP 1.2 specifications
* servlet container redesigned as Catalina
* JSP engine redesigned as Jasper
* Coyote connector
* Java Management Extensions (JMX), JSP and Struts-based administration

Tomcat 5.x
* implements the Servlet 2.4 and JSP 2.0 specifications
* reduced garbage collection, improved performance and scalability
* native Windows and Unix wrappers for platform integration
* faster JSP parsing

History

Tomcat started off as a servlet specification implementation by James Duncan Davidson, a software architect at Sun. He later helped make the project open source and played a key role in its donation by Sun to the Apache Software Foundation. Davidson had initially hoped that the project would become open source and, since most open-source projects had O'Reilly books associated with them featuring an animal on the cover, he wanted to name the project after an animal. He came up with Tomcat, reasoning that the animal represented something that could take care of and fend for itself. His wish to see an animal cover eventually came true when O'Reilly published their Tomcat book with a tomcat on the cover.

3.3.5 JAVA MAIL API

Email is probably one of the most widely used methods of communication. A J2EE application is able to send and receive email messages through the Java Mail API. The API is protocol independent and can send and receive messages created by a J2EE application using existing email protocols. It provides a set of abstract classes defining the objects that comprise a mail system, such as Message, Store and Transport. The API can be extended and subclassed to provide new protocols and to add functionality when necessary.

JAVA MAIL LAYERED ARCHITECTURE

The Java Mail architectural components are layered as follows:
* The abstract layer declares classes, interfaces and abstract methods intended to support mail handling functions that all mail systems support.
* The Internet implementation layer implements part of the abstract layer using Internet standards.
* Java Mail uses the JavaBeans Activation Framework (JAF) to encapsulate message data and to handle commands intended to interact with the data.

CHAPTER 4
BLOCK DIAGRAM

Fig.4.1

4.1 Data Flow Diagram

[Figure: data flow diagram. Recoverable labels: Website Homepage, Login (Correct / Incorrect), Database, Register, User Homepage, Mail Send, Mail Server, Mail Receive, Logout, Layer 1, Layer 2, Layer 3, Final Phishing Rate]

4.2 Sequence Diagram

4.3 Activity Diagram

4.4 Use case Diagram

CHAPTER 5
SYSTEM DESIGN

The website phishing detection rate is computed from six criteria: URL & Domain Identity, Security & Encryption, Source Code & JavaScript, Page Style & Contents, Web Address Bar and Social Human Factor, as shown in Table I. The table also shows that the criteria have different numbers of components: five components each for URL & Domain Identity, Source Code & JavaScript, Page Style & Contents and Web Address Bar, four components for Security & Encryption and three components for Social Human Factor. There are therefore twenty-seven components in total (4 x 5 + 4 + 3 = 27).

There are three layers in this website phishing fuzzy model, as shown in figure 2. The first layer contains only the URL & Domain Identity criterion, with a weight of 0.3 for its importance; the second layer contains the Security & Encryption and Source Code & JavaScript criteria, with a weight of 0.2 each; the third layer contains the Page Style & Contents, Web Address Bar and Social Human Factor criteria, with a weight of 0.1 each. The six criteria have been prioritized according to their importance, using weights derived from the website phishing experiments, case studies, anti-phishing tool analysis, web surveys, phishing quizzes, a detailed questionnaire and feedback from phishing experts.
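The weighting of the six criteria can be illustrated with a small Java sketch. The weights are the ones stated above; everything else is an assumption made for illustration only: the class name CriteriaWeights, the idea that each criterion yields a score between 0 (legitimate) and 1 (phishy), and the weighted sum used to combine them. The report itself combines the criteria through fuzzy rule bases, so this is only a simplified reading of what the weights express.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the project source.
class CriteriaWeights {

    // Weights exactly as stated in the report, one entry per criterion.
    static Map<String, Double> weights() {
        Map<String, Double> w = new LinkedHashMap<>();
        w.put("URL & Domain Identity", 0.3);     // layer 1
        w.put("Security & Encryption", 0.2);     // layer 2
        w.put("Source Code & JavaScript", 0.2);  // layer 2
        w.put("Page Style & Contents", 0.1);     // layer 3
        w.put("Web Address Bar", 0.1);           // layer 3
        w.put("Social Human Factor", 0.1);       // layer 3
        return w;
    }

    // ASSUMPTION: each score lies in [0, 1] and scores are combined by a
    // plain weighted sum. Criteria missing from the map count as 0.
    static double weightedScore(Map<String, Double> scores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights().entrySet()) {
            double score = scores.getOrDefault(e.getKey(), 0.0);
            if (score < 0.0 || score > 1.0) {
                throw new IllegalArgumentException("score out of range: " + e.getKey());
            }
            total += e.getValue() * score;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("URL & Domain Identity", 1.0); // clearly suspicious URL
        scores.put("Web Address Bar", 0.5);       // somewhat suspicious address bar
        System.out.println("Weighted score: " + weightedScore(scores));
    }
}
```

Because the weights add up to 1.0, the combined score stays in the same 0 to 1 range as the individual scores, and a fully suspicious URL alone contributes 0.3 of the maximum.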

5.1 MODULES

* Webpage Creation
* E-mail process
* Implementing Fuzzy Logic Model
* Final website Phishing rate

Webpage Creation: Each web page includes a header and a footer. The index page has a login form with username and password fields, and the homepage contains details about this project. The pages provided are:
* New User Register
* Login Old user
* Remember the Password

E-mail process: This module covers sending and receiving mail using the JES server. The mail composing page has the recipient's email address, the subject and the content. All received mail appears in the corresponding inbox.
Input: User sends the e-mail
Output: User receives the e-mail

Implementing Fuzzy Logic Model: The essential advantage offered by fuzzy logic techniques is the use of linguistic variables to represent the Key Phishing Characteristic Indicators and to relate them to the website phishing probability. The website phishing detection rate is computed from six criteria: URL & Domain Identity, Security & Encryption, Source Code & JavaScript, Page Style & Contents, Web Address Bar and Social Human Factor. There are three layers in this website phishing fuzzy model. The first layer contains only the URL & Domain Identity criterion, with a weight of 0.3 for its importance; the second layer contains the Security & Encryption and Source Code & JavaScript criteria, with a weight of 0.2 each; the third layer contains the Page Style & Contents, Web Address Bar and Social Human Factor criteria, with a weight of 0.1 each.
Input: E-mail message, evaluated in 3 layers
Output: Status of the message in each of the 3 layers (Genuine, Fake, Uncertain)
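How a linguistic variable turns a numeric layer score into one of the statuses Genuine, Uncertain or Fake can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class name LayerStatus, the score range of 0 (legitimate) to 1 (phishy), the triangular and shoulder membership shapes and the breakpoints 0.2, 0.5 and 0.8 are all illustrative choices and are not taken from the project.

```java
// Hypothetical illustration of a linguistic variable with three values.
class LayerStatus {

    enum Status { GENUINE, UNCERTAIN, FAKE }

    // Triangular membership: 0 at a and c, rising to 1 at the peak b.
    static double triangular(double x, double a, double b, double c) {
        if (x <= a || x >= c) return 0.0;
        return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
    }

    // Left shoulder: 1 up to `full`, falling linearly to 0 at `zero`.
    static double leftShoulder(double x, double full, double zero) {
        if (x <= full) return 1.0;
        if (x >= zero) return 0.0;
        return (zero - x) / (zero - full);
    }

    // Right shoulder: 0 up to `zero`, rising linearly to 1 at `full`.
    static double rightShoulder(double x, double zero, double full) {
        if (x <= zero) return 0.0;
        if (x >= full) return 1.0;
        return (x - zero) / (full - zero);
    }

    // Returns the linguistic value with the highest membership degree.
    // ASSUMPTION: the breakpoints 0.2, 0.5 and 0.8 are illustrative only.
    static Status classify(double score) {
        double genuine = leftShoulder(score, 0.2, 0.5);
        double uncertain = triangular(score, 0.2, 0.5, 0.8);
        double fake = rightShoulder(score, 0.5, 0.8);
        if (genuine >= uncertain && genuine >= fake) return Status.GENUINE;
        return uncertain >= fake ? Status.UNCERTAIN : Status.FAKE;
    }

    public static void main(String[] args) {
        for (double score : new double[] {0.1, 0.5, 0.9}) {
            System.out.println(score + " -> " + classify(score));
        }
    }
}
```

A score can belong partly to two values at once (for example, partly Genuine and partly Uncertain), which is what distinguishes a fuzzy classification from a fixed threshold.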

Final website Phishing Rate: In the last phase, the website phishing rule base has three inputs (layer one, layer two and layer three) and one output, the phishing rate of the website. Since each of the three layers can take one of three statuses, the rule base contains 3^3 = 27 entries. Its output is one of the final output fuzzy sets (Very Legitimate, Legitimate, Suspicious, Phishy or Very Phishy), representing the final phishing website rate.
Input: Status of the 3 layers
Output: Final phishing rate (Very Legitimate, Legitimate, Suspicious, Phishy, Very Phishy)
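The size of the final rule base can be checked with a short Java sketch that enumerates every combination of the three layer statuses. The actual 27 rules are not listed in this report, so the mapping used here (placeholderRule, which simply counts how suspicious the three layers are) is a stand-in invented for illustration, as are the class and method names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the final rule base; the rule contents are placeholders.
class FinalRuleBase {

    enum Status { GENUINE, UNCERTAIN, FAKE }

    enum Rate { VERY_LEGITIMATE, LEGITIMATE, SUSPICIOUS, PHISHY, VERY_PHISHY }

    // PLACEHOLDER rule: score each layer 0 (Genuine), 1 (Uncertain) or 2 (Fake)
    // and bucket the sum (0..6) into the five output fuzzy sets.
    static Rate placeholderRule(Status layer1, Status layer2, Status layer3) {
        int score = layer1.ordinal() + layer2.ordinal() + layer3.ordinal();
        if (score == 0) return Rate.VERY_LEGITIMATE;
        if (score <= 2) return Rate.LEGITIMATE;
        if (score == 3) return Rate.SUSPICIOUS;
        if (score <= 5) return Rate.PHISHY;
        return Rate.VERY_PHISHY;
    }

    // Builds one entry per combination of layer statuses: 3 * 3 * 3 = 27.
    static Map<List<Status>, Rate> buildRuleBase() {
        Map<List<Status>, Rate> rules = new LinkedHashMap<>();
        for (Status l1 : Status.values()) {
            for (Status l2 : Status.values()) {
                for (Status l3 : Status.values()) {
                    rules.put(Arrays.asList(l1, l2, l3), placeholderRule(l1, l2, l3));
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        Map<List<Status>, Rate> rules = buildRuleBase();
        System.out.println(rules.size() + " rules");
        System.out.println(rules.get(Arrays.asList(Status.FAKE, Status.GENUINE, Status.UNCERTAIN)));
    }
}
```

In the real system each of the 27 entries would be filled in from the expert-derived rules rather than computed, and layer one would be expected to carry more influence than the others.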

CHAPTER 6
CODING AND TESTING

6.1 CODING

Once the design of the system is finalized, the system enters the coding and testing phase. The coding phase brings the actual system into action by converting the design of the system into code in a given programming language. A good coding style must therefore be followed, so that whenever changes are required they can easily be worked into the system.

6.2 CODING STANDARDS

Coding standards are programming guidelines that focus on the physical structure and appearance of the program. They make the code easier to read, understand and maintain. This phase of the system actually implements the blueprint developed during the design phase. The coding specification should be such that any programmer is able to understand the code and make changes whenever necessary. Some of the standards needed to achieve the above-mentioned objectives are as follows:
* Program should be simple, clear and easy to understand.
* Naming conventions
* Value conventions
* Script and comment procedure
* Message box format
* Exception and error handling

6.2.1 NAMING CONVENTIONS

Names of classes, data members, member functions, procedures etc. should be self-descriptive. One should be able to infer the meaning and scope of a variable from its name. The conventions are adopted so that the reader of the code can easily understand the intended meaning, and they are therefore followed throughout. These conventions are as follows:

Class names
Class names are the problem-domain equivalents; they begin with a capital letter and use mixed case.

Member function and data member names
Member function and data member names begin with a lowercase letter, with the first letter of each subsequent word in uppercase and the remaining letters in lowercase.

6.2.2 VALUE CONVENTIONS

Value conventions ensure that variables hold valid values at any point in time. This involves the following:
* Proper default values for the variables.
* Proper validation of values in the field.
* Proper documentation of flag values.
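The naming and value conventions above can be illustrated with a short Java class. The class MailAccount and all of its members are hypothetical and do not appear in the project source; they only show what the conventions look like when applied together.

```java
// Class name: problem-domain term, starts with a capital letter, mixed case.
class MailAccount {

    // Flag values are documented where they are declared.
    static final int STATUS_INACTIVE = 0; // account created but not yet usable
    static final int STATUS_ACTIVE = 1;   // account can send and receive mail

    // Data members: lowercase first letter, later words capitalized,
    // each with an explicit default value.
    private String userName = "";
    private int accountStatus = STATUS_INACTIVE;

    // Member function: same naming rule; the value is validated before it is stored.
    void setUserName(String newUserName) {
        if (newUserName == null || newUserName.trim().isEmpty()) {
            throw new IllegalArgumentException("userName must not be empty");
        }
        userName = newUserName.trim();
    }

    String getUserName() {
        return userName;
    }

    void activate() {
        accountStatus = STATUS_ACTIVE;
    }

    int getAccountStatus() {
        return accountStatus;
    }
}
```

With default values and validation in place, an object of this class never holds a null name or an undocumented status code.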

6.2.3 SCRIPT WRITING AND COMMENTING STANDARD

Script writing is an art in which indentation is of utmost importance. Conditional and looping statements are to be properly aligned to facilitate easy understanding. Comments are included to minimize the number of surprises that could occur when going through the code.

6.2.4 MESSAGE BOX FORMAT

When something has to be prompted to the user, the user must be able to understand it properly. To achieve this, a specific format has been adopted for displaying messages to the user, as follows:

X   User has performed an illegal operation.
!   Information to the user.

6.3 TEST PROCEDURE

SYSTEM TESTING: Testing is performed to identify errors. It is used for quality assurance. Testing is an integral part of the entire development and maintenance process. The goal of testing during the design phase is to verify that the specification has been accurately and completely incorporated into the design, as well as to ensure the correctness of the design itself. For example, any logic faults in the design must be detected before coding commences; otherwise the cost of fixing them will be considerably higher. Design faults can be detected by means of inspections as well as walkthroughs. Testing is one of the important steps in the software development phase. Taken as a whole, project testing involves the following:

* Static analysis is used to investigate the structural properties of the source code.
* Dynamic testing is used to investigate the behavior of the source code by executing the program on the test data.

6.4 TEST DATA AND OUTPUT

6.4.1 UNIT TESTING:

Unit testing is conducted to verify the functional performance of each modular component of the software. Unit testing focuses on the smallest unit of the software design, i.e. the module. White-box testing techniques were heavily employed for unit testing.

6.4.2 FUNCTIONAL TESTS

Functional test cases involved exercising the code with nominal input values for which the expected results are known, as well as boundary values and special values, such as logically related inputs, files of identical elements, and empty files. There are three types of functional test:
* Performance Test
* Stress Test
* Structure Test
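A functional test with nominal, boundary and special values might look like the following Java sketch. The unit under test, isValidUserName, and its rule (3 to 20 letters or digits) are assumptions chosen for illustration; the project's actual validation rules are not given in this report.

```java
// Hypothetical unit and its functional tests; not part of the project source.
class UserNameCheck {

    static final int MIN_LENGTH = 3;
    static final int MAX_LENGTH = 20;

    // Unit under test (assumed rule): 3 to 20 characters, letters or digits only.
    static boolean isValidUserName(String name) {
        if (name == null) return false;
        if (name.length() < MIN_LENGTH || name.length() > MAX_LENGTH) return false;
        for (char c : name.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) return false;
        }
        return true;
    }

    // Builds a string of n copies of 'a' for the boundary cases.
    static String letters(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        return sb.toString();
    }

    // Fails loudly when an actual result differs from the expected one.
    static void check(String label, boolean actual, boolean expected) {
        if (actual != expected) throw new AssertionError("failed: " + label);
    }

    public static void main(String[] args) {
        // Nominal value with a known expected result.
        check("nominal", isValidUserName("alice1"), true);
        // Boundary values on both sides of each limit.
        check("length 2", isValidUserName(letters(2)), false);
        check("length 3", isValidUserName(letters(3)), true);
        check("length 20", isValidUserName(letters(20)), true);
        check("length 21", isValidUserName(letters(21)), false);
        // Special values.
        check("null", isValidUserName(null), false);
        check("empty", isValidUserName(""), false);
        check("symbol", isValidUserName("al!ce"), false);
        System.out.println("all functional tests passed");
    }
}
```

The boundary cases are the ones most likely to expose an off-by-one error in the length comparison, which is why each limit is tested from both sides.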

6.4.3 PERFORMANCE TEST:

A performance test determines the amount of execution time spent in various parts of the unit, the program throughput, the response time and the device utilization of the program unit.
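Execution time for a single unit can be measured with a sketch like the one below, which uses the standard System.nanoTime() call. The class UnitTimer and the sample unit being timed are hypothetical. The figures obtained this way are only indicative, since JVM warm-up and just-in-time compilation affect the timings.

```java
// Hypothetical timing helper for a performance test of one unit.
class UnitTimer {

    // Runs the unit `runs` times and returns the mean time per run in nanoseconds.
    static long averageNanos(Runnable unit, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            unit.run();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        // Sample unit: build a string from 1000 numbers.
        Runnable unit = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) sb.append(i);
        };
        averageNanos(unit, 1_000); // warm-up pass, result discarded
        System.out.println(averageNanos(unit, 10_000) + " ns per run");
    }
}
```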

6.4.4 STRESS TEST:

Stress tests are tests designed to intentionally break the unit. A great deal can be learned about the strengths and limitations of a program by examining the manner in which a program unit breaks.

6.4.5 STRUCTURED TEST

Structure tests are concerned with exercising the internal logic of a program and traversing particular execution paths. A white-box test strategy was employed to ensure that the test cases:
* Guarantee that all independent paths within a module have been exercised at least once.
* Exercise all logical decisions on their true and false sides.
* Execute all loops at their boundaries and within their operational bounds.
* Exercise internal data structures to assure their validity.
* Check attributes for their correctness.
* Handle end-of-file conditions, I/O errors, buffer problems and textual errors in output information.

6.4.6 INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing; that is, integration testing is the complete testing of the set of modules that makes up the product. The objective is to take unit-tested modules and build a program structure. The tester should identify critical modules, and critical modules should be tested as early as possible.

One approach is to wait until all the units have passed testing, and then combine them and test the combination. This approach evolved from the unstructured testing of small programs. Another strategy is to construct the product in increments of tested units. A small set of modules is integrated and tested, another module is added and tested in combination, and so on. The advantage of this approach is that interface discrepancies can be easily found and corrected, because errors are localized to the new module and its intercommunications. The product development can be staged, with modules integrated as they complete unit testing. Testing is completed when the last module is integrated and tested.

The major error faced during this project was a linking error. When all the modules were combined, the links to the support files were not set properly. We then checked the interconnections and the links.

6.5 TESTING TECHNIQUES / TESTING STRATEGIES

6.5.1 TESTING

Testing is the process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an as-yet-undiscovered error. A successful test is one that uncovers an as-yet-undiscovered error. System testing is the stage of implementation that is aimed at ensuring that the system works accurately and efficiently, as expected, before live operation commences. It verifies that the whole set of programs hangs together. System testing consists of several key activities and steps (program, string and system testing) and is important in adopting a successful new system. This is the last chance to detect and correct errors before the system is installed for user acceptance testing.

The software testing process commences once the program is created and the documentation and related data structures are designed. Software testing is essential for correcting errors; without it the program or project cannot be said to be complete. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Any engineering product can be tested in one of two ways:

6.5.1.1 WHITE BOX TESTING

This testing is also called glass box testing. In this testing, knowing the internal operation of a product, tests can be conducted to ensure that "all gears mesh", that is, that the internal operation performs according to specification and all internal components have been adequately exercised. It is a test case design method that uses the control structure of the procedural design to derive test cases. Basis path testing is a white box testing technique.

Basis path testing:
* Flow graph notation
* Cyclomatic complexity
* Deriving test cases
* Graph matrices
* Control

6.5.1.2 BLACK BOX TESTING

In this testing, knowing the specified functions that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function. It fundamentally focuses on the functional requirements of the software. The steps involved in black box test case design are:
* Graph based testing methods
* Equivalence partitioning
* Boundary value analysis
* Comparison testing

6.5.2 SOFTWARE TESTING STRATEGIES

A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason a template for software testing, that is, a set of steps into which specific test case design methods can be placed, should be defined. A testing strategy should have the following characteristics:
* Testing begins at the module level and works outward toward the integration of the entire computer-based system.
* Different testing techniques are appropriate at different points in time.
* The developer of the software and an independent test group conduct testing.
* Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.

6.5.2.1 INTEGRATION TESTING:

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. Individual modules, which are highly prone to interface errors, should not be assumed to work instantly when they are put together. The problem, of course, is putting them together, that is, interfacing. Data may be lost across an interface; one module may have an adverse effect on another; sub-functions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; and global data structures can present problems.

6.5.2.2 PROGRAM TESTING:

Logic and syntax errors are pointed out by program testing. A syntax error is an error in a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or an omitted keyword is a common syntax error. These errors are shown through error messages generated by the computer. A logic error, on the other hand, deals with incorrect data fields, out-of-range items and invalid combinations. Since the compiler will not detect logic errors, the programmer must examine the output.

Condition testing exercises the logical conditions contained in a module. The possible types of elements in a condition include a Boolean operator, a Boolean variable, a pair of Boolean parentheses, a relational operator or an arithmetic expression. The condition testing method focuses on testing each condition in the program; the purpose of a condition test is to detect not only errors in the conditions of a program but also other errors in the program.

6.5.2.3 SECURITY TESTING:

Security testing attempts to verify that the protection mechanisms built into a system will, in fact, protect it from improper penetration. The system's security must be tested for invulnerability from frontal attack, and must also be tested for invulnerability from rear attack. During security testing, the tester plays the role of an individual who desires to penetrate the system.

6.5.2.4 VALIDATION TESTING

At the culmination of integration testing, the software is completely assembled as a package. Interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, begins. Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that is reasonably expected by the customer. Software validation is achieved through a series of black box tests that demonstrate conformity with requirements. After a validation test has been conducted, one of two conditions exists:
* The function or performance characteristics conform to specifications and are accepted.
* A deviation from specification is uncovered and a deficiency list is created.

Deviations or errors discovered at this step in this project were corrected prior to completion of the project, with the help of the user, by negotiating to establish a method for resolving deficiencies. Thus the proposed system has been tested using validation testing and found to be working satisfactorily. Though there were deficiencies in the system, they were not catastrophic.

6.5.2.5 USER ACCEPTANCE TESTING

User acceptance of the system is a key factor in the success of any system. The system under consideration was tested for user acceptance by constantly keeping in touch with prospective system users during development and making changes whenever required. This was done with regard to the following points:
* Input screen design.
* Output screen design.
* Menu driven system.

CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENT

Phishing has become a serious network security problem, causing financial losses of billions of dollars to both consumers and e-commerce companies. Perhaps more fundamentally, phishing has made e-commerce distrusted and less attractive to ordinary consumers. In this project, we have studied the characteristics of the hyperlinks that are embedded in phishing e-mails. The fuzzy website phishing model showed the significance and importance of the phishing website criterion URL & Domain Identity, represented by layer one. It also showed that even if some of the website phishing characteristics or layers are not very clear or definite, the website can still be phishy, especially when other phishing characteristics or layers are obvious and clear. On the other hand, even if some of the website phishing characteristics or layers are noticed or observed, that does not necessarily mean that the website is phishy; it can still be safe and secure, especially when the other phishing characteristics or layers are not noticeable, visible or detectable.

CHAPTER 8
SOURCE CODE

Signup.java

import java.io.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.servlet.*;
import javax.servlet.http.*;

import Appn.Member;

public class Signup extends HttpServlet {

    // Handles the sign-up form: rejects an ID that is already taken, otherwise
    // stores the new member, registers the mailbox and shows the welcome page.
    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Member member = new Member();
        ServerReg s = new ServerReg();
        member.name = request.getParameter("p1");
        String user = request.getParameter("p2");
        member.id = user + "@vaseramail.com";
        member.pwd = request.getParameter("p31");
        member.secq = request.getParameter("p4");
        member.seca = request.getParameter("p5");

        boolean result = signup(member);
        if (result) {
            // The ID already exists (or the check failed): return to the form.
            request.setAttribute("user", member.id);
            request.setAttribute("pwd", member.pwd);
            request.setAttribute("secq", member.secq);
            request.setAttribute("seca", member.seca);
            request.setAttribute("name", member.name);
            request.setAttribute("msg", "Id already exist,Please choose another ID");
            RequestDispatcher rq = request.getRequestDispatcher("modsignup.jsp");
            rq.forward(request, response);
        } else {
            HttpSession ses = request.getSession(true);
            if (ses != null) {
                ses.setAttribute("user", member.id);
                ses.setAttribute("pwd", member.pwd);
                ses.setAttribute("seca", member.seca);
                ses.setAttribute("secq", member.secq);
            }
            // Register the mailbox on the mail server.
            ServletContext ctx = getServletConfig().getServletContext();
            s.Reg(user, member.pwd, ctx.getInitParameter("jespath"), ctx.getInitParameter("server"));
            RequestDispatcher rq = request.getRequestDispatcher("welcome.jsp");
            rq.forward(request, response);
        }
    }

    // Returns true if the ID already exists or the database check failed;
    // otherwise inserts the new member and returns false.
    public boolean signup(Member member) {
        boolean exist = false;
        String server = getServletConfig().getServletContext().getInitParameter("server");
        try {
            DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@" + server, "system", "redhat")) {
                // Bind the ID as a parameter instead of concatenating it into the SQL.
                try (PreparedStatement st = conn.prepareStatement(
                        "SELECT * FROM MEMBER WHERE id = ?")) {
                    st.setString(1, member.id);
                    try (ResultSet rs = st.executeQuery()) {
                        exist = rs.next();
                    }
                }
                if (!exist) {
                    try (PreparedStatement ins = conn.prepareStatement(
                            "INSERT INTO MEMBER VALUES (?, ?, ?, ?, ?)")) {
                        ins.setString(1, member.name);
                        ins.setString(2, member.id);
                        ins.setString(3, member.pwd);
                        ins.setString(4, member.secq);
                        ins.setString(5, member.seca);
                        ins.executeUpdate();
                    }
                }
            }
        } catch (Exception e) {
            // Any failure is reported as "cannot register" so the form is shown again.
            return true;
        }
        return exist;
    }
}

