
Auto Summarization Tool 2010

ABSTRACT

This project develops an auto summarization program that creates a short summary of any
document or browsed web page. Reading large amounts of electronic text is a daunting task
that takes considerable human effort. The objective of this project is therefore to design an
automatic text extraction system, i.e., an auto summarization tool, that alleviates, if not totally
solves, this problem. We do this using the statistical approach to auto summarization.

The procedure for this approach involves the following steps:

First, we study auto-summarization techniques, concentrating on summarizers based on
statistical techniques, and collect the lists of stop words and basic words.
The auto summarization tool begins by separating and reading each sentence and word.
It then removes all the stop words from the given text and collects the list of all the
basic words. The next step is to count how frequently each word appears in the
document.

Each sentence is then scored using the formula
Sentence Score = (Sum of Word Scores) / (Number of Normal Words),
where the score (frequency count) of a normal word = (number of occurrences of that word in
the whole document) / (number of sentences in which that word occurs).

The proposed system is computerized and requires very little data entry. Automatic
text summarization is the process of using a computer program to condense a source document
into a shorter version of the text while keeping the most significant information.
Following the steps above, the developed tool offers fast summarization. It is intended to be
cross-platform software that produces a summary based on the statistical methods and
algorithms used.
The project has been built using C#.NET.


TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

INTRODUCTION
1.1 BACKGROUND
1.2 PURPOSE
1.3 SCOPE
1.4 OVERVIEW
1.5 ORGANIZATION OF REPORT
2.1 SYSTEM INTERFACES
2.2 USER INTERFACES
2.3 SOFTWARE INTERFACES
2.4 HARDWARE INTERFACE
2.5 REQUIREMENT ANALYSIS
2.6 FEASIBILITY STUDY
    2.6.1 ECONOMICAL FEASIBILITY
    2.6.2 TECHNICAL FEASIBILITY
    2.6.3 OPERATIONAL FEASIBILITY
2.7 TECHNOLOGIES TO BE USED:
    2.7.1 INTRODUCTION:
    2.7.2 IMPORTANT FACTS ABOUT C#:
    2.7.3 VISUAL STUDIO 2008:
    2.7.4 THE COMPILER:
    2.7.5 C#:
    2.7.6 SQL SERVER MANAGEMENT STUDIO:
3.1 MODULE DESCRIPTION
    3.1.1 TEXT PRE-PROCESSOR:
    3.1.2 SENTENCE SEPARATOR:
    3.1.3 WORD SEPARATOR:
    3.1.4 STOP-WORDS ELIMINATOR:
    3.1.5 WORD-FREQUENCY CALCULATOR:
    3.1.6 SCORING ALGORITHM:
    3.1.7 RANKING:
    3.1.8 SUMMARIZING:
    3.1.9 USER INTERFACE:
3.2 OVERVIEW OF SCORING TECHNIQUE:
    3.2.1 WORD SCORE:
    3.2.2 SENTENCE SCORE:
4.1 DATABASE USED
    4.1.1 TABLE 1- STOP WORDS TABLE
5.1 CONTEXT DIAGRAM
5.2 DATA FLOW DIAGRAM (LEVEL 1)
5.3 FLOW CHART
    5.3.1 SENTENCE SEPARATOR:
    5.3.2 WORDS FREQUENCY CALCULATOR:
    5.4.3 SCORING
    5.4.4 RANKING
    5.4.5 USER INTERFACE
6.1 FEATURES OF THIS PROJECT
6.2 CONSTRAINTS & LIMITATIONS
6.3 ASSUMPTIONS & DEPENDENCIES
6.4 MERITS OF PROPOSED SYSTEM
7.1 SOFTWARE SYSTEM ATTRIBUTES
    7.1.1 RELIABILITY
    7.1.2 AVAILABILITY
    7.1.3 MAINTAINABILITY
    7.1.4 PORTABILITY
7.2 OTHER REQUIREMENTS
7.3 OTHER NON-FUNCTIONAL ATTRIBUTES
8.1 SCREEN-SHOTS:
8.3 REFERENCES:


LIST OF TABLES

Table 2.1- Software Interface table


Table 2.2- Hardware Interface table
Table 4.1- Database table 1 (Stop words)


LIST OF FIGURES

Figure 1.1- Categories of Auto Summarization


Figure 3.1- Modules of Auto Summarization Tools
Figure 3.2- Scoring Technique

Figure 5.1- Context Diagram


Figure 5.2- Data Flow Diagram
Figure 5.3- Flow Chart (Sentence Separator)
Figure 5.4- Flow Chart (Basic Word Identifier)
Figure 5.5- Flow Chart (Scoring)
Figure 5.6- Flow Chart (Ranking)
Figure 5.7- Flow Chart (User Interface)
Figure 8.1- Screen Shot (Splash Screen)
Figure 8.2- Screen Shot (Main GUI)

INTRODUCTION
1.1 BACKGROUND
In the computer age, people are inundated with papers, memos, e-mail messages, reports, web
pages, schedules, reference materials, test results, and so on. Reading all of this takes a lot
of time and human effort. Unfortunately, many documents do not begin with summaries.
Creating a summary is tedious: the author must re-read the document, identify its major
themes, and distill the main points into a concise summary.

Summarizing a document is even more difficult and time-consuming for a reader. The
reader must first read the entire document (or at least skim it) to understand the contents.
The reader must then attempt to separate the document's key points from unimportant
details.


The problem is less critical, but still troubling, for individual users who are browsing
through the Internet or other networks to find documents on a related topic. With the
explosion in the quantity of online text and multimedia information in recent years, there
has been a renewed interest in automatic summarization.

Some of the negative aspects of the existing system are as follows:

• Reading and understanding documents is time-consuming.
• Readability is constrained. Records may not all be written or handled by the same person,
so their format and style differ, which makes them difficult to understand.
• Sometimes all the reading effort and time is wasted, because only after reading the whole
document does one realize that it is worthless.
• Manual reading needs additional manpower.

The proposed system is suggested to overcome these problems.

1.2 PURPOSE
A summary, or recap, is a shortened version of the original: a restatement of the
main ideas of the text in as few words as possible. The main purpose of such a
simplification is to highlight the major points of the original (much longer) text. The
aim is to help the user get the gist in a short period of time.

The Auto Summarizer analyzes a document and assigns a score to each sentence. The user
decides how much detail is wanted, and the Auto Summarizer uses the scoring system to
extract the key points and assemble them.

Auto-summarization is a technique used to generate summaries of electronic documents.
Its applications include summarizing search-engine results and providing briefs of large
documents that do not have an abstract.

This project therefore develops a program that creates a short summary of any document
or browsed web page. Automatic text summarization is the process of using a computer
program to condense a source document into a shorter version of the text while keeping
the most significant information.


1.3 SCOPE

An abstract or summary at the beginning of a document can help a reader quickly understand the
scope of a body of information. The AutoSummarize tool in Microsoft Office Word highlights
and assembles the key points of a document.

AutoSummarize analyzes a document and assigns a score to each sentence. The user decides how
much detail is wanted, and AutoSummarize uses the scoring system to extract the key points
and assemble them.

This project opens up many possibilities:

1. Generating newspaper headlines, given the article.
2. Filling in forms, given text containing the necessary data.
3. Creating a bio-data from a textual description of a person.
4. Generating a balanced summary even for documents that cover multiple topics.

1.4 OVERVIEW

Auto-summarization is a technique used to generate summaries of electronic documents. Its
applications include summarizing search-engine results and providing briefs of large documents
that do not have an abstract. There are two categories of summarizers: linguistic and
statistical.

Linguistic summarizers use knowledge about the language (syntax, semantics, usage, etc.) to
summarize a document.

Statistical summarizers operate by finding the important sentences using statistical methods
(such as the frequency of a particular word). They normally do not use any linguistic
information. In this project, an auto-summarization tool is developed using statistical
techniques: finding the frequency of words, scoring the sentences, and ranking the sentences.
The summary is obtained by selecting a particular number of sentences (specified by the user
when invoking the tool) from the top of the ranked list. The tool operates on a single document
(but can be made to work on multiple documents by choosing proper algorithms for integration)
and provides a summary of that document. Pre-processing interfaces handle the following
document types: plain text, HTML, and Word documents.

In this project, the statistical technique is used for summarizing. Compared with the
linguistic method, it is much easier and quicker to develop, while the summaries generated
by the two kinds of summarizer do not differ greatly.

We score the sentences in the given text to generate a summary comprising the most
important ones. The program takes its input from a text file and writes the summary to a
similar text file. The summary is obtained by selecting a particular number of sentences
from the top of the ranked list, so the size of the summary can be specified by the user
when invoking the tool. The tool operates on a single document (but can be made to work
on multiple documents by choosing proper algorithms for integration). Scoring techniques
such as finding the frequency of words, scoring the sentences, and ranking the sentences,
including ranking based on the placement of sentences, are implemented.

1.5 ORGANIZATION OF REPORT

This report is divided into sections that follow the order in which the project was
carried out. The abstract gives a brief and simple description of the overall project.
Section one is the introduction; it gives an overview of the kinds of summarizers that
exist, which of them we use for our project, and why.
Section two covers the product perspective. It highlights the initial requirements for
developing this project, such as the software interfaces, hardware interfaces, requirement
analysis, and feasibility study, and explains the technologies used.
Section three explains the various modules and how they work. Section four describes the
database tables used for storing information, such as the basic word table and the stop
word table. It is followed by a diagrammatic description of the system using context
diagrams, DFDs, and flowcharts of the project modules.
The last section of this report includes a working example from the developed tool, along
with screenshots and the code used for the project.


PRODUCT PERSPECTIVE

2.1 SYSTEM INTERFACES


The system interface of the Auto Summarization Tool is the interface between the
application and the database. The front end will be in ASP.NET and the back end in SQL
Server Management Studio.

2.2 USER INTERFACES


The user can enter the size of the summary, i.e., the number of sentences to be
displayed in the summary. The user can also browse for a file on the system, or for any
web page, whose summary is to be generated; these pages are converted to plain text.
This gives the user the flexibility to manage the output and have the summary displayed
on screen.

2.3 SOFTWARE INTERFACES

Operating system    Windows Vista
Front end           Platform: C#.NET
                    Language used: C#
Back end            Database used: SQL Server Management Studio

Table 2.1

2.4 HARDWARE INTERFACE

The proposed system is built on:

Processor           Core 2 Duo
RAM                 2 GB
Hard disk drive     160 GB
Keyboard            Standard 101/102
Monitor             Display Panel (1024 X 764)
Mouse               Logical Serial Mouse

Table 2.2

2.5 REQUIREMENT ANALYSIS

At the heart of system analysis is a detailed understanding of all the important facets of
the business area under investigation. (For this reason, the process of acquiring this
understanding is often termed the detailed investigation.) The analyst, working closely with
employees and managers, must study the business process to answer these key questions:

• What is being done?
• How is it being done?
• How great is the volume of text?
• How well is the task being performed?
• Does a problem exist?
• If a problem exists, how serious is it?
• If a problem exists, what is the underlying cause?

Requirement analysis relies on fact-finding techniques. These include:

• Interviews
• Questionnaires
• Record inspection
• On-site observation

2.6 FEASIBILITY STUDY

A feasibility study is conducted to select the best system that meets the performance
requirements. This entails identifying and describing candidate systems, evaluating them,
and selecting the best system for the job. The study requires a statement of constraints,
the identification of specific system objectives, and a description of the outputs that
define performance. The key considerations in feasibility analysis are:
• Economic Feasibility
• Technical Feasibility
• Operational Feasibility


2.6.1 ECONOMICAL FEASIBILITY


Economic feasibility looks at the financial aspects of the project. It determines whether
the management has enough resources and budget to invest in the proposed system, and
estimates the time needed to recover the cost incurred. It also determines whether it is
worthwhile to invest money in the proposed project. Economic feasibility is determined by
means of cost-benefit analysis. The proposed system is economically feasible because the
cost of purchasing the hardware and software is within reach, and the operating-environment
costs are marginal. The short development time also helps its economic feasibility.

The back end required for storing other details (words) is the MySQL database. The
computers already present are sufficiently capable and do not need extra components to
load the software. Hence the tool can easily be implemented on the existing systems
without any additional expenditure.

2.6.2 TECHNICAL FEASIBILITY


Technical feasibility is a measure of the practicality of a specific technical solution and
of the availability of technical resources and expertise. The proposed system uses Java
Swing as the front end and MySQL as the back-end tool.
• MySQL offers superior speed, reliability, and ease of use.
• The above tools are readily available, easy to work with, and widely used for
developing commercial applications.

2.6.3 OPERATIONAL FEASIBILITY


Operational feasibility asks whether the system will be used once it is developed, or
whether users will resist it. The following points indicate that it will be used:

• No major training or new skills are required.
• It will save time and allow fast processing and dispersal of user requests
and applications.
• It provides improved and selected information and better management.
• User support is provided.
• Users are involved in building the system so that their specific requirements and
needs are kept in mind.

2.7 TECHNOLOGIES TO BE USED:

• Framework: .NET
• Front end: C#.NET
• Back end: SQL
• Database: SQL Server Management Studio

2.7.1 INTRODUCTION:

C#.NET is Microsoft's technology for creating robust application software. It is built on
the Microsoft .NET Framework, a cluster of closely related technologies covering everything
from database access to distributed applications. C#.NET is one of the most important
components of the .NET Framework: it is the part that enables you to develop
high-performance desktop applications and services.
C# was designed as a quick and easy language for creating applications; C#.NET, by contrast,
is a full-blown platform for developing comprehensive, fast applications.

2.7.2 IMPORTANT FACTS ABOUT C#:

• C#.NET is integrated with the .NET Framework
• C#.NET is compiled, not interpreted
• C#.NET is multi-language
• C#.NET runs inside the Common Language Runtime
• C#.NET is object-oriented
• C#.NET is multi-device and multi-platform
• C#.NET is easy to deploy and configure

2.7.3 VISUAL STUDIO 2008:

Visual Studio 2008 and Visual Studio Team System 2008, codenamed Orcas, were released to
MSDN subscribers on 19 November 2007 alongside .NET Framework 3.5.
Visual Studio 2008 is focused on the development of Windows Vista, 2007 Office system, and
Web applications. For visual design, a new Windows Presentation Foundation visual designer and
a new HTML/CSS editor influenced by Microsoft Expression Web are included. J# is not included.
Visual Studio 2008 requires .NET 3.5 Framework and by default configures compiled assemblies
to run on .NET Framework 3.5, but it also supports multi-targeting which lets the developers
choose which version of the .NET Framework the assembly runs on. Visual Studio 2008 also
includes new code analysis tools, including the new Code Metrics tool (only in Team Edition and
Team Suite Edition). For Visual C++, Visual Studio adds a new version of Microsoft Foundation
Classes (MFC 9.0) that adds support for the visual styles and UI controls introduced with
Windows Vista. For native and managed code interoperability, Visual C++ introduces the
STL/CLR, which is a port of the C++ Standard Template Library (STL) containers and
algorithms to managed code. STL/CLR defines STL-like containers, iterators and algorithms that
work on C++/CLI managed objects.
Visual Studio 2008 features include an XAML-based designer (codenamed Cider), workflow
designer, LINQ to SQL designer (for defining the type mappings and object encapsulation for
SQL Server data), XSLT debugger, JavaScript IntelliSense support, JavaScript debugging
support, support for UAC manifests, and a concurrent build system, among others. It ships with an
enhanced set of UI widgets, both for Windows Forms and WPF. It also includes a multithreaded
build engine (MSBuild) to compile multiple source files (and build the executable file) in a
project across multiple threads simultaneously. It also includes support for
compiling PNG compressed icon resources introduced in Windows Vista. An updated XML
Schema designer will ship separately some time after the release of Visual Studio 2008.
The Visual Studio debugger includes features targeting easier debugging of multi-threaded
applications. In debugging mode, in the Threads window, which lists all the threads, hovering
over a thread will display the stack trace of that thread in tooltips. The threads can directly be
named and flagged for easier identification from that window itself. In addition, in the code
window, along with indicating the location of the currently executing instruction in the current
thread, the currently executing instructions in other threads are also pointed out. The Visual
Studio debugger supports integrated debugging of the .NET 3.5 Framework Base Class Library
(BCL) which can dynamically download the BCL source code and debug symbols and allow
stepping into the BCL source during debugging. As of 2010 a limited subset of the BCL source is
available, with more library support planned for later.

2.7.4 THE COMPILER:


.NET separates the design tools from the language compilers. That way, every language can
use the same design tools. The .NET language compilers include the following:
• The Visual Basic compiler (vbc.exe)
• The C# compiler (csc.exe)

2.7.5 C#:
C# (pronounced "see sharp") is a multi-paradigm programming language encompassing
imperative, functional, generic, object-oriented (class-based), and component-oriented
programming disciplines. It was developed by Microsoft within its .NET initiative.

2.7.6 SQL SERVER MANAGEMENT STUDIO:

SQL Server Management Studio is a tool included with Microsoft SQL Server 2005 and later
versions for configuring, managing, and administering all components within Microsoft SQL
Server. The tool includes both script editors and graphical tools which work with objects and
features of the server.

A central feature of SQL Server Management Studio is the Object Explorer, which allows the
user to browse, select, and act upon any of the objects within the server.

MODULES

The system contains the following modules. The user interacts with the system through the
GUI provided.

• Text Pre-processor
• Sentence Separator
• Word Separator
• Stop-Words Eliminator
• Word-Frequency Calculator
• Scoring Algorithm
• Ranking Algorithm
• Summarizing

Figure 3.1

3.1 MODULE DESCRIPTION

3.1.1 TEXT PRE-PROCESSOR:


This module takes HTML or Word documents and converts them to plain text for processing
by the rest of the system.
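
The HTML half of this step can be illustrated with a minimal sketch. It is written in Python for brevity (the project itself is described as C#.NET), uses only the standard-library `html.parser` module, and does not cover Word documents. The names `html_to_text`, `_TextExtractor`, and `BLOCK_TAGS` are hypothetical and are not taken from the project.

```python
from html.parser import HTMLParser

# Tags treated as word or line boundaries (an illustrative, incomplete set).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For example, `html_to_text("<p>Hello <b>world</b>.</p>")` yields the plain string that the later modules would operate on.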

3.1.2 SENTENCE SEPARATOR:


This module goes through the document and separates the sentences based on rules (for
example, a sentence ending is indicated by a dot followed by a space). Other appropriate
criteria may be added to separate the sentences.
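
The "dot and a space" rule can be sketched as follows, in Python for brevity. Treating `!` and `?` as sentence endings is an added assumption, and the function name `split_sentences` is hypothetical. Abbreviations such as "e.g. " would be split incorrectly by a rule this simple.

```python
import re

# A sentence ends at '.', '!' or '?' when it is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split plain text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```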


3.1.3 WORD SEPARATOR:


This module separates the words based on some criteria (for example, a space denotes the end of a word).
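
A minimal sketch of the space-based rule, in Python for brevity. Stripping surrounding punctuation and lower-casing each word are added assumptions that make later counting consistent; the function name `split_words` is hypothetical.

```python
import string


def split_words(sentence):
    """Split on whitespace, then trim punctuation from both ends of each token."""
    words = []
    for token in sentence.split():
        word = token.strip(string.punctuation).lower()
        if word:  # drop tokens that were punctuation only
            words.append(word)
    return words
```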

3.1.4 STOP-WORDS ELIMINATOR:


This module eliminates common English words such as ‘a’, ‘an’, ‘the’, ‘of’, and ‘from’ before
further processing. These words are known as ‘stop-words’. Lists of stop-words for English are
available on the Internet.
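
Stop-word elimination amounts to filtering against a set, as in the sketch below (Python for brevity). The `STOP_WORDS` set here is a tiny illustrative sample, not the project's actual list, and `remove_stop_words` is a hypothetical name.

```python
# A small illustrative sample; a real list would be much longer.
STOP_WORDS = frozenset({"a", "an", "the", "of", "from", "is", "in", "to", "and"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return only the normal words, comparing case-insensitively."""
    return [w for w in words if w.lower() not in stop_words]
```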

3.1.5 WORD-FREQUENCY CALCULATOR:


This module calculates the number of times each word appears in the document (stop-words have
already been eliminated and do not figure in this calculation) and the number of sentences in
which the word appears. For example, the word ‘Unix’ may appear a total of 100 times in a
document, spread over 80 sentences (some sentences contain more than one occurrence of the
word). Minimum and maximum thresholds can be set for the frequencies; these thresholds are
determined by trial and error.
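
The two counts described above can be gathered in a single pass, as in this sketch (Python for brevity). It assumes each sentence has already been reduced to a list of normal words; the min-max thresholds are not shown, and `word_statistics` is a hypothetical name.

```python
from collections import Counter


def word_statistics(sentences):
    """Return (occurrences, sentence_counts) for a list of word lists.

    occurrences[w]     -> how many times w appears in the whole document
    sentence_counts[w] -> in how many sentences w appears at least once
    """
    occurrences = Counter()
    sentence_counts = Counter()
    for words in sentences:
        occurrences.update(words)           # every occurrence counts
        sentence_counts.update(set(words))  # each sentence counts once per word
    return occurrences, sentence_counts
```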

3.1.6 SCORING ALGORITHM:
This algorithm determines the score of each sentence. Several possibilities exist. The score can
be made proportional to the sum of the frequencies of the words comprising the sentence
(i.e., if a sentence has 3 words A, B and C, then the score is proportional to the sum of how many
times A, B and C have occurred in the document). The score can also be made inversely
proportional to the number of sentences in which the words of the sentence appear in the
document. Many such heuristic rules can be applied to score the sentences.
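
The first heuristic (score proportional to the sum of word frequencies) can be sketched as follows, in Python for brevity. It is only one of the possibilities mentioned above, uses a proportionality constant of 1, and assumes sentences are given as lists of normal words. The name `frequency_sum_scores` is hypothetical.

```python
from collections import Counter


def frequency_sum_scores(sentences):
    """Score each sentence as the sum of the document frequencies of its words."""
    freq = Counter(word for words in sentences for word in words)
    return [sum(freq[word] for word in words) for words in sentences]
```

With the sentences `[["a", "b", "c"], ["a", "a"], ["c"]]`, the word "a" occurs 3 times, "b" once, and "c" twice, so the first sentence scores 3 + 1 + 2.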

3.1.7 RANKING:
The sentences are ranked according to their scores. Other criteria, such as the position of a
sentence in the document, can be used to control the ranking. For example, even when their
scores are high, consecutive sentences would not be placed together in the summary.
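
One way to read this rule is sketched below, in Python for brevity: rank sentence indices by descending score, then pick greedily while skipping any sentence adjacent to one already chosen. This interpretation of "consecutive sentences", the tie-breaking by position, and the names `rank_sentences` and `pick_non_adjacent` are all assumptions.

```python
def rank_sentences(scores):
    """Return sentence indices ordered by descending score (ties: earlier first)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def pick_non_adjacent(ranked, limit):
    """Take up to `limit` indices from the ranking, skipping neighbours of chosen ones."""
    chosen = []
    for i in ranked:
        if len(chosen) == limit:
            break
        if all(abs(i - j) > 1 for j in chosen):
            chosen.append(i)
    return chosen
```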

3.1.8 SUMMARIZING:
Based on the user input on the size of the summary, the sentences will be picked from the ranked
list and concatenated. The resulting summary file could be stored with a name like
<originalfilename>_summary.txt.
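
A minimal sketch of this step, in Python for brevity. Restoring the original document order of the chosen sentences is an added assumption (the text only says they are concatenated), and the names `build_summary` and `summary_path` are hypothetical.

```python
from pathlib import Path


def build_summary(sentences, ranked, size):
    """Join the top `size` ranked sentences, restored to document order."""
    chosen = sorted(ranked[:size])
    return " ".join(sentences[i] for i in chosen)


def summary_path(original):
    """Derive '<originalfilename>_summary.txt' next to the original file."""
    path = Path(original)
    return path.with_name(path.stem + "_summary.txt")
```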

3.1.9 USER INTERFACE:


The tool could use a GUI or a plain command-line interface. In either case, it should have easy
and intuitive ways of getting the input from the user (the document, the size of the summary
needed etc).
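
For the command-line variant, the two inputs named above (the document and the summary size) could be collected as in this sketch, in Python for brevity. The option names and the default of 5 sentences are hypothetical choices, not taken from the project.

```python
import argparse


def parse_args(argv=None):
    """Parse the document path and the requested summary size."""
    parser = argparse.ArgumentParser(description="Summarize a text document.")
    parser.add_argument("document", help="path of the file to summarize")
    parser.add_argument(
        "-n", "--sentences", type=int, default=5,
        help="number of sentences wanted in the summary",
    )
    return parser.parse_args(argv)
```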


3.2 OVERVIEW OF SCORING TECHNIQUE:

Figure 3.2

3.2.1 WORD SCORE:

The words in a sentence are scored on the basis of the word type.

A stop word is ignored and hence is not given any score.

The score of a normal word, on the other hand, is calculated according to the following criterion:
“Its score is directly proportional to the total number of occurrences of the word in the whole
document and inversely proportional to the number of sentences in which the word occurs.”
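
Taking the proportionality constant as 1 (an assumption consistent with the formula given in the abstract), the word score is a simple ratio. The sketch below, in Python for brevity, works through the 'Unix' figures used earlier in this report; `word_score` is a hypothetical name.

```python
def word_score(total_occurrences, sentences_containing):
    """Occurrences in the whole document divided by the number of sentences containing the word."""
    if sentences_containing <= 0:
        raise ValueError("a scored word must occur in at least one sentence")
    return total_occurrences / sentences_containing


# 'Unix' appearing 100 times across 80 sentences:
unix_score = word_score(100, 80)  # 1.25
```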

3.2.2 SENTENCE SCORE:


The final score is calculated by dividing the primary score (the sum of the scores of all
normal words) by the total number of words (excluding stop words) in the sentence.

Sentence Score = (Sum of Word Frequency Counts) / (Number of Normal Words)

NOTE:
• Normal words are the words excluding the stop words.
• A list of stop words is given at the end of this document.
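
Putting the word score and the sentence score together gives the sketch below, in Python for brevity. It assumes every sentence has already been reduced to its normal words, counts each occurrence of a word in the sentence (not each distinct word), and gives a sentence with no normal words a score of 0. The name `sentence_scores` is hypothetical.

```python
from collections import Counter


def sentence_scores(sentences):
    """Return one score per sentence: mean word score over its normal words."""
    occurrences = Counter()
    sentence_counts = Counter()
    for words in sentences:
        occurrences.update(words)
        sentence_counts.update(set(words))

    scores = []
    for words in sentences:
        if not words:
            scores.append(0.0)  # nothing to score once stop words are removed
            continue
        total = sum(occurrences[w] / sentence_counts[w] for w in words)
        scores.append(total / len(words))
    return scores
```

For `[["unix", "unix", "kernel"], ["unix", "shell"]]`, "unix" occurs 3 times in 2 sentences (word score 1.5) and the other words score 1, so the sentence scores are (1.5 + 1.5 + 1) / 3 and (1.5 + 1) / 2.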


DATABASE DESCRIPTIONS

4.1 DATABASE USED


We use Microsoft SQL Server, managed through SQL Server Management Studio, as the database in this project.

4.1.1 TABLE 1- STOP WORDS TABLE

This table consists of stop words, i.e., common English words such as a, an, and the, which
need to be eliminated from the document before each sentence is scored; these words are
not counted when calculating the score of a sentence.

FIELD     TYPE
Words     VARCHAR2

Table 4.1
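
How such a single-column table might be created and read back is sketched below. The sketch uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for SQL Server; the table name `StopWords`, the `VARCHAR(50)` length, and the function names are hypothetical.

```python
import sqlite3


def create_stop_word_table(conn, words):
    """Create the one-column stop-word table and fill it."""
    conn.execute("CREATE TABLE StopWords (Words VARCHAR(50) PRIMARY KEY)")
    conn.executemany("INSERT INTO StopWords (Words) VALUES (?)", [(w,) for w in words])
    conn.commit()


def load_stop_words(conn):
    """Read the table into a set for fast membership tests during scoring."""
    return {row[0] for row in conn.execute("SELECT Words FROM StopWords")}


conn = sqlite3.connect(":memory:")  # stand-in for the project's database
create_stop_word_table(conn, ["a", "an", "the"])
stop_words = load_stop_words(conn)
```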

INFORMATION DESCRIPTION

5.1 CONTEXT DIAGRAM


Figure 5.1

5.2 DATA FLOW DIAGRAM (LEVEL 1)


Figure 5.2

5.3 FLOW CHART

5.3.1 SENTENCE SEPARATOR:


Figure 5.3


5.3.2 WORDS FREQUENCY CALCULATOR:

Figure 5.4


5.4.3 SCORING

Figure 5.5


5.4.4 RANKING

Figure 5.6


5.4.5 USER INTERFACE

Figure 5.7


OVERALL DESCRIPTION

6.1 FEATURES OF THIS PROJECT

• Produces summary reports for text content by processing documents.
• Summarizes text documents from the Clipboard.
• The summary length of a document summary (maximum number of sentences required)
can be modified in real time.
• Fast summarization capabilities.

6.2 CONSTRAINTS & LIMITATIONS

• It can process plain text only.

• Some information is lost when the summary is generated. The amount of information
lost depends on the number of sentences specified by the user.

• It can summarize only one document at a time.

6.3 ASSUMPTIONS & DEPENDENCIES


• It is assumed that a few important sentences containing the key words of the given
text will give an accurate summary of the text.
• Every user should be comfortable working with a computer.
• The user must also have a basic knowledge of English.

6.4 MERITS OF PROPOSED SYSTEM

• The system is simple to design and implement.
• The project will be cross-platform software, capable of working on different
hardware and software platforms.
• The Summarizer is fast, easy to use and saves time.
• The system requires only system resources that are easily available.
• The system requires no complex system configuration.
• BWM, NWM and SM are visible on the UI.

The document summarizer is advantageous over prior summarizers because it is designed
from the author's standpoint. It enables authors to automatically create summaries of their
writings using a combined statistical and basic-words approach.

Another advantage of the summarizer stems from the combined statistical and basic word
processing. This dual analysis is beneficial because the statistical component ensures that a
summary will always be produced, and the basic word component improves the quality of
the resulting summary.


SPECIFIC REQUIREMENTS

7.1 SOFTWARE SYSTEM ATTRIBUTES

7.1.1 RELIABILITY
This System is very reliable in the sense that it saves all the summaries generated earlier
in the database.

7.1.2 AVAILABILITY
The system is available to all users. It has no complex software or hardware requirements;
only the input text is needed.

7.1.3 MAINTAINABILITY
As far as maintainability is concerned, our System is very simple to maintain.
Information can easily be entered into the database, and the database itself is simple to
use and maintain.

7.1.4 PORTABILITY
For a software application to be good and effective, it should run on different platforms.
This project will be cross-platform software. Our System is primarily meant to run as a
stand-alone system, but the whole database, along with the source code, can be transferred
to another system. Hence our System is portable.


7.2 OTHER REQUIREMENTS


The basic requirement of our system is that the machine on which the tool is hosted should
have an operating system that supports ADO.NET. The System itself has been developed on
the Windows platform. The machine should also have Microsoft Visual Studio, and the
database used is SQL Server 2008.

7.3 OTHER NON-FUNCTIONAL ATTRIBUTES

The checklist that follows provides a set of characteristics that lead to testable software.

7.3.1 OPERABILITY ‘The better it works, the more efficiently it can be tested’
• No bugs in the system block the execution of tests.
• The product evolves in functional stages (allows simultaneous development and
testing).

7.3.2 OBSERVABILITY ‘What you see is what you test’
• Distinct output is generated for each input.
• All factors affecting the output are visible.
• Internal errors are automatically detected through self-testing mechanisms.
• Internal errors are automatically reported.
• Source code is accessible.

7.3.3 CONTROLLABILITY ‘The better we can control the software, the more the testing can be automated and optimized’
• All possible outputs can be generated through some combination of input.
• All code is executable through some combination of input.
• Software and hardware states and variables can be controlled directly by the test
engineer.
• Input and output formats are consistent and structured.
• Tests can be conveniently specified, automated and reproduced.


7.3.4 DECOMPOSABILITY ‘By controlling the scope of testing, we can more quickly isolate problems and perform smarter re-testing’
• The software system is built from independent modules.
• Software modules can be tested independently.

7.3.5 SIMPLICITY ‘The less there is to test, the more quickly we can test it’
• Functional simplicity (e.g. the feature set is the minimum necessary to meet
requirements).
• Structural simplicity (e.g. architecture is modularized to limit the propagation of
faults).
• Code simplicity (e.g. a coding standard is adopted for ease of inspection and
maintenance).

7.3.6 STABILITY ‘The fewer the changes, the fewer the disruptions to testing’
• Changes to the software are infrequent.
• Changes to the software are controlled.
• Changes to the software do not invalidate existing tests.
• The software recovers well from failures.

7.3.7 UNDERSTANDABILITY ‘The more information we have, the smarter we will test’
• The design is well understood.
• Changes to the design are communicated.
• Technical documentation is instantly accessible.
• Technical documentation is well organized.
• Technical documentation is specific and detailed.
• Technical documentation is accurate.


WORKING (EXAMPLE)

The paragraph for summarizing is given as follows:

“The event horizon is where the force of gravity becomes so strong that even light is pulled
into the black hole. Although the event horizon is part of a black hole, it is not a tangible
object. If you were to fall into a black hole, it would be impossible for you to know when
you hit the event horizon. For a mathematical derivation of the radius of an event horizon
see below.
The singularity is not really a tangible object either. According to the General Theory of
Relativity the Singularity is a point of infinite space time curvature. This means that the
force of gravity has become infinitely strong at the center of a black hole. Everything that
falls into a black hole by passing the event horizon, including light, will eventually reach the
singularity of a black hole. Before something reaches the singularity it is torn apart by
intense gravitational forces. Even the atoms themselves are torn apart by the gravitational
forces.”

The generated summary of 3 sentences is given as follows:

“According to the General Theory of Relativity the Singularity is a point of infinite space
time curvature. Everything that falls into a black hole by passing the event horizon,
including light, will eventually reach the singularity of a black hole. Before something
reaches the singularity it is torn apart by intense gravitational forces.”
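
The three sentences of the summary above appear in the same order as in the source paragraph, which is consistent with picking the highest-scoring sentences and then restoring document order. A minimal sketch of such a selection step follows; the function name and the way ties are broken (earlier sentences win) are assumptions, and the sketch is not claimed to reproduce the scores behind this particular example.

```python
def select_summary(sentences, scores, max_sentences):
    """Keep the highest-scoring sentences and return them in document order."""
    # Rank sentence indices by score, highest first; sorted() is stable, so
    # sentences with equal scores keep their original relative order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Take the best max_sentences indices, then restore document order.
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Here max_sentences plays the role of the user-specified summary length, the maximum number of sentences required.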


8.1 SCREEN-SHOTS:

Figure 8.1
Splash Screen appears before the GUI appears. This Screen is visible for just 5 seconds.


Figure 8.2
This is the main GUI of the software where all the working takes place.


8.3 REFERENCES:

• http://www.indiastudychannel.com/resources/12455-Development-an-auto-summarization-tool.aspx
• http://www.tgmc-projects.com
• http://www.sourcecodeonline.com
