
BIA

UNIT 1 Business Intelligence And Business Decisions


Decision Support Systems
A decision support system (DSS) is a computerized information system used to support decision-
making in an organization or a business. A DSS lets users sift through and analyze massive reams of
data and compile information that can be used to solve problems and make better decisions.

The benefits of decision support systems include more informed decision-making, timely problem
solving and improved efficiency for dealing with problems with rapidly changing variables.

A DSS can be used by operations management and planning levels in an organization to compile
information and data and synthesize it into actionable intelligence. This allows the end user to make
more informed decisions at a quicker pace.

What Can a DSS Analyze?

The DSS is an information application that produces comprehensive information. This is different
from an operations application, which would be used to collect the data in the first place. A DSS is
primarily used by mid- to upper-level management, and it is key for understanding large amounts of
data.

For example, a DSS could be used to project a company’s revenue over the upcoming six months
based on new assumptions about product sales. Due to the large number of variables that surround
the projected revenue figures, this is not a straightforward calculation that can be done by hand. A
DSS can integrate multiple variables and generate an outcome and alternate outcomes, all based on
the company’s past product sales data and current variables.

How Can a DSS Present the Information?

The primary purpose of using a DSS is to present information to the customer in a way that is easy to
understand. A DSS is beneficial because it can be programmed to generate many types of
reports, all based on user specifications. A DSS can generate information and output it graphically,
such as a bar chart that represents projected revenue, or as a written report.

Where Can a DSS Be Used?

As technology continues to advance, data analysis is no longer limited to large bulky mainframes.
Since a DSS is essentially an application, it can be loaded on most computer systems, including
laptops. Certain DSS applications are also available through mobile devices. The flexibility of the
DSS is extremely beneficial for customers who travel frequently. This gives them the opportunity to
be well-informed at all times, which in turn provides them with the ability to make the best decisions
for their company and customers at any time.


Attributes of a DSS

 Adaptability and flexibility

 High level of Interactivity

 Ease of use

 Efficiency and effectiveness

 Complete control by decision-makers

 Ease of development

 Extendibility

 Support for modeling and analysis

 Support for data access

 Standalone, integrated, and Web-based

Characteristics of a DSS

 Support for decision-makers in semi-structured and unstructured problems.

 Support for managers at various managerial levels, ranging from top executive to
line managers.

 Support for individuals and groups. Less structured problems often require the involvement of several individuals from different departments and organizational levels.

 Support for interdependent or sequential decisions.

 Support for intelligence, design, choice, and implementation.

 Support for a variety of decision processes and styles.

 DSSs are adaptive over time.

Benefits of DSS

 Improves efficiency and speed of decision-making activities.

 Increases the organization's control, competitiveness and capability for future-oriented decision-making.

 Facilitates interpersonal communication.

 Encourages learning or training.

 Since it is mostly used in non-programmed decisions, it reveals new approaches and establishes new evidence for unusual decisions.

 Helps automate managerial processes.

Components of a DSS

 Database Management System (DBMS): To solve a problem, the necessary data may come from an internal or external database. In an organization, internal data are generated by systems such as TPS and MIS. External data come from a variety of sources such as newspapers, online data services, and databases (financial, marketing, human resources).

 Model Management System: It stores and accesses the models that managers use to make decisions. Such models are used for designing a manufacturing facility, analyzing the financial health of an organization, forecasting demand for a product or service, etc.

 Support Tools: Support tools such as online help, pull-down menus, user interfaces, graphical analysis, and error correction mechanisms facilitate user interaction with the system.

Classification of DSS
There are several ways to classify DSS. Holsapple and Whinston classify DSS as follows:

 Text Oriented DSS: It contains textually represented information that could have a bearing on decisions. It allows documents to be electronically created, revised and viewed as needed.

 Database Oriented DSS: Database plays a major role here; it contains organized
and highly structured data.

 Spreadsheet Oriented DSS: It contains information in spreadsheets that allows the user to create, view, and modify procedural knowledge, and also instructs the system to execute self-contained instructions. The most popular tools are Excel and Lotus 1-2-3.

 Solver Oriented DSS: It is based on a solver, which is an algorithm or procedure written for performing certain calculations for a particular program type.

 Rules Oriented DSS: It follows certain procedures adopted as rules; an expert system is an example.
 Compound DSS: It is built by using two or more of the five structures explained
above.

Types of DSS

 Status Inquiry System: It helps in taking operational, management-level, or middle-level management decisions, for example, daily schedules of jobs to machines or of machines to operators.

 Data Analysis System: It needs comparative analysis and makes use of a formula or an algorithm, for example, cash flow analysis, inventory analysis, etc.

 Information Analysis System: In this system data is analyzed and the information
report is generated. For example, sales analysis, accounts receivable systems, market analysis etc.

 Accounting System: It keeps track of accounting and finance related information, for example, final accounts, accounts receivable, accounts payable, etc., that capture the major aspects of the business.

 Model Based System: Simulation models or optimization models used for decision-making are used infrequently and create general guidelines for operations or management.

Group Decision Support and Groupware Technologies
Globalization has not only expanded the product markets. It has also made organizations
geographically more dispersed. Therefore, the way the business is done and decisions are
made has also changed significantly. Collaborative decision-making has become more
valuable than ever.

This is why there is an increased emphasis on developing and implementing communications-driven group decision support systems. Decision making, in the current business environment, is a collaborative process with participation from in-house and remotely located teams or temporary work groups or task forces. In such a scenario, a communications-driven group DSS makes it easier for every participant to send and receive communication and interact with others in real time, from their respective locations, without meeting physically.

A communications-driven group DSS

 Fosters collaboration between cross-functional business teams at the same or different locations

 Allows geographically separated decision makers to connect face-to-face in real time
 Allows data sharing with the rest of the team members, work groups or task forces

What is a Communications-Driven Group Decision Support System?

Now that we know how a communications-driven group DSS can support decision-making
among geographically dispersed teams using web-based tools, it’s time to understand what
exactly it is.

A communications-driven group decision support system:

 Is a type of hybrid computer-based interactive decision support system

 That uses communications and network technologies

 To facilitate communication, resource/information sharing, face-to-face meetings and collaboration

 Among a group of decision makers that are separated by a distance

Group Support Tools

There are a number of tools and technologies that can be incorporated in a GDSS (Group
Decision Support System), in order to promote better decision making. These include:

 Groupware: A software system to enhance collaboration among participants/decision makers and support groups in completing tasks.

 Multimedia Decision Support: An integration of computer, video and decision-support technologies, facilitating information sharing, group decision tasks, collaboration and coordination. It offers smart decision support in which decisions are directly affected by the way decision makers interact, review information, make choices and take actions.

 Electronic Meeting System: A software system to facilitate creative problem solving and decision making using electronic technologies.

 Collaborative Workgroup Software: A web-based team collaboration and project management software facilitating group tasks and live discussions for better decision making.


Group Decision Support Situations

A group decision support system fosters collaboration and team decision-making in four
different situations:

 Same time, same place

 Same time, different place

 Different time, same place

 Different time, different place

Same Time, Same Place

In this situation, all decision makers are available at the same time and the same place. The information is displayed either on a computer projection system or on the individual computers of participants.

Same Time, Different Place

In this situation, individuals participate in decision-making from geographically different locations at the same time. A GDSS

 Allows people to see what others at different locations are doing

 Offers video conferencing facilities where participants can see and hear each
other in real time

 Offers support for meeting or interactions via two-way video

 Offers additional facilities, such as screen sharing, chat, audio, white boards

Different Time, Same Place

In this situation, a GDSS fosters communication for those who work at the same place but have different shift timings. It offers numerous facilities, including:

 Document sharing

 Workstation software for shift work

 Email

Different Time, Different Place

It's important to understand how a GDSS works in different-time, different-place situations. This is a situation where participants are geographically distant and also operate in different time zones. It fosters communication, collaboration and team decision making through:

 Conferencing

 Bulletin board

 Voice mail

 Email

A GDSS supports communication and collaboration in all the above situations.

A Managerial Perspective on Group Decision Support

A communications-driven group decision support system is implemented so that it can support all activities of a work team or task force, irrespective of the locations and time zones of participants.

The major concern of investors/users at the time of deciding whether to develop a decision
support system or not must be:

 The type of support a proposed technology can offer

 The extent of support a GDSS will offer

 The technologies it must support to ensure smooth functioning

 The selection of the best technology or system in a given decision-making
situation

Therefore, the managers must ask themselves following questions, in order to attain more
clarity:

 Should there be an audio conferencing facility? If yes, how many people should be able to participate in a conference at a given time?

 Will participants be using the technology, like bulletin boards?

 What will be the alternative for web conferencing when participants are at
different locations and in different time zones?

 How frequent will resource sharing be, how will participants access information, and to what extent?

 Do you wish to integrate emailing with the GDSS?

 How can video conferencing be made comfortable for participants?

A lot of thought and planning goes into the design and development of a communications-driven group decision support system.

Contingency Theory
A communications-driven GDSS addresses problems associated with group collaboration,
communication and decision making, when participants are geographically dispersed and
operate in different time zones.

This means the effectiveness of a GDSS directly depends upon its design, user-interface,
DSS architecture, integrated support tools and technical skills possessed by participants
who use DSS.

Although managers may know that the set of tools they have chosen for a GDSS is good, the tools may not perform equally well in all circumstances. There is no one best way of making decisions or supporting group collaboration. A tool or process may work well in some situations and fail terribly in others.

In such a scenario, the managers must resort to a contingency approach that focuses on
three main points:

 Task Type: The deciding factors include idea generation, creativity, planning,
choosing alternatives and action. For example, computer mediated communication is a
good fit for idea generation activities, and video and audio conferencing is a good choice
when decision-making is a function of human intellect.
 Group Size: The bigger the group, the greater the differences in technical abilities, likes and interests, preferences and judgments. Small groups may not require extensive support or communication tools, while large groups require more sophisticated and automated tools.

 Group Proximity: A more sophisticated communications-driven GDSS is required when the group of decision makers is dispersed and operates in different time zones, while a simpler system is sufficient for a group operating from the same place and at the same time.

A contingency approach depends on task structure, location of team members and difference in organizational attributes.

Virtual Organizations
A virtual organization is an association of physically and/or professionally detached
individuals working together on a project or to achieve a mission. It doesn’t have any
physical existence but the technology (internet technology, more precisely) makes it look
real.

Communications-driven group decision support systems are best suited for virtual
organizations that require a lot of technological support to foster communication and
collaboration and get the work done.

A GDSS makes a virtual organization:

 Look real

 Work in real time

 Establish innovative relationships among task forces

 Establish professional alliances among participants

A communications-driven GDSS for a virtual organization makes use of various knowledge management technologies, including:

 Personal computers

 Intranet and extranet

 Wireless technologies

 Collaborative technologies

 Web conferencing

 Groupware

 World Wide Web

Benefits of Communications-Driven GDSS

 Allows group members to contribute significantly to decision making irrespective of their locations and time zones

 Extracts greater participation from team members, given the availability of support technologies

 Makes document sharing easier, faster and more secure

 Fosters more concentrated and focused decision-making

 Saves a lot of money and time by allowing participants to contribute from their own locations (users don't need to spend time and money in traveling)

 Helps complete tasks faster

 Reduces the chances of forgetfulness by offering facilities like bulletin boards and whiteboards

 Encourages input of ideas because of its simplicity of use

 Increases information sharing, which ultimately speeds knowledge capturing and enhances productivity

 Makes results available easily and immediately

 Makes information easier to understand by displaying it in the form of graphics

 Gives more structure to virtual operations and decision-making

Evaluating Communication and Group Support Tools


Not all group communication and support tools may suit your requirements. In order to
choose the right group support tools for your communication-driven decision support
system, it’s essential to consider these factors:

 Scalability: A tool's ability to support the needs of all anticipated users is known as scalability. In addition, it should be easily integrated with existing hardware and software applications.
 Reliability: A group support tool must be able to perform necessary tasks without failing. Decision makers use different technologies at different times in different situations, so the reliability of a support tool should be evaluated before integrating it with the system.

 Ease of Installation and Use: A support tool must be easy to install and
use. An ideal tool is the one that requires minimal or no formal training for its users. The
decision makers may consult DSS experts to integrate group support tools that are easy to
use.

 Versatility: The versatility of a support tool plays a crucial role. As different DSS users prefer different platforms, it must be compatible across all platforms. In addition, it must allow easy customization of features and capabilities.

 Security: As a GDSS fosters resource sharing, a support tool must ensure security of data transfer by executing it across firewalls.

 Cost: Given the significant expenditure on a GDSS, a support tool must be affordable enough that it doesn't add much to the basic cost of developing and implementing a DSS.

It’s important to select the right communication and support tools to promote good decision
making by a team that is physically dispersed. Moreover, a GDSS must be carefully aligned
to the structure of an organization, in order to get the best results.

Groupware Technologies
Groupware is a class of computer programs that enables individuals to collaborate on
projects with a common goal from geographically dispersed locations through shared
Internet interfaces as a means to communicate within the group.

Groupware may also include remote access storage systems to archive frequently used
data files. These can be altered, accessed and retrieved by workgroup members.

Groupware is also known as collaborative software.

The first commercial groupware products emerged in the early 1990s, when international giants such as IBM and Boeing began using electronic meeting systems for their internal projects. Lotus Notes then appeared as a major product of this category, further enhancing remote group collaboration.

Groupware systems are classified based on functions, specifically:

 Computer-mediated communication supporting direct participant communication

 Meeting and decision support systems capturing the common understanding
of participants

 Shared applications

 Artifacts supporting the interaction of participants through shared work objects

Groupware is either synchronous or asynchronous in nature. Synchronous groupware is a class of applications that allows a group of individuals who are physically separated to interact with each other using shared computational objects in real time. The fundamental requirement of synchronous groupware is real-time coordination among clients. The user interfaces advocate a feeling of togetherness. They require shared audio channels for communication.

Asynchronous groupware uses email, structured messages, agents, workflow, computer conferencing agents, file sharing systems and collaborative writing systems, among others. Asynchronous collaborations between users are well maintained only if users are allowed to make their contributions without any restrictions. This can be accomplished through replicated data management systems with read-any or write-any data access. Users can execute concurrent updates.

The extensive use of groupware on the Internet helped contribute to the development of
Web 2.0, which uses instant messaging, Web conferencing, group calendars, document
sharing, etc.

Expert Systems
What are Expert Systems?

Expert systems are computer applications developed to solve complex problems in a particular domain, at the level of extraordinary human intelligence and expertise.

Characteristics of Expert Systems

 High performance

 Understandable

 Reliable

 Highly responsive

Capabilities of Expert Systems

 Advising

 Instructing and assisting humans in decision making

 Demonstrating

 Deriving a solution

 Diagnosing

 Explaining

 Interpreting input

 Predicting results

 Justifying the conclusion

 Suggesting alternative options to a problem

They are incapable of

 Substituting human decision makers

 Possessing human capabilities

 Producing accurate output from an inadequate knowledge base

 Refining their own knowledge

Components of Expert Systems

 Knowledge Base

 Inference Engine

 User Interface

Let us see them one by one briefly

Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any ES majorly depends upon the collection of highly accurate and
precise knowledge.

What is Knowledge?

Data is a collection of facts. Information is data organized as facts about the task domain. Data, information, and past experience combined together are termed knowledge.

Components of Knowledge Base

The knowledge base of an ES is a store of both, factual and heuristic knowledge.

 Factual Knowledge: It is the information widely accepted by the Knowledge Engineers and scholars in the task domain.

 Heuristic Knowledge: It is about practice, accurate judgement, one's ability of evaluation, and guessing.

Knowledge representation

It is the method used to organize and formalize the knowledge in the knowledge base. It is in the
form of IF-THEN-ELSE rules.

Knowledge Acquisition

The success of any expert system majorly depends on the quality, completeness, and accuracy of the
information stored in the knowledge base.

The knowledge base is formed by readings from various experts, scholars, and the Knowledge
Engineers. The knowledge engineer is a person with the qualities of empathy, quick learning, and
case analyzing skills.

He acquires information from the subject expert by recording, interviewing, and observing him at work, etc. He then categorizes and organizes the information in a meaningful way, in the form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge engineer also monitors the development of the ES.

Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a correct, flawless solution.

In the case of a knowledge-based ES, the Inference Engine acquires and manipulates the knowledge from the knowledge base to arrive at a particular solution.

In the case of a rule-based ES, the Inference Engine:

 Applies rules repeatedly to the facts, which are obtained from earlier rule
application.

 Adds new knowledge into the knowledge base if required.

 Resolves rules conflict when multiple rules are applicable to a particular case.

To recommend a solution, the Inference Engine uses the following strategies −

 Forward Chaining

 Backward Chaining

Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”

Here, the Inference Engine follows the chain of conditions and derivations and finally deduces the
outcome. It considers all the facts and rules, and sorts them before concluding to a solution.

This strategy is followed for working on conclusion, result, or effect. For example, prediction of
share market status as an effect of changes in interest rates.

Backward Chaining

With this strategy, an expert system finds out the answer to the question, “Why this happened?”

On the basis of what has already happened, the Inference Engine tries to find out which conditions
could have happened in the past for this result. This strategy is followed for finding out cause or
reason. For example, diagnosis of blood cancer in humans.


User Interface
The user interface provides interaction between the user of the ES and the ES itself. It generally uses natural language processing so that it can be used by a user who is well-versed in the task domain. The user of the ES need not necessarily be an expert in Artificial Intelligence.

It explains how the ES has arrived at a particular recommendation. The explanation may appear in
the following forms −

 Natural language displayed on screen.

 Verbal narrations in natural language.

 Listing of rule numbers displayed on the screen.

The user interface makes it easy to trace the credibility of the deductions.

Requirements of Efficient ES User Interface

 It should help users to accomplish their goals in shortest possible way.

 It should be designed to work for user’s existing or desired work practices.

 Its technology should be adaptable to user’s requirements; not the other way round.

 It should make efficient use of user input.

Expert Systems Limitations

No technology can offer an easy and complete solution. Large systems are costly and require significant development time and computer resources. ESs have their limitations, which include −

 Limitations of the technology

 Difficult knowledge acquisition

 ES are difficult to maintain

 High development costs

Applications of Expert System

The following table shows where ES can be applied.

Application: Description

 Design Domain: Camera lens design, automobile design.

 Medical Domain: Diagnosis systems to deduce the cause of disease from observed data; conducting medical operations on humans.

 Monitoring Systems: Comparing data continuously with the observed system or with prescribed behaviour, such as leakage monitoring in a long petroleum pipeline.

 Process Control Systems: Controlling a physical process based on monitoring.

 Knowledge Domain: Finding out faults in vehicles, computers.

 Finance/Commerce: Detection of possible fraud and suspicious transactions, stock market trading, airline scheduling, cargo scheduling.

Expert System Technology

There are several levels of ES technologies available. Expert systems technologies include −

 Expert System Development Environment− The ES development environment includes hardware and tools. They are −

o Workstations, minicomputers, mainframes.

o High level Symbolic Programming Languages such as LISt Programming (LISP) and PROgrammation en LOGique (PROLOG).

o Large databases.

 Tools− They reduce the effort and cost involved in developing an expert system to a large extent.

o Powerful editors and debugging tools with multi-windows.

o They provide rapid prototyping

o Have inbuilt definitions of model, knowledge representation, and inference design.

 Shells− A shell is nothing but an expert system without a knowledge base. A shell provides the developers with knowledge acquisition, inference engine, user interface, and explanation facility. For example, a few shells are given below −

o Java Expert System Shell (JESS) that provides fully developed Java API for
creating an expert system.

o Vidwan, a shell developed at the National Centre for Software Technology, Mumbai in 1993. It enables knowledge encoding in the form of IF-THEN rules.

Development of Expert Systems: General Steps

The process of ES development is iterative. Steps in developing the ES include −

Identify Problem Domain

 The problem must be suitable for an expert system to solve it.

 Find the experts in task domain for the ES project.

 Establish cost-effectiveness of the system.

Design the System

 Identify the ES Technology

 Know and establish the degree of integration with the other systems and databases.

 Realize how the concepts can represent the domain knowledge best.
Develop the Prototype

Form the Knowledge Base: The knowledge engineer works to −

 Acquire domain knowledge from the expert.

 Represent it in the form of IF-THEN-ELSE rules.

Test and Refine the Prototype

 The knowledge engineer uses sample cases to test the prototype for any deficiencies
in performance.

 End users test the prototypes of the ES.

Develop and Complete the ES

 Test and ensure the interaction of the ES with all elements of its environment,
including end users, databases, and other information systems.

 Document the ES project well.

 Train the user to use ES.

Maintain the ES

 Keep the knowledge base up-to-date by regular review and update.

 Cater for new interfaces with other information systems, as those systems evolve.

Benefits of Expert Systems

 Availability− They are easily available due to mass production of software.

 Less Production Cost− Production cost is reasonable. This makes them affordable.

 Speed− They offer great speed. They reduce the amount of work an individual puts
in.

 Less Error Rate− Error rate is low as compared to human errors.

 Reducing Risk− They can work in the environment dangerous to humans.

 Steady response− They work steadily without getting emotional, tense or fatigued.


UNIT 2 Data Warehousing


Introduction to SQL Server – its features
SQL is a database computer language designed for the retrieval and management of data in a
relational database. SQL stands for Structured Query Language.

SQL is a language to operate databases; it includes database creation, deletion, fetching rows,
modifying rows, etc. SQL is an ANSI (American National Standards Institute) standard language,
but there are many different versions of the SQL language.

What is SQL?

SQL is Structured Query Language, which is a computer language for storing, manipulating and
retrieving data stored in a relational database.
SQL is the standard language for relational database systems. All the Relational Database Management Systems (RDBMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard database language.

Also, they are using different dialects, such as −

 MS SQL Server using T-SQL,

 Oracle using PL/SQL,

 MS Access version of SQL is called JET SQL (native format) etc.

Why SQL?

SQL is widely popular because it offers the following advantages (a brief illustration follows the list) −

 Allows users to access data in the relational database management systems.

 Allows users to describe the data.

 Allows users to define the data in a database and manipulate that data.

 Allows embedding within other languages using SQL modules, libraries and pre-compilers.

 Allows users to create and drop databases and tables.

 Allows users to create views, stored procedures and functions in a database.

 Allows users to set permissions on tables, procedures and views.
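
As a brief, hedged illustration of a few of these capabilities (the table, view and user names below are hypothetical), a user might define a view over a table and then grant read access on it:

-- Hypothetical example: define a view and grant read-only access to it
CREATE VIEW HIGH_VALUE_CUSTOMERS AS
SELECT ID, NAME FROM CUSTOMERS WHERE SALARY > 5000;

GRANT SELECT ON HIGH_VALUE_CUSTOMERS TO report_user;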

SQL Process

When you execute an SQL command on any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task.

There are various components included in this process.

These components are −

 Query Dispatcher

 Optimization Engines

 Classic Query Engine

 SQL Query Engine, etc.

A classic query engine handles all the non-SQL queries, but a SQL query engine won’t handle
logical files.


Features of SQL

 High Performance.

 High Availability.

 Scalability and Flexibility Run anything.

 Robust Transactional Support.

 Web and Data Warehouse Strengths.

 Strong Data Protection.

 Comprehensive Application Development.

 Management Ease.

 Open Source Freedom and 24 x 7 Support.

 Lowest Total Cost of Ownership.

System Databases

SQL Server mainly contains four system databases (master, model, msdb, tempdb). Each of them is used by SQL Server for separate purposes. Of all the databases, the master database is the most important.
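
On SQL Server, these system databases can be listed from the sys.databases catalog view; a minimal sketch (database_id values 1 to 4 correspond to master, tempdb, model and msdb):

-- List the four system databases on a SQL Server instance
SELECT name, database_id
FROM sys.databases
WHERE database_id <= 4;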

(i) Master Database

The master database contains information about the SQL Server configuration. Without the master database, the server cannot be started. It stores metadata about all other objects (databases, stored procedures, tables, views, etc.) created in the SQL Server instance.

It also contains the login information of users.

If the master database gets corrupted and is not recoverable from the backup, then a user has to again
rebuild the master database. Therefore, it is always recommended to maintain a current backup of the
master database. As everything crucial to SQL server is stored in the master database, it cannot be
deleted as it is the heart of SQL SERVER.

(ii) Model Database

The model database serves as a template for every newly created database. When we create a new database, the data present in the model database is copied to the new database to create its default objects, which include tables, stored procedures, etc. The model database is not used only for creating new databases: whenever SQL Server starts, tempdb is also created using the model database as a template. By default it does not contain any data.

(iii) Msdb

The msdb database is used mainly by SQL Server Management Studio and SQL Server Agent to store system activities such as SQL Server jobs, mail, Service Broker, maintenance plans, user and system database backup history, replication information, and log shipping. We need to take a backup of this database for the proper functioning of the SQL Server Agent service.

(iv) TempDB

From the name of the database itself, we can identify the purpose of this database. It can be accessed
by all the users in the SQL Server Instance.

The tempdb is a temporary location for storing temporary tables (global and local) and temporary stored procedures that hold intermediate results during sorting, query processing and cursor operations.

If many temporary objects are created and consume tempdb storage, the performance of SQL Server will be affected, so it is recommended to place tempdb in a location with a sufficient amount of space.

This database is recreated by the SQL Server instance every time the SQL Server service starts, using the model database as a template. We cannot take a backup of tempdb.
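
As a small illustration of how tempdb is used (the table name is hypothetical), a local temporary table, prefixed with #, is created in tempdb and dropped automatically when the session that created it ends:

-- Hypothetical local temporary table stored in tempdb
CREATE TABLE #RecentOrders (OrderID INT, OrderDate DATETIME);
INSERT INTO #RecentOrders VALUES (1, GETDATE());
SELECT * FROM #RecentOrders;
DROP TABLE #RecentOrders;   -- optional; tempdb cleans it up when the session ends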

MySQL

MySQL is an open source SQL database, which is developed by a Swedish company – MySQL AB.
MySQL is pronounced as “my ess-que-ell,” in contrast with SQL, pronounced “sequel.”

MySQL supports many different platforms including Microsoft Windows, the major Linux distributions, UNIX, and Mac OS X.

MySQL has free and paid versions, depending on its usage (non-commercial/commercial) and
features. MySQL comes with a very fast, multi-threaded, multi-user and robust SQL database server.

History

 Development of MySQL by Michael Widenius & David Axmark beginning in 1994.

 First internal release on 23rd May 1995.

 Windows version was released on the 8th January 1998 for Windows 95 and NT.

 Version 3.23: beta from June 2000, production release January 2001.

 Version 4.0: beta from August 2002, production release March 2003 (unions).

 Version 4.01: beta from August 2003, Jyoti adopts MySQL for database tracking.

 Version 4.1: beta from June 2004, production release October 2004.

 Version 5.0: beta from March 2005, production release October 2005.

 Sun Microsystems acquired MySQL AB on the 26th February 2008.

 Version 5.1: production release 27th November 2008.

Features

 High Performance.

 High Availability.

 Scalability and Flexibility Run anything.

 Robust Transactional Support.

 Web and Data Warehouse Strengths.

 Strong Data Protection.

 Comprehensive Application Development.

 Management Ease.

 Open Source Freedom and 24 x 7 Support.

 Lowest Total Cost of Ownership.

MS SQL Server

MS SQL Server is a Relational Database Management System developed by Microsoft Inc. Its
primary query languages are −

 T-SQL

 ANSI SQL

History

 1987 – Sybase releases SQL Server for UNIX.

 1988 – Microsoft, Sybase, and Ashton-Tate port SQL Server to OS/2.

 1989 – Microsoft, Sybase, and Ashton-Tate release SQL Server 1.0 for OS/2.

 1990 – SQL Server 1.1 is released with support for Windows 3.0 clients.

 Ashton-Tate drops out of SQL Server development.

 2000 – Microsoft releases SQL Server 2000.

 2001 – Microsoft releases XML for SQL Server Web Release 1 (download).

 2002 – Microsoft releases SQLXML 2.0 (renamed from XML for SQL Server).

 2002 – Microsoft releases SQLXML 3.0.

 2005 – Microsoft releases SQL Server 2005 on November 7th, 2005.

Features

 High Performance

 High Availability

 Database mirroring

 Database snapshots

 CLR integration

 Service Broker

 DDL triggers

 Ranking functions

 Row version-based isolation levels

 XML integration
 TRY…CATCH

 Database Mail

ORACLE

It is a very large multi-user based database management system. Oracle is a relational database
management system developed by ‘Oracle Corporation’.

Oracle works to efficiently manage its resources (a database of information) among the multiple clients requesting and sending data over the network.

It is an excellent database server choice for client/server computing. Oracle supports all major
operating systems for both clients and servers, including MSDOS, NetWare, UnixWare, OS/2 and
most UNIX flavors.

History
Oracle began in 1977 and has celebrated 32 years in the industry (from 1977 to 2009).

 1977 – Larry Ellison, Bob Miner and Ed Oates founded Software Development
Laboratories to undertake development work.

 1979 – Version 2.0 of Oracle was released and it became the first commercial relational database and the first SQL database. The company changed its name to Relational Software Inc. (RSI).

 1981 – RSI started developing tools for Oracle.

 1982 – RSI was renamed to Oracle Corporation.

 1983 – Oracle released version 3.0, rewritten in C language and ran on multiple
platforms.

 1984 – Oracle version 4.0 was released. It contained features like concurrency
control – multi-version read consistency, etc.


 2007 – Oracle released Oracle11g. The new version focused on better partitioning,
easy migration, etc.

Features

 Concurrency

 Read Consistency

 Locking Mechanisms

 Quiesce Database

 Portability

 Self-managing database

 SQL*Plus

 ASM

 Scheduler

 Resource Manager

 Data Warehousing

 Materialized views

 Bitmap indexes

 Table compression

 Parallel Execution

 Analytic SQL

 Data mining

 Partitioning

MS ACCESS

This is one of the most popular Microsoft products. Microsoft Access is an entry-level database
management software. MS Access database is not only inexpensive but also a powerful database for
small-scale projects.

MS Access uses the Jet database engine, which utilizes a specific SQL language dialect (sometimes
referred to as Jet SQL).

MS Access comes with the professional edition of the MS Office package. MS Access has an easy-to-use, intuitive graphical interface.

 1992 – Access version 1.0 was released.

 1993 – Access 1.1 released to improve compatibility, with the inclusion of the Access Basic programming language.

 The most significant transition was from Access 97 to Access 2000.

 2007 – With Access 2007, a new database format, ACCDB, was introduced, which supports complex data types such as multi-valued and attachment fields.

Features

 Users can create tables, queries, forms and reports and connect them together with
macros.

 Option of importing and exporting the data to many formats including Excel,
Outlook, ASCII, dBase, Paradox, FoxPro, SQL Server, Oracle, ODBC, etc.

 There is also the Jet Database format (MDB or ACCDB in Access 2007), which can
contain the application and data in one file. This makes it very convenient to distribute the entire
application to another user, who can run it in disconnected environments.

 Microsoft Access offers parameterized queries. These queries and Access tables can
be referenced from other programs like VB6 and .NET through DAO or ADO.

 The desktop editions of Microsoft SQL Server can be used with Access as an
alternative to the Jet Database Engine.

 Microsoft Access is a file server-based database. Unlike client-server relational database management systems (RDBMS), Microsoft Access does not implement database triggers, stored procedures or transaction logging.

Creating Databases and Tables



The SQL CREATE DATABASE statement is used to create a new SQL database.

Syntax

The basic syntax of this CREATE DATABASE statement is as follows −

CREATE DATABASE DatabaseName;

The database name should always be unique within the RDBMS.

Example

If you want to create a new database <testDB>, then the CREATE DATABASE statement would be
as shown below −

SQL> CREATE DATABASE testDB;

Make sure you have the admin privilege before creating any database. Once a database is created,
you can check it in the list of databases as follows −

SQL> SHOW DATABASES;

+--------------------+
| Database           |
+--------------------+
| information_schema |
| AMROOD             |
| TUTORIALSPOINT     |
| mysql              |
| orig               |
| test               |
| testDB             |
+--------------------+
7 rows in set (0.00 sec)

CREATING TABLES

Creating a basic table involves naming the table and defining its columns and each column’s data
type.

The SQL CREATE TABLE statement is used to create a new table.

Syntax

The basic syntax of the CREATE TABLE statement is as follows −

CREATE TABLE table_name( column1 datatype, column2 datatype, column3 datatype, …..
columnN datatype, PRIMARY KEY( one or more columns ));

CREATE TABLE is the keyword telling the database system what you want to do. In this case, you
want to create a new table. The unique name or identifier for the table follows the CREATE TABLE
statement.

Then in brackets comes the list defining each column in the table and what sort of data type it is. The
syntax becomes clearer with the following example.

A copy of an existing table can be created using a combination of the CREATE TABLE statement
and the SELECT statement. You can check the complete details at Create Table Using another Table.
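
For example, a hedged sketch of copying a table this way (the table and column names are illustrative; the exact syntax varies by RDBMS, and SQL Server uses SELECT ... INTO instead of CREATE TABLE ... AS):

-- Create a copy of selected columns from an existing table
CREATE TABLE CUSTOMERS_BACKUP AS
SELECT ID, NAME, SALARY
FROM CUSTOMERS;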

Example

The following code block is an example, which creates a CUSTOMERS table with ID as the primary key; the NOT NULL constraints indicate that these fields cannot be NULL while creating records in this table −

SQL> CREATE TABLE CUSTOMERS(
   ID       INT             NOT NULL,
   NAME     VARCHAR (20)    NOT NULL,
   AGE      INT             NOT NULL,
   ADDRESS  CHAR (25),
   SALARY   DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

You can verify if your table has been created successfully by looking at the message displayed by the
SQL server, otherwise you can use the DESC command as follows −

SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID      | int(11)       | NO   | PRI |         |       |
| NAME    | varchar(20)   | NO   |     |         |       |
| AGE     | int(11)       | NO   |     |         |       |
| ADDRESS | char(25)      | YES  |     | NULL    |       |
| SALARY  | decimal(18,2) | YES  |     | NULL    |       |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Now, you have CUSTOMERS table available in your database which you can use to store the
required information related to customers.

Constraints
Constraints are the rules enforced on the data columns of a table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints could be either on a column level or a table level. The column level constraints are
applied only to one column, whereas the table level constraints are applied to the whole table.

Following are some of the most commonly used constraints available in SQL. These constraints have already been discussed in the SQL – RDBMS Concepts chapter, but it is worth revising them at this point.

 NOT NULL Constraint− Ensures that a column cannot have NULL value.

 DEFAULT Constraint− Provides a default value for a column when none is specified.

 UNIQUE Constraint− Ensures that all values in a column are different.

 PRIMARY Key− Uniquely identifies each row/record in a database table.

 FOREIGN Key− Uniquely identifies a row/record in another database table.

 CHECK Constraint− The CHECK constraint ensures that all the values in a column satisfy certain conditions.

 INDEX− Used to create and retrieve data from the database very quickly.

Constraints can be specified when a table is created with the CREATE TABLE statement or you can
use the ALTER TABLE statement to create constraints even after the table is created.
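
A brief sketch of both approaches (the table and constraint names are hypothetical): column-level and table-level constraints are declared in CREATE TABLE, and a further constraint is added later with ALTER TABLE.

CREATE TABLE DEPARTMENTS (
   DEPT_ID   INT NOT NULL,             -- column-level constraint
   DEPT_NAME VARCHAR (30) UNIQUE,      -- column-level constraint
   PRIMARY KEY (DEPT_ID)               -- table-level constraint
);

ALTER TABLE DEPARTMENTS
ADD CONSTRAINT CHK_DEPT_NAME CHECK (DEPT_NAME <> '');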

Dropping Constraints
Any constraint that you have defined can be dropped using the ALTER TABLE command with the
DROP CONSTRAINT option.

For example, to drop the primary key constraint in the EMPLOYEES table, you can use the
following command.

ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;

Some implementations may provide shortcuts for dropping certain constraints. For example, to drop
the primary key constraint for a table in Oracle, you can use the following command.

ALTER TABLE EMPLOYEES DROP PRIMARY KEY;

Some implementations allow you to disable constraints. Instead of permanently dropping a constraint from the database, you may want to temporarily disable the constraint and then enable it later.
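
In Oracle, for example, a constraint can be disabled and later re-enabled as follows (the table and constraint names reuse the earlier illustration):

ALTER TABLE EMPLOYEES DISABLE CONSTRAINT EMPLOYEES_PK;
ALTER TABLE EMPLOYEES ENABLE CONSTRAINT EMPLOYEES_PK;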

Integrity Constraints
Integrity constraints are used to ensure accuracy and consistency of the data in a relational database.
Data integrity is handled in a relational database through the concept of referential integrity.

There are many types of integrity constraints that play a role in Referential Integrity (RI). These
constraints include Primary Key, Foreign Key, Unique Constraints and other constraints which are
mentioned above.
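
As a minimal sketch of referential integrity (the ORDERS table is hypothetical; CUSTOMERS is the table created earlier), a FOREIGN KEY ensures that every order refers to an existing customer:

CREATE TABLE ORDERS (
   ORDER_ID    INT NOT NULL,
   CUSTOMER_ID INT NOT NULL,
   PRIMARY KEY (ORDER_ID),
   FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMERS (ID)
);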

Data Manipulation Language


A Data Manipulation Language (DML) is a family of computer languages including commands
permitting users to manipulate data in a database. This manipulation involves inserting data into
database tables, retrieving existing data, deleting data from existing tables and modifying existing
data. DML is mostly incorporated in SQL databases.

DML resembles simple English and enables efficient user interaction with the system. The functional capability of DML is organized into manipulation commands like SELECT, UPDATE, INSERT INTO and DELETE FROM, as described below (a short worked example follows the list):

 SELECT: This command is used to retrieve rows from a table. The syntax is
SELECT [column name(s)] from [table name] where [conditions]. SELECT is the most widely used
DML command in SQL.

 UPDATE: This command modifies data of one or more records. An update command syntax is UPDATE [table name] SET [column name = value] where [condition].

 INSERT: This command adds one or more records to a database table. The insert
command syntax is INSERT INTO [table name] [column(s)] VALUES [value(s)].

 DELETE: This command removes one or more records from a table according to
specified conditions. Delete command syntax is DELETE FROM [table name] where [condition].
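
A short worked example of the four commands against the CUSTOMERS table created earlier (the values are illustrative):

INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00);

SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE AGE > 25;

UPDATE CUSTOMERS SET SALARY = 2500.00 WHERE ID = 1;

DELETE FROM CUSTOMERS WHERE ID = 1;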

OLTP & OLAP



OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. In an OLTP database there is detailed and current data, and the schema used to store transactional databases is the entity model (usually 3NF).

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effectiveness measure. OLAP applications are widely used by data mining techniques. In an OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
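
To make the contrast concrete, a hedged sketch (table and column names are hypothetical): an OLTP-style statement touches a single current row, while an OLAP-style query aggregates large amounts of historical data.

-- OLTP-style: a short transaction that updates one record
UPDATE ACCOUNTS SET BALANCE = BALANCE - 500 WHERE ACCOUNT_ID = 1001;

-- OLAP-style: an analytical query that aggregates historical data
SELECT REGION, SALE_YEAR, SUM(SALES_AMOUNT) AS TOTAL_SALES
FROM SALES_FACT
GROUP BY REGION, SALE_YEAR;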

The following table summarizes the major differences between OLTP and OLAP system design.

OLTP System: Online Transaction Processing (operational system). OLAP System: Online Analytical Processing (data warehouse).

 Source of data: OLTP – operational data; OLTPs are the original source of the data. OLAP – consolidation data; OLAP data comes from the various OLTP databases.

 Purpose of data: OLTP – to control and run fundamental business tasks. OLAP – to help with planning, problem solving, and decision support.

 What the data reveals: OLTP – a snapshot of ongoing business processes. OLAP – multi-dimensional views of various kinds of business activities.

 Inserts and updates: OLTP – short and fast inserts and updates initiated by end users. OLAP – periodic long-running batch jobs refresh the data.

 Queries: OLTP – relatively standardized and simple queries returning relatively few records. OLAP – often complex queries involving aggregations.

 Processing speed: OLTP – typically very fast. OLAP – depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

 Space requirements: OLTP – can be relatively small if historical data is archived. OLAP – larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.

 Database design: OLTP – highly normalized with many tables. OLAP – typically de-normalized with fewer tables; use of star and/or snowflake schemas.

 Backup and recovery: OLTP – back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. OLAP – instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.

Data Marts
A data mart is a repository of data that is designed to serve a particular community of knowledge
workers.

The difference between a data warehouse and a data mart can be confusing because the two terms
are sometimes used incorrectly as synonyms. A data warehouse is a central repository for all an
organization’s data. The goal of a data mart, however, is to meet the particular demands of a specific
group of users within the organization, such as human resource management (HRM). Generally, an
organization’s data marts are subsets of the organization’s data warehouse.

Because data marts are optimized to look at data in a unique way, the design process tends to start
with an analysis of user needs. In contrast, a data warehouse’s design process tends to start with an
analysis of what data already exists and how it can be collected and managed in such a way that it
can be used later on. A data warehouse tends to be a strategic but somewhat unfinished concept; a
data mart tends to be tactical and aimed at meeting an immediate need.

Today, data virtualization software can be used to create virtual data marts, pulling data from
disparate sources and combining it with other data as necessary to meet the needs of specific
business users. A virtual data mart provides knowledge workers with access to the data they need
while preventing data silos and giving the organization’s data management team a level of control
over the organization’s data throughout its lifecycle.

Data Warehousing and Architecture


Data warehousing is the process of constructing and using a data warehouse. A data warehouse is
constructed by integrating data from multiple heterogeneous sources that support analytical
reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data
cleaning, data integration, and data consolidations.

Using Data Warehouse Information


There are decision support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively. They can gather
data, analyze it, and take decisions based on the information present in the warehouse. The
information gathered in a warehouse can be used in any of the following domains −
 Tuning Production Strategies− The product strategies can be well tuned by
repositioning the products and managing the product portfolios by comparing the sales quarterly or
yearly.

 Customer Analysis− Customer analysis is done by analyzing the customer's buying preferences, buying time, budget cycles, etc.

 Operations Analysis− Data warehousing also helps in customer relationship management, and making environmental corrections. The information also allows us to analyze business operations.

Integrating Heterogeneous Databases


To integrate heterogeneous databases, we have two approaches −

 Query-driven Approach

 Update-driven Approach

Query-Driven Approach
This is the traditional approach to integrate heterogeneous databases. This approach was used to
build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also
known as mediators.

Process of Query-Driven Approach

 When a query is issued to a client side, a metadata dictionary translates the query
into an appropriate form for individual heterogeneous sites involved.

 Now these queries are mapped and sent to the local query processor.

 The results from heterogeneous sites are integrated into a global answer set.

Disadvantages

 Query-driven approach needs complex integration and filtering processes.

 This approach is very inefficient.

 It is very expensive for frequent queries.

 This approach is also very expensive for queries that require aggregations.

Update-Driven Approach
This is an alternative to the traditional approach. Today's data warehouse systems follow the update-driven approach rather than the traditional approach discussed earlier. In the update-driven approach, the information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is available for direct querying and analysis.

Advantages
This approach has the following advantages −

 This approach provides high performance.

 The data is copied, processed, integrated, annotated, summarized and restructured in a semantic data store in advance.

 Query processing does not require an interface to process data at local sources.

Functions of Data Warehouse Tools and Utilities


The following are the functions of data warehouse tools and utilities −

 Data Extraction− Involves gathering data from multiple heterogeneous sources.

 Data Cleaning− Involves finding and correcting the errors in data.

 Data Transformation− Involves converting the data from legacy format to warehouse format.

 Data Loading− Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

 Refreshing− Involves updating from data sources to warehouse.

Note − Data cleaning and data transformation are important steps in improving the quality of data
and data mining results.
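
The Data Loading step above, for example, often reduces to summarizing cleaned and transformed staging data into a warehouse table; a minimal sketch with hypothetical table names:

-- Load a summarized view of staged transactional rows into the warehouse
INSERT INTO SALES_SUMMARY (REGION, SALE_YEAR, TOTAL_SALES)
SELECT REGION, SALE_YEAR, SUM(AMOUNT)
FROM STAGING_SALES
GROUP BY REGION, SALE_YEAR;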

Data Warehousing Architecture

Business Analysis Framework


Business analysts get information from the data warehouse to measure performance and make critical adjustments in order to win over other business holders in the market. Having a data warehouse offers the following advantages −

 Since a data warehouse can gather information quickly and efficiently, it can
enhance business productivity.

 A data warehouse provides us a consistent view of customers and items; hence, it helps us manage customer relationships.

 A data warehouse also helps in bringing down the costs by tracking trends, patterns
over a long period in a consistent and reliable manner.

To design an effective and efficient data warehouse, we need to understand and analyze the business
needs and construct a business analysis framework. Each person has different views regarding the
design of a data warehouse. These views are as follows −

 The top-down view− This view allows the selection of relevant information needed
for a data warehouse.

 The data source view− This view presents the information being captured, stored,
and managed by the operational system.

 The data warehouse view− This view includes the fact tables and dimension tables.
It represents the information stored inside the data warehouse.

 The business query view− It is the view of the data from the viewpoint of the end-
user.

Three-Tier Data Warehouse Architecture


Generally, a data warehouse adopts a three-tier architecture. Following are the three tiers of the data warehouse architecture.

 Bottom Tier− The bottom tier of the architecture is the data warehouse database server, which is usually a relational database system. Back-end tools and utilities are used to feed data into the bottom tier; these tools perform the extract, clean, load, and refresh functions.

 Middle Tier− In the middle tier, we have the OLAP server, which can be implemented in either of the following ways.

o By Relational OLAP (ROLAP), which is an extended relational database management system. ROLAP maps the operations on multidimensional data to standard relational operations.

o By the Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and operations.

 Top-Tier− This tier is the front-end client layer. It holds the query tools, reporting tools, analysis tools and data mining tools.
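
To make the ROLAP idea concrete, here is a small hypothetical sketch using Python's built-in sqlite3 module: a multidimensional request ("total sales by region and quarter") is expressed as an ordinary relational GROUP BY over a star schema. The table and column names are invented for illustration.

import sqlite3

# A tiny hypothetical star schema: one fact table and one dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (store_id INTEGER, quarter TEXT, amount REAL);
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES (1, 'Q1', 100), (1, 'Q2', 150), (2, 'Q1', 80);
""")

# The multidimensional operation "roll up sales by region and quarter"
# maps onto a plain relational aggregation, which is the essence of ROLAP.
rows = con.execute("""
    SELECT d.region, f.quarter, SUM(f.amount) AS total_sales
    FROM fact_sales f JOIN dim_store d ON f.store_id = d.store_id
    GROUP BY d.region, f.quarter
    ORDER BY d.region, f.quarter
""").fetchall()

for region, quarter, total in rows:
    print(region, quarter, total)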

The following diagram depicts the three-tier architecture of data warehouse −


Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse models

 Virtual Warehouse

 Data mart

 Enterprise Warehouse

Virtual Warehouse
A set of views over operational databases is known as a virtual warehouse. It is easy to build a virtual warehouse, but it requires excess capacity on the operational database servers.

Data Mart
A data mart contains a subset of organization-wide data that is valuable to specific groups within an organization.

In other words, we can say that data marts contain data specific to a particular group. For example, a marketing data mart may contain data related to items, customers, and sales. Data marts are confined to subjects.

Points to remember about data marts −

 Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.

 The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.

 The life cycle of a data mart may become complex in the long run if its planning and design are not organization-wide.

 Data marts are small in size.

 Data marts are customized by department.

 The source of a data mart is a departmentally structured data warehouse.

 Data marts are flexible.

Enterprise Warehouse

 An enterprise warehouse collects all of the information and subjects spanning an entire organization.

 It provides enterprise-wide data integration.

 The data is integrated from operational systems and external information providers.

 This information can vary from a few gigabytes to hundreds of gigabytes, terabytes
or beyond.

Load Manager
This component performs the operations required for the extract and load process.

The size and complexity of the load manager vary between specific solutions, from one data warehouse to another.

Load Manager Architecture


The load manager performs the following functions −

 Extract the data from the source system.

 Fast-load the extracted data into a temporary data store.

 Perform simple transformations into a structure similar to the one in the data warehouse.

Success Factors of Data Warehousing


THEINTACTFRONT20 APR 2018 1 COMMENT

1. Expectations are communicated to the users

IT is often unwilling or afraid to tell the users what they will be getting and when. Users should
be told about the following:

 Performance – what to expect for response time

 Availability – scheduled hours and days

 Function – the data that will be accessible and what pre-defined queries and reports are available; the level of detail of the data, as well as how the data is integrated and aggregated

 Historical data

 The expectations of accuracy for both the cleanliness of the data and an
understanding of what the data means

 Timeliness – when the data will be available and how frequently the data is
loaded/updated/refreshed

 Schedule expectations involve when the system is due for delivery

2. User involvement is ensured

There are three levels of user involvement, as follows:

 Build it; they will use it

 Solicit requirements from the users

 Have the users involved all the way through the project

The last level is by far the most successful approach, while the first almost always results in
failure.

3. The project has a good sponsor

The best sponsor is from the business side, not from IT. Most importantly, the sponsor should be
in serious need of the data warehouse’s capabilities to solve a specific problem or gain some
advantage for his or her department.

4. The team has the right skill set

Without the right skills dedicated to the team, the project will fail. The emphasis is on “dedicated
to the team.”
5. The schedule is realistic

The most common cause of failure is an unrealistic schedule, usually imposed without the input
or the concurrence of the project manager or team members. Most often, the imposed schedules
have no rationale for specific dates, but are only means to “hold the project manager to a
schedule.” A realistic schedule will include all the required tasks to implement the project along
with their durations, assigned resources and task dependencies.

6. The right tools have been chosen

The first decisions to be made are the categories of tools: Extract/Transform/Load, data cleansing,
OLAP, ROLAP, data modeling, administration, and so on. The tools must match the requirements
of the organization, the users, and the project. The tools should work together without the need to
build interfaces or write special code.

7. Users are properly trained

In spite of what the vendors tell you, users must be trained and the training should be geared to
the level of user and the way they plan to use the data warehouse. All users must learn about the
data, and power users should have additional in-depth training on the data structures.

UNIT 3 Data Mining And Knowledge Discovery


Phases of Knowledge Discovery in DataBases (KDD)
THEINTACTFRONT20 APR 2018 1 COMMENT
Knowledge Discovery
Some people don’t differentiate data mining from knowledge discovery while others view data
mining as an essential step in the process of knowledge discovery. Here is the list of steps involved
in the knowledge discovery process −

 Data Cleaning− In this step, noise and inconsistent data are removed.

 Data Integration− In this step, multiple data sources are combined.

 Data Selection− In this step, data relevant to the analysis task are retrieved from the database.

 Data Transformation− In this step, data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.

 Data Mining− In this step, intelligent methods are applied in order to extract data patterns.

 Pattern Evaluation− In this step, data patterns are evaluated.

 Knowledge Presentation− In this step, knowledge is represented.

The following diagram shows the process of knowledge discovery −

Steps involved in the entire KDD process are:

1. Identify the goal of the KDD process from the customer’s perspective.

2. Understand application domains involved and the knowledge that’s required

3. Select a target data set or subset of data samples on which discovery is to be performed.

4. Cleanse and preprocess data by deciding strategies to handle missing fields and alter
the data as per the requirements.

5. Simplify the data sets by removing unwanted variables. Then, analyze useful
features that can be used to represent the data, depending on the goal or task.

6. Match KDD goals with data mining methods to suggest hidden patterns.

7. Choose data mining algorithms to discover hidden patterns. This process includes
deciding which models and parameters might be appropriate for the overall KDD process.

8. Search for patterns of interest in a particular representational form, which includes classification rules or trees, regression and clustering.

9. Interpret essential knowledge from the mined patterns.

10. Use the knowledge and incorporate it into another system for further action.

11. Document it and make reports for interested parties.
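
A compressed, hypothetical illustration of steps 3 to 9 in Python: an invented data set is cleansed, one relevant feature is selected, a deliberately simple mining step splits customers into low and high spenders, and the resulting pattern is interpreted. A real KDD project would use far richer data and mining methods.

# Hypothetical raw records (step 3: select the target data).
raw = [{"id": 1, "spend": 120.0}, {"id": 2, "spend": None},
       {"id": 3, "spend": 40.0},  {"id": 4, "spend": 95.0}]

# Step 4: cleanse/preprocess - decide how to handle missing fields (here: drop them).
clean = [r for r in raw if r["spend"] is not None]

# Step 5: simplify - keep only the feature relevant to the goal.
spend = [r["spend"] for r in clean]

# Steps 6-8: a deliberately simple "mining" method - split around the mean
# to expose a low-spender / high-spender pattern.
mean = sum(spend) / len(spend)
pattern = {"low": [s for s in spend if s < mean],
           "high": [s for s in spend if s >= mean]}

# Step 9: interpret the mined pattern.
print(f"mean spend = {mean:.1f}")
print(f"{len(pattern['high'])} high spenders, {len(pattern['low'])} low spenders")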

Data Mining Techniques


THEINTACTFRONT20 APR 2018 2 COMMENTS
Data Mining is the process of extracting useful information and patterns from enormous amounts of data. Data Mining includes the collection, extraction, analysis and statistics of data. It is also known as the knowledge discovery process, knowledge mining from data, or data/pattern analysis. Data mining is a logical process of finding useful information in data. Once the information and patterns are found, they can be used to make decisions for developing the business. Data mining tools can give answers to various business questions that were too difficult to resolve, and they also forecast future trends, which lets business people make proactive decisions.

Data mining involves three steps. They are

 Exploration– In this step the data is cleaned and converted into another form. The nature of the data is also determined.

 Pattern Identification– The next step is to choose the pattern which will make the best prediction.

 Deployment– The identified patterns are used to get the desired outcome.

Benefits of Data Mining

 Automated prediction of trends and behaviours

 It can be implemented on new systems as well as existing platforms

 It can analyze huge databases in minutes

 Automated discovery of hidden patterns

 There are a lot of models available to understand complex data easily

 Its high speed makes it easy for users to analyze huge amounts of data in less time

 It yields improved predictions

Data Mining Techniques

One of the most important tasks in data mining is to select the correct data mining technique. The technique has to be chosen based on the type of business and the type of problem the business faces. A generalized approach has to be used to improve the accuracy and cost-effectiveness of using data mining techniques. There are basically seven main data mining techniques, which are discussed in this article. There are also many other data mining techniques, but these seven are considered the ones most frequently used by business people.

 Statistics

 Association rules

 Clustering

 Neural networks

 Visualization

 Classification

 Decision tree

1. Statistical Techniques

Statistics is a branch of mathematics which relates to the collection and description of data. The statistical technique is not considered a data mining technique by many analysts, but it still helps to discover patterns and build predictive models. For this reason, data analysts should possess some knowledge about the different statistical techniques. In today's world people have to deal with large amounts of data and derive important patterns from it. Statistics can help to a great extent in answering questions about the data, such as

 What are the patterns in the database?

 What is the probability of an event occurring?

 Which patterns are more useful to the business?

 What is the high-level summary that can give a detailed view of what is in the database?

Statistics not only answers these questions, it also helps in summarizing and counting the data and in providing information about the data with ease. Through statistical reports, people can make smart decisions. There are different forms of statistics, but the most important and useful techniques are those for collecting and describing data, such as

 Histogram

 Mean

 Median

 Mode

 Variance

 Max

 Min

 Linear Regression
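
These measures are straightforward to compute with Python's standard statistics module; the sales figures below are invented for illustration, and linear_regression requires Python 3.10 or newer.

import statistics

# Hypothetical monthly sales figures.
sales = [120, 150, 150, 90, 200, 170, 130]

print("mean    ", statistics.mean(sales))
print("median  ", statistics.median(sales))
print("mode    ", statistics.mode(sales))
print("variance", statistics.variance(sales))
print("max/min ", max(sales), min(sales))

# A simple histogram: count of values falling into buckets 50 units wide.
buckets = {}
for s in sales:
    buckets[s // 50 * 50] = buckets.get(s // 50 * 50, 0) + 1
print("histogram", dict(sorted(buckets.items())))

# Simple linear regression (least squares) of sales against month number
# (statistics.linear_regression is available from Python 3.10).
months = list(range(1, len(sales) + 1))
slope, intercept = statistics.linear_regression(months, sales)
print("trend: slope", round(slope, 2), "intercept", round(intercept, 2))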

2. Clustering Technique

Clustering is one of the oldest techniques used in data mining. Clustering analysis is the
process of identifying data that are similar to each other. This will help to understand the differences
and similarities between the data. This is sometimes called segmentation and helps the users to
understand what is going on within the database. For example, an insurance company can group its
customers based on their income, age, nature of policy and type of claims.

There are different types of clustering methods. They are as follows

 Partitioning Methods

 Hierarchical Agglomerative methods

 Density Based Methods

 Grid Based Methods

 Model Based Methods

The most popular related algorithm is nearest neighbour, a technique very similar to clustering. It is a prediction technique: to predict an estimated value for a record, look for records with similar values in a historical database and use the prediction value from the record that is nearest to the unclassified record. The technique simply assumes that objects which are closer to each other will have similar prediction values, so the values of nearby objects can be predicted easily. Nearest neighbour is one of the easiest techniques to use because it works the way people think, it automates well, and it performs complex calculations such as ROI with ease. The level of accuracy of this technique is as good as that of the other data mining techniques.

In business, the nearest neighbour technique is most often used in text retrieval, to find documents that share important characteristics with a main document that has been marked as interesting.
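
A minimal nearest-neighbour prediction sketch in Python; the historical customer records and the claim amounts are invented, and a real system would use many more records and attributes.

import math

# Hypothetical historical records: (age, income in thousands) -> known claim amount.
history = [
    ((25, 30), 1200.0),
    ((47, 85), 400.0),
    ((52, 90), 350.0),
    ((31, 40), 900.0),
]

def distance(a, b):
    # Euclidean distance between two attribute vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(new_record):
    # Use the prediction value of the nearest historical record.
    nearest_attrs, nearest_value = min(history, key=lambda h: distance(h[0], new_record))
    return nearest_value

print(predict((50, 88)))   # nearest to (52, 90), so the prediction is 350.0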

3. Visualization

Visualization is a very useful technique for discovering data patterns and is used at the beginning of the data mining process. Much research is under way these days to produce interesting projections of databases, an approach called projection pursuit. Many data mining techniques will only produce useful patterns from good data, but visualization is a technique which converts poor data into good data, letting different kinds of data mining methods be used in discovering hidden patterns.

4. Induction Decision Tree Technique

A decision tree is a predictive model and the name itself implies that it looks like a tree. In this
technique, each branch of the tree is viewed as a classification question and the leaves of the trees
are considered as partitions of the dataset related to that particular classification. This technique can
be used for exploration analysis, data pre-processing and prediction work.

A decision tree can be considered as a segmentation of the original dataset, where segmentation is done for a particular reason. The data that fall under a segment have some similarities in the information being predicted. Decision trees provide results that can be easily understood by the user.

The decision tree technique is mostly used by statisticians to find out which data are most related to the business problem. It can be used for prediction and data pre-processing.

The first and foremost step in this technique is growing the tree. The basis of growing the tree is finding the best possible question to be asked at each branch of the tree. The decision tree stops growing under any one of the below circumstances

 If the segment contains only one record

 All the records contain identical features

 The growth is not enough to make any further split

CART, which stands for Classification and Regression Trees, is a data exploration and prediction algorithm which picks its questions in a more complex way. It tries them all and then selects the one best question, which is used to split the data into two or more segments. After deciding on the segments, it asks questions of each of the new segments individually.

Another popular decision tree technology is CHAID (Chi-Square Automatic Interaction Detector). It is similar to CART, but differs in how it chooses its splits: CART searches for the single best binary question, whereas CHAID uses chi-square tests to choose multiway splits.
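
A small, hypothetical sketch of the core step of growing a tree in Python: at one branch, every candidate binary question on an invented data set is tried, and the one that best separates the two classes is kept (scored here with Gini impurity, the criterion CART typically uses).

# Hypothetical labelled records: (income in thousands, age) -> did the customer respond?
data = [((20, 25), "no"), ((35, 30), "no"), ((60, 42), "yes"),
        ((80, 50), "yes"), ((55, 38), "yes"), ((28, 45), "no")]

def gini(rows):
    # Gini impurity of a set of labelled rows (0 = perfectly pure).
    if not rows:
        return 0.0
    labels = [label for _, label in rows]
    p_yes = labels.count("yes") / len(labels)
    return 1.0 - p_yes ** 2 - (1.0 - p_yes) ** 2

def best_split(rows):
    # Try every question of the form "attribute i <= value" and keep the best one.
    best = None
    for i in range(len(rows[0][0])):                   # each attribute
        for value in sorted({r[0][i] for r in rows}):  # each observed threshold
            left = [r for r in rows if r[0][i] <= value]
            right = [r for r in rows if r[0][i] > value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, i, value)
    return best

score, attribute, value = best_split(data)
print(f"best question: is attribute {attribute} <= {value}?  (weighted impurity {score:.3f})")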

5. Neural Network

Neural networks are another important technique in use these days. This technique is most often applied in the early stages of a data mining effort. Artificial neural networks grew out of the artificial intelligence community.

Neural networks are fairly easy to use because they are automated to a certain extent, so the user is not expected to have much knowledge about the workings or the database. But to make a neural network work efficiently you need to know

 How are the nodes connected?

 How many processing units should be used?

 When should the training process be stopped?

There are two main parts of this technique – the node and the link

 The node– which roughly corresponds to a neuron in the human brain

 The link– which roughly corresponds to the connections between neurons in the human brain

A neural network is a collection of interconnected neurons, which could form a single layer or multiple layers. The formation of the neurons and their interconnections is called the architecture of the
network. There are a wide variety of neural network models and each model has its own advantages
and disadvantages. Every neural network model has different architectures and these architectures
use different learning procedures.

Neural networks are a very strong predictive modelling technique, but they are not easy to understand, even by experts, because they create very complex models that are almost impossible to interpret fully. To make the neural network technique easier to work with, companies are looking for new solutions, and two solutions have already been suggested:

 The first is to package the neural network into a complete solution so that it can be used for a single application.

 The second is to bundle it with expert consulting services.

Neural networks have been used in various kinds of applications, for example to detect fraud taking place in a business.
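
A deliberately tiny sketch of the node-and-link idea in Python: a single "neuron" with two input links learns, by repeated weight adjustment, to flag hypothetical transactions as suspicious. The figures are invented, and a real fraud-detection network would have many layers and far more training data.

import math, random

random.seed(1)

# Hypothetical training data: (amount in thousands, transactions per hour) -> fraud? 1/0
data = [((0.2, 1), 0), ((0.5, 2), 0), ((9.0, 30), 1),
        ((7.5, 25), 1), ((0.8, 3), 0), ((8.2, 28), 1)]

w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]  # link weights
b = 0.0                                                     # bias of the node

def neuron(x):
    # One node: weighted sum of the inputs pushed through a sigmoid.
    return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))

# Training: nudge each weight in the direction that reduces the prediction error.
for _ in range(2000):
    for x, target in data:
        out = neuron(x)
        grad = (target - out) * out * (1 - out)   # error times the sigmoid derivative
        w[0] += 0.1 * grad * x[0]
        w[1] += 0.1 * grad * x[1]
        b += 0.1 * grad

print(round(neuron((8.0, 27)), 2))   # high output -> looks fraudulent
print(round(neuron((0.3, 2)), 2))    # low output -> looks normal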

6. Association Rule Technique

This technique helps to find the association between two or more items and to understand the relations between different variables in a database. It discovers hidden patterns in the data set and identifies combinations of variables that appear together with the highest frequencies.

An association rule offers two major pieces of information:

 Support– How often does the rule apply?

 Confidence– How often is the rule correct?

This technique follows a two-step process:

 Find all the frequently occurring item sets

 Create strong association rules from the frequent item sets

There are three types of association rule. They are

 Multilevel Association Rule

 Multidimensional Association Rule

 Quantitative Association Rule

This technique is most often used in the retail industry to find patterns in sales, which helps increase the conversion rate and thus increases profit.
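
A small Python sketch of the two measures on an invented set of transactions: support is how often the items occur together across all baskets, and confidence is how often the rule holds when its left-hand side occurs.

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    # Fraction of all baskets that contain every item in the itemset.
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    # Of the baskets containing lhs, the fraction that also contain rhs.
    return support(lhs | rhs) / support(lhs)

rule_lhs, rule_rhs = {"bread"}, {"milk"}
print("support   ", support(rule_lhs | rule_rhs))   # 3 of 5 baskets = 0.6
print("confidence", confidence(rule_lhs, rule_rhs)) # 3 of the 4 bread baskets = 0.75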

7. Classification

Classification is the most commonly used data mining technique; it uses a set of pre-classified samples to create a model that can classify a larger set of data. This technique helps in deriving important information about data and metadata (data about data). It is closely related to the cluster analysis technique and it often uses a decision tree or neural network system. There are two main processes involved in this technique

 Learning– In this process the training data are analyzed by the classification algorithm

 Classification– In this process test data are used to measure the accuracy of the classification rules

There are different types of classification models. They are as follows

 Classification by decision tree induction

 Bayesian Classification

 Neural Networks

 Support Vector Machines (SVM)

 Classification Based on Associations

One good example of the classification technique is an email provider classifying incoming messages as spam or legitimate mail.
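
A short sketch of the learning and classification steps using scikit-learn, assuming the library is installed; the labelled messages are invented, and a real email provider would train on far richer features.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical pre-classified samples (the "learning" data).
emails = ["win money now", "cheap pills offer", "meeting agenda attached",
          "lunch tomorrow?", "win a free offer", "project status report"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# Learning: turn text into word-count features and fit a classifier.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(features, labels)

# Classification: apply the learned model to unseen messages.
new_emails = ["free money offer", "agenda for the project meeting"]
print(model.predict(vectorizer.transform(new_emails)))   # expected: ['spam' 'ham']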


Market Basket Analysis


THEINTACTFRONT20 APR 2018 1 COMMENT
Market basket analysis (MBA) is an example of an analytics technique employed by retailers to
understand customer purchase behaviors. It is used to determine what items are frequently bought
together or placed in the same basket by customers. It uses this purchase information to leverage the effectiveness of sales and marketing. MBA looks for combinations of products that frequently occur
in purchases and has been prolifically used since the introduction of electronic point of sale systems
that have allowed the collection of immense amounts of data.

Market basket analysis only uses transactions with more than one item, as no associations can be
made with single purchases. Item association does not necessarily suggest a cause and effect, but
simply a measure of co-occurrence. It does not mean that since energy drinks and video games are
frequently bought together, one is the cause for the purchase of the other, but it can be construed
from the information that this purchase is most probably made by (or for) a gamer. Such rules or hypotheses must be tested and should not be taken as truth unless item sales confirm them.

There are two main types of MBA:

 Predictive MBA is used to classify cliques of item purchases, events and services
that largely occur in sequence.

 Differential MBA removes a high volume of insignificant results and can lead to
very in-depth results. It compares information between different stores, demographics, seasons of the
year, days of the week and other factors.

MBA is commonly used by online retailers to make purchase suggestions to consumers. For
example, when a person buys a particular model of smartphone, the retailer may suggest other
products such as phone cases, screen protectors, memory cards or other accessories for that
particular phone. This is due to the frequency with which other consumers bought these items in the
same transaction as the phone.
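
A minimal sketch of how such a suggestion could be derived from co-occurrence counts in Python; the transactions are invented, and production recommenders use far more data and stronger measures than raw counts.

from collections import Counter
from itertools import combinations

# Hypothetical multi-item transactions (single-item baskets are ignored by MBA).
transactions = [
    {"phone", "case", "screen protector"},
    {"phone", "case"},
    {"phone", "memory card"},
    {"case", "screen protector"},
]

# Count how often each unordered pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    if len(basket) > 1:
        pair_counts.update(combinations(sorted(basket), 2))

def suggest(item, top_n=2):
    # Items most frequently bought together with the given item.
    related = Counter()
    for (a, b), count in pair_counts.items():
        if item == a:
            related[b] += count
        elif item == b:
            related[a] += count
    return related.most_common(top_n)

print(suggest("phone"))   # 'case' is the most frequent companion item in this toy data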

MBA is also used in physical retail locations. Due to the increasing sophistication of point of sale
systems coupled with big data analytics, stores are using purchase data and MBA to help improve
store layouts so that consumers can more easily find items that are frequently purchased together.

Application of Data Mining


THEINTACTFRONT20 APR 2018 1 COMMENT

Data mining is widely used in diverse areas. There are a number of commercial data mining systems available today, and yet there are many challenges in this field. In this section, we will discuss the applications and trends of data mining.

Data Mining Applications

Here is the list of areas where data mining is widely used −

 Financial Data Analysis

 Retail Industry

 Telecommunication Industry

 Biological Data Analysis

 Other Scientific Applications

 Intrusion Detection

Financial Data Analysis


Financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some of the typical cases are as follows −

 Design and construction of data warehouses for multidimensional data analysis and
data mining.

 Loan payment prediction and customer credit policy analysis.

 Classification and clustering of customers for targeted marketing.

 Detection of money laundering and other financial crimes.

Retail Industry
Data mining has a great application in the retail industry because this industry collects large amounts of data on sales, customer purchasing history, goods transportation, consumption and services. It is natural that the quantity of data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web.

Data mining in the retail industry helps in identifying customer buying patterns and trends that lead to
improved quality of customer service and good customer retention and satisfaction. Here is the list of
examples of data mining in the retail industry −

 Design and Construction of data warehouses based on the benefits of data mining.

 Multidimensional analysis of sales, customers, products, time and region.

 Analysis of effectiveness of sales campaigns.

 Customer Retention.

 Product recommendation and cross-referencing of items.

Telecommunication Industry
Today the telecommunication industry is one of the fastest-growing industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is the reason why data mining has become very important in helping to understand the business.

Data mining in the telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service. Here is the list of examples for which data mining improves telecommunication services −

 Multidimensional Analysis of Telecommunication data.

 Fraudulent pattern analysis.

 Identification of unusual patterns.

 Multidimensional association and sequential patterns analysis.

 Mobile Telecommunication services.

 Use of visualization tools in telecommunication data analysis.

Biological Data Analysis


In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics and biomedical research. Biological data mining is a very important part of bioinformatics. Following are the aspects in which data mining contributes to biological data analysis −

 Semantic integration of heterogeneous, distributed genomic and proteomic databases.

 Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.

 Discovery of structural patterns and analysis of genetic networks and protein pathways.

 Association and path analysis.

 Visualization tools in genetic data analysis.

Other Scientific Applications
The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical techniques are appropriate. Huge amounts of data have been collected from scientific domains such as geosciences and astronomy. Large data sets are also being generated by fast numerical simulations in various fields such as climate and ecosystem modeling, chemical engineering and fluid dynamics. Following are the applications of data mining in the field of scientific applications −

 Data Warehouses and data preprocessing.

 Graph-based mining.

 Visualization and domain specific knowledge.

Intrusion Detection
Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In this world of connectivity, security has become a major issue. The increased usage of the internet and the availability of tools and tricks for intruding into and attacking networks have prompted intrusion detection to become a critical component of network administration. Here is the list of areas in which data mining technology may be applied for intrusion detection −

 Development of data mining algorithm for intrusion detection.

 Association and correlation analysis, and aggregation to help select and build discriminating attributes.

 Analysis of Stream data.

 Distributed data mining.

 Visualization and query tools.

Data Mining System Products

There are many data mining system products and domain-specific data mining applications. New data mining systems and applications are continually being added to the existing ones. Also, efforts are being made to standardize data mining languages.

Choosing a Data Mining System

The selection of a data mining system depends on the following features −

 Data Types− The data mining system may handle formatted text, record-based data,
and relational data. The data could also be in ASCII text, relational database data or data warehouse
data. Therefore, we should check what exact format the data mining system can handle.

 System Issues− We must consider the compatibility of a data mining system with
different operating systems. One data mining system may run on only one operating system or on
several. There are also data mining systems that provide web-based user interfaces and allow XML
data as input.

 Data Sources− Data sources refer to the data formats in which the data mining system will operate. Some data mining systems may work only on ASCII text files, while others work on multiple relational sources. The data mining system should also support ODBC connections or OLE DB for ODBC connections.

 Data Mining functions and methodologies− Some data mining systems provide only one data mining function, such as classification, while others provide multiple data mining functions such as concept description, discovery-driven OLAP analysis, association mining, linkage analysis, statistical analysis, classification, prediction, clustering, outlier analysis, similarity search, etc.

 Coupling data mining with databases or data warehouse systems− Data mining
systems need to be coupled with a database or a data warehouse system. The coupled components
are integrated into a uniform information processing environment. Here are the types of coupling
listed below −

o No coupling

o Loose Coupling

o Semi tight Coupling

o Tight Coupling

 Scalability− There are two scalability issues in data mining −

o Row (Database size) Scalability− A data mining system is considered row scalable if, when the number of rows is enlarged 10 times, it takes no more than 10 times as long to execute the same query.

o Column (Dimension) Scalability− A data mining system is considered column scalable if the mining query execution time increases linearly with the number of columns.

 Visualization Tools− Visualization in data mining can be categorized as follows −



o Data Visualization

o Mining Results Visualization

o Mining process visualization

o Visual data mining

 Data Mining query language and graphical user interface− An easy-to-use graphical user interface is important to promote user-guided, interactive data mining. Unlike relational database systems, data mining systems do not share an underlying data mining query language.

Trends in Data Mining

Data mining concepts are still evolving and here are the latest trends that we get to see in this field −

 Application Exploration.

 Scalable and interactive data mining methods.

 Integration of data mining with database systems, data warehouse systems and web
database systems.

 Standardization of data mining query language.

 Visual data mining.

 New methods for mining complex types of data.

 Biological data mining.

 Data mining and software engineering.

 Web mining.

 Distributed data mining.

 Real time data mining.

 Multi database data mining.

 Privacy protection and information security in data mining.


UNIT 4 Knowledge Management

Types of knowledge
THEINTACTFRONT20 APR 2018 1 COMMENT
Knowledge management is an activity practiced by enterprises all over the world. In the process of
knowledge management, these enterprises comprehensively gather information using many methods
and tools.

Then, gathered information is organized, stored, shared, and analyzed using defined techniques.

The analysis of such information will be based on resources, documents, people and their skills.

Properly analyzed information will then be stored as ‘knowledge’ of the enterprise. This knowledge
is later used for activities such as organizational decision making and training new staff members.

There have been many approaches to knowledge management since the early days. Most early approaches relied on manual storage and analysis of information. With the introduction of computers, most organizational knowledge and management processes have been automated.

Therefore, information storing, retrieval and sharing have become convenient. Nowadays, most
enterprises have their own knowledge management framework in place.

The framework defines the knowledge gathering points, gathering techniques, tools used, data
storing tools and techniques and analyzing mechanism.

1. A Priori

A priori and a posteriori are two of the original terms in epistemology (the study of knowledge). A
priori literally means “from before” or “from earlier.” This is because a priori knowledge depends
upon what a person can derive from the world without needing to experience it. This is better known
as reasoning. Of course, a degree of experience is necessary upon which a priori knowledge can
take shape.

Let’s look at an example. If you were in a closed room with no windows and someone asked you
what the weather was like, you would not be able to answer them with any degree of truth. If you
did, then you certainly would not be in possession of a priori knowledge. It would simply be
impossible to use reasoning to produce a knowledgeable answer.

On the other hand, if there were a chalkboard in the room and someone wrote the equation 4 + 6
= ? on the board, then you could find the answer without physically finding four objects and adding
six more objects to them and then counting them. You would know the answer is 10 without needing
a real world experience to understand it. In fact, mathematical equations are one of the most popular
examples of a priori knowledge.


2. A Posteriori

Naturally, then, a posteriori literally means “from what comes later” or “from what comes after.”
This is a reference to experience and using a different kind of reasoning (inductive) to gain
knowledge. This kind of knowledge is gained by first having an experience (and the important idea
in philosophy is that it is acquired through the five senses) and then using logic and reflection to
derive understanding from it. In philosophy, this term is sometimes used interchangeably with
empirical knowledge, which is knowledge based on observation.

It is believed that a priori knowledge is more reliable than a posteriori knowledge. This might seem
counter-intuitive, since in the former case someone can just sit inside of a room and base their
knowledge on factual evidence while in the latter case someone is having real experiences in the
world. But the problem lies in this very fact: everyone’s experiences are subjective and open to
interpretation. This is a very complex subject and you might find it illuminating to read this post on
knowledge issues and how to identify and use them. A mathematical equation, on the other hand,
is law.

3. Explicit Knowledge

Now we are entering the realm of explicit and tacit knowledge. As you have noticed by now, types of
knowledge tend to come in pairs and are often antitheses of each other. Explicit knowledge is similar
to a priori knowledge in that it is more formal or perhaps more reliable. Explicit knowledge is
knowledge that is recorded and communicated through mediums. It is our libraries and databases.
The specifics of what is contained is less important than how it is contained. Anything from the
sciences to the arts can have elements that can be expressed in explicit knowledge. Get a taste of
explicit knowledge for yourself with this top-rated course on learning how to learn and knowing
how to tap into your inner genius.

The defining feature of explicit knowledge is that it can be easily and quickly transmitted from one
individual to another, or to another ten-thousand or ten-billion. It also tends to be organized
systematically. For example, a history textbook on the founding of America would take a
chronological approach as this would allow knowledge to build upon itself through a progressive
system; in this case, time.

4. Tacit Knowledge

I should note that tacit knowledge is a relatively new theory introduced only as recently as the 1950s.
Whereas explicit knowledge is very easy to communicate and transfer from one individual to
another, tacit knowledge is precisely the opposite. It is extremely difficult, if not impossible, to
communicate tacit knowledge through any medium.

For example, the textbook on the founding of America can teach facts (or things we believe to be
facts), but someone who is an expert musician can not truly communicate their knowledge; in other
words, they can not tell someone how to play the instrument and the person will immediately
possess that knowledge. That knowledge must be acquired to a degree that goes far, far beyond theory. In this sense, tacit knowledge would most closely resemble a posteriori knowledge, as it can
only be achieved through experience.

The biggest difficulty of tacit knowledge is knowing when it is useful and figuring out how to make it usable. Tacit knowledge can only be communicated through consistent and extensive relationships or contact (such as taking lessons from a professional musician). But even in these cases there will not be
a true transfer of knowledge. Usually two forms of knowledge are born, as each person must fill in
certain blanks (such as skill, short-cuts, rhythms, etc.).

5. Propositional Knowledge (also Descriptive or Declarative Knowledge)

Our last pair of knowledge theories are propositional and non-propositional knowledge, both of
which share similarities with some of the other theories already discussed. Propositional knowledge
has the oddest definition yet, as it is commonly held that it is knowledge that can literally be
expressed in propositions; that is, in declarative sentences (to use its other name) or indicative
propositions.

Propositional knowledge is not so different from a priori and explicit knowledge. The key attribute
is knowing that something is true. Again, mathematical equations could be an example of
propositional knowledge, because it is knowledge of something, as opposed to knowledge of how to
do something.

The best example is one that contrasts propositional knowledge with our next form of knowledge,
non-propositional or procedural knowledge. Let’s use a textbook/manual/instructional pamphlet that
has information on how to program a computer as our example. Propositional knowledge is simply
knowing something or having knowledge of something. So if you read and/or memorized the
textbook or manual, then you would know the steps on how to program a computer. You could even
repeat these steps to someone else in the form of declarative sentences or indicative propositions.
However, you may have memorized every word yet have no idea how to actually program a
computer. That is where non-propositional or procedural knowledge comes in.


6. Non-Propositional Knowledge (also Procedural Knowledge)

Non-propositional knowledge (which is better known as procedural knowledge, but I decided to use
“non-propositional” because it is a more obvious antithesis to “propositional”) is knowledge that can
be used; it can be applied to something, such as a problem. Procedural knowledge differs from
propositional knowledge in that it is acquired “by doing”; propositional knowledge is acquired by
more conservative forms of learning.

One of the defining characteristics of procedural knowledge is that it can be claimed in a court of
law. In other words, companies that develop their own procedures or methods can protect them as
intellectual property. They can then, of course, be sold, protected, leased, etc.

Procedural knowledge has many advantages. Obviously, hands-on experience is extremely valuable;
literally so, as it can be used to obtain employment. We are seeing this today as experience
(procedural) is eclipsing education (propositional). Sure, education is great, but experience is what
defines what a person is capable of accomplishing. So someone who “knows” how to write code is
not nearly as valuable as someone who “writes” or “has written” code. However, some people
believe that this is a double-edged sword, as the degree of experience required to become proficient
limits us to a relatively narrow field of variety.

But nobody can deny the intrinsic and real value of experience. This is often more accurate than
propositional knowledge because it is more akin to the scientific method; hypotheses are tested,
observation is used, and progress results.

Knowledge Generation, Knowledge Storage


THEINTACTFRONT20 APR 2018 1 COMMENT
Knowledge Generation
Sources of Knowledge of an Organization

 Intranet

 Data warehouses and knowledge repositories

 Decision support tools

 Groupware for supporting collaboration

 Networks of knowledge workers

 Internal expertise

Definition of KMS

A knowledge management system comprises a range of practices used in an organization to identify, create, represent, distribute, and enable the adoption of insights and experience. Such insights and experience comprise knowledge, either embodied in individuals or embedded in organizational processes and practices.

Purpose of KMS

 Improved performance

 Competitive advantage

 Innovation

 Sharing of knowledge

 Integration

 Continuous improvement by:

o Driving strategy

o Starting new lines of business

o Solving problems faster

o Developing professional skills

o Recruit and retain talent

Activities in Knowledge Management

 Start with the business problem and the business value to be delivered first.

 Identify what kind of strategy to pursue to deliver this value and address the KM
problem.

 Think about the system required from a people and process point of view.

 Finally, think about what kind of technical infrastructure are required to support the
people and processes.

 Implement system and processes with appropriate change management and iterative
staged release.

Knowledge Management Technologies


THEINTACTFRONT20 APR 2018 1 COMMENT
Knowledge Management Technologies are information technologies that can be used to facilitate
knowledge management. Knowledge Management Technologies are intrinsically no different from
information technologies, but they can focus on knowledge management rather than information
processing.

Knowledge Management Technologies also support knowledge management systems and benefit
from the knowledge management infrastructure, especially the information technology
infrastructure. KM technologies constitute a key component of KM systems.

Technologies that support KM include artificial intelligence (AI) technologies including those used
for knowledge acquisition and case-based reasoning systems, electronic discussion groups,
computer-based simulations, databases, decision support systems, enterprise resource planning
systems, expert systems, management information systems, expertise locator systems,
videoconferencing, and information repositories including best practices databases and lessons
learned systems. KM technologies also include emergent Web 2.0 technologies, such as wikis and blogs (Becerra-Fernandez and Sabherwal, 2010).

Knowledge management mechanisms and technologies work together and affect each other.

There are four main knowledge management processes, and each process comprises two sub-
processes:

 Knowledge discovery

o Combination

o Socialization

 Knowledge capture

o Externalization

o Internalization

 Knowledge sharing

o Socialization

o Exchange

 Knowledge application

o Direction

o Routines

Emerging Issues in Business Intelligence


THEINTACTFRONT20 APR 2018 1 COMMENT

Organizations are closely watching emerging technology trends to discover the next great
competitive advantage in the use of information. One trend is easy to identify: more information.
Data volumes are growing across the board, with organizations seeking to tap new sources generated
by social media and online customer behavior. This trend is spurring tremendous interest in better
access and analysis of the variety of information available in unstructured or semi-structured content
sources.

1. Data Discovery Accelerates Self-Service BI and Analytics

From a macro perspective, it’s easy to identify the biggest long-term trend in business intelligence:
providing nontechnical users with the tools and capabilities to access, analyze, and share data on
their own. However, the road to this destination has not been easy. With IT driving application
development and deployment, standard approaches to extending enterprise BI and data analysis
capabilities have been difficult and slow. Getting the requirements right for the data, reports,
visualization, and drill-down analysis capabilities is difficult and never fully satisfactory. By the time
requirements have been gathered and turned into application features, users will have identified
different requirements.

2. Unified Access and Analysis of All Types of Information Improves User Productivity

As the implementation of BI and analytics tools spreads to more users within organizations, a
question inevitably arises: What about all the information in text and document formats, which
accounts for the vast majority of what users encounter? Difficulty in finding information, whether
structured or unstructured, is a productivity cost to organizations. If one of the measures of BI’s
value is improved productivity, then BI should help users access and analyze unstructured as well as
structured information.

Historically, BI systems have developed in technology ecosystems limited to structured, alphanumeric data, leaving unstructured content to document and content management systems,
search engines, and a lot of manual paperwork. With the majority of content increasingly being
stored and generated in digital form, users are demanding better integration between content access
and analysis and the structured realm of BI. Integrated views of all types of information can help
managers and frontline workers see the context surrounding the numbers in structured systems. This
enables them to uncover business opportunities and find the root causes of problems more quickly.

3. Big Data Generated by Social Media Drives Innovation in Customer Analytics

Customer data intelligence has long been a major driver behind growth in the implementation of
sophisticated analytics for prediction and pattern recognition as well as advanced data warehousing.
In the brick-and-mortar days, organizations wanted to slice, dice, and mine transaction data and
interpret it against demographic information. Advanced organizations sought to mine the data to
uncover buying patterns and product affinities. As e-commerce and call centers proliferated,
organizations needed to expand customer analysis to include interaction information recorded in all
channels, bringing more terabytes into their data warehouses.

Now, with Twitter, Facebook, and other sites, we have hit the social media age: customers are using
social networks to influence others and express their shopping interests and experiences.
Organizations are hungry to capture and analyze activity by current and potential customers in social
networks and comment fields across the Internet marketplace.

4. Text Analytics Enables Organizations to Interpret Social Media Sentiment Trends and
Commentary

Rising interest in social media analysis is putting the spotlight on text analytics, which is the critical
technology for understanding “sentiment” in social media, as well as customer reviews and other
content sources. Like data mining, the text mining and analytics category stretches to include a range
of techniques and software, such as natural language processing, relationship extraction,
visualization, and predictive analysis.

Text analytics falls within the realm of interpretation rather than exact science, which makes it a nice
complement to BI and structured data analytics. Sentiment analysis, for example, employs statistical
and linguistic text analysis methods to understand positive and negative comments. While this
analysis can provide an early sense of the reception of a new product or service, the interpretation
cannot replace the more exacting analysis of the numbers done with BI or structured analytics tools.
Sentiment analysis, however, can help organizations become more proactive in taking steps to
address negative reactions to products and services before they lead to the poor sales that BI and data
warehouse users detect later in the reporting and analysis of sales transaction figures.
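
A deliberately simple lexicon-based sketch of sentiment scoring in Python; the word lists and comments are invented, and real sentiment analysis uses much richer statistical and linguistic methods.

# Tiny invented sentiment lexicon.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"poor", "slow", "broken", "disappointed"}

def sentiment(text):
    # Score a comment: +1 per positive word, -1 per negative word.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

comments = [
    "Love the new phone, excellent battery and fast screen",
    "Delivery was slow and the case arrived broken",
    "It is a phone",
]
for c in comments:
    print(sentiment(c), "-", c)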

5. Decision Management Enables Organizations to Be Predictive and Proactive in Real Time

Trailblazing organizations in many industries are applying automated information technology to dramatically reduce, if not eliminate, delays in how they respond to customer interactions, adjust to
changes in supply chains, prevent fraudulent activity, and more. The goal is to operate in as close to
real time as possible. Along with automation, organizations are striving to use information analysis
to become predictive and proactive. The objective is to develop predictive models and forecast
behavior patterns so that organizations can anticipate certain events; then, they can orchestrate
processes so that they can be proactive and fully prepared when predicted events or patterns occur.

When limited to a reactive posture, organizations face delays and confusion in how to respond to
events, which can lead to increased costs and missed opportunities. Reactive organizations lack a
well-orchestrated plan and can only respond to events on a case-by-case basis. With speed and
complexity rising in many industries, a reactive posture isn’t good enough. Organizations need
business intelligence and analytics applications and services that will help them shift from a reactive
to a proactive and predictive posture. Traditional BI systems are not enough for organizations to
make this shift.

Decision management is the term industry experts and vendors use to describe the integration of
analytics with business rules and process management systems to achieve a predictive and proactive
posture in a real-time world. Decision management requires several technologies. Business rules, or
conditional statements for guiding decision processes, are common in application code and logic; the
challenge is to implement business rules systems that can guide decisions across applications and
processes, not just within one system. Business process management systems help organizations
optimize processes that cross applications and use analytics as part of the continuous improvement
of those processes.
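
A minimal sketch of the business-rules idea in Python: each rule is a named condition plus an action, and a small engine evaluates every rule against an incoming event. The rules, thresholds and event fields are invented for illustration only.

# Hypothetical business rules: (name, condition, action).
RULES = [
    ("large-order discount",
     lambda e: e["type"] == "order" and e["amount"] > 10_000,
     lambda e: f"route order {e['id']} to account manager"),
    ("possible fraud",
     lambda e: e["type"] == "payment" and e["country"] != e["card_country"],
     lambda e: f"hold payment {e['id']} for review"),
]

def decide(event):
    # Evaluate every rule against the event and collect the triggered actions.
    return [action(event) for name, condition, action in RULES if condition(event)]

print(decide({"type": "order", "id": 17, "amount": 25_000}))
print(decide({"type": "payment", "id": 99, "country": "FR", "card_country": "US", "amount": 50}))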

Along with business rules and business process management, a third technology important to
decision management is complex (or business) event processing. Events are happening everywhere;
they are recorded or “sensed” from online behavior, RFID tags, manufacturing systems, surveillance,
financial services trading, and so on. Integrated with analytics and data visualization, event
processing systems can enable organizations to pick out meaningful events from a stream or “cloud”
of noise that is not important.

Organizations can use decision management technologies to automate decisions where speed and
complexity overwhelm human-centered decision processes, and where there are competitive
advantages to having decisions executed in real time and driven by predictive models. Decision
management is an emerging technology area currently focused on specialized systems, but as
demand for greater execution speed and efficiency grows, more organizations will evaluate its
potential for mainstream requirements.

The Shape of Things to Come


Picking just five trends is not easy, given that we are in an exciting phase of innovation—particularly
regarding the access and analysis of big data, including social media content and data for new forms
of investigation, such as geospatial analysis. In addition, the trends are unfolding as the
infrastructure of computing is changing dramatically to include cloud platforms and the vast,
worldwide adoption of mobile devices. So, while I did not identify either cloud computing or mobile
adoption among the trends in this article, these platform shifts should be kept in mind as context for
how the trends are likely to play out.
