

 Anil Tatti
 aniltatti [at] simca [dot] ac [dot] in
 Send mail – will reply within 24 hours
 Tuesday – 8.15 a.m. to 9.30 a.m.
 Thursday / Friday – 3.15 p.m. to 4.30 p.m.
 Walk in any time – you are welcome
 If a session falls on a holiday, it will be made up the next week or in an available time slot
 Exam – 70 %
 Assignment / Homework every week – 25 %
 Attendance – 5 %
 Copying – Fail

SIMCA 2009 Lecture 2 1

Information Systems
Why Do People Need Information?

 Individuals - Entertainment and enlightenment

 Businesses - Decision making, problem solving and control


Data, Information, and Systems
 Data vs. Information

 Data
 A “given,” or fact; a number, a statement, or a picture
 Represents something in the real world
 The raw materials in the production of information

 Information
 Data that have meaning within a context
 Data in relationships
 Data after manipulation


Data, Information, and Systems

 Data Manipulation

 Example: customer survey

 Reading through the raw data collected from a customer survey with questions in various categories would be time-consuming and not very helpful.
 When manipulated, the surveys may provide useful information.
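The survey example can be illustrated with a small Python sketch. It assumes a hypothetical survey in which each answer is recorded as a (category, rating) pair on a 1 to 5 scale; the categories and ratings are invented for illustration. Grouping and averaging the raw answers turns them into information a manager can act on, such as which area scores lowest.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw survey data: (question category, rating on a 1-5 scale).
RESPONSES = [
    ("delivery", 4), ("delivery", 2), ("price", 5),
    ("price", 3), ("service", 1), ("service", 2), ("delivery", 3),
]


def average_by_category(responses):
    """Manipulate raw answers into information: the mean rating per category."""
    grouped = defaultdict(list)
    for category, rating in responses:
        grouped[category].append(rating)
    return {category: round(mean(ratings), 2) for category, ratings in grouped.items()}


summary = average_by_category(RESPONSES)
weakest = min(summary, key=summary.get)  # category with the lowest average
print(summary)
print("Lowest-rated area:", weakest)
```

The raw list says little on its own; the per-category averages immediately point to "service" as the area needing attention.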


Data, Information, and Systems
 Generating Information
 Computer-based ISs take data as raw material, process it, and
produce information as output.

Figure 1.1 Input-process-output

Data, Information, and Systems
 Information in Context

Figure 1.2 Characteristics of useful information

Data, Information, and Systems
 What Is a System?
 System: A set of components that work together to achieve a
common goal

 Subsystem: One part of a system where the products of more than one system are combined to reach an ultimate goal

 Closed system: Stand-alone system that has no contact with other systems

 Open system: System that interfaces with other systems


Data, Information, and Systems

Figure 1.3 Several subsystems make up this corporate accounting system

Data, Information, and Systems
 Information and Managers

 Systems thinking
 Creates a framework for problem solving and decision making
 Keeps managers focused on overall goals and operations of the business


Data, Information, and Systems

Figure 1.5 Qualities of humans and computers that contribute to synergy


Data, Information, and Systems

 The Benefits of Human-Computer Synergy

 Synergy
 When combined resources produce output that exceeds the
sum of the outputs of the same resources employed separately

 Allows human thought to be translated into efficient processing of large amounts of data


Data, Information, and Systems

Figure 1.6 Components of an information system


Data, Information, and Systems
 The Four Stages of Data Processing

 Input: Data is collected and entered into computer.

 Data processing: Data is manipulated into information using mathematical, statistical, and other tools.

 Output: Information is displayed or presented.

 Storage: Data and information are maintained for later use.
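The four stages can be traced in a toy Python sketch. The sales lines, item names and the in-memory "archive" are hypothetical stand-ins (a real system would read from data-entry screens and store to a database); each function corresponds to one stage above.

```python
import json


def input_stage(raw_lines):
    """Input: collect raw data, here text lines of the form 'item,quantity,unit_price'."""
    records = []
    for line in raw_lines:
        item, qty, price = line.split(",")
        records.append({"item": item, "qty": int(qty), "price": float(price)})
    return records


def processing_stage(records):
    """Processing: manipulate data into information (revenue per item and in total)."""
    revenue = {}
    for rec in records:
        revenue[rec["item"]] = revenue.get(rec["item"], 0.0) + rec["qty"] * rec["price"]
    return {"revenue_by_item": revenue, "total": sum(revenue.values())}


def output_stage(info):
    """Output: present the information as a small report."""
    for item, amount in sorted(info["revenue_by_item"].items()):
        print(f"{item:<8}{amount:>10.2f}")
    print(f"{'TOTAL':<8}{info['total']:>10.2f}")


def storage_stage(info, archive):
    """Storage: keep the information for later use (a list stands in for a database)."""
    archive.append(json.dumps(info))


archive = []
info = processing_stage(input_stage(["pens,10,1.50", "paper,2,4.25", "pens,4,1.50"]))
output_stage(info)
storage_stage(info, archive)
```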


Why Study IS?
 Information Systems Careers
 Systems analyst, specialist in enterprise resource planning (ERP),
database administrator, telecommunications specialist, consulting, etc.
 Knowledge Workers
 Managers and non-managers
 Employers seek computer-literate professionals who know how to use
information technology.
 Computer Literacy Replacing Traditional Literacy
 Key to full participation in western society


Ethical and Societal Issues
The Not-So-Bright Side
 Consumer Privacy
 Organizations collect (and sometimes sell) huge
amounts of data on individuals.

 Employee Privacy
 IT supports remote monitoring of employees, violating
privacy and creating stress.


Ethical and Societal Issues
The Not-So-Bright Side
 Freedom of Speech
 IT increases opportunities for pornography, hate speech, intellectual property crime, and other intrusions; prevention may abridge free speech.

 IT Professionalism
 No mandatory or enforced code of ethics for IT professionals, unlike other professions.

 Social Inequality
 Less than 20% of the world’s population have ever used a PC; less than 3% have Internet access.


MIS Components

[Figure: MIS components. Recoverable labels: People, Data; example procedures: Backup data, Restart job, Virus scan]

Management Information – Related Subsystems
 Information Technology (IT)
 is any computer-based tool that people use to work with information and support the information-processing needs of an organization
 Includes hardware, software, communications, networks, production automation, etc.
 Any ‘kit’ concerned with the capture, storage, transmission, and presentation of information


Decision Support Systems (DSS)

 Computer system designed to provide assistance in determining and evaluating alternative courses of action. A DSS:
 (1) acquires data from the mass of routine transactions of a firm,
 (2) analyzes it with advanced statistical techniques to extract meaningful information, and
 (3) narrows down the range of choices by applying rules based on decision theory. Its objective is to facilitate 'what if' analysis, not to replace a manager's judgment.
 Example: Decision Explorer from Banxia
 Example: Analytica from Lumina
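Products such as those named above do far more, but the core idea of 'what if' analysis can be sketched in a few lines of Python. The profit model, its linear demand curve and every number in it are hypothetical assumptions. The sketch evaluates alternative prices under a base case and under a changed assumption (a higher unit cost), then narrows the choice to the most profitable alternative; the manager still decides.

```python
def profit(price, unit_cost=6.0, fixed_cost=1000.0, base_demand=900, slope=40):
    """Hypothetical model: units sold fall by `slope` for each 1-unit price rise."""
    units = max(base_demand - slope * price, 0)
    return (price - unit_cost) * units - fixed_cost


def what_if(prices, **assumptions):
    """Evaluate each alternative price under the given (overridden) assumptions."""
    return [(price, profit(price, **assumptions)) for price in prices]


ALTERNATIVES = [8, 10, 12, 14]

for label, assumptions in [("base case", {}), ("unit cost rises to 9", {"unit_cost": 9.0})]:
    print(label)
    for price, result in what_if(ALTERNATIVES, **assumptions):
        print(f"  price {price:>3}: profit {result:>8.1f}")

# The DSS narrows the range of choices; the final decision stays with the manager.
best_price, best_profit = max(what_if(ALTERNATIVES), key=lambda pair: pair[1])
print("Most profitable alternative in the base case:", best_price)
```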


Strategic Management Information Systems

 Systems considered critical to the current or future business competitiveness of an organisation
 SMIS is a relative rather than an absolute term, as one must first assess the situation of a given organisation before attaching the term SMIS to a technology
 Example: A web service offering a product online could be considered strategic, e.g. Dell Computer or airline online booking
 Example: Business process re-engineering modelling software


Geographic Information Systems (GIS)

 Business information overlaid on geographical maps

 Example: Google Earth shows business locations, visitor attractions, etc. in particular areas


Management Information – Related Subsystems
 Expert System (ES)
 Also called a knowledge-based system, it is an artificial intelligence system that applies reasoning capabilities to reach a conclusion.
 Expert systems are software systems that capture the knowledge and experience of “experts” in particular fields: accounting, medicine, production control, etc.
 Through a series of carefully contrived questions to the user, an expert system can determine “what's wrong” and “what to do”.
 Example: Forensic accounting
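The question-driven reasoning described above can be sketched as a tiny rule-based system in Python. The two rules, the questions and the advice are hypothetical illustrations, not real forensic-accounting knowledge; `ask` is any function that poses a question and returns True or False. The system asks only the questions it needs, then reports "what's wrong" (a diagnosis) and "what to do" (advice).

```python
# Hypothetical rule base: (facts that must all be "yes", diagnosis, advice).
# More specific rules come first so they are tried before general ones.
RULES = [
    ({"duplicate_payments", "same_vendor_bank_account"},
     "possible fictitious vendor", "verify vendor records and payment approvals"),
    ({"duplicate_payments"},
     "possible payment-processing error", "reconcile accounts payable"),
]

QUESTIONS = {
    "duplicate_payments": "Were any invoices paid more than once?",
    "same_vendor_bank_account": "Do several vendors share one bank account?",
}


def consult(ask):
    """Ask only the questions needed and fire the first rule whose facts all hold."""
    answers = {}
    for facts, diagnosis, advice in RULES:
        satisfied = True
        for fact in sorted(facts):
            if fact not in answers:  # never ask the same question twice
                answers[fact] = bool(ask(QUESTIONS[fact]))
            if not answers[fact]:
                satisfied = False
                break
        if satisfied:
            return diagnosis, advice
    return "no conclusion", "refer the case to a human expert"


# Scripted answers stand in for a user at the keyboard.
scripted = {
    "Were any invoices paid more than once?": True,
    "Do several vendors share one bank account?": False,
}
print(consult(scripted.get))
```

For an interactive session, `ask` could be something like `lambda q: input(q + " (y/n) ").strip().lower() == "y"`.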


Dashboard System (DS)(EIS)

 A dashboard is an Executive Information System (EIS) user interface that, like an automobile’s dashboard, is designed to be easy to read. For example, a product might obtain information from the local operating system in a computer, from one or more applications that may be running, and from one or more remote sites on the Web, and present it as though it all came from the same source.
 Digital dashboards may be laid out to track the flows inherent in the business processes that they monitor. Graphically, users may see the high-level processes and then drill down into low-level data.


Airline Dashboard System


Traditional / Classical Organisation

[Figure: traditional hierarchy. Labels: CEO (receives condensed reports); layers of middle managers (analyze data); Finance, Accounting, HRM, MIS]


Pioneers of Traditional / Scientific Management

 5 Key Functions of Management
- To Plan
- To Organise
- To Command
- To Co-ordinate
- To Control
• Principles for Organisational Structure
- Unity of Command
- Small Spans of Control
- Line or Chain of Command
- Division of Work – specialism
- Delegate Authority & Retain Responsibility


Modern Criticisms of Classical Management

 Inhuman working conditions and poor industrial relations

 Over-specialisation and restrictive work practices

 Bureaucratic organisational structures – long chains of command

 Inward-looking organisational structures

 Closed systems – run out of steam when not conscious of environmental influences


The Matrix Management

• Project Focussed
• Multi-disciplinary teams
• Team members have more than one manager
• Project team disbanded when project completes
• New project team for new project
• Gives team members an insight into the workings of other departments
• Leadership training ground
• Allows people with ideas to carry them forward
• May cause blurring of communication lines


Modern Organisation structure

[Figure: modern organisation structure. Labels: Customer Partner, Contractor Partner; Fin, Prod, HR, Mkt and IT teams]


New structure - Decentralised

[Figure: decentralised structure. Labels: Management Team; a director (Dir) each for Fin, Mrkt, Acct, HRM and MIS; Finance, Marketing, Accounting and HRM teams; Database; Sales Methodology/Rules]


Business Trends

 Changing business environment

 Specialization
 Management by Methodology and Franchises
 Mergers
 Decentralization and Small Business
 Temporary Workers
 Internationalization
 Service-Oriented Business
 Re-engineering
 Recession

 Need for faster responses and flexibility

 MIS reflecting these requirements


Business Trends & Implications
 Specialisation
 Increased demand for technical skills
 Specialized MIS tools
 Increased communication

 Methodology & Franchises

 Reduction of middle management
 Increased data sharing
 Increased analysis by top management
 Computer support for rules
 Re-engineering

 Mergers
 Larger companies
 Need for control and information
 Economies of scale

 Decentralization & Small Business

 Communication needs
 Lower cost of management tasks
 Low maintenance technology


Business Trends & Implications

 Temporary Workers
 Managing through rules
 Finding and evaluating workers
 Coordination and control
 Personal advancement through technology
 Security

 Internationalization
 Communication
 Product design
 System development and programming
 Sales and marketing

 Service Orientation
 Management jobs are information jobs
 Customer service requires better information
 Speed


Business Trend Implications for Technology

Business Trend | Implications for Technology
Specialization | Increased demand for technical skills; Specialized MIS tools; Increased communication
Methodology & Franchises | Reduction of middle management; Increased data sharing; Increased analysis by top management; Computer support for rules
Mergers | Four or five big firms dominate most industries; Need for communication; Strategic ties to customers and suppliers
Decentralization & Small Business | Communication needs; Lower cost of management tasks; Low maintenance technology
Temporary Workers | Managing through rules; Finding and evaluating workers; Coordination and control; Personal advancement through technology
Internationalization | Communication; Product design; System development and programming; Sales and marketing
Service Orientation | Management jobs are information jobs; Customer service requires better information

Management Information Systems (MIS)

 Management information system (MIS)

 An MIS provides managers with information and support for
effective decision making, and provides feedback on daily operations
 Outputs, or reports, are usually generated through accumulation of transaction processing data
 Each MIS is an integrated collection of subsystems, which are typically organized along functional lines within an organization


Sources of Management Information

[Figure: sources of management information. Labels: Business transactions; Transaction processing systems; Databases of valid transactions; Databases of external data; Corporate intranet; Management information systems; Decision support systems; Executive support systems; Expert systems; Application databases; Operational databases; Input and error list; Drill-down, Exception, Demand, Key-indicator and Scheduled reports]


Outputs of a
Management Information System
 Scheduled reports
 Produced periodically, or on a schedule (daily, weekly, monthly)
 Key-indicator report
 Summarizes the previous day’s critical activities
 Typically available at the beginning of each day
 Demand report
 Gives certain information at a manager’s request
 Exception report
 Automatically produced when a situation is unusual or requires management action


Scheduled Report Example

Daily Sales Detail Report
Prepared: 08/10/xx

Order # | Customer ID | Sales Rep ID | Ship Date | Quantity | Item # | Amount
P12453 | C89321 | CAR | 08/12/96 | 144 | P1234 | $3,214
P12453 | C89321 | CAR | 08/12/96 | 288 | P3214 | $5,660
P12453 | C03214 | GWA | 08/13/96 | 12 | P4902 | $1,224
P12455 | C52313 | SAK | 08/12/96 | 24 | P4012 | $2,448
P12456 | C34123 | JMW | 08/13/96 | 144 | P3214 | $720


Key Indicator Report Example

Daily Sales Key Indicator Report

Indicator | This Month | Last Month | Last Year
Total Orders Month to Date | $1,808 | $1,694 | $1,014
Forecasted Sales for the Month | $2,406 | $2,224 | $2,608


Demand Report Example

Daily Sales by Sales Rep Summary Report
Prepared: 08/10/xx

Sales Rep ID | Amount
CAR | $42,345
GWA | $38,950
SAK | $22,100
JWN | $12,350
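A demand report like this is essentially a group-and-total over the detail data, produced when a manager asks for it. The Python sketch below reuses the five rows of the Daily Sales Detail Report as its input; because it sees only those rows, its totals are much smaller than the amounts in the example report, which presumably summarise a full day of orders.

```python
from collections import defaultdict

# (order_no, customer_id, sales_rep, ship_date, quantity, item_no, amount)
# Rows copied from the Daily Sales Detail Report.
DETAIL = [
    ("P12453", "C89321", "CAR", "08/12/96", 144, "P1234", 3214),
    ("P12453", "C89321", "CAR", "08/12/96", 288, "P3214", 5660),
    ("P12453", "C03214", "GWA", "08/13/96", 12, "P4902", 1224),
    ("P12455", "C52313", "SAK", "08/12/96", 24, "P4012", 2448),
    ("P12456", "C34123", "JMW", "08/13/96", 144, "P3214", 720),
]


def sales_by_rep(rows):
    """Demand report: total sales amount per sales rep, largest first."""
    totals = defaultdict(int)
    for _order, _customer, rep, _ship, _qty, _item, amount in rows:
        totals[rep] += amount
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


for rep, amount in sales_by_rep(DETAIL):
    print(f"{rep:<6} ${amount:,}")
```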


Exception Report Example

Daily Sales Exception Report – ORDERS OVER $10,000
Prepared: 08/10/xx

Order # | Customer ID | Sales Rep ID | Ship Date | Quantity | Item # | Amount
P12453 | C89321 | CAR | 08/12/96 | 144 | P1234 | $13,214
P12453 | C89321 | CAR | 08/12/96 | 288 | P3214 | $15,660
P12453 | C03214 | GWA | 08/13/96 | 12 | P4902 | $11,224
… | … | … | … | … | … | …
… | … | … | … | … | … | …
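An exception report is produced only when something crosses a threshold. The Python sketch below applies a threshold to the five rows of the Daily Sales Detail Report; none of them exceeds $10,000, so no report is produced at that limit, while a lower limit of $5,000 (chosen here only for illustration) flags one order. Returning `None` for "nothing unusual" is a design assumption of this sketch.

```python
from typing import NamedTuple


class Order(NamedTuple):
    order_no: str
    customer_id: str
    sales_rep: str
    ship_date: str
    quantity: int
    item_no: str
    amount: int  # whole dollars


# Rows copied from the Daily Sales Detail Report.
ORDERS = [
    Order("P12453", "C89321", "CAR", "08/12/96", 144, "P1234", 3214),
    Order("P12453", "C89321", "CAR", "08/12/96", 288, "P3214", 5660),
    Order("P12453", "C03214", "GWA", "08/13/96", 12, "P4902", 1224),
    Order("P12455", "C52313", "SAK", "08/12/96", 24, "P4012", 2448),
    Order("P12456", "C34123", "JMW", "08/13/96", 144, "P3214", 720),
]


def exception_report(orders, threshold):
    """Return the orders above `threshold`, or None when nothing is unusual."""
    flagged = [order for order in orders if order.amount > threshold]
    return flagged or None


for limit in (10_000, 5_000):
    report = exception_report(ORDERS, limit)
    if report is None:
        print(f"No orders over ${limit:,}: no report produced")
    else:
        print(f"ORDERS OVER ${limit:,}")
        for order in report:
            print(order.order_no, order.customer_id, order.sales_rep, f"${order.amount:,}")
```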


Outputs of a Management
Information System

 Drill-down reports: provide detailed data about a situation.

Earnings by Quarter (Millions)

Quarter | Actual | Forecast | Variance
2nd Qtr 1999 | $12.6 | $11.8 | 6.8%
1st Qtr 1999 | $10.8 | $10.7 | 0.9%
4th Qtr 1998 | $14.3 | $14.5 | -1.4%
3rd Qtr 1998 | $12.8 | $13.3 | -3.0%

Etc. See Figure 9.2
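Drilling down means moving from a summary to the detail behind it. The Python sketch below uses the quarterly figures from the table: the top level totals actual and forecast earnings per year, and drilling into a year lists its quarters with a variance. It assumes variance is defined as (actual − forecast) ÷ forecast × 100, which the table does not state.

```python
# Earnings by quarter in $ millions, from the table: (quarter, year) -> (actual, forecast)
QUARTERS = {
    ("2nd Qtr", 1999): (12.6, 11.8),
    ("1st Qtr", 1999): (10.8, 10.7),
    ("4th Qtr", 1998): (14.3, 14.5),
    ("3rd Qtr", 1998): (12.8, 13.3),
}


def variance_pct(actual, forecast):
    """Assumed definition: (actual - forecast) as a percentage of the forecast."""
    return (actual - forecast) / forecast * 100


def summary_by_year():
    """Top level of the drill-down: total actual and forecast earnings per year."""
    totals = {}
    for (_quarter, year), (actual, forecast) in QUARTERS.items():
        prev_actual, prev_forecast = totals.get(year, (0.0, 0.0))
        totals[year] = (prev_actual + actual, prev_forecast + forecast)
    return totals


def drill_down(year):
    """Next level down: the quarters behind one year's figures, with variances."""
    return [
        (quarter, actual, forecast, round(variance_pct(actual, forecast), 1))
        for (quarter, y), (actual, forecast) in QUARTERS.items()
        if y == year
    ]


for year, (actual, forecast) in summary_by_year().items():
    print(year, round(actual, 1), round(forecast, 1))
for row in drill_down(1998):
    print(row)
```

Recomputing the variances this way reproduces 6.8%, 0.9% and -1.4%; for the 3rd quarter of 1998 it gives about -3.8% rather than the -3.0% printed in the table, so one of that row's figures may have been mis-transcribed.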


Characteristics of a Management
Information System

 Provides reports with fixed and standard formats

 Hard-copy and soft-copy reports
 Uses internal data stored in the computer system
 End users can develop custom reports
 Requires formal requests from users


Management Information Systems for
Competitive Advantage
 Provides support to managers as they work to achieve
corporate goals
 Enables managers to compare results to established
company goals and identify problem areas and
opportunities for improvement


MIS and Web Technology
 Data may be made available from management
information systems on a company’s intranet
 Employees can use browsers and their PC to gain access
to the data


Functional Aspects
 MIS is an integrated collection of functional information
systems, each supporting particular functional areas.



[Figure 9.3: An organization’s MIS. Labels: Business transactions; Transaction processing systems; Databases of valid transactions; Internet; functional MIS subsystems such as Human Resources MIS, etc.; outputs: Drill-down reports, Exception reports, Demand reports, Key-indicator reports, Scheduled reports]
Financial MIS
 Provides financial information to all financial managers
within an organization.


[Figure 9.3: Overview of a financial MIS. Labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Internet or extranet; Customers, Suppliers; Financial MIS; Financial applications databases; Operational databases; Financial DSS; Financial ES; outputs: Financial statements, Uses and management of funds, Financial statistics for control]
Inputs to the Financial Information System
 Strategic plan or corporate policies
 Contains major financial objectives and often projects financial needs.
 Transaction processing system (TPS)
 Important financial information collected from almost every TPS -
payroll, inventory control, order processing, accounts payable, accounts
receivable, general ledger.
 External sources
 Annual reports and financial statements of competitors and general news


Financial MIS Subsystems and Outputs
 Financial subsystems
 Profit/loss and cost systems
 Auditing
 Internal auditing
 External auditing
 Uses and management of funds


Manufacturing MIS


[Figure 9.6: Overview of a manufacturing MIS. Labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Internet or extranet; Customers, Suppliers; Manufacturing MIS; Manufacturing applications databases; Operational databases; Manufacturing DSS; Manufacturing ES; outputs: Quality control reports, Process control reports, JIT reports, MRP reports, Production schedule, CAD output]
Inputs to the Manufacturing MIS
 Strategic plan or corporate policies.
 The TPS:
 Order processing
 Inventory data
 Receiving and inspecting data
 Personnel data
 Production process
 External sources


Manufacturing MIS Subsystems
and Outputs
 Design and engineering
 Master production scheduling
 Inventory control
 Manufacturing resource planning
 Just-in-time inventory and manufacturing
 Process control
 Computer-integrated manufacturing (CIM)
 Quality control and testing


Marketing MIS
 Supports managerial activities in product development,
distribution, pricing decisions, and promotional


[Figure 9.9: Overview of a marketing MIS. Labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Marketing MIS; Marketing applications databases; Operational databases; DSS; ES; outputs: Sales by customer, Sales by salesperson, Sales by product, Pricing report, Total service calls, Customer satisfaction]
Inputs to Marketing MIS
 Strategic plan and corporate policies
 The TPS
 External sources:
 The competition
 The market


Marketing MIS Subsystems and Outputs
 Marketing research
 Product development
 Promotion and advertising
 Product pricing


Human Resource MIS
 Concerned with all of the activities related to employees
and potential employees of the organization

[Figure 9.12: Overview of a human resource MIS. Labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Human Resource MIS; Human resource applications databases; Operational databases; DSS; ES; outputs: Benefit reports, Salary surveys, Scheduling reports, Training test scores, Job applicant profiles, Needs and planning]
Inputs to the Human Resource MIS

 Strategic plan or corporate policies

 The TPS:
 Payroll data
 Order processing data
 Personnel data
 External sources


Human Resource MIS Subsystems
and Outputs
 Human resource planning
 Personnel selection and recruiting
 Training and skills inventory
 Scheduling and job placement
 Wage and salary administration


Other MIS
 Accounting MISs
 Provides aggregated information on accounts payable,
accounts receivable, payroll, and other applications.
 Geographic information systems (GIS)
 Enables managers to pair pre-drawn maps or map outlines
with tabular data to describe aspects of a particular
geographic region.
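Pairing a map with tabular data can be sketched in Python by tagging each business record with coordinates and selecting the records that fall inside a map region. Everything here is a hypothetical simplification: the store names, coordinates and sales figures are invented, and the region is a plain latitude/longitude rectangle rather than a real map outline. A GIS would then plot the selected records on the map.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular map region given by latitude/longitude bounds (a simplification)."""
    name: str
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Hypothetical tabular data: (store, latitude, longitude, monthly_sales).
STORES = [
    ("Store 1", 18.50, 73.80, 120_000),
    ("Store 2", 18.60, 73.90, 95_000),
    ("Store 3", 19.10, 72.90, 150_000),
]


def sales_in_region(region, stores):
    """Describe one aspect of a region: which stores lie in it and their total sales."""
    inside = [s for s in stores if region.contains(s[1], s[2])]
    return {
        "region": region.name,
        "stores": [s[0] for s in inside],
        "total_sales": sum(s[3] for s in inside),
    }


region = Region("Region A", south=18.4, north=18.7, west=73.7, east=74.0)
result = sales_in_region(region, STORES)
print(result)
```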


MIS & Related Organisational Functions

Strategic Management:
Provides an organisation with overall direction and guidance – mission and …

Tactical Management:
Develops the goals and strategies outlined by Strategic Management

Operational Management:
Manages and directs the day-to-day operations and implementation of the goals and strategies

Non-Management:
Producing goods and services – serving customers, order processing

[Figure: pyramid of Strategic, Tactical and Operational management above non-management, with ES and DSS labels]

What is MIS?
 Is a system which gives us the
 Right information
 To the right person
 At the right place
 At the right time
 In the right form
 At the right cost


 Why is it necessary?
 Increased business and management complexities

 Who is a good manager?

 One who minimizes / eliminates the elements of risk & uncertainty.

 Response simulator
 Enables a decision maker to give either a reactive or a proactive response
 May be futuristic


Characteristics – Subsystems
 Marketing
 Sales forecasting, sales planning, customer & sales analysis
 Manufacturing
 Production planning, scheduling, cost control analysis
 Logistics
 Planning & control of purchasing, inventories, distribution
 Personnel
 Planning personnel requirements, analyzing performance, salary administration


 Finance & Accounting
 Financial analysis, cost analysis, capital requirements, planning, income measurement
 Information Processing
 Information system planning, cost–benefit analysis
 Top Management
 Strategic planning, resource allocation


Activity Sub-Systems
 Transaction Processing
 Processing of Orders, shipments & receipts.
 Operational Control
 Scheduling of activities & performance reports.
 Management Control
 Formulation of budgets & resource allocation.
 Strategic Planning
 Formulation of objectives & strategic plans.


Users & Characteristics
Type of System | Information Inputs | Processing | Information Outputs | Users
ESS/EIS | Aggregate data; external, internal | Graphics; simulations; interactive | Projections; responses to queries | Senior managers
DSS | Low-volume data; analytic models | Interactive; simulations; analysis | Special reports; decision analysis; responses to queries | Professionals; staff managers
MIS | Summary transaction data; high-volume data; simple models | Routine reports; simple models; low-level analysis | Summary & exception reports | Middle managers
KWS | Design specifications; knowledge base | Modeling; simulations | Models; graphics | Professionals; technical staff
OAS | Documents; schedules | Document management | Documents; schedules; mail | Clerical workers
TPS | Transactions; events | Sorting; listing; merging; updating | Detailed reports; lists; summaries | Operations personnel; supervisors


MIS Requirements
 Should be a unified system
 Should support / facilitate decisions
 Should be compatible with the organisation’s structure & culture
 Should be cost effective / beneficial
 Should be responsive to changes around & within the organisation
 Should be speedy & accurate
 Should provide validated & valid information
 Should be a management information system, not a manipulated information system


 Technical Approach
 Based on mathematical & normative models
 Relies heavily on physical technology – computer science (CS), management science (MS), operations research (OR)
 Behavioral Approach
 Behavioral impact / response of people – political science, psychology, sociology & organisational behavior
 Socio-Technical Approach
 Borrows from both the above approaches


Why is it Important for Managers
Today to Consider the Strategic Role
of Information Systems?
Strategic Advantage and IT
 Important Managerial Questions
 What is strategy?
 What is strategic advantage?
 Information Systems as a strategic resource
 How do we use Information Systems to achieve some form of
strategic advantage over competitors?


What is Strategy?
Strategy Definitions

 Strategy
 A plan
 Early 1990s definition:
 “A well coordinated set of objectives, policies, and plans aimed at securing a long-term competitive advantage. A vision for the organization that is implemented.”
 Webster’s Dictionary
 “a careful plan or method”
 “the art of devising or employing plans toward a goal”
 “the art and science of military command exercised to meet the enemy in combat under advantageous circumstances”


What is Strategy?
Strategy Definitions
 Strategy
 Henry Mintzberg:
 Explicitly planned: “Intended Strategy”

 Realized: planned and succeeded

 Unrealized: planned but failed

 Implicit, not explicitly planned yet executed: “Emergent Strategy”

[Figure: Mintzberg’s strategy types. Labels: Planned Strategy, Executed Strategy, Failed Strategy, Emergent Strategy]

Strategic Advantage and IT
Strategic Advantage and IT
Evolution of Strategy Concepts

 Competitive Strategy
 Competitive Advantage
 Sustainable Competitive Advantage
 defensible market position (CQFDS), unique core competence
 long-term barriers to competition, non-competitive profits (>0)
 Temporary (Non-Sustainable) Competitive Advantage
 Strategic Advantage
 Sustainable Strategic Advantage
 long-term, dominant strategy, strategic systems, strategic structural
 Temporary Strategic Advantage
 Leverageable Strategic Advantage (Carr)
» dominant strategy is only a stepping-stone to future dominant strategies

[Figure label: “Strategy Speeding Up”]


Strategic Advantage and IT
Evolution of Strategy Concepts
 Venkatraman (BU) and Subramaniam (BC Prof.)
 Three eras of approaches for achieving strategic advantage
 Portfolio of Business (1970s)
 performance a result of businesses you pick to be in
 motivated by economies of scale
 Portfolio of Capabilities (mid 1980s)
 performance a result of internal processes and routines, which
provide distinctive capabilities
 motivated by economies of scale and scope
 Portfolio of Relationships (mid 1990s)
 performance a result of building a wide array of relationships with
external companies that possess hard-to-imitate capabilities
 motivated by economies of scale, scope, and expertise


Information Systems as a Strategic Resource
 Inwardly Strategic
 focused on internal processes
 lower costs
 increase employee productivity
 improve teamwork
 enhance communication
 Outwardly Strategic
 aimed at direct competition
 beat competitors
 new services
 new “knowledge” that leads to new services


Information Systems as a Strategic Resource
 Hayes and Wheelwright (1985) - operations effectiveness, applies equally
well to ISD effectiveness
 Stage 1: Internally Neutral
 not seen as a source of process improvement technology
 Minimize negative impact of functional area on organization
 Top management “in control”; tells dept. what to do
 Stage 2: Externally Neutral
 not seen as a source of external competitive advantage
 Stage 3: Internally Supportive
 source of internally focused competitive advantages
 Stage 4: Externally Supportive
 viewed as competitive force in the business
 function drives issues of top-management strategy making

Information Systems as a Strategic Resource
[Figure: competitive marketplace. Labels: Company A, Company B, Internally Strategic, Inter-Firm]

Elements of Strategic Management
 Innovation
 Response-Management
 Long-Range Planning
 Competitive Intelligence


Model #1:
Porter’s Competitive Forces Model
 Threat of new competitors
 Bargaining power of suppliers
 Bargaining power of customers
 Threat of substitute products or services
 Rivalry among existing firms


Model #1:
Porter’s Competitive Forces Model - “Generic
Response Strategies”
 Cost leadership
 Differentiation
 Focus
 Other dimensions …
 Strategic positioning
 Customer service
 Operational Effectiveness
 Cost, Quality, Flexibility, Delivery

[Figure: generic strategies by market size and strategic advantage]
Strategic Advantage | Niche | Broad
Cost | Cost Focus | Cost
Differentiation | Diff. Focus | Diff.


Model #1:
Use of Porter’s Model
 List players
 Analyze business drivers
 Devise a strategy
 Investigate supportive information technologies


Models for Understanding the Value
Creation Process
Model #2:
Porter’s Value Chain Analysis Model
Porter’s “Value Chain”
[Figure: Porter’s value chain. Support activities: Firm Infrastructure, Human Resources Management, Technology Development, Procurement. Primary activities: Inbound Logistics, Operations, Outbound Logistics, Marketing & Sales, Service. Margin: Profit]


Model #2:
Porter’s Value Chain Analysis Model - Primary Activities
 Inbound logistics
 Operations
 Outbound logistics
 Marketing / sales
 Service


Model #2:
Porter’s Value Chain Analysis Model - Support Activities
 Firm infrastructure
 Human resource management
 Technology development
 Procurement


Model #3:
Porter and Millar Five-Step Process

 Assess information intensity (note: quite subjective)

 High … implies strategic opportunities exist
 customers need a lot of information to understand and/or use a product
 suppliers dependent on information
 Determine the role of IT in the industry structure
 Identify and rank the ways in which IT can create competitive advantage
 Investigate how IT might spawn new businesses
 Develop a plan for taking advantage of IT

Value Web


 Strategic Information System Applications
 Cost leadership
 Differentiation
 Growth
 Alliances
 Innovation
 Improve internal efficiency
 Customer-oriented approaches


Functional Use Of MIS
 To lower costs in all parts of the value chain
 Facilitate product delivery
 Adding value to quality
 Transform physical processing components into information
 Speed / Ability – Competitive Advantage
 Quality Enhancement
 Simplification – Product, Process, Cycle Time
 Organisation – Benchmark, Customer Service, Precision etc.


Strategic Use
 Outperform rivals
 Product differentiation
 Focused differentiation
 Right linkages to customers & suppliers
 Low-cost product
 Precise development of strategies, planning, forecasting & …
 Problem solving / decision making


Strategic Uses Contd….
 Coordinate activities globally
 Think Globally, act Locally
 Competitive Advantage
 More Flexible & Responsive
 Flexibility


MIS - Organisation & Change


Why Firms Seek Competitive Advantage
(Porter’s Five-Force Model):

• Rivalry among existing competitors

• Threat of new entrants
• Threat of substitute product and services
• Bargaining power of buyers
• Bargaining power of suppliers


Competitive Forces Model


Information Systems for Competitive Advantage
 Businesses continually seek to establish competitive advantage in the marketplace.
 There are eight principles:
 The first three principles concern products.
 The next three principles concern the creation of barriers.
 The last two principles concern establishing alliances and reducing costs.


Organizational Change
 Organizational change deals with how organizations plan for, implement and handle change. Overcoming resistance to change can be the hardest part of bringing information systems into a business. Too many computer systems and new technologies have failed because managers and employees were not prepared for change.
 A change model identifies the phases of change and the best way to
implement it:
 Unfreezing is the process of removing old habits and creating a climate
receptive to change
 Moving is the process of learning new work methods, behaviors and systems
 Refreezing involves reinforcing changes to make the new process second
nature, accepted and part of the job


Internet Business Models


 IT
 Networks
 Database Management Systems
 Data Mining
 Mid-term: next Sunday, 10 a.m. to 12 noon, 50 marks,
counted as internal assessment; portion up to Friday 10/04/2009

Information Technology
- the acquisition, processing, storage and dissemination
of vocal, pictorial, textual & numeric information by a
microelectronics-based combination of computing & telecommunications
- used to describe technologies which enable users to
record, store, process, transmit & receive information.

IT Capabilities
 Transactional – transform unstructured processes into routine transactions
 Geographical – overcome distance barriers
 Automation – reduce human labour
 Informational – bring huge amounts of data into a process
 Sequential – change the sequence of tasks, or run multiple sequences in parallel
 Knowledge Management – allows capture / dissemination of
knowledge & expertise to improve a process
 Tracking – of task status, inputs & outputs
 Disintermediation – connect two parties without an intermediary.

 Evolution
 Hardware
 Input
 Output
 Storage – Pri /Sec
 Media / Communication devices
 Software
 System Software – OS, Compiler – different OSs
 Application Software – concerned with accomplishing the tasks of end users.

Data Processing
 Data – What is data?
 Bits, Bytes, Characters, Fields, Records, Blocks, Files, Database
 Activities – Read, Sort, Write, Merge, Delete, Store, Compare, Collate,
Decide, Display, Print, Copy, Compute, Plot, Transfer, Create, Perform
 Operations
 Capturing – recording data from an event or transaction
 Verifying – checking data for correctness
 Classifying – grouping data into specific categories
 Sorting – placing data in a particular sequence
 Summarizing – aggregating data elements
 Calculating – arithmetic / logic operations
 Storing – placing data on a storage medium
 Retrieving – searching for & gaining access to specific data elements
 Reproducing – copying data from one medium to another
 Communicating – transferring data from one place to another.
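The operations above can be chained into a small pipeline. The sketch below is a minimal illustration only, assuming hypothetical sales records of the form (transaction id, region, amount as entered); it shows verifying, sorting, classifying and summarizing, and is not tied to any particular system.

```python
from collections import defaultdict

# Hypothetical captured transactions: (transaction id, region, amount as entered)
RAW = [
    ("T3", "North", "120.50"),
    ("T1", "South", "80.00"),
    ("T2", "North", "abc"),   # bad entry that verification should reject
    ("T4", "South", "45.25"),
]

def verify(record):
    """Verifying: accept the record only if its amount is numeric."""
    try:
        float(record[2])
        return True
    except ValueError:
        return False

def process(records):
    # Verifying, then converting the amount field to a number
    valid = [(tid, region, float(amt)) for (tid, region, amt) in records
             if verify((tid, region, amt))]
    # Sorting: place records in transaction-id sequence
    valid.sort(key=lambda rec: rec[0])
    # Classifying: group records by region
    by_region = defaultdict(list)
    for rec in valid:
        by_region[rec[1]].append(rec)
    # Summarizing / calculating: total amount per region
    totals = {region: round(sum(r[2] for r in recs), 2)
              for region, recs in by_region.items()}
    return valid, totals

valid, totals = process(RAW)
print([r[0] for r in valid])   # ['T1', 'T3', 'T4']
print(totals)                  # {'South': 125.25, 'North': 120.5}
```

Storing, retrieving and communicating are left out here; in practice they would be handled by files, a DBMS or a network rather than in-memory lists.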

Data Processing Hierarchy

 Electronic Data Processing – processing transactions arising from day-to-day operations
 Office Automation Systems – for performing office routines
 TPS – capturing, classifying, storing, maintaining, updating &
retrieving data
 MIS – provide information for decision makers
 DSS – interactive systems to support operations & decision making at
strategic / tactical levels
 EIS / ESS – combine data from both internal & external sources to
be applied to a changing array of problems
 Knowledge-based / Expert Systems – based on rules of thumb or
heuristic knowledge, intuition, judgment & inferences

Transaction Processing
 Relevant for three reasons:
 Information
 Action
 Investigational
 Validation Tests
 Missing data
 Valid Size
 Class / Composition
 Range or Reasonableness
 Invalid Value
 Comparison with Stored data
 Check Digit
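Several of the validation tests listed above can be expressed as simple checks on an incoming transaction. The sketch below is illustrative only: the field names, the 11-digit account length, the permitted payment codes and the stored customer file are all hypothetical, and the check digit uses the Luhn (modulus 10) scheme as one common example; the slide does not say which scheme is meant.

```python
STORED_CUSTOMERS = {"C001", "C002"}   # hypothetical master file (comparison with stored data)
VALID_PAY_CODES = {"CASH", "CARD"}    # hypothetical list of permitted values
ACCOUNT_LENGTH = 11                   # hypothetical valid size for an account number

def luhn_valid(number: str) -> bool:
    """Check digit test using the Luhn (mod 10) scheme."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def validate(txn: dict) -> list:
    """Return a list of validation errors (an empty list means accepted)."""
    missing = [f for f in ("customer", "account", "qty", "pay")
               if txn.get(f) in (None, "")]
    if missing:                                   # missing data test
        return [f"missing {f}" for f in missing]
    errors = []
    account = txn["account"]
    if len(account) != ACCOUNT_LENGTH:            # valid size test
        errors.append("invalid size")
    if not account.isdigit():                     # class / composition test
        errors.append("invalid composition")
    elif not luhn_valid(account):                 # check digit test
        errors.append("check digit failed")
    if not 1 <= txn["qty"] <= 100:                # range / reasonableness test
        errors.append("quantity out of range")
    if txn["pay"] not in VALID_PAY_CODES:         # invalid value test
        errors.append("invalid payment code")
    if txn["customer"] not in STORED_CUSTOMERS:   # comparison with stored data
        errors.append("customer not on file")
    return errors

print(validate({"customer": "C001", "account": "79927398713", "qty": 5, "pay": "CARD"}))  # []
```

Returning every failed test at once, rather than stopping at the first, lets a data-entry clerk correct a rejected transaction in one pass.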

TP Controls
 Audit Trail – tracing transactions
 Pre-Numbered Source Document- to ensure sequential
 Document Produced as a By-product of the Transaction – e.g. use
of a credit card.
 Control Report – summary, e.g. cash register totals
 Anticipation Report- waiting for certain event to occur &
then scheduling other transactions.
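Two of these controls lend themselves to a quick sketch: checking a batch of pre-numbered source documents for gaps and duplicates, and a control report that compares a batch total against an expected figure. The function names, the document numbers and the use of integer amounts (in paise, to avoid rounding errors) are assumptions for illustration.

```python
def check_sequence(doc_numbers):
    """Return (missing, duplicated) numbers in a batch of pre-numbered documents."""
    if not doc_numbers:
        return [], []
    seen, duplicated = set(), set()
    for n in doc_numbers:
        if n in seen:
            duplicated.add(n)
        seen.add(n)
    missing = [n for n in range(min(seen), max(seen) + 1) if n not in seen]
    return missing, sorted(duplicated)

def control_report(amounts, expected_total):
    """Summary control: count, total and whether the batch balances."""
    total = sum(amounts)
    return {"count": len(amounts), "total": total, "balanced": total == expected_total}

print(check_sequence([1001, 1002, 1004, 1004, 1005]))   # ([1003], [1004])
print(control_report([125000, 80000, 450000], 655000))
# {'count': 3, 'total': 655000, 'balanced': True}
```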

 Batch Processing
 Online
 Real Time
 Distributed Processing
 Time Sharing
 Multi Programming
 Multi Processing

Data Transmission
 Transmitter
 Converter at transmitting end
 Transmission Channel
 Converter at receiving end
 Receiver of the transmitted data

 Universal Seven Part Data Circuit

 DTE / DCE Interface
 Transmission Channel
 DCE /DTE Interface

Transmission Process
 Analog / Digital Signal
 Modem
 Multiplexer / Demultiplexer
 Channels
 Physical Line
 Twisted Pair
 Coaxial Cable
 Optical Fibre
 Micro Wave
 Tower – line of sight (LOS)
 Radio / Wireless
 Satellite

 Transmission Speed- bps
 Bandwidth - Capacity
 Transmission mode –
 Synchronous
 Asynchronous
 Transmission Direction
 Simplex
 Duplex
 Half Duplex
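Transmission speed (bps) and transmission mode together determine how long a transfer takes. A minimal worked example, assuming a hypothetical 1,000,000-byte file on a 56,000 bps line, asynchronous framing of 10 bits per character (1 start bit, 8 data bits, 1 stop bit), and synchronous framing overhead small enough to ignore:

```python
def transfer_seconds(num_bytes: int, bps: int, bits_per_char: int) -> float:
    """Time to send num_bytes characters at bps, given bits sent per character."""
    return num_bytes * bits_per_char / bps

FILE_BYTES = 1_000_000   # hypothetical file size
LINE_BPS = 56_000        # hypothetical line speed

async_time = transfer_seconds(FILE_BYTES, LINE_BPS, 10)  # start + 8 data + stop bits
sync_time = transfer_seconds(FILE_BYTES, LINE_BPS, 8)    # data bits only (framing ignored)

print(f"asynchronous: {async_time:.1f} s")   # asynchronous: 178.6 s
print(f"synchronous:  {sync_time:.1f} s")    # synchronous:  142.9 s
```

Under these assumptions asynchronous framing spends 2 of every 10 bits on start/stop bits, so the same file takes 25% longer than with synchronous transmission.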

Traditional File Processing

 Data Redundancy & Inconsistency
 Program Data Dependence
 Lack of Flexibility
 Poor Security
 Lack of Data Sharing & Availability

Contemporary Database Systems

Hierarchical Database

Network Model

Data Warehouse

Hypermedia database

Web Linkage

Components of a Network

 Node
 Access Path
 Protocol
 File Server
 Network Operating System

Four-layer model

 What is a topology
 Terminology
 Different technologies

Why Network Computers?
 To share files
 To share hardware
 To share programs
 User communication

Networking – consists of computers, wiring, and other devices, such as
hubs, switches, and routers that make up the network infrastructure.

 Topology – (from the Greek word topos, meaning place) is a description of
any kind of locality in terms of its layout.

 There are two ways to describe a network topology.

1. Physical topology
2. Logical Topology

Client – a computer that allows a user to log onto the
network and take advantage of the resources on the network.

Server – a much more powerful computer that provides
centralized administration of the network and serves up the
resources that are available on the network.

Client/Server network operating systems allow the network to
centralize functions and applications in one or more file servers.

Advantages
 Centralized
 Scalable
 Flexible
 Interoperable
 Accessible
Disadvantages
 Maintenance
 Expense
 Dependence

Peer to Peer
Each computer acts both as a client and as a server.
Advantages
 Less expense
 Easy setup
 Decentralized
Disadvantages
 Security
 Decentralized

Standard Physical Topologies



Bus Topology

 Characterized by a main trunk or backbone line with networked computers
attached at intervals along the trunk line.
 Passive topology
 Typically uses coaxial cable hooked to each computer using a T-connector.

Bus Topology cont.

Coaxial Cable

Star Topology
Computers on the network connect to a centralized
connectivity device, usually a hub or a switch.

Ring Topology

 Connects the LAN computers one after
the other on the wire in a physical ring.
 Moves info on the wire in one
direction; considered an active topology.

Mesh Topology
 All nodes are directly connected with all other nodes.
 Best choice when fault tolerance is required.
 Very difficult to set up and maintain.
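The set-up and maintenance burden of a full mesh follows from simple arithmetic: with n nodes every pair needs its own link, giving n(n-1)/2 links, whereas a star needs one cable per node to the hub or switch and a ring needs one link per node. A small sketch of that count; the counting conventions are assumptions (the central device is not counted as a node, and a ring has at least three nodes):

```python
def links_required(topology: str, n: int) -> int:
    """Number of point-to-point links needed to connect n nodes."""
    if topology == "mesh":
        return n * (n - 1) // 2   # one dedicated link per pair of nodes
    if topology == "star":
        return n                  # one cable from each node to the central device
    if topology == "ring":
        return n                  # each node linked to the next, closing the loop
    raise ValueError(f"unsupported topology: {topology}")

for topo in ("mesh", "star", "ring"):
    print(topo, links_required(topo, 10))
# prints, one per line: mesh 45 / star 10 / ring 10
```

Going from 10 to 20 nodes raises the mesh count from 45 to 190 links, which is why full meshes are usually kept to small, fault-critical cores.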

Standard Logical Topologies

 The way in which data accesses the medium (cable) and transmits packets.

 There are only two: Ring and Bus

Logical Topology: Ring
In the ring logical topology only one node can send information across
the network at any given time. This is done by way of a ‘token’.
Each terminal receives this special packet, and if it has data to send, it
will do so.
Once it has sent the data, it passes the token to the next station.
 Used for very fast networks
 No collisions
 Susceptible to faults
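Token passing can be simulated in a few lines. In this sketch the station names and frames are hypothetical, and each token holder is assumed to send at most one frame per visit. Only the station holding the token transmits, so transmissions never overlap:

```python
from collections import deque

def token_ring(queues: dict, rounds: int) -> list:
    """Pass a token around the ring; return (station, frame) in transmission order."""
    ring = list(queues)                                   # ring order = order stations are given
    pending = {station: deque(frames) for station, frames in queues.items()}
    sent = []
    for _ in range(rounds):                               # one round = one full circuit of the token
        for station in ring:                              # token arrives at this station
            if pending[station]:                          # data waiting: send one frame
                sent.append((station, pending[station].popleft()))
            # the token is then passed on to the next station
    return sent

print(token_ring({"A": ["a1", "a2"], "B": [], "C": ["c1"]}, rounds=2))
# [('A', 'a1'), ('C', 'c1'), ('A', 'a2')]
```

Station B has nothing to send and simply passes the token on. Because only one frame is ever in flight, there is nothing to collide.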

Logical Topology: Bus
Each time a node on a network has data for another node,
the sending node broadcasts to the entire network.

 Stations can always transmit.

 Less susceptible to breaks.
 Collisions (two stations transmitting at once) have to be dealt with.

Selecting a Topology
Do you need very high speeds?
Will you be moving really large files?

How far is it between stations?
Will you be relocating stations often?

Do you want something (relatively) painless?

Are you on a budget?
Do you want replacement parts easily accessible?

Domain Name System

Internet Network Architecture

Types of Network
 LAN – Local Area Network, within buildings / campuses
 Controlled, maintained & operated by end users
 High transmission rates, hence high data volumes & high speed
 Share costly hardware & software
 Promote productivity as direct communication is possible
 MAN – Metropolitan Area Network
 LAN interconnection
 Bulk data transfer
 Compressed video
 Backbone network
 WAN – Wide Area Network
 VAN – Value Added Network

Key issues in implementation
 Human Factors
 Cost
 Security
 Reliability
 Network Management
 Compatibility with current / future networks.

Open System Interconnect (OSI)
 Application
 End-user applications – file transfers / remote access, email etc.
 Presentation
 Various data formats. Data conversions , encryption
 Session
 Manages dialogues / sessions
 Transport
 Reliable end to end transport of data , error recovery , flow control
 Network
 Establish , maintain , terminate n/w connection , routing
 Data Link
 Procedures & protocols for communication lines , error correction
 Physical
 Physical means of sending data over lines. Electrical / Mechanical & functional
control of data circuits.
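As data moves down the seven layers, each layer typically wraps what it receives from the layer above with its own control information, and the receiving side removes it in reverse order. The toy sketch below shows only that ordering: the bracketed text headers are invented for illustration, whereas real protocols use binary headers (and the data link layer usually adds a trailer as well).

```python
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data Link", "Physical"]

def encapsulate(data: str) -> str:
    """Sending side: each layer above Physical prepends a (toy) header."""
    unit = data
    for layer in LAYERS[:-1]:            # Physical just transmits the bits
        unit = f"[{layer}]" + unit
    return unit

def decapsulate(unit: str) -> str:
    """Receiving side: strip headers from Data Link back up to Application."""
    for layer in reversed(LAYERS[:-1]):
        header = f"[{layer}]"
        if not unit.startswith(header):
            raise ValueError(f"expected {header} header")
        unit = unit[len(header):]
    return unit

frame = encapsulate("hello")
print(frame)
# [Data Link][Network][Transport][Session][Presentation][Application]hello
print(decapsulate(frame))   # hello
```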

 Transmission Control Protocol

Internet Capabilities
 Email –messaging , document sharing
 Usenet Networking – Discussion groups, electronic boards.
 Chatting - Conversation
 Telnet – Remote Login
 Gophers- Locate Textual info using a hierarchy of menus.
 Archie – search a database of documents, software & data available for download
 Wide Area Info. Services – Locate files in database using keywords
 WWW – Retrieve , format & display information.

Pros & Cons
 Reducing Communication Costs
 Enhancing communication & coordination
 Accelerating the distribution of knowledge
 Improving Customer Service
 Facilitating marketing & sales
 Disadvantages
 Security
 Technology Problems
 Lack of Standards
 Legal Issues
 Traditional Internet culture

Intranet / Extranet
 Not internet but the application of internet technologies
to the internal corporate network
 Extranet – semi-private, specifically designed for a very
select group of users / audience, e.g. a company’s
suppliers or business associates

Integrated Services Digital Network
 Standard for transmitting voice, data, images & video over
public telephone lines
 Integrate all current & emerging technologies into a single world
wide network
 Allows user to
 Achieve convenience
 Flexibility
 Economy
 Lower power consumption
 Easy Maintenance
 Clarity , accuracy & speed

IT Enabled Services (ITES)
 Offering of services from remote location
 Call Centers
 Medical Transcription
 Animation
 Back Office
 Legal database
 Market Research
 Remote Education
 Website Services

Geographic Information Systems (GIS)
 Measurement of natural & human-made phenomena & processes from a spatial
perspective, with emphasis on 3 properties
 Elements
 Attributes
 Relationship
 Storage of measurements
 Points
 Lines
 Areas / Polygons.
 Analysis of collected measurements to produce more data & discover new relationships
 Depiction of measured / analyzed data in some type of display
 Maps
 Lists
 Graphs
 Summary statistics

GIS Applications
 Advertising
 Archeology
 Education
 Cartography
 Site Selection
 Election Administration
 Insurance
 Routing / Distribution Network
 Oil, Gas & Mineral exploration
 Wildlife
 Government Agencies – Police
 Transportation & Logistics
 Urban & Regional Planning
 Emergency Response Planning

Why Outsource?
 MNCs can save costs
 Increase revenue
 Conserve capital
 Greater efficiency due to increased speed
 Rapidly improving Infrastructure
 Declining telecom costs
 Foster innovation
 Improve Quality

What Is a DBMS?

 A very large, integrated collection of data.

 Models real-world enterprise.
 Entities (e.g., students, courses)
 Relationships (e.g., Madonna is taking CS564)
 A Database Management System (DBMS) is a software
package designed to store and manage databases.

Why Use a DBMS?

 Data independence and efficient access.

 Reduced application development time.
 Data integrity and security.
 Uniform data administration.
 Concurrent access, recovery from crashes.

Why Study Databases??
 Shift from computation to information
 at the “low end”: scramble to web space (a mess!)
 at the “high end”: scientific applications
 Datasets increasing in diversity and volume.
 Digital libraries, interactive video, Human Genome project, EOS
 ... need for DBMS exploding
 DBMS encompasses most of CS
 OS, languages, theory, “A”I, multimedia, logic

Data Models
 A data model is a collection of concepts for describing data.
 A schema is a description of a particular collection of
data, using a given data model.
 The relational model of data is the most widely used
model today.
 Main concept: relation, basically a table with rows and columns.
 Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction
 Many views, single conceptual (logical) schema and physical schema.
 Views describe how users see the data.
 Conceptual schema defines logical structure.
 Physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 sit above the Conceptual Schema, which sits above the Physical Schema]

☛ Schemas are defined using DDL; data is modified/queried using DML

Example: University Database
 Conceptual schema:
 Students(sid: string, name: string, login: string,
age: integer, gpa:real)
 Courses(cid: string, cname:string, credits:integer)
 Enrolled(sid:string, cid:string, grade:string)
 Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
 External Schema (View):
 Course_info(cid:string,enrollment:integer)
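The three levels of this example can be tried out with SQLite through Python's standard sqlite3 module. This is a sketch: the sample rows are hypothetical, and SQLite's TEXT, INTEGER and REAL types stand in for string, integer and real. The CREATE statements are DDL, the INSERT and SELECT statements are DML, and an application that queries only the Course_info view does not depend on how Enrolled is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
-- Physical schema choice: index on the first column of Students
CREATE INDEX students_sid ON Students (sid);
-- External schema (view)
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical sample data (DML)
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Asha", "asha@cs", 20, 3.4), ("s2", "Ravi", "ravi@cs", 21, 3.1)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS101", "Programming", 4), ("CS564", "Databases", 3)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS564", "A"), ("s2", "CS564", "B"), ("s2", "CS101", "A")])

rows = conn.execute("SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(rows)   # [('CS101', 1), ('CS564', 2)]
```

Dropping or adding the index changes only the physical schema; the query against Course_info returns the same rows either way, which is physical data independence in miniature.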

Data Independence
 Applications insulated from how data is structured and stored.
 Logical data independence: Protection from changes in
logical structure of data.
 Physical data independence: Protection from changes in
physical structure of data.

☛ One of the most important benefits of using a DBMS!

ACID Properties
 Atomicity – guarantees that either all of the tasks of a transaction
are performed or none of them are.
 Consistency – ensures that the database remains in a consistent
state before the start of the transaction and after the transaction is
over.
 Isolation – ensures that other operations cannot access or see the data in an
intermediate state during a transaction.
 Durability – guarantees that once the user has been notified of
success, the transaction will persist and not be undone.
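Atomicity is easy to observe with SQLite from Python's standard library. In this sketch the Account table, its CHECK constraint and the transfer function are hypothetical. A transfer is two UPDATEs inside one transaction; if the second violates the constraint, the connection's context manager rolls back the first, so the database never shows a half-finished transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: all or nothing."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated by the debit
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM Account ORDER BY id"))

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

In the failed transfer, B was briefly credited 500 inside the transaction, but the rollback removed that change together with the rejected debit.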

The Log
 The following actions are recorded in the log:
 Ti writes an object: the old value and the new value.
 Log record must go to disk before the changed page!
 Ti commits/aborts: a log record indicating this action.
 Log records chained together by Xact id, so it’s easy to undo a
specific Xact (e.g., to resolve a deadlock).
 Log is often duplexed and archived on “stable” storage.
 All log related activities (and in fact, all CC related activities such
as lock/unlock, dealing with deadlocks etc.) are handled
transparently by the DBMS.
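The write-ahead rule and undo by transaction id can be sketched with an in-memory "database" and a list standing in for the log. This is a simplified illustration under stated assumptions: there is no disk, no checkpointing and no redo, and the record layout ("WRITE", xact, object, old, new) / ("COMMIT", xact) is hypothetical. Every change is logged before it is applied; after a crash the log is scanned backwards to restore the old values written by transactions that never committed.

```python
def write(db, log, xact, obj, new_value):
    """Write-ahead rule: append the log record first, then change the data."""
    log.append(("WRITE", xact, obj, db.get(obj), new_value))
    db[obj] = new_value

def commit(log, xact):
    log.append(("COMMIT", xact))

def undo_uncommitted(db, log):
    """Recovery: scan the log newest-first, restoring old values of uncommitted writes."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, obj, old_value, _ = rec
            db[obj] = old_value

db, log = {"x": 1, "y": 2}, []
write(db, log, "T1", "x", 5)
commit(log, "T1")
write(db, log, "T2", "y", 7)
write(db, log, "T2", "y", 9)     # "crash" happens before T2 commits
undo_uncommitted(db, log)
print(db)   # {'x': 5, 'y': 2}
```

Scanning newest-first matters: T2 changed y twice, and undoing in reverse order restores y to 7 and then to its original 2.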

Databases make these folks
happy ...
 End users and DBMS vendors
 DB application programmers
 E.g. smart webmasters
 Database administrator (DBA)
 Designs logical /physical schemas
 Handles security and authorization
 Data availability, crash recovery
 Database tuning as needs evolve

Must understand how a DBMS works!

Structure of a DBMS
 A typical DBMS has a layered architecture.
 The figure does not show the concurrency control and recovery components.
 This is one of several possible architectures; each system has its own variations.
[Figure: layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management. Side note: these layers must consider concurrency control and recovery.]

 DBMS used to maintain, query large datasets.
 Benefits include recovery from system crashes, concurrent
access, quick application development, data integrity and security.
 Levels of abstraction give data independence.
 A DBMS typically has a layered architecture.

 DBMS R&D is an exciting area.

State of Art in Databases
 Expanding domain of databases:
 Spatial Data
 Timeseries Data
 Text Data
 Music, Video, …
 Data Streams.
 Internet evolution and databases:
 yahoo!, Google, expedia, B2B, P2P, B2C,...
 Performance and Tuning!!!
 Future: Sensor networks.

DBMS Components
 Data Definition Language – DDL
 Data Manipulation Language – DML
 Data Dictionary

Objectives of Today’s Businesses
 Access and combine data from a variety of data stores
 Perform complex data analysis across these data stores
 Create multidimensional views of data and its metadata
 Easily summarize and roll up the information across
subject areas and business dimensions
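"Rolling up" means re-aggregating a measure at a coarser level of a dimension, for example from city to region. A minimal sketch in plain Python, assuming a hypothetical sales fact table with region, city and quarter dimensions:

```python
from collections import defaultdict

DIMENSIONS = ("region", "city", "quarter")
# Hypothetical fact rows: (region, city, quarter, sales)
FACTS = [
    ("West", "Mumbai", "Q1", 120),
    ("West", "Pune", "Q1", 80),
    ("West", "Mumbai", "Q2", 150),
    ("South", "Chennai", "Q1", 90),
]

def roll_up(facts, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

print(roll_up(FACTS, ["region"]))
# {('West',): 350, ('South',): 90}
print(roll_up(FACTS, ["region", "quarter"]))
# {('West', 'Q1'): 200, ('West', 'Q2'): 150, ('South', 'Q1'): 90}
```

Keeping no dimensions at all gives the grand total, the top of the roll-up hierarchy.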

These objectives cannot be met
 Data is scattered in many types of incompatible structures.
 Lack of documentation has prevented the integration of
older legacy systems with newer systems
 Internet software such as search engines needs to be
 Accurate and accessible metadata across multiple
organizations is hard to get

Four Levels of Analytical Processing
 In a modern organization, at least four levels of analytical
processing should be supported by information systems
 First level: Consists of simple queries and reports against current and
historical data
 Second level: Goes deeper and requires the ability to do “what if”
processing across data store dimensions
 Third level: Needs to step back and analyze what has previously
occurred to bring about the current state of the data
 Fourth level: Analyzes what has happened in the past and what needs to
be done in the future in order to bring about some specific change

Data Warehouse Technology
 A strategy to build the basic constructs of the IDSS with
today’s technologies
 Definition given by W.H.Inmon
 The data warehouse is a collection of integrated, subject-oriented
databases designed to support the DSS (decision support)
function, where each unit of data is relevant to some moment in time.

Data Warehouse Technology (Con’t)

 The data should be well-defined, consistent, and nonvolatile in nature.
 The quantity of data should be large enough to support
data analysis, querying, reporting, and comparisons of
historical data over a longer period of time.
 The data warehouse must be user driven.

Data Warehousing
 Subject Driven
 Non Volatile
 Time Varying
 Integrated

Operational Data Store vs. Data Warehouse Technology
How built
 Operational: one application at a time in the legacy environment, or one subject area at a time in the ODS
 Warehouse: one or more subject areas at a time
Critical to
 Operational: daily business operation
 Warehouse: management decisions that may affect profitability
Data access
 Operational: smaller numbers of rows retrieved in a single call
 Warehouse: large sets of data scanned to retrieve results
Data volume
 Operational: volume needed for daily operation
 Warehouse: larger volume needed to support statistical analysis, forecasting, ad hoc reporting, and querying

Operational Data Store vs. Data Warehouse Technology (contd.)
Data retention
 Operational: data retained to meet daily requirements
 Warehouse: data retained longer to support historical reporting, comparison, analysis, etc.
Data currency
 Operational: must be up to the minute
 Warehouse: usually represents a static point in time; usually important that data does not change minute by minute
Data availability
 Operational: high availability may be needed
 Warehouse: usually does not require as high availability as the production environment, unless worldwide access is necessary

Data Flow in a Single Organization

Data Mining: Introduction

Why Mine Data? Commercial Viewpoint
 Lots of data is being collected and warehoused
 Web data, e-commerce
 purchases at department / grocery stores
 Bank / Credit Card transactions
 Computers have become cheaper and more powerful
 Competitive Pressure is Strong
 Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
 Data collected and stored at enormous speeds (GB/hour)
 remote sensors on a satellite
 telescopes scanning the skies
 microarrays generating gene expression data
 scientific simulations generating terabytes of data
 Traditional techniques infeasible for raw data
 Data mining may help scientists
 in classifying and segmenting data
 in hypothesis formation

Mining Large Data Sets - Motivation
 There is often information “hidden” in the data that is
not readily evident
 Human analysts may take weeks to discover useful information
 Much of the data is never analyzed at all
[Figure: The Data Gap; total new disk (TB) since 1995 compared with the number of analysts]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”
What is Data Mining?
Many Definitions
 Non-trivial extraction of implicit, previously unknown and potentially useful
information from data
 Exploration & analysis, by automatic or semi-automatic means, of large
quantities of data in order to discover meaningful patterns

What is (not) Data Mining?
● What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
● What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
Origins of Data Mining
 Draws ideas from machine learning/AI, pattern recognition, statistics, and
database systems
 Traditional techniques may be unsuitable due to
 Enormity of data
 High dimensionality of data
 Heterogeneous, distributed nature of data
[Figure: Data Mining at the overlap of Statistics/AI, Machine Learning/Pattern Recognition and database systems]


Data Mining Tasks
 Prediction Methods
 Use some variables to predict unknown or future values of other variables.
 Description Methods
 Find human-interpretable patterns that describe the data.

From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996
Data Mining Tasks...
 Classification [Predictive]
 Clustering [Descriptive]
 Association Rule Discovery [Descriptive]
 Sequential Pattern Discovery [Descriptive]
 Regression [Predictive]
 Deviation Detection [Predictive]

Classification: Definition
 Given a collection of records (training set)
 Each record contains a set of attributes; one of the attributes is the class.
 Find a model for the class attribute as a function of the
values of other attributes.
 Goal: previously unseen records should be assigned a
class as accurately as possible.
 A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set
used to build the model and test set used to validate it.

Classification Example
Training Set (Refund, Marital Status: categorical; Taxable Income: continuous; Cheat: class)
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set → Classifier → Model; the Model is then applied to the Test Set]
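To make the training/test idea concrete, the sketch below encodes the ten training records and six test records from the table above. The model is a small decision tree written by hand for illustration (its 80K income split is an assumption, not something the slide states); in practice a learning algorithm would induce such a model from the training set. It classifies every training record correctly, but because the test records are unlabelled ("?"), only predictions can be produced for them; estimating real accuracy would need a labelled held-out set.

```python
# Training records: (refund, marital_status, taxable_income_in_K, cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
# Test records: class unknown
TEST = [("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
        ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80)]

def model(refund, marital, income_k):
    """Hand-written decision tree (assumed for illustration, not learned here)."""
    if refund == "Yes":
        return "No"
    if marital == "Married":
        return "No"
    return "Yes" if income_k >= 80 else "No"   # hypothetical 80K split

def accuracy(records):
    """Fraction of labelled records whose predicted class matches the label."""
    hits = sum(model(*r[:3]) == r[3] for r in records)
    return hits / len(records)

train_acc = accuracy(TRAIN)
predictions = [model(*r) for r in TEST]
print(train_acc)     # 1.0
print(predictions)   # ['No', 'No', 'No', 'No', 'No', 'No']
```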
Classification: Application 1
 Direct Marketing
 Goal: Reduce cost of mailing by targeting a set of consumers likely to
buy a new cell-phone product.
 Approach:
 Use the data for a similar product introduced before.
 We know which customers decided to buy and which decided otherwise.
This {buy, don’t buy} decision forms the class attribute.
 Collect various demographic, lifestyle, and company-interaction related
information about all such customers.
 Type of business, where they stay, how much they earn, etc.
 Use this information as input attributes to learn a classifier model.

From [Berry & Linoff] Data Mining Techniques, 1997

Classification: Application 2
 Fraud Detection
 Goal: Predict fraudulent cases in credit card transactions.
 Approach:
 Use credit card transactions and the information on its account-holder as attributes.
 When does a customer buy, what does he buy, how often he pays on time, etc.
 Label past transactions as fraud or fair transactions. This forms the class attribute.
 Learn a model for the class of the transactions.
 Use this model to detect fraud by observing credit card transactions on an account.

Classification: Application 3
 Customer Attrition/Churn:
 Goal: To predict whether a customer is likely to be lost to a competitor.
 Approach:
 Use detailed record of transactions with each of the past and present
customers, to find attributes.
 How often the customer calls, where he calls, what time-of-the
day he calls most, his financial status, marital status, etc.
 Label the customers as loyal or disloyal.
 Find a model for loyalty.

From [Berry & Linoff] Data Mining Techniques, 1997

Classification: Application 4
 Sky Survey Cataloging
 Goal: To predict class (star or galaxy) of sky objects, especially visually
faint ones, based on the telescopic survey images (from Palomar Observatory).
 3000 images with 23,040 x 23,040 pixels per image.
 Approach:
 Segment the image.
 Measure image attributes (features) - 40 of them per object.
 Model the class based on these features.
 Success Story: Could find 16 new high red-shift quasars, some of the farthest
objects that are difficult to find!

From [Fayyad, et.al.] Advances in Knowledge Discovery and Data Mining, 1996
Courtesy: http://aps.umn.edu
Classifying Galaxies
Class: Stages of Formation (e.g. Early, Intermediate)
Attributes: image features, characteristics of light waves received, etc.


Data Size:
• 72 million stars, 20 million galaxies
• Object Catalog: 9 GB
• Image Database: 150 GB

Clustering Definition
 Given a set of data points, each having a set of attributes,
and a similarity measure among them, find clusters such that
 Data points in one cluster are more similar to one another.
 Data points in separate clusters are less similar to one another.
 Similarity Measures:
 Euclidean Distance if attributes are continuous.
 Other Problem-specific Measures.
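One widely used algorithm that fits this definition with Euclidean distance is k-means (the slides do not name a specific algorithm). A minimal sketch, assuming the caller supplies the number of clusters through the starting centroids, and using hypothetical 2-D points:

```python
import math

def kmeans(points, centroids, max_iter=20):
    """Assign points to the nearest centroid, recompute centroids, repeat until stable."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members (keep the old one if a cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
cents, groups = kmeans(pts, [(1, 1), (8, 8)])
print(groups)   # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

Each step reduces the total squared distance of points to their own centroid, which matches the "more similar within a cluster" goal above; the result can depend on the starting centroids.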

Illustrating Clustering
❘ Euclidean Distance Based Clustering in 3-D space.
[Figure: intracluster distances are minimized; intercluster distances are maximized]

Clustering: Application 1
 Market Segmentation:
 Goal: subdivide a market into distinct subsets of customers where any
subset may conceivably be selected as a market target to be reached
with a distinct marketing mix.
 Approach:
 Collect different attributes of customers based on their geographical and
lifestyle related information.
 Find clusters of similar customers.
 Measure the clustering quality by observing buying patterns of customers in
same cluster vs. those from different clusters.

Clustering: Application 2
 Document Clustering:
 Goal: To find groups of documents that are similar to each other
based on the important terms appearing in them.
 Approach: To identify frequently occurring terms in each
document. Form a similarity measure based on the frequencies of
different terms. Use it to cluster.
 Gain: Information Retrieval can utilize the clusters to relate a
new document or search term to clustered documents.

Illustrating Document Clustering
 Clustering Points: 3204 Articles of Los Angeles Times.
 Similarity Measure: How many words are common in these
documents (after some word filtering).

Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278

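The similarity measure used in this study, the number of words two documents share after filtering, is simple to compute. The sketch below assumes a tiny hypothetical stop-word list and hypothetical documents; it also illustrates the information-retrieval gain mentioned earlier, relating a new search string to the most similar stored document.

```python
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to"}   # hypothetical filter list

def terms(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def common_words(doc1, doc2):
    """Similarity = number of distinct terms the two documents share."""
    return len(terms(doc1) & terms(doc2))

def most_similar(query, docs):
    """Return the id of the stored document sharing the most terms with the query."""
    return max(docs, key=lambda doc_id: common_words(query, docs[doc_id]))

DOCS = {
    "d1": "Stocks fell as the market closed lower",
    "d2": "The market rallied and stocks closed higher",
    "d3": "The team won the final match",
    "d4": "A late goal won the match",
}
print(common_words(DOCS["d1"], DOCS["d2"]))              # 3 (stocks, market, closed)
print(most_similar("stocks closed lower today", DOCS))   # d1
```

Raw shared-word counts favour long documents; weighting terms by frequency (e.g. cosine similarity of term vectors) is a common refinement.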
Clustering of S&P 500 Stock Data
❚ Observe Stock Movements every day.
❚ Clustering points: Stock-{UP/DOWN}
❚ Similarity Measure: Two points are more similar if the events described by them
frequently happen together on the same day.
❚ We used association rules to quantify a similarity measure.
Discovered Clusters → Industry Group
1. Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, …
2. Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
3. Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN → Financial-DOWN
4. Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP → Oil-UP

Association Rule Discovery:
 Given a set of records, each of which contains some number of items
from a given collection;
 Produce dependency rules which will predict occurrence of an item
based on occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
Example rule discovered: {Diaper, Milk} --> {Beer}
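Rules like these are usually judged by support (the fraction of transactions that contain all the items of the rule) and confidence (how often the consequent appears when the antecedent does). The slide does not define these measures; the sketch below uses the standard definitions and computes them exactly for the five transactions in the table, taking {Diaper, Milk} --> {Beer} as the example rule.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in TRANSACTIONS if itemset <= basket)
    return Fraction(hits, len(TRANSACTIONS))

def confidence(antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

rule_lhs, rule_rhs = {"Diaper", "Milk"}, {"Beer"}
print(support(rule_lhs | rule_rhs))     # 2/5
print(confidence(rule_lhs, rule_rhs))   # 2/3
```

Of the three baskets containing both diapers and milk, two also contain beer, hence the confidence of 2/3.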

Association Rule Discovery: Application 1

 Marketing and Sales Promotion:

 Let the rule discovered be
{Bagels, … } --> {Potato Chips}
 Potato Chips as consequent => Can be used to determine what should be
done to boost its sales.
 Bagels in the antecedent => Can be used to see which products would be
affected if the store discontinues selling bagels.
 Bagels in antecedent and Potato chips in consequent => Can be used to see
what products should be sold with Bagels to promote sale of Potato chips!

Association Rule Discovery: Application 2

 Supermarket shelf management.

 Goal: To identify items that are bought together by sufficiently
many customers.
 Approach: Process the point-of-sale data collected with barcode
scanners to find dependencies among items.
 A classic rule --
 If a customer buys diaper and milk, then he is very likely to buy beer.
 So, don’t be surprised if you find six-packs stacked next to diapers!

Association Rule Discovery: Application 3

 Inventory Management:
 Goal: A consumer appliance repair company wants to anticipate the
nature of repairs on its consumer products and keep the service vehicles
equipped with the right parts to reduce the number of visits to consumers.
 Approach: Process the data on tools and parts required in previous
repairs at different consumer locations and discover the co-occurrence patterns.

Sequential Pattern Discovery:

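Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.

Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

[Figure: the sequence (A B) (C) (D E) annotated with timing constraints: gap between consecutive elements <= xg (max-gap) and > ng (min-gap); time window of an element <= ws (window size); span of the whole sequence <= ms (maximum span)]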

Sequential Pattern Discovery:
 In telecommunications alarm logs,
 (Inverter_Problem Excessive_Line_Current)
(Rectifier_Alarm) --> (Fire_Alarm)
 In point-of-sale transaction sequences,
 Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) -->
 Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)

Regression
 Predict a value of a given continuous-valued variable based on the
values of other variables, assuming a linear or nonlinear model of dependency.
 Extensively studied in the statistics and neural network fields.
 Examples:
 Predicting sales amounts of a new product based on advertising expenditure.
 Predicting wind velocities as a function of temperature, humidity, air
pressure, etc.
 Time series prediction of stock market indices.
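Linear regression with one predictor can be fitted by ordinary least squares: slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. A minimal sketch of the advertising example, using hypothetical figures (spend and sales in arbitrary units):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical history: advertising spend vs. sales of earlier products
ad_spend = [10, 20, 30, 40]
sales = [25, 45, 65, 85]

a, b = fit_line(ad_spend, sales)
forecast = a + b * 50          # predicted sales for a spend of 50
print(a, b, forecast)          # 5.0 2.0 105.0
```

Real data will not lie exactly on a line; the same formulas then give the line with the smallest sum of squared errors.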

Deviation/Anomaly Detection
 Detect significant deviations from normal behavior
 Applications:
 Credit Card Fraud Detection

 Network Intrusion
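A very simple form of deviation detection builds a profile of normal behaviour from past data and flags values far outside it. The sketch below is one such approach (a z-score style rule). The spending history, the 3-standard-deviation threshold and the function names are assumptions; production fraud or intrusion systems use far richer models.

```python
import statistics

def build_profile(history):
    """Normal behaviour = mean and sample standard deviation of past amounts."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(amount, profile, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    mean, sd = profile
    return abs(amount - mean) > threshold * sd

past_spending = [40, 55, 38, 62, 47, 51, 44, 58]   # hypothetical card history
profile = build_profile(past_spending)
for amount in (60, 900):
    print(amount, is_anomalous(amount, profile))
# prints: 60 False, then 900 True
```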

Typical network traffic at university level may reach over 100 million connections per day.
Challenges of Data Mining
 Scalability
 Dimensionality
 Complex and Heterogeneous Data
 Data Quality
 Data Ownership and Distribution
 Privacy Preservation
 Streaming Data

Exam Review
 Date: __.10.2009 - Time: 3 hours - Marks: 70
 All Questions are Compulsory
 A) Attempt any 7 Questions. Each Question carries 2
marks. (14 Marks)
 B) Attempt any 6 Questions. Each Question carries 4
marks. (24 Marks)
 C) Write Short Notes on any 4 topics mentioned below.
Each Question carries 8 marks. (32 Marks)

 IT
 Subsystems
 Data Processing
 Classical Management / Strategic Management
 Networks
 Data Mining / Data warehousing
 Applications
 Outsourcing
 Business trends / Models /
