
Customer Analysis for Software XploRe

From Data Mining to Marketing Strategy

Diploma thesis (Diplomarbeit)

submitted for the academic degree of

Master of Science

at the Faculty of Economics and Business Administration
(Wirtschaftswissenschaftliche Fakultät)
of Humboldt-Universität zu Berlin

Submitted by
Jianqiu Wang
on 27 May 2003
Matriculation no.: 161426
Examiner: Prof. Dr. Wolfgang Härdle

Contents

Abstract
Introduction
1. Customer analysis
    1.1 Customer Behaviour
        1.1.1 Customer's Black Box
        1.1.2 Consumer buying process
        1.1.3 Customer behaviour model
        1.1.4 Factors influencing customer buying behaviour
    1.2 Market Segmentation and Profiling
        1.2.1 Market segmentation
        1.2.2 Customer profiling
    1.3 Market Targeting and Positioning
        1.3.1 Market Targeting
        1.3.2 Positioning
2. Data Mining
    2.1 The process of Data mining
        2.1.1 Data Collection and Selection
        2.1.2 Data Preparation
        2.1.3 Mining
        2.1.4 Result Interpretation
    2.2 The Aspects of Data Mining
        2.2.1 Applications
        2.2.2 Operations
        2.2.3 Data Mining Techniques
3. XploRe user and customer analysis
    3.1 About XploRe
    3.2 XploRe user (2002) and customer descriptive analysis
        3.2.1 Data collection
        3.2.2 Data cleaning and preparation
        3.2.3 Data descriptive analysis and result
        3.2.4 Comparing the user and customer of XploRe
        3.2.5 Measures of Improvement
    3.3 Cluster analysis for XploRe user data 2002
        3.3.1 Cluster analysis of categorical data
        3.3.2 Clustering with IBM Intelligent Miner
        3.3.3 Cluster analysis with XploRe
        3.3.4 Comparison of Cluster Analysis Results: IBM Intelligent Miner versus XploRe
    3.4 Analysis of the latest User data (2003)
        3.4.1 Results of analysis of 2003 data
        3.4.2 Comparison of historical user data
    3.5 Complementary analysis
        3.5.1 Analysis of regrouped data
        3.5.2 Analysis of high profitable sector
4. Suggested marketing strategy for XploRe
    4.1 Marketing Strategy and Marketing Mix
        4.1.1 Marketing strategy
        4.1.2 Marketing Mix
    4.2 Develop the marketing strategy for XploRe
        4.2.1 Niche market strategy
        4.2.2 Target Market
        4.2.3 Product position of XploRe
        4.2.4 General XploRe marketing strategy pyramids
        4.2.5 General Marketing Mix
        4.2.6 Special marketing mix for clusters
        4.2.7 Marketing research - suggestions for further analysis
References
Appendix
    Appendix 1: User 220702 Frequency Analysis
    Appendix 2: Customer Frequency Analysis (Nov. 05)
    Appendix 3: Customer Registration form
    Appendix 4: Characteristics of User220702 Clusters by XploRe
    Appendix 5: User 130303 Frequency Analysis
    Appendix 6: User 13032003 Intelligent Miner Cluster Analysis
    Appendix 7: Comparison of User and Regrouped User Data
    Appendix 8: User 130303 (Regrouped) Frequency Analysis
    Appendix 9: Regrouped User Intelligent Miner Cluster Analysis
    Appendix 10: Institute Users Frequency Analysis
Declaration of authorship (Erklärung zur Urheberschaft)

List of Figures

1.1 The customer's black box
1.2 A sequential model of the buying process
1.3 Consumer behaviour model
1.4 Factors influencing consumer behaviour
1.5 The process of marketing segmentation
1.6 Alternative consumer demand categories
1.7 SAGACITY
1.8 Targeting strategies
3.1 Sample of online survey questionnaire
3.2 Clustering of Users 2002
3.3 Clustering of user 2003
3.4 Software used in 2000 and 2003
3.5 Information resource in 2000 and 2003
3.6 Clustering of regrouped user data
4.1 4P of marketing mix

List of Tables

1.1 Broad-based ACORN classifications
1.2 National readership survey socio-economic groups
2.1 The aspects of data mining
3.1 Summary and description of the variables of User 22/07/02 data
3.2 Summary and description of the variables for customer data
3.3 Comparison of XploRe's Users and Customers
3.4 Characteristics of User IBM Intelligent Miner Clusters (2002)
3.5 Comparison of Clustering results with IBM Intelligent Miner and XploRe
3.6 Summary and description of the variables for User data 2003
3.7 Comparison of User 220702 and User 130303
3.8 Comparison of software used in 2000 and 2003
3.9 Comparison of information resources in 2000 and 2003
3.10 Comparison of country in 2000 and 2003
3.11 Comparison of continent in 2000 and 2003
3.12 Comparison of User clusters of 2000 and 2003
3.13 Summary and description of the variables of regrouped User data 2003
3.14 Comparison of Institute user and General user

Abstract
This thesis presents a case study of customer analysis with the purpose of
developing a marketing strategy for the statistical software XploRe. The
customers analysed include the users, who downloaded the free trial version of
XploRe through the web site, and the actual customers, who bought XploRe.
Descriptive analysis was conducted for both data sets, which led to the
conclusion that research institutes are the highly profitable sector for
XploRe. For the user data, the data mining method of clustering was applied to
identify the customer segments. Two different clustering methods were tested on
the same user data set with different software, IBM Intelligent Miner and
XploRe. As a result, both methods divided the users of XploRe into four
clusters: "Internet surfer", "Academia", "Linux user" and "Home worker".
Through the comparison of the historical user data of 2003 and 2000, more facts
and trends of the XploRe market and its customers were discovered regarding the
software used, information resources, new markets and the ongoing changes in
customer segments. Based on the results of the customer analysis, suggestions
for the marketing strategy, the marketing mix and further analysis were
outlined.
Key words: customer analysis, market segmentation, data mining, clustering,
marketing strategy, marketing mix


Introduction
Customer analysis is a crucial step in the development of a marketing strategy.
Only when a company has a clear view of its customers can the proper strategy
and actions be undertaken to gain competitive advantage in the market.
Today, with the development of digital data management systems, the capability
of gathering, storing and accessing information has improved dramatically. This
trend creates difficulties for companies confronted with huge amounts of data.
Data mining is an important technology that lets companies conduct customer
analysis on large data sets. It discovers valuable information that is useful
for marketing.
The research presented in this paper tried to segment the customers and find
the trends and facts of the XploRe market, so that suggestions for a marketing
strategy could be derived from the results. XploRe is a statistical software
which aims at sophisticated users who are looking for a flexible, programmable
statistics package with an emphasis on more advanced procedures.1 It is
important for the XploRe marketer to understand its customers and market. The
customer data studied here include the data of XploRe users (the potential
customers) and actual customers (the buyers). The user data was collected
through an online questionnaire preceding the download of the XploRe trial
version, while the customer data was gathered through the returned registration
forms. For the purpose of comparison, two sets of user data were analysed and
two clustering methods were tested with two software packages, IBM Intelligent
Miner and XploRe. The user data 2002 covers October 11, 2001 to July 22, 2002
and contains 1734 profiles. The raw user data 2003 contains 2593 profiles
collected from October 11, 2002 to March 13, 2003. The customer data includes
32 profiles from July 1, 2000 to August 30, 2002.

1 Härdle, Klinke and Müller, 1999, p. 17.

Only descriptive analysis was carried out for the customer data because of its
small number of records. For the user data, the data mining process of
clustering was conducted to segment the market. The mining run for the user
data consists of several steps: cleaning the raw data with MS Excel,
transferring the data to IBM Intelligent Miner or XploRe, and performing the
cluster analysis. The clustering identified four groups of XploRe customers,
namely "Internet surfer", "Academia", "Linux user" and "Home worker". Each
cluster has its own distinguishing features.
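The mining run described above (cleaning, then clustering of categorical profiles) can be illustrated with a minimal Python sketch. It is not the procedure used in this thesis: the cleaning is done in code rather than in MS Excel, and a simple k-modes routine (Hamming distance, most frequent category per attribute) stands in for the clustering methods of IBM Intelligent Miner and XploRe. The attributes, the sample profiles and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical raw survey profiles: (occupation, operating system, information source).
raw_profiles = [
    ("Student ", "windows", "internet"),
    ("student", "Windows", "Internet"),
    ("Researcher", "Linux", "Colleague"),
    ("researcher", "linux", "colleague"),
    ("Researcher", "Linux", ""),          # incomplete profile, removed during cleaning
    ("Student", "Windows", "Colleague"),
]


def clean_profiles(profiles):
    """Normalise spelling and drop profiles with missing answers."""
    cleaned = []
    for profile in profiles:
        normalised = tuple(value.strip().lower() for value in profile)
        if all(normalised):
            cleaned.append(normalised)
    return cleaned


def hamming(a, b):
    """Number of attributes on which two categorical profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(members):
    """Most frequent category per attribute among the cluster members."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def k_modes(profiles, initial_modes, max_iter=20):
    """Assumed stand-in clustering: alternate assignment and mode update."""
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        # Assign every profile to the nearest mode.
        labels = [min(range(len(modes)), key=lambda k: hamming(p, modes[k]))
                  for p in profiles]
        # Recompute each mode from its members; keep the old mode if a cluster is empty.
        new_modes = []
        for k, old_mode in enumerate(modes):
            members = [p for p, label in zip(profiles, labels) if label == k]
            new_modes.append(mode_of(members) if members else old_mode)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


clean = clean_profiles(raw_profiles)
labels, modes = k_modes(clean, initial_modes=[clean[0], clean[2]])
for k, mode in enumerate(modes):
    print(f"cluster {k}: size={labels.count(k)}, typical profile={mode}")
```

The mode of each cluster plays the role of a segment description: it lists the most common answer per question, which is the kind of feature summary used to name the clusters.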


The comparison of the customer data and the user data 2002 led to the discovery
of the highly profitable sector, research institutes. XploRe and IBM
Intelligent Miner (IM) delivered similar clustering results for the user data,
but IM performed better in visualisation and computational efficiency.
Comparing the historical results of user data 2003 and user data 2000, some
trends were identified. More professional users switched to command-driven
software. XploRe made progress in its communication channels. Asia, especially
Japan, emerged as a new market. With regard to the segments, "Internet surfer"
is a brand-new group in 2003, which indicates the arrival of the Internet age.
The appearance of "Home worker" in 2003 instead of "Researcher" in 2002 hints
at a problem in the survey questionnaire. More members of "Academia" use
non-personal channels to get information. This again confirms the improvement
made by XploRe in its communication channels. Linux users were very stable
during the period.
Based on the findings of the analysis, some suggestions for the marketing
strategy and for further analysis were made for the XploRe marketer.
This paper consists of four main parts. The first two sections following the
introduction lay the theoretical foundation for customer analysis and data
mining. Section three presents the analysis and its results. The marketing
strategy and suggestions are developed in the fourth section. At the end, the
summary gives a brief overview of the whole paper.

1. Customer analysis
In the current market space, competition is intense and the market abounds with
all kinds of products. To win customers' decisions in favour of their products,
companies should gain deep insight into what customers really need and how to
influence their purchase decisions. Therefore, companies should now have a
customer focus, conducting business with an emphasis on understanding the
customers and the market.
Customer analysis is the study of customers and their behaviour, which is
central to achieving a customer focus.2 The purpose of conducting customer
analysis is to achieve marketing goals such as the following:3
- Customer acquisition: finding new customers
- Customer cross sell: further sales of different products to the same customer
- Customer up sell: the customer makes greater use of the same product or
  service
- Customer retention: keeping the customer loyal

1.1 Customer Behaviour

In order to understand customer buying behaviour, we should first understand
customer behaviour.

1.1.1 Customer's Black Box

Customer behaviour here means the behaviour of individuals who purchase for
private or household consumption. These customers buy goods which are not part
of the value chain, and the purpose of purchasing is not to generate profit.
Buying behaviour depends on the individual's reaction to internal and external
stimuli; therefore, it is difficult to predict. The "black box" is the term
for the customer's decision-making process, which is difficult to observe but
crucial to the purchase decision.
In order to develop appropriate products that are attractive to customers,
firms need insight into what happens in the black box. Figure 1.1 presents the
customer's black box. Inside the black box, the customer gathers information,
evaluates and compares, and then comes to a decision. This is called the
consumer buying process.

2 WWW14
3 Heygate, Richard, 1998.
Fig. 1.1: The customer's black box.
[Figure content: the consumer's black box (aspirations, motivation, education,
personality, beliefs) with the buying process inside it (identification of
needs, evaluation of offers that satisfy the need, comparison of substitute
products and brands, purchase, post-purchase evaluation); external stimuli
(social pressure, legal requirements, physical factors, economic cycle); and
the marketer's 7Ps (People, Place, Promotion, Product, Price, Process, Physical
environment).]

1.1.2 Consumer buying process

Buying decision process

The buying process starts with the customer's desire for a product. This want
may result from internal stimuli such as hunger and thirst, or from external
stimuli such as advertising.
The next step is the search for information. Consumers may collect information
consciously or unconsciously from various sources. There are four kinds of
information sources:
1. Personal sources such as family, friends, colleagues and neighbours;
2. Public sources such as the mass media and consumer organisations;
3. Commercial sources such as advertising, sales staff and brochures;
4. Experiential sources such as handling or trying the product.

Fig. 1.2: A sequential model of the buying process (recognition of the problem,
the search for information, evaluation of the alternatives, the purchase
decision, post-purchase behaviour).

3 Bannes, E., McClelland, B., etc., 1997, p. 139.

Through information gathering, customers become aware of the various products
and brands in the market. They then evaluate the alternatives and finally make
the purchase decision.
After purchasing major items, many people experience cognitive dissonance, also
called post-purchase anxiety. They wonder whether they have made the correct
purchasing decision. To reduce this anxiety, they look for confirmation. For
example, they might ask friends to confirm that their purchase was the right
choice.
Figure 1.2 summarises the stages of the consumer buying process: recognition of
the problem, the search for information, evaluation of the alternatives, the
purchase decision and post-purchase behaviour.
Companies should be present at each stage of the buying process and try to
stand out from the products and brands of their competitors. For a brand or
product to become the customer's final choice, companies need a clear
understanding of the evaluative criteria that consumers use to compare products
(see Section 1.1.3).

3 Wilson, R. W. S. and Gilligan, C., p. 170.

Five buying roles

The purchase process normally involves several persons, each with a distinct
role. Each role does not necessarily require a different person. One person can
play several roles in a purchasing process.
The five roles in a purchasing process are:
- The initiator: the person who suggests buying the product or service.
- The influencer: the person whose comments can affect the purchase decision.
- The decider: the person who decides whether to buy and which product to buy.
- The buyer: the person who executes the purchase.
- The user: the final consumer of the product or service.
For example, a mother buys ice cream for her child. The child is the user; the
mother is the decider and buyer. The company should understand the function
that each role plays in the buying process in order to influence the customer's
buying decision effectively through proper action.

1.1.3 Customer behaviour model

The customer behaviour model describes the procedure and the basic elements of
what happens inside the customer's black box, that is, the consumer buying
process.
The most basic, simplest and best known model of buyer behaviour is AIDA, which
stands for Awareness, Interest, Desire and Action.4
The model introduced here consists of six interrelated components.5
1. Information or facts: refers to the percept caused by a stimulus.
2. Product recognition defines the extent to which the buyer knows enough about
   the product to distinguish it from other products.
3. Attitude towards the product refers to what the customer expects from the
   product to satisfy their particular needs.
4. Confidence in judging the product is the customer's degree of certainty that
   his or her evaluative judgement of a product is correct.
5. Intention to buy is the mental state that reflects the customer's plan to
   buy some specific number of products from a particular brand in some
   specified time period.
6. Purchase is caused by the intention to buy. It is defined as the point when
   the customer has paid for a product or has made some financial commitment to
   buy some specified amount during some specified time period.

Fig. 1.3: Consumer behaviour model.
Legend: F - Information, R - Product recognition, C - Confidence, A - Attitude,
I - Intention, P - Purchase.

4 Baker, M. and Hart, S., 1999, p. 63.
5 Howard, J. A., 1994, p. 31-56.

When consumers evaluate a product, they also employ certain evaluative
criteria, which have several aspects:
1. The product's attributes such as its price, performance, quality and
   styling.
2. Their relative importance to the consumer.
3. The consumer's perception of each brand's image.
4. The consumer's utility function for each of the attributes.
These evaluative criteria overlap with the elements of the consumer behaviour
model. For instance, product recognition, attitude towards the product and
confidence in judgement are the three parts of the buyer's image of a product.
They all have a vital impact on the consumer's buying decision.

Fig. 1.4: Factors influencing consumer behaviour.
[Figure content: five groups of factors acting on the buyer. Cultural: culture,
sub-culture, social class. Environmental: economic cycle, social pressure,
legal requirement, new technology. Social: reference groups, family, roles and
status. Personal: age and life cycle stage, occupation, economic circumstance,
lifestyle and personality. Psychological: motivation, learning, perception,
beliefs and attitudes.]

1.1.4 Factors influencing customer buying behaviour

Various factors influence customer buying behaviour. Generally they can be put
into five categories: psychological factors, cultural factors, social factors,
personal factors and environmental factors.6, 7, 8
1. Psychological factors
Human needs include basic needs, like shelter, food and drink, and higher level
needs, such as friendship and achievement. People purchase goods to satisfy
their needs. Purchasing behaviour can be considered the result of internal and
external stimuli.
Maslow (1943) suggested that behaviour can be explained by a hierarchy of
needs. He grouped people's needs into five levels and argued that when a person
is satisfied with one level of needs, he will strive for the next level.
Maslow's five levels of needs are physiological needs, safety needs, social
needs, esteem needs and self-actualisation needs.9
Physiological needs are the basic needs for human beings to survive, such as
food and drink. Only after these needs are satisfied will the other levels of
needs be desired.

6 WWW11
7 Bannes, E., etc., 1997, p. 139-149.
8 Environmental factors are external factors, while the other four factor
  categories are internal factors that influence consumer buying behaviour.
9 Bannes, E., McClelland, B., etc., 1997, p. 139-184.

Safety needs refer to people's needs for security, stability and
predictability. Services such as insurance and guarantees are products that
satisfy safety needs.
Social needs explain the human desire for love and a sense of belonging. At
this level, people seek to join associations and clubs.
Esteem needs show themselves in the search for status, esteem, achievement and
recognition. To satisfy this level of needs, people turn to luxury products
such as perfumes, high-tech products and cars.
Self-actualisation is the highest level of needs. Only after people have
satisfied all the other levels of needs will they turn to the realisation of
their potential, which is expressed in concern for external issues, like
volunteer work.
2. Personal factors
Personal factors are the set of the buyer's personal characteristics, including
age, occupation, lifestyle, personality and economic circumstances.
3. Cultural factors
Cultural factors include culture, sub-culture and social class.
Culture is a set of shared values which define people's behaviour. Language is
the best example of cultural difference: using a language incorrectly causes
misunderstanding. There are also differences between eastern and western
cultures in attitudes towards family and the individual.
A large society or culture is normally divided into subculture groups, which
define more subtle behaviour norms. Subculture groups include ethnic groups,
religious groups, racial groups and geographical groups. They differ in
cultural preferences, ethnic tastes, attitudes, lifestyles and taboos.
Social class is also called socio-economic group. It is determined by income
level, education and occupation. The often-used social class model divides
society into upper class, upper middle class, lower class, upper working class,
working class and others.
4. Social factors
Social factors include reference groups, family, social role and status.
Reference groups are defined as "all groups that have a direct (face-to-face)
or indirect influence on the person's attitude or behaviour".10 Reference
groups can be divided into four types.

1. Primary membership groups are generally informal groups whose members
   interact with one another, such as family, neighbours, colleagues and
   friends.
2. Secondary membership groups are more formal than primary membership groups,
   and their members interact less. These include religious groups,
   professional groups and trade unions.
3. Aspirational groups are groups that one would like to belong to.
4. Dissociating groups are groups whose values and behaviour are rejected by
   the individual.

5. Environmental factors
Environmental factors consist of economic, social, political and technological
aspects. The economic cycle, social pressure, legal requirements and new
technology all influence the consumer's decision on which product to buy and
how to buy it.

1.2 Market Segmentation and Profiling

When firms try to sell their products in customer markets, they should not only
try to identify the factors that influence the customer's black box, but also
estimate whether there are enough customers who need their offer. It is
important for companies to compare their capabilities with the objectives of
customers, so that they can decide whether they are able to serve the market
profitably with appropriate products. Therefore, firms must identify the market
need, segment the total customer base into potential customer groups which are
likely and able to purchase the offer, and position the product or service as
an attractive alternative to the other offers available to the target groups.

10 Wilson, Gilligan and Person, 1994, p. 160.

1.2.1 Market segmentation

"Market segmentation is the subdivision of a market into distinct subsets of
customers, where any subsets may conceivably be selected as a target market to
be reached with a distinct marketing mix."11
Market segmentation is inspired by Kotler's target marketing. As Kotler said,
"in target marketing, the seller distinguishes the major market segments,
targets one or more of these segments, and develops products and services
tailored to each selected segment."12
Because individuals differ in preferences, characteristics, tastes and
interests, their buying behaviour patterns are varied and heterogeneous. It is
almost impossible, or unprofitable, for a company or a single product to serve
all of these needs. Furthermore, communicating the marketing mix to a
non-homogeneous group is inefficient. Therefore, companies search for groups
with attractive attributes and then concentrate on them, developing specific
products and services and using specific marketing resources to gain the
maximal market return.
Segmentation identifies the subsets of buyers who share similar needs and
demonstrate similar buying behaviour. It subdivides a heterogeneous total
customer market into smaller, manageable and homogeneous clusters according to
chosen criteria. Within each cluster there are similar patterns of buyers'
needs and buying behaviour, which are identifiable and relevant to the buying
decision.
Customer segmentation brings major benefits to companies:13
- Efficiency: Because the customers are subdivided, companies can focus only on
  the markets of interest. Therefore, they can allocate and use their resources
  more efficiently.
- Effectiveness: Through segmentation, the needs of each customer segment can
  be better identified and examined. Thus, the understanding and awareness of
  customer needs are enhanced. Companies can tailor their products and
  marketing measures to meet customer needs more effectively. Because of the
  improved marketing effectiveness, the customer response rate increases, and
  so do the return and profit from marketing investment.
- New market: Segmentation can help companies to identify new market
  opportunities. The needs and characteristics of the total customer market are
  so diverse that the unique features of a small group are not
  distinguishable. After segmentation, a company can discover those markets
  with unique features. They can offer valuable opportunities for companies to
  enter new markets.

11 Kotler, 1995, p. 286.
12 Kotler, 1991, p. 262.
13 WWW29.

Fig. 1.5: The process of marketing segmentation (defining the market, selecting
the base for segmentation, dividing the market and profiling).

The process of market segmentation14

The process of market segmentation is composed of three steps.
1. Defining the market
The total market for a product or service comprises all of the consumers who
desire or potentially desire it, and who are willing and able to buy it. It is
necessary to analyse the market in terms of its size and pattern of demand.
There are three patterns of demand:15
1. Homogeneous demand: All consumers in a market have similar needs and wants.
2. Diffused demand: Consumers' needs are diverse and no clear segments can be
   identified. This suggests the need for customisation.
3. Clustered demand: Consumers' needs and desires can be grouped into several
   identifiable segments. Each has its own set of purchase criteria.

Fig. 1.6: Alternative consumer demand categories (homogeneous, diffused and
clustered demand).

14 Bannes, E., McClelland, B., and Meyer, R., 1997, p. 181-185.
15 Bannes, E., McClelland, B., etc., p. 181-183.

2. Selecting the approach and bases for segmentation
Market segments can be identified on the basis of detailed market research, or
of a basic analysis of the customer data held within a company. Many companies
keep customer records detailing information such as age and gender.
There are generally two types of methods of market segmentation.16, 17

1. A Priori methods:
In a prior approach, the basis for segmentation is set in advance. The primary
market research is not necessary. Thus, the analysis of second data resources,
the customer information at hand, manger intuition and other methods will be
employed to set the segmentation basis for the buyers according to their usage
patterns (heavy, medium, light and non-user), demographic characteristics (age,
sex, income) or psychographic profiles (personality). After the basis setting, a
research will be conducted to identify the size, location and potential of each
segment. The marketing decision will be based on which segment the marketing
efforts should be concentrated. For example, classification is a prior approach.
2. Post hoc methods:
A post hoc approach segments the market according to the research findings, rather
than deciding the segmentation basis in advance. Primary market research is
conducted to collect the classification and descriptor variables. Segments are
defined only after all the relevant information has been collected and analysed. The research might highlight particular attributes, attitudes or benefits with which
particular groups of customers are concerned. The result then becomes the basis
for dividing the market.
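The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the usage thresholds, the sample data and the function names `a_priori_segment` and `post_hoc_segments` are hypothetical, and cutting the sorted values at their largest gaps is merely a simple stand-in for the analysis a real post hoc study would perform.

```python
def a_priori_segment(purchases_per_year):
    """A priori: the segment boundaries are fixed in advance (assumed values)."""
    if purchases_per_year == 0:
        return "non-user"
    if purchases_per_year < 5:
        return "light"
    if purchases_per_year < 20:
        return "medium"
    return "heavy"


def post_hoc_segments(values, n_segments):
    """Post hoc: derive the segments from the data by cutting at the largest gaps."""
    ordered = sorted(values)
    # Gap between each pair of neighbouring values, with the position of the gap.
    gaps = sorted(
        ((ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)),
        reverse=True,
    )
    cut_positions = sorted(i for _, i in gaps[: n_segments - 1])
    segments, start = [], 0
    for pos in cut_positions:
        segments.append(ordered[start : pos + 1])
        start = pos + 1
    segments.append(ordered[start:])
    return segments


usage = [0, 1, 2, 3, 18, 19, 21, 55, 60]  # hypothetical purchases per year
print([a_priori_segment(u) for u in usage])
print(post_hoc_segments(usage, 3))
```

In the first function the analyst decides the segments before looking at the data; in the second, the segment boundaries only emerge from the collected values.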
3. Dividing the market and profiling the segments
Based on the data gathered, the market is divided into identifiable
market segments. The information obtained gives details regarding the nature of the customer segments. This is called segment profiling.
Profiling associates each segment with certain characteristics, aggregates customers with similar characteristics into groups and separates them
from those with different characteristics.
Criteria of customer segmentation
A market could be segmented in various ways. There are problems with segmentation, such as the relevance and quality of the data, intuition, continuous process
16 WWW31
17 Han, J. and Kamber, M., 2001, P281-319.

and over-segmentation. A good segmentation should be relevant for buying behaviour and satisfy the following requirements:18 19
Size: The market should be big enough to justify segmentation.
It is dangerous to over-segment an already very small market.
Difference: Differences between the members of different segments should exist
and be measurable through data collection.
Measurability: The company is able to collect information that measures
the nature of buying behaviour in each segment.
Substantiality: The selected segment should be profitable relative to
the marketing mix resources designed especially for it.
Accessibility: The extent to which the marketing effort can reach the segment.
Stability over time: The segment should last a certain period without
dramatic change in its major features.
Responsiveness to communication means: The segment should be sensitive to the
marketing mix and communication means.
Variables for customer segmentation
Almost all factors which affect the customers' buying process and decision can be
used as variables for customer segmentation. Generally these variables
can be put into five categories: Demographic, Socio-economic Grade, Psychographics and life style, Behavioural, and Geographic and
Geo-demographics.20 21
1. Demographic variables
Demographic variables categorise the market according to the population characteristics and population profiles. Customers are subdivided into groups based
on one or more demographic variables such as age, sex, religion, race, nationality,
family size and stage of family life cycle. For example, the custom seller groups
18 WWW20
19 Wilson, R. and Gilligan, C., 1997, P275.
20 Kalakota, R. and Whinston, A. B.
21 McDonald, M. and Dunbar, I., P85-91.
ACORN Group                                 1981 Population      %
A  Agricultural areas                             1,811,485    3.4
B  Modern family housing, higher incomes          8,667,137   16.2
C  Older housing of intermediate status           9,420,477   17.6
D  Older terraced housing                         2,320,846    4.3
E  Better-off council estates                     6,976,570   13.0
F  Less well-off council estates                  5,032,657    9.4
G  Poorest council estates                        4,048,658    7.6
H  Multi-racial areas                             2,086,026    3.9
I  High-status non-family areas                   2,248,207    4.2
J  Affluent suburban housing                      8,514,878   15.9
K  Better-off retirement areas                    2,041,338    3.8
U  Unclassified                                     388,632    0.7

Tab. 1.1: Broad-based ACORN classifications 23

customers by age; the 20-30 age group, for example, consists of customers
who are more likely to purchase trendy items.
2. Geographic and Geo-demographics
Geographic segmentation divides the market into different geographic units such
as countries, regions, counties, cities and postcodes. Geo-demographic systems are
based on the proposition that the neighbourhood in which you live will
be reflected in your professional status, income, life stage and behaviour. The
neighbourhood types are initially identified using national census data.
ACORN (A Classification of Residential Neighbourhoods) is an example of a geo-demographic system. ACORN classifies consumers into 43 demographically and behaviourally distinct clusters. The clusters are based on the type of neighbourhood,
socio-economic status, and buying behaviour and preferences.22 A broad-based ACORN classification was conducted in Great Britain in 1981. It segments
the residents of Great Britain into 12 categories.
3. Socio-economic Grade
Buying behaviour is often influenced by the social class of a person. The
factors include income, status, education etc. The National Readership Survey scale
22 Kurs, M., Ryan, B., Lamb, G. etc., 2001.
23 Bannes, E., McClelland, etc., 1997, P201.

Grade  Social Classification   Occupation
A      Upper Middle Class      Higher managerial, professional or administrative jobs
B      Middle Class            Middle managerial, professional or
C1     Lower middle class      Supervisory or clerical jobs, Junior management
C2     Skilled working class   Skilled manual workers
D      Working class           Unskilled and semi-skilled manual workers
E      Subsistence level       Pensioners, unemployed, casual or low grade workers

Tab. 1.2: National readership survey socio-economic groups 24

is one of the popular classifications and is based on the occupation of the
main wage earner of the household.
A further development of the socio-economic grade model is SAGACITY, developed by Research Services Ltd. This model combines life stages with
income and social class.
4. Psychographic variables
Psychographics attempts to classify individuals by their attitudes, personality
and life styles.
(1)Personality
Personality is used as a variable to segment the market. The earliest segmentation
of this kind was conducted by Riesman et al. (1950). It identified three distinct
types of social characterisation and behaviour:25
1. Tradition-directed behaviour, which changes little over time and which, as
a result, is easy to predict and to use as a basis for segmentation.
2. Other-directedness, in which the individual attempts to fit in and adapt to
the behaviour of the peer group.
3. Inner-directedness, where the individual is seemingly indifferent to the behaviour of others.
(2) Attitude
Attitude includes the customers' attitudes towards risk, degree of loyalty, the
24

Kurs, M., Ryan, B., Lamb, G. etc., 2001


Blois Keith, 2000, P389.
25
Wilson, Gilligan and Pearson, 1994, P291
24

Fig. 1.7: SAGACITY (life cycle stages: Dependent, Pre-family, Family, Late; income: Better off, Worse off; occupation: White-collar, Blue-collar).

likelihood of adopting new products, etc. Many of the personality variables could
also be used as descriptors of attitude.
(3) Lifestyle
The consumer's behaviour is determined by the way we live our lives as well. It
arises from a complex relationship between our aspirations, current situation,
perception of self, income and attitudes. Lifestyle market segmentation offers a
detailed view of buyers because it comprises numerous characteristics related
to their activities, interests and opinions. Lifestyle consists mainly of three
dimensions:26
1. Activities: Work, hobbies, social events, vacations, entertainment, club
membership, community, shopping, sports.
2. Interests: Family, home, job, community, recreation, fashion, food, media,
and achievements.
3. Opinions: Selves, social issues, politics, business, economics, education,
products, future, culture.
5. Behavioural variables
(1) Benefit sought variables
This group of variables for segmenting customers considers the motive for a purchase. It groups consumers according to the specific benefits that they seek in a
product. Even if two customers buy exactly the same product, the benefits
they expect may vary. Benefit segmentation is therefore based on behaviour
processes, involving thought and action, as opposed to age and socio-economic
class, which are defined according to individual characteristics. It closely identifies the customers' needs and represents a powerful method of understanding and
influencing behaviour.
In applying this approach, a company should begin by attempting to measure
consumers' value systems and their perceptions of various brands within a given
product class. The information gathered is then used as the basis of market
segmentation. Benefit segmentation begins by determining the principal benefits that the customers are seeking in the product, the kinds of people who look
for each benefit and the benefits delivered by each brand. For example, in the
toothpaste market, four segments are identified according to the benefit sought: economy,
decay prevention, cosmetic and taste benefits.
26 McDonald, M. and Dunbar, I., 2000, P89.
(2) User status
The market can be divided into five segments according to user status: non-users, ex-users, potential users, first-time users and regular users. First-time users
and potential users can be further subdivided on the basis of usage rate.
(3) Loyalty Status and Brand Enthusiasm
Loyalty status categorises the customers on the basis of the extent and depth
of their loyalty to particular brands or products. Most typically there are four
categories: hard-core loyals, soft-core loyals, shifting loyals and switchers.27
1. Hard-core loyals are customers who consistently buy the same brand or
product.
2. Soft-core loyals are those who are willing to choose from a limited brand
set. Their loyalty is divided among the limited brands or products.
3. Shifting loyals are consumers who shift their loyalty from one brand
to another. After they shift, they no longer buy the former
brand.
4. Switchers are those who show no loyalty to any single brand. Their
buying pattern is typically determined either by the special offers available
or by their search for variety.
(4) Critical events
Major or critical events generate needs which can be satisfied by the provision of a special collection of products and/or services. Typical examples are
marriage, the death of someone in the family, unemployment, illness, retirement
and moving house.

1.2.2

Customer profiling

Customer segmentation and customer profiling are two elements of Customer Relationship Management (CRM). Customer profiling is performed after customer
segmentation. Its aim is to locate clusters within the customer file
that outperform the average.28 It creates customer segment profiles, which label
27 Wilson, Gilligan and Pearson, 1994, P291.
28 WWW18

the customers with their attributes.


Identifying the characteristics of the customers helps the company to decide which
segments will respond best to its marketing effort. When companies have a
clearer overview of the attributes and demands of the customer segments,
they can decide what action should be taken and what resources should be allocated
to the selected customer segments. Furthermore, according to pre-built models,
customer profiling can also be used to find potential customers and to delete inactive
or bad customers.
The profiling attributes are similar to the segmentation attributes. For example,
the profiling attributes include: Geographic, Cultural and ethnic, Economic
conditions (income and/or purchasing power), Age, Values, attitudes, beliefs,
Knowledge and awareness, Lifestyle, Media, Recruitment method. For
acquired customers, variables of customer behaviour can also be employed as
profiling variables, such as shopping frequency, complaint frequency,
degree of satisfaction and preferences, etc.
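The idea of locating segments that outperform the average can be illustrated with a short sketch. It is not taken from the sources cited here: the records, the `revenue` measure and the function name `outperforming_segments` are assumptions chosen only for illustration.

```python
from statistics import mean

# Hypothetical customer file: each record carries a segment label and a measure.
customer_file = [
    {"segment": "A", "revenue": 120},
    {"segment": "A", "revenue": 80},
    {"segment": "B", "revenue": 40},
    {"segment": "B", "revenue": 20},
    {"segment": "C", "revenue": 60},
]


def outperforming_segments(records, measure="revenue"):
    """Return the overall average and the segments whose average exceeds it."""
    overall = mean(r[measure] for r in records)
    per_segment = {}
    for r in records:
        per_segment.setdefault(r["segment"], []).append(r[measure])
    profile = {seg: mean(vals) for seg, vals in per_segment.items()}
    return overall, {seg: avg for seg, avg in profile.items() if avg > overall}


overall, winners = outperforming_segments(customer_file)
print(overall, winners)
```

In practice the measure could be any profiling variable, for example shopping frequency or degree of satisfaction.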

1.3

Market targeting and Positioning

1.3.1

Market Targeting

The next task after customer segmentation and profiling is market targeting.
Companies choose one segment or several segments as the target market. The
target market is the market that the company decides to serve. A specific marketing
mix and resources will be developed to serve the target market.
Companies normally adopt one of three targeting strategies:29
Undifferentiated strategy: The company ignores the differences between customer segments and regards the whole market as a single market. A single
marketing mix is adopted for the whole market. This is so-called mass
marketing.
Differentiated strategy: The whole market is divided into several segments.
The company develops a different marketing mix for each segment.
28 Keith Blois, 2000, P398.
29 Armstrong, G. and Kotler, P., 2002, P255-258.

Fig. 1.8: Targeting strategies (panels: Undifferentiated Strategy, Concentrated Strategy, Differentiated Strategy).

Concentrated strategy: The company chooses one or several market segments, but uses only a single marketing mix. Under this strategy, the
company tries to win a high market share in one or several niche markets,
instead of struggling for a small share of the whole market. For
firms with limited resources, this strategy is very appealing.

1.3.2

Positioning

The purpose of target marketing is to focus on the selected target market and fine-tune the marketing mix to provide a group of potential customers with superior
value, and thereby to build up a unique position for the product in the customers' view.
A product's position is the complex set of perceptions, impressions and feelings
that it induces in consumers, compared with competing products.30 Positioning
refers to how customers think about proposed and/or present brands in a market.31 The fundamental idea of positioning is competitive advantage.32 Through
30 Bannes, McClelland, Meyer and Wiesehofer, 1997, P230.
31 WWW33
32 WWW30

the differentiated marketing mix, the special needs and demands of customers can
be satisfied. Thus, the customers will view the product or brand as superior to
the others, and give it a distinct position. To position
a product, the marketer must appeal strongly to the target customers with the product's
strengths and differences, using a proper marketing mix.

2. Data Mining
Data mining, which is also known as Knowledge Discovery in Databases (KDD),33
is a powerful new technology which helps companies to identify the important
information in a sea of data. Data mining technology is commonly used
for customer analysis.
Fayyad defined data mining as a non-trivial process aimed at identifying valid,
novel, potentially useful and ultimately understandable patterns in data.34
Grameier and Rudolph, in contrast, consider data mining in terms of all methods and techniques which allow one to analyse very large data sets in order to extract and discover previously unknown structures and relations out of such huge heaps of details. This
information is filtered, prepared and classified so that it will be a valuable aid for
decisions and strategies.35
Data mining extracts implicit, previously unknown and potentially useful information
from the data in order to automate the process of discovering significant
patterns and trends.

2.1

The process of Data mining

The process of data mining can be summarised in four stages: data collection and selection, data preparation, data mining, and result interpretation.36 37

2.1.1

Data Collection and Selection

The ways of collecting data include:


In-house customer database: Companies normally keep records of customers. Customer information can be gathered from mailing lists,
receipts, memberships, warranty registrations, etc.
33 Kotala, P., Perera, A., Kai Zhou, J., etc.
34 Fayyad, U., Piatetsky-Shapiro, G. et al., P6.
35 Grameier, J. and Rudolph, A.
36 IBM's Data Mining Technology, 1996
37 Bounsaythip, C. and Rinta-Runsala, E., 2001

External resources: There are resources from which one can obtain information such as demographic information.
Research survey: A frequently used way to collect particular information is
to conduct a survey. The survey can be conducted through face-to-face
interviews, telephone interviews, postal questionnaires or via the Internet.

During the collection of data, two types of variables should be collected:38
Classification variables classify the data set into groups. Most demographic, geographic, psychographic or behavioural variables can be used to classify customers
into segments.

Demographic variables: Age, gender, income, ethnicity, marital status, education, occupation, household size, length of residence, type of residence,
etc.
Geographic variables: City, state, zip code, census tract, county, region,
metropolitan or rural location, population density, climate, etc.
Psychographic variables: Attitudes, lifestyle, hobbies, risk aversion, personality traits, leadership traits, magazines read, television programmes
watched, etc.
Behavioural variables: Brand loyalty, usage level, benefits sought, distribution channels used, reaction to marketing factors, etc.

Descriptor variables are variables used to describe each subgroup in a data set and to distinguish it from the others. We could say that the descriptor variables
stand for the characteristics of the represented data set. Descriptor variables must
be easily obtainable variables that already exist in, or can be appended to, the customer
files. Many classification variables can be used as descriptor variables.
The data is normally stored in a data warehouse. As the data warehouse contains
many diverse types of data, the first step in conducting data mining is to select the data that will
be used in the analysis.
38 WWW7

2.1.2

Data Preparation

Before the data can be analysed, the originally collected data must first be
prepared to make it suitable for the analysis. Data preparation
consists of the following stages:
1. Data cleaning:

Check for abnormal, out-of-bounds or ambiguous items.
Strip out unwanted fields or items. Some attributes are useless for analysis
purposes, such as version numbers, email addresses, etc.
Resolve inconsistent data formats, data encodings, geographical spellings,
abbreviations and punctuation.
2. Data description:

Supply metadata such as row or value counts or variables.

3. Data transformation:

Convert string variables into numeric categorical variables, or
interpret codes and replace them with text.
Check for missing values. Delete them or replace them by default values.
Add computed fields as input or target.
Combine data from multiple sources under a common code.
Identify fields that are used multiple times.
Convert continuous variables into categorical variables for some methods.
Convert nominal data into metric data.


4. Data Sampling39

Required for training or model building


5. Data pruning

Identify dependent, independent and correlated columns or variables
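
A few of the preparation stages listed above can be sketched as follows. This is a minimal illustration, not a description of any particular tool: the field names, the plausibility bounds for age, the default value and the age groups are all assumptions made for the example.

```python
RAW = [  # hypothetical raw customer records
    {"id": 1, "email": "a@example.com", "country": "germany ", "age": 34},
    {"id": 2, "email": "b@example.com", "country": "Germany", "age": None},
    {"id": 3, "email": "c@example.com", "country": "UK", "age": 230},
]

UNWANTED_FIELDS = {"email"}  # assumed to be useless for the analysis
AGE_BOUNDS = (0, 120)        # assumed plausibility bounds
DEFAULT_AGE = 40             # assumed default for missing or implausible values


def prepare(records):
    rows = []
    for rec in records:
        # Data cleaning: strip unwanted fields, resolve inconsistent spellings.
        row = {k: v for k, v in rec.items() if k not in UNWANTED_FIELDS}
        row["country"] = row["country"].strip().upper()
        # Out-of-bounds and missing values are replaced by the default value.
        age = row["age"]
        if age is None or not AGE_BOUNDS[0] <= age <= AGE_BOUNDS[1]:
            age = DEFAULT_AGE
        row["age"] = age
        # Data transformation: continuous variable -> categorical variable.
        row["age_group"] = "under 30" if age < 30 else "30-59" if age < 60 else "60+"
        rows.append(row)
    # Data transformation: string variable -> numeric categorical code.
    codes = {c: i for i, c in enumerate(sorted({r["country"] for r in rows}))}
    for row in rows:
        row["country_code"] = codes[row["country"]]
    return rows


for row in prepare(RAW):
    print(row)
```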

2.1.3

Mining

At the mining stage, various techniques can be used to extract valuable information from the prepared data. For example, suppose the goal is to create an accurate, symbolic classification model that predicts whether a reader will continue to subscribe
to a newspaper. First, a clustering technique is applied to segment
the subscriber database; then, rule induction is used to create a classification
model automatically for each desired cluster, through which one can predict
the behaviour of a customer.

2.1.4

Result Interpretation

Result interpretation means not only visualising (graphically or logically) the output
of data mining, but also filtering the information and identifying the most valuable
and appropriate results, which will help in decision making. If the interpreted result
is not satisfactory, the data mining stage or even the whole data mining procedure
should be repeated. The final extracted information must be comprehensible.

2.2

The Aspects of Data Mining

Data mining can be viewed in terms of its applications, operations, techniques and algorithms.40 41
39 Ferguson, Mike
40 WWW 4
41 IBM's Data Mining Technology, 1996
Applications   Database marketing, Customer segmentation, Customer retention,
               Fraud detection, Credit checking, Web site analysis
Operations     Prediction and classification modelling, Link analysis,
               Database segmentation, Deviation detection
Techniques     Supervised induction, Clustering, Association discovery,
               Sequence discovery

Tab. 2.1: The aspects of data mining

2.2.1

Applications

Data mining is widely used in customer analysis and marketing. The following
areas cover the main applications of data mining.42
Customer segmentation: Data mining tools automate the process of finding predictive information in large databases. Companies, especially retailers and
banks, are interested in knowing whether there are sub-groups of customers who exhibit
certain characteristics. They can use data mining to cluster the customers and
discover groups of interest. For example, companies use data mining to analyse
historical mailing lists in order to find the groups with a high return on investment,
so that they can determine new target groups for mailings. Banks and credit
companies classify credit scores to identify the customer segments which
have lower risks.
Relationship management: Data mining discovers and identifies previously
unknown relationships hidden in the data. The buying patterns of customers
are of interest to retailers and advertisers. Combined with customer
segmentation, data mining can help them to find the relationship between
the purchase of product items and customer types, or to improve the conduct
of an advertising campaign in particular media for a specific group of customers.
42 Carbone, Patricia L.

2.2.2

Operations

Predictive and classification modelling: A predictive model uses the contents
of the database, which reflect historical data, to automatically generate a model
that can predict future behaviour. Classification sub-divides a data set
according to a number of special outcomes. The goal of the modelling operation
is to create a generalised description of the characteristics of the data.
For instance, a marketing executive may be interested in predicting whether
a particular consumer will switch to a new product.
Link analysis: The goal of link analysis is to establish relationships
between the records in a database. Retailers want to know which items
will be purchased together by a customer in order to make decisions about
item layout and goods purchasing. For instance, if it is found that customers
buy CDs after purchasing a CD player, the store manager
might decide to put the CD counter close to the CD player counter.
Database segmentation: A database often contains various types of data,
so it is often necessary to segment the data into small groups of
related records. The purpose can be either to obtain a general description of each group or to prepare for further analysis, such as model
creation or link analysis. Suppose the store manager wants to know the
combinations of goods purchased by customers in a particular period.
The database could first be segmented according to a time period attribute,
such as the Christmas sale. Link analysis could then be conducted to
find the relationships between the combined goods.
Deviation detection: The aim of deviation detection is to identify the
outliers in a particular data set and to determine whether their presence is due to noise, impurities or a causal reason. This operation is the opposite of database segmentation and is often carried out together with it. Because outliers
express a deviation from some known expectation or norm,
deviation detection is often the source of true discovery.
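
As an illustration of the last operation in the list above, deviation detection can be sketched with a simple statistical rule: flag every value that lies more than a chosen number of standard deviations from the mean. The threshold of 2.0, the sample data and the function name `find_outliers` are assumptions for this sketch; the sources cited here do not prescribe a particular rule.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold * std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


monthly_spend = [52, 48, 50, 47, 53, 49, 51, 250]  # hypothetical data
print(find_outliers(monthly_spend))
```

Whether a flagged record is noise or a genuine discovery still has to be decided by the analyst.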

2.2.3

Data Mining Techniques

Numerous techniques support the operations of data mining to find the desired
groups or relationships.


Classification and predictive modelling are supported by supervised induction techniques. Clustering supports database segmentation. Association discovery and
sequence discovery are used for link analysis. Deviation detection is
supported by statistical techniques.
The desired relationships to be discovered by data mining are:43
Classes: the data items are assigned to predetermined groups.
Clusters: the data items are grouped by logical relationships.
Associations: the data is mined to identify associations.
Sequential patterns: the data is mined to anticipate behaviour patterns and
trends.
Supervised Induction
Supervised induction is the process of automatically creating a classification model
from a set of records (examples),44 which is called the training set. The records
in the training set must belong to a set of pre-defined classes. Each class has a
distinguishable pattern, which is generated from the existing records. Once the
model has been induced, a new record can be automatically put into a class
according to its pattern.
Supervised induction comprises the steps of classification and prediction, which put elements into predetermined groups according to some criterion. The
number of subgroups and the features of each subgroup are defined at the beginning.
Then the features of an observation are compared with the criterion and the observation
is put into the corresponding group.45 This is usually done in two steps:
Step 1: Build a model to describe the predetermined groups or
classes of the data set. The model contains a set of classification rules (labels).
Step 2: If the accuracy of the model or classifier is acceptable, the model
can be used to classify new unlabelled data groups or elements.
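The two steps can be sketched with a deliberately simple classifier. The choice of a nearest-centroid model is an assumption made for brevity (the text does not prescribe a particular induction technique), and the features, labels and function names are hypothetical.

```python
from math import dist


def train(training_set):
    """Step 1: build a model, here one centroid per pre-defined class."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, features):
    """Put a record into the class whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


def accuracy(model, labelled):
    """Share of labelled records that the model classifies correctly."""
    return sum(classify(model, f) == y for f, y in labelled) / len(labelled)


# Hypothetical features: (years subscribed, complaints per year).
training = [((1, 1), "stay"), ((2, 1), "stay"), ((8, 9), "leave"), ((9, 8), "leave")]
holdout = [((1.5, 2), "stay"), ((8, 8), "leave")]

model = train(training)
if accuracy(model, holdout) >= 0.9:   # Step 2: accept the model, then use it
    print(classify(model, (7, 9)))    # classify a new, unlabelled record
```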
Clustering
Clustering is a method of grouping data elements into homogeneous
groups. It divides a heterogeneous data set into disjoint sub-groups, so that the
elements in any one cluster are highly similar, while the elements in different
43 Chung, H. M., Gray, P. and Manino, M., 1998
44 IBM's Data Mining Technology, 1996.
45 Han, J. and Kamber, M., 2001, P279-325

clusters are highly dissimilar. Clustering is an unsupervised technique and
is employed when you want to find groups of similar records without any preconditions. Similarity and dissimilarity are judged according to some criterion. The
difference between clustering and classification is that in clustering the number
of subgroups and the features (labels) of each subgroup are unknown in advance,
while in classification the number of subgroups and the features of each subgroup
are defined at the beginning.
Cluster analysis has two steps:46
Choose a proximity measure
A proximity measure determines the similarity or closeness of objects.
Homogeneous objects are more similar and closer.
Choose a clustering strategy
In this step, the clustering algorithm and/or the initial parameters are decided.
According to the chosen proximity measure and method, the whole data
set is divided into groups (clusters). The elements within a group should
be as close as possible and the dissimilarity between groups should be as
large as possible.
After the clusters are built, descriptive methods are normally
employed to describe each cluster in order to get a comprehensive overview of the
dissimilarity between clusters.
1. Proximity measure
The commonly used proximity measures include Jaccard, Tanimoto, Simple
Matching, Minkowski, Kulczynski and Euclidean distance.
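Some of these measures can be written down in a few lines. The sketch below covers the Jaccard and simple matching coefficients for binary vectors and the Minkowski and Euclidean distances for numeric vectors; the function names are illustrative, and returning 1.0 for two all-zero vectors in `jaccard` is an assumed convention, since the coefficient is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary vectors: joint presences / any presence."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # assumed convention for all-zero input


def simple_matching(a, b):
    """Share of positions in which two binary vectors agree (1-1 or 0-0)."""
    return sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / len(a)


def minkowski(p, q, r=2):
    """Minkowski distance of order r between two numeric vectors."""
    return sum(abs(x - y) ** r for x, y in zip(p, q)) ** (1 / r)


def euclidean(p, q):
    """Euclidean distance: the Minkowski distance with r = 2."""
    return minkowski(p, q, r=2)


bought_a = [1, 1, 0, 0]  # hypothetical purchase indicators for two customers
bought_b = [1, 0, 1, 0]
print(jaccard(bought_a, bought_b), simple_matching(bought_a, bought_b))
print(euclidean((0, 0), (3, 4)), minkowski((0, 0), (3, 4), r=1))
```

Similarity coefficients grow as objects become more alike, while distances shrink, so a clustering method has to know which kind of measure it is given.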
2. Clustering strategy (method)
The clustering methods generally belong to several major families:47

1. Hierarchical algorithms
2. Iterative partitioning
3. Density search
4. Factor analytic
5. Clumping
6. Graph theoretic

46 Hardle, W. and Simar, L., P295-313.
47 Aldenderfer M. S. and Blashfield, R. K., P35.

Here we discuss only two basic clustering methods: hierarchical algorithms and iterative partitioning algorithms.
(1) Hierarchical algorithms
Hierarchical clustering can be performed using two
main types of procedure: the agglomerative procedure and the splitting procedure.
The agglomerative procedure starts from the finest partition. It considers each
observation as a cluster and then puts groups together to form new clusters.
At each stage in the procedure, the number of clusters is reduced by one
by joining or fusing the two groups which are considered
to be the closest or most similar. The agglomerative algorithm is a
frequently used procedure. It contains the following steps:48 49
1. Construct the finest partition. Normally each observation is a group.
2. Compute the distance or dissimilarity matrix.
3. Find the closest or most similar groups.
4. Put the two most similar groups together to form a cluster.
5. Compute the distance or dissimilarity between the new groups to obtain a
reduced distance or dissimilarity matrix.
6. Repeat steps 3 to 5 until the optimal clusters are formed.
The splitting procedure is the opposite of the agglomerative procedure. It starts by considering
the whole data set as one cluster and then splits the cluster into
sub-groups to form new clusters.
Linkages for the agglomerative algorithm
There are many linkages to measure the proximity or similarity of elements and groups. The frequently
used linkages are:
48 Mardia, K.V., Kent, J.T. and Bibby, J.M., 1979, P360-390.
49 Everitt, B. S. and Dunn, G., 1991, P99-126.

Single linkage defines the smallest distance between individuals as the distance between
two groups.
Complete linkage is the opposite of single linkage: it defines the largest distance between individuals as the distance between two groups.
Average linkage (non-weighted and weighted) computes the average distance.
Centroid linkage uses the natural geometrical distance as the distance between
groups.
Median linkage chooses the median of the individual distances as the distance
between groups.
Ward linkage is related to centroid linkage, but it uses an inertia distance rather than a geometric distance.
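The agglomerative steps and three of the linkages can be combined in a compact sketch. It makes several simplifying assumptions: the procedure stops at a fixed number of clusters instead of searching for an "optimal" partition, group distances are recomputed in every pass rather than kept in a reduced matrix, Euclidean distance is used, and only single, complete and (non-weighted) average linkage are covered. All names are illustrative.

```python
from math import dist

LINKAGES = {
    "single": min,                            # smallest pairwise distance
    "complete": max,                          # largest pairwise distance
    "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
}


def group_distance(g1, g2, linkage):
    """Distance between two groups under the chosen linkage."""
    return LINKAGES[linkage]([dist(p, q) for p in g1 for q in g2])


def agglomerate(points, n_clusters, linkage="single"):
    clusters = [[p] for p in points]  # step 1: finest partition
    while len(clusters) > n_clusters:
        # steps 2-3: find the pair of groups with the smallest distance
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: group_distance(clusters[ab[0]], clusters[ab[1]], linkage),
        )
        # step 4: fuse the two closest groups, reducing the cluster count by one
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


customers = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (10, 10)]  # hypothetical data
print(agglomerate(customers, 3, linkage="single"))
print(agglomerate(customers, 2, linkage="complete"))
```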
(2) Iterative Partitioning algorithms
Partitioning algorithms start with given groups. Elements are then exchanged
between groups until the highest homogeneity within groups and the highest heterogeneity between groups, or some other criterion, is reached.
Iterative partitioning algorithms normally proceed according to the
following steps:50
1. Begin with an initial partition into a chosen number of clusters.
Compute the centroids of these clusters.
2. Allocate each data point to the cluster that has the closest centroid.
3. Compute the new centroids of the new clusters. The clusters are not updated
until a complete pass through the data has been made.
4. Iterate steps (2) and (3) until no data points change clusters and
the highest similarity inside the clusters is reached.
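These steps correspond closely to the well-known k-means procedure, which can be sketched as follows. As a simplification, the sketch takes initial centroids as input instead of deriving them from an initial partition; the data, the starting centroids and the iteration limit are assumptions.

```python
from math import dist


def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Step 2: allocate each point to the cluster with the closest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Step 4: stop as soon as no data point changes its cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute the centroids after the complete pass.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # hypothetical data
labels, centres = k_means(data, initial_centroids=[(1, 1), (2, 1)])
print(labels)
print(centres)
```

The result depends on the starting centroids, which is why such algorithms are often run several times with different initial partitions.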
Association rule discovery
Association rule discovery is an iterative approach, also known as level-wise
search. Association rule methods try to discover interesting relationships between the items in the data and to identify the customers' behaviour patterns. A
typical example of association rule discovery is market basket analysis. This analysis
tries to find out, when customers go shopping, what kinds of products are
50 Aldenderfer M. S. and Blashfield, R. K., P45-49.

more likely to be put into the shopping basket together. Through this analysis,
retailers are able to identify which items are frequently purchased together by the
customers.
An association rule is a relationship of the form $X \Rightarrow Y$, where X is the antecedent item set and Y is the consequent item set. For example: customers who purchased item X are very likely also to purchase item Y at the same time.51
There are two measures for each rule: support and confidence.52
Support (or prevalence) indicates the occurrence frequency of an itemset:
$s(A \Rightarrow B) = P(A \cup B)$
Here $P(A \cup B)$ is the probability that a transaction contains all items of both A and B.
Confidence (certainty or predictability) measures the validity of the pattern. It denotes the strength of the relationship between the items, and to what degree an item depends on the others.
For example: only 5% of all customers are students who buy a computer. But if a customer is a student, the probability of his buying a computer is 20%. For this rule, 5% is the support and 20% is the confidence.
Two other important measures for association rule discovery are:
Expected confidence - the probability that an item is purchased regardless of what other items have been bought together with it. For instance, if customers buy a computer 40% of the time, 40% is the expected confidence.
Lift - refers to the difference between the confidence of a rule and the expected confidence, either in the form of an absolute difference or in the form of a ratio. When the lift is negative (as a difference) or less than one (as a ratio), it means that the itemset of the rule is unlikely to occur, i.e. the two products are unlikely to be purchased at the same time.
The goal of association discovery is to find all the associations with at least s% support and c% confidence in the transaction data.
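To make the four measures concrete, the following sketch computes them for one rule on a handful of invented baskets. The function names and the data are assumptions made for this illustration, and lift is computed in its ratio form.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items.issubset(t) for t in transactions) / len(transactions)

def rule_measures(transactions, antecedent, consequent):
    """Measures for the rule antecedent => consequent.

    Assumes the antecedent and the consequent each occur at least once,
    otherwise the divisions below are undefined.
    """
    s = support(transactions, set(antecedent) | set(consequent))
    confidence = s / support(transactions, antecedent)
    expected = support(transactions, consequent)
    return {"support": s, "confidence": confidence,
            "expected_confidence": expected, "lift": confidence / expected}

# Invented toy baskets.
baskets = [
    {"computer", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "printer", "paper"},
    {"paper"},
]
measures = rule_measures(baskets, {"computer"}, {"printer"})
print(measures)
```

Here two of the five baskets contain both items (support 0.4) and two of the three computer baskets also contain a printer (confidence about 0.67). Since printers appear in 60% of all baskets, the lift is slightly above one.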
1. Data format
Two types of format are used to form the data for association discovery:
1. Horizontal format: each entry is a row and each attribute is a column.
2. Vertical format: there is only one column for attributes. Different entries are denoted by different IDs. Attributes belonging to the same entry are assigned the same ID number.
51 Kotala, P. K, Perera, A., Kai Zhou, J., etc., 2001
52 WWW4
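The difference between the two formats can be illustrated with a small conversion sketch. The representation chosen here (a dictionary from entry ID to its items for the horizontal format, and a list of (ID, item) pairs for the vertical format) is an assumption made for the example.

```python
def to_vertical(rows):
    """Horizontal (one row per entry) -> vertical (one (ID, item) pair per line)."""
    return [(entry_id, item) for entry_id, items in rows.items() for item in items]

def to_horizontal(pairs):
    """Vertical -> horizontal: items sharing an ID are collected into one row."""
    rows = {}
    for entry_id, item in pairs:
        rows.setdefault(entry_id, []).append(item)
    return rows

horizontal = {1: ["bread", "milk"], 2: ["milk"]}
vertical = to_vertical(horizontal)
print(vertical)  # [(1, 'bread'), (1, 'milk'), (2, 'milk')]
```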

2. Apriori algorithm
The most frequently used algorithm for association rules is the Apriori algorithm. It uses prior knowledge of itemset properties to explore further associations. The steps are as follows:

Step 1: Set the minimal percentages of support and confidence as s% and c%.
Step 2: Find all the items whose frequency is above the set minimal support.
Step 3: Based on the set of frequent items, generate the associations that reach or exceed the set confidence level.
Step 4: Scan all the items to identify those which have at least s% support. Denote this set as $L_1$.
Step 5: Form item pairs from $L_1$ and denote this candidate set as $C_2$.
Step 6: Scan all the item pairs to find all the pairs in $C_2$ with at least s% support and c% confidence. Denote this set as $L_2$.
Step 7: Iteration: repeat Step 5 and Step 6 until there are no more sets satisfying the constraints.

The general description for Step 5 and Step 6 is:

Build sets of k items from $L_{k-1}$ and denote this candidate set as $C_k$.
Scan all transactions and find all frequent sets in $C_k$ with at least s% support and c% confidence; denote this set as $L_k$.
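The level-wise search can be sketched as follows. Note the assumption made here: the sketch uses the common two-stage formulation, in which the frequent itemsets $L_k$ are found with the support threshold alone and the confidence threshold is applied afterwards when rules are generated. Function names and baskets are invented for the illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets reaching min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        counts = {c: sum(c.issubset(t) for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # L_1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        # C_k: k-item unions of sets in L_(k-1) whose (k-1)-subsets are all frequent.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)  # L_k
        result.update(level)
        k += 1
    return result

def rules(frequent_sets, min_confidence):
    """Split each frequent itemset into antecedent => consequent rules."""
    out = []
    for itemset, s in frequent_sets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = s / frequent_sets[antecedent]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), s, conf))
    return out

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk"}]
freq = apriori(baskets, min_support=0.4)
found = rules(freq, min_confidence=0.7)
print(len(freq), len(found))
```

With these baskets, the pair {milk, butter} misses the support threshold, so the three-item candidate is pruned without scanning the transactions again. This pruning is the "prior knowledge" that gives the algorithm its name.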


Sequential pattern discovery


Sequential pattern methods can be seen as an extension of association rule methods that analyses sequenced data. They extend association by adding time to the transactions: each transaction has a transaction time. Therefore, not only the attributes of each transaction but also the time when the transaction took place should be taken into account. Sequential analysis searches for temporal links between items, rather than relationships between items in a single transaction.53
The sequential pattern method can find the relationship patterns between items or itemsets within a time episode. For example, a typical sequence pattern could be "Six percent of customers who bought a CD player bought a CD within a week."
1. Data format
To start a sequential pattern discovery, each time series is converted into a multi-item entry and duplicated items are deleted. Afterwards, the association rule method can be used. The constraint on sequential patterns is that all of them satisfy the user-specified minimal support.
The sequential data is composed of sequences, or customer sequences. Each sequence is a list of customer orders (transactions). Each transaction contains a set of items. The length of a sequence is the number of itemsets that are contained in it. A sequence of length k is called a k-sequence.
2. Procedure
Sequential pattern discovery can be conducted using the following steps:54
Step 1: Sort phase. Sort the database according to customer id and transaction id.
Step 2: Itemset phase. Find all large sequences of length 1.
Step 3: Transformation phase. Transform each item in the sequence into an integer.
Step 4: Sequence phase. Find all large sequences.
Step 5: Maximal phase. Delete all non-maximal sequences.
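As a partial illustration, the sketch below covers only the sort phase and the support count of one candidate sequence; the itemset, transformation and maximal phases are left out. The record layout (customer id, transaction time, items), the use of the time as sort key and all names are assumptions made for the example.

```python
from collections import defaultdict

def customer_sequences(records):
    """Sort phase: order by customer id, then time, and build one sequence per customer."""
    seqs = defaultdict(list)
    for cust, _time, items in sorted(records, key=lambda r: (r[0], r[1])):
        seqs[cust].append(frozenset(items))  # duplicated items collapse in the set
    return dict(seqs)

def contains(sequence, pattern):
    """True if the itemsets of `pattern` occur in this order within `sequence`."""
    pos = 0
    for itemset in sequence:
        if len(pattern) > pos and frozenset(pattern[pos]).issubset(itemset):
            pos += 1
    return pos == len(pattern)

def sequence_support(seqs, pattern):
    """Share of customers whose sequence contains the pattern."""
    return sum(contains(s, pattern) for s in seqs.values()) / len(seqs)

records = [
    (1, 1, ["cd_player"]), (1, 3, ["cd"]),
    (2, 2, ["cd_player", "cable"]), (2, 5, ["cd", "headphones"]),
    (3, 1, ["cd"]), (3, 4, ["cd_player"]),
    (4, 1, ["cable"]),
]
seqs = customer_sequences(records)
print(sequence_support(seqs, [{"cd_player"}, {"cd"}]))  # 0.5
```

Customer 3 bought both items but in the opposite order, so only customers 1 and 2 support the pattern "CD player, then CD".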
53 Wojciechowski, Marek
54 Han, J. and Kamber, M., 2001, P225-271.

3. XploRe user and customer analysis55

3.1 About XploRe

XploRe is a professional statistical software for high-end statistical analysis, advanced research and interactive teaching. It was developed in 1999 by Prof. Wolfgang Härdle and his team at Humboldt University of Berlin, Germany. XploRe is a modular, command-driven software. The statistical methods of XploRe are supported by various libraries. Therefore, users can incorporate their own methods into XploRe and easily extend the environment. The competitive advantage of XploRe lies in rather advanced methods, particularly smoothing.
The purpose of XploRe lies in the exploration and analysis of data. According to Prof. Härdle (1999), it aims at sophisticated users who are looking for a flexible, programmable statistical package with emphasis on more advanced procedures.
The Internet is currently the main marketing instrument of XploRe. A free trial version of XploRe (with limitations) can be downloaded from the Internet.

3.2 XploRe user (2002) and customer descriptive analysis

3.2.1 Data collection

XploRe user data collection


XploRe users are the XploRe downloaders, i.e. those who have downloaded XploRe from the website. They are the potential customers of XploRe.
The collected raw data of XploRe users consists of 1734 profiles of individuals who downloaded the statistical software XploRe between October 11, 2001 and July 22, 2002. The data was collected through an online survey. A free trial version of XploRe can be downloaded via the homepage http://www.xplore-stat.de.
55 User refers to a person who downloaded XploRe from the Internet, while customer refers to a person who bought XploRe.


All trial versions of XploRe (except for the Linux local version) lack some functions and commands of XploRe, expire after two months, and are limited to 1000 observations. The Linux local version has no expiration date and no limit on the number of observations.

Fig. 3.1: Sample of online survey questionnaire.

Before downloading, users are asked to participate in an online survey.
The online questionnaire mainly has two parts. All questions (except for the e-mail address) are answered by selecting from a set of possible responses.
The first part of the questionnaire is Personal information, which asks about personal identity and preferences. Some questions in this part, such as e-mail address and country, ask for the identity of the downloaders. We call them identity questions. The other kind of questions asks about the preferences of downloaders, such as the way they learnt about XploRe, the place where they use XploRe, the software they currently use, and the statistical methods they look for in XploRe. The answers to these questions are important for revealing the preferences of users and play a prominent role in user analysis. We call these questions substantive questions, because they provide the basic factors needed to subdivide the total user group into small homogeneous groups for our statistical user analysis.
The second part of the questionnaire contains technical questions. The downloaders are asked to choose the preferred version of XploRe56 and the operating system on which XploRe will be installed, such as Windows, Linux, Sun etc. An example questionnaire is attached in the Appendix.
During downloading, the date and IP address are automatically recorded. They are very helpful in the data cleaning procedure.
XploRe Customer data collection
XploRe customers here are those who have actually bought XploRe. I also call them actual customers. The data on XploRe customers is collected through registration forms, which are sent to customers together with XploRe. Returning the registration form is not compulsory. The customer data covers the period from 1 July 2000 to 30 August 2002. Because of a change in the registration form, the data after this date was not used. The new registration form is attached in the Appendix for reference.
The registration form includes questions about the identity of the customer, such as country and language, as well as questions about their fields of work and operating systems.
As a result, we get 8 variables of customer data: country, federal state (Germany), language, title, operating system, profile sector, profile branch and sex.

3.2.2 Data cleaning and preparation

An analysis based on poor-quality or wrong data can deliver erroneous results no matter how sophisticated the statistical method is. Therefore, the raw data were thoroughly cleaned before being used for analysis.
XploRe user data cleaning
When people download XploRe, they obviously want to complete the download process as quickly as possible and answer the questions as promptly as possible. If the questionnaire is too tedious or too complicated, the downloaders may get impatient and give wrong or incomplete answers. In addition, it often happens in surveys that the respondents are not very serious about their answers and do not give actual information.
56 XploRe has three versions: Local version, Java-Client version and ReX, which is an Excel add-in.
To avoid including false information in the data, I used the personal questions as indicators of how seriously the questionnaire was taken and of the likelihood of false answers. Many people gave obviously wrong answers to the personal questions. I assume that people who gave false answers to the personal questions would give false answers to the substantive questions as well. Furthermore, suspicious observations were inspected on the basis of the recorded IP addresses and then deleted according to a set of criteria.
The cleaning process was carried out mainly automatically with the Excel Visual Editor. However, the whole process of data cleaning could hardly be carried out fully automatically. Therefore, manual cleaning was also undertaken to delete the false information that the computer program could not identify, for instance by matching IP addresses and by deleting the profiles of members of the XploRe team. In the end, 1181 profiles remained for analysis after the cleaning.
XploRe customer data cleaning
The cleaning procedure for the customer data is relatively simple. We suppose that the customers know their answers will help XploRe to improve its service and therefore intend to provide correct information. The cleaning process therefore only includes the deletion of duplicated customer information.

3.2.3 Data descriptive analysis and result

In the first step, a descriptive analysis was conducted with XploRe to give an overview of the data.
XploRe User descriptive analysis
From the table in Appendix 1, XploRe user frequency analysis, we can see the frequency and percentage of each variable.
Concerning the sources through which users get to know XploRe, WWW/Newsgroups are the main source: 42.9% of the downloaders first learnt about XploRe through the Internet. The second main source is publications and journals; 20% of users learnt about XploRe through these channels.


49.4% of users work in a university, and 9.1% of users work in a research institute. The users from private, non-research companies account for 6.6%. The interesting point is that a high percentage of users work at home. With 28.9% of the users, this is the second biggest group in this category.
Excel is the most popular software, used by 25.1% of all users. Next are SPSS and MatLab, with 11.2% and 10.4% of users respectively. XploRe is a command-driven software that is competitive in rather advanced statistical methods. Software such as S-Plus and GAUSS is more similar to XploRe in features and scope; their users comprise 5.5% and 4% of the total respectively. This shows that most users are more likely to choose more standard software such as Excel and SPSS, because of the higher programming requirements and the difficulties in using a programmable, matrix-oriented software like XploRe. But the relatively high percentage of MatLab users signals an opportunity for XploRe, because MatLab is also a program-oriented software. There is a chance for XploRe marketing to win this type of customer.
A large share of XploRe users work in the field of Econometrics. The other popular work fields are Mathematical Statistics, Finance and actuarial science, and Physics and engineering. Each accounts for about 10% of users.
The statistical method most often used in the users' work is time series, followed by basic statistics, multivariate methods and linear models. But regarding the methods that the users look for in XploRe, there are some differences. The most wanted statistical methods are time series and multivariate methods, while non- and semiparametric methods and graphics and exploratory data analysis are ranked as the third and fourth most wanted methods, respectively. This difference indicates that the existing statistical software is weak at non- and semiparametric methods and graphic/exploratory methods. Therefore, the users try to find more powerful instruments for these two kinds of methods. XploRe could emphasise its strength in these two kinds of methods and thus expand its customer base.
86.5% of users downloaded the local version of XploRe, 9.3% downloaded the ReX version of XploRe, which is a statistical Microsoft Excel 2000 add-in. Only 4.1% of users downloaded the XploRe Java-Client version.
Windows NT is the dominant platform of the local version, with 84.1% of users. Linux is also relatively popular: 13.2% of users downloaded the XploRe Linux version. For the Client version, Windows NT is still the dominant platform. Linux only accounts for 6.1%. Other platforms account for very small fractions.

Name                Type          Modal Value       Modal Freq.   No. of Values
First Learn         Categorical   WWW, Newsgroup    42.9%         5
Work Place          Categorical   University        49.4%         6
Software            Categorical   Excel             25.1%         17
Work Field          Categorical   Econometrics      24.1%         10
Method Used         Categorical   Time Series       18.7%         12
Method Looked for   Categorical   Time Series       17.3%         12
Xversion            Categorical   Local             86.5%         3
Platform L          Categorical   Windows NT        84.1%         4
Platform C          Categorical   Windows NT        87.8%         4
OS Platform         Categorical   Windows NT        84.2%         4
Country             Categorical   Germany           16.9%         77
Continent           Categorical   Europe            52.7%         4

Tab. 3.1: Summary and description of the variables of the User 22/07/02 data

XploRe users have various national backgrounds. Users from Germany (16.9%), USA (15.7%) and Japan (8.6%) together make up about 41% of the population. More than half of the users, 52.7%, are from Europe, followed by America and Asia-Pacific, with 24.5% and 20.5% respectively. The reason might be that XploRe originates from Germany, so information and marketing are more active in Europe than in other areas.
Since the variables are categorical, we can draw a picture of the typical user of XploRe from the modal values. The modal user of XploRe is someone who is from Germany, works in a university and learnt about XploRe through the Internet. He uses Excel as his main statistical software, and he works in the field of econometrics. Time series are his main analysis method, and he looks for software that performs better in time series methods. He downloads the local version of XploRe, and Windows NT is his platform.

XploRe Customer descriptive analysis


The results of the descriptive analysis of XploRe customers are summarised in the table in Appendix 2.
The customers of XploRe come mostly from Germany; they make up 34.4% of the total customers. Customers from the USA are the second biggest group, with a percentage of 25%, followed by Japanese customers with 9.4%. The customers from Italy make up 6.2% of the total customers. There are also customers from Denmark, France, Norway, The Netherlands, UK, China and Taiwan; each of these countries accounts for 3.1% of the customers. Therefore, Europe is the main customer market of XploRe, followed by America and Asia.

Name            Type          Modal Value          Modal Freq.   Missing value
State           Categorical   Germany              34.4%         3.1%
Federal State   Categorical   Baden-Württemberg    3.1%          84.4%
Sex             Categorical   Man                  21.9%         0.0%
Language        Categorical   English              18.8%         59.4%
Title           Categorical   Prof.                9.4%          78.1%
OS Platform     Categorical   Windows              31.1%         68.8%
Sector          Categorical   Research Institute   34.4%         62.5%
Branch          Categorical   Economics            9.4%          78.1%

Note: 1. Federal state refers to the states of Germany.
2. Federal state has no modal value, because all the values have the same percentage (3.1%).

Tab. 3.2: Summary and description of the variables for customer data
78.1% of XploRe customers are men. Women have a relatively lower percentage, only 21.9%. This is in line with the facts about the XploRe users.
English is the main language used among the customers, followed by German, French and Italian.
The customers of XploRe are highly educated: 21.8% of them hold the title of Prof., Dr. or Prof. Dr.
34.4% of customers work in research institutes. 3.1% of them work in companies.
Windows is the most popular platform. 31.3% of the customers use Windows as their computing platform.
The professional fields in which the customers work are diverse. Econometrics has the highest percentage among them, 9.4%. The other professional fields indicated in the data are statistics, biostatistics, mathematics and computer science.
Because of the high share of missing values, we can only get a vague image of the customer of XploRe. The modal customer of XploRe is someone from Germany who works in a research institute and speaks English. He is likely to have the title of Professor or Dr., works in the field of econometrics and uses Windows as his platform.

3.2.4 Comparing the user and customer of XploRe

From the above analysis, we now have a picture of the users and customers of XploRe. What is the relationship between them? How can users be turned into actual customers? Which marketing instruments should be employed? And how do we stimulate this change? All of these questions should be on the mind of the XploRe marketer.
The results show that the features of XploRe users and customers, such as origin, computing platform and field of work, are quite similar. However, differences exist as well.
Customers from Germany make up a much higher percentage of all customers than users from Germany do of all users. This indicates that Germany is the main market for XploRe.
The XploRe users come mainly from universities, but the customers are mainly from research institutes and companies, especially from research institutes, which account for 34% of purchasers. This indicates that research institutes and private companies could be an active target market that could provide a higher turnover than other markets. Therefore, further work should be carried out to determine the needs of customers from research institutes and private companies, and the marketing instruments to which they are sensitive.
Due to the deficiencies of the customer data, further analysis is restricted. Thus, building a high-quality customer database should be placed at the top of the agenda of the XploRe marketer.

3.2.5 Measures of Improvement

To improve the situation, high-quality customer information should be collected, because only through the analysis of actual customer behaviour is the marketer able to understand the customers and thus determine the right action to reach them.
One possible way could be to reform the registration form, to deliver questionnaires together with the products, or to conduct follow-up surveys via telephone etc. Certain promotion measures, such as rewards and discounts, should be offered at the same time to encourage feedback from the customers.

User                                  Customer
Country                               Country
  Germany                   16.9%       Germany              34.4%
  USA                       15.7%       USA                  25.0%
  Japan                      8.6%       Japan                 9.4%
Work Place                            Sector
  University                49.4%       Research Institute   34.4%
  Home                      28.9%       Company               3.1%
  Research Institute         9.1%       Missing Value        62.5%
  Private Company            6.5%
Work Field                            Branch
  Econometrics              24.1%       Economics             9.4%
  Statistics                11.9%       Missing Value        78.1%
  Finance & Actuarial Sc.   11.0%
Platform                              Operation System
  Windows                   84.2%       Windows              31.3%
                                        Missing              68.8%

Tab. 3.3: Comparison of XploRe's Users and Customers

3.3 Cluster analysis for XploRe user data 2002

3.3.1 Cluster analysis of categorical data

Cluster analysis organises data by abstracting underlying structure either as a grouping of individuals or as a hierarchy of groups. The representation can then be investigated to see if the data group according to preconceived ideas or to suggest new experiments.57
Cluster analysis groups the data into disjoint sub-clusters, so that the within-cluster similarity is maximised and the between-cluster similarity is minimised. Therefore, the individuals inside one cluster are very similar, while the individuals in different clusters are very different. The similarity or dissimilarity of observations is measured by distance functions. The distance between two observations is calculated according to the selected distance function. The clustering strategy determines how the groups are built up on the basis of the distance.
57 Anil K. J., Richard C. Dubes, 1988

The characteristics of categorical data


Clustering of categorical data has attracted some attention recently. Because of the nature of categorical data, clustering it is a challenge. The features of categorical data can be summarised as follows:58 59
• The distance functions of categorical attributes60 are not naturally defined. It is not easy to reason that one colour is similar or dissimilar to another colour in the way one can with real numbers.
• Categorical data have no single ordering. They can be ordered in several ways, but none is apparently better than the others.
• Categorical data can be visualised with a special ordering.
• Categorical data have no a priori structure to work with.61
• Categorical data can be mapped onto special numbers. For instance, the Hamming distance or the Euclidean distance can then describe their proximities.
• The categorical domain62 typically has a small number of attribute values. A categorical attribute with a large domain size normally does not contain useful information for grouping tuples63 into classes.
The data that we analyse here are categorical. Therefore, the unique features of and problems with categorical data, such as the size of the categorical domains, should be handled carefully.
58 Andritsos, 2002, P21
59 Ganti, Gehrke, Ramakrishnan, 1999, P1
60 Attributes whose domains are totally ordered are numeric; those whose domains are not ordered are categorical.
61 Gibson, Kleinberg, Raghavan, 1998
62 Domain is the set of distinct values that the data objects may assume.
63 Tuples are the considered sets of categorical values.

Procedure for clustering categorical data


Clustering of categorical data can be carried out following the same procedure as the clustering of numerical data. The procedure includes the following steps:64
Data Collection: Collect and carefully extract the relevant data objects from the data sources. Data objects are distinguished by their individual values for a set of attributes.
Initial Screening: This step is also referred to as Data Cleaning, which is closely related to Data Warehousing.65 In this stage, the noisy data is deleted.
Representation: This step includes the proper preparation of the data to make it suitable for the clustering algorithm, such as the examination of the characteristics and dimensionality of the data and the choice of similarity measure.
Clustering Tendency: Check whether the data has a natural tendency to form clusters. This step is often omitted, especially for large datasets.
Clustering Strategy: Choose the clustering algorithm and/or initial parameters.
Validation: Validation is often carried out by manual examination and visual techniques. As the amount of data and the number of dimensions grow, however, there are as yet no means to compare the result with presumed ideas or other clusterings.
Interpretation: Draw conclusions and make suggestions for further analysis based on the result.
There are a number of problems in conducting cluster analysis. One has to decide on the distance function, the partition criterion and the optimisation strategy.
Measure of similarity and distance for nominal objects
1. Simple matching
The most common way to measure the similarity and dissimilarity between two
nominal objects i, j is to use the simple matching approach:66 67 68
64 Periklis Andritsos, 2002, P2
65 Jarke, M. and Lenzerini, M., 1999
66 Sokal and Michener, 1958
67 Kaufman and Rousseeuw, 1990, P19.
68 Andritsos, 2002, P6

Similarity: $s(i, j) = u/p$
Dissimilarity: $d(i, j) = (p - u)/p = 1 - s(i, j)$
u is the number of matches, that is, the number of variables for which i and j take the same value. p is the total number of variables.69
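A minimal sketch of the simple matching approach is given below. The two profiles are invented for the illustration, and the use of `None` to mark a missing value is an assumption of the example; missing values are handled by restricting p to the variables available in both objects.

```python
def simple_matching(i, j):
    """Return (similarity, dissimilarity) of two nominal objects.

    `None` marks a missing value; p counts only the variables
    that are available in both i and j.
    """
    pairs = [(a, b) for a, b in zip(i, j) if a is not None and b is not None]
    p = len(pairs)
    u = sum(a == b for a, b in pairs)  # number of matches
    s = u / p
    return s, 1 - s

user_a = ("University", "Excel", "Econometrics", "Windows NT")
user_b = ("University", "SPSS", "Econometrics", "Linux")
print(simple_matching(user_a, user_b))  # (0.5, 0.5)
```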
2. Hamming distance
The Hamming distance can be defined as follows: Let x and y be two binary sequences of the same length. The Hamming distance between these two codes is the number of symbols in which they disagree.70
In other words, it is the number of bits which differ between two binary strings. More formally, the distance between two strings A and B is $\sum_i |A_i - B_i|$. The Hamming distance can be interpreted as the number of bits which need to be changed (corrupted) to turn one string into the other.71
The Hamming distance simply adds up the number of attributes that differ between two observations. Consider the following two tuples:
(a) {1, 1, 0, 1, 0}
(b) {0, 1, 1, 1, 1}
The attributes of (a) and (b) are matched one by one: if an attribute of (a) and (b) is the same, the distance between the two attributes is 0, otherwise the distance is 1. The Hamming distance of the two tuples is then calculated as:
1 + 0 + 1 + 0 + 1 = 3
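The calculation for the tuples (a) and (b) can be reproduced with a short function. The name `hamming` is chosen here for illustration, and the same function also works for tuples of category labels.

```python
def hamming(a, b):
    """Number of positions in which two equally long tuples disagree."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return sum(x != y for x, y in zip(a, b))

a = (1, 1, 0, 1, 0)
b = (0, 1, 1, 1, 1)
print(hamming(a, b))  # 3
```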
3. Euclidean distance
The Euclidean distance can be used as the distance function not only for numerical data, but also for categorical data.72
The Euclidean distance function can be written as:
$d(x, y) = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}$
Taking the two tuples (a) and (b) above as an example, the Euclidean distance between (a) and (b) is:
$(1^2 + 0^2 + 1^2 + 0^2 + 1^2)^{1/2} = \sqrt{3} \approx 1.73$
69 In the situation with missing values, p is the number of variables that are available in both i and j.
70 WWW1
71 WWW2
72 Härdle W., Simar L., 2000, P298
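The same example in code, as a small sketch with an illustrative function name. For tuples coded with 0 and 1, each squared difference is either 0 or 1, so the squared Euclidean distance equals the number of disagreeing attributes, i.e. the Hamming distance of 3 obtained above.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equally long numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

a = (1, 1, 0, 1, 0)
b = (0, 1, 1, 1, 1)
d = euclidean(a, b)
print(round(d, 2))  # 1.73
```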

Clustering criterion for categorical data


1. New Condorcet Criterion (NCC)
The New Condorcet Criterion is useful for categorical attributes.73 The NCC is based on the Condorcet Criterion, which was initially inspired by Condorcet's solution in 1785 to ranking the votes in an election.
The Condorcet Criterion states that if an alternative or a candidate is ranked ahead of all other alternatives by an absolute majority of votes, it should be declared the winner.74 P. Michaud and J.F. Marcotorchino applied Condorcet's solution to data analysis in 1982.75 The New Condorcet Criterion, which was developed by Michaud, is utilised in many applications. The clustering algorithm of IBM Intelligent Miner, for example, is based on the NCC.
The distance between two observations defined by the NCC is a modified Hamming distance. If the values of an attribute differ between two observations (i.e., disagree), the distance is 1; if they are the same (agree), the distance is 0. The NCC combines intraclass agreement and interclass disagreement for a given partition to reach a higher value of the criterion function. Intraclass distance is the within-group distance, while interclass distance is the between-group distance.
The goodness criterion for a given partition P is:76
$NCC(P) = \sum_{k=1}^{p} \sum_{i \in L_k} \left( \sum_{j \in L_k,\, i \neq j} (m - d_{ij}) + \sum_{j \notin L_k} d_{ij} \right)$
where m is the number of attributes of an observation. As discussed above, for the Hamming distance of categorical data, the distance $d_{ij}$ between two observations i and j is the number of disagreements between them, that is, the number of variables that take different values. The term $m - d_{ij}$ is then the number of agreements between the observations, which measures the similarity of the two observations. The part inside the parentheses sums up the similarity of i to the observations within the same cluster and the dissimilarity of i to the observations in different clusters. Of the two sums on the left, the inner sum adds up these terms for all observations i inside a cluster $L_k$, and the leftmost sum adds them up over all p clusters of the partition P.
73 See Gupta, Sambasiva Rao and Bhatnagar
74 Truchon, Michel, 1998
75 Grabmeier, J., and Rudolph, A., P336
76 Michaud, Pierre, 1987.
Therefore, the NCC is actually a sum of similarities and dissimilarities. NCC(P) is a global criterion for the goodness of a particular partition, which sums up all the intercluster disagreements and all the intracluster agreements. Obviously, the higher the NCC is, the better the partition is. The criterion ensures that the partition has high similarity within clusters and high dissimilarity between clusters.
The best partition is decided by ranking the different partitions by their NCC values. In this way, we can see the application of the Condorcet Criterion as follows: consider the sum of intracluster agreements and intercluster disagreements as the total votes, and each given partition as a candidate; the partition with the highest value among all partitions is then just like the candidate who has the most votes. Thus, the partition with the highest NCC value is the best partition and should win the election.
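The criterion can be evaluated directly from its definition. The sketch below is a naive implementation for illustration only: the records, the labels and the function names are invented, and every ordered pair (i, j) is visited, as in the double sum of the formula.

```python
def hamming(a, b):
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))

def ncc(records, labels):
    """New Condorcet Criterion of the partition given by `labels`."""
    m = len(records[0])  # number of attributes
    total = 0
    for i, (ri, li) in enumerate(zip(records, labels)):
        for j, (rj, lj) in enumerate(zip(records, labels)):
            if i == j:
                continue
            d = hamming(ri, rj)
            # agreements inside a cluster, disagreements between clusters
            total += (m - d) if li == lj else d
    return total

records = [("Uni", "Excel"), ("Uni", "Excel"), ("Home", "SPSS"), ("Home", "SPSS")]
for labels in ([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]):
    print(labels, ncc(records, labels))
```

The partition that keeps identical records together scores 24, while the mixed partition and the single cluster both score 8, so the first partition "wins the election".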
2. Ward Linkage
Ward linkage was proposed by Ward (1963). It merges the groups whose fusion leads to the smallest increase in information loss. This clustering algorithm seeks to form the partition which minimises the loss associated with each grouping, and to quantify that loss in a form that is readily interpretable.77
Ward defined information loss in terms of an error sum-of-squares criterion, ESS. It can be expressed as the total sum of the squared distances from each point to its cluster centroid:
$ESS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
where $\bar{x}$ denotes the centroid (mean) of the cluster to which $x_i$ belongs.
An increasing information loss is indicative of increasing heterogeneity. Therefore, the Ward procedure fuses the two groups whose merger gives the minimal increase in heterogeneity. The aim of the Ward procedure is to unify groups without dramatically increasing the variation inside the resulting group, and thus to reach the most homogeneous partition.
77 Everitt, 1993, P65-66


The inertia inside a group is used as the measure of heterogeneity.78


IR =

nR
1 X
d2 (xi , xR )
nR i=1

xR is the centre of gravity (mean) of the groups. IR , therefore, presents a scalar


measure of the dispersion of the groups around its centre of gravity. If the Euclidean distance is applied, IR is the sum of the variances of the p components of
xi inside the group R.
When two elements or groups are joined together, the inertia of the new group will
increase. The increase of inertia is calculated according to the following formula:
$$\Delta(P, Q) = \frac{n_P n_Q}{n_P + n_Q} d^2(P, Q)$$

The Ward algorithm joins the two groups that give the smallest increase $\Delta(P, Q)$.
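The merging criterion can be checked numerically with a short Python sketch. It assumes that d(P, Q) is the Euclidean distance between the two group centroids and that the increase is measured on the unnormalised error sum of squares; the two groups P and Q are hypothetical toy data.

```python
def centroid(points):
    """Centre of gravity (component-wise mean) of a group of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ess(points):
    """Error sum of squares: squared distances of all points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def ward_increase(P, Q):
    """Delta(P, Q) = n_P * n_Q / (n_P + n_Q) * d^2(centroid_P, centroid_Q)."""
    n_p, n_q = len(P), len(Q)
    return n_p * n_q / (n_p + n_q) * sq_dist(centroid(P), centroid(Q))

# Hypothetical groups in two dimensions.
P = [[0.0, 0.0], [2.0, 0.0]]
Q = [[5.0, 4.0], [5.0, 6.0], [8.0, 5.0]]

delta = ward_increase(P, Q)
growth = ess(P + Q) - ess(P) - ess(Q)  # actual increase in ESS after merging
print(delta, growth)
```

Under these assumptions both quantities coincide, so the pair with the smallest value is the pair whose fusion increases the within-group variation least.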

3.3.2 Clustering with IBM Intelligent Miner

Data mining software IBM Intelligent Miner


We analyse the downloaded data of the XploRe users with the data mining software IBM Intelligent Miner.79
We chose IBM Intelligent Miner for this analysis for several reasons.
IBM Intelligent Miner employs the New Condorcet Criterion (NCC), proposed by Michaud, as its clustering criterion; it is typically used in the analysis of categorical data. Because the XploRe user data contains a set of discrete (categorical) variables, IBM Intelligent Miner is suitable for this analysis.
The visualisation tool of IBM Intelligent Miner eases the interpretation of the clusters, which is a common difficulty in cluster analysis. It helps to characterise the features of each cluster and therefore leads to a meaningful customer analysis.
78 Härdle, W. and Simar, L., 2000, pp. 307-308.
79 Sofyan, H. and Werwatz, A., 2001.


The algorithms of Intelligent Miner achieve high computational efficiency.80
IBM Intelligent Miner is freely available to academic institutions, which provides a financial motivation.

Distance measure and clustering Strategy


For categorical data, IBM Intelligent Miner adopts the New Condorcet Criterion and a modified Hamming distance for its clustering function.
In addition, to find the optimal partition, IBM Intelligent Miner employs the Demographic Clustering Algorithm. This strategy considers the observations one by one and decides whether to put each into an already existing cluster or to use it as the starting point of a new cluster. If there are already N clusters, N + 1 evaluations are needed for each observation. This process is repeated in order to decrease the risk of putting an element into the wrong cluster.
The Demographic Clustering Algorithm does not require the user to define the number of clusters before the analysis. However, IBM Intelligent Miner allows the analyst to constrain the number of clusters in order to reach an optimal result.
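The one-by-one assignment can be sketched in Python as follows. This is not the implementation of IBM Intelligent Miner: the pairwise score (agreements minus disagreements), the score of zero for opening a new cluster, the function names and the toy records are all assumptions made for illustration.

```python
def pair_score(a, b):
    """Agreements minus disagreements between two categorical records."""
    agree = sum(x == y for x, y in zip(a, b))
    return agree - (len(a) - agree)

def assign_pass(records, labels, max_clusters):
    """One pass: place each record in the best existing cluster or a new one."""
    for i, rec in enumerate(records):
        labels[i] = None  # take the record out before re-evaluating it
        used = sorted({l for l in labels if l is not None})
        # N evaluations: one score per existing cluster.
        scores = {
            c: sum(pair_score(rec, records[j]) for j, l in enumerate(labels) if l == c)
            for c in used
        }
        # The (N + 1)-th evaluation: opening a new cluster, if still allowed.
        if len(used) < max_clusters:
            scores[max(used, default=-1) + 1] = 0
        labels[i] = max(scores, key=scores.get)
    return labels

def demographic_clustering(records, max_clusters=9, passes=2):
    """Repeat the pass to reduce the risk of early misplacements."""
    labels = [None] * len(records)
    for _ in range(passes):
        labels = assign_pass(records, labels, max_clusters)
    return labels

# Hypothetical records: (first_learn, work_place, platform)
records = [
    ("internet", "university", "windows"),
    ("internet", "university", "windows"),
    ("friends", "home", "linux"),
    ("friends", "home", "linux"),
    ("internet", "university", "linux"),
]
labels = demographic_clustering(records, max_clusters=4)
print(labels)
```

The parameter `max_clusters` plays the role of the constraint on the number of clusters mentioned above; the number of clusters actually formed may be smaller.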
Results
We conduct the cluster analysis with the aim of identifying customer groups that are meaningful and can be targeted in marketing.
The previous analysis81 has indicated that the optimal partition of the data is not necessarily the one with the highest statistical goodness value if that partition has no meaningful characteristics for marketing. According to the segmentation criteria, the chosen partition should have a relatively high statistical goodness value and at the same time deliver a handful of groups that can be handled and targeted by the marketer. In addition, the targeted groups should be sensitive to, and within the reach of, the marketing instruments. Therefore, I considered the candidate segmentations carefully to decide whether each partition is meaningful for the development of a marketing strategy for the target markets.
80 Computational efficiency refers to the amount of time (and computer memory) used by a software or an algorithm to perform the required calculations in order to produce the desirable results.
81 Sofyan, Werwatz, 2001.

To initialise the clustering algorithm, two inputs were given at the start: the variables of the analysis and the maximum number of clusters for the partition. To find a suitable partition, a heuristic backward selection strategy was adopted, which started with a relatively large number of variables and a large maximum number of clusters. I tried various combinations of variables and maximum numbers of clusters in order to locate a manageable and meaningful partition. With reference to the results of the previous analysis by Sofyan and Werwatz, I conducted the analysis starting with a maximum of six clusters, then five, four and three.
The final chosen segmentation has five variables and four clusters. The five variables are Work Field, Work Place, Resource of First learn, XploRe version and OS platform. With these five variables, the four-cluster segmentation achieves a relatively high NCC value (0.6002)82 and a good interpretation of the data compared with the other segmentations. As mentioned before, the final chosen segmentation not only achieves a high statistical value but also delivers a rational description of the data.
The final segmentation is presented in Figure 3.2 and Table 3.4. The Figure shows the visual result of the clustering. The Table presents the details of each cluster.

Fig. 3.2: Clustering of Users 2002.


82 The NCC value in IBM Intelligent Miner is called Global Condorcet Value.

The Figure displays four rows; each row represents one of the four clusters identified by the mining run. The number at the left end indicates the share of each cluster in the whole sample. The pie charts represent the active variables used in the clustering. The importance of a variable in forming the cluster is indicated by the position of its pie chart from left to right: the further left the pie chart, the higher the influence of the variable on the cluster formation. Each pie chart is composed of two rings. The inner ring shows the distribution within the associated cluster, while the outer ring represents the distribution in the entire sample.
The first cluster contains 39% of total users. To understand the information that the pie charts deliver, first take a look at the leftmost pie in the first row. This pie shows the distribution of the variable First learn in the first cluster (inner pie) and in the whole sample (outer ring). 100% of users in the first cluster have a value of 1 for the variable First learn, which is the numerical code for Internet; that is, all XploRe users of Cluster 1 got to know XploRe through the Internet. In the outer ring (the corresponding segment with the same colour), the same category has a smaller share of the whole sample, 42.93%. Users with other information resources (publications, friends and conferences) make up the rest.
In Cluster 4, First learn is represented by the third pie from the left. This means that the variable is less influential in forming Cluster 4. The distribution of the variable First learn in Cluster 4 (inner pie) is very similar to the distribution in the whole sample (outer ring). In contrast, Platform is the most influential variable for Cluster 4 and is presented by the leftmost pie in the fourth row.
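The two rings of a pie chart can be retraced with a few lines of Python. The sketch below only illustrates the comparison of the distribution within a cluster (inner ring) with the distribution in the whole sample (outer ring); the data, the cluster labels and the function name are hypothetical and are not the thesis data.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each category, as shown by one ring of a pie chart."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical toy data: First learn for all users and a cluster label per user.
first_learn = ["internet", "internet", "friends", "publication", "internet", "friends"]
cluster = [1, 1, 2, 2, 1, 3]

outer_ring = distribution(first_learn)  # whole sample
inner_ring = distribution([v for v, c in zip(first_learn, cluster) if c == 1])  # cluster 1
print(inner_ring, outer_ring)
```

A category that fills the whole inner ring but only part of the outer ring, as in this toy cluster 1, marks a variable that strongly characterises the cluster.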
The characteristics of each cluster are summarised in Table 3.4. The first two clusters are relatively big, with 36% and 30% of all observations; their Cluster Condorcet Values83 are 0.6339 and 0.6119 respectively. The third and fourth clusters are rather small, with 20% and 14% of all observations, and their Cluster Condorcet Values are 0.5205 and 0.4904.
The Table also shows the detailed distribution of each variable in each cluster, for instance the modal values of the respective variable with their frequencies within the cluster, and the Chi-squared value.84
83 Cluster Condorcet Value is the standardised measure of agreement among the observations within a cluster.
84 Chi-squared shown in the Table indicates to what extent the intracluster distribution differs from that of the whole sample. The closer it is to 1, the greater the difference between

Cluster: Internet Surfer (Similarity: 0.6339, Size (abs.): 420, Size (rel.): 35.56%)
Variable      χ²     Attributes                   Freq.
First learn   0.33   WWW, Newsgroup               100%
Fieldwork     0.01   Econometrics                 18%
Work Place    0.00   University                   45%
Xversion      0.00   Local                        87%
Platform      0.05   Windows                      98%

Cluster: Home worker (Similarity: 0.5205, Size (abs.): 236, Size (rel.): 19.98%)
Variable      χ²     Attributes                   Freq.
First learn   0.24   Other resources              44%
                     Publications, Journals       31%
Fieldwork            Others                       19%
                     Finance & Actuarial Sc.      17%
Work Place    0.23   At Home                      68%
Xversion      0.02   Local                        81%
Platform      0.05   Windows                      98%

Cluster: Academia (Similarity: 0.6119, Size (abs.): 359, Size (rel.): 30.40%)
Variable      χ²     Attributes                   Freq.
First learn   0.22   Friends, Colleagues          42%
Fieldwork     0.03   Econometrics                 43%
Work Place    0.12   University                   87%
Xversion      0.01   Local                        88%
Platform      0.05   Windows                      98%

Cluster: Linux User (Similarity: 0.4904, Size (abs.): 166, Size (rel.): 14.06%)
Variable      χ²     Attributes                   Freq.
First learn   0.02   WWW, Newsgroup               52%
Fieldwork     0.04   Others                       25%
                     Biometrics & Biostatistics   19%
Work Place    0.01   University                   49%
Xversion      0.01   Local                        90%
Platform      0.84   Linux                        88%

Tab. 3.4: Characteristics of User IBM Intelligent Miner Clusters (2002)


The characteristics of each cluster


Cluster 1 is dominated by the value WWW/Newsgroup of the variable First learn. Therefore, I refer to the segment of Cluster 1 as Internet surfer. Users in this group are more likely to look for information through the Internet. They download the local version of XploRe and use Windows as platform. They work in a wide spread of fields. Their work places are distributed similarly to those of the whole sample, mainly at a university and at home.
Cluster 2 and Cluster 3 are determined by two dominant variables, First learn and Work Place. Users from Cluster 2 work mainly at a university (87%) and their main information resource is Friends/Colleagues (42%). Users from Cluster 3 work mainly at home (68%) and first learnt about XploRe through some unidentified resources (44%). Thus, Cluster 2 is Academia, whose members work at a university, and Cluster 3 is Home worker. Academics and home workers also mainly download the local version of XploRe and use Windows as platform. But the academics work mainly in the field of Econometrics, while the home workers work in different fields, with Finance and actuarial science having a relatively high share of 17%.
XploRe users from Cluster 4 are Linux users, who are characterised by the variable Platform: 88% of them use Linux as platform. Linux users prefer to download the local version of XploRe as well. They work in a wide variety of fields and places and get information from different resources, among which the Internet has a relatively dominant position with 52%.
Other characteristics of the clusters
Because the purpose of the clustering is to find the features of the customer groups and to help determine the marketing strategy, some interesting, though not dominant, features of the subgroups also deserve attention.
Inspired by the findings of the descriptive analysis of XploRe customers, the work place of the user groups was analysed. The interesting point is that the Linux user group has a higher percentage of users who work at research institutes than the whole sample. Given that a high percentage of customers come from research institutes, the Linux user group should be given more attention.
cluster and whole sample. For more details, see Grabmeier & Rudolph (2002).

3.3.3 Cluster analysis with XploRe

The methodology: Distance measure and clustering strategy


Since XploRe has functions for cluster analysis as well, we wrote a program in the XploRe language and conducted a cluster analysis of the user data. Details of the program are presented in Appendix 3.
The purpose of this analysis is to compare the outcomes. We want to know whether the results of IBM Intelligent Miner and XploRe are the same and, if not, to what extent the results of the two methods differ.
There are some basic differences between the method employed in our clustering programme and that of IBM Intelligent Miner. As with NCC, the distance between two elements on a variable is 1 if they take different values. But we adopt the Euclidean distance instead of the Hamming distance to calculate the distances between elements (see Section 3.3.1.3). Furthermore, in our procedure the Ward linkage is then employed to build up the clusters.
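The difference between the two distance measures can be illustrated in Python. The sketch assumes a plain 0/1 mismatch coding per variable; whether the XploRe program in Appendix 3 codes the data in exactly this way is an assumption, and the two records are hypothetical.

```python
import math

def mismatches(a, b):
    """Per-variable distance: 1 if two records take different values, else 0."""
    return [0 if x == y else 1 for x, y in zip(a, b)]

def hamming(a, b):
    """Number of variables on which the two records differ."""
    return sum(mismatches(a, b))

def euclidean(a, b):
    """Euclidean distance computed on the 0/1 mismatch coding."""
    return math.sqrt(sum(m ** 2 for m in mismatches(a, b)))

# Hypothetical records: (first_learn, work_place, work_field, xversion, platform)
u = ("internet", "university", "econometrics", "local", "windows")
v = ("friends", "home", "econometrics", "local", "linux")

print(hamming(u, v), euclidean(u, v))
```

Under this coding the squared Euclidean distance equals the Hamming distance, so both measures order pairs of users in the same way even though the distance values differ.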
Analysis and result
To conduct the cluster analysis with XploRe, the number of clusters in the result and the determinant variables were again defined first as input. I adopted the five variables chosen on the basis of the IBM Intelligent Miner result (First learn, Work Place, Work Field, Xversion and Platform) and set the number of final clusters to four. A descriptive analysis then followed for each cluster, so that the characteristics of each group could be observed.
After analysing the results carefully, I found that XploRe delivers a result similar to that of IBM Intelligent Miner. The data was grouped into four clusters, which were again characterised as Internet surfer, Home worker, Academia and Linux user.
Appendix 4 presents the details of the variable distribution of each cluster.
1. A general description of each cluster
Cluster 1
652 users are grouped into Cluster 1, which is 55.2% of total users.
39.3% of the users in this group got to know XploRe through the Internet; publications and friends account for 15.3% and 14.1% of the users respectively. A small number of users in this group, 3.2%, use conferences as information resource.
For the variable Work Place, the users who work at home make up 44.2% of the users in this group. Those who work at a university are 20.2% of the cluster users.
The most popular software in this group is Excel, with 31.7% of users. SPSS and MatLab follow, with 10.3% and 9.5% of users respectively.
Econometrics is the main working field for Cluster 1 (23.8%). 14% and 11.8% of users in this cluster work in Finance and actuarial science and in Statistics respectively.
18.9% of Cluster 1 users use Time series methods and 16.6% use Basic statistical methods. Users of Multivariate methods and Linear models account for 12.9% and 11.2% of users respectively.
Time series is also the most sought-after method: 18.1% of Cluster 1 users seek better performance in time series methods in XploRe. Multivariate methods are in second place, with 12.7% of users. With 11.5% and 9% of users respectively, Graphics and exploratory data analysis and Non- and semiparametric methods are the third and fourth methods that Cluster 1 users look for. Basic statistical methods have a much lower percentage than in usage, only 9%.
The local version of XploRe is the dominant downloaded version (77.1%). Windows is the dominant platform (96.5%). The users are mainly from Europe (54.3%).
Cluster 2
15.6% of total users (184) make up Cluster 2.
About half of the users in Cluster 2 (51.6%) got to know XploRe through the Internet. They use other resources as well: 13.6% use publications as information resource, 10.9% conferences, and only 8.7% get information from friends.
Users from Cluster 2 work mainly at a university (58.2%) and at home (28.8%). Users from research institutes and private companies have much smaller shares, 8.7% and 2.7% respectively.
The most popular software packages for Cluster 2 users are Excel (16.3%), R (15.2%) and SPSS (13.6%). MatLab and S/S-Plus follow with 7.6% and 6% of users.
Cluster 2 users have no dominant Work Field: 19% of them work in Biometrics or Biostatistics. Econometrics and Physics and engineering each account for 13.6% of users. The fields of Social science and Statistics have relatively high percentages, 12% and 10.9% respectively.
Many users in Cluster 2 apply Basic statistics (19.6%). There are also relatively high percentages of users who use Time series (16.3%), Multivariate methods (15.8%) and Graphics and exploratory data analysis (12%). Linear models and Non- and semiparametric methods have relatively lower percentages, 8.2% and 7.1% respectively.
The methods that Cluster 2 users look for are again different from what they use. 20.1% of them look for Multivariate methods in XploRe. Graphics and exploratory data analysis, Time series and Basic statistics each account for 14.7% of users. Users looking for Non-parametric methods make up 10.9% of cluster users.
The local version of XploRe is the dominant version among the Cluster 2 users, with 94.6%.
The dominant platform in this cluster is Linux: 78.3% of cluster users use Linux as platform.
The users of this cluster are mainly from Europe (51.6%).
Cluster 3
13.1% of total users (155) belong to Cluster 3.
In Cluster 3, 99.4% of users get XploRe information from the Internet. All of them work at a university.
19.4% of Cluster 3 users use Excel. MatLab and SPSS users make up 16.1% and 13.5% of them.
Cluster 3 users mainly work in Econometrics (20%). Biometrics or Biostatistics, Physics and engineering, and Mathematical statistics are also main working fields, with 15.5%, 14.2% and 12.3% of users respectively.
19.4% and 17.4% of Cluster 3 users apply Time series and Multivariate methods respectively. Basic statistics methods are used by 12.3% of cluster users. Users of Graphics and exploratory data analysis make up only 8.4% of cluster users.
Graphics and exploratory data analysis and Time series are the two most wanted methods for users in this cluster (16.8% each). 14.8% of Cluster 3 users search for Multivariate methods in XploRe. Another 11.6% of the cluster users seek software that performs better in Non- and semiparametric methods.
All Cluster 3 users download the local version of XploRe and use Windows as platform. They come mainly from Europe (45.2%).
Cluster 4
190 XploRe users make up Cluster 4. This cluster contains 16% of total XploRe users.
Friends and Publications are the two main resources for Cluster 4 users (51.1% and 47.9% respectively).
100% of them work at a university.
Cluster 4 users use different software. The main packages they use are Excel (15.8%), MatLab (11.6%), E-views (10.5%) and SPSS (10%).
Econometrics is the main field of work for Cluster 4 users (40.5%). 14.7% of the cluster users work in Mathematical statistics.
Cluster 4 users mainly apply Time series methods (20%). 13.7% and 12.1% of them apply Multivariate methods and Non- and semiparametric methods respectively. Users of Basic statistics and of Linear models each make up 11.6% of cluster users.
Non- and semiparametric methods are the most wanted methods of Cluster 4 users. They also look for Time series (17.4%), Multivariate methods (11.6%) and Basic statistics (10%).
Cluster 4 users are mainly from Europe (54.2%), use Windows as platform (98.9%) and download the local version of XploRe (100%).
2. The modal user of each cluster
Based on the general description of each cluster, I identified the modal user and the main characteristics of each cluster.
Cluster 1 - Home worker
The user of Cluster 1 is a home worker, who works at home in Europe in the field of Econometrics. He gets information mainly from the Internet. Excel is the software he mainly uses. He uses Time series and Basic statistics methods, but looks for better performance in Time series, Multivariate methods and Graphics and exploratory data analysis. His platform is Windows and he downloads the local version of XploRe.
Cluster 2 - Linux user
The Cluster 2 user is a Linux user, who comes from Europe and works at a university. His professional field is Biometrics and Biostatistics. The Internet is his main information resource. The software he mainly uses at present is Excel and R. He normally applies Basic statistics and searches for Multivariate methods. Linux is his operating platform. He downloads the local version of XploRe.
Cluster 3 - Internet surfer
The Internet surfer is the user from Cluster 3. He works at a university in Europe in the field of Econometrics. Excel is the software he mainly uses. He applies Time series and Multivariate methods, but seeks better software for Graphics and exploratory data analysis and Time series. He uses Windows as platform and downloads the local version of XploRe.
Cluster 4 - Academia
Users in Academia make up Cluster 4. The modal user works at a university in Europe and conducts research in Econometrics. He mainly uses Excel and applies Time series methods, but the methods he searches for are Non- and semiparametric methods. Windows is his platform, and he downloads the local version of XploRe.

3.3.4 Comparison of Cluster Analysis Results: IBM Intelligent Miner versus XploRe

The Table below shows that the results of the two software packages are very similar. The main difference lies in the Home worker cluster. Because the group sizes differ between the two packages, it is understandable that the sub-characteristics differ as well. Overall, we can conclude that the two methods deliver similar outcomes, although some minor differences exist.
However, as a mining tool, IBM Intelligent Miner performed better in visualisation and reached a higher computational efficiency compared with XploRe. Therefore, I chose IBM Intelligent Miner to conduct the further cluster analysis.

3.4 Analysis of the latest User data (2003)84

3.4.1 Results of analysis of 2003 data

Descriptive analysis
In order to gain a view of the development of the users and the market, an analysis was undertaken for the latest user data. The raw latest user data contains 2593 records (11 October 2001 - 13 March 2003). After data cleaning and preparation, the final data has 1945 items.
84 I conducted the analysis of user data 2002 (11 Oct. 2001 - 22 July 2002) with the purpose of comparing it with the customer data, which was collected in a similar period (1 July 2000 - 30

                                    IBM Intelligent Miner                  XploRe

Cluster: Internet Surfer            Size 35.56%                            Size 13.1%
Variable      Attributes                   Freq.      Attributes                   Freq.
First learn   WWW, Newsgroup               100%       WWW, Newsgroups              99.4%
Fieldwork     Econometrics                 18%        Econometrics                 20.0%
Work Place    University                   45%        University                   100%
Xversion      Local                        87%        Local                        100%
Platform      Windows                      98%        Windows                      100%

Cluster: Home worker                Size 19.98%                            Size 55.2%
Variable      Attributes                   Freq.      Attributes                   Freq.
First learn   Other resources              44%        WWW, Newsgroups              39.3%
              Publications, Journals       31%        Others                       28.1%
Fieldwork     Others                       19%        Econometrics                 23.8%
              Finance & Actuarial Sc.      17%
Work Place    At Home                      68%        At Home                      44.2%
Xversion      Local                        81%        Local                        77.1%
Platform      Windows                      98%        Windows                      96.5%

Cluster: Academia                   Size 30.40%                            Size 16%
Variable      Attributes                   Freq.      Attributes                   Freq.
First learn   Friends, colleagues          42%        Friends, colleagues          51.1%
Fieldwork     Econometrics                 43%        Econometrics                 40.5%
Work Place    University                   87%        University                   100%
Xversion      Local                        88%        Local                        100%
Platform      Windows                      98%        Windows                      98.9%

Cluster: Linux User                 Size 14.06%                            Size 15.6%
Variable      Attributes                   Freq.      Attributes                   Freq.
First learn   WWW, Newsgroup               52%        WWW, Newsgroup               51.6%
Fieldwork     Others                       25%        Others                       21.7%
              Biometrics & Biostatistics   19%        Biometrics & Biostatistics   19.0%
Work Place    University                   49%        University                   58.2%
Xversion      Local                        90%        Local                        94.6%
Platform      Linux                        88%        Linux                        76.6%

Tab. 3.5: Comparison of Clustering results with IBM Intelligent Miner and XploRe

Name                Type          Modal Value       Modal Freq.   No. of Values
First Learn         Categorical   WWW, Newsgroup    43.5%         5
Work Place          Categorical   University        47.8%         6
Software            Categorical   Excel             25.9%         17
Work Field          Categorical   Econometrics      22.0%         10
Method Used         Categorical   Time Series       19.2%         12
Method Looked for   Categorical   Time Series       17.0%         12
Xversion            Categorical   Local             84.5%         3
Platform L          Categorical   Windows NT        82.0%         4
Platform C          Categorical   Windows NT        95.9%         4
OS Platform         Categorical   Windows NT        85.4%         4
Country             Categorical   Germany           16.8%         93
Continent           Categorical   Europe            50.5%         4

Tab. 3.6: Summary and description of the variables for User data 2003

The modal values of the data are summarised in the table above. Again we can identify the features of the current modal user of XploRe. He works at a German university, in the field of Econometrics.85 His computer runs the Windows NT OS platform. The software he uses is Excel, with which he conducts research using Time Series methods. He searches for information through the Internet, and his purpose in downloading XploRe is to find better software for Time Series methods.
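A summary of this kind (modal value, modal frequency and number of distinct values per categorical variable) can be computed with a few lines of Python. The sketch below is a generic illustration with a hypothetical toy column; it is not the procedure or the data used in the thesis.

```python
from collections import Counter

def modal_summary(values):
    """Return (modal value, modal frequency in %, number of distinct values)."""
    counts = Counter(values)
    modal_value, modal_count = counts.most_common(1)[0]
    return modal_value, round(100 * modal_count / len(values), 1), len(counts)

# Hypothetical toy column, not the thesis data.
xversion = ["Local", "Local", "Client", "Local", "ReX", "Local"]
summary = modal_summary(xversion)
print(summary)
```

Applying such a function to every column and combining the modal values yields the description of the modal user.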
Appendix 5 presents the detailed information for the user data as of 13/03/03. Comparing the methods used and the methods looked for by the XploRe users, I found an interesting point.
The methods most used were Time series with 19.2%, then Basic statistics with 17.1%. Multivariate methods were in third place with 14%. Graphics and exploratory data analysis reached only 6.8%. But the methods looked for by the users are different. Time series were still the primary methods wanted by the users. Multivariate methods were in second place, with the same percentage as for usage, 14%. Graphics and exploratory data analysis jumped to third position with 11.7%, while Basic statistics methods were wanted by only 10% of the users.
August 2001).
85 Referring to Appendix 6, the major Work Field of the general XploRe user is defined as Practical/Applied Econometrics.

This discovery indicates that most software packages already offer basic statistics functions; therefore, the users have no extra needs in those functions and do not seek that benefit from XploRe. Graphics and exploratory data analysis, by contrast, are more in demand. XploRe could focus on and improve these functions, and its marketing could communicate this to the customers.
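The comparison of methods used and methods looked for can be retraced in Python with the percentages quoted in the text and in Tab. 3.6. The shortened label "Graphics and EDA" and the helper names are introduced here for illustration only.

```python
# Shares in percent, as quoted in the text and in Tab. 3.6.
used = {
    "Time series": 19.2,
    "Basic statistics": 17.1,
    "Multivariate methods": 14.0,
    "Graphics and EDA": 6.8,
}
looked_for = {
    "Time series": 17.0,
    "Multivariate methods": 14.0,
    "Graphics and EDA": 11.7,
    "Basic statistics": 10.0,
}

def ranks(shares):
    """Rank methods by share, with 1 for the largest share."""
    ordered = sorted(shares, key=shares.get, reverse=True)
    return {method: position for position, method in enumerate(ordered, start=1)}

rank_used = ranks(used)
rank_wanted = ranks(looked_for)
# Positive gap: the method is wanted more often than it is currently used.
gap = {method: round(looked_for[method] - used[method], 1) for method in used}
print(rank_used, rank_wanted, gap)
```

The largest positive gap belongs to Graphics and exploratory data analysis and the largest negative gap to Basic statistics, which is the pattern discussed above.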
Clustering for 2003 data
The clustering with IBM Intelligent Miner was undertaken again to find the clusters for the new data of 2003. The final chosen clustering variables were the same as those for the 2002 user data, namely Firstlearn, OS Platform, Work Place, Fieldwork and Xversion. IBM Intelligent Miner allows further variables to be entered as supplementary variables; to obtain a view of the distribution of some variables of interest in each cluster, the variables Software, Method used and Method looked for were adopted as supplementary variables during clustering. As a result, the users were again grouped into four clusters, namely Internet surfer, Home worker, Academia and Linux user. The clustering reached a Global Condorcet value of 0.5940. The following graphic86 and the Table (Appendix 6) present the outcome of the clustering.
1. Summary from Variable perspective
Information resource (First learn)
From the summary table of the four customer clusters, we can see that the Internet plays an important role as communication channel for Internet surfers and Linux users. Linux users also depend partly on publications as information resource. Academia depends heavily on personal communication channels: they get information mainly from Friends and Colleagues, and Publications are another important information resource for them. Home workers have mixed information resources. They get more information from other resources; Publications and Friends/Colleagues are their two main identified resources.
Conferences play a minor role in all of the groups. This might mean that our participation in conferences has not made a strong impact on the customers, or that XploRe still lacks sufficient presence at conferences.
Working place
Academia users mostly work at universities. The Internet surfer and Linux user groups are mainly composed of people who work at a university or at home. People working in research institutes and private companies are represented in the Internet surfer and Home worker groups with a similar distribution. The Linux user group has a relatively high percentage of people working in research institutes. In the Home worker group, more people work in private companies than in research institutes.
Home worker is a mixed group. A high percentage of them work at home; they might be students or people who work somewhere else but use XploRe at home. The group also contains some people who work in institutes and companies.

Fig. 3.3: Clustering of user 2003.

86 As shown in the graphic, the variables in brackets are supplementary variables.
Software
Excel is the first choice of software in all the groups except Linux user. MatLab and SPSS are in second and third place for Internet surfer and Academia. In the Home worker group, there are more SPSS users than MatLab users. MatLab users need more sophisticated programming knowledge than SPSS users, which indicates a chance for XploRe.
The first choice of the Linux user is R, because R, like Linux, is non-profit software available online. There are also fewer SPSS users than MatLab users in this group.
Fieldwork
Internet surfers work mainly in Econometrics and Finance/Actuarial science. Academia also works mainly in Econometrics, followed by Mathematical statistics. This suggests that the Internet surfer may be engaged more in practical financial study, while Academia devotes itself more to theoretical statistical research. The Home worker works mainly in Finance and actuarial science, followed by Econometrics. This hints at an even higher degree of engagement in financial practice. The Linux user works more in Biometrics or Biostatistics and Physics or engineering. Therefore, compared with the other groups, the Linux user is more natural-science oriented.
Methods looked for
Internet surfer and Home worker are both interested in Time Series and Multivariate methods because of their strong involvement in financial practice. But the difference in the third method, Non- and Semiparametric methods for the Internet surfer and Graphics and exploratory data analysis for the Home worker, shows that the Home worker group places even more emphasis on the practical side than the Internet surfer. Academia concentrates on theoretical development; therefore, the methods they are interested in are more research oriented. Academia pays more attention to Non- and Semiparametric methods, which are relatively new methods. Linux users are more interested in Graphics and exploratory data analysis and Basic statistics, which matches their needs in natural science data analysis.
Methods used
All of the groups use Time series and Basic statistics as their main methods. Comparing these with the methods they search for in XploRe, we can infer that the users apply basic statistical methods in the software they already possess, but try to find more sophisticated software with better performance in Time series and Multivariate methods. The requirement for Non- and Semiparametric methods and Graphics and exploratory data analysis also motivates them to look for new software.
Platform
Windows is the dominant platform for all groups except the Linux users.
XploRe Version


Most users in all the groups downloaded the local version of XploRe. ReX has a higher presence in the Home worker group, which accords with the high utilisation of Excel in this group: they use it as complementary software for Excel. The Client version has a low percentage in all of the groups, which shows that the Client version needs more promotion.
2. Summary from cluster perspective
(1) Modal user of each cluster
Internet surfer
The Internet surfer gets information exclusively from the Internet. He conducts research in Econometrics with Time Series methods at a university. The software he uses is Excel, but he looks for software with better performance in Time Series methods. Windows NT is his OS platform. He downloaded the local version of XploRe.
Academia
The Academia user works at a university and conducts research in Econometrics with the software Excel. The methods he employs are Time Series methods. He got information about XploRe through Friends/Colleagues. The benefit he seeks in XploRe is Non- and Semiparametric methods. Windows NT is his OS platform, and the local version is the XploRe version he downloaded.
Home worker
The Home worker has a mixed character. He works mainly at home. Publications/Journals are the important information resources for him87. The fields he works in are mainly Finance and actuarial analysis. He currently uses Time Series methods, but searches for better performance in Time Series and Multivariate methods. He downloaded the local version of XploRe onto his Windows NT OS platform.
Linux User
The Linux user works at a university in the field of Biometrics or Biostatistics. He uses R as the software to conduct his work with Basic Statistics methods. He primarily wants software with good performance in Graphics and Exploratory data analysis. He got to know XploRe through the Internet and downloaded the Linux version of XploRe. His platform is Linux.
(2) Special features for clusters
Modal values indicate the main characteristics of each cluster. But when comparing
some sub-features of each cluster with those of the Total Users (see Appendix
6), some very interesting characteristics are found, which distinguish the clusters
from each other and from the whole user population.88

87 Because the value Other covers various options, I did not consider it as the
modal value, even though it ranks first for the variable.
Internet surfer
The Internet is the sole information source for the Internet surfers. They use MatLab and SAS, software similar to
XploRe, more than the other groups do. The fields they work in are mainly Econometrics and Finance/Actuarial
analysis. We could say that they conduct practical work in Econometrics.
The share of this group downloading the Client version is surprisingly the same as in the
whole user group. I had expected a higher percentage,
because they have better Internet access than the other groups.
Academia
Academia users work mainly at a university. They get
information mainly through Friends/Colleagues and Publications/Journals. The Internet does not play any role in their information gathering. They conduct research
in Econometrics and Mathematical Statistics, which means that their work leans
towards academic research in Econometrics. Unsurprisingly, the benefit they seek
in XploRe is also more academic and advanced: the Non- and
Semiparametric methods.
Home worker
This group is a mixed group. The name Home worker may not be entirely appropriate,
because only two thirds of them actually work at home; the rest work in various
other places. Relatively many of them work in private companies, but
none of them works at a university.
They get information mainly from Other resources. However, Publications/Journals are a more important information resource for them than for
the other groups (except for Academia). None of them uses WWW/Newsgroups
as an information resource. Thus, it might be more appropriate to call them the group
with extra information resources.
The fields they are engaged in are Finance/Actuarial analysis and Others, so
they conduct practical work in Finance. They need Graphics and Exploratory data analysis
methods more than the other groups do.
A very high percentage of them use Excel as statistical software. Therefore, the
ReX version of XploRe is very popular among them, because ReX is an
auxiliary tool for Excel.

88 The features not specially mentioned below for each cluster are those that are
similar to the features of the whole user group.
Linux User
Linux users use Linux as their operating system. R is naturally their primary
software, because, like Linux, R is free, and it is statistical software that runs on Linux. Linux users work more in the fields of Biometrics/Biostatistics and Physics/Engineering.
Therefore, I define their main work fields as Biological research and
Practical Engineering. They mainly apply Basic statistics, and what they look for is software for Graphics
and Exploratory data analysis. They have high demands
on the quality of graphical presentation.
Each group of XploRe users has a different focus in its work field:
The General User of XploRe devotes himself mainly to Practical/Applied
Econometrics.
Internet surfers are mainly involved in Practical/Applied Econometrics.
Academia concentrates on Academic/Theoretical Econometrics.
Home workers engage in Practical/Applied Finance.
Linux users do not have an economics background; their work fields are Biological research and Applied Engineering.
(3) Similarity between clusters
Above we focused on the differences between groups, but there are also
similarities between some groups. These might be helpful in Marketing
Mix design, because the same tool could then be applied to different groups.
Academia and Home worker both do not use the Internet as an information resource,
while Publications/Journals play a very important role in both groups.
Internet surfer and Home worker both search for software for Time series
methods and Multivariate methods, as does the Total group.
This might be the result of both groups conducting practical work. The difference lies in the third need: Home workers have a higher need for Graphics/Exploratory data analysis, while Internet surfers place Non- and Semiparametric methods third. This reflects the different demands of their work, Finance for Home workers and Econometrics for Internet
surfers. In this respect, Internet surfers are more similar
to the Total users, who have the same needs. The general user among all
XploRe users is also engaged mainly in practical work in Econometrics.

3.4.2 Comparison of historical user data

First, compare the modal user of 2002 and 2003.

                     User220702                            User130303
Name                 Modal Value      Modal    No. of      Modal Value      Modal    No. of
                                      Freq.    Values                       Freq.    Values
First Learn          WWW, Newsgroup   42.9%    5           WWW, Newsgroup   43.5%    5
Work Place           University       49.4%    6           University       47.8%    6
Software             Excel            25.1%    17          Excel            25.9%    17
Work Field           Econometrics     24.1%    10          Econometrics     22.0%    10
Method Used          Time Series      18.7%    12          Time Series      19.2%    12
Method Looked for    Time Series      17.3%    12          Time Series      17.0%    12
Xversion             Local            86.5%    3           Local            84.5%    3
Platform L           Windows NT       84.1%    4           Windows NT       82.0%    4
Platform C           Windows NT       87.8%    4           Windows NT       95.9%    4
OS Platform          Windows NT       84.2%    4           Windows NT       85.4%    4
Country              Germany          16.9%    77          Germany          16.4%    93
Continent            Europe           52.7%    4           Europe           50.5%    4

Tab. 3.7: Comparison of User 220702 and User 130303

Table 3.7 indicates that the modal values and frequencies of the two datasets
are quite similar. Therefore the major features of the users did not change dramatically
in this period. The only notable difference is that XploRe is known in 93
countries in the 2003 data, compared with only 77 countries in 2002. That means that
XploRe spread very quickly within half a year: people from 16 additional countries started
to use XploRe.
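
The three summary figures reported per variable (modal value, modal frequency and number of values) can be reproduced for any categorical variable with a few lines of code. The following is only a minimal sketch in Python, not the procedure used in this study; the function name modal_summary and the sample answers are hypothetical and are not taken from the user data.

```python
from collections import Counter


def modal_summary(values):
    """Return (modal value, modal frequency in percent, number of distinct values)."""
    counts = Counter(values)
    # most_common(1) yields the value with the highest count.
    modal_value, modal_count = counts.most_common(1)[0]
    modal_freq = round(100.0 * modal_count / len(values), 1)
    return modal_value, modal_freq, len(counts)


# Hypothetical answers to the "First Learn" question (NOT the thesis data).
first_learn = [
    "WWW, Newsgroup", "Friends, colleagues", "WWW, Newsgroup",
    "Publications, journals", "WWW, Newsgroup", "Others",
    "Conferences", "WWW, Newsgroup", "Friends, colleagues", "Others",
]

value, freq, n_values = modal_summary(first_learn)
print(f"{value}: {freq}% of answers, {n_values} distinct values")
```

Applying such a function to each variable of the two data sets would give one row of the comparison per variable.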
Comparing the description of each variable in the two datasets, there are also no
significant differences. I notice a small shift in the software
used. In the 2002 data, SPSS (11.2%) had a higher percentage than MatLab (10.4%).
In the 2003 data, MatLab's percentage (11%) is slightly higher than that of SPSS
(10.5%). Within the comparison of 2002 and 2003 data alone, this subtle change
could not be identified as a trend. But interestingly enough, if we compare the
current data with the historical data going back to 2000, which were analysed
by Sofyan and Werwatz (2001), the finding is very revealing.

2000                        2003
Software     Percentage     Software     Percentage
Excel        15.0%          Excel        25.9%
SPSS         14.0%          Other        11.4%
SAS          11.0%          MatLab       11.0%
MatLab        8.0%          SPSS         10.5%
GAUSS         7.0%          R             7.5%
S/S-Plus      5.0%          Eviews        6.5%

Tab. 3.8: Comparison of software used in 2000 and 2003

1. Comparison of software
Comparing the statistics software used by the users in 2000 and 2003, I find that
there is a clear upward trend in the use of Excel and MatLab. The shares of
SPSS, SAS and GAUSS are declining, while S/S-Plus is practically unchanged: GAUSS and S/S-Plus stand at
4.1% and 5.1% in 2003 respectively. Newer entries such as R and
Eviews now rank among the top five named packages.
These findings carry some implications. The dramatic increase in Excel users might
reflect that more and more people conduct statistical analysis with Excel. Because
Excel is general application software that performs only basic statistical analysis, the increase
in the percentage of Excel users might also hint that the user base of XploRe in
2003 is less professional than in 2000. This could be a consequence of the
Internet, through which people can access professional software such as XploRe
much more easily than before.
Another change, already mentioned above, is the rise of MatLab. MatLab
has overtaken SPSS and now holds second position among the named packages, instead of fourth in 2000.
MatLab, like XploRe, is command-driven software, while SPSS is much
more application oriented. Therefore, its rise might indicate that statistics
professionals are now more interested in programming-based software. This
change might benefit XploRe.
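
The shifts discussed above can be made explicit as percentage-point changes between the two surveys. The sketch below is only an illustration in Python; the shares are the ones quoted in Tab. 3.8 and in the text, restricted to the packages for which both a 2000 and a 2003 value are reported, and the function name point_changes is an assumption of this sketch.

```python
# Shares in percent, as quoted in Tab. 3.8 and the accompanying text.
share_2000 = {"Excel": 15.0, "SPSS": 14.0, "MatLab": 8.0, "GAUSS": 7.0, "S/S-Plus": 5.0}
share_2003 = {"Excel": 25.9, "SPSS": 10.5, "MatLab": 11.0, "GAUSS": 4.1, "S/S-Plus": 5.1}


def point_changes(before, after):
    """Percentage-point change for every package present in both surveys."""
    common = before.keys() & after.keys()
    return {name: round(after[name] - before[name], 1) for name in common}


changes = point_changes(share_2000, share_2003)

# List the packages from the largest gain to the largest loss.
for name, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {delta:+.1f} points")
```

Note that the two surveys did not offer identical answer categories, so such differences should be read as indications rather than as exact measurements.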
2. Comparison of Information resource
The information resources changed in the categories Publications/Journals and Colleagues/Friends. In 2000, the communication channel of XploRe was more personal: Colleagues/Friends accounted for 25% and Publications/Journals for only 15%. In 2003 their positions are reversed: Publications/Journals are now at 18.2%, while Colleagues/Friends has declined to 16.7%.

[Figure: bar charts of the percentage of users per software package, one for 2000 and one for 2003]

Fig. 3.4: Software used in 2000 and 2003.

2000                                    2003
Info. Resource          Percentage      Info. Resource          Percentage
www/newsgroups          44.0%           www/newsgroups          43.5%
Colleagues, friends     25.0%           Publications/Journals   18.2%
Publications/Journals   15.0%           Others                  18.2%
Others                  13.0%           Colleagues, friends     16.7%
Conferences              2.0%           Conferences              3.4%

Tab. 3.9: Comparison of information resources in 2000 and 2003

[Figure: bar charts of the percentage of users per information resource, one for 2000 and one for 2003]

Fig. 3.5: Information resource in 2000 and 2003.

This trend reflects the efforts of XploRe to improve its communication channels.
The reach of the personal channel is limited. Compared with personal communication
channels, non-personal communication channels are more effective in broadening
the potential user base and spreading information over a much wider scope.
3. Comparison of Country and Continent
The geographic trend of the past two years shows that the number of users
in Asia, especially Japan, has increased dramatically. Asia is becoming a more important
base of potential customers for XploRe.

2000                                2003
Country             Percentage      Country     Percentage
Germany             22.5%           Germany     16.4%
USA                 18.8%           USA         15.8%
Italy, France, UK   17.5%           Japan        8.6%

Tab. 3.10: Comparison of country in 2000 and 2003

2000                                2003
Continent           Percentage      Continent      Percentage
Europe              59.0%           Europe         50.5%
America             29.8%           America        25.0%
Asia                10.0%           Asia-pacific   21.6%
Africa, Australia    3.2%           Africa          2.9%

Tab. 3.11: Comparison of continent in 2000 and 2003

4. Comparison of Clusters in 2000 and 2003
The obvious difference between the clustering of the 2000 data and the 2003 data is that
in 2000 there are only three user clusters, namely Academia, Unix/Linux and
Researchers,89 while the 2003 clustering subdivides the user data into four clusters:
Academia, Linux user, Home worker and Internet surfer.
The appearance of the new cluster Internet surfer in 2003 reflects the growing
popularity of the Internet among the users and the development of E-commerce and
E-shopping. The Internet is increasingly becoming a powerful communication tool. It
enables potential customers to access product information much more easily
than before. The Internet is the dominant communication channel for
this group. It is important to understand more about the behaviour and features
of this group.
Two groups exist in both clusterings, Academia and Linux user. For Academia
2003, Publications/Journals are a much more important information resource
than in 2000. In 2003, 39% of Academia use Publications/Journals as information
resource, compared with only 19% in 2000. This indicates that XploRe was
successful in improving its communication channels in the last two years. More
users access information about XploRe through non-personal channels than through
personal channels. The Internet is no longer a choice for them in 2003, compared with 37%
in 2000. This could be explained by the appearance of the Internet surfer group in
2003: some members of Academia 2000 were resegmented into the Internet surfer
group in 2003.

89 Sofyan, H. and Werwatz, A., 2001, P475-477.

2000                                            2003

Cluster Academia                                Cluster Academia
Variable      Attribute             Freq.       Variable     Attribute             Freq.
Kind of Work  University            80%         Work Place   University            88%
OS Platform   Windows               97%         Platform     Windows               98%
Get Info.     Journals              19%         First Learn  Friends, colleagues   39%
              Friends, colleagues   32%                      Journals              31%
              WWW, newsgroups       37%                      others                23%
              conferences           6%                       WWW, newsgroups       0%

Cluster Unix/Linux                              Cluster Linux user
Variable      Attribute             Freq.       Variable     Attribute             Freq.
OS Platform   Unix/Linux            99%         Platform     Linux                 86%
Get Info.     WWW, newsgroups       62%         First Learn  WWW, newsgroups       56%
Kind of Work  University            46%                      Journals              17%
                                                             others                17%
                                                             Friends, colleagues   8%
                                                             conferences           3%
                                                Work Place   University            43%
                                                             At Home               31%
                                                             Research Institute    12%

Cluster Researchers                             Cluster Home Workers
Variable      Attribute             Freq.       Variable     Attribute             Freq.
Kind of Work  Research              78%         Work Place   At Home               67%
OS Platform   Windows               97%         Platform     Windows               100%
Get Info.     other source          20%         First Learn  others                44%
              Friends, colleagues   27%                      Journals              32%
              WWW, newsgroups       34%                      Friends, colleagues   19%
                                                             conferences           5%
                                                             WWW, newsgroups       0%

                                                Cluster Internet surfer
                                                Variable     Attribute             Freq.
                                                First Learn  WWW, newsgroups       100%
                                                Work Place   University            44%
                                                Platform     Windows               99%

Tab. 3.12: Comparison of User clusters of 2000 and 2003
The Linux user in 2000 is similar to the Linux user in 2003. This is the most stable
user group of XploRe. The main characteristics of this group remain almost
unchanged. They are mainly from universities, get their information through the Internet
and use Linux as platform.
The Researchers group disappeared in the segmentation of 2003. Instead, the 2003
segmentation contains a new group, Home worker. This phenomenon is
a consequence of the change in the values of the work place variable.
In 2000, the variable Kind of Work contained the values University, Research institutes and Private/Non-research companies. Among the total users, 34%
worked in research institutes. In the 2003 questionnaire, the variable Work
Place consists of five choices: University, At home, Private company, Research
Institute and Government/International organisations. In the results of the 2003
survey, 29.6% of total users work at home, which now ranks
second after University in place of research institutes. The share of users working in research institutes
declines to only 8.9%. This is the result of the new value
At Home in the 2003 survey. Actually, this is not the result we prefer, because
Researchers are the users with high potential to become true customers, who
actually buy XploRe. We are more interested in studying their features and behaviour patterns. Home worker is a mixed group, which is difficult to identify
and to reach. Home workers also have mixed characteristics: they
might be people who use XploRe at home but work in other places. This
actually brings confusion into the answers. Furthermore, identifying where people actually use XploRe, at home or at their working place, has no obvious marketing
rationale. Therefore, some questions and choices in the questionnaire should be
improved.

3.5 Complementary analysis

3.5.1 Analysis of regrouped data

During clustering, a variable with a large domain leads to a low Condorcet
value. As a consequence, some variables with large domains could not be used
as clustering variables. In order to solve this problem, the
values of these variables were regrouped to reduce the size of their domains.
The regrouped variables were Software, Work Field, Methods used and Methods
looked for. The software was grouped into six categories: (1) Applied software, (2) Excel, (3) Statistical software, (4) Econometrics software, (5) Rest
software90 and (6) Other software. The regrouped Work Field includes seven categories: (1) Mathematical statistics, (2) Finance and Actuarial science/Risk analysis, (3) Physics and Engineering, (4) Biometrics or Biostatistics/Epidemiology,
(5) Social science/Marketing and Survey research, (6) Econometrics and (7)
Other fields. The methods used/looked for are now divided into seven groups: (1) Time series, (2) Multivariate methods/Non- and Semiparametric methods/Generalised linear and limited dependent variables/Linear models/Survival
analysis, (3) Graphics and exploratory data analysis/Tools for learning or teaching statistics, (4) Basic statistics, (5) Panel data/Cross-sectional time series, (6)
Resampling and simulation methods and (7) other methods.
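
Technically, such a regrouping is a many-to-one mapping from the original values onto a smaller set of group values. The following Python sketch illustrates this for the methods variables. It is not the tool used in this study: the dictionary keys are assumed spellings of the questionnaire values, the short group labels are illustrative, and the sample answers (including the unlisted value "Neural networks") are hypothetical.

```python
# Mapping of original method values onto the regrouped values.
# Keys are assumed spellings of the questionnaire values; the group
# labels are illustrative shorthand for the seven groups in the text.
METHOD_GROUPS = {
    "Time series": "Time series",
    "Multivariate methods": "Multivariate group",
    "Non- and Semiparametric methods": "Multivariate group",
    "Generalised linear and limited dependent variables": "Multivariate group",
    "Linear models": "Multivariate group",
    "Survival analysis": "Multivariate group",
    "Graphics and exploratory data analysis": "Graphics/teaching group",
    "Tools for learning or teaching statistics": "Graphics/teaching group",
    "Basic statistics": "Basic statistics",
    "Panel data/Cross-sectional time series": "Panel data",
    "Resampling and simulation methods": "Resampling and simulation",
}


def regroup(values, mapping, default="Other methods"):
    """Replace each original value by its group; unknown values fall into `default`."""
    return [mapping.get(value, default) for value in values]


# Hypothetical answers, not taken from the user data.
answers = ["Time series", "Linear models", "Survival analysis",
           "Basic statistics", "Neural networks"]
regrouped = regroup(answers, METHOD_GROUPS)

# Number of regrouped values, counting the default group as well.
n_groups = len(set(METHOD_GROUPS.values())) + 1
print(regrouped, n_groups)
```

The same pattern applies to Software and Work Field, each with its own mapping.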

Descriptive analysis
The table below gives the modal values of the regrouped data. As a result of
the regrouping, the modal Software that the users use became Statistical software.
They are engaged in the field of Econometrics. The methods they use and look for are
now the Multivariate/Non- and Semiparametric methods group. Appendix 7 compares
the modal values of the user data with those of the regrouped user data. From Appendix
7, we can see that the modal value of Work Field stays the same. The software
changes from Excel to Statistical software. Both the methods used and the methods looked
for change from Time series to the Multivariate/Non- and Semiparametric
methods group.
Appendix 8 gives detailed information on the descriptive analysis of the regrouped
data. Looking again at the variables Methods used and Methods looked for, the same
pattern is found as in the user data: Graphics and Exploratory data
analysis are more wanted, while Basic statistical methods are more employed.
The Multivariate methods group and Time series remain high in both data sets for
both variables.
90 Rest software refers to the software that was listed in the questionnaire but not grouped
into the given software groups.

Name               Type         Modal Value                 Modal Freq.  No. of Values
First Learn        Categorical  WWW, Newsgroup              43.5%        5
Work Place         Categorical  University                  47.8%        6
Software           Categorical  Statistics                  30.0%        6
Work Field         Categorical  Econometrics                22.0%        7
Method Used        Categorical  Multi./Non-Semipara.meth.   40.1%        7
Method Looked for  Categorical  Multi./Non-Semipara.meth.   40.5%        7
Xversion           Categorical  Local                       84.5%        3
Platform L         Categorical  Windows NT                  82.0%        4
Platform C         Categorical  Windows NT                  95.9%        4
OS Platform        Categorical  Windows NT                  85.4%        4
Country            Categorical  Germany                     16.4%        93
Continent          Categorical  Europe                      50.5%        4

Tab. 3.13: Summary and description of the variables of regrouped User data 2003

Clustering with IBM Intelligent Miner

The clustering with IBM Intelligent Miner was conducted for the regrouped data
as well. The first attempt used the five variables Work Place, First Learn, Xversion,
Work Field and OS Platform as clustering variables. The result had a relatively
high Condorcet value (0.5711), but the clusters made little marketing sense: it was difficult to use them as target groups because their characteristics were
not distinct. The search for a suitable clustering was therefore continued. At last,
I chose as final result a clustering that took Work Place, First Learn, Work
Field, Methods looked for and OS Platform as clustering variables, with a
Condorcet value of 0.5034. This clustering again segmented the XploRe users into
four groups: Internet surfer, Home worker, Academia and Linux user.
The following graphic shows the result of the clustering by IBM Intelligent Miner.91 The
result of the clustering for the regrouped user data is presented in Appendix 9.

Fig. 3.6: Clustering of regrouped user data.

General user92
The general users of XploRe adopt the Internet as their main information resource. They
work mainly in universities and use statistics software to conduct research in the Econometrics and Finance fields. Time series, the Multivariate methods group and Basic statistics are the methods they employ, but they look for software that
has better performance in Time series, the Multivariate methods group and the Graphics/Exploratory data analysis group.

91 The variables Xversion, Software and Methods used are supplementary variables in the clustering.
92 In order to present the special features of each cluster more clearly, the characteristics
of the general user are also given here.
Internet surfer
Internet surfers use the Internet as their dominant information resource. They work
mainly at home, but a high percentage of them work in private companies and
research institutes. They are engaged mainly in the Finance field and mainly use
Excel for their work. The methods they use and look for are the same as those of
the general users. They download the Local version of XploRe onto their Windows
NT platforms.
Academia
Academia users get information mainly through the Internet, but Friends/Colleagues play
a rather important role as information resource. They work in universities. They
use Statistical software and Excel for their work in Econometrics and
Mathematical statistics. They use and look for the same methods as the general
users. They download the Local version of XploRe onto the Windows NT platform.


Home worker
Friends/Colleagues and Publications/Journals are rather important information
resources for this group. They work mainly at home, but a high percentage of
them work in research institutes. Private companies also account for a relatively high
percentage of work places. The fields they are engaged in are Finance
and Econometrics. The methods they use and look for are the same as in the
other groups. Most of them download the Local version of XploRe onto Windows
NT platforms, but ReX has a relatively higher usage in this group.
Linux user
The Internet is the primary information resource for Linux users. The software they
use is mainly Statistical software and Applied software. They work in the Biological fields. The Graphics/Exploratory data analysis group contains the methods they
want most after the Multivariate methods group. Their platform is Linux. The Local
version of XploRe is the dominant version they downloaded.
The characteristics of the regrouped user clusters are similar to those of the original user clusters. A further comparison is therefore not discussed here. Nevertheless,
regrouping remains important, and it is a task for future study to consider
the possibilities of regrouping more carefully.

3.5.2 Analysis of high profitable sector

Because a high percentage of customers come from research institutes, the users from research institutes, as a highly
profitable sector, are of particular interest. A descriptive analysis was carried out for the users from research institutes in order to find
the features of this group.
Table 3.14 presents the main features of the Institute user and compares them with
those of the General user of XploRe. From Table 3.14, we can see that the modal
Institute user is someone who works in a research institute in the field of Econometrics. The Internet is his main information resource. He uses Excel to conduct
research with Time series methods. The methods he looks for are also Time series methods.
Windows is his OS platform and he downloaded the Local version of XploRe.
Compared with the general user of XploRe, Institute users have similar modal characteristics. There are some differences in sub-characteristics. A relatively high percentage of
Institute users use MatLab. They also have stronger natural
science backgrounds. These sub-characteristics are those of Linux users. Therefore,
the Linux user could be an interesting group, which deserves more study.
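
One simple way to quantify such sub-characteristics is the ratio of a value's share among Institute users to its share among all users; a ratio above 1 indicates over-representation. The Python sketch below is only an illustration and was not part of the analysis in this study. The shares are read from Tab. 3.14, and the function name representation_index is an assumption of this sketch.

```python
# Shares in percent as read from Tab. 3.14.
institute = {"MatLab": 16.8, "Physics & engin.": 17.8, "Linux": 16.8, "Excel": 18.7}
general = {"MatLab": 10.4, "Physics & engin.": 10.1, "Linux": 12.9, "Excel": 25.1}


def representation_index(segment, population):
    """Segment share divided by population share; above 1 means over-represented."""
    return {
        name: round(segment[name] / population[name], 2)
        for name in segment
        if name in population and population[name] > 0
    }


index = representation_index(institute, general)

# Print from the most over-represented to the most under-represented value.
for name, ratio in sorted(index.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {ratio:.2f}")
```

With these figures, MatLab, Physics and engineering, and Linux come out above 1 for Institute users, while Excel comes out below 1.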


However, this analysis is rather simple. In order to understand the
Institute users better, further study should be conducted in the future.

                    Institute users                     General Users

First learn         WWW, newsgroups          49.5%      WWW, newsgroups          42.9%
                    publications, journals   20.6%      publications, journals   18.3%
                    Friends, colleagues      17.8%      others                   17.9%
                    others                   9.3%       Friends, colleagues      17.4%
                    conferences              2.8%       conferences              3.5%

Software            Excel                    18.7%      Excel                    25.1%
                    MatLab                   16.8%      SPSS                     11.2%
                    SPSS                     12.1%      Other                    11.2%
                    other                    11.2%      MatLab                   10.4%
                    S/S-Plus                 8.4%       R                        7.5%

Work Field          Econometrics             21.5%      Econometrics             24.1%
                    Physics & engin.         17.8%      other                    15.8%
                    other                    15.9%      (Math.) Statistics       11.9%
                    Biometric/Biostatistics  15.0%      Finance & actuarial sc.  11.0%
                    Social Science           7.5%       Physics & engin.         10.1%

Methods looked for  Time series              18.7%      Time series              17.3%
                    Multivariate meth.       16.8%      Multivariate meth.       14.0%
                    Non/semipara.meth.       15.9%      Non/semipara.meth.       13.1%
                    Graph./explor.analy.     9.3%       Graph./explor.analy.     12.2%
                    Basic statistics         7.5%       Basic statistics         9.6%

Methods used        Time series              19.6%      Time series              18.7%
                    Multivariate meth.       16.8%      Basic statistics         15.7%
                    General. Linear models   14.0%      Multivariate meth.       14.1%
                    Basic statistics         11.2%      Linear models            10.3%
                    Non/semipara.meth.       10.3%      Graph./explor.analy.     7.8%

Platform            Windows NT               80.4%      Windows NT               84.2%
                    Linux                    16.8%      Linux                    12.9%

Xversion            Local                    90.7%      Local                    86.5%
                    ReX                      7.5%       ReX                      9.3%
                    Client                   1.9%       Client                   4.1%

Tab. 3.14: Comparison of Institute user and General user

4. Suggested marketing strategy for XploRe

4.1 Marketing Strategy and Marketing mix

4.1.1 Marketing strategy

Marketing strategy is a functional strategy. It aims to effectively allocate and
co-ordinate marketing resources and activities to accomplish the firm's objectives
within a specific product market.93 Marketing strategy, therefore, focuses on
gaining competitive advantage in a selected target market in a coherent and goal-oriented way by using the specified marketing mix instruments.

4.1.2 Marketing Mix

The marketing mix describes how businesses promote their products and
services, or how customers learn about a business's products and services.94
The basic marketing mix is normally composed of four elements, the so-called
4Ps: Product, Place/distribution, Price and Promotion.
The marketing mix can also be expressed in a more customer-oriented way, which is
called the 4Cs:95
Customer Value or Solution: Product benefits from the customer's view.
Cost to the customer: Price plus the customer's own costs, for example travel or
fax.
Convenience for the customer: The effectiveness and efficiency of the
place/distribution channel.
Communication: A two-way dialogue rather than only one-way promotion.


93 Walker, O. C., Boyd, H. W., etc., 2003, P12.
94 Zell, Alan J.
95 WWW24

[Figure: the four elements Product, Price, Place and Promotion arranged around the Target]

Fig. 4.1: 4P of marketing mix.

Place
The marketing mix element Place is concerned with two aspects of a firm's
function: distribution and logistics.
Distribution refers to the ways in which organisations get the physical product to the
point where it is most convenient for the customer to buy it.96 Logistics is concerned with the process of planning, implementing and controlling the efficient,
cost-effective flow and storage of materials, in-process inventory, finished goods
and related information from the point of origin to the point of consumption for the
purpose of conforming to customer requirements.97
A distribution channel consists of the organisations that move the product from the
producer to the end consumer. The members of a distribution channel perform
many functions that help to complete the transaction. These functions
include gathering and distributing the information needed for planning
and transaction, promoting the product, and making contact with potential
customers.
The conventional distribution channel and the vertical marketing system are the two major types of distribution channels. Vertical marketing systems comprise
corporate, contractual and administered vertical marketing systems.
The distribution mix is composed of administration, order processing, inventory,
packaging, warehousing, receiving, dispatch and transportation.
96 Blythe, Jim, 2003, P209.
97 WWW32

Product
Product is the physical product or service offered to the customer. Normally the
company's offer combines both. Therefore, the product refers not
only to the tangible good but also to the service added to it.
A product is a combination of three levels:98
Core product: The core product/benefit is the real reason why a customer
buys the product.
Actual product: The components of a product, such as features, packaging,
brand name, quality level and design.
Augmented product: The additional customer services and benefits offered to the
customer, for example installation, delivery and credit, warranty and after-sales
service.
Decisions on product attributes concern the following aspects: branding,
packaging and labelling, and product line and mix.

Price
The traditional definition of price is
the number of monetary units a customer has to pay to receive one unit of a
product or service. In the 1990s a broader, more customer-oriented concept of price emerged.
In this concept, the price is the cost of an industrial good to the customer, which includes much
more than the seller's price.99
The broad concept of price has three dimensions:
1. Recognise the difference between objective price and perceived price. The customer will not always have complete information about the price, and the
price information may also affect the customer's buying decision process.
2. Price refers not only to the monetary amount the customer pays at the time
of purchase. There are costs involved before and after the purchase, for
example the time a customer must wait while purchasing, or the cost of
maintenance and repair.
3. Price includes components of effort and risk. The effort part refers to the
energy and money that the customer puts into purchasing a product.
The risk part could be functional, such as the failure of a product function,
or poor technical and delivery support.

98 Kotler, P. and Armstrong, G., 2003, P278-279.
99 Blois, Keith, 2000, P212-213.

1. The factors affecting price


The factors, which affect the price, include:
Internal factors marketing objectives, cost
External factors type of market, consumer perceptions, price-demand relationship, competitors prices and offers, other external factors such as economic conditions, governments, political and legal policies.
2. General pricing approach
Cost-based pricing: cost plus pricing, break even analysis and target profit pricing.
Buyer-based pricing: perceived value pricing.
Competition based pricing: going rate pricing, sealed-bid / tender pricing.
3. Pricing strategies
New product pricing strategy: market skimming, market penetration.
Product-mix pricing strategy: product line pricing, optional product pricing,
captive-product pricing, by-product pricing, product-bundle pricing.
4. Pricing Tactics
After setting the price, companies often adjust it in response to various customer and economic situations.
Discounts and allowances reward customers for certain responses. The
types of discounts and allowances include cash discounts, quantity discounts, functional discounts, seasonal discounts, trade-in discounts and promotion allowances.
Discriminatory pricing is adopted when the company sells a product at different
prices. The forms of discriminatory pricing are customer segment pricing, product form pricing, location pricing and time pricing.
Other pricing tactics include psychological pricing, promotional pricing and
geographic pricing.
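The cost-based approaches named under the general pricing approaches can be illustrated with a short calculation. The Python sketch below uses the common textbook formulas for cost-plus pricing (unit cost divided by one minus the desired return on sales) and for the break-even volume (fixed cost divided by the margin per unit). All figures and function names are hypothetical and are not taken from XploRe data.

```python
def unit_cost(variable_cost, fixed_cost, expected_units):
    """Full cost per unit at the expected sales volume."""
    return variable_cost + fixed_cost / expected_units


def cost_plus_price(cost_per_unit, return_on_sales):
    """Price such that the mark-up is the given share of the selling price."""
    return cost_per_unit / (1.0 - return_on_sales)


def break_even_units(fixed_cost, price, variable_cost):
    """Number of units at which total revenue equals total cost."""
    return fixed_cost / (price - variable_cost)


# Hypothetical figures for illustration only.
VARIABLE_COST = 10.0      # cost of each additional unit
FIXED_COST = 300_000.0    # development and overhead
EXPECTED_UNITS = 50_000   # expected unit sales

cost = unit_cost(VARIABLE_COST, FIXED_COST, EXPECTED_UNITS)
price = cost_plus_price(cost, return_on_sales=0.20)
volume = break_even_units(FIXED_COST, price, VARIABLE_COST)

print(f"unit cost: {cost:.2f}")
print(f"cost-plus price: {price:.2f}")
print(f"break-even volume: {volume:.0f} units")
```

With these assumed figures the unit cost is 16, the cost-plus price is 20 and the break-even volume is 30,000 units. A price below the variable cost would make the break-even volume negative, which signals that no sales volume can recover the fixed cost.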
Promotion
Promotion, or communication, refers to the processes and ways through which the
company communicates with its customers. Communication is a two-way
system. The customers not only receive information from the organisation but


also give feedback to the producer. Therefore, the customers affect the behaviour
of the producer as well. The channels of communication fall mainly into two
groups: personal communication channels and non-personal
communication channels.
1. Push and pull strategy
A push strategy of promotion uses the distribution channels to promote the products.
A pull strategy of promotion relies on heavy promotion aimed directly at the end
users. This strategy is normally used in a customers' (buyers') market.
2. Promotion mix
Five major promotion tools make up the promotion mix, also called the marketing
communication mix:
Advertising
Advertising is "any paid form of non-personal presentation and promotion
of ideas, goods, or services by an identified sponsor".100 The forms of advertising include ambient advertising, press advertising, TV advertising, radio
advertising, outdoor advertising and transport advertising.
Sales promotion
Sales promotion is the offer of short-term incentives to the customer with
the purpose of encouraging the immediate purchase of a product. The three
main types of sales promotion are customer promotion, trade promotion and
sales force promotion.
Public relations
Companies use favourable publicity to build up their corporate image
and good relations with various publics, as well as to handle unfavourable
events.
Personal selling
The sales force of a company conducts the sales effort and the communication
with the customers.
Direct marketing
Direct marketing adopts various promotion activities to create an immediate sale, to interact with potential customers or to maintain a
lasting relationship with customers. Activities in direct marketing include
sales promotion, direct mail and catalogue marketing, integrated database
marketing, direct-response television, radio and print marketing, telemarketing, telesales, automatic vending and teller machines, direct selling and
electronic shopping.

100 Armstrong, G. and Kotler, P., 2003, P470.

Customer Service
Customer service is a crucial component of product strategy. Customers expect some level of service to accompany the product offer, which could take the form
of prompt delivery, instruction, warranties, return policies, etc. In today's
market, because of the similarity in product quality, service is increasingly becoming an important tool for companies to gain a competitive advantage.
It costs less to maintain an existing customer base for repeat purchases than to
win new customers. The task of customer service is not merely to deal with customer complaints, but also to be proactive, to identify the needs of the customer
and to develop a proper product strategy.
Marketing mix for Service (7 Ps)
Services have several unique characteristics: intangibility, inseparability, variability,
perishability and non-transferability. These characteristics differentiate services
from products. The marketing strategy for services therefore has three
components in addition to the original 4 Ps: people, process management
and physical evidence.
People refers to the customers of the organisation, the service personnel of
the organisation and other customers.
Process management concerns the process by which the service is delivered to the
customers.
Physical evidence is what the customer can sense physically that contributes to their perception of the service.101 Physical evidence is of two kinds, essential and peripheral. A service could not be conducted without the
essential evidence. Peripheral evidence refers to those aspects besides the essential evidence which affect the customer's perception and evaluation
of the service quality.

101 WWW32

In the specific case of XploRe, the other 3 Ps of customer
service can be understood as follows.
People: The customer, the service personnel and other customers are
three groups which should be well organised in order to meet the needs of the
customers. For example, when a customer has questions about the product, the
attitude of the customer service personnel is an important factor in keeping the
customer satisfied.
Physical evidence: What the customer can sense physically contributes to their
perception of the service. Essential evidence and peripheral evidence are the two
kinds of physical evidence. The service could not take place without the essential
evidence. Peripheral evidence is anything else the customer will evaluate as part
of the service quality.
Take, for instance, the delivery of products. Posting is essential evidence, without which the order cannot be completed. The time taken to finish the order and
the appearance of the package, however, are peripheral evidence: one could complete the
service without paying attention to them.
Process management is about how the service is delivered to the customer. To
deliver service effectively and efficiently, customer service personnel should be aware of which factors in the process affect the perception of the
customer, and co-ordinate them well so that they happen at the right time and
in the right form.

4.2 Develop the marketing strategy for XploRe

The customer analysis above gives us an insight into the market and customers
of XploRe. Based on the facts and trends of the XploRe market, some suggestions for
the marketing strategy of XploRe are developed.102
102 The results of the complementary analysis are for reference only. The marketing strategy is
developed based on the results of the analysis of non-modified data.

4.2.1 Niche market strategy

XploRe is still a new product, and the XploRe marketer has limited resources at
his command. Trying to cover a large part of the total market would be unwise and less effective.
A niche market strategy could allocate the XploRe resources more efficiently and
use them more effectively.

4.2.2 Target Market

The user clusters Internet surfer, Academia and Linux user could be identified
as the target markets for XploRe. Home worker is a mixed group, but with its high percentage of users from private companies, it could be regarded as a supplementary
target market.
Since almost half of the XploRe users are from the fields of Econometrics, Finance and Mathematical Statistics, and a high percentage of customers come from
research institutes, research institutes in Finance and Econometrics are a highly profitable market. XploRe should focus on developing this market. Another niche
market XploRe could concentrate on is research institutes in Biological research and Applied Engineering. These two markets are the highly profitable sectors of XploRe.
From a geographic viewpoint, almost all XploRe users are from three countries:
Germany, the USA and Japan. XploRe could direct more of its resources towards the
markets in these countries.

4.2.3 Product position of XploRe:103

The XploRe Windows series could define itself as "Advanced statistical software for financial and econometric analysis". This decision is based on the facts that
command-driven statistical software is becoming increasingly popular, that the target
market for XploRe is research institutes in Finance and Econometrics, and that
the users expect advanced methods in XploRe. For the Windows series, XploRe
could focus on improving and promoting its strengths in Time series methods,
Multivariate analysis methods, Non- and Semi-parametric methods and Graphics /
Exploratory analysis.
103 Because no competitor analysis was conducted in this study, the price
strategy and monetary position are not discussed here.


Most Linux users of XploRe have a natural science background, with
an emphasis on biology and engineering / physics. They are interested in the
methods of Basic statistics and Graphics / Exploratory data analysis. The Linux-based
XploRe products could therefore define themselves as "Statistical software for Biological research and Applied Engineering", which is especially good at Graphics and
Exploratory analysis.

4.2.4 General XploRe marketing strategy pyramids

Strategy 1: Develop brand recognition through an effective and efficient
marketing communication mix.
Tactic 1: Broaden the potential customer (user) base through an effective marketing mix.
- Program 1: Improve the Internet site and establish good contact with statistical
software search engines so that potential customers can find
XploRe more easily.
- Program 2: Print and electronic advertising campaign using professional publications and portal banners as primary media.
- Program 3: Utilise the personal communication channels and organise an
XploRe membership club. Promotion measures could be combined with new membership recommendations.
- Program 4: Increase the appearances at conferences and exhibitions,
and improve their effectiveness by strengthening the professional presentation, for example by conducting talks, seminars and workshops during
the conferences and exhibitions.
Tactic 2: Further improve communication channels and communication
mix, and increase the share of non-personal communication channels.
- Program 1: Increase the appearances at conferences and exhibitions.
- Program 2: Increase article presentations in professional publications / journals and on web sites.
Strategy 2: Increase revenue by increasing the turning rate
of users into customers and by increasing customer purchases.



Tactic 1: Improve the collection of customer data; conduct user and customer satisfaction analysis and profitable sector analysis.
- Program 1: Conduct E-shopping and online user analysis to learn
more about the behaviour and characteristics of online users and the customer
sector.
- Program 2: Widen the data collection methods. A customer satisfaction
survey could be conducted through telephone interviews, or online along
with a promotion measure. A paper questionnaire survey could be carried out together with the product distribution.
- Program 3: Gather feedback and information from customers and users through a customer panel or membership club. The membership club offers exclusive membership for customers and free membership for users.
The membership could effectively locate customers and users, which
will help in gathering information for customer and user analysis. The
feedback gathered from users and customers through an online forum or panel could also be used for analysis. Information about the strengths
and weaknesses of XploRe (the satisfaction of customers and users with
XploRe) could be gathered through the customer panel as well. That information could also help to understand more of the customers' views and their
attitudes towards XploRe.
Tactic 2: Increase the influence of XploRe in highly profitable sectors.
- Program 1: Campaign to reach more customers in research institutes,
for example through a direct marketing or
direct mailing campaign aimed at the target markets. This measure
could reach the target markets more effectively and increase the percentage
of potential customers in highly profitable and high turning rate sectors,
thus raising the turning rate of the whole customer base.
Tactic 3: Tailor the marketing message for each group, delivering the right information to the right group, and build up the image of a high-standard professional
solution.
- Program 1: More participation and appearances in high-standard
publications and in events such as conferences.


- Program 2: Organise professional events such as conferences, seminars, workshops and discussion forums through membership clubs.
- Program 3: High-standard, professional solutions offered by customer service could also help to achieve this goal.
Tactic 4: Improve customer service.
- Program 1: Customer service through an Internet customer panel. Deliver customer service online through an online forum / panel, answer customers' questions and encourage discussions. This measure could help
in building up a good image as well.
- Program 2: The customer membership club offers professional lectures and
seminars and organises regular meetings and discussions.
- Program 3: Provide aids for problem solving and offer high-quality,
professional solutions.
Tactic 5: Manage and activate the customer base. Keep the potential customer
and actual customer base alive and active. Improve the communication with
customers and users, especially those in target markets.
- Program 1: Establish a customer membership club and an online forum / panel. Organise membership-based club activities, such as seminars, workshops, lectures, regular meetings and discussions. The user
base of XploRe is a valuable resource. If this resource is used
actively and effectively, it could lead to unexpected returns. It should
be well managed to keep it active and alive.
Strategy 3: Concentrate on and optimise the direct marketing and
distribution channels.

Tactic 1: Improve the performance of direct online sales (Internet shopping). The Internet is the ultimate communication channel for XploRe
users, and it is currently the main distribution channel of XploRe. Improving
the performance and management of this channel is therefore crucial for XploRe.
- Program 1: Improve the management of the E-shop.



- Program 2: Online user and buyer behaviour analysis.
Tactic 2: Establish direct sale channel through membership club.
- Program 1: Proper marketing mix for club members, such as discounts and
promotion measures.
- Program 2: Special customer service or other offers for club members.
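
The reasoning behind Tactic 2 of Strategy 2 is arithmetic: the turning rate of the whole user base is the user-weighted average of the sector turning rates, so shifting the mix of potential customers towards sectors with a high turning rate raises the overall rate. A minimal Python sketch, with made-up sector names and figures (none of them taken from the XploRe survey):

```python
def overall_turning_rate(sectors):
    """Share of all users who become customers.

    `sectors` maps a sector name to a (users, turning_rate) pair.
    """
    total_users = sum(users for users, _ in sectors.values())
    buyers = sum(users * rate for users, rate in sectors.values())
    return buyers / total_users


# Hypothetical user base before a targeted campaign.
before = {
    "research institute": (100, 0.10),
    "other": (900, 0.02),
}
# The campaign adds 200 users in the high turning rate sector only.
after = {
    "research institute": (300, 0.10),
    "other": (900, 0.02),
}

rate_before = overall_turning_rate(before)
rate_after = overall_turning_rate(after)
print(f"before: {rate_before:.3f}")
print(f"after:  {rate_after:.3f}")
```

Under these assumptions the overall turning rate rises from 2.8% to 4.0%, although no sector's own turning rate has changed.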

4.2.5 General Marketing Mix

Price
Discriminatory pricing strategy: XploRe could use a discriminatory pricing strategy, that is,
offer different prices to different groups. For example, XploRe Club members and academic researchers could receive discounts when they buy XploRe.
Students could get a lower price as well.
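
As a rough illustration of customer segment pricing, the Python sketch below looks up a discount by customer group. The list price, the segment names and the discount rates are assumptions made for the example; the study does not specify any figures.

```python
# Hypothetical list price and discount rates (not from the study).
LIST_PRICE = 1000.0
DISCOUNTS = {
    "club_member": 0.10,
    "academic": 0.20,
    "student": 0.50,
}


def segment_price(segment, list_price=LIST_PRICE):
    """Price for a customer segment; unknown segments pay the list price."""
    discount = DISCOUNTS.get(segment, 0.0)
    return round(list_price * (1.0 - discount), 2)


for segment in ("club_member", "academic", "student", "company"):
    print(segment, segment_price(segment))
```

Keeping the rates in one table makes it easy to add or change a segment without touching the pricing logic.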

Place (Distribution)
1. Direct distribution channel

- Internet shop and online direct sale


- Direct sale channel through membership club

2. Indirect distribution channel


Because indirect distribution channels involve more channel management and
design, they may not be an option for XploRe at present. As complementary
channels, however, XploRe could employ the traditional indirect sales channels and sell
XploRe products through software retailers. Another way is to co-operate
with strategic alliances and sell XploRe as a component
integrated into a complete solution package.


Product
To provide benefit to the customer, XploRe should develop a line of products
and modules which can easily be adapted to meet the demands of different
customers.
To maximise the benefit it offers, XploRe should improve and promote its
strengths and product quality in the methods in which users show high interest: Time
series, Multivariate methods, Non- and Semi-parametric methods and, for certain groups, Graphics and exploratory methods.
Product position
The XploRe Windows version should define itself as a product for econometric and
financial analysis, since three clusters of customers specialise in Econometrics and
Finance. For the XploRe Linux version, it is better to define the product as
statistical software for natural science research or application.
Very few users use the XploRe Client version. The Client version can expand the capacity
of the user by using the XploRe Server, but it needs a fairly good Internet
connection to perform well. Considering the advantages and features
of this product, the Client version should be promoted to the target group of Internet
users, who might have fast, high-quality Internet access.
Promotion / Communication
1. Communication channels
(1) Problem in XploRe communication channels
The range of downloaders might also be confined by the communication channels that the XploRe marketer uses. As the XploRe producer has good contacts with
universities, more downloaders get to know about XploRe through informal personal channels, which might skew the potential customer base. The XploRe marketer
should use more formal channels to expand the downloader base, that is,
the potential customer base. Publications and conferences should also play
important roles in the information search process.
Publicity channel: One reason for the small percentage of XploRe users from
institutes and companies might be the lack of effective
publicity. Most of them get to know XploRe through the Internet. The percentage
of professional publications and conferences as information resources is also very
low. Therefore, the strength and effectiveness of XploRe publicity should be


improved, for instance by publishing more articles concerning XploRe,
reports about applications of XploRe, or research which
uses XploRe as a research instrument. More research into software customers'
behaviour is needed. The general model of other software customers could be
helpful. The special channels of information resources and the benefits customers seek in
XploRe could give valuable information for the improvement of XploRe products
and the effectiveness of marketing activities.
The Internet, as an important instrument for information, should be given priority
by XploRe. XploRe should improve its Internet presentation and should be easy
to locate through search engines such as Google by keyword search or field
search. For example, when someone looks for statistical software for Time
series analysis, they should be able to locate XploRe when they enter Time series
and Statistic Software as keywords. At present, XploRe is still
not easy to locate through search engines.
In the Home worker group, a large part of the users get information about XploRe through the channel Other. These users could be
understood as students who know XploRe through lectures.
(2) Improve the effectiveness of communication
To communicate effectively with the customers, the XploRe marketer should tailor
the information to send the right and effective message to different
groups of customers.
Getting the right feedback from actual customers is the basis for developing an
effective marketing mix to turn the potential customers (the users) into actual
customers. Given the current deficit in actual XploRe customer data, measures should be designed to collect actual customer information. One method is
to pack the questionnaire together with the products and encourage the customers
to fill it in. Some promotion activities may be needed.
The language used to convey the message could be tailored to the customer. English is the working language for most users, Germany is the nation with the highest user and customer percentage, and Japan is a rapidly developing market. The
languages which XploRe marketing uses could therefore be German, English and Japanese.
(3) Choice of communication channel and media, message appeal
For Internet surfer and Home worker, the XploRe marketer should concentrate more on sending the target group the message about the functions in Time series and
Multivariate methods, and appeal to their need for software for Financial


analysis.
Program-based software like MatLab is quite popular among XploRe's users.
These users should be reached effectively by offering them the right mode
and letting them learn more about the programming possibilities of XploRe, such as the
capacity to build libraries and quantlets themselves. The clusters Internet surfer
and Academia have a relatively high percentage of MatLab users. This message
should be passed to them effectively.
In short, the XploRe marketer could employ both personal and non-personal communication channels, such as the Internet, publications and the member club, to send
customers more persuasive messages and try to turn them into actual buyers.
2. Promotion mix
(1) Advertising
Advertising is "any paid form of non-personal presentation and promotion of
ideas, goods or services by an identified sponsor".104
The advertisements of XploRe are mostly informative advertising. More persuasive advertising should be used to raise the turning rate of users into actual
buyers.
The media vehicles can be flyers, brochures, professional magazines, free demo CDs
and the Internet. For Internet surfer, the Internet is the sole media vehicle. An online
XploRe forum could reach and maintain the group.
Banners could be placed in specialised software search engine
portals. For Academia, the use of publications should be increased. To keep
the cost low, articles about XploRe should be encouraged and published, and
XploRe activities should be regularly reported in professional publications, especially in financial and econometric publications, through active
announcements to the publishers. For Linux user, special publications
and Internet portals should be considered as the media to reach and persuade
Linux users.
(2) Personal sales promotion
Certain conferences should be considered as well. XploRe should make more
conference appearances to strengthen its image. For instance, more articles could be
submitted and more presentations given at conferences. In addition, XploRe
could participate in more exhibitions and organise free presentations to increase
its publicity and improve its product image as well.
104 Kotler et al, 1994


Personal selling instruments, like free seminars and demonstrations, could have more
impact on the Academia group, because its members rely heavily on informal communication channels. These instruments could also bring XploRe closer to its
customers, which will help to create and maintain a healthy customer relationship.
Creating an XploRe Club or community could effectively promote communication among XploRe members, which could also help potential customers to turn
into actual customers.
(3) Direct marketing

Sales promotion
XploRe can employ more customer sales promotion to encourage customers
to buy. A trial version is already in use. Other tools, like cash-related offers,
demonstrations and displays, could be utilised more. Sales force promotion, such as activities at exhibitions and trade shows, could help XploRe to raise
awareness of the products and increase sales leads.
Other direct marketing instruments
In addition, XploRe could engage more actively in other direct marketing instruments, such as direct mail and catalogue marketing, integrated
database marketing, telemarketing and electronic shopping. All these instruments could keep the potential customer base alive and increase the turning
rate into actual customers. They would also keep the actual customer base
alive, increase the rate of repeat buying and strengthen the image of the
products.
Customer service
As a part of the product, customer service is a crucial issue. Concerning the additional 3 Ps of service, XploRe could improve its customer service in the following
aspects:
People: Improve the attitude of customer service staff.
Physical evidence: Shorten delivery time and improve customers' access to service.
Process management: Optimise the ordering process and the process of problem
solving.


Because most of the customers of XploRe are professionals, they demand highly
professional products. The service of XploRe should emphasise the expertise of
XploRe. People, physical evidence and process management
are the points the customer service staff of XploRe should pay attention to,
for example by responding quickly to customers, helping to solve their problems efficiently and
effectively, and communicating actively.
The membership club and the customer online forum could be instruments for offering
high-quality customer service efficiently and effectively. But the traditional ways
of offering customer service should not be forgotten. The customer service
staff should be reachable by the customer in a personal way when needed, for
example through a service hotline. Although XploRe is a high-technology product
and most customers use the Internet and other communication tools, the traditional
methods of personal contact could establish a more vivid and personal image of
XploRe. It is sometimes frustrating for customers when they face a world of machines
and can reach no human being to listen to their special needs.
The service of XploRe should accompany the customers all the time: before the
purchase, by giving them information and advice; during the purchase, with process
instructions and quick delivery; and after the purchase, by getting to know their needs
and helping to solve any problems.

4.2.6 Special marketing mix for clusters

Besides the general marketing mix for all XploRe products, we should also
recognise that differences exist between the clusters of XploRe users and customers. To
reach them effectively and efficiently, special marketing mix measures should be
designed for them as well.
Internet surfer

Product: This group searches for information exclusively through the Internet. This
might mean that its members have better Internet access, which is fast and convenient for them. The Client version of XploRe could be promoted to this group.
Communication message: MatLab usage has a high percentage in this
group. To motivate this group, XploRe should send them the message
about XploRe's programming features and its flexibility in constructing the customer's own program library through quantlets etc.


Media vehicle: The Internet is the dominant communication tool of this group.
An online forum or panel can effectively keep this group active and improve
the communication between them and XploRe, and thus build up a good image
of XploRe.
Service: An online help system could offer service to effectively satisfy the
needs of this group.
Academia

Product: Academia wants XploRe to perform well in Non- and Semi-parametric
methods especially. To convince this group, XploRe should
improve the strength of its products in these methods.
Communication channels: Academia uses Friends/Colleagues and Publications/Journals as its main information resources. Its members are a group of closely
tied professionals. To expand the influence of XploRe in this group, an XploRe
member club or forum could be an effective way. The member club could
use the advantage of personal communication channels in the group
to expand the base of potential customers and build up a closer relationship between XploRe and the customers. More appearances in professional
publications and journals could effectively promote XploRe through the
non-personal channels.
Communication message: Academia is a group of professionals who
specialise in econometrics and mathematical statistics. Compared to other
groups, they are more oriented towards academic research. The methods they want
are more advanced methods, such as Non- and Semi-parametric methods. To promote XploRe in this group, the message should present XploRe in a highly
professional and analytical way. The advantage of XploRe in advanced
non- and semiparametric methods should be emphasised.
Media vehicle: The membership club and professional publications / journals
should be the main vehicles for Academia.
Service: Because Academia puts more value on personal contact, it is important for its members to be able to reach the service personally. Therefore,
service through direct personal contact, such as customer hotlines and visits,
is needed for this group.


Linux user

Product: The product for Linux user should have high capability in Graphics
and exploratory data analysis.
Communication: Linux users have a natural science background. A high
percentage of them come from research institutes, which indicates that this
could be a high-return group. To reach them, XploRe should convey more
information to them about XploRe applications in Biological research
and Engineering. Emphasis on the strength of XploRe in Graphics and
Exploratory data analysis is of high importance. The media conveying
the XploRe message should be natural science and Linux specialised
media.
Because of the computer expertise of Linux users and their focus on the Internet, XploRe
marketers could inform them more about XploRe's ability to use programs written in fast languages such as Fortran or C via dynamically
linked libraries.
Home worker

Product: The Home worker group has a high Excel usage rate. This is a
good opportunity to sell ReX to them. Graphics and Exploratory data
analysis are important aspects when Home workers evaluate XploRe.
Communication: This group of customers is more engaged in Applied Finance and less academically oriented, and a high percentage of them
come from private companies. To persuade this group, XploRe should convince them that XploRe is a valuable instrument for practical financial
analysis with good performance in Time series, Multivariate methods
and Graphics/Exploratory data analysis. The media XploRe chooses for
this group could be publications and journals in the field of Finance.

4.2.7 Marketing research - suggestions for further analysis

1. Customer data improvement


High-quality customer data is crucial for customer analysis, and it is


the basis for understanding the market and developing the marketing strategy. To
improve the customer data, different ways of data collection could be used.
The questionnaire survey could be delivered through the distribution channel
together with the products. Telephone interviews directly with the customers
could be undertaken as well. An online forum or membership club could act as
an active communication base; the feedback from customers and users could be
collected for analysis purposes. From actual customers, we need to know more
about their features and their attitude towards XploRe. For the XploRe users,
a follow-up online survey could also be carried out to get feedback on their
degree of satisfaction with XploRe.
2. Association rule, sequential rule analysis for user and customer data
In future analysis, association rule and sequential rule analysis could be used
to find relationships within the data. Such analysis could be conducted
for both user and customer data to find the features of the user groups which
have a high turning rate into customers. It would be really exciting if such groups
or rules could be identified. XploRe marketers could then allocate their resources
more effectively to reach the highly profitable customers.
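To make the idea concrete, the Python sketch below computes support, confidence and lift for simple rules of the form {user feature} -> customer. The confidence of such a rule is the turning rate within the group that shares the feature. The records, field names and values are invented for illustration, and only single-feature rules are considered; a full analysis would search over feature combinations with an algorithm such as Apriori.

```python
from collections import Counter

# Hypothetical user records: (field of work, operating system, became customer).
USERS = [
    ("finance", "windows", True),
    ("finance", "windows", True),
    ("finance", "linux", False),
    ("biology", "linux", True),
    ("biology", "linux", False),
    ("biology", "windows", False),
    ("statistics", "windows", False),
    ("statistics", "linux", False),
]


def rules_to_customer(records):
    """Support, confidence and lift of each rule {feature} -> customer."""
    n = len(records)
    base_rate = sum(1 for *_, bought in records if bought) / n
    seen = Counter()         # records that contain the feature
    bought_with = Counter()  # records that contain the feature and a purchase
    for *features, bought in records:
        for feature in features:
            seen[feature] += 1
            if bought:
                bought_with[feature] += 1
    rules = {}
    for feature, count in seen.items():
        support = bought_with[feature] / n
        confidence = bought_with[feature] / count  # turning rate in the group
        rules[feature] = (support, confidence, confidence / base_rate)
    return rules


rules = rules_to_customer(USERS)
for feature, (support, confidence, lift) in sorted(
    rules.items(), key=lambda item: -item[1][1]
):
    print(f"{feature:10s} support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 marks a group whose turning rate exceeds that of the whole user base, which is the kind of group the marketer would want to reach first.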
3. Improvement of questionnaire
We are interested in studying the characteristics and behaviour patterns of the
customers. The questions and choices appearing in the questionnaire should all
have a marketing rationale. Unclear questions or choices confuse respondents
and lead to misleading results. The current questionnaire has shortcomings that
could be improved for future analysis.
(1) Variable At Home
The disappearance of the group Researcher and the emergence of the group Home
worker are consequences of the value At home in the variable Work Place.
Home worker is a mixed group whose members are difficult to identify and to
reach. They might be people who use XploRe at home but work elsewhere, which
makes their answers ambiguous. Furthermore, identifying whether people actually
use XploRe at home or at their workplace has no obvious marketing rationale.
The questions and choices concerned should therefore be revised in the
questionnaire.
(2) Question of Country
Another shortcoming of the questionnaire lies in the choice of Country. Since
there is no question or explanation attached to this choice, a respondent could
understand it as asking for his nationality of origin. In this survey,
nationality is not important, because the relevant factor is geography rather
than cultural difference: we want to use this choice to locate geographic
markets. For example, suppose a user originally comes from Africa but now works
in France. When he downloads XploRe, it is more important to know in which
country he works and will use XploRe, because that is the country in which he
is more likely to buy it. In this case we want him to choose France as his
answer, but he might in fact choose the country of his nationality. This
misleads the marketer, because marketing measures will reach him not in Africa
but in France. From this point of view, a question such as "In which country do
you work or use XploRe?" may be more appropriate.
4. Solve the problem of large size of domain through regrouping
Some variables in the survey have a large domain, such as Software (12),
Fieldwork (10), Method used (12), Methods looked for (12) and Country (93). The
large domain size results in a low Condorcet value when these variables are
taken as input variables, which almost rules them out as input variables. Some
quite interesting variables, such as Methods looked for and Software, can
therefore only be used as complementary variables in the analysis.
An improvement has already been made in the analysis of the regrouped data,
where variable values were regrouped and combined to form smaller domains.
In future analysis, the regrouping could be improved further. The nature of the
variables and choices should be studied and examined in more detail. The
regrouping of variable values should also reflect a marketing rationale, which
must be based on a more detailed study of the products and methods. For
software, for example, the following questions should be answered: What are the
exact features of each group of software? Which group has characteristics
similar to XploRe, and in which respects? Will such a grouping help to find
the target market of XploRe?
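The regrouping step itself is mechanically simple. The sketch below merges Field Work values into broader groups and shows how the domain shrinks. The counts are those listed in Appendix 5 and the three merged groups follow Appendix 8; the function name and the dictionary layout are illustrative assumptions and not part of the Intelligent Miner workflow.

```python
from collections import Counter

# Field Work frequencies of the User 130303 data set (Appendix 5)
FIELD_COUNTS = {
    "Econometrics": 428,
    "(Mathematical) Statistics": 253,
    "Finance and actuarial science": 248,
    "Physics and engineering": 201,
    "Biometrics or Biostatistics": 165,
    "Risk analysis": 124,
    "Social science": 111,
    "Marketing and survey research": 70,
    "Epidemiology": 34,
    "Other": 311,
}

# Merged groups as they appear in Appendix 8; unmapped values stay unchanged
REGROUP = {
    "Finance and actuarial science": "Finance and actuarial science/Risk analysis",
    "Risk analysis": "Finance and actuarial science/Risk analysis",
    "Biometrics or Biostatistics": "Biometrics or Biostatistics/Epidemiology",
    "Epidemiology": "Biometrics or Biostatistics/Epidemiology",
    "Social science": "Social science/Marketing and survey research",
    "Marketing and survey research": "Social science/Marketing and survey research",
}


def regroup(counts, mapping):
    """Add up the frequencies of all values that map to the same group."""
    merged = Counter()
    for value, freq in counts.items():
        merged[mapping.get(value, value)] += freq
    return dict(merged)


regrouped = regroup(FIELD_COUNTS, REGROUP)
print(len(FIELD_COUNTS), "values before,", len(regrouped), "values after")
for group, freq in sorted(regrouped.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {freq} ({100 * freq / 1945:.1f}%)")
```

Keeping the mapping in an explicit table makes the marketing rationale behind each merged group easy to review and to change.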
5. Further study of high profitable customer sector
A high percentage of customers turn out to come from research institutes.
Further study should therefore focus on this group. Its characteristics and
behaviour patterns are important for XploRe marketers in designing more
effective marketing measures to reach and persuade its members.
6. Online user segmentation analysis
With the development of e-commerce, the Internet is the dominant instrument of
communication for XploRe customers. It is the main marketing tool for XploRe
as well. To reach customers online effectively and make the marketing effort
more profitable, XploRe marketers need to understand the behaviour patterns and
characteristics of online users and buyers.
There are already many studies focusing on this topic. McKinsey segmented
online users into six groups: Simplifiers, Surfers, Bargainers, Connectors,
Routiners and Sportsters, according to differences in active time online, pages
and domains accessed, and active time spent per page.105 Other attributes
commonly used to segment online users and measure their loyalty are frequency
and recency. The customer response, retention and valuation model, known as the
Recency, Frequency, Monetary value (RFM) model, is also a useful customer model
for predicting the future value and loyalty of customer segments and for
creating promotions with a high ROI.106 Because the Internet is crucial for
XploRe, learning more about their online users and customers is of primary
importance for XploRe marketers.
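A minimal sketch of how recency, frequency and monetary value could be derived from a purchase log is given below. The purchase records, customer identifiers and reference date are invented for illustration; a real RFM analysis would go on to bin the three measures into scores and segments.

```python
from collections import defaultdict
from datetime import date


def rfm_table(purchases, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    purchases: iterable of (customer_id, purchase_date, amount) tuples.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in purchases:
        # Keep the most recent purchase date per customer
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((today - last_purchase[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_purchase
    }


# Hypothetical purchase log
purchases = [
    ("a", date(2003, 1, 10), 100.0),
    ("a", date(2003, 3, 1), 50.0),
    ("b", date(2002, 6, 1), 500.0),
]

table = rfm_table(purchases, today=date(2003, 3, 13))
for customer, (recency, freq, value) in sorted(table.items()):
    print(customer, recency, freq, value)
```

Customers with low recency, high frequency and high monetary value are the ones such a model would rank as most loyal and most valuable.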
7. Competitor analysis and SWOT analysis for deriving a full scale
marketing strategy for XploRe
Only customer analysis was undertaken here. To derive a full-scale marketing
strategy and marketing mix for XploRe, more in-depth analysis of the market and
the product should be conducted, such as competitor analysis and SWOT analysis.
These are future tasks for XploRe marketers.

105 McKinsey & Company
106 WWW17

References
AAKER, DAVID A.: Strategic market management, Sixth Edition, John Wiley
& Sons, Inc., 2001.
ALDENDERFER, M. S. and BLASHFIELD, R. K.: Cluster Analysis, Series:
Quantitative Applications in the Social Science, SAGE University Papers,
Sage Publications, Inc. 1984.
ANDERSEN, ERLING B.: Introduction to the statistical analysis of categorical
data, Springer, 1997.
ANDRITSOS, PERIKLIS: Data clustering techniques Qualifying oral examination paper, Department of computer science, University of Toronto,
March 11, 2002.
ABELL, D. F. and HAMMOND, J. S.: Strategic Market Planning: Problems
and Analytical Approaches, Prentice-Hall, Inc., 1979.
ARMSTRONG, G. and KOTLER, P.: Marketing An Introduction, Pearson
Education, Inc., 2002.
BACKHAUS, K., ERICHSON, B., WEIBER, W., PLINKE, R.: Multivariate
Analysemethoden: Eine anwendungsorientierte Einführung, 9., überarbeitete
und erweiterte Auflage, Springer Verlag, 2000. P328-389.
BAKER, M. and HART, S.: Product Strategy and Management, Prentice
Hall, 1999.
BANNES, E., McCLELLAND, B., MEYER, R.: Marketing An active learning
approach, Blackwell Publications Ltd., 1997. P138-237
BERRY, MICHAEL J. A., LINOFF, G.: Data Mining Techniques For Marketing, Sales, and Customer Support, John Wiley & Sons, Inc., 1997.
BLOIS, KEITH: The Oxford Textbook of Marketing, Oxford University Press,
2000.
BLYTHE, J.: Marketing Strategy, McGraw-Hill Education, 2003.
BOUNSAYTHIP, C., RUNSALA, E.: Overview of data mining for customer
behavior modeling, Version 1, 29 June 2001, Research report TTEI- 200218, VTT Information Technology.
CARBONE, PATRICIA L.: Data Mining or Knowledge Discovery in
Databases: An Overview, The MITRE Corporation, 1997. http://www.
mitre.org/pubs/data-mgt/papers/DMHdbk.pdf
CANNON, TOM: Marketing Principles & practice, 5th Edition J.W. Arrowsmith Ltd. 1998. P133-143.
CHUNG, H. M., GRAY, P., MANNINO, M.: Introduction to Data Mining
and Knowledge Discovery,1998. http://www.computer.org/proceedings/
hicss/8245/82450244.pdf
DOBASHI, JUNYA: Research on the summarization of entries in genome
database using Data Mining Technology, School of Knowledge Science, Japan Advanced Institute of Science and Technology, March 2002.
http://www.jaist.ac.jp/library/thesis/ks-master
EVERITT, BRIAN S.: Cluster analysis, Arnold, 1993.
EVERITT, B. S. and DUNN, G.: Applied Multivariate Data Analysis, Edward
Arnold, 1991, P99-126.
FAYYAD, U., PIATETSKY-SHAPIRO, G., SMYTH, P. and UTHURUSAMY,
R. : Advances in Knowledge Discovery and Data Mining, American Association for Artificial Intelligence, 1996.
FERGUSON, MIKE: Evaluating and Selecting Data Mining Tools, http://
www.dbaint.com/pdf/v11n21.pdf
GANTI, V., GEHRKE, J., RAMAKRISHNAN, R.: CACTUS-Clustering
Categorical Data Using Summaries, Department of Computer Sciences,University of Wisconsin-Madison,1999. http://www.cs.cornell.edu/
johannes/papers/1999/kdd1999-catus.pdf
GARY KELSEY AND ASSOCIATES: Planning process outline, http://www.
hfpg.org/pdf/plan handouts.pdf
GIBSON, D., KLEINBERG, JON M., RAGHAVAN, P.: Clustering categorical
data: An approach based on dynamical systems, In Proceedings of the 24th
International Conference on Very Large Data Bases (VLDB), P 311-322, New
York, NY, USA, 24-27 August 1998. Morgan Kaufmann, 1998.
GRABMEIER, J., RUDOLPH, A.: Techniques of Cluster Algorithms in Data
Mining, Data Mining and Knowledge Discovery, 6, 303-360, 2002. Kluwer
Academic Publishers, 2002.
GUPTA, S.K., SAMBASIVA RAO, K., BHATNAGA, V.: K-means Clustering
algorithm for categorical attributes. http://www.cse.iitd.ernet.in/skg/ps/
send.ps
HAN, J., KAMBER, M.: Data mining Concepts and techniques, Morgan
Kaufmann, 2001.

HÄRDLE, W., KLINKE, S., MÜLLER, M.: XploRe Learning Guide,
Springer Verlag, 2000.
HÄRDLE, W., SIMAR, L.: Applied multivariate statistical analysis, Institut
für Statistik und Ökonometrie, Wirtschaftswissenschaftliche Fakultät,
Humboldt-Universität zu Berlin, 2000. P295-312.
HEYGATE, RICHARD: Customer Analysis, Sophron Partners Ltd, 1998.
http://www.icare.cl/CAM/pdf/15.pdf
HILL, C. W. L. and JONES, G. R.: Strategic Management Theory An Integrated Approach, Fourth Edition, Houghton Mifflin Company, 1998.

HIPPNER, H., KÜSTERS, U., MEYER, M., WILDE, K.: Handbuch Data
Mining im Marketing: Knowledge discovery in marketing databases, Friedr.
Vieweg & Sohn Verlagsgesellschaft mbH, 2001.
HOLLANDER, S. C. and GERMAIN, R.: Was There a Pepsi Generation before Pepsi discovered it? Youth-based segmentation in marketing, NTC
Business Books, 1992.
HOWARD, JOHN A.: Buyer Behavior in marketing Strategy, Second Edition, Prentice-Hall, Inc., 1994.
IBM, Data Management Solution, White paper, IBM's Data Mining
Technology, 1996.
http://www.acm.org/sigs/sigmod/disc/dis99/ibm/datamine.pdf

IBM Corp.: Tutorial of IBM DB2 Intelligent Miner for Data, IBM Corp.
1999.
JACOB, FRANK: Development, management and governance of relationship,
International Conference on relationship marketing, March 29-31,1996,
Berlin.
JAIN, ANIL K., DUBES, RICHARD C.: Algorithms for Clustering Data,
Prentice-Hall, 1988.
JARKE, M., LENZERINI, M., VASSILIOU, Y., VASSILIADIS, P.: Fundamentals of Data Warehouse, Springer, 1999.
JOACHIM MUCHA, H., SOFYAN, H.: Cluster analysis, Discussion paper 49, 2000,
Sonderforschungsbereich 373, Quantifikation und Simulation Ökonomischer
Prozesse, Humboldt-Universität zu Berlin.
JOACHIM MUCHA, HANS: Clustering in an interactive way, Discussion paper 13,
1995, Sonderforschungsbereich 373, Quantifikation und Simulation Ökonomischer
Prozesse, Humboldt-Universität zu Berlin.
KALAKOTA, R. and WHINSTON, A. B.: Do or Die: market Segmentation
and Product Positioning on the Internet, http://cism.bus.utexas.edu/res/
articles/segmentation.html
KAUFMAN, L., ROUSSEEUW, PETER J.: Finding Groups in Data An
Introduction to Cluster Analysis, A Wiley-Interscience Publication, John
Wiley & Sons, Inc. 1990.
KOTALA, P., PERERA, A., KAI ZHOU, J., MUDIVARTHY, S., PERRIZO,
W., DECKHARD, E.: Gene expression profiling of DNA Microarray
Data using Peano Count Trees (P-Trees), North Dakota State University, 2001.
http://www.ndsu.edu/virtual-genomics
KOTLER, P.: Marketing management, 7th ed., Englewood Cliffs, New Jersey:
Prentice-Hall, 1991.
KURES, M., RYAN, B. and LAMB, G.: Customer Profiling and Prospecting
Analysis: For the Door County Lodging Industry, University of Wisconsin-Extension, May 2001.
LAMBIN, JEAN-JACQUES: Strategic Marketing Management, McGraw-Hill
Publishing Company, 1997.
LILIEN, GARY L., KOTLER, P., SRIDHAR, K.: Marketing models, Prentice-Hall, International Editions, 1992.
LUGER, ADOLF E., PFLAUM, D.: Marketing Strategie und Realisierung,
Carl Hanser Verlag, München, Wien, 1996. P69-75
MARDIA, K.V., KENT, J. T. and BIBBY J. M.: Multivariate Analysis,
Academic Press, 1979, P360-393.
MATSUSHIMA, HITOSHI: Direct mechanisms, virtual implementation, and
majority- Proofness, The university of Tokyo, April 2002. http://www.e.utokyo.ac.jp/cirje/research/dp/2002/2002cf149.pdf
MCDONALD, M. and DUNBAR, I.: Market Segmentation How to do it, how
to profit from it, Second Edition, Macmillan Press Ltd., 1998.
McKinsey & Company: All Visitors Are Not Created Equal Knowing Online Consumer Segments and How to Attract them Are Key, McKinsey Marketing Practice. http://marketing.mackinsey.com/solutions/McKconsumer.pdf
MICHAUD, PIERRE: Condorcet A Man of The Avant-Garde, Applied
Stochastic models and Data Analysis, Vol. 3, 1987, John Wiley & Sons,
Ltd., P173-189.
NOVO, JIM: Traffic, Visitor, and Customer Analysis, 2003.
http://www.devwebpro.com/devwebpro-39-20030403Traffic,-Visitor,and-Customer-Analysis.html
PETRO, SHERRI: Strategic Marketing segmentation,
http://accountant.intuit.com/tools resources/marketing/articles/sp strategicmarketing segmentation.html
PYLE, DORIAN: Data preparation for data mining, Academic Press, 1999.
RAJOLA, FEDERICO: Customer Relationship management Organizational
and Technological Perspectives, Springer, 2003.

RÖNZ, BERND: Computergestützte Statistik, Institut für Statistik und
Ökonometrie, Wirtschaftswissenschaftliche Fakultät, Humboldt-Universität
zu Berlin, 2000.
SALOMAA, JUHA: Customer analysis as a foundation of company profitability, TU- 91.167 Seminar in Business Strategy and International
Business. http://www.tuta.hut.fi/studies/ courses and schedules/Isib/TU91.167/ Seminar-papers/Salomaa Juha.pdf
SCHNITTKER, JONSON, Cluster analysis, 2000, http://www.indiana.edu/
socsrp/cluster analysis.pdf
SHI, X., FAN, Z.: Data analysis method, Shanghai University of Finance and
Economics Press, 1997.
SOFYAN, H., WERWATZ, A.: Analyzing XploRe download profiles with Intelligent Miner, Computational Statistics (2001) 16:465-479, Physica-Verlag
2001.
SOKAL, R. R., MICHENER, C. D.: A statistical method for
evaluating systematic relationships, Univ. Kansas Sci. Bull., 38, 1409-1438.
STANTON, WILLIAM J., FUTRELL, C.: Fundamentals of Marketing,
McGraw-Hill, 1987. P156-169
TOWELL, P. and MASON, J.: Draft Marketing Strategy and Plan for
ISO/IEC JTC1 Sc36, 2002. http://jtc1sc.org/doc/36N0253.pdf
TRUCHON, MICHEL: An extension of the Condorcet criterion and
Kemeny orders, 1998,
http://collection.nlc-bnc.ca/100/200/300/univ laval/dep economique/cahier/ 1998/9813/9813.pdf
WALKER, JR. O. C., BOYD, JR. H. W., MULLINS, J., and LARRECHE,
J.: Marketing Strategy A decision-focused approach, Fourth Edition,
McGraw-Hill Irwin, 2003.
WANG, H., WANG, W., YANG J., YU, PHILIP S.: Clustering by pattern
similarity in large data sets, IBM T. J. Watson Research Center, 2002,
http://www.cis.ohio-state.edu/hakan/CIS888/SIGMOD02-2.pdf.
WILSON, RICHARD M.S., GILLIGAN, C.: Strategic marketing management Planning, implementation and control, Butterworth-Heinemann,
1997. P155-193


WOJCIECHOWSKI, MAREK: Discovering and Processing Sequential Patterns in Databases, Poznan University of Technology, Institute of
Computing Science, Poland. http://www.edbt2000.uni-konstanz.de/phdworkshop/ papers/wojciehowski.pdf
WRIGHT, M. and ESSLEMONT, D.: The Logical Limitations of Target Marketing, Marketing Bulletin, 1994,5,13-20. http://marketingbulletin.massey.ac.nz/ article5/article2b.asp
WWW1: Hamming distance, http://www.nist.gov/dads/HTML/hammingdist.html
WWW2: Channel code, Hamming distance,
http://www.cs.ucl.ac.uk/staff/S.Bhatti/D51-notes/node30.html
WWW3: Cluster analysis,
http://www.rand.org/publications/MR/MR1304/MR1304.appc.pdf
WWW4: Introduction to Data Mining,
http://www.doc.ac.uk/frk/frank/kmt/Data%20Mining20%/handout.pdf
WWW5: XploRe Information Guide, 2001. http://www.xplore-stat.de
WWW6: Ward's Cluster method, http://www.bus.sfu.ca/courses/bus846/Wards.htm
WWW7: Understanding Market Segmentation by DSS Research,
http://www.dssresearch.com/Library/Segment/understanding.asp
WWW8: eBusiness Performance management software for Online Retailing,
Visual Insights, Inc. 2001. http://www.visualinsights.com/PDF/retail.pdf
WWW9: Customer Segmentation Analysis. http://www.mosbygrey.com/pdf/CSA.pdf
WWW10: Customer profiling Analytical Application: Turning valuable Customer
Data into Knowledge and Action. 1999.
http://www.symmetrics.net/resources/whiteppre/cps.pdf
WWW11: How to promote B2C, using Online Customer Experience Analysis, 2002.
http://www.ecommerce.or.th/nceb2002/ paper/-12How to promote:B2c.pdf
WWW12: Grow your business with strategic customer analysis.
http://www.marketwise.net/strategic-customer.html
WWW13: Customer Analysis. http://www.psychadvantage.com/cust-analysis.html

WWW14: Customer Analysis: A Manual of Techniques, The University Libraries, University of Southern California, 1997. http://isd.usc.edu/ jkwan/CAManual.pdf
WWW15: What is CRM?, http://www.edgeservices.com/salesmarketing/
what is crm.shtml
WWW16:Customer Profiling Pandectas Guide To Accurate Customer profiling, http://pandects.com/customer profiling.html.
WWW17: Drilling Down Turning Customer Data into Profits with a Spreadsheet, http://www.drilling-down.com/profiles.htm
WWW18: Database Marketing: Customer Profiling, http://www.schooldata.
com/ssm-profiling.html
WWW19: What is a customer profile, http://sic.nvgc.vt.edu/SICstuffVirtual/KANG/WWW/tutorial profile.html
WWW20: Marketing Segmentation, http://www.educationsupport.co.uk/
downloads/rjh/segmentation.pdf
WWW21: Marketing Mix,
http://www.marketingteacher.com/Lesson/lesson marketing mix.htm
WWW22: Marketing Strategy,
http://sbindocanada.about.com/library/weekly/aa072500a.htm
WWW23: Develop a Product Marketing Strategy,
http://www.robertwinton.com/marketing.htm
WWW24: The Marketing Mix, http://sol.brunel.ac.uk/javis/bola/marketing/mix.html
WWW25: The marketing Mix, http://www.quickmba.com/marketingmix/
WWW26: Market segmentation,
http://www.bized.ac.uk/virtual/cb/factory/marketing/theories2.htm
WWW27: Marketing-Positioning, strategies, segmentation, niches,
http://www.determan.net/Michele/mposition.htm
WWW28: Chanimal Marketing Basics, http://www.chanimal.com/html/basics.html
WWW29: Case Study-Market segmentation, http://www.crinfnorthamerica.
com/solutions/cstudies/segmentation.asp
WWW30: 5 Steps to Improving Your companys market Positioning,
http://www.compstrategy.com/fivesteps.htm
WWW31: Understanding market Segmentation, http://www.dssresearch.
com/library/segment/understanding.asp
WWW32: Marketing Course, Faculty of Business and Economics, Monash
University, Australia.
http://www.buseco.monash.edu.au/depts/Mkt/
mtp online/sevenps.html
WWW33: Focusing Marketing Strategy with Segmentation and Positioning,
http://www.mhhe.com
ZELL, A. J.: Developing A Marketing Mix, http://www.sellingselling.com/
articles/mktmix.html

Appendix

Appendix 1: User 220702 Frequency Analysis

First Learn                                                  Frequency  Percentage
WWW, newsgroups                                                    507        42.9
Publications, journals                                             216        18.3
Friends, colleagues                                                206        17.4
Conferences                                                         41         3.5
Other                                                              211        17.9
Sum                                                               1181       100.0

Methods Looked For                                           Frequency  Percentage
Time series                                                        204        17.3
Multivariate methods                                               165        14.0
Non- and semiparametric methods                                    155        13.1
Graphics and exploratory data analysis                             144        12.2
Basic statistics                                                   113         9.6
Panel data/cross-sectional time series                              64         5.4
Generalized linear models and limited dependent variables           63         5.3
Resampling and simulation methods                                   58         4.9
Tools for learning or teaching statistics                           39         3.3
Survival analysis                                                   28         2.4
Other                                                               86         7.3
Sum                                                               1181       100.0

Where Work                                                   Frequency  Percentage
University                                                         584        49.4
At home                                                            341        28.9
Research institute                                                 107         9.1
Private company                                                     78         6.6
Government or international organization                            28         2.4
Other                                                               43         3.6
Sum                                                               1181       100.0

Xversion                                                     Frequency  Percentage
Local                                                             1022        86.5
ReX                                                                110         9.3
Client                                                              49         4.1
Sum                                                               1181       100.0

Field Work                                                   Frequency  Percentage
Econometrics                                                       285        24.1
(Mathematical) Statistics                                          141        11.9
Finance and actuarial science                                      130        11.0
Physics and engineering                                            119        10.1
Biometrics or Biostatistics                                        117         9.9
Social science (sociology, psychology, etc.)                        70         5.9
Risk analysis                                                       62         5.2
Marketing and survey research                                       50         4.2
Epidemiology                                                        20         1.7
Other                                                              187        15.8
Sum                                                               1181       100.0

Methods Used                                                 Frequency  Percentage
Time series                                                        221        18.7
Basic statistics                                                   185        15.7
Multivariate methods                                               166        14.1
Linear models                                                      122        10.3
Graphics and exploratory data analysis                              92         7.8
Non- and semiparametric methods                                     82         6.9
Generalized linear models and limited dependent variables           78         6.6
Panel data/cross-sectional methods                                  38         3.2
Resampling and simulation methods                                   38         3.2
Tools for learning or teaching statistics                           31         2.6
Survival analysis                                                   29         2.5
Other                                                               74         6.3
Sum                                                               1181       100.0

Software                                                     Frequency  Percentage
Excel                                                              297        25.1
SPSS                                                               132        11.2
MatLab                                                             123        10.4
R                                                                   89         7.5
Eviews                                                              78         6.6
SAS                                                                 76         6.4
S/S-Plus                                                            65         5.5
GAUSS                                                               47         4.0
Statistica                                                          39         3.3
MiniTab                                                             32         2.7
Stata                                                               32         2.7
LIMDEP                                                              12         1.0
Rats                                                                11         0.9
TSP                                                                 10         0.8
Xlisp-stat                                                           5         0.4
XGobi                                                                1         0.1
Other                                                              132        11.2
Sum                                                               1181       100.0

Platform L                                                   Frequency  Percentage
Windows NT                                                         952        84.1
Linux                                                              149        13.2
Solaris                                                             13         1.1
Others                                                              18         1.6
Sum                                                               1132       100.0

Platform C                                                   Frequency  Percentage
Windows NT                                                          43        87.8
Linux                                                                3         6.1
Apple                                                                3         6.1
Solaris                                                              0         0.0
Sum                                                                 49       100.0

Continent                                                    Frequency  Percentage
Europe                                                             622        52.7
America                                                            289        24.5
Asia-Pacific                                                       242        20.5
Africa                                                              28         2.4
Sum                                                               1181       100.0

Country                                                      Frequency  Percentage
Germany                                                            199        16.9
USA                                                                186        15.7
Japan                                                              102         8.6
Sum                                                               1181       100.0

Platform                                                     Frequency  Percentage
Windows NT                                                         995        84.3
Linux                                                              152        12.9
Solaris                                                             13         1.1
Others                                                              21         1.8
Sum                                                               1181       100.0

Appendix 2: Customer Frequency Analysis (Nov. 05)


State Name                 Frequency  Percentage
Germany                           11        34.4
USA                                8        25.0
Japan                              3         9.4
Italy                              2         6.2
Denmark                            1         3.1
France                             1         3.1
Norway                             1         3.1
The Netherlands                    1         3.1
UK                                 1         3.1
China                              1         3.1
Taiwan                             1         3.1
Missing value                      1         3.1
Sum                               32       100.0

Federal State              Frequency  Percentage
Baden-Württemberg                  1         3.1
Berlin                             1         3.1
Brandenburg                        1         3.1
Schleswig-Holstein                 1         3.1
Rheinland-Pfalz                    1         3.1
Missing value                     27        84.4
Sum                               32       100.0

OS                         Frequency  Percentage
Windows 2000/NT                    6        18.8
Windows 95/98                      4        12.5
Missing value                     22        68.8
Sum                               32       100.0

Title                      Frequency  Percentage
Prof.                              3         9.4
Dr.                                2         6.2
Prof. Dr.                          2         6.2
Missing value                     25        78.1
Sum                               32       100.0

Sex                        Frequency  Percentage
Man                               25        78.1
Woman                              7        21.9
Sum                               32       100.0

Language                   Frequency  Percentage
English                            6        18.8
German                             5        15.6
French                             1         3.1
Italian                            1         3.1
Spanish                            0         0.0
Missing value                     19        59.4
Sum                               32       100.0

Sector                     Frequency  Percentage
Research Institute                11        34.4
Company                            1         3.1
Missing value                     20        62.5
Sum                               32       100.0

Branch                     Frequency  Percentage
Economics                          3         9.4
Statistics                         1         3.1
Biostatistics                      1         3.1
Mathematics                        1         3.1
Computer science                   1         3.1
Missing value                     25        78.1
Sum                               32       100.0

Appendix 3: Customer Registration form.

Appendix 4: Characteristics of User220702 Clusters by XploRe

Cluster Internet Surfer (Size abs. 155, Size rel. 13.1%)
Variable       Attributes                   Freq.
First learn    WWW, Newsgroup               99.4%
Fieldwork      Econometrics                 20%
Where Work     University                   100%
Xversion       Local                        100%
Platform       Windows                      100%

Cluster Home worker (Size abs. 652, Size rel. 55.2%)
Variables: First learn, Fieldwork, Where Work, Xversion, Platform
Attributes                   Freq.
WWW, Newsgroups              39.3%
Others                       28.1%
Econometrics                 23.8%
At Home                      44.2%
Local                        77.1%
Windows                      96.5%

Cluster Academia (Size abs. 190, Size rel. 16%)
Variable       Attributes                   Freq.
First learn    Friends, Colleagues          51.1%
Fieldwork      Econometrics                 40.5%
Where Work     University                   100%
Xversion       Local                        100%
Platform       Windows                      98.9%

Cluster Linux User (Size abs. 184, Size rel. 15.6%)
Variables: First learn, Fieldwork, Where Work, Xversion, Platform
Attributes                   Freq.
WWW, Newsgroup               51.6%
Others                       21.7%
Biometrics & Biostatistics   19%
University                   58.2%
Local                        94.6%
Linux                        76.6%

Appendix 5: User 130303 Frequency Analysis


First Learn                                                  Frequency  Percentage
WWW, newsgroups                                                    846        43.5
Publications, journals                                             354        18.2
Friends, colleagues                                                324        16.7
Conferences                                                         67         3.4
Other                                                              354        18.2
Sum                                                               1945       100.0

Methods Looked For                                           Frequency  Percentage
Time series                                                        331        17.0
Multivariate methods                                               279        14.3
Non- and semiparametric methods                                    256        13.2
Graphics and exploratory data analysis                             228        11.7
Basic statistics                                                   194        10.0
Linear models                                                      109         5.6
Generalized linear models and limited dependent variables           99         5.1
Panel data/cross-sectional time series                              90         4.6
Resampling and simulation methods                                   82         4.2
Tools for learning or teaching statistics                           74         3.8
Survival analysis                                                   44         2.3
Other                                                              159         8.2
Sum                                                               1945       100.0

Where Work                                                   Frequency  Percentage
University                                                         930        47.8
At home                                                            576        29.6
Research institute                                                 174         8.9
Private company                                                    140         7.2
Government or international organization                            42         2.2
Other                                                               83         4.3
Sum                                                               1945       100.0

Field Work                                                   Frequency  Percentage
Econometrics                                                       428        22.0
(Mathematical) Statistics                                          253        13.0
Finance and actuarial science                                      248        12.8
Physics and engineering                                            201        10.3
Biometrics or Biostatistics                                        165         8.5
Risk analysis                                                      124         6.4
Social science (sociology, psychology, etc.)                       111         5.7
Marketing and survey research                                       70         3.6
Epidemiology                                                        34         1.7
Other                                                              311        16.0
Sum                                                               1945       100.0

Methods Used                                                 Frequency  Percentage
Time series                                                        373        19.2
Basic statistics                                                   332        17.1
Multivariate methods                                               273        14.0
Linear models                                                      213        11.0
Non- and semiparametric methods                                    136         7.0
Graphics and exploratory data analysis                             132         6.8
Generalized linear models and limited dependent variables          114         5.9
Panel data/cross-sectional methods                                  90         4.6
Tools for learning or teaching statistics                           60         3.1
Resampling and simulation methods                                   54         2.8
Survival analysis                                                   44         2.3
Other                                                              124         6.4
Sum                                                               1945       100.0

Software                                                     Frequency  Percentage
Excel                                                              504        25.9
MatLab                                                             214        11.0
SPSS                                                               205        10.5
R                                                                  145         7.5
Eviews                                                             126         6.5
SAS                                                                116         6.0
S/S-Plus                                                           100         5.1
GAUSS                                                               80         4.1
Statistica                                                          62         3.2
Stata                                                               56         2.9
MiniTab                                                             49         2.5
LIMDEP                                                              23         1.2
Rats                                                                15         0.8
TSP                                                                 15         0.8
Xlisp-stat                                                           8         0.4
XGobi                                                                5         0.3
Other                                                              222        11.4
Sum                                                               1945       100.0

Platform L                                                   Frequency  Percentage
Windows NT                                                        1595        82.0
Linux                                                              231        11.9
No                                                                  80         4.1
Solaris                                                             21         1.1
Other                                                               18         0.9
Sum                                                               1945       100.0

Platform C                                                   Frequency  Percentage
No                                                                1865        95.9
Windows NT                                                          66         3.4
Apple                                                                9         0.5
Linux                                                                5         0.3
Solaris                                                              0         0.0
Sum                                                               1945       100.0

Continent                                                    Frequency  Percentage
Europe                                                             983        50.5
America                                                            486        25.0
Asia-Pacific                                                       421        21.6
Africa                                                              40         2.1
Other                                                               15         0.8
Sum                                                               1945       100.0

Country                                                      Frequency  Percentage
Germany                                                            318        16.4
USA                                                                310        15.8
Japan                                                              168         8.1
Sum                                                                  -         0.0

Platform                                                     Frequency  Percentage
Windows NT                                                        1661        85.4
Linux                                                              236        12.1
Solaris                                                             21         1.1
Other                                                               27         1.4
Sum                                                               1945       100.0

Xversion                                                     Frequency  Percentage
Local                                                             1643        84.5
ReX                                                                222        11.4
Client                                                              80         4.1
Sum                                                               1945       100.0

Appendix 6: User 13032003 Intelligent Miner Cluster Analysis


Internet surfer (36%)
First learn:        WWW, newsgroups 100%
Where Work:         university 44%, at home 32%, private company 10%,
                    research institute 8%, gov./international org. 2%
Software:           Excel 28%, MatLab 14%, SPSS 11%, SAS 8%, other 11%
Field Work:         Econometrics 17%, Finance & actuarial sc. 14%,
                    Physics & engin. 13%, (Mathematical) Statistics 11%, other 17%
Methods looked for: Time series 20%, Multivariate meth. 16%,
                    Non/semipara.meth. 11%, Graph./explor.analy. 11%,
                    Basic statistics 8%
Methods used:       Time series 21%, Basic statistics 18%,
                    Multivariate meth. 16%, Linear models 10%,
                    Graph./explor.analy. 7%
Platform:           Windows NT 99%, Solaris 1%
Xversion:           Local 85%, ReX 12%, Client 4%

Academia (29%)
First learn:        Friends, colleagues 39%, publications, journals 31%,
                    conferences 6%, WWW, newsgroups 0%, other 23%
Where Work:         university 88%, research institute 8%, private company 2%,
                    gov./international org. 2%, at home 0%, other 2%
Software:           Excel 20%, MatLab 11%, SPSS 9%, Eviews 9%
Field Work:         Econometrics 38%, (Mathematical) Statistics 17%,
                    Finance & actuarial sc. 10%, Bio-metrics/statistics 6%, other 9%
Methods looked for: Non/semipara.meth. 20%, Time series 16%,
                    Multivariate meth. 12%, Basic statistics 10%,
                    Graph./explor.analy. 8%
Methods used:       Time series 19%, Basic statistics 14%, Linear models 13%,
                    Multivariate meth. 12%, Non/semipara.meth. 11%
Platform:           Windows NT 98%, Linux 1%
Xversion:           Local 86%, ReX 9%, Client 5%

Home Worker (21%)
First learn:        publications, journals 32%, Friends, colleagues 19%,
                    conferences 5%, WWW, newsgroups 0%, other 44%
Where Work:         at home 67%, private company 12%, research institute 10%,
                    gov./international org. 3%, university 0%, other 8%
Software:           Excel 35%, SPSS 9%, MatLab 9%, Eviews 7%, other 13%
Field Work:         Finance & actuarial sc. 19%, Econometrics 15%,
                    (Mathematical) Statistics 12%, Physics & engin. 10%, other 21%
Methods looked for: Time series 16%, Multivariate meth. 16%,
                    Graph./explor.analy. 15%, Non/semipara.meth. 10%, other 11%
Methods used:       Time series 19%, Basic statistics 16%,
                    Multivariate meth. 15%, Linear models 11%, other 9%
Platform:           Windows NT 100%
Xversion:           Local 78%, ReX 19%, Client 3%

Linux User (14%)
First learn:        WWW, newsgroups 56%, publications, journals 17%,
                    Friends, colleagues 8%, conferences 3%, other 17%
Where Work:         university 43%, at home 31%, research institute 12%,
                    private company 5%, gov./international org. 3%, other 6%
Software:           R 20%, Excel 18%, SPSS 13%, MatLab 6%, other 19%
Field Work:         Bio-metrics/statistics 18%, Physics & engin. 14%,
                    Econometrics 11%, (Mathematical) Statistics 11%, other 22%
Methods looked for: Graph./explor.analy. 16%, Basic statistics 16%,
                    Multivariate meth. 14%, Time series 13%,
                    Non/semipara.meth. 11%
Methods used:       Basic statistics 22%, Time series 15%,
                    Multivariate meth. 12%, Graph./explor.analy. 12%,
                    Linear models 9%
Platform:           Linux 86%, Solaris 5%
Xversion:           Local 90%, Client 5%, ReX 5%

General User (100%)
First learn:        WWW, newsgroups 44%, publications, journals 18%,
                    Friends, colleagues 17%, conferences 3%, other 18%
Where Work:         university 48%, at home 30%, research institute 9%,
                    private company 7%, gov./international org. 2%, other 4%
Software:           Excel 26%, MatLab 11%, SPSS 11%, R 8%, other 11%
Field Work:         Econometrics 22%, (Mathematical) Statistics 13%,
                    Finance & actuarial sc. 13%, Physics & engin. 10%, other 16%
Methods looked for: Time series 17%, Multivariate meth. 14%,
                    Non/semipara.meth. 13%, Graph./explor.analy. 12%,
                    Basic statistics 10%
Methods used:       Time series 19%, Basic statistics 17%,
                    Multivariate meth. 14%, Linear models 11%,
                    Non/semipara.meth. 7%
Platform:           Windows NT 85%, Linux 12%
Xversion:           Local 85%, ReX 11%, Client 4%

Appendix 7: Comparison of User and Regrouped User Data


User
Name               Modal Value                 Modal Freq.  No. of Values
First Learn        WWW, Newsgroup                    43.5%              5
Work Place         University                        47.8%              6
Software           Excel                             25.9%             17
Work Field         Econometrics                      22.0%              7
Method Used        Time Series                       19.2%             12
Method Looked for  Time Series                       17.0%             12
Xversion           Local                             84.5%              3
Platform L         Windows NT                        82.0%              4
Platform C         Windows NT                        95.9%              4
OS Platform        Windows NT                        85.4%              4
Country            Germany                           16.4%             93
Continent          Europe                            50.5%              4

Regrouped User
Name               Modal Value                 Modal Freq.  No. of Values
First Learn        WWW, Newsgroup                    43.5%              5
Work Place         University                        47.8%              6
Software           Statistics                        30.0%              6
Work Field         Econometrics                      22.0%              7
Method Used        Multi./Non-Semipara.meth.         40.1%              7
Method Looked for  Multi./Non-Semipara.meth.         40.5%              7
Xversion           Local                             84.5%              3
Platform L         Windows NT                        82.0%              4
Platform C         Windows NT                        95.9%              4
OS Platform        Windows NT                        85.4%              4
Country            Germany                           16.4%             93
Continent          Europe                            50.5%              4

Appendix 8: User 130303 (Regrouped) Frequency Analysis


First Learn                                                    Frequency  Percentage
WWW, newsgroups                                                      846        43.5
Publications, journals                                               354        18.2
Friends, colleagues                                                  324        16.7
Conferences                                                           67         3.4
Other                                                                354        18.2
Sum                                                                 1945       100.0

Methods Looked For                                             Frequency  Percentage
Multivariate methods/Non- and semiparametric methods/
  Generalized linear models and limited dependent variables/
  Linear models/Survival analysis                                    787        40.5
Time series                                                          331        17.0
Graphics and exploratory data analysis/
  Tools for learning or teaching statistics                          302        15.5
Basic statistics                                                     194        10.0
Panel data/cross-sectional time series                                90         4.6
Resampling and simulation methods                                     82         4.2
Other                                                                159         8.2
Sum                                                                 1945       100.0

Where Work                                                     Frequency  Percentage
University                                                           930        47.8
At home                                                              576        29.6
Research institute                                                   174         8.9
Private company                                                      140         7.2
Government or international organization                              42         2.2
Other                                                                 83         4.3
Sum                                                                 1945       100.0

Field Work                                                     Frequency  Percentage
Econometrics                                                         428        22.0
Finance and actuarial science/Risk analysis                          372        19.1
(Mathematical) Statistics                                            253        13.0
Physics and engineering                                              201        10.3
Biometrics or Biostatistics/Epidemiology                             199        10.2
Social science/Marketing and survey research                         181         9.3
Other                                                                311        16.0
Sum                                                                 1945       100.0


Methods Used                                                  Frequency   Percentage
Multivariate methods/Non- and semiparametric methods/         780         40.1
  Generalized linear models and limited dependent variables/
  Linear models/Survival analysis
Time series                                                   373         19.2
Basic statistics                                              332         17.1
Graphics and exploratory data analysis/                       192         9.9
  Tools for learning or teaching statistics
Panel data/cross-sectional methods                            90          4.6
Tools for learning or teaching statistics                     60          3.1
Resampling and simulation methods                             54          2.8
Other                                                         124         6.4
Sum                                                           1181        100.0

Software                    Frequency   Percentage
Statistics                  583         30.0
Excel                       504         25.9
Applied                     316         16.2
Econometrics                292         15.0
Rest                        28          1.4
Other                       222         11.4
Sum                         1945        100.0

Platform L                  Frequency   Percentage
Windows NT                  1595        82.0
Linux                       231         11.9
No                          80          4.1
Solaris                     21          1.1
Other                       18          0.9
Sum                         1945        100.0

Platform C                  Frequency   Percentage
No                          1865        95.9
Windows NT                  66          3.4
Apple                       9           0.5
Linux                       5           0.3
Solaris                     0           0.0
Sum                         1945        100.0


Continent                   Frequency   Percentage
Europe                      983         50.5
America                     486         25.0
Asia-Pacific                421         21.6
Africa                      40          2.1
Other                       15          0.8
Sum                         1945        100.0

Country                     Frequency   Percentage
Germany                     318         16.4
USA                         310         15.8
Japan                       168         8.1
Sum                                     0.0

Platform                    Frequency   Percentage
Windows NT                  1661        85.4
Linux                       236         12.1
Solaris                     21          1.1
Other                       27          1.4
Sum                         1945        100.0

Xversion                    Frequency   Percentage
Local                       1643        84.5
ReX                         222         11.4
Client                      80          4.1
Sum                         1945        100.0

Appendix 9: Regrouped User Intelligent Miner Cluster Analysis


Internet surfer (21%)

First learn         WWW, newsgroups 96%; conferences 4%
Where Work          at home 55%; private company 18%; research institute 16%; gov./international org. 5%; other 6%
Software            Excel 37%; Statistics 29%; Applied 15%; Econometrics 7%; other 12%
Field Work          Finance/Risk ana. 32%; Econometrics 13%; Physics & engin. 13%; Social Sc./Market 9%; other 17%
Methods looked for  Mult/Semi./Linear 37%; Time series 21%; Graph./explor./Learn. 16%; Statistics 9%; other 9%
Methods used        Mult/Semi./Linear 35%; Time series 20%; Statistics 19%; Graph./explor./Learn. 11%; other 7%
OS Platform         Windows NT 98%; Solaris 1%
Xversion            Local 83%; ReX 14%; Client 3%

Academia (41%)

First learn         WWW, newsgroups 38%; Friends, colleagues 22%; publications, journals 20%; conferences 4%; other 16%
Where Work          university 100%
Software            Statistics 32%; Econometrics 21%; Excel 20%; Applied 17%; other 8%
Field Work          Econometrics 30%; Statistics 17%; Finance/Risk ana. 13%; BioMetr./stat. & Empi. 12%; other 12%
Methods looked for  Mult/Semi./Linear 43%; Time series 16%; Graph./explor./Learn. 13%; Statistics 10%; other 7%
Methods used        Mult/Semi./Linear 42%; Time series 19%; Statistics 15%; Graph./explor./Learn. 8%; Panel/cross-sec. Time Ser. 7%
OS Platform         Windows NT 100%
Xversion            Local 85%; ReX 10%; Client 6%


Home Worker (24%)

First learn         publications, journals 33%; Friends, colleagues 26%; conferences 1%; WWW, newsgroups 0%; other 39%
Where Work          at home 59%; research institute 17%; private company 11%; gov./international org. 4%; university 0%; other 9%
Software            Excel 32%; Statistics 23%; Econometrics 16%; Applied 14%; other 12%
Field Work          Finance/Risk ana. 25%; Econometrics 21%; Statistics 12%; Physics & engin. 9%; other 19%
Methods looked for  Mult/Semi./Linear 40%; Time series 18%; Graph./explor./Learn. 17%; Statistics 8%; other 9%
Methods used        Mult/Semi./Linear 38%; Time series 22%; Statistics 16%; Graph./explor./Learn. 9%; other 8%
OS Platform         Windows NT 100%
Xversion            Local 82%; ReX 16%; Client 2%

Linux User (14%)

First learn         WWW, newsgroups 54%; publications, journals 16%; Friends, colleagues 10%; conferences 3%; other 17%
Where Work          university 45%; at home 29%; research institute 11%; private company 4%; gov./international org. 2%; other 6%
Software            Statistics 37%; Applied 18%; Excel 17%; Econometrics 8%; other 18%
Field Work          BioMetr./stat. & Empi. 24%; Econometrics 14%; Physics & engin. 14%; Social Sc./Market 12%; other 22%
Methods looked for  Mult/Semi./Linear 39%; Graph./explor./Learn. 21%; Statistics 15%; Time series 13%; other 7%
Methods used        Mult/Semi./Linear 40%; Statistics 21%; Graph./explor./Learn. 15%; Time series 14%; other 6%
OS Platform         Linux 86%; Solaris 7%
Xversion            Local 91%; Client 5%; ReX 4%

Total Users (100%)

First learn         WWW, newsgroups 44%; publications, journals 18%; Friends, colleagues 17%; conferences 3%; other 18%
Where Work          university 48%; at home 30%; research institute 9%; private company 7%; gov./international org. 2%; other 4%
Software            Statistics 30%; Excel 26%; Applied 16%; Econometrics 15%; other 11%
Field Work          Econometrics 22%; Finance/Risk ana. 19%; Statistics 13%; Physics & engin. 10%; other 16%
Methods looked for  Mult/Semi./Linear 41%; Time series 17%; Graph./explor./Learn. 16%; Statistics 10%; other 8%
Methods used        Mult/Semi./Linear 40%; Time series 19%; Statistics 17%; Graph./explor./Learn. 10%; other 6%
OS Platform         Windows NT 85%; Linux 12%
Xversion            Local 85%; ReX 11%; Client 4%

Appendix 10: Institute Users Frequency Analysis


First Learn                 Frequency   Percentage
WWW, newsgroups             53          49.5
Publications, journals      22          20.6
Friends, colleagues         19          17.8
Conferences                 3           2.8
Other                       10          9.3
Sum                         107         100.0

Methods Looked For                                            Frequency   Percentage
Time series                                                   20          18.7
Multivariate methods                                          18          16.8
Non- and semiparametric methods                               17          15.9
Graphics and exploratory data analysis                        10          9.3
Basic statistics                                              8           7.5
Generalized linear models and limited dependent variables     8           7.5
Panel data/cross-sectional time series                        7           6.5
Resampling and simulation methods                             5           4.7
Survival analysis                                             4           3.7
Tools for learning or teaching statistics                     4           3.7
Linear models                                                 3           2.8
Other                                                         3           2.8
Sum                                                           107         100.0

Where Work                                      Frequency   Percentage
Research institute                              107         100.0
At home                                         0           0.0
Government or international organization        0           0.0
Private company                                 0           0.0
University                                      0           0.0
Other                                           0           0.0
Sum                                             107         100.0

Field Work                                      Frequency   Percentage
Econometrics                                    23          21.5
Physics and engineering                         19          17.8
Biometrics or Biostatistics                     16          15.0
Social science (sociology, psychology, etc.)    8           7.5
Epidemiology                                    6           5.6
Finance and actuarial science                   6           5.6
Risk analysis                                   6           5.6
(Mathematical) Statistics                       4           3.7
Marketing and survey research                   2           1.9
Other                                           17          15.9
Sum                                             107         100.0


Methods Used                                                  Frequency   Percentage
Time series                                                   21          19.6
Multivariate methods                                          18          16.8
Generalized linear models and limited dependent variables     15          14.0
Basic statistics                                              12          11.2
Linear models                                                 11          10.3
Non- and semiparametric methods                               11          10.3
Graphics and exploratory data analysis                        8           7.5
Panel data/cross-sectional methods                            4           3.7
Resampling and simulation methods                             3           2.8
Tools for learning or teaching statistics                     2           1.9
Survival analysis                                             1           0.9
Other                                                         1           0.9
Sum                                                           107         100.0

Software                    Frequency   Percentage
Excel                       20          18.7
MatLab                      18          16.8
SPSS                        13          12.1
S/S-Plus                    9           8.4
GAUSS                       6           5.6
R                           6           5.6
SAS                         6           5.6
Statistica                  5           4.7
MiniTab                     4           3.7
Eviews                      3           2.8
LIMDEP                      2           1.9
TSP                         2           1.9
Rats                        1           0.9
Stata                       0           0.0
XGobi                       0           0.0
Xlisp-stat                  0           0.0
Other                       12          11.2
Sum                         107         100.0

Platform L                  Frequency   Percentage
Windows NT                  84          78.5
Linux                       18          16.8
No                          2           1.9
Solaris                     1           0.9
Other                       2           1.9
Sum                         107         100.0

Platform C                  Frequency   Percentage
No                          105         98.1
Windows NT                  2           1.9
Linux                       0           0.0
Solaris                     0           0.0
Apple                       0           0.0
Sum                         107         100.0


Continent                   Frequency   Percentage
Europe                      51          47.7
Asia-Pacific                34          31.8
America                     18          16.8
Africa                      4           3.7
Sum                         107         100.0

Country                     Frequency   Percentage
Germany                     16          15.0
Japan                       10          29.9
USA                         7           6.5
Sum                                     0.0

Platform                    Frequency   Percentage
Windows NT                  86          80.4
Linux                       18          16.8
Solaris                     1           0.9
Other                       2           1.9
Sum                         107         100.0

Xversion                    Frequency   Percentage
Local                       97          90.7
ReX                         8           7.5
Client                      2           1.9
Sum                         107         100.0

Declaration of Authorship

I hereby declare that I have written this thesis independently, that I have used no sources or aids other than those indicated, and that I have marked as such all passages taken verbatim or in substance from these sources and aids.

Berlin, 27 May 2003

Jianqiu Wang
