Diploma thesis
Submitted by
Jianqiu Wang
on May 27, 2003
Matriculation no.: 161426
Examiner: Prof. Dr. Wolfgang Härdle
Contents
Abstract
Introduction
1. Customer analysis
1.1 Customer Behaviour
1.2.1 Market segmentation
1.2.2 Customer profiling
1.3.1 Market Targeting
1.3.2 Positioning
2. Data Mining
2.1.2 Data Preparation
2.1.3 Mining
2.1.4 Result Interpretation
2.2.1 Applications
2.2.2 Operations
3.1 About XploRe
3.2.1 Data collection
3.2.5 Measures of Improvement
3.5 Complementary analysis
4.1.1 Marketing strategy
4.1.2 Marketing Mix
4.2.2 Target Market
References
Appendix
List of Figures
1.7 SAGACITY
1.8 Targeting strategies
4.1 4P of marketing mix
Abstract
This thesis presents a case study of customer analysis with the purpose of developing a marketing strategy for the statistical software XploRe. The customers analysed include the users, who downloaded the free trial version of XploRe through the web site, and the actual customers, who bought XploRe. Descriptive analysis was conducted for both data sets, which led to the conclusion that research institutes are the high-profit sector for XploRe. For the user data, the data mining method of clustering was used to identify the customer segments. Two different clustering methods were tested on the same user data set with different software, IBM Intelligent Miner and XploRe. As a result, the users of XploRe were divided into four clusters by both methods: Internet surfer, Academia, Linux user and Home worker. Through the comparison of the user data 2003 with the user data 2002, more facts and trends of the XploRe market and its customers were discovered regarding the software used, information resources, new markets and the ongoing changes in customer segments. Based on the results of the customer analysis, suggestions for marketing strategy, marketing mix and further analysis were outlined.
Key words: customer analysis, market segmentation, data mining, clustering, marketing strategy, marketing mix
Introduction
Customer analysis is a crucial step in the development of a marketing strategy. Only when a company has a clear view of its customers can the proper strategy and actions be undertaken to gain competitive advantage in the market.
With the development of digital data management systems, the capability for gathering, storing and accessing information has improved dramatically. This trend creates a difficulty for companies, which are confronted with huge amounts of data. Data mining is an important technology that allows companies to conduct customer analysis on large data sets. It discovers valuable information that is useful for marketing.
The research presented in this thesis tried to segment the customers and to find the trends and facts of the XploRe market, so that suggestions for a marketing strategy could be derived from the results. XploRe is a statistical software package which aims at sophisticated users who are looking for a flexible, programmable statistics package with an emphasis on more advanced procedures.1 It is important for the XploRe marketer to understand its customers and market. The customer data studied here include the data of XploRe users (the potential customers) and of actual customers (the buyers). The user data was collected through an online questionnaire preceding the download of the XploRe trial version, while the customer data was gathered through the returned registration forms. For the purpose of comparison, two sets of user data were analysed, and two clustering methods were tested with two software packages, IBM Intelligent Miner and XploRe. The user data 2002 covers October 11, 2001 to July 22, 2002 and contains 1734 profiles. The raw user data 2003 contains 2593 profiles and was collected from October 11, 2002 to March 13, 2003. The customer data comprises 32 profiles from July 1, 2000 to August 30, 2002.
Only descriptive analysis was carried out for the customer data because of its small number of records. For the user data, the data mining process of clustering was conducted to segment the market. The mining run for the user data consists of several steps: cleaning the raw data with MS Excel, transferring the data to IBM Intelligent Miner or XploRe, and performing the cluster analysis. The clustering identified four groups of XploRe customers, namely Internet surfer, Academia, Linux user and Home worker.
1. Customer analysis
In the current marketplace, competition is intense and the market abounds with all kinds of products. To win customers for their products, companies need a deep insight into what customers really need and how to influence their purchasing decisions. Companies should therefore adopt a customer focus, conducting business with an emphasis on understanding the customers and the market.
Customer analysis is the study of customers and their behaviour, which is central to achieving a customer focus.2 The purpose of conducting customer analysis is to achieve marketing goals such as the following:3
Customer acquisition: finding new customers
Customer cross-sell: further sales of different products to the same customer
Customer up-sell: the customer makes greater use of the same product or service
Customer retention: keeping the customer loyal
1.1 Customer Behaviour
1.1.1
Customer behaviour here means the behaviour of individuals who purchase for private or household consumption. These customers buy goods which are not part of a value chain, and the purpose of purchasing is not to generate profit. Buying behaviour depends on the individual's reaction to internal and external stimuli and is therefore difficult to predict. The black box is the term that describes the customer's purchasing decision process, which is difficult to access but crucial for the purchase decision.
2 WWW14
3 Heygate, Richard, 1998.
In order to develop appropriate products that are attractive to customers, firms need an insight into what happens in the black box. Figure ?? presents the customer's black box. Inside it, the customer gathers information, evaluates and compares, and then comes to a decision; this is called the consumer buying process.
[Figure: the customer's black box. The marketer's 7Ps and other external stimuli (social pressure, legal requirements, physical factors, economic cycle) act on the consumer's black box (aspirations, motivation, education, personality, beliefs). The buying process runs through identification of needs, evaluation of offers that satisfy the need, comparison of substitute products and brands, purchase, and post-purchase evaluation.]
1.1.2
[Figure 1.2: stages of the consumer buying process, including recognition of the problem, evaluation of the alternatives, the purchase decision and post-purchase behaviour.]
Through information gathering, customers become aware of the various products and brands in the market; they then evaluate the alternatives and finally make the purchase decision.
After a major purchase or expenditure, many people experience cognitive dissonance, also called post-purchase anxiety. They wonder whether they have made the correct purchasing decision. To reduce this anxiety, they look for confirmation. For example, they might ask friends to confirm that their purchase was the right choice.
Figure 1.2 summarises the stages of the consumer buying process: recognition of the problem, the search for information, evaluation of the alternatives, the purchase decision and post-purchase behaviour.
Companies should be present at each stage of the buying process and try to stand out from the competitors' products and brands. For a brand or product to become the customer's final choice, companies need a clear understanding of the evaluative criteria used by consumers in comparing products, as mentioned before.
1.1.3
The customer behaviour model describes the procedure and the basic elements of what happens inside the customer's black box, that is, the consumer buying process.
The most basic, simplest and best-known model of buyer behaviour is AIDA, which stands for Awareness, Interest, Desire and Action.4
The model introduced here is composed of six interrelated components.5
1. Information or facts: the percept caused by a stimulus.
2. Product recognition: the extent to which the buyer knows enough about the product to distinguish it from other products.
3. Attitude towards the product: what the customer expects from the product in satisfying his or her particular needs.
4. Confidence in judging the product: the customer's degree of certainty that his or her evaluative judgement of a product is correct.
5. Intention to buy: the mental state that reflects the customer's plan to buy some specific number of products from a particular brand in some specified time period.
6. Purchase: caused by the intention to buy. It is defined as the point at which the customer has paid for a product or has made some financial commitment to buy some specified amount during some specified time period.
[Figure: the consumer behaviour model. F: Information, R: Product recognition, C: Confidence, A: Attitude, I: Intention, P: Purchase.]
When consumers evaluate a product, they also employ certain evaluative criteria, which have several aspects:
1. The product's attributes, such as its price, performance, quality and styling.
2. Their relative importance to the consumer.
3. The consumer's perception of each brand's image.
4. The consumer's utility function for each of the attributes.
These evaluative criteria overlap with the elements of the consumer behaviour model. For instance, product recognition, attitude towards the product and confidence in judgement are the three parts of the buyer's image of a product. They all have a vital impact on the consumer's buying decision.
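The four aspects of the evaluative criteria can be read as the ingredients of a simple scoring rule. The following Python sketch is an illustration only: it assumes an additive model in which the consumer's perceived utility of each brand on each attribute (scaled to the range 0 to 1) is weighted by the relative importance of that attribute. The brand names, weights and utilities are hypothetical and are not taken from the XploRe data.

```python
# Hypothetical illustration of evaluative criteria as a weighted additive score.
# Assumption: utilities are already scaled to [0, 1]; the weights express the
# relative importance of each attribute to one consumer.

def brand_score(perceived_utility, importance):
    """Return the importance-weighted average utility of one brand."""
    total_weight = sum(importance.values())
    weighted = sum(importance[a] * perceived_utility[a] for a in importance)
    return weighted / total_weight

# Relative importance of each product attribute (made-up values).
importance = {"price": 4, "performance": 3, "quality": 2, "styling": 1}

# The consumer's perception of each brand on each attribute (made-up values).
perceived = {
    "Brand A": {"price": 0.5, "performance": 1.0, "quality": 0.5, "styling": 0.0},
    "Brand B": {"price": 1.0, "performance": 0.5, "quality": 0.5, "styling": 1.0},
}

scores = {brand: brand_score(u, importance) for brand, u in perceived.items()}
preferred = max(scores, key=scores.get)
```

Under these assumed numbers the heavier weight on price outweighs the better performance of the first brand; changing the weights changes the preferred brand, which is the point of knowing the relative importance of the attributes.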
[Figure: factors influencing the buyer. Cultural: culture, sub-culture, social class. Environmental: economic cycle, social pressure, legal requirement, new technology. Social: reference groups, family, roles and status. Personal. Psychological: motivation, learning, perception, beliefs and attitudes.]
1.1.4
WWW11
7 Bannes, E., etc., 1997, P139-149.
8 Environmental factors are external factors, while the other four factor categories are internal factors that influence consumer buying behaviour.
9 Bannes, E., McClelland, B., etc., 1997, P139-184.
desired.
Safety needs refer to people's need for security, stability and predictability. Services such as insurance and guarantees are products that satisfy safety needs.
Social needs describe the human desire for love and a sense of belonging. At this level, people seek to join associations and clubs.
Esteem needs show themselves in the search for status, esteem, achievement and recognition. To satisfy this level of needs, people turn to luxury products such as perfumes, high-tech products and cars.
Self-actualisation is the highest level of needs. Only after people have satisfied all the other levels of needs will they turn to the realisation of their potential, which is expressed in concern for external issues, such as volunteer work.
2. Personal factors
Personal factors are the buyer's personal characteristics, including age, occupation, lifestyle, personality and economic circumstances.
3. Cultural factors
Cultural factors include culture, sub-culture and social class.
Culture is a set of shared values which define people's behaviour. Language is the best example of cultural difference: using a language incorrectly causes misunderstanding. There are also differences in attitude between eastern and western cultures towards family and the individual.
A large society or culture is normally divided into sub-culture groups, which define more subtle norms of behaviour. Sub-culture groups include ethnic groups, religious groups, racial groups and geographical groups. They differ in cultural preferences, ethnic tastes, attitudes, lifestyles and taboos.
Social class is also called socio-economic group. It is determined by income level, education and occupation. The commonly used social class model divides society into upper class, upper middle class, lower class, upper working class, working class and others.
4. Social factors
Social factors include reference groups, family, social roles and status.
Reference groups are defined as all groups that have a direct (face-to-face) or
1. Primary membership groups are generally informal groups whose members interact with one another, such as family, neighbours, colleagues and friends.
2. Secondary membership groups are more formal than primary membership groups, and their members interact less. These include religious groups, professional groups and trade unions.
3. Aspirational groups are groups that one would like to belong to.
4. Dissociating groups are groups whose values and behaviour are rejected by the individual.
5. Environmental factors
Environmental factors consist of economic, social, political and technological aspects. The economic cycle, social pressure, legal requirements and new technology all influence the consumer's decision on which product to buy and how to buy it.
1.2
When firms try to sell their products in customer markets, they should not only try to identify the factors that influence the customer's black box, but also estimate whether there are enough customers who need their offer. It is important for companies to compare their capabilities with the objectives of customers, so that they can decide whether they are able to serve the market profitably with appropriate products. Firms must therefore identify the market need, segment the total market into groups of potential customers who are likely and able to purchase the offer, and position the product or service as an attractive alternative to the other offers available to the target groups.
1.2.1 Market segmentation
[Figure: patterns of demand. Homogeneous demand: consumers have relatively similar needs or desires for a product or service category. Diffused demand: consumers' needs and desires are so diverse that no clear clusters (segments) can be identified. Clustered demand: consumers' needs and desires can be grouped into two or more identifiable clusters (segments), each with its own set of purchase criteria.]
desire or potentially desire it, and are willing and able to buy it. It is necessary to analyse the market in terms of its size and pattern of demand.
There are three patterns of demand:15
1. Homogeneous demand
All consumers in a market have similar needs and wants.
2. Diffused demand
Consumers' needs are diverse and no clear segments can be identified. This suggests the need for customisation.
3. Clustered demand
Consumers' needs and desires can be grouped into several identifiable segments. Each has its own set of purchase criteria.
2. Selecting the approach and bases for segmentation
Market segments can be identified on the basis of detailed market research or of a basic analysis of the customer data held within a company. Many companies keep customer records detailing information such as age and gender.
1. A priori methods:
In an a priori approach, the basis for segmentation is set in advance, and primary market research is not necessary. Instead, the analysis of secondary data sources, the customer information at hand, managers' intuition and other methods are employed to set the segmentation basis for the buyers according to their usage patterns (heavy, medium, light and non-user), demographic characteristics (age, sex, income) or psychographic profiles (personality). Once the basis is set, research is conducted to identify the size, location and potential of each segment. The marketing decision then concerns the segment on which the marketing efforts should be concentrated. Classification, for example, is an a priori approach.
2. Post hoc methods:
The post hoc approach segments the market depending on the research findings rather than deciding the segmentation basis in advance. Primary market research is conducted to collect the classification and descriptor variables. Segments are defined only after all the relevant information has been collected and analysed. The research might highlight particular attributes, attitudes or benefits with which particular groups of customers are concerned. The result then becomes the basis for dividing the market.
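The difference between the two approaches can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the ages and the age bands are invented, and the post hoc step uses a basic one-dimensional k-means, which is a stand-in chosen for brevity and not the clustering method of IBM Intelligent Miner or XploRe.

```python
# A priori: segment boundaries are fixed before looking at the data.
# Post hoc: segments emerge from the data (here: a tiny 1-D k-means).
# All ages and band limits below are made-up illustration values.

def a_priori_segment(age):
    """Assign a customer to an age band that was chosen in advance."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30 to 49"
    return "50 and over"

def post_hoc_segments(values, k, iterations=20):
    """Group values into k clusters (k >= 2) with a basic k-means on one variable."""
    ordered = sorted(values)
    # Start from k evenly spaced observations (a simple deterministic choice).
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters

ages = [19, 21, 22, 24, 41, 43, 44, 46, 63, 65, 68]
fixed_bands = {age: a_priori_segment(age) for age in ages}
found = post_hoc_segments(ages, k=3)
```

In the a priori case the analyst decides the band limits; in the post hoc case the groups depend entirely on where the observations happen to lie, so a different sample can produce different segments.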
3. Dividing the market and profiling the segments
Based on the data gathered, the market is divided into identifiable market segments. The information obtained gives details regarding the nature of the customer segments. This is called segment profiling. Profiling associates each segment with certain characteristics, aggregates customers with similar characteristics into groups and separates them from those with different characteristics.
16 WWW31
17 Han, J. and Kamber, M., 2001, P281-319.
Criteria of customer segmentation
A market can be segmented in various ways. Segmentation can run into problems such as the relevance and quality of the data, reliance on intuition, the continuous nature of the process, and over-segmentation. A good segmentation should be relevant for buying behaviour and satisfy the following requirements:18 19
Size: The market should be big enough to justify segmentation. It is dangerous to over-segment an already very small market.
Difference: Differences between the members of the segments should exist and should be measurable through data collection.
Measurability: The company is able to collect information that measures the nature of the buying behaviour of the segment.
Substantiality: The selected segment should be profitable in relation to the marketing mix resources designed especially for it.
Accessibility: The extent to which the marketing effort can reach the segment.
Stability over time: The segment should last for a certain period without dramatic change in its major features.
Responsiveness to communication means: The segment should be sensitive to the marketing mix and the means of communication.
Variables for customer segmentation
Almost all factors which affect the customer's buying process and decision can be used as variables for customer segmentation. Generally, the variables can be put into five categories: demographic, socio-economic grade, psychographics and lifestyle, behavioural, and geographic and geo-demographic.20 21
18 WWW20
19 Wilson, R. and Gilligan, C., 1997, P275.
20 Kalakota, R. and Whinston A. B.
21 McDonald M. and Dunbar I., P85-91.
1. Demographic variables
Demographic variables categorise the market according to population characteristics and population profiles. Customers are subdivided into groups based on one or more demographic variables such as age, sex, religion, race, nationality, family size and stage of the family life cycle. For example, a seller may group customers by age: customers aged 20-30 are more likely to purchase trendy items.
2. Geographic and geo-demographic variables
Geographic segmentation divides the market into different geographic units such as countries, regions, counties, cities and postcodes. Geo-demographic systems are based on the proposition that the neighbourhood in which people live reflects their professional status, income, life stage and behaviour. The neighbourhood types are initially identified using national census data.
ACORN (A Classification of Residential Neighbourhoods) is an example of a geo-demographic system. ACORN classifies consumers into 43 demographically and behaviourally distinct clusters. The clusters are based on the type of neighbourhood, socio-economic status, and buying behaviour and preferences.22 A broad-based ACORN classification was conducted in Great Britain in 1981. It segments the residents of Great Britain into 12 categories.
Table: ACORN groups with 1981 population.23
Group  Description                              1981 population      %
A      Agricultural areas                             1,811,485    3.4
B      Modern family housing, higher incomes          8,667,137   16.2
C      Older housing of intermediate status           9,420,477   17.6
D      Older terraced housing                         2,320,846    4.3
E      Better-off council estates                     6,976,570   13.0
F      Less well-off council estates                  5,032,657    9.4
G      Poorest council estates                        4,048,658    7.6
H      Multi-racial areas                             2,086,026    3.9
I      High-status non-family areas                   2,248,207    4.2
J      Affluent suburban housing                      8,514,878   15.9
K      Better-off retirement areas                    2,041,338    3.8
U      Unclassified                                     388,632    0.7
3. Socio-economic grade
Buying behaviour is often influenced by a person's social class. The relevant factors include income, status and education. The National Readership Survey scale is one of the popular classifications and is based on the occupation of the main wage earner of the household.
Table: Socio-economic grades.24
Grade  Social classification   Occupation
A      Upper middle class      Higher managerial, professional or administrative jobs
B      Middle class            Middle managerial, professional or administrative jobs
C1     Lower middle class      Supervisory or clerical jobs, junior management
C2     Skilled working class   Skilled manual workers
D      Working class           Unskilled and semi-skilled manual workers
E      Subsistence level       Pensioners, unemployed, casual or low-grade workers
A further development of the socio-economic grade model is SAGACITY, developed by Research Services Ltd. This model combines life stages with income and social class.
4. Psychographic variables
Psychographics attempts to classify individuals by their attitudes, personality and lifestyles.
(1) Personality
Personality is used as a variable to segment the market. The earliest segmentation of this kind was conducted by Riesman et al. (1950). It identified three distinct types of social characterisation and behaviour:25
1. Tradition-directed behaviour, which changes little over time and which, as a result, is easy to predict and to use as a basis for segmentation.
2. Other-directedness, in which the individual attempts to fit in and adapt to the behaviour of the peer group.
3. Inner-directedness, where the individual is seemingly indifferent to the behaviour of others.
(2) Attitude
Attitude includes the customer's attitude towards risk, degree of loyalty, the likelihood of adopting new products, etc. Many of the personality variables can also be used as descriptors of attitude.
[Figure 1.7: SAGACITY. Life cycle stages (dependent, pre-family, family, late) are subdivided by income (better off, worse off; family and late stages only) and by occupation (white-collar, blue-collar).]
(3) Lifestyle
Consumer behaviour is also determined by the way people live their lives. Lifestyle arises from a complex relationship between aspirations, current situation, perception of self, income and attitudes. Lifestyle segmentation offers a detailed view of buyers because it is composed of numerous characteristics related to their activities, interests and opinions. Lifestyle consists mainly of three dimensions:26
1. Activities: work, hobbies, social events, vacations, entertainment, club membership, community, shopping, sports.
2. Interests: family, home, job, community, recreation, fashion, food, media, achievements.
3. Opinions: themselves, social issues, politics, business, economics, education, products, future, culture.
5. Behavioural variables
(1) Benefit sought variables
This group of variables for segmenting customers considers the motive for a purchase. It groups consumers according to the specific benefits that they seek in a product. Even if two customers buy exactly the same product, the benefits they expect may vary. Benefit segmentation is therefore based on behavioural processes, involving thought and action, as opposed to age and socio-economic class, which are defined according to individual characteristics. It closely identifies the customer's needs and represents a powerful method of understanding and influencing behaviour.
In applying this approach, a company should begin by attempting to measure consumers' value systems and their perceptions of the various brands within a given product class. The information gathered is then used as the basis of market segmentation. Benefit segmentation begins by determining the principal benefits that customers are seeking in the product, the kinds of people who look for each benefit and the benefits delivered by each brand. For example, four segments are identified in the toothpaste market according to the benefit sought: economy, decay prevention, cosmetic and taste benefits.
(2) User status
The market can be divided into five segments according to user status: non-users, ex-users, potential users, first-time users and regular users. First-time users and potential users can be further subdivided on the basis of usage rate.
(3) Loyalty status and brand enthusiasm
Loyalty status categorises customers on the basis of the extent and depth of their loyalty to particular brands or products. Most typically there are four categories: hard-core loyals, soft-core loyals, shifting loyals and switchers.27
1. Hard-core loyals are customers who consistently buy the same brand or product.
2. Soft-core loyals are those who are willing to choose from a limited set of brands. Their loyalty is divided among these brands or products.
3. Shifting loyals are consumers who shift their loyalty from one brand to another. After they shift, they no longer buy the former brand.
4. Switchers are those who show no loyalty to any single brand. Their buying pattern is typically determined either by the special offers available or by their search for variety.
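The four loyalty categories can be approximated from a customer's purchase history. The Python sketch below is a rough illustration, not a method used in this thesis: the decision rules, and in particular the limit of three brands for a "limited brand set", are assumptions made for the example, and the function name `loyalty_status` is hypothetical.

```python
def loyalty_status(purchases):
    """Classify a chronological list of purchased brands into a loyalty category."""
    if not purchases:
        raise ValueError("at least one purchase is required")
    brands = set(purchases)
    if len(brands) == 1:
        # The same brand is bought every time.
        return "hard-core loyal"
    # Count how often the brand changes from one purchase to the next.
    changes = sum(1 for a, b in zip(purchases, purchases[1:]) if a != b)
    if changes == len(brands) - 1:
        # Each brand forms one unbroken run: a former brand is never bought again.
        return "shifting loyal"
    if len(brands) <= 3:  # assumed threshold for a "limited brand set"
        return "soft-core loyal"
    return "switcher"

examples = {
    "hard": loyalty_status(["A", "A", "A"]),
    "shift": loyalty_status(["A", "A", "B", "B"]),
    "soft": loyalty_status(["A", "B", "A", "B", "A"]),
    "switch": loyalty_status(["A", "B", "C", "D", "A", "C"]),
}
```

A real analysis would also need the time span of the history and the special offers in force at each purchase, since the text defines switchers partly by their response to offers.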
(4) Critical events
Major or critical events generate needs which can be satisfied by the provision of a special collection of products and/or services. Typical examples are marriage, a death in the family, unemployment, illness, retirement and moving house.
1.2.2 Customer profiling
Customer segmentation and customer profiling are two elements of Customer Relationship Management (CRM). Customer profiling is performed after customer segmentation. It locates clusters within the customer file that outperform the average.28 It creates a customer segment profile, which labels
1.3
1.3.1 Market Targeting
The next task after customer segmentation and profiling is market targeting. Companies choose one or several segments as the target market. The target market is the market that the company decides to serve. A specific marketing mix and resources are developed to serve the target market.
Companies normally adopt one of three targeting strategies:29
Undifferentiated strategy: The company ignores the differences between customer segments and regards the whole market as a single market. A single marketing mix is adopted for the whole market. This is so-called mass marketing.
Differentiated strategy: The whole market is divided into several segments. The company develops a different marketing mix for each segment.
Concentrated strategy: The company chooses one or several market segments but uses only a single marketing mix. Under this strategy, the company tries to gain a high market share in one or several niche markets instead of struggling for a small share of the whole market. For firms with limited resources, this strategy is very appealing.
[Figure 1.8: Targeting strategies. Undifferentiated strategy: the organisation directs one marketing mix at the entire market. Concentrated strategy: the organisation directs one marketing mix at selected segments among segments 1 to 3. Differentiated strategy: the organisation directs marketing mixes 1, 2 and 3 at segments 1, 2 and 3 respectively.]
1.3.2 Positioning
The purpose of target marketing is to focus on the selected target market and to fine-tune the marketing mix so as to provide a group of potential customers with superior value, and thereby to build up a unique position for the product in the customer's view.
A product's position is the complex set of perceptions, impressions and feelings that it induces in consumers, compared with competing products.30 Positioning refers to how customers think about proposed and/or present brands in a market.31 The fundamental idea of positioning is competitive advantage.32 Through the differentiated marketing mix, the special needs and demands of customers can be satisfied. Customers will thus view the product or brand as superior to the others and give it a distinct position. To position a product, the marketer must appeal strongly to the target customers with its strengths and differences, using the proper marketing mix.
2. Data Mining
Data mining, which is also known as Knowledge Discovery in Databases (KDD),33 is a powerful new technology which helps companies to identify the important information in a sea of data. Data mining technology is commonly used for customer analysis.
Fayyad defined data mining as a non-trivial process aimed at identifying valid, novel, potentially useful and ultimately understandable patterns in data.34 Grabmeier and Rudolph consider data mining in terms of all methods and techniques which allow very large data sets to be analysed in order to extract and discover previously unknown structures and relations from such huge heaps of details. This information is filtered, prepared and classified so that it becomes a valuable aid for decisions and strategies.35
Data mining extracts implicit, previously unknown and potentially useful information from the data in order to automate the process of discovering significant patterns and trends.
2.1
The process of data mining can be summarised in four stages: data collection and selection, data preparation, data mining, and result interpretation.36 37
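The four stages can be pictured as a chain of functions, each taking the output of the previous one. The Python sketch below is a toy illustration under stated assumptions: the records, the field names and the "mining" step (a simple frequency count) are invented and do not reproduce the analysis carried out in this thesis.

```python
# Sketch of the four stages as plain functions. The records, field names and
# the "mining" step (a simple frequency count) are illustrative assumptions.
from collections import Counter

def collect_and_select(warehouse, fields):
    """Stage 1: keep only the fields needed for the analysis."""
    return [{f: row.get(f) for f in fields} for row in warehouse]

def prepare(rows):
    """Stage 2: drop incomplete rows and normalise text values."""
    cleaned = []
    for row in rows:
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        cleaned.append({k: str(v).strip().lower() for k, v in row.items()})
    return cleaned

def mine(rows, field):
    """Stage 3: extract a pattern, here the frequency of each value."""
    return Counter(row[field] for row in rows)

def interpret(counts):
    """Stage 4: turn the raw result into a statement a marketer can use."""
    value, n = counts.most_common(1)[0]
    share = n / sum(counts.values())
    return f"Largest group: {value} ({share:.0%} of usable records)"

warehouse = [
    {"id": 1, "occupation": "Student ", "os": "Linux"},
    {"id": 2, "occupation": "researcher", "os": "Windows"},
    {"id": 3, "occupation": "", "os": "Windows"},
    {"id": 4, "occupation": "student", "os": None},
    {"id": 5, "occupation": "Student", "os": "Windows"},
]

selected = collect_and_select(warehouse, ["occupation", "os"])
usable = prepare(selected)
result = mine(usable, "occupation")
summary = interpret(result)
```

Keeping the stages separate makes it easy to replace the toy mining step with a clustering or classification routine without touching the selection and preparation code.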
2.1.1
External resource: There are resources from which one can obtain information such as demographic information.
Research survey: A frequently used way to collect particular information is to conduct a survey. The survey can be conducted through face-to-face interviews, telephone interviews, postal questionnaires or via the Internet.
During the collection of data, two types of variables should be collected:38
Classification variables classify the data set into groups. Most demographic, geographic, psychographic or behavioural variables can be used to classify customers into segments.
Demographic variables: Age, gender, income, ethnicity, marital status, education, occupation, household size, length of residence, type of residence,
etc.
Geographic variables: City, state, zip code, census tract, county, region,
metropolitan or rural location, population density, climate, etc.
Psychographic variables: Attitudes, lifestyle, hobbies, risk aversion, personality traits, leadership traits, magazines read, television programmes
watched, etc.
Behavioural variables: Brand loyalty, usage level, benefits sought, distribution channels used, reaction to marketing factors, etc.
Descriptor variables are used to describe each subgroup in a data set and to distinguish the subgroups from one another. They stand for the characteristics of the represented data set. Descriptor variables must be easily obtainable variables that already exist in, or can be appended to, the customer files. Many classification variables can also be used as descriptor variables.
The data is normally stored in a data warehouse. As the data warehouse contains many diverse types of data, the first step in conducting data mining is to select the data that will be used in the analysis.
38
WWW7
2.1.2
Data Preparation
Before data can be analysed, the originally collected data must first be prepared to make it suitable for the analysis. Data preparation consists of the following stages:
1. Data cleaning:
4. Data Sampling39
2.1.3
Mining
At the mining stage, various techniques can be used to extract valuable information from the prepared data. For example, suppose the goal is to create an accurate, symbolic classification model that predicts whether a reader will continue to subscribe to a newspaper. First, a clustering technique is applied to segment the subscriber database; then, rule induction is used to create a classification model automatically for each desired cluster, through which one can predict the behaviour of a customer.
2.1.4
Result Interpretation
2.2
Data mining can be viewed from the aspects of applications, operations, techniques and algorithms.40 41
39 Ferguson, Mike
40 WWW 4
41 IBM's Data Mining Technology, 1996
Applications: Database marketing; Customer segmentation; Customer retention; Fraud detection; Credit checking; Web site analysis
Operations: Prediction and classification modelling; Link analysis; Database segmentation; Deviation detection
Techniques: Supervised induction; Clustering; Association discovery; Sequence discovery
2.2.1
Applications
Data mining is widely used in customer analysis and marketing. The following areas cover the main applications of data mining.42
Customer segmentation: Data mining tools automate the process of finding predictive information in large databases. Companies, especially retailers and banks, are interested in knowing whether there are sub-groups of customers who exhibit certain characteristics. They can use data mining to cluster the customers and discover groups of interest. For example, companies use data mining to analyse historical mailing lists in order to find the groups with a high return on investment, so that they can determine the new mailing target groups. Banks and credit companies classify credit scores to identify the customer segments that have lower risks.
Relationship management: Data mining discovers and identifies previously unknown relationships hidden in the data. The buying patterns of customers are of interest to retailers and advertisers. Combined with customer segmentation, data mining can help them to find the relationship between the purchase of product items and customer types, or to improve the conduct of an advertising campaign in particular media for a specific group of customers.
42
Carbone, Patricia L.
2.2.2
Operations
2.2.3
Numerous techniques support the operations of data mining to find the desired
groups or relationships.
Classification and predictive modelling is supported by supervised induction techniques. Clustering supports database segmentation. Association discovery and
sequence discovery are used for the link analysis. The deviation detection is
supported by statistical techniques.
The desired relationships to be discovered by data mining are:43
Classes: data items are assigned to predetermined groups.
Clusters: data items are grouped by logical relationships.
Associations: data is mined to identify associations.
Sequential patterns: data is mined to anticipate behaviour patterns and trends.
Supervised Induction
Supervised induction is the process of automatically creating a classification model from a set of records (examples)44, which is called the training set. The records in the training set must belong to a set of pre-defined classes. Each class has a distinguishable pattern, which is generated from the existing records. Once the model has been induced, a new record can be automatically put into a class according to its pattern.
Supervised induction comprises the steps of classification and prediction, which put elements into predetermined groups according to some criterion. The number of subgroups and the features of each subgroup are defined at the beginning. Then the features of an observation are compared with the criterion and the observation is put into the corresponding group.45 This is usually done in two steps:
Step 1: Build a model to describe the predetermined data set groups or
classes. The model contains a set of classification rules (labels).
Step 2: If the accuracy of the model or classifier is acceptable, the model
can be used to classify the new unlabeled data groups or elements.
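The two steps can be illustrated with a minimal Python sketch. It is not the induction technique of any particular data mining tool: the function names, the attribute `work_place`, the class labels and the toy training set are all hypothetical, and the "model" is simply one majority-class rule per attribute value.

```python
from collections import Counter, defaultdict


def induce_rules(training_set, attribute, label="class"):
    """Step 1: build one rule per attribute value (its majority class)."""
    counts = defaultdict(Counter)
    for record in training_set:
        counts[record[attribute]][record[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(rules, records, attribute, label="class"):
    """Share of labelled records that the rules classify correctly."""
    hits = sum(rules.get(r[attribute]) == r[label] for r in records)
    return hits / len(records)


def classify(rules, record, attribute, default=None):
    """Step 2: put a new, unlabelled record into a class."""
    return rules.get(record[attribute], default)


# Hypothetical training set with pre-defined classes.
training = [
    {"work_place": "university", "class": "renews"},
    {"work_place": "university", "class": "renews"},
    {"work_place": "university", "class": "cancels"},
    {"work_place": "home", "class": "cancels"},
    {"work_place": "home", "class": "cancels"},
]

rules = induce_rules(training, "work_place")
train_accuracy = accuracy(rules, training, "work_place")
prediction = classify(rules, {"work_place": "home"}, "work_place")
```

In practice the accuracy would be checked on records that were not used for building the model; the training set is reused here only to keep the sketch short.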
Clustering
Clustering is a method of grouping data elements into homogeneous groups. It divides a heterogeneous data set into disjoint sub-groups, so that the elements in any one cluster are highly similar, while the elements in different
1. Hierarchical algorithms
2. Iterative partitioning
3. Density search
4. Factor analytic
5. Clumping
6. Graph theoretic
Here we only discuss two basic clustering methods: hierarchical algorithms and iterative partitioning algorithms.
(1) Hierarchical algorithms
Hierarchical clustering can be performed using two main types of procedures: the agglomerative procedure and the splitting procedure.
Agglomerative procedure starts from the finest partition. It considers each observation as a cluster, then puts groups together to form new clusters. At each stage of the procedure, the number of clusters is reduced by one by joining or fusing the two groups that are considered to be the closest or most similar. The agglomerative algorithm is a frequently used procedure. It contains the following steps:48 49
1. Construct the finest partition. Normally each observation is a group.
2. Compute the distance or dissimilarity matrix.
3. Find out the closest or most similar groups.
4. Put the two most similar groups together to form a cluster.
5. Compute the distance or dissimilarity between the new groups to obtain a reduced distance or dissimilarity matrix.
6. Repeat steps 3 to 5 until the optimal clusters are formed.
Splitting procedure is opposite to the agglomerative procedure. It considers
the whole data set as a cluster to start with, then splits the cluster into sub
groups to form new clusters.
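The agglomerative steps listed above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: Euclidean distance, single linkage (the distance between two groups is the distance between their closest members), and a fixed target number of clusters as the stopping rule. All function names are hypothetical.

```python
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(c1, c2, points):
    """Distance between two groups = distance of their closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points, n_clusters):
    # Step 1: the finest partition, each observation is its own group.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Steps 2-3: compute group distances and find the closest pair.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        # Step 4: fuse the two closest groups (a < b, so index a stays valid).
        merged = clusters[a] + clusters.pop(b)
        clusters[a] = merged
        # Step 5: distances to the new group are recomputed in the next pass.
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, 3)
```

Recomputing all distances in every pass keeps the sketch short; a real implementation would update the reduced distance matrix instead.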
The linkage for the agglomerative algorithm
There are many linkages for measuring the proximity or similarity of elements and groups. The frequently used linkages are:
more likely to be put into the shopping basket together. Through this analysis,
retailers are able to identify which items are frequently purchased together by the
customers.
An association rule is a relationship of the form $X \Rightarrow Y$, where X is the antecedent item set and Y is the consequent item set. For example: customers who purchase item X are very likely to purchase item Y at the same time.51
There are two measures for each rule: support and confidence.52
Support (or prevalence) indicates the occurrence frequency of an itemset:
$s(A \Rightarrow B) = P(A \cup B)$,
that is, the share of transactions that contain the items of both A and B.
Confidence (certainty or predictability) measures the validity of the pattern. It denotes the strength of the relationship between the items, that is, to what degree one item depends on the others.
For example: among all customers, only 5% are students who buy a computer. But if a customer is a student, the probability of his buying a computer is 20%. In this rule, 5% is the support and 20% is the confidence.
Two other important measures for association rule discovery are:
Expected confidence - the probability that an item is purchased regardless of what other items have been bought together with it. For instance, if customers buy a computer 40% of the time, 40% is the expected confidence.
Lift - refers to the difference between the confidence of a rule and the expected confidence, either in the form of an absolute difference or in the form of a ratio. When lift is negative (as a difference) or less than one (as a ratio), the itemsets of the rule are unlikely to occur together, i.e. the two products are unlikely to be purchased at the same time.
The goal of association discovery is to find all associations with at least s% support and c% confidence in the transaction data.
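The four measures can be computed directly from a list of transactions. The following Python sketch uses a hypothetical toy data set and hypothetical function names; each transaction is represented as a set of items, and confidence is taken as the support of the whole rule divided by the support of the antecedent.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, expected confidence and lift of antecedent => consequent."""
    s = support(transactions, set(antecedent) | set(consequent))
    confidence = s / support(transactions, antecedent)
    expected = support(transactions, consequent)
    return {
        "support": s,
        "confidence": confidence,
        "expected_confidence": expected,
        "lift_ratio": confidence / expected,
        "lift_difference": confidence - expected,
    }


# Hypothetical shopping baskets.
transactions = [
    {"computer", "printer"},
    {"computer", "printer", "paper"},
    {"computer"},
    {"paper"},
    {"printer", "paper"},
]

m = rule_measures(transactions, {"computer"}, {"printer"})
```

Here the rule computer => printer has a lift ratio slightly above one, so a printer is bought a little more often together with a computer than it is bought overall. The sketch assumes that the antecedent and the consequent each occur at least once; otherwise the divisions fail.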
1. Data format
Two types of format are used to organise the data for association discovery:
1. Horizontal format: each entry is a row and each attribute is a column.
2. Vertical format: there is only one column for attributes. Different entries are denoted by different IDs. Attributes belonging to the same entry are assigned the same ID number.
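The difference between the two formats can be shown with a small Python sketch that converts horizontal data into vertical data. The field name `id`, the item names and the 0/1 coding of the attributes are assumptions made only for this illustration.

```python
def horizontal_to_vertical(rows, id_field="id"):
    """Return one (ID, attribute) pair per purchased item.

    Pairs that come from the same entry share the same ID.
    """
    vertical = []
    for row in rows:
        for name, value in row.items():
            if name != id_field and value:
                vertical.append((row[id_field], name))
    return vertical


# Horizontal format: one row per entry, one column per attribute.
horizontal = [
    {"id": 1, "bread": 1, "milk": 1, "butter": 0},
    {"id": 2, "bread": 0, "milk": 1, "butter": 1},
]

# Vertical format: a single attribute column plus the entry ID.
vertical = horizontal_to_vertical(horizontal)
```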
2. Apriori Algorithm
The most often used algorithm for association rule discovery is the Apriori algorithm. It uses prior knowledge of itemset properties to explore their further associations. The steps are as follows:
54
Step 1: Sort phase. Sort the database according to customer id and transaction id.
Step 2: Itemset phase. Find all large sequences of length 1.
Step 3: Transformation phase. Transform each item in the sequence into
integer.
Step 4: Sequence phase: Find all large sequences.
Step 5: Maximal phase: delete all non-maximal sequences.
53 Wojciechowski, Marek
54 Han, J. and Kamber, M., 2001, pp. 225-271.
3.1
About XploRe
XploRe is a professional statistical software package for high-end statistical analysis, advanced research and interactive teaching. It was developed in 1999 by Prof. Wolfgang Härdle and his team at Humboldt University of Berlin, Germany. XploRe is a modularly structured, command-driven software. The statistical methods of XploRe are supported by various libraries. Therefore, users can incorporate their own methods in XploRe and easily extend the environment. The competitive advantage of XploRe lies in its rather advanced methods, particularly smoothing.
The purpose of XploRe lies in the exploration and analysis of data. According to Prof. Härdle (1999), it aims at sophisticated users who are looking for a flexible, programmable statistical package with emphasis on more advanced procedures.
The Internet is currently the main marketing instrument of XploRe. A free trial version of XploRe (with limitations) can be downloaded from the net.
3.2
3.2.1
User refers to a person who downloaded XploRe from the Internet, while Customer refers to a person who bought XploRe.
All trial versions of XploRe (except for the Linux local version) do not include all functions and commands of XploRe, expire after two months, and are limited to 1000 observations. The Linux local version has no expiration date and no limit on the number of observations.
downloaders are asked to choose the preferred version of XploRe56 and the operating system on which XploRe will be installed, such as Windows, Linux, Sun etc. An example questionnaire is attached in the Appendix.
During downloading, the date and IP address are automatically recorded. They are very helpful in the data cleaning procedure.
XploRe Customer data collection
XploRe customer here refers to those who have actually bought XploRe. I also call them actual customers. The data on XploRe customers is collected through registration forms, which are sent to customers together with XploRe. The return of the registration form is not compulsory. The customer data covers the period from 1 July 2000 to 30 August 2002. Because of a change in the registration form, the data after this date was not used. The new registration form is attached in the Appendix for reference.
The registration form includes the questions about the identity of the customer
like country, language and the questions about their fields, as well as the operating
systems.
As a result, we get 8 variables of customer data: country, federal state (Germany), language, title, operating system, profile sector, profile branch and sex.
3.2.2
An analysis based on poor-quality or wrong data can deliver erroneous results no matter how sophisticated the statistical method is. Therefore, the raw data were thoroughly cleaned before being used for analysis.
XploRe user data cleaning
When people download XploRe, they obviously want to complete the download as quickly as possible and answer the questions as promptly as possible. If the questionnaire is too tedious or too complicated, downloaders may become impatient and give wrong or incomplete answers. In addition, in surveys it often happens that respondents are not very serious about their answers and do not give actual information.
56 XploRe has three versions: Local version, Java-Client version and ReX, which is an Excel add-in.
To avoid including false information in the data, I used the personal questions as indicators of how seriously the questionnaire was taken and of the likelihood of false answers. Many people gave obviously wrong answers to the personal questions. I assume that people who gave false answers to the personal questions would give false answers to the substantive questions as well. Furthermore, based on the recorded IP addresses, suspicious observations were inspected and then deleted according to a set of criteria.
The cleaning process was carried out mainly automatically by Excel Visual Editor. However, the whole process of data cleaning could hardly be carried out fully automatically. Therefore, manual cleaning was also undertaken to delete the false information that the computer program could not identify, for instance, the matching of IP addresses and the deletion of the profiles of members of the XploRe team. In the end, there were 1181 profiles left for analysis after the cleaning.
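The concrete cleaning criteria are not listed here, so the following Python sketch only illustrates the idea under assumed rules: a record is dropped when its personal field looks implausible, or when the same IP address has already submitted a profile on the same date. The field names (`name`, `ip`, `date`), the plausibility check and the sample records are hypothetical and are not the criteria actually applied to the XploRe data.

```python
def looks_serious(record):
    """Hypothetical plausibility check on a personal field."""
    name = record.get("name", "").strip()
    has_letters = any(ch.isalpha() for ch in name)
    # Reject empty names, single characters and repeated-key input like "aaaa".
    return len(name) >= 2 and has_letters and len(set(name.lower())) > 1


def clean(records):
    """Keep serious-looking records, one per (IP address, date) pair."""
    seen = set()
    kept = []
    for record in records:
        key = (record["ip"], record["date"])
        if not looks_serious(record) or key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Hypothetical download profiles.
raw = [
    {"name": "Anna Schmidt", "ip": "10.0.0.1", "date": "2002-07-01"},
    {"name": "aaaa", "ip": "10.0.0.2", "date": "2002-07-01"},
    {"name": "Anna Schmidt", "ip": "10.0.0.1", "date": "2002-07-01"},
    {"name": "J. Doe", "ip": "10.0.0.3", "date": "2002-07-02"},
]

cleaned = clean(raw)
```

Records that such automatic rules cannot judge would still have to be inspected manually, as described above.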
XploRe customer data cleaning
The cleaning procedure for customer data is relatively simple. We suppose that customers know their answers will help XploRe to improve its service and therefore intend to provide correct information. The cleaning process therefore only includes the deletion of duplicated customer information.
3.2.3
In the first step, the descriptive analysis was conducted with XploRe to give an
overview of the data.
XploRe User descriptive analysis
From the table in Appendix 1, XploRe user frequency analysis, we can see the frequency and percentage of each variable.
Concerning the sources through which users got to know XploRe, WWW/Newsgroup is the main source: 42.9% of the downloaders first learnt about XploRe through the Internet. The second main source is Publications and Journals; 20% of users got to know XploRe through these channels.
49.4% of users work in a university, and 9.1% work in a research institute. Users from private, non-research companies account for 6.6%. An interesting point is that a high percentage of users work at home. With 28.9% of the users, this group is the second biggest in this category.
Excel is the most popular software, used by 25.1% of all users. Next are SPSS and MatLab, with 11.2% and 10.4% of users respectively. XploRe is a command-driven software that is competitive in rather advanced statistical methods. Software such as S-Plus and GAUSS has features and scope more similar to XploRe; their users comprise 5.5% and 4% of the total respectively. This shows that most users are more likely to choose more standard software such as Excel and SPSS, because of the higher programming requirements and the difficulties in using a programmable, matrix-oriented software like XploRe. But the relatively high percentage of MatLab users signals an opportunity for XploRe, because MatLab is also a program-oriented software. There is a chance for XploRe marketing to win this type of customer.
A great part of XploRe users work in the field of Econometrics. The other popular work fields are Mathematical Statistics, Finance and actuarial science, and Physics and engineering. Each accounts for about 10% of users.
The most often used statistical method in the users' work is Time series, followed by Basic statistics, Multivariate methods and Linear models. But regarding the methods that the users look for in XploRe, there are some differences. The most wanted statistical methods are Time series and Multivariate methods, while Non- and semiparametric methods and Graphics and exploratory data analysis are ranked as the third and fourth most wanted methods, respectively. This difference indicates that the existing statistical software is weak at Non- and Semiparametric methods and Graphic/Exploratory methods. Therefore, the users try to discover more powerful instruments for these two kinds of methods. XploRe could emphasise its strength in these two kinds of methods and thus expand its customer base.
86.5% of users downloaded the local version of XploRe, 9.3% downloaded ReX
version of XploRe, which is a statistical Microsoft Excel 2000 add-in. Only 4.1%
of users downloaded the XploRe - Java - Client version.
Windows-NT is the dominant platform of the local version with 84.1% of users. Linux is also relatively popular: 13.2% of users downloaded the XploRe Linux version. Concerning the Client version, Windows-NT is still the dominant platform. Linux only accounts for 6.1%. Other platforms account for very small fractions.
Name                Type          Modal Value       Modal Freq.   No. of Values
First Learn         Categorical   WWW, Newsgroup    42.9%         5
Work Place          Categorical   University        49.4%         6
Software            Categorical   Excel             25.1%         17
Work Field          Categorical   Econometrics      24.1%         10
Method Used         Categorical   Time Series       18.7%         12
Method Looked for   Categorical   Time Series       17.3%         12
Xversion            Categorical   Local             86.5%         3
Platform L          Categorical   Windows NT        84.1%         4
Platform C          Categorical   Windows NT        87.8%         4
OS Platform         Categorical   Windows NT        84.2%         4
Country             Categorical   Germany           16.9%         77
Continent           Categorical   Europe            52.7%         4
Tab. 3.1: Summary and description of the variables of the User data (22/07/02)
XploRe users have various national backgrounds. Users from Germany (16.9%), USA (15.7%) and Japan (8.6%) together make up about 41% of the population. More than half of the users are from Europe, 52.7%. America and Asia-Pacific follow with 24.5% and 20.5% respectively. The reason might be that XploRe originates from Germany, so that information and marketing are more active in Europe than in other areas.
Since the variables are categorical, we can draw a picture of the typical user of XploRe. The modal user of XploRe is someone who is from Germany, works in a university and learnt about XploRe through the Internet. He uses Excel as his main statistical software and works in the field of econometrics. Time series is his main analysis method, and he looks for software that performs better in Time series methods. He downloads the local version of XploRe, and Windows-NT is his platform.
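The summary figures used for this description (modal value, modal frequency and number of distinct values of a categorical variable) are simple to compute. The Python sketch below shows one way to do it; the field names and the four sample records are hypothetical and do not reproduce the XploRe user data.

```python
from collections import Counter


def summarise(records, variable):
    """Modal value, modal frequency and number of distinct values."""
    values = [r[variable] for r in records if r.get(variable) is not None]
    counts = Counter(values)
    modal_value, modal_count = counts.most_common(1)[0]
    return {
        "modal_value": modal_value,
        "modal_freq": modal_count / len(values),
        "n_values": len(counts),
    }


# Hypothetical user profiles.
users = [
    {"work_place": "University", "software": "Excel"},
    {"work_place": "University", "software": "SPSS"},
    {"work_place": "At home", "software": "Excel"},
    {"work_place": "Research institute", "software": "MatLab"},
]

summary = {v: summarise(users, v) for v in ("work_place", "software")}
# The "modal user" combines the modal value of every variable.
modal_user = {v: s["modal_value"] for v, s in summary.items()}
```

Note that the modal user is a composite of the individual modes; no single record has to show all of these values at once.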
Name               Type          Modal Value          Modal Freq.   Missing value
Country            Categorical   Germany              34.4%         3.1%
Federal state      Categorical   Baden-Württemberg    3.1%          84.4%
Sex                Categorical   Man                  21.9%         0.0%
Language           Categorical   English              18.8%         59.4%
Title              Categorical   Prof.                9.4%          78.1%
Operating system   Categorical   Windows              31.1%         68.8%
Profile sector     Categorical   Research Institute   34.4%         62.5%
Profile branch     Categorical   Economics            9.4%          78.1%
Tab. 3.2: Summary and description of the variables for customer data
percentage of 25%. Japanese customers follow with 9.4%. Customers from Italy make up 6.2% of the total. There are also customers from Denmark, France, Norway, The Netherlands, UK, China and Taiwan, each accounting for 3.1% of the customers. Therefore, Europe is the main customer market of XploRe, followed by America and Asia.
78.1% of XploRe customers are men. Women have a relatively lower percentage, only 21.9%. This corresponds to the findings for the XploRe users.
English is the main language used among the customers, followed by German, French and Italian.
The customers of XploRe are highly educated: 21.8% of them hold the title of Prof., Dr. or Prof. Dr.
34.4% of customers work in research institutes. 3.1% of them work in companies.
Windows is the most popular platform. 21.3% of the customers use Windows as
their computing platform.
The professional fields in which the customers work are diverse. Econometrics has the highest percentage among them, 9.4%. The other professional fields indicated in the data are statistics, biostatistics, mathematics and computer science.
Because of the high share of missing values, we can only get a vague image of the customer of XploRe. The modal customer of XploRe is someone from Germany who works in a research institute and speaks English. He is likely to have the title of Professor or Dr., works in the field of econometrics and uses Windows as his platform.
3.2.4
From the above analysis, we now have a picture of the users and customers of XploRe. What is the relationship between them? How can users be turned into actual customers? Which marketing instruments should be employed? And how do we stimulate this change? All of these questions should be on the mind of the XploRe marketer.
The results show that the features of XploRe users and customers are quite similar, for example their origin, computing platform and field of work. However, differences exist as well.
Customers from Germany make up a much higher percentage of all customers than users from Germany do of all users. This indicates that Germany is the main market for XploRe.
XploRe users come mainly from universities, but the customers are mainly from research institutes and companies, especially from research institutes, which account for 34% of purchasers. This indicates that research institutes and private companies could be an active target market that provides a higher turnover than other markets. Therefore, further work should be carried out to determine the needs of customers from research institutes and private companies, and the marketing instruments to which they are sensitive.
Due to the lack of customer data, further analysis is restricted. Thus, building a high-quality customer database should be placed at the top of the agenda of the XploRe marketer.
3.2.5
Measures of Improvement
To improve the situation, high-quality customer information should be collected, because only through the analysis of actual customer behaviour is the marketer able to understand customers and thus determine the right action to reach them.
One possible way could be to reform the registration form, deliver questionnaires together with the products, or conduct follow-up surveys via telephone
16.9%
15.7%
8.6%
49.4%
28.9%
9.1%
6.5%
24.1%
11.9%
11.0%
84.2%
68.8%
Customer
Country            Germany              34.4%
                   USA                  25.0%
                   Japan                9.4%
Sector             Research Institute   34.4%
                   Company              3.1%
                   Missing Value        62.5%
Branch             Economics            9.4%
                   Missing Value        78.1%
Operation System   Windows              31.3%
3.3
3.3.1
$$\sum_{i=1}^{n} (x_i - y_i)^2$$
Taking the two tuples (a) and (b) above as an example, we could calculate the
69
In the situation with missing value, p is the number of variables that are available both in i
and j.
70 WWW1
71 WWW2
72 Härdle W., Simar L., 2000, p. 298
$$(1^2 + 0^2 + 1^2 + 0^2 + 1^2)^{1/2} = \sqrt{3}$$
$$\sum_{k=1}^{p} \sum_{i \in L_k} \left[ \sum_{j \in L_k,\, i \neq j} (m - d_{ij}) + \sum_{j \notin L_k} d_{ij} \right]$$
where m is the number of attributes of an observation. As discussed above, for the Hamming distance of categorical data, the distance $d_{ij}$ between two observations i and j is the number of disagreements between them, that is, the number of variables that take different values. The term $m - d_{ij}$ is then obviously the number
$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$
An increasing information loss is indicative of increasing heterogeneity. Therefore, the Ward procedure fuses the two groups whose union gives the minimal increase in heterogeneity. The aim of the Ward procedure is to unify groups without dramatically increasing the variation inside the resulting group, and thus to reach the most homogeneous partition.
$$\frac{1}{n_R} \sum_{i=1}^{n_R} d^2(x_i, \bar{x}_R)$$
$$\Delta(P, Q) = \frac{n_P n_Q}{n_P + n_Q}\, d^2(P, Q)$$
The Ward algorithm joins the two groups that give the smallest increase in $\Delta(P, Q)$.
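A small Python sketch can make the Ward criterion concrete. It assumes, as an interpretation that is not spelled out in the text, that d²(P, Q) is the squared Euclidean distance between the centroids of P and Q. The function names and the three toy groups are hypothetical.

```python
from itertools import combinations


def centroid(group):
    n = len(group)
    return [sum(column) / n for column in zip(*group)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(group):
    """Mean squared distance of the group members to their centroid."""
    c = centroid(group)
    return sum(sq_dist(x, c) for x in group) / len(group)


def ward_increase(p, q):
    """n_P * n_Q / (n_P + n_Q) * d^2(P, Q), with d^2 taken between centroids."""
    weight = len(p) * len(q) / (len(p) + len(q))
    return weight * sq_dist(centroid(p), centroid(q))


# Hypothetical groups of two-dimensional observations.
groups = {
    "A": [(0.0, 0.0), (0.0, 2.0)],
    "B": [(1.0, 1.0)],
    "C": [(6.0, 6.0), (8.0, 6.0)],
}

# Ward joins the pair of groups with the smallest increase.
best_pair = min(
    combinations(groups, 2),
    key=lambda pair: ward_increase(groups[pair[0]], groups[pair[1]]),
)
```

In this toy example the groups A and B are fused first, because joining them increases the heterogeneity far less than joining either of them with C.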
3.3.2
Computational efficiency refers to the amount of time (and computer memory) used by a
software or an algorithm to perform the required calculations in order to produce the desirable
results.
81
Sofyan, Werwatz, 2001
To initialise the clustering algorithm, two inputs were given at the start: the variables of the analysis and the number of clusters for the partition. In order to find a suitable partition, a heuristic backward selection strategy was adopted, which started with a relatively large number of variables and a relatively large maximum number of clusters. I tried various combinations of variables and maximum numbers of clusters in order to find a manageable and meaningful partition. With reference to the results of the previous analysis of Sofyan and Werwatz, I conducted the analysis starting with a maximum number of six clusters, then five, four and three.
The final chosen segmentation has five variables and four clusters. The five variables are Work field, Work Place, Resource of First learn, XploRe version and OS platform. With these five variables, the four-cluster segmentation achieves a relatively high NCC value (0.6002)82 and a good interpretation of the data compared with the other segmentations. As mentioned before, the final chosen segmentation not only achieves a high statistical value, but also delivers a rational description of the data.
The final segmentation is presented in the Figure and in Table 3.4. The Figure shows the visual result of the clustering. The Table presents the details of each cluster.
The NCC value in IBM Intelligent Miner is called Global Condorcet Value.
The Figure displays four rows; each row represents one of the four clusters identified by the mining run. The number at the left end indicates the percentage of each cluster in the whole sample. The pie charts represent the active variables used in the clustering. The importance of a variable in forming the cluster is indicated by the position of its pie chart from left to right. That is, the further to the left a pie chart is, the higher the influence of its variable on the cluster formation. Each pie chart is composed of two rings. The inner ring shows the distribution within the associated cluster, while the outer ring represents the distribution in the entire sample.
The first cluster contains 39% of total users. To understand the information that the pie charts deliver, first take a look at the leftmost pie in the first row. This pie shows the distributions of the variable First learn in the first cluster (inner pie) and in the whole sample (outer ring). 100% of users in the first cluster have a value of 1 for the variable First learn, which is the numerical code of Internet. This means that 100% of the XploRe users in cluster 1 got to know XploRe through the Internet. In the outer ring of the pie chart (the corresponding segment with the same colour), the same category has a smaller percentage in the whole sample, 42.93%. The users with other information sources (publication, friends and conference) make up the rest.
In cluster four, First learn is represented by the third pie from the left. This means that the variable is less influential in forming cluster four. The distribution of the variable First learn in cluster four (inner pie) is very similar to the distribution in the whole sample (outer ring). In contrast, for cluster four, the variable Platform is the most influential variable, which is represented by the leftmost pie in the fourth row.
The characteristics of each cluster are summarised in the Table. The first two clusters are relatively big, with 36% and 30% of all observations respectively, and their Cluster Condorcet Values83 are 0.6339 and 0.6119. The third and fourth clusters are rather small, with 20% and 14% of all observations respectively, and their Cluster Condorcet Values are 0.5205 and 0.4904.
The Table also indicates the detailed distribution of each related variable in each cluster, for instance the modal frequency of the respective variable within the cluster and the value of Chi-squared.84
83
Cluster Condorcet Value is the standardised measure of agreement among the observations
within a cluster.
84
Chi-Squared shown in the Table indicates to what extent the intra-cluster distribution differs from that of the whole sample. The closer it is to 1, the greater the difference between
Cluster Academia
Cluster Character   Similarity: 0.6119   Size (abs.): 359   Size (rel.): 30.40%

Variable      χ²     Attributes                  Freq.
First learn   0.33   WWW, Newsgroup              100%
Fieldwork     0.01   Econometrics                18%
Work Place    0.00   University                  45%
Xversion      0.00   Local                       87%
Platform      0.05   Windows                     98%

Variable      χ²     Attributes                  Freq.
First learn   0.24   Other resources             44%
                     Publications, Journals      31%
Fieldwork            Others                      19%
                     Finance & Actuarial Sc.     17%
Work Place    0.23   At Home                     68%
Xversion      0.02   Local                       81%
Platform      0.05   Windows                     98%

Variable      χ²     Attributes                  Freq.
First learn   0.22   Friends, Colleagues         42%
Fieldwork     0.03   Econometrics                43%
Work Place    0.12   University                  87%
Xversion      0.01   Local                       88%
Platform      0.05   Windows                     98%

Variable      χ²     Attributes                  Freq.
First learn   0.02   WWW, Newsgroup              52%
Fieldwork     0.04   Others                      25%
                     Biometrics & Biostatistics  19%
Work Place    0.01   University                  49%
Xversion      0.01   Local                       90%
Platform      0.84   Linux                       88%

Tab. 3.4: Characteristics of the user clusters from IBM Intelligent Miner (2002)
3.3.3
For the variable Work Place, users who work at home make up 44.2% of the users in this group. Those who work at a university account for 20.2% of the cluster's users.
The most popular software in this group is Excel, with 31.7% of users. SPSS and MatLab follow, with 10.3% and 9.5% of users respectively.
Econometrics is the main work field for Cluster 1 (23.8%). 14% and 11.8% of users in this cluster work in Finance and actuarial science and in Statistics, respectively.
18.9% of Cluster 1 users use Time series methods. 16.6% of them use Basic statistical methods. Users of Multivariate methods and Linear models account for 12.9% and 11.2% of users respectively.
Time series is also the most wanted method: 18.1% of Cluster 1 users seek better performance in time series methods in XploRe. Multivariate methods are in second place, with 12.7% of users. With 11.5% and 9% of users respectively, Graphics and exploratory data analysis and Non- and semiparametric methods are the third and fourth most wanted methods of Cluster 1 users. Basic statistical methods have a much lower percentage, only 9%.
The local version of XploRe is the dominant downloaded version (77.1%). Windows is the dominant platform (96.5%). And the users are mainly from Europe (54.3%).
Cluster 2
15.6% of total users (184) make up Cluster 2.
Half of the users in Cluster 2 got to know XploRe through the Internet (51.6%). They use other sources as well: 13.6% use publications as an information source, 10.9% attend conferences, and only 8.7% get information from friends.
Users in Cluster 2 also work mainly in universities (58.2%) and at home (28.8%). Users from research institutes and private companies have much smaller shares, 8.7% and 2.7% respectively.
The most popular software for Cluster 2 users is Excel (16.3%), R (15.2%) and SPSS (13.6%). MatLab and S/S-Plus follow with 7.6% and 6% of users.
Cluster 2 users have no dominant work field: 19% of them work in Biometrics or Biostatistics. Those who work in Econometrics and in Physics and engineering each make up 13.6% of users. The fields of Social science and Statistics have relatively high percentages, 12% and 10.9% respectively.
Many users in Cluster 2 conduct basic statistics (19.6%). There are also relatively high percentages of users who use Time series methods (16.3%), Multivariate methods (15.8%) and Graphics and exploratory data analysis (12%). Linear
Friends and Publications are the two main information sources for Cluster 4 users (51.1% and 47.9% respectively).
All of them (100%) work at a university.
Cluster 4 users use a variety of software. The main packages are Excel (15.8%), MatLab (11.6%), E-views (10.5%) and SPSS (10%).
Econometrics is the main field of work for Cluster 4 users (40.5%); 14.7% of the cluster users work in Mathematical statistics.
Cluster 4 users mainly apply Time series methods (20%). 13.7% and 12.1% of them use Multivariate methods and Non- and semiparametric methods respectively. Basic statistics and Linear models are each used by 11.6% of the cluster users.
Non- and semiparametric methods are the most wanted methods among Cluster 4 users. They also look for Time series methods (17.4%), Multivariate methods (11.6%) and Basic statistics (10%).
Cluster 4 users are mainly from Europe (54.2%), use Windows as their platform (98.9%) and download the Local version of XploRe (100%).
2. The modal user of each cluster
Based on the general description of each cluster, I identified the modal user and the main characteristics of each cluster.
Cluster 1 - Home worker
The Cluster 1 user is a home worker, who works at home in Europe in the field of Econometrics. He gets information mainly from the Internet. Excel is the software he mainly uses. He uses Time series and Basic statistics methods, but looks for better performance in Time series, Multivariate methods and Graphics and exploratory data analysis. His platform is Windows and he downloads the Local version of XploRe.
Cluster 2 - Linux user
The Cluster 2 user is a Linux user, who comes from Europe and works at a university. His professional field is Biometrics and Biostatistics. The Internet is his main information source. The software he currently mainly uses is Excel and R. He normally applies Basic statistics and searches for Multivariate methods. Linux is his operating platform. He downloads the Local version of XploRe.
Cluster 3 - Internet surfer
The Cluster 3 user is an Internet surfer. He works at a university in Europe in the field of Econometrics. Excel is the software he mainly uses. He applies Time series and Multivariate methods, but seeks better software for Graphics and exploratory data analysis and Time series. He uses Windows as his platform and downloads the Local version of XploRe.
Cluster 4 - Academia
Users in academia make up Cluster 4. The modal user works at a university in Europe and conducts research in Econometrics. He mainly uses Excel and applies Time series methods, but the methods he searches for are Non- and semiparametric methods. Windows is his platform, and he downloads the Local version of XploRe.
3.3.4
The table below shows that the results of the two software packages are very similar. The main difference lies in the Home worker cluster. Because the group sizes differ between the two packages, it is easy to understand that the sub-characteristics of the groups differ as well. We can nevertheless conclude that the two methods deliver similar outcomes, although some minor differences exist.
As a mining tool, however, IBM Intelligent Miner performed better in visualisation and reached a higher computational efficiency compared with XploRe. Therefore, I chose IBM Intelligent Miner to conduct the further cluster analysis.
3.4
3.4.1
Descriptive analysis
To gain a view of the development of the users and the market, an analysis was undertaken for the latest user data. The raw data contain 2593 records (11 October 2001 - 13 March 2003). After data cleaning and preparation, the final data set has 1945 records.
84
I conducted the analysis of the user data 2002 (11 Oct. 2001 - 22 July 2002) in order to compare it with the customer data, which was collected over a similar period (1 July 2000 - 30 August 2001).
Attributes                    XploRe Freq.    IBM Intelligent Miner Freq.

Internet surfer (IBM Intelligent Miner: Size 13.1%)
WWW, Newsgroups               100%            99.4%
Econometrics                  18%             20.0%
University                    45%             100%
Local                         87%             100%
Windows                       98%             100%

Home worker (IBM Intelligent Miner: Size 55.2%)
WWW, Newsgroups                               39.3%
Others                                        28.1%
Econometrics                                  23.8%
At Home                                       44.2%
Local                                         77.1%
Windows                                       96.5%
XploRe Freq.: 44%, 31%, 19%, 17%, 68%, 81%, 98%

Academia (IBM Intelligent Miner: Size 16%)
Friends, colleagues           42%             51.1%
Econometrics                  43%             40.5%
University                    87%             100%
Local                         88%             100%
Windows                       98%             98.9%

Linux user (IBM Intelligent Miner: Size 15.6%)
WWW, Newsgroup                52%             51.6%
Others                        25%             21.7%
Biometrics & Biostatistics    19%             19.0%
University                    49%             58.2%
Local                         90%             94.6%
Linux                         88%             76.6%

Tab. 3.5: Comparison of Clustering results with IBM Intelligent Miner and XploRe
Name                 Type          Modal Value       Modal Freq.   No. of Values
First Learn          Categorical   WWW, Newsgroup    43.5%         5
Work Place           Categorical   University        47.8%         6
Software             Categorical   Excel             25.9%         17
Work Field           Categorical   Econometrics      22.0%         10
Method Used          Categorical   Time Series       19.2%         12
Method Looked for    Categorical   Time Series       17.0%         12
Xversion             Categorical   Local             84.5%         3
Platform L           Categorical   Windows NT        82.0%         4
Platform C           Categorical   Windows NT        95.9%         4
OS Platform          Categorical   Windows NT        85.4%         4
Country              Categorical   Germany           16.8%         93
Continent            Categorical   Europe            50.5%         4

Tab. 3.6: Summary and description of the variables for User data 2003
The table above summarises the modal values of the data. Again we can identify the features of the current modal user of XploRe. He works at a German university and specialises in Econometrics.85 His computer runs the Windows NT platform. The software he uses is Excel, with which he conducts research using Time Series methods. He searches for information through the Internet, and his purpose in downloading XploRe is to find better software for Time Series methods.
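The three summary statistics reported in Tab. 3.6 (modal value, modal frequency and number of values) can be computed for any categorical variable with a few lines of code. The following Python sketch is only an illustration: the function name modal_summary and the toy answers are assumptions made here, not the thesis data or the procedure actually run in IBM Intelligent Miner.

```python
from collections import Counter


def modal_summary(values):
    """Return (modal value, modal frequency as a fraction, number of distinct values)."""
    counts = Counter(values)
    modal_value, modal_count = counts.most_common(1)[0]
    return modal_value, modal_count / len(values), len(counts)


# Hypothetical toy answers for the variable Work Place (not the survey data)
work_place = ["University", "At home", "University",
              "Private company", "University", "At home"]

value, freq, n_values = modal_summary(work_place)
print(f"{value}: {freq:.1%} of users, {n_values} distinct values")
```

Applied to each questionnaire variable in turn, such a function would yield one row of the summary table per variable.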
Appendix 5 presents the detailed information for the user data of 13/03/03. Comparing the methods used with the methods looked for by the XploRe users, I found an interesting point.
The methods most used by the users were Time series (19.2%), then Basic statistics (17.1%). Multivariate methods were in third place with 14%. Graphics and exploratory data analysis accounted for only 6.8%. The methods looked for by the users, however, are different. Time series were still the primary methods wanted by the users. Multivariate methods moved up to second place, with the same percentage as in usage, 14%. Graphics and exploratory data analysis jumped to third position with 11.7%, while basic statistical methods were wanted by only 10% of the users.
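The comparison can be made explicit by subtracting the share of users who use a method from the share who look for it. The Python sketch below does this for the four methods quoted above; the 17.0% for Time series looked for is taken from Tab. 3.6, and the function name demand_gap is a hypothetical helper introduced only for illustration.

```python
# Shares in percent, as quoted in the text and in Tab. 3.6
used = {
    "Time series": 19.2,
    "Basic statistics": 17.1,
    "Multivariate methods": 14.0,
    "Graphics and exploratory data analysis": 6.8,
}
looked_for = {
    "Time series": 17.0,
    "Multivariate methods": 14.0,
    "Graphics and exploratory data analysis": 11.7,
    "Basic statistics": 10.0,
}


def demand_gap(used, looked_for):
    """Percentage-point gap (looked for minus used), largest unmet demand first."""
    gaps = {m: round(looked_for[m] - used[m], 1) for m in used}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranked = demand_gap(used, looked_for)
for method, gap in ranked:
    print(f"{method}: {gap:+.1f} points")
```

A positive gap marks a method that is wanted more than it is currently used; a negative gap marks a method that users already cover with their existing software.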
This discovery indicates that most software packages already have basic statistics functions; therefore, the users have no extra needs in those functions and do not seek this benefit from XploRe. Graphics and exploratory data analysis, by contrast, are more in demand. XploRe could focus on and improve these functions, and communicate this information to the customers through its marketing.
85
With reference to Appendix 6, the major work field of the general XploRe user is defined as Practical/Applied Econometrics.
Clustering for 2003 data
The clustering with IBM Intelligent Miner was undertaken again to find the clusters in the new 2003 data. The clustering variables finally chosen were the same as those for the 2002 user data, namely Firstlearn, OS Platform, Work Place, Fieldwork and Xversion. Because IBM Intelligent Miner allows additional variables to be entered as supplementary variables, and in order to see the distribution of some variables of interest in each cluster, the variables Software, Method used and Method looked for were adopted as supplementary variables during clustering. As a result, the users were again subgrouped into four clusters, namely Internet surfer, Home worker, Academia and Linux user. The clustering reached a Global Condorcet value of 0.5940. The following graphic86 and the table (Appendix 6) present the outcome of the clustering.
1. Summary from Variable perspective
Information resource (First learn)
From the summary table of the four customer clusters, we can see that the Internet plays an important role as a communication channel for Internet surfers and Linux users. Linux users also depend partly on publications as an information source. Academia depends heavily on personal communication channels: they get information mainly from friends and colleagues, and publications are another important information source for them. Home workers have mixed information sources. They get more information from other sources; publications and friends/colleagues are the two main sources from which they get information.
Conferences play a minor role in all of the groups. This might mean that our participation in conferences has not made a strong impact on the customers, or that XploRe still lacks sufficient presence at conferences.
Working place
Academia mostly works at universities. Internet surfer and Linux user are mainly composed of people who work at a university or at home. People working in research institutes and private companies are represented in Internet surfer and Home worker with a similar distribution. Linux user has a relatively high percentage of people working in research institutes. In the Home worker group, more people work in private companies than in research institutes.
Home worker is a mixed group. A high percentage of them work at home; they might be students, or people who work somewhere else but use XploRe at home. The group also contains some people who work in institutes and companies.
Software
Excel is the first choice of software in all groups except Linux user. MatLab and SPSS are in second and third place for Internet surfer and Academia. In the Home worker group there are more SPSS users than MatLab users. MatLab users need more sophisticated programming knowledge than SPSS users, which indicates an opportunity for XploRe.
The Linux user's first choice is R, because R, like Linux, is non-commercial software distributed online. There are also fewer SPSS users than MatLab users in this group.
Fieldwork
Internet surfers work mainly in Econometrics and Finance/Actuarial science. Academia also works mainly in Econometrics, followed by Mathematical statistics. This shows that the Internet surfer may be engaged more in practical financial study, while Academia devotes itself more to theoretical statistical research. Home worker works mainly in Finance and actuarial science, followed by Econometrics. This hints at an even higher degree of engagement in financial practice. Linux user works more in Biometrics or Biostatistics and Physics or engineering. Therefore, compared with the other groups, Linux user is more oriented towards the natural sciences.
Methods looked for
Internet surfer and Home worker are both interested in Time series and Multivariate methods because of their strong involvement in financial practice. The difference in the third method, Non- and semiparametric methods for Internet surfer and Graphics and exploratory data analysis for Home worker, shows that the Home worker group places even more emphasis on the practical side than the Internet surfer. Academia concentrates on theoretical development; therefore, the methods they are interested in are more research oriented. Academia pays more attention to Non- and semiparametric methods, which are relatively new methods. Linux users are more interested in Graphics and exploratory data analysis and Basic statistics, which matches their demand for natural science data analysis.
Methods used
All of the groups use Time series and Basic statistics as their main methods. Comparing these with the methods they search for in XploRe indicates that the users apply basic statistical methods in the software they already possess, but try to find more sophisticated software with better performance in Time series and Multivariate methods. The requirement for Non- and semiparametric methods and Graphics and exploratory data analysis also motivates them to look for new software.
Platform
Except for Linux users, Windows is the dominant platform for all the other three
groups.
XploRe Version
Most users in all groups downloaded the Local version of XploRe. ReX has a higher presence in the Home worker group, which accords with the high utilisation of Excel in this group: they use it as complementary software for Excel. The Client version has a low percentage in all of the groups, which shows that the Client version needs more promotion.
2. Summary from cluster perspective
(1) Modal user of each cluster
Internet surfer
Internet surfer gets his information entirely from the Internet. He conducts research in Econometrics with Time Series methods at a university. The software he uses is Excel, but he looks for software with better performance in Time Series methods. Windows NT is his OS platform. He downloaded the Local version of XploRe.
Academia
Academia works at a university and conducts research in Econometrics with Excel. The methods he employs are Time Series methods. He learned of XploRe through friends/colleagues. The benefit he seeks in XploRe is Non- and semiparametric methods. Windows NT is his OS platform, and Local is the XploRe version he downloaded.
Home worker
Home worker has a mixed character. He works mainly at home. Publications/Journals are an important information source for him87. He works mainly in Finance and actuarial analysis. He currently uses Time Series methods, but searches for better performance in Time Series and Multivariate methods. He downloaded the Local version of XploRe onto his Windows NT OS platform.
Linux User
Linux user works at a university in the field of Biometrics or Biostatistics. He uses R to conduct his work with Basic Statistics methods. He primarily wants software with good performance in Graphics and Exploratory data analysis. He got to know XploRe through the Internet and downloaded the Linux version of XploRe. His platform is Linux.
(2) Special features for clusters
Modal values indicate the main characteristics of each cluster. But when some sub-features of each cluster are compared with those of the total users (see Appendix 6), some very interesting characteristics are found, which distinguish each cluster from the others and from the whole user population.88
87
Because the value "other" contains various options, I did not consider it as the modal value, even though it ranks first for the variable.
Internet surfer
The Internet is the sole information source for the Internet surfers. They use MatLab and SAS, software similar to XploRe, more than the other groups do. The fields they work in are mainly Econometrics and Finance/Actuarial analysis. We could say that they conduct practical work in Econometrics. Surprisingly, the share of this group downloading the Client version is the same as for the whole user group; I had expected a higher percentage, because they have better Internet access than the other groups.
Academia
Academia works mainly at a university. The users in the Academia group get information mainly through friends/colleagues and publications/journals. The Internet does not play any role in information gathering. They conduct research in Econometrics and Mathematical Statistics, which means that they work more in academic research in Econometrics. Unsurprisingly, the benefit they seek in XploRe is also more academic, research oriented and advanced: the Non- and semiparametric methods.
Home worker
This is a mixed group. Naming it Home worker may not be entirely appropriate, because only two thirds of its members actually work at home; the rest work in various places other than a university. They work relatively more in private companies, but none of them work at a university.
They actually get information mainly from other sources. However, publications/journals are a more important information source for them than for the other groups (except for Academia). None of them use WWW/newsgroups as an information source. Thus, it might be more appropriate to call them a group with extra information sources.
The fields they are engaged in are Finance/Actuarial analysis and Others, so they conduct practical work in Finance. They need Graphics and Exploratory data analysis methods more than the other groups do.
A very high percentage of them use Excel as statistical software. Therefore, the ReX version of XploRe is very popular among them, because ReX is an aid for Excel.
88
The features not specially mentioned below for each cluster are those that are similar to the features of the whole user group.
Linux User
Linux users use Linux as their operating system. R is naturally their choice of primary software because R, like Linux, is free and is a statistical package for the Linux system. Linux users work more in the fields of Biometrics/Biostatistics and Physics/Engineering. Therefore, I define their main work field as Biological research and Practical Engineering. They mainly apply Basic statistics, and what they look for is software for Graphics and Exploratory data analysis. They have high demands for high-quality graphical presentation.
Each group of XploRe users has a different focus in its work field:
The general user of XploRe is mainly engaged in Practical/Applied Econometrics.
Internet surfers are mainly involved in Practical/Applied Econometrics.
Academia specialises in Academic/Theoretical Econometrics.
Home workers engage in Practical/Applied Finance.
Linux users do not have an economics background; their work fields are Biological research and Applied Engineering.
(3) Similarity between clusters
Above we focused on the differences between groups, but some similarities between groups exist as well. These facts might be helpful in Marketing Mix design, because the same tool could then be applied to different groups.
Academia and Home worker both do not use the Internet as an information source, while publications/journals play a very important role in both groups.
Internet surfer and Home worker both search for software for Time series and Multivariate methods, as does the total group. This might be the result of both groups conducting practical work. The difference lies in the third need: Home workers have a higher need for Graphics/exploratory data analysis, while Internet surfers place Non- and semiparametric methods third. This indicates their different demands based on their work, Finance for Home workers and Econometrics for Internet surfers. In this respect, Internet surfers are more similar to the total users, who have the same needs. The general XploRe user is also mainly engaged in practical work in Econometrics.

                     User220702                                     User130303
Name                 Modal Value      Modal Freq.   No. of Values   Modal Value      Modal Freq.   No. of Values
First Learn          WWW, Newsgroup   42.9%         5               WWW, Newsgroup   43.5%         5
Work Place           University       49.4%         6               University       47.8%         6
Software             Excel            25.1%         17              Excel            25.9%         17
Work Field           Econometrics     24.1%         10              Econometrics     22.0%         10
Method Used          Time Series      18.7%         12              Time Series      19.2%         12
Method Looked for    Time Series      17.3%         12              Time Series      17.0%         12
Xversion             Local            86.5%         3               Local            84.5%         3
Platform L           Windows NT       84.1%         4               Windows NT       82.0%         4
Platform C           Windows NT       87.8%         4               Windows NT       95.9%         4
OS Platform          Windows NT       84.2%         4               Windows NT       85.4%         4
Country              Germany          16.9%         77              Germany          16.4%         93
Continent            Europe           52.7%         4               Europe           50.5%         4
3.4.2
2000                        2003
Software    Percentage      Software    Percentage
            15.0%           Excel       25.9%
            14.0%           Other       11.4%
            11.0%           MatLab      11.0%
            8.0%            SPSS        10.5%
            7.0%            R           7.5%
            5.0%            Eviews      6.5%
1. Comparison of software
Comparing the statistics software used by the users in 2000 and 2003, I find a clear trend of increasing use of Excel and MatLab. The shares of SPSS, SAS, GAUSS and S/S-Plus are declining. GAUSS and S/S-Plus have shares of 4.1% and 5.1% respectively in 2003. Other software such as R and Eviews now belongs to the top five.
These findings have some implications. The dramatic increase in Excel users might reflect that more and more people conduct statistical analysis with Excel. Because Excel is application software that performs basic statistical analysis, the increase in the percentage of Excel users might also hint that the user base of XploRe in 2003 is less professional than in 2000. This could be a consequence of the Internet, through which people can access professional software such as XploRe much more easily than before.
Another change, which I have already mentioned, is the rise of MatLab. MatLab has taken the place of SPSS and is now in second position instead of fourth as in 2000. MatLab, like XploRe, is command-driven software, while SPSS is much more application oriented. Its rise might therefore indicate that statistics professionals are now more interested in programming-based software. This change might benefit XploRe.
2. Comparison of Information resource
We can find a change in the information sources in the categories Publications/Journals and Colleagues/Friends. In 2000, the communication channel of XploRe was more personal: Colleagues/Friends accounted for 25% and Publications/Journals for only 15%. In 2003, their positions have changed: Publications/Journals are now at 18.2%, while Colleagues/Friends has declined to 16.7%.
[Figure: bar charts of the percentage of users by software. 2000 chart (axis labels: MatLab, GAUSS, S/S-Plus, SPSS, Excel, SAS); 2003 chart (Excel 25.9%, Other 11.4%, MatLab 11.0%, SPSS 10.5%, R 7.5%, Eviews 6.5%)]
2000                                    2003
Info. Resource           Percentage     Info. Resource           Percentage
www/newsgroups           44.0%          www/newsgroups           43.5%
Colleagues, friends      25.0%          Publications/Journals    18.2%
Publications/Journals    15.0%          Others                   18.2%
Others                   13.0%          Colleagues, friends      16.7%
Conferences              2.0%           Conferences              3.4%
[Figure: bar charts of the percentage of users by information resource, 2000 (44.0%, 25.0%, 15.0%, 13.0%, 2.0%) and 2003 (43.5%, 18.2%, 18.2%, 16.7%, 3.4%)]
This trend reflects the efforts of XploRe to improve its communication channels. The reach of the personal channel is limited. Compared with personal communication channels, non-personal channels are more effective at broadening the potential user base and spreading information much more widely.
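The shift between channels can be quantified as the change in percentage points between the two surveys. The following Python sketch uses the information resource shares reported for 2000 and 2003; the helper name point_change is an assumption introduced only for illustration.

```python
# Shares in percent of users naming each information resource
share_2000 = {
    "WWW/newsgroups": 44.0,
    "Colleagues/friends": 25.0,
    "Publications/journals": 15.0,
    "Others": 13.0,
    "Conferences": 2.0,
}
share_2003 = {
    "WWW/newsgroups": 43.5,
    "Publications/journals": 18.2,
    "Others": 18.2,
    "Colleagues/friends": 16.7,
    "Conferences": 3.4,
}


def point_change(before, after):
    """Change in percentage points for every category present in `before`."""
    return {name: round(after[name] - before[name], 1) for name in before}


changes = point_change(share_2000, share_2003)
for name, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name}: {delta:+.1f} points")
```

Sorting by the change puts the personal channel Colleagues/friends, which lost the most, at the top of the output, followed by the channels that gained.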
3. Comparison of Country and Continent
The geographic trend over the past two years shows that the number of users in Asia, especially Japan, has increased dramatically. Asia is becoming a more important base of potential customers for XploRe.

2000                       2003
Country    Percentage      Country    Percentage
           22.5%           Germany    16.4%
           18.8%           USA        15.8%
           17.5%           Japan      8.6%

2000                                2003
Continent            Percentage     Continent       Percentage
Europe               59.0%          Europe          50.5%
America              29.8%          America         25.0%
Asia                 10.0%          Asia-pacific    21.6%
Africa, Australia    3.2%           Africa          2.9%

4. Comparison of Clusters in 2000 and 2003
The obvious difference between the clustering of the 2000 data and that of the 2003 data is that in 2000 there are only three user clusters, namely Academia, Unix/Linux and Researchers,89 while the 2003 clustering subdivided the user data into four clusters: Academia, Linux user, Home worker and Internet surfer.
The appearance of the new cluster Internet surfer in 2003 reflects the growing popularity of the Internet among the users and the development of e-commerce and e-shopping. The Internet is increasingly becoming a powerful communication tool. It enables potential customers to access product information much more easily than before. The Internet is the dominant communication channel for this group. It is important to understand more about the behaviour and features of this group.
Two groups exist in both clusterings, Academia and Linux user. For Academia 2003, publications/journals are a much more important information source than in 2000. In 2003, 39% of Academia use publications/journals as an information source, compared with only 19% in 2000. This indicates that XploRe was successful in improving its communication channels over the last two years. More users access information about XploRe through non-personal channels than through personal channels. The Internet is no longer a source for them in 2003, compared with 37% in 2000. This could be explained by the appearance of the Internet surfer group in 2003: some members of Academia 2000 were resegmented into the Internet surfer group in 2003.

2000: Cluster Academia
Variable        Attribute              Freq.
Kind of Work    University             80%
OS Platform     Windows                97%
Get Info.       Journals               19%
                Friends, colleagues    32%
                WWW, newsgroups        37%
                conferences            6%

2003: Cluster Academia
Variable        Attribute              Freq.
Work Place      University             88%
Platform        Windows                98%
First Learn     Friends, colleagues    39%
                Journals               31%
                others                 23%
                WWW, newsgroups        0%

2000: Cluster Unix/Linux
Variable        Attribute              Freq.
OS Platform     Unix/Linux             99%
Get Info.       WWW, newsgroups        62%
Kind of Work    University             46%
Freq.: 86%, 56%, 17%, 17%, 8%, 3%, 43%, 31%, 12%

2000: Cluster Researchers
Variable        Attribute              Freq.
Kind of Work    Research               78%
OS Platform     Windows                97%
Get Info.       other source           20%
                Friends, colleagues    27%
                WWW, newsgroups        34%
Freq.: 67%, 100%, 44%, 32%, 19%, 5%, 0%
Freq.: 100%, 44%, 99%
Linux user in 2000 is similar to Linux user in 2003. This is the most stable user group of XploRe. The main characteristics of this group remain almost unchanged: its members are mainly from universities, get their information through the Internet and use Linux as their platform.
The Researchers group disappeared in the 2003 segmentation. Instead, the 2003 segmentation contains another new group, Home worker. This phenomenon is the consequence of the change in the values of the variable.
In 2000, the variable Kind of Work contained the values University, Research institutes and Private/Non-research companies. Among the total users, 34% worked in research institutes. In the 2003 questionnaire, however, the variable Work Place consists of five choices: University, At home, Private company, Research Institute and Government/International organisations. In the results of the 2003 survey, 29.6% of total users work at home, which instead of Research institute ranks second after University. The share of users working in research institutes declines to only 8.9%. This is the result of the new value At Home in the 2003 survey. Actually, this is not the result we prefer, because Researchers are the users with high potential to become true customers who actually buy XploRe, and we are more interested in studying their features and behaviour patterns. Home worker is a mixed group that is difficult to identify and reach, and it also has mixed characteristics. Its members might be people who use XploRe at home but work in other places, which brings confusion to the answers. Furthermore, identifying where people actually use XploRe, at home or at their working place, has no obvious marketing rationale. Therefore, some questions and choices in the questionnaire should be improved.
3.5
Complementary analysis
3.5.1
During the clustering, a variable with a large domain will lead to a low Condorcet value. As a consequence, some variables with large domains could not be used as clustering variables. In order to solve the problem of large domains, the
Descriptive analysis
The table below gives the modal values of the regrouped data. As a result of the regrouping, the modal Software used by the users became Statistical software. The users are engaged in the field of Econometrics. The methods they used and looked for are now the Multivariate/Non- and semiparametric methods group. Appendix 7 compares the modal values of the user data with those of the regrouped user data. From Appendix 7, we can see that the modal value of Fieldwork stays the same. The software changes from Excel to Statistics software. Both the methods used and the methods looked for change from Time series to the Multivariate/Non- and semiparametric methods group.
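Regrouping amounts to mapping each original value of a large-domain variable onto a smaller set of groups before clustering. A minimal Python sketch follows; the assignment of individual packages to groups is hypothetical, since the actual grouping is not listed in this section, and only the group names Statistics software, Applied software and Rest software are taken from the text.

```python
# Hypothetical assignment of packages to groups (illustration only)
SOFTWARE_GROUPS = {
    "SPSS": "Statistics software",
    "SAS": "Statistics software",
    "S/S-Plus": "Statistics software",
    "Excel": "Applied software",
}


def regroup(values, mapping, rest="Rest software"):
    """Map each original value to its group; unmapped values fall into `rest`."""
    return [mapping.get(value, rest) for value in values]


# Toy answers, not the survey data
answers = ["Excel", "SPSS", "SAS", "Eviews", "Excel", "S/S-Plus"]
grouped = regroup(answers, SOFTWARE_GROUPS)

print(len(set(answers)), "values before,", len(set(grouped)), "after regrouping")
```

Because the mapping shrinks the domain of the variable, the modal value can change, as it does here for Software, where the group of statistics packages overtakes Excel.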
Appendix 8 gives the detailed information for the descriptive analysis of the regrouped data. Looking again at the variables methods used and methods looked for, the same fact was found as in the user data: Graphics and Exploratory data analysis are more wanted, while Basic statistical methods are more employed. The Multivariate methods group and Time series remain high in both data sets for both categories.
90
Rest software refers to the software that was listed in the questionnaire but not grouped into the given software groups.
Name                 Type          Modal Value                  Modal Freq.   No. of Values
First Learn          Categorical   WWW, Newsgroup               43.5%         5
Work Place           Categorical   University                   47.8%         6
Software             Categorical   Statistics                   30.0%         6
Work Field           Categorical   Econometrics                 22.0%         7
Method Used          Categorical   Multi./Non-Semipara.meth.    40.1%         7
Method Looked for    Categorical   Multi./Non-Semipara.meth.    40.5%         7
Xversion             Categorical   Local                        84.5%         3
Platform L           Categorical   Windows NT                   82.0%         4
Platform C           Categorical   Windows NT                   95.9%         4
OS Platform          Categorical   Windows NT                   85.4%         4
Country              Categorical   Germany                      16.4%         93
Continent            Categorical   Europe                       50.5%         4

Tab. 3.13: Summary and description of the variables of regrouped User data 2003
91
The general user92
The general users of XploRe adopt the Internet as their main information source. They work mainly at universities and use statistics software to conduct research in the fields of Econometrics and Finance. Time series, the Multivariate methods group and Basic statistics are the methods they employ, but they look for software with better performance in Time series, the Multivariate methods group and the Graphics/Exploratory data analysis group.
91
The variables Xversion, Software and Methods used are supplementary variables in the clustering.
92
In order to present the special features of each cluster more clearly, the characteristics of the general user are also given here.
Internet surfer
Internet surfers use the Internet as their dominant information source. They work mainly at home, but a high percentage of them work in private companies and research institutes. They are engaged mainly in the Finance field and mainly use Excel to conduct their work. The methods they use and look for are the same as those of the general users. They download the Local version of XploRe onto their Windows NT platforms.
Academia
Academia gets information mainly through the Internet, but friends/colleagues play a rather important role as information sources. They work at universities. They use statistics software and Excel to undertake their work in Econometrics and Mathematical statistics. They use and look for the same methods as the general users. They download the Local version of XploRe onto the Windows NT platform.
Home worker
Friends/colleagues and publications/journals are rather important information sources for this group. They work mainly at home, but a high percentage of them work in research institutes. Private companies also account for a relatively high percentage of work places. The fields they are engaged in are Finance and Econometrics. The methods they use and look for are the same as those of the other groups. Most of them download the Local version of XploRe onto Windows NT platforms, but ReX has a relatively higher usage in this group.
Linux user
The Internet is the primary information source for Linux users. The software they use is mainly statistics software and applied software. They work in the biological fields. The Graphics/Exploratory data analysis group contains the methods they most want after the Multivariate methods group. Their platform is Linux. The Local version of XploRe is the dominant version they downloaded.
The characteristics of the regrouped user clusters are similar to those of the original user clusters, so a further comparison is not discussed here. Regrouping is nevertheless important for future study, which should consider the possibilities of regrouping more carefully.
3.5.2
However, this study is rather simple. To understand the Institute users better,
further study should be conducted in the future.
Variable            Institute users                    General Users
First learn         WWW, newsgroups           49.5%    WWW, newsgroups           42.9%
                    publications, journals    20.6%    publications, journals    18.3%
                    Friends, colleagues       17.8%    others                    17.9%
                    others                     9.3%    Friends, colleagues       17.4%
                    conferences                2.8%    conferences                3.5%
Software            Excel                     18.7%    Excel                     25.1%
                    MatLab                    16.8%    SPSS                      11.2%
                    SPSS                      12.1%    Other                     11.2%
                    other                     11.2%    MatLab                    10.4%
                    S/S-Plus                   8.4%    R                          7.5%
Work Field          Econometrics              21.5%    Econometrics              24.1%
                    Physics & engin.          17.8%    other                     15.8%
                    other                     15.9%    (Math.) Statistics        11.9%
                    Biometric/Biostatistics   15.0%    Finance & actuarial sc.   11.0%
                    Social Science             7.5%    Physics & engin.          10.1%
Methods looked for  Time series               18.7%    Time series               17.3%
                    Multivariate meth.        16.8%    Multivariate meth.        14.0%
                    Non/semipara. meth.       15.9%    Non/semipara. meth.       13.1%
                    Graph./explor. analy.      9.3%    Graph./explor. analy.     12.2%
                    Basic statistics           7.5%    Basic statistics           9.6%
Methods used        Time series               19.6%    Time series               18.7%
                    Multivariate meth.        16.8%    Basic statistics          15.7%
                    General. Linear models    14.0%    Multivariate meth.        14.1%
                    Basic statistics          11.2%    Linear models             10.3%
                    Non/semipara. meth.       10.3%    Graph./explor. analy.      7.8%
Platform            Windows NT                80.4%    Windows NT                84.2%
                    Linux                     16.8%    Linux                     12.9%
Xversion            Local                     90.7%    Local                     86.5%
                    ReX                        7.5%    ReX                        9.3%
                    Client                     1.9%    Client                     4.1%
4.1
4.1.1
4.1.2 Marketing Mix
The marketing mix describes how businesses promote their products and
services, or how customers learn about a business's products and services.94
The basic marketing mix normally consists of four elements, the so-called
4Ps: Product, Place/distribution, Price and Promotion.
The marketing mix can also be expressed in a more customer-oriented way, which is
called the 4Cs.95
Customer Value or Solution: product benefits from the customer's view.
Cost to the customer: price plus the customer's own costs, for example travel or
fax.
Convenience for the customer: place/distribution channel.
Fig. 4.1: 4P of marketing mix (figure labels: Product, Place, Price, Promotion, Target).
Place
The marketing mix element Place is concerned with two aspects of a firm's
function: distribution and logistics.
Distribution refers to the ways in which organisations get the physical product to the
point that is most convenient for the customer to buy it.96 Logistics is concerned with the process of planning, implementing and controlling the efficient,
cost-effective flow and storage of materials, in-process inventory, finished goods
and related information from the point of origin to the point of consumption for the
purpose of conforming to customer requirements.97
A distribution channel consists of the organisations that move the product from
the producer to the end consumer. The members of a distribution channel perform
many functions that help to complete the transaction. These functions
include gathering and distributing the information needed for planning
and transaction, promoting the product, and making contact with potential
customers.
The conventional distribution channel and the vertical marketing system are the two major types of distribution channel. Vertical marketing systems can be
corporate, contractual or administered.
The distribution mix is composed of administration, order processing, inventory,
packaging, warehousing, receiving, dispatch and transportation.
Product
Product is the physical product or service offered to the customer. Normally the
company's offer combines both. Therefore, the product refers not
only to the tangible good but also to the service added to it.
A product is a combination of three levels:98
Core product: the core product or benefit is the real purpose for which a customer
buys the product.
Actual product: the components of a product, such as features, packaging,
brand name, quality level and design.
Augmented product: the additional customer service and benefits offered to the
customer, for example installation, delivery and credit, warranty and after-sale
service.
Decisions on product attributes concern the following aspects: branding,
packaging and labelling, and product line and mix.
Price
The traditional definition of price is the number of monetary units a customer
has to pay to receive one unit of a product or service. In the 1990s a broader,
more customer-oriented concept of price emerged.
In this concept, the price is the cost of an industrial good, which includes much
more than the seller's price.99
The broad concept of price has three dimensions:
1. Recognise the difference between objective price and perceived price. The customer will not always have complete information about the price, and the
price information may also affect the customer's buying decision process.
2. Price refers not only to the monetary amount the customer pays at the time
of purchase. There are also costs involved before and after the purchase, for
example the time a customer must wait while purchasing, and the cost of
maintenance and repair.
also give feedback to the producer. Therefore, the customers affect the behaviour
of the producer as well. The channels of communication belong mainly
to two groups: personal communication channels and non-personal
communication channels.
1. Push and pull strategy
A push strategy of promotion uses the distribution channels to promote the products.
A pull strategy of promotion relies on heavy promotion aimed directly at the end
users. This strategy is normally used in a customers' (buyers') market.
2. Promotion mix
Five major promotion tools make up the promotion mix, which is also called the marketing
communication mix:
Advertising
Advertising is any paid form of non-personal presentation and promotion
of ideas, goods, or services by an identified sponsor.100 The forms of advertising include ambient advertising, press advertising, TV advertising, radio
advertising, outdoor advertising and transport advertising.
Sales promotion
Sales promotion is the offer of short-term incentives to the customer with
the purpose of encouraging the immediate purchase of a product. The three
main types of sales promotion are customer promotion, trade promotion and
sales force promotion.
Public relations
Companies use favourable publicity to build up their corporate image
and good relations with their various publics, as well as to handle unfavourable
events.
Personal selling
The sales force of a company conducts the sales effort and the communication
with the customers.
Direct marketing
Direct marketing adopts various promotion activities to create an immediate sale, the interaction with potential customers or to maintain a the
Customer Service
Customer service is a crucial component of product strategy. Customers expect some level of service to accompany the product offer, which could take the form
of prompt delivery, instruction, warranties, return policies etc. In today's
market, because of the similarities in product quality, service is increasingly becoming an important tool for companies to gain a competitive advantage.
It costs less to maintain an existing customer base for repeat purchases than to
win new customers. The task of customer service is not merely to deal with customer complaints, but also to be proactive, to identify the needs of the customer
and to develop a proper product strategy.
Marketing mix for Service (7 Ps)
Services have several unique characteristics: intangibility, inseparability, variability,
perishability and non-transferability. These characteristics differentiate services
from products. The marketing strategy for services therefore has three
components in addition to the original 4 Ps: people, process management
and physical evidence.
People refers to the customers of the organisation, the service personnel of
the organisation and other customers.
Process management concerns the process by which the service is delivered to the
customers.
Physical evidence is what the customer can sense physically that contributes to their perception of the service.101 Physical evidence is of two kinds, essential and peripheral. A service could not be conducted without the
essential evidence. Peripheral evidence refers to those aspects besides the essential evidence that affect the customer's perception and evaluation
of the service quality.
101 WWW32
4.2
The customer analysis above gives us an insight into the market and customers
of XploRe. Based on the facts and trends of the XploRe market, some suggestions for
the marketing strategy of XploRe are developed.102
102 The results of the complementary analysis are for reference only. The marketing strategy is
developed from the results of the analysis of the non-modified data.
4.2.1
XploRe is still a new product, and the XploRe marketer has limited resources at
his command. Trying to cover a large part of the total market would be neither wise nor effective.
A niche market strategy could allocate XploRe's resources more efficiently and
use them more effectively.
4.2.2 Target Market
The user clusters Internet surfer, Academia and Linux user could be identified
as the target markets for XploRe. Home worker is a mixed group, but given its high percentage of users from private companies, it could be regarded as a supplementary
target market.
Since almost half of the XploRe users are from the fields of Econometrics, Finance and Mathematical Statistics, and a high percentage of customers come from
research institutes, research institutes in Finance and Econometrics are a highly profitable market. XploRe should focus on developing this market. Another niche
market XploRe could concentrate on is research institutes in Biological research and Applied Engineering. These two markets are the highly profitable sectors for XploRe.
From a geographic viewpoint, almost all XploRe users are from three countries:
Germany, the USA and Japan. XploRe could direct more of its resources to the
markets in these countries.
4.2.3
The XploRe Windows series products could define themselves as advanced statistical software for financial and econometric analysis. This decision is based on the facts that
command-driven statistical software is becoming increasingly popular, that the target
market for XploRe is research institutes in Finance and Econometrics, and that
users expect advanced methods in XploRe. For the Windows series, XploRe
could focus on improving and promoting its strengths in Time series methods,
Multivariate analysis methods, Non- and Semiparametric methods and Graphics/
Exploratory analysis.
Most Linux users of XploRe have a natural science background with
an emphasis on biology and engineering/physics. They are interested in the
methods of Basic statistics and Graphics/Exploratory data analysis. The Linux-based
XploRe products could define themselves as statistical software for Biological research and Applied Engineering that is especially good at Graphics and
Exploratory analysis.
103 Because no competitor analysis was conducted in this study, the price
strategy and monetary position are not discussed here.
4.2.4
- Program 2: Organising professional events such as conferences, seminars, workshops, discussion forums etc. through membership clubs.
- Program 3: High standards and professional solutions offered by customer service could also help in achieving this goal.
Tactic 4: Improve customer service
- Program 1: Customer service through an Internet customer panel. Deliver customer service online through an online forum/panel, answer customers' questions and encourage discussions. This measure could help
in building up a good image as well.
- Program 2: A customer membership club that offers professional lectures and
seminars and organises regular meetings and discussions.
- Program 3: Provide aids for problem solving and offer high-quality,
professional solutions.
Tactic 5: Manage and activate the customer base. Keep the potential and
actual customer bases alive and active. Improve the communication with
customers and users, especially those in the target markets.
- Program 1: Establish a customer membership club and online forum/panel. Organise membership-based club activities, such as seminars, workshops, lectures, regular meetings, discussions etc. The user
base of XploRe is a valuable resource. If we could use this resource
actively and effectively, it could lead to unexpected returns. It should
be well managed to keep it active and alive.
Strategy 3: Concentrate on and optimise the direct marketing and
distribution channels.
Tactic 1: Improve the performance of direct online sales (Internet shopping). The Internet is the ultimate communication channel for XploRe
users, and currently it is the main distribution channel of XploRe. Improving
the performance and management of this channel is therefore crucial for XploRe.
- Program 1: Improve the management of the E-shop.
4.2.5
Price
Discriminatory pricing strategy: XploRe could use a discriminatory pricing strategy, offering
different prices to different groups. For example, XploRe Club members and academic researchers could receive discounts when they buy XploRe.
Students could also get a lower price.
Place (Distribution)
1. Direct distribution channel
Product
To provide benefit to the customer, XploRe should develop a line of products
and modules that can easily be adapted to meet the demands of different
customers.
To maximise the benefit it offers, XploRe should improve and promote its
strengths and product quality in the methods in which users show high interest (Time
series, Multivariate methods, Non- and Semiparametric methods) and, for certain groups, in Graphics and Exploratory methods.
Product position
The XploRe Windows version should define itself as a product for econometric and
financial analysis, since three clusters of customers specialise in Econometrics and
Finance. For the XploRe Linux version, it is better to define the product as
statistical software for natural science research or application.
Very few users use the XploRe Client version. The Client version can expand the capacity
of the user by using the XploRe Server, but it needs a fairly good Internet
connection to perform well. Considering the advantages and features
of this product, the Client version should be promoted to the target group of Internet
users, who might have fast, good-quality Internet access.
Promotion / Communication
1. Communication channels
(1) Problem in XploRe communication channels
The range of downloaders might also be confined by the communication channels that the XploRe marketer uses. As the XploRe producer has good contacts with
universities, more downloaders get to know XploRe through informal personal channels, which might skew the potential customer base. The XploRe marketer
should use more formal channels to expand the downloader base and thereby
the potential customer base. Publications and conferences should also play
important roles in the information search process.
Publicity channel: One reason for the small percentage of XploRe users from
institutes and companies might be the lack of effective
publicity. Most of them get to know XploRe through the Internet. The percentage
of professional publications and conferences as information resources is also very
low. Therefore, the strength and effectiveness of XploRe publicity should be
analysis.
Programming-based software like MatLab is quite popular among XploRe users.
These users should be reached effectively by offering them the right modus
and letting them learn more about the programming possibilities of XploRe, such as the
capacity to build libraries and quantlets themselves. The clusters Internet surfer
and Academia have a relatively high share of MatLab users. This message
should be passed to them effectively.
In short, the XploRe marketer could employ both personal and non-personal communication channels, such as the Internet, publications and the member club, to send
customers more persuasive messages and try to turn them into actual buyers.
2. Promotion mix
(1) Advertising
Advertising is any paid form of non-personal presentation and promotion of
ideas, goods or services by an identified sponsor.104
The advertisements of XploRe are mostly informative. More persuasive advertising should be used to raise the rate at which users turn into actual
buyers.
The media vehicles can be flyers, brochures, professional magazines, free demo CDs
and the Internet. For Internet surfers, the Internet is the sole media vehicle. An online
XploRe forum could reach and maintain this group.
Banners could be placed on specialised software search
portals. For Academia, the use of publications should be increased. To keep
costs low, articles about XploRe should be encouraged and published.
XploRe activities should also be reported regularly in professional publications, especially financial and econometric ones, by actively
sending announcements to the publishers. For Linux users, specialised publications
and Internet portals should be considered as the media to reach and persuade
them.
(2) Personal sales promotion
Certain conferences should be considered as well. XploRe should appear at more
conferences to strengthen its image. For instance, more articles could be
submitted and more presentations given at conferences. In addition, XploRe
could participate in more exhibitions and organise free presentations to increase
its publicity and improve its product image.
Personal selling instruments, like free seminars and demonstrations, could have more
impact on the Academia group, because its members rely heavily on informal communication channels. They could also bring XploRe closer to its
customers, which helps to create and maintain a healthy customer relationship.
Creating an XploRe Club or community could effectively promote communication among XploRe members, which could also help to turn potential customers
into actual customers.
(3) Direct marketing
Sales promotion
XploRe can employ more customer sales promotion to encourage customers
to buy. A trial version is already in use. Other tools, like cash-related offers,
demonstrations and displays, could be used more. Sales force promotion, like activities at exhibitions and trade shows, could help XploRe to gain
awareness of the products and increase sales leads.
Other direct marketing instruments
In addition, XploRe could engage more actively in other direct marketing instruments, such as direct mail and catalogue marketing, integrated
database marketing, telemarketing and electronic shopping. All these instruments could keep the potential customer base alive and increase the rate
at which potential customers turn into actual customers. They would also keep the actual customer base
alive, increase the rate of repeat buying and strengthen the image of the
products.
Customer service
As a part of the product, customer service is a crucial issue. Concerning the additional 3 Ps of service, XploRe could improve its customer service in the following
aspects:
People: Improve the attitude of customer service staff.
Physical evidence: Shorten delivery time and improve customers' access to service.
Process management: Optimise the ordering process and the process of problem
solving.
Because most of the customers of XploRe are professionals, they demand highly
professional products. The service of XploRe should emphasise the expertise of
XploRe. People, physical evidence and process management
are the aspects the customer service staff of XploRe should pay attention to,
for example by responding quickly to customers, helping to solve their problems efficiently and
effectively, and communicating actively.
The membership club and the customer online forum could be instruments for offering
high-quality customer service efficiently and effectively. But the traditional way
of offering customer service should not be forgotten. The customer should be
able to reach the customer service staff in a personal way when needed, for
example through a service hotline. Although XploRe is a high-technology product
and most customers use the Internet and other communication tools, the traditional
methods of personal contact could establish the most vivid and personal image of
XploRe. It is sometimes frustrating for customers when they face a world of
machines and can reach no human being to listen to their special needs.
The service of XploRe should accompany the customers all the time: before the
purchase, by giving them information and advice; during the purchase, with process
instructions and quick delivery; and after the purchase, by getting to know their needs
and helping to solve any problems.
4.2.6
Besides the general marketing mix for all XploRe products, we should also
recognise that differences exist between the clusters of XploRe users and customers. To
reach them effectively and efficiently, special marketing mix measures should be
designed for them as well.
Internet surfer
Product: Academia wants XploRe to perform well in Non- and Semiparametric methods in particular. To convince this group, XploRe should
improve the strength of its products in these methods.
Communication channels: Academia uses Friends/Colleagues and Publications/Journals as its main information resources. They are a group of closely
tied professionals. To expand the influence of XploRe in this group, an XploRe
member club or forum could be an effective way. The member club could
use the advantage of personal communication channels within the group
to expand the base of potential customers and build up a closer relationship between XploRe and its customers. More appearances in professional
publications and journals could effectively promote XploRe through the
non-personal channels.
Communication message: Academia is a group of professionals
specialised in econometrics and mathematical statistics. Compared to other
groups, they are more oriented towards academic research. The methods they want
are more advanced, such as Non- and Semiparametric methods. To promote XploRe in this group, the message should present XploRe in a highly
professional and analytical way. The advantage of XploRe in advanced
non- and semiparametric methods should be emphasised.
Media vehicle: The membership club and professional publications/journals
should be the main vehicles for Academia.
Service: Because Academia places more value on personal contact, it is important for its members to be able to reach the service personally. Therefore,
service through direct personal contact, such as customer hotlines and visits,
is needed for this group.
Linux user
Product: The product for Linux users should have high capability in Graphics
and Exploratory data analysis.
Communication: Linux users have a natural science background. A high
percentage of them come from research institutes, which indicates that this
could be a high-return group. To reach them, XploRe should convey more
information about XploRe applications in Biological research
and Engineering. Emphasising the strength of XploRe in Graphics and
Exploratory data analysis is highly important. The media conveying
the XploRe message should be specialised natural science and Linux
media.
Because of the computer expertise of Linux users and their focus on the Internet, XploRe
marketers could inform them more about XploRe's ability to use programs written in fast languages such as Fortran or C via dynamically
linked libraries.
Home worker
Product: The Home worker group has a high Excel usage rate. This is a
good opportunity to sell ReX to them. Graphics and Exploratory data
analysis are important aspects when Home workers evaluate XploRe.
Communication: This group of customers is more engaged in Applied Finance and less academically oriented, and a high percentage of them
come from private companies. To persuade this group, XploRe should convince them that XploRe is a valuable instrument for practical financial
analysis, with good performance in Time series, Multivariate methods
and Graphics/Exploratory data analysis. The media XploRe chooses for
this group could be publications and journals in the Finance field.
4.2.7
the base to understand the market and to develop the marketing strategy. To
improve the customer data, different ways of data collection could be used.
A questionnaire survey could be delivered through the distribution channel
together with the products. Telephone interviews with customers
could be undertaken as well. The online forum or membership club could act as
an active communication base from which feedback from customers and users could be
collected for analysis. From actual customers, we need to know more
about their features and their attitude towards XploRe. For XploRe users,
a follow-up online survey could also be carried out to learn how
satisfied they are with XploRe.
2. Association rule, sequential rule analysis for user and customer data
In future analysis, association rule and sequential rule analysis could be used
to find relationships in the data. Such analysis could be conducted
for both user and customer data to find the features of the user groups that
have a high rate of turning into customers. It would be really exciting if such a group
or such rules could be identified. XploRe marketers could then allocate their resources
more effectively to reach the highly profitable customers.
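The kind of rule such an analysis looks for can be illustrated with a minimal Python sketch. It is not the procedure of any particular data mining tool. The records, the attribute values and the marker buyer are hypothetical examples, and the thresholds are arbitrary assumptions. For each pair of attribute values the sketch computes support, confidence and lift.

```python
from collections import Counter
from itertools import combinations


def association_rules(records, min_support=0.2, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to P(rhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    # Most reliable rules first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Hypothetical user records: the attribute values of one user each,
# plus the marker "buyer" if the user became a customer.
records = [
    {"workplace=institute", "field=finance", "buyer"},
    {"workplace=institute", "field=finance", "buyer"},
    {"workplace=university", "field=econometrics"},
    {"workplace=institute", "field=biology", "buyer"},
    {"workplace=home", "field=finance"},
]
rules = association_rules(records)
```

In this made-up data the rule from workplace=institute to buyer has a confidence of 1, that is, every institute user in the sample became a buyer. A rule of this form would point the marketer to a user group with a high turning rate.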
3. Improvement of questionnaire
We are interested in studying the features and behaviour patterns of the customers.
The questions and choices appearing in the questionnaire should all have a marketing rationale. Unclear questions or choices lead to confusing
or poor results. The current questionnaire has some shortcomings, which could be improved in future
analysis.
(1) Variable At Home
The disappearance of the group Researcher and the emergence of the group Home
worker are consequences of the value At home in the variable Work Place.
Home worker is a mixed group with mixed characteristics, which makes it difficult to identify and reach.
Its members might be people who use
XploRe at home but work in other places, which makes the
answers ambiguous. Furthermore, identifying where people actually use XploRe, at home or
at their work place, has no obvious marketing rationale. Therefore, the corresponding
questions and choices in the questionnaire should be improved.
(2) Question of Country
Another shortcoming of the questionnaire lies in the choice of Country. Since
there is no question or explanation accompanying this choice, a respondent could understand it
as asking for his original nationality. In this survey, the original nationality is
not important, because the relevant factor is not the cultural difference
but the geographic one. We want to use this choice to locate geographic
markets. For example, consider a user who originally comes from Africa but now works
in France. When he downloads XploRe, it is more important to know in
which country he works and in which country he will use XploRe, because that is the
country in which he is more likely to buy XploRe. In this case, we want him to choose
France as his answer. In fact, however, he might choose the country of his nationality.
This would mislead the marketer: the marketing measures would have to reach him in
France, not in Africa. From this point of view, asking in which country
the respondent works or uses XploRe may be more appropriate.
4. Solve the problem of large size of domain through regrouping
Some variables in the survey have a large domain, such as Software (12),
Work Field (10), Methods used (12), Methods looked for (12) and Country (93). A
large domain results in a low Condorcet value when these variables are taken
as input variables, which almost excludes them from being used
as input variables. Some quite interesting variables, such as
Methods looked for and Software, can therefore only be used as complementary variables in
the analysis.
An improvement has already been made in the regrouped data analysis, where
the values of the variables are regrouped and combined to form smaller domains.
In future analysis, the regrouping could be improved further. The
nature of the variables and choices should be studied and examined in more
detail. The regrouping of variable values should also reflect a marketing rationale,
which must be based on a more detailed study of the products and methods.
For software, for example, the following questions should be answered: What
are the exact features of each group of software? Which group has
characteristics similar to XploRe, and in which aspects? Will such a grouping help to find
the target market of XploRe?
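The mechanics of regrouping can be sketched in a few lines of Python. The mapping below is a purely hypothetical grouping of software packages, chosen only to show how the domain shrinks. It is not the grouping used in the regrouped data analysis, and the answers are invented.

```python
from collections import Counter

# Hypothetical mapping from individual packages to broader groups.
# A real grouping should follow from a study of the products.
SOFTWARE_GROUPS = {
    "Excel": "Spreadsheet",
    "SPSS": "Menu-driven statistics",
    "MatLab": "Programming environment",
    "S/S-Plus": "Programming environment",
    "R": "Programming environment",
}


def regroup(values, mapping, default="Other"):
    """Replace each value by its group; unmapped values fall into `default`."""
    return [mapping.get(value, default) for value in values]


# Invented questionnaire answers for the variable Software.
answers = ["Excel", "R", "MatLab", "SPSS", "Gauss", "Excel"]
grouped = regroup(answers, SOFTWARE_GROUPS)

domain_before = len(set(answers))   # number of distinct values before regrouping
domain_after = len(set(grouped))    # number of distinct values after regrouping
shares = {group: count / len(grouped) for group, count in Counter(grouped).items()}
```

Each group should correspond to a marketing question, for example whether a group of packages has characteristics similar to XploRe, so that the smaller domain remains interpretable.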
5. Further study of the highly profitable customer sector
A high percentage of customers turn out to be from research institutes. Further
study should focus on this group in order to locate it. The features and behaviour
patterns of this group are important for the XploRe marketer in designing more effective
marketing measures to reach and persuade its members.
6. Online user segmentation analysis
With the development of E-commerce, the Internet is the dominant instrument of
communication for XploRe customers. It is the main marketing tool for XploRe
as well. To reach customers online effectively and make the marketing effort
more profitable, the XploRe marketer needs to understand the behaviour patterns
and features of online users and buyers.
There are already many studies on this topic. McKinsey segmented
online users into six groups (Simplifiers, Surfers, Bargainers, Connectors, Routiners and Sportsters) according to differences in active time online, pages and
domains accessed, and active time spent per page.105 Other commonly used attributes to segment online users and measure their loyalty are Frequency and Recency.
The customer response, retention and valuation model, or Recency, Frequency,
Monetary value (RFM) model, is also a useful customer model for predicting the future value
and loyalty of customer segments and helping to create high-ROI promotions.106
Because the Internet is crucial for XploRe, learning
more about its online users and customers is a task of primary importance for the XploRe marketer.
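As an illustration of the RFM idea, the following Python sketch scores customers on Recency, Frequency and Monetary value. The purchase records are invented, and the rank-based scoring into three classes is only one simple convention assumed here. It is not the scoring rule of the cited model.

```python
from datetime import date


def rank_score(values, higher_is_better=True, bins=3):
    """Map each customer's raw value to a score from 1 (worst) to `bins` (best)."""
    # Order customers from worst to best, then cut the ranking into equal classes.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {customer: 1 + (i * bins) // n for i, customer in enumerate(ordered)}


def rfm_table(purchases, today, bins=3):
    """purchases: customer -> list of (purchase date, amount). Returns customer -> (R, F, M)."""
    recency = {c: (today - max(d for d, _ in p)).days for c, p in purchases.items()}
    frequency = {c: len(p) for c, p in purchases.items()}
    monetary = {c: sum(amount for _, amount in p) for c, p in purchases.items()}
    r = rank_score(recency, higher_is_better=False, bins=bins)  # fewer days is better
    f = rank_score(frequency, bins=bins)
    m = rank_score(monetary, bins=bins)
    return {c: (r[c], f[c], m[c]) for c in purchases}


# Invented purchase histories of three customers.
purchases = {
    "A": [(date(2003, 1, 10), 500.0), (date(2003, 4, 2), 300.0)],
    "B": [(date(2002, 6, 15), 200.0)],
    "C": [(date(2003, 3, 1), 100.0), (date(2003, 3, 20), 50.0), (date(2003, 5, 5), 40.0)],
}
scores = rfm_table(purchases, today=date(2003, 5, 27))
```

In this example customer C buys often and recently but spends little, while customer A spends the most. Segments defined by such score triples could then be addressed with different promotions.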
7. Competitor analysis and SWOT analysis for deriving a full-scale marketing strategy for XploRe
Only customer analysis was undertaken here. In order to derive a full-scale marketing strategy and marketing mix for XploRe, more in-depth analysis of the market
and product should be conducted, such as competitor analysis, SWOT analysis etc. All these are future tasks for the XploRe marketer.
References
AAKER, DAVID A.: Strategic market management, Sixth Edition, John Wiley
& Sons, Inc., 2001.
ALDENDERFER, M. S. and BLASHFIELD, R. K.: Cluster Analysis, Series:
Quantitative Applications in the Social Science, SAGE University Papers,
Sage Publications, Inc. 1984.
ANDERSEN, ERLING B.: Introduction to the statistical analysis of categorical
data, Springer, 1997.
ANDRITSOS, PERIKLIS: Data clustering techniques Qualifying oral examination paper, Department of computer science, University of Toronto,
March 11, 2002.
ABELL, D. F. and HAMMOND, J. S.: Strategic Market Planning: Problems
and Analytical Approaches, Prentice-Hall. Inc., 1979.
ARMSTRONG, G. and KOTLER, P.: Marketing An Introduction, Pearson
Education, Inc., 2002.
BACKHAUS, K., ERICHSON, B., PLINKE, W., WEIBER, R.: Multivariate Analysemethoden: Eine anwendungsorientierte Einführung, 9.,
überarbeitete und erweiterte Auflage, Springer Verlag, 2000. P328-389.
BAKER, M. and HART, S.: Product Strategy and Management, Prentice
Hall, 1999.
BANNES, E., McCLELLAND, B., MEYER, R.: Marketing An active learning
approach, Blackwell Publications Ltd., 1997. P138-237
BERRY, MICHAEL J. A., LINOFF, G.: Data Mining Techniques For Marketing, Sales, and Customer Support, John Wiley & Sons, Inc., 1997.
BLOIS, KEITH: The Oxford Textbook of Marketing, Oxford University Press,
2000.
BLYTHE, J.: Marketing Strategy, McGraw-Hill Education, 2003.
HÄRDLE, W., KLINKE, S., MÜLLER, M.: XploRe Learning Guide,
Springer Verlag, 2000.
HÄRDLE, W., SIMAR, L.: Applied multivariate statistical analysis, Institut
für Statistik und Ökonometrie, Wirtschaftswissenschaftliche Fakultät,
Humboldt-Universität zu Berlin, 2000. P295-312.
HEYGATE, RICHARD: Customer Analysis, Sophron Partners Ltd, 1998.
http://www.icare.cl/CAM/pdf/15.pdf
HILL, C. W. L. and JONES, G. R.: Strategic Management Theory An Integrated Approach, Fourth Edition, Houghton Mifflin Company, 1998.
IBM Corp.: Tutorial of IBM DB2 Intelligent Miner for Data, IBM Corp.
1999.
JACOB, FRANK: Development, management and governance of relationship,
International Conference on relationship marketing, March 29-31,1996,
Berlin.
JAIN, ANIL K., DUBES, RICHARD C.: Algorithms for Clustering Data,
Prentice-Hall, 1988.
JARKE, M., LENZERINI, M., VASSILIOU, Y., VASSILIADIS, P.: Fundamentals of Data Warehouse, Springer, 1999.
MUCHA, HANS-JOACHIM, SOFYAN, H.: Cluster analysis, Discussion paper 49, 2000, Sonderforschungsbereich 373, Quantifikation und Simulation
Ökonomischer Prozesse, Humboldt-Universität zu Berlin.
MUCHA, HANS-JOACHIM: Clustering in an interactive way, Discussion paper 13, 1995, Sonderforschungsbereich 373, Quantifikation und Simulation
Ökonomischer Prozesse, Humboldt-Universität zu Berlin.
KALAKOTA, R. and WHINSTON, A. B.: Do or Die: Market Segmentation
and Product Positioning on the Internet,
http://cism.bus.utexas.edu/res/articles/segmentation.html
KAUFMAN, L., ROUSSEEUW, PETER J.: Finding Groups in Data An
Introduction to Cluster Analysis, A Wiley-Interscience Publication, John
Wiley & Sons, Inc. 1990.
KOTALA, P., PERERA, A., KAI ZHOU, J., MUDIVARTHY, S., PERRIZO,
W., DECKHARD, E.: Gene expression profiling of DNA Microarray
Data Peano Count Trees (P-Trees), North Dakota State University, 2001.
http://www.ndsu.edu/virtual-genomics
KOTLER, P.: Marketing management, 7th ed., Englewood Cliffs, New Jersey:
Prentice-Hall, 1991.
KURES, M., RYAN, B. and LAMB, G.: Customer Profiling and Prospecting
Analysis: For the Door County Lodging Industry, University of Wisconsin-Extension, May 2001.
http://
PYLE, DORIAN: Data preparation for data mining, Academic Press, 1999.
RAJOLA, FEDERICO: Customer Relationship management Organizational
and Technological Perspectives, Springer, 2003.
RÖNZ, BERND: Computergestützte Statistik, Institut für Statistik und
WOJCIECHOWSKI, MAREK: Discovering and Processing Sequential Patterns in Databases, Poznan University of Technology, Institute of
Computing Science, Poland. http://www.edbt2000.uni-konstanz.de/phdworkshop/papers/wojciehowski.pdf
WRIGHT, M. and ESSLEMONT, D.: The Logical Limitations of Target Marketing, Marketing Bulletin, 1994, 5, 13-20. http://marketingbulletin.massey.ac.nz/article5/article2b.asp
WWW1: Hamming distance, http://www.nist.gov/dads/HTML/hammingdist.html
http://www.mosbygrey.com/
analysis.
http://www.psychadvantage.com/cust-
WWW14: Customer Analysis: A Manual of Techniques, The University Libraries, University of Southern California, 1997. http://isd.usc.edu/ jkwan/CAManual.pdf
WWW15: What is CRM?, http://www.edgeservices.com/salesmarketing/
what is crm.shtml
WWW16: Customer Profiling Pandectas Guide To Accurate Customer profiling, http://pandects.com/customer profiling.html.
WWW17: Drilling Down Turning Customer Data into Profits with a Spreadsheet, http://www.drilling-down.com/profiles.htm
WWW18: Database Marketing: Customer Profiling, http://www.schooldata.com/ssm-profiling.html
WWW19: What is a customer profile, http://sic.nvgc.vt.edu/SICstuffVirtual/KANG/WWW/tutorial profile.html
WWW20: Marketing Segmentation, http://www.educationsupport.co.uk/downloads/rjh/segmentation.pdf
WWW21: Marketing Mix, http://www.marketingteacher.com/Lesson/lesson marketing mix.htm
Mix,
http://sol.brunel.ac.uk/javis/bola/
WWW26: Market segmentation, http://www.bized.ac.uk/virtual/cb/factory/marketing/theories2.htm
Appendix
Frequency  Percentage
      507        42.9
      216        18.3
      206        17.4
       41         3.5
      211        17.9
     1181       100.0

Frequency  Percentage
      204        17.3
      165        14.0
      155        13.1
      144        12.2
      113         9.6
       64         5.4
       63         5.3
       58         4.9
       39         3.3
       28         2.4
       86         7.3
     1181       100.0
Where Work                                Frequency  Percentage
University                                      584        49.4
At home                                         341        28.9
Research institute                              107         9.1
Private company                                  78         6.6
Government or international organization         28         2.4
Other                                            43         3.6
Sum                                            1181       100.0

Xversion              Frequency  Percentage
Local                      1022        86.5
ReX                         110         9.3
Client                       49         4.1
Sum                        1181       100.0
Field Work                                    Frequency  Percentage
Econometrics                                        285        24.1
(Mathematical) Statistics                           141        11.9
Finance and actuarial science                       130        11.0
Physics and engineering                             119        10.1
Biometrics or Biostatistics                         117         9.9
Social science (sociology, psychology, etc.)         70         5.9
Risk analysis                                        62         5.2
Marketing and survey research                        50         4.2
Epidemiology                                         20         1.7
Other                                               187        15.8
Sum                                                1181       100.0
Appendix 1
Methods Used                                                 Frequency  Percentage
Time series                                                        221        18.7
Basic statistics                                                   185        15.7
Multivariate methods                                               166        14.1
Linear models                                                      122        10.3
Graphics and exploratory data analysis                              92         7.8
Non- and semiparametric methods                                     82         6.9
Generalized linear models and limited dependent variables           78         6.6
Panel data/cross-sectional methods                                  38         3.2
Resampling and simulation methods                                   38         3.2
Tools for learning or teaching statistics                           31         2.6
Survival analysis                                                   29         2.5
Other                                                               74         6.3
Sum                                                               1181       100.0
Software              Frequency  Percentage
Excel                       297        25.1
SPSS                        132        11.2
MatLab                      123        10.4
R                            89         7.5
Eviews                       78         6.6
SAS                          76         6.4
S/S-Plus                     65         5.5
GAUSS                        47         4.0
Statistica                   39         3.3
MiniTab                      32         2.7
Stata                        32         2.7
LIMDEP                       12         1.0
Rats                         11         0.9
TSP                          10         0.8
Xlisp-stat                    5         0.4
XGobi                         1         0.1
Other                       132        11.2
Sum                        1181       100.0
Platform L            Frequency  Percentage
Windows NT                  952        84.1
Linux                       149        13.2
Solaris                      13         1.1
Others                       18         1.6
Sum                        1132       100.0

Platform C            Frequency  Percentage
Windows NT                   43        87.8
Linux                         3         6.1
Apple                         3         6.1
Solaris                       0         0.0
Sum                          49       100.0
Continent             Frequency  Percentage
Europe                      622        52.7
America                     289        24.5
Asia-Pacific                242        20.5
Africa                       28         2.4
Sum                        1181       100.0

Country               Frequency  Percentage
Germany                     199        16.9
USA                         186        15.7
Japan                       102         8.6
Sum                        1181       100.0

Platform              Frequency  Percentage
Windows NT                  995        84.3
Linux                       152        12.9
Solaris                      13         1.1
Others                       21         1.8
Sum                        1181       100.0
Frequency  Percentage
       11        34.4
        8        25.0
        3         9.4
        2         6.2
        1         3.1
        1         3.1
        1         3.1
        1         3.1
        1         3.1
        1         3.1
        1         3.1
        1         3.1
       32       100.0
Federal State         Frequency  Percentage
Baden-Württemberg             1         3.1
Berlin                        1         3.1
Brandenburg                   1         3.1
Schleswig-Holstein            1         3.1
Rheinland-Pfalz               1         3.1
Missing value                27        84.4
Sum                          32       100.0
OS                    Frequency  Percentage
Windows 2000/NT               6        18.8
Windows 95/98                 4        12.5
Missing value                22        68.8
Sum                          32       100.0

Title                 Frequency  Percentage
Prof.                         3         9.4
Dr.                           2         6.2
Prof. Dr.                     2         6.2
Missing value                25        78.1
Sum                          32       100.0

Sex                   Frequency  Percentage
Man                          25        78.1
Woman                         7        21.9
Sum                          32       100.0
Language              Frequency  Percentage
English                       6        18.8
German                        5        15.6
French                        1         3.1
Italian                       1         3.1
Spanish                       0         0.0
Missing value                19        59.4
Sum                          32       100.0

Sector                Frequency  Percentage
Research Institute           11        34.4
Company                       1         3.1
Missing value                20        62.5
Sum                          32       100.0

Branch                Frequency  Percentage
Economics                     3         9.4
Statistics                    1         3.1
Biostatistics                 1         3.1
Mathematics                   1         3.1
Computer science              1         3.1
Missing value                25        78.1
Sum                          32       100.0
Attributes                    Freq.
WWW, Newsgroup                99.4%
Econometrics                  20%
University                    100%
Local                         100%
Windows                       100%

Attributes                    Freq.
WWW, Newsgroups               39.3%
Others                        28.1%
Econometrics                  23.8%
At Home                       44.2%
Local                         77.1%
Windows                       96.5%

Attributes                    Freq.
Friends, Colleagues           51.1%
Econometrics                  40.5%
University                    100%
Local                         100%
Windows                       98.9%

Attributes                    Freq.
WWW, Newsgroup                51.6%
Others                        21.7%
Biometrics & Biostatistics    19%
University                    58.2%
Local                         94.6%
Linux                         76.6%
Cluster Academia
Cluster Character
Size(abs.) 190
Size(rel.)
16%
Variable
First learn
Fieldwork
Where Work
Xversion
Platform
Frequency  Percentage
      846        43.5
      354        18.2
      324        16.7
       67         3.4
      354        18.2
     1945       100.0

Frequency  Percentage
      331        17.0
      279        14.3
      256        13.2
      228        11.7
      194        10.0
      109         5.6
       99         5.1
       90         4.6
       82         4.2
       74         3.8
       44         2.3
      159         8.2
     1945       100.0
Where Work                                Frequency  Percentage
University                                      930        47.8
At home                                         576        29.6
Research institute                              174         8.9
Private company                                 140         7.2
Government or international organization         42         2.2
Other                                            83         4.3
Sum                                            1945       100.0

Field Work                                    Frequency  Percentage
Econometrics                                        428        22.0
(Mathematical) Statistics                           253        13.0
Finance and actuarial science                       248        12.8
Physics and engineering                             201        10.3
Biometrics or Biostatistics                         165         8.5
Risk analysis                                       124         6.4
Social science (sociology, psychology, etc.)        111         5.7
Marketing and survey research                        70         3.6
Epidemiology                                         34         1.7
Other                                               311        16.0
Sum                                                1945       100.0
Appendix 5
Methods Used                                                 Frequency  Percentage
Time series                                                        373        19.2
Basic statistics                                                   332        17.1
Multivariate methods                                               273        14.0
Linear models                                                      213        11.0
Non- and semiparametric methods                                    136         7.0
Graphics and exploratory data analysis                             132         6.8
Generalized linear models and limited dependent variables          114         5.9
Panel data/cross-sectional methods                                  90         4.6
Tools for learning or teaching statistics                           60         3.1
Resampling and simulation methods                                   54         2.8
Survival analysis                                                   44         2.3
Other                                                              124         6.4
Sum                                                               1945       100.0
Software              Frequency  Percentage
Excel                       504        25.9
MatLab                      214        11.0
SPSS                        205        10.5
R                           145         7.5
Eviews                      126         6.5
SAS                         116         6.0
S/S-Plus                    100         5.1
GAUSS                        80         4.1
Statistica                   62         3.2
Stata                        56         2.9
MiniTab                      49         2.5
LIMDEP                       23         1.2
Rats                         15         0.8
TSP                          15         0.8
Xlisp-stat                    8         0.4
XGobi                         5         0.3
Other                       222        11.4
Sum                        1945       100.0
Platform L            Frequency  Percentage
Windows NT                 1595        82.0
Linux                       231        11.9
No                           80         4.1
Solaris                      21         1.1
Other                        18         0.9
Sum                        1945       100.0

Platform C            Frequency  Percentage
No                         1865        95.9
Windows NT                   66         3.4
Apple                         9         0.5
Linux                         5         0.3
Solaris                       0         0.0
Sum                        1945       100.0
Continent             Frequency  Percentage
Europe                      983        50.5
America                     486        25.0
Asia-Pacific                421        21.6
Africa                       40         2.1
Other                        15         0.8
Sum                        1945       100.0

Country               Frequency  Percentage
Germany                     318        16.4
USA                         310        15.8
Japan                       168         8.1
Sum                                     0.0

Platform              Frequency  Percentage
Windows NT                 1661        85.4
Linux                       236        12.1
Solaris                      21         1.1
Other                        27         1.4
Sum                        1945       100.0

Xversion              Frequency  Percentage
Local                      1643        84.5
ReX                         222        11.4
Client                       80         4.1
Sum                        1945       100.0
Academia (29%)
First learn
WWW, newsgroups
100%
Friends, colleagues
publications, journals
conferences
WWW, newsgroups
other
39%
31%
6%
0%
23%
university
at home
private company
research institute
gov./international org.
44%
32%
10%
8%
2%
university
research institute
private company
gov./international org.
at home
other
88%
8%
2%
2%
0%
2%
Excel
MatLab
SPSS
SAS
other
28%
14%
11%
8%
11%
Excel
MatLab
SPSS
Eviews
20%
11%
9%
9%
17%
14%
13%
11%
17%
Econometrics
(Mathematical) Statistics
Finance & actuarial sc.
Bio-metrics/statistics
other
38%
17%
10%
6%
9%
20%
16%
11%
11%
8%
Non/semipara.meth.
Time series
Multivariate meth.
Basic statistics
Graph./ explor.analy.
20%
16%
12%
10%
8%
21%
18%
16%
10%
7%
Time series
Basic statistics
Linear models
Multivariate meth.
Non/semipara.meth.
19%
14%
13%
12%
11%
99%
1%
Windows NT
Linux
98%
1%
85%
12%
4%
Local
ReX
Client
86%
9%
5%
Where Work
Software
Field Work
Econometrics
Finance & actuarial sc.
Physics & engin.
(Mathematical) Statistics
other
Methods looked for
Time series
Multivariate meth.
Non/semipara.meth.
Graph./ explor.analy.
Basic statistics
Methods used
Time series
Basic statistics
Multivariate meth.
Linear models
Graph./ explor.analy.
Platform
Windows NT
Solaris
Xversion
Local
ReX
Client
Appendix 6
32%
19%
5%
0%
44%
at home
private company
research institute
gov./international org.
university
other
67%
12%
10%
3%
0%
8%
Excel
SPSS
MatLab
Eviews
other
35%
9%
9%
7%
13%
19%
15%
12%
10%
21%
Time series
Multivariate meth.
Graph./ explor.analy.
Non/semipara.meth.
other
16%
16%
15%
10%
11%
Time series
Basic statistics
Multivariate meth.
Linear models
other
19%
16%
15%
11%
9%
Windows NT
Local
ReX
Client
100%
78%
19%
3%
WWW, newsgroups
publications, journals
Friends, colleagues
conferences
other
44%
18%
17%
3%
18%
43%
31%
12%
5%
3%
6%
university
at home
research institute
private company
gov./international org.
other
48%
30%
9%
7%
2%
4%
20%
18%
13%
6%
19%
Excel
MatLab
SPSS
R
other
26%
11%
11%
8%
11%
18%
14%
11%
11%
22%
Econometrics
(Mathematical) Statistics
Finance & actuarial sc.
Physics & engin.
other
22%
13%
13%
10%
16%
16%
16%
14%
13%
11%
Time series
Multivariate meth.
Non/semipara.meth.
Graph./ explor.analy.
Basic statistics
17%
14%
13%
12%
10%
22%
15%
12%
12%
9%
Time series
Basic statistics
Multivariate meth.
Linear models
Non/semipara.meth.
19%
17%
14%
11%
7%
86%
5%
Windows NT
Linux
85%
12%
90%
5%
5%
Local
ReX
Client
85%
11%
4%
Modal Value                   Modal Freq.  No. of Values
WWW, Newsgroup                      43.5%              5
University                          47.8%              6
Excel                               25.9%             17
Econometrics                        22.0%              7
Time Series                         19.2%             12
Time Series                         17.0%             12
Local                               84.5%              3
Windows NT                          82.0%              4
Windows NT                          95.9%              4
Windows NT                          85.4%              4
Germany                             16.4%             93
Europe                              50.5%              4

Regrouped User
Modal Value                   Modal Freq.  No. of Values
WWW, Newsgroup                      43.5%              5
University                          47.8%              6
Statistics                          30.0%              6
Econometrics                        22.0%              7
Multi./Non-Semipara.meth.           40.1%              7
Multi./Non-Semipara.meth.           40.5%              7
Local                               84.5%              3
Windows NT                          82.0%              4
Windows NT                          95.9%              4
Windows NT                          85.4%              4
Germany                             16.4%             93
Europe                              50.5%              4
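The three summary figures in the table above (modal value, modal frequency, number of values) can be recomputed from a frequency table. The following minimal sketch uses the "Where Work" counts of the 1945 users as input; the function name `modal_summary` is hypothetical and the code is only an illustration, not the procedure used in XploRe.

```python
from collections import Counter


def modal_summary(values):
    """Return (modal value, modal frequency in percent, number of distinct values)."""
    counts = Counter(values)
    modal_value, modal_count = counts.most_common(1)[0]
    modal_freq = round(100.0 * modal_count / len(values), 1)
    return modal_value, modal_freq, len(counts)


# "Where Work" counts of the 1945 users, expanded to one entry per user.
where_work_counts = {
    "University": 930,
    "At home": 576,
    "Research institute": 174,
    "Private company": 140,
    "Government or international organization": 42,
    "Other": 83,
}
where_work = [label for label, n in where_work_counts.items() for _ in range(n)]

summary = modal_summary(where_work)
print(summary)  # ('University', 47.8, 6)
```

The result agrees with the "University, 47.8%, 6" row of the table.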
Frequency  Percentage
      846        43.5
      354        18.2
      324        16.7
       67         3.4
      354        18.2
     1945       100.0

Frequency  Percentage
      737        40.5
      331        17.0
      302        15.5
      194        10.0
       90         4.6
       82         4.2
      159         8.2
     1945       100.0

Where Work                                Frequency  Percentage
University                                      930        47.8
At home                                         576        29.6
Research institute                              174         8.9
Private company                                 140         7.2
Government or international organization         42         2.2
Other                                            83         4.3
Sum                                            1945       100.0

Field Work                                      Frequency  Percentage
Econometrics                                          428        22.0
Finance and actuarial science/Risk analysis           372        19.1
(Mathematical) Statistics                             253        13.0
Physics and engineering                               201        10.3
Biometrics or Biostatistics/Epidemiology              199        10.2
Social science/Marketing and survey research          181         9.3
Other                                                 311        16.0
Sum                                                  1945       100.0
Appendix 8
Methods Used                                                 Frequency  Percentage
Multivariate methods/Non- and semiparametric methods/
Generalized linear models and limited dependent variables/
Linear models/Survival analysis                                    780        40.1
Time series                                                        373        19.2
Basic statistics                                                   332        17.1
Graphics and exploratory data analysis/
Tools for learning or teaching statistics                          192         9.9
Panel data/cross-sectional methods                                  90         4.6
Tools for learning or teaching statistics                           60         3.1
Resampling and simulation methods                                   54         2.8
Other                                                              124         6.4
Sum                                                               1181       100.0

Software              Frequency  Percentage
Statistics                  583        30.0
Excel                       504        25.9
Applied                     316        16.2
Econometrics                292        15.0
Rest                         28         1.4
Other                       222        11.4
Sum                        1945       100.0

Platform L            Frequency  Percentage
Windows NT                 1595        82.0
Linux                       231        11.9
No                           80         4.1
Solaris                      21         1.1
Other                        18         0.9
Sum                        1945       100.0

Platform C            Frequency  Percentage
No                         1865        95.9
Windows NT                   66         3.4
Apple                         9         0.5
Linux                         5         0.3
Solaris                       0         0.0
Sum                        1945       100.0
Continent             Frequency  Percentage
Europe                      983        50.5
America                     486        25.0
Asia-Pacific                421        21.6
Africa                       40         2.1
Other                        15         0.8
Sum                        1945       100.0

Country               Frequency  Percentage
Germany                     318        16.4
USA                         310        15.8
Japan                       168         8.1
Sum                                     0.0

Platform              Frequency  Percentage
Windows NT                 1661        85.4
Linux                       236        12.1
Solaris                      21         1.1
Other                        27         1.4
Sum                        1945       100.0

Xversion              Frequency  Percentage
Local                      1643        84.5
ReX                         222        11.4
Client                       80         4.1
Sum                        1945       100.0
Academia (41%)
First learn
WWW, newsgroups
conferences
96%
4%
WWW, newsgroups
Friends, colleagues
publications, journals
conferences
other
38%
22%
20%
4%
16%
at home
private company
research institute
gov./international org.
other
55%
18%
16%
5%
6%
university
Excel
Statistics
Applied
Econometrics
other
37%
29%
15%
7%
12%
Statistics
Econometrics
Excel
Applied
other
32%
21%
20%
17%
8%
32%
13%
13%
9%
17%
Econometrics
Statistics
Finance/Risk ana.
BioMetr./stat. & Empi.
other
30%
17%
13%
12%
12%
37%
21%
16%
9%
9%
Mult/Semi./Linear
Time series
Graph./explor./Learn.
Statistics
other
43%
16%
13%
10%
7%
35%
20%
19%
11%
7%
Mult/Semi./Linear
Time series
Statistics
Graph./explor./Learn.
Panel/cross-sec. Time Ser.
42%
19%
15%
8%
7%
98%
1%
Windows NT
83%
14%
3%
Local
ReX
Client
Where Work
100%
Software
Field Work
Finance/Risk ana.
Econometrics
Physics & engin.
Social Sc./Market
other
Methods looked for
Mult/Semi./Linear
Time series
Graph./explor./Learn.
Statistics
other
Methods used
Mult/Semi./Linear
Time series
Statistics
Graph./explor./Learn.
other
OS Platform
Windows NT
Solaris
Xversion
Local
ReX
Client
100%
85%
10%
6%
Appendix 9
33%
26%
1%
0%
39%
at home
research institute
private company
gov./international org.
university
other
59%
17%
11%
4%
0%
9%
Excel
Statistics
Econometrics
Applied
other
32%
23%
16%
14%
12%
Finance/Risk ana.
Econometrics
Statistics
Physics & engin.
other
25%
21%
12%
9%
19%
Mult/Semi./Linear
Time series
Graph./explor./Learn.
Statistics
other
40%
18%
17%
8%
9%
Mult/Semi./Linear
Time series
Statistics
Graph./explor./Learn.
other
38%
22%
16%
9%
8%
Windows NT
Local
ReX
Client
100%
82%
16%
2%
WWW, newsgroups
publications, journals
Friends, colleagues
conferences
other
44%
18%
17%
3%
18%
45%
29%
11%
4%
2%
6%
university
at home
research institute
private company
gov./international org.
other
48%
30%
9%
7%
2%
4%
37%
18%
17%
8%
18%
Statistics
Excel
Applied
Econometrics
other
30%
26%
16%
15%
11%
24%
14%
14%
12%
22%
Econometrics
Finance/Risk ana.
Statistics
Physics & engin.
other
22%
19%
13%
10%
16%
39%
21%
15%
13%
7%
Mult/Semi./Linear
Time series
Graph./explor./Learn.
Statistics
other
41%
17%
16%
10%
8%
40%
21%
15%
14%
6%
Mult/Semi./Linear
Time series
Statistics
Graph./explor./Learn.
other
40%
19%
17%
10%
6%
86%
7%
Windows NT
Linux
85%
12%
91%
5%
4%
Local
ReX
Client
85%
11%
4%
Frequency  Percentage
       53        49.5
       22        20.6
       19        17.8
        3         2.8
       10         9.3
      107       100.0

Frequency  Percentage
       20        18.7
       18        16.8
       17        15.9
       10         9.3
        8         7.5
        8         7.5
        7         6.5
        5         4.7
        4         3.7
        4         3.7
        3         2.8
        3         2.8
      107       100.0
Where Work                                Frequency  Percentage
Research institute                              107       100.0
At home                                           0         0.0
Government or international organization          0         0.0
Private company                                   0         0.0
University                                        0         0.0
Other                                             0         0.0
Sum                                             107       100.0

Field Work                                    Frequency  Percentage
Econometrics                                         23        21.5
Physics and engineering                              19        17.8
Biometrics or Biostatistics                          16        15.0
Social science (sociology, psychology, etc.)          8         7.5
Epidemiology                                          6         5.6
Finance and actuarial science                         6         5.6
Risk analysis                                         6         5.6
(Mathematical) Statistics                             4         3.7
Marketing and survey research                         2         1.9
Other                                                17        15.9
Sum                                                 107       100.0
Appendix 10
Methods Used                                                 Frequency  Percentage
Time series                                                         21        19.6
Multivariate methods                                                18        16.8
Generalized linear models and limited dependent variables           15        14.0
Basic statistics                                                    12        11.2
Linear models                                                       11        10.3
Non- and semiparametric methods                                     11        10.3
Graphics and exploratory data analysis                               8         7.5
Panel data/cross-sectional methods                                   4         3.7
Resampling and simulation methods                                    3         2.8
Tools for learning or teaching statistics                            2         1.9
Survival analysis                                                    1         0.9
Other                                                                1         0.9
Sum                                                                107       100.0
Software              Frequency  Percentage
Excel                        20        18.7
MatLab                       18        16.8
SPSS                         13        12.1
S/S-Plus                      9         8.4
GAUSS                         6         5.6
R                             6         5.6
SAS                           6         5.6
Statistica                    5         4.7
MiniTab                       4         3.7
Eviews                        3         2.8
LIMDEP                        2         1.9
TSP                           2         1.9
Rats                          1         0.9
Stata                         0         0.0
XGobi                         0         0.0
Xlisp-stat                    0         0.0
Other                        12        11.2
Sum                         107       100.0
Platform L            Frequency  Percentage
Windows NT                   84        78.5
Linux                        18        16.8
No                            2         1.9
Solaris                       1         1.9
Other                         2         1.9
Sum                         107       100.0

Platform C            Frequency  Percentage
No                          105        98.1
Windows NT                    2         1.9
Linux                         0         0.0
Solaris                       0         0.0
Apple                         0         0.0
Sum                         107       100.0
Continent             Frequency  Percentage
Europe                       51        47.7
Asia-Pacific                 34        31.8
America                      18        16.8
Africa                        4         3.7
Sum                         107       100.0

Country               Frequency  Percentage
Germany                      16        15.0
Japan                        10        29.9
USA                           7         6.5
Sum                                     0.0

Platform              Frequency  Percentage
Windows NT                   86        80.4
Linux                        18        16.8
Solaris                       1         0.9
Other                         4         1.9
Sum                         107       100.0

Xversion              Frequency  Percentage
Local                        97        90.7
ReX                           8         7.5
Client                        2         1.9
Sum                         107       100.0
Declaration of Authorship
I hereby declare that I have written this thesis independently, that I have used no
sources or aids other than those indicated, and that I have marked as such all passages
taken verbatim or in substance from those sources and aids.
Jianqiu Wang