By
Dissertation work carried out at
Dissertation Submitted in partial fulfillment of the requirement of
M.S. in Software Engineering
Under the Supervision of
September 2014
CERTIFICATE
This is to certify that the Dissertation entitled "Self-Service Segmentation
model generator for E-Commerce sites", submitted in partial fulfillment
of the requirements of the M.S. Software Engineering degree of BITS, embodies
the work done by him under my supervision.
Designation :
Location
Date
ABSTRACT
Dissertation Title
Name of Supervisor/Guide
Semester
Course Code
The objective of the dissertation is to provide a tool that offers a
framework for generating segmentation models based on different
operational and behavioral metrics and for validating each model against
historical data. The tool presents options for the analyst to choose from and
guides the analyst to a segmentation model even if the analyst has little
idea of what he/she wants to generate. The operational metrics for
users/customers include revenue, number of purchases, overall feedback
(positive, negative or neutral), shipping rating and number of disputes. The
behavioral metrics include page view count, count of items viewed on the
site, and count of purchase intentions shown, such as add to cart and wish
list. The tool also lets the analyst choose the user geography for which the
segmentation is to be run.
Currently, no segmentation tool available in the market is sophisticated
enough to handle such large volumes of data and to integrate different
datasets such as clickstream, page view data, bids, buys, sells, listings,
customer demographics, and scores. Having worked closely with the client
onsite for close to four years, I have a good understanding of their
customers, business, business needs, data and data sources. Using this
hard-earned knowledge and the right technology, I am confident that I can
provide a product which integrates all the above datasets to serve this
purpose.
Another reason why large marketplaces are hesitant to use third-party
tools is that they do not want to share proprietary data which could be
used to identify their customers. This concern about sharing private data
is addressed by the client's existing business relationship with Cognizant.
Signature of Student
Signature of Supervisor
Date:
Date
Name:
Name of Supervisor:
Designation:
Location:
Location:
Acknowledgement
I take this opportunity to express my gratitude to my project guide for providing his
insight, guidance & advice whenever & wherever I needed it. His sound understanding
of the IT business and project manager needs, clear communication and guidance have
helped me shape the direction of this dissertation project work.
Proposed Activities
Stage
Purpose
Activities
Stage 1
Define Scope of
Dissertation
Define
Requirements
Stage 2
Stage 3
Status
Deliverable
Preliminary
Report
Refining the
requirements
Proposed
Solution
100%
Development QA
and
Implementation
20%
Review
Final
Report
review
100%
Overall architecture
Detailed design for Backend Data flow and data model
the proposed
Front end screen flow
Solution
Coding & Unit testing
1. Backend scripts for data load
2. Frontend forms design
3. VB scripting for the forms
Mid
report
Table of Contents
1. OBJECTIVE
2. SCOPE
CODING STRATEGY
BEST PRACTICES
CODING CONVENTION
CONSTRUCTION
CODE REVIEW
UNIT TESTING
4. QA
4.1 QA STRATEGY
4.1.1 QA Scope
4.2 FUNCTIONAL QA
4.3 QA ASSESSMENT
5. APPLICATION SCREENSHOTS
6. FUTURE EXTENSIBILITY
6.1 SCOPE FOR FUTURE EXTENSION
6.2 GUIDELINES
REFERENCES
1. Objective
The basic objective of this tool is to help analysts quickly generate customer or
user segmentation models, not just ad-hoc segmentations based on a given metric,
and to validate the models beforehand instead of implementing them in the real
world & gauging their value later. Generally, analysts derive a segmentation model
manually after multiple iterations of research, writing code to extract the data
from the Data Warehouse tables. Multiple manual iterations extend the lead time,
and in almost all cases the model is redefined after a period of time based on its
ROI, which involves additional effort & cost.
The tool presents options to choose from and guides the analyst to a segmentation
model even if the analyst has little idea of what he/she wants to generate. The
operational metrics for users/customers include revenue, number of purchases,
overall feedback (positive, negative or neutral), shipping rating and number of
disputes. The behavioral metrics include page view count, count of items viewed on
the site, and count of purchase intentions shown, such as add to cart and wish
list. The tool also lets the analyst choose the user geography for which the
segmentation is to be run.
2. Scope
Out of Scope:
2.1.1
2.1.1.1 Creating a Buyer ABCDE segmentation model with trailing 7 days
GMB metrics & static thresholds for US region
1. The user accesses the SSS model generator front end to create a new segment
through the user interface.
2. The user names the segment "US buyer static segmentation model".
3. The user selects the segmentation level as Buyer and the time period as Trailing
7 days.
4. The user selects US as the region of segmentation.
5. The metric for segmentation is selected as GMB, i.e., gross merchandise bought.
6. The user selects the threshold type as static and the model as the ABCDE model.
7. The user enters the static threshold values (minimum and maximum GMB values).
8. The SSS model generator creates the segment definition and enters one or more
rows as needed into the Segment table.
9. The SSS model generator accesses the buyer metrics table to select all US region
buyers and aggregate their GMB for the last 7 days.
10. The SSS model generator applies the static thresholds and creates the segmented
user list.
11. Records for the given static thresholds are entered into the Segment threshold
table.
12. The SSS model generator returns to the user the generated SQL and sample buyers
with their segmentation for the newly defined segment.
13. The user can choose to export the complete set of segmented users from the
Segment users table, or the generated SQL, to give to the developers to schedule
it to run at specified time intervals (this happens outside the SSS model
generator flow).
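Steps 9 and 10 above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the BuyerDay and Threshold classes, the string country code (the data model uses a numeric country ID), the threshold values and the rule that a range includes its minimum and excludes its maximum are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BuyerDay:
    """One daily buyer metrics row (hypothetical shape)."""
    cal_dt: date
    buyer_id: int
    country: str
    gmb: float


@dataclass(frozen=True)
class Threshold:
    """Static range for one segment: min_gmb <= GMB < max_gmb (assumed rule)."""
    segment: str
    min_gmb: float
    max_gmb: float


def trailing_gmb(rows, country, as_of, days=7):
    """Sum GMB per buyer for one country over the window ending at as_of."""
    start = as_of - timedelta(days=days - 1)
    totals = defaultdict(float)
    for r in rows:
        if r.country == country and start <= r.cal_dt <= as_of:
            totals[r.buyer_id] += r.gmb
    return dict(totals)


def assign_segments(totals, thresholds):
    """Label each buyer with the first segment whose range contains the GMB."""
    result = {}
    for buyer_id, gmb in totals.items():
        for t in thresholds:
            if t.min_gmb <= gmb < t.max_gmb:
                result[buyer_id] = t.segment
                break
    return result


# Illustrative thresholds only; real values are entered by the analyst.
THRESHOLDS = [
    Threshold("A", 1000.0, float("inf")),
    Threshold("B", 500.0, 1000.0),
    Threshold("C", 100.0, 500.0),
    Threshold("D", 10.0, 100.0),
    Threshold("E", 0.0, 10.0),
]

rows = [
    BuyerDay(date(2014, 9, 1), 1, "US", 600.0),
    BuyerDay(date(2014, 9, 3), 1, "US", 500.0),
    BuyerDay(date(2014, 9, 2), 2, "US", 50.0),
    BuyerDay(date(2014, 8, 20), 2, "US", 900.0),  # outside the 7-day window
    BuyerDay(date(2014, 9, 2), 3, "UK", 700.0),   # outside the region
]
segments = assign_segments(trailing_gmb(rows, "US", date(2014, 9, 7)), THRESHOLDS)
print(segments)
```

In this sample, buyer 1 totals 1100 within the window and falls into segment A, while buyer 2 counts only the 50 bought inside the window and falls into segment D.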
2.1.1.2 Creating a Seller Large Merchants / Merchants / Entrepreneur /
Regulars / Occasional segmentation model with trailing 7 days GMV
metrics & dynamic thresholds for UK region
1. The user accesses the SSS model generator front end to create a new segment
through the user interface.
2. The user names the segment "UK seller dynamic segmentation Large
Merchants / Merchants / Entrepreneur / Regulars / Occasional model".
3. The user selects the segmentation level as Seller and the time period as Trailing
7 days.
2.1.1.3 Creating a Customer based customized segmentation model with
trailing 30 days page views metric & static thresholds for DE region
1. The user accesses the SSS model generator front end to create a new segment
through the user interface.
2. The user names the segment "DE customer based customized segmentation Top
viewer / Medium viewer / Low viewer model".
3. The user selects the segmentation level as Customer and the time period as
Trailing 30 days.
4. The user selects DE as the region of segmentation.
5. The metric for segmentation is selected as Page views count.
6. The user selects the threshold type as static and the model as customized model.
7. The application prompts for the segment names and the user provides 3 segments:
Top viewer, Medium viewer and Low viewer.
8. The user enters the static thresholds for the three segments.
9. The SSS model generator creates the segment definition and enters one or more
rows as needed into the Segment table.
10. The SSS model generator accesses the Buyer metrics table to select all DE region
buyers and aggregates their page view count for the last 30 days.
11. The Customer to user link table is accessed to get the customer IDs for the
buyers & the metrics are aggregated at customer level.
12. The SSS model generator applies the static thresholds and creates the segmented
user list.
13. Records for the given static thresholds are entered into the Segment threshold
table.
14. The SSS model generator returns to the user the generated SQL and sample buyers
with their segmentation for the newly defined segment.
15. The user can choose to export the complete set of segmented users, or the
generated SQL, to give to the developers to schedule it to run at specified time
intervals (this happens outside the SSS model generator flow).
2.4.2
2.5.1 Data sources
The data sources for this tool comprise various data warehouse tables holding
transactional data, listing data, behavioral data, user feedback data and customer
to user linking data. Each day, the users' metrics are aggregated and stored in the
user metrics table. For a given user and a given date, there is one record in this
table with all the metrics combined. The customer to user linking from the linking
data is stored in a separate table. A customer is the household entity, and one
customer can have multiple user accounts. If the segmentation is at the customer
level (a household account), then the metrics at user level have to be aggregated
at customer level using the customer to user mapping table.
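The customer-level aggregation described above can be sketched as follows. This is only an illustration: the function name, the sample IDs and the choice to skip users that have no customer mapping are assumptions, since the text does not say how unmapped users are handled.

```python
from collections import defaultdict


def roll_up_to_customer(user_metrics, user_to_customer):
    """Sum a user-level metric per customer using the customer to user mapping.

    Users without a mapping are skipped (an assumption for this sketch).
    """
    totals = defaultdict(int)
    for user_id, value in user_metrics.items():
        customer_id = user_to_customer.get(user_id)
        if customer_id is not None:
            totals[customer_id] += value
    return dict(totals)


# Hypothetical page view counts per user and a hypothetical link table.
user_page_views = {101: 40, 102: 15, 103: 7, 104: 3}
user_to_customer = {101: 9001, 102: 9001, 103: 9002}

customer_page_views = roll_up_to_customer(user_page_views, user_to_customer)
print(customer_page_views)
```

Here users 101 and 102 belong to the same household, so their page views are combined before any thresholds are applied.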
2.5.2 Segmentation platform
The segmentation platform is the frontend application that analysts use to define
segmentation models. It follows an event-driven approach, guiding the analyst
through the available segmentation options. The segmentation metadata stores the
segment information, the segmented member information and the segmentation
thresholds entered by the analyst. The SQL for a segmentation model can be
generated and exported using this application.
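One way such SQL generation could work is sketched below, assuming static thresholds with finite minimum and maximum values. The function build_segmentation_sql, the Threshold class, the output column names METRIC_VALUE and SGMNT_NAME, the country ID 1 and the threshold values are hypothetical; only the table and column names passed in the usage example come from the data model in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Static range for one segment; both bounds must be finite numbers."""
    segment: str
    min_value: float
    max_value: float


def build_segmentation_sql(table, id_col, metric_col, country_ids,
                           start_dt, end_dt, thresholds):
    """Build a SELECT that aggregates the metric and labels each member."""
    agg = f"SUM({metric_col})"
    whens = "\n".join(
        # Double any single quote so the segment name is a valid SQL literal.
        f"         WHEN {agg} >= {t.min_value} AND {agg} < {t.max_value} "
        f"THEN '{t.segment.replace(chr(39), chr(39) * 2)}'"
        for t in thresholds
    )
    countries = ", ".join(str(int(c)) for c in country_ids)
    return (
        f"SELECT {id_col},\n"
        f"       {agg} AS METRIC_VALUE,\n"
        f"       CASE\n{whens}\n"
        f"       END AS SGMNT_NAME\n"
        f"FROM {table}\n"
        f"WHERE CNTRY_ID IN ({countries})\n"
        f"  AND CAL_DT BETWEEN DATE '{start_dt}' AND DATE '{end_dt}'\n"
        f"GROUP BY {id_col}"
    )


sql = build_segmentation_sql(
    table="SSS_BUYER_MTRC_SUM",
    id_col="BUYER_ID",
    metric_col="BUYER_GMB_AMT",
    country_ids=[1],  # hypothetical country ID for the selected region
    start_dt="2014-09-01",
    end_dt="2014-09-07",
    thresholds=[Threshold("A", 1000, 1000000), Threshold("B", 0, 1000)],
)
print(sql)
```

The sketch only interpolates values that the application itself controls. A real generator would validate table names, column names and dates before building the statement.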
SSS_BUYER_MTRC_SUM
Column name                Data type        Column description
CAL_DT                     DATE
BUYER_ID                   DECIMAL(18,0)    Buyer ID
CNTRY_ID                   SMALLINT         Buyer country ID
BUYER_GMB_AMT              DECIMAL(18,2)    Buyer Gross merchandise bought
BUYER_RVNU_AMT             DECIMAL(18,2)    Buyer revenue
TOT_PRCHS_COUNT            DECIMAL(18,0)    Total purchases
BUYER_POS_FDBK_COUNT       DECIMAL(18,0)
BUYER_NEG_FDBK_COUNT       DECIMAL(18,0)
BUYER_NEUT_FDBK_COUNT      DECIMAL(18,0)
TOT_PAGE_VIEWS_COUNT       DECIMAL(18,0)
TOT_VI_COUNT               DECIMAL(18,0)
TOT_BID_COUNT              DECIMAL(18,0)
TOT_BIN_COUNT              DECIMAL(18,0)
TOT_WATCH_COUNT            DECIMAL(18,0)
TOT_OFFER_COUNT            DECIMAL(18,0)
TOT_ASQ_COUNT              DECIMAL(18,0)

SSS_SELLER_MTRC_SUM
Column name                Data type        Column description
CAL_DT                     DATE
SELLER_ID                  DECIMAL(18,0)    Seller ID
CNTRY_ID                   SMALLINT         Seller Country ID
SELLER_GMB_AMT             DECIMAL(18,2)    Seller Gross merchandise sold volume
SELLER_RVNU_AMT            DECIMAL(18,2)    Seller revenue
TOT_QTY_SOLD_COUNT         DECIMAL(18,0)
TOT_TRANS_COUNT            DECIMAL(18,0)    Total transactions
SELLER_POS_FDBK_COUNT      DECIMAL(18,0)
SELLER_NEG_FDBK_COUNT      DECIMAL(18,0)
SELLER_NEUT_FDBK_COUNT     DECIMAL(18,0)
SELLER_SHPNG_TIME_RTNG     BYTEINT
SELLER_SHPNG_COST_RTNG     BYTEINT
SELLER_ITEM_AS_DESC_RTNG   BYTEINT
SELLER_INTERACT_RATNG      BYTEINT
TOT_ACTV_LSTG_COUNT        DECIMAL(18,0)

SSS_CUST_USER_LINK_TABLE
Column name                Data type        Column description
CUSTOMER_ID                DECIMAL(18,0)    Customer ID
USER_ID                    DECIMAL(18,0)    User ID

Segment table
SSS_SGMNTN_DTL_FACT
Column name                Data type        Column description
SGMNTNN_ID                 DECIMAL(18,0)    Segment ID
SGMNTNN_CRE_DATE           DATE             Create Date
SGMNTN_UPD_DATE            DATE             Update Date
SGMNTN_CRE_USER            VARCHAR(30)
SGMNTN_LVL_ID              BYTEINT
SGMNTN_TIME_PERIOD         VARCHAR(20)      Segment Period
SGMNTN_MTRCS               VARCHAR(500)     Segment Metrics
SGMNTN_RGN_ID              SMALLINT         Segment Region
SGMNTN_THRSHLD_TYPE        BYTEINT          Threshold type
SGMNTN_SQL                 VARCHAR(5000)    Segment Code

SSS_SGMNTN_MEMBER_DIM
Column name                Data type        Column description
SGMNTN_ID                  DECIMAL(18,0)    Segment ID
SGMNTN_LVL                 CHAR(1)          Segment level
MEMBER_ID                  DECIMAL(18,0)    Member ID
SGMNTN_MEMBER_BEG_DT       DATE             Segment Member Begin date
SGMNTN_MEMBER_END_DT       DATE             Segment Member End date

SSA_THRSLD_VALUE_FACT
Column name                Data type        Column description
SGMNTN_ID                  DECIMAL(18,0)    Segment ID
SGMNTN_NAME                VARCHAR(30)      Segment Name
THRSHLD_MTRC_NAME          VARCHAR(50)
THRSHLD_OPERATOR           VARCHAR(20)      Threshold Operator
MIN_THRSHLD_VALUE          DECIMAL(18,2)
MAX_THRSLD_VALUE           DECIMAL(18,2)

SSS_SGMNTN_LVL_DIM
Column name                Data type        Column description
SGMNTN_LVL_ID              BYTEINT          Segment level ID
SGMNTN_LVL_DESC            VARCHAR(15)

SSS_SGMNTN_RGN_DIM
Column name                Data type
SGMNTN_RGN_ID              SMALLINT
SGMNTN_RGN_DESC            VARCHAR(30)

SSS_SGMNTN_RGN_CNTRY_LKP
Column name                Data type
SGMNTN_RGN_ID              SMALLINT
CNTRY_ID                   SMALLINT
CNTRY_NAME                 VARCHAR(30)

SSS_SGMNTN_THRSHLD_TYPE_DIM
Column name                Data type
SGMNT_THRSHLD_TYPE         BYTEINT
SGMNT_THRSHLD_TYPE_DESC    VARCHAR(30)

3.
4. QA
4.1 QA Strategy
4.1.1 QA Scope
4.2 Functional QA
4.3 QA Assessment
5. Application Screenshots
Below are the mock screenshots for the front end application.
6. Future Extensibility
6.2 Guidelines
The following are tips and guidelines for Teradata ETL developers
As a Teradata ETL developer,
References