FORECASTING
94-832: Business Intelligence & Data Mining SAS
TEAM 7
MITHUN MATHEW
MEAGAN MUSGRAVE
AKASH PATEL
RENU THOMAS
IVY YANG
Report
Team 7
1 Introduction
This project was completed to fulfil the project requirement of the course 94-832: Business
Intelligence & Data Mining SAS. The core of our analysis is the Walmart data set obtained from Kaggle.
The data contain weekly sales for the departments of different stores over a period of time. Most of
the work in the project revolved around staging and cleaning the data and modelling it with different
parameters and methodologies.
Using three methodologies (clustering, regression and decision trees), we generated different models
and recorded their errors. Important variables were identified and insights were drawn from the
clusters.
Time series analysis was used for hierarchical clustering of sales trends and showed how the clusters
differ from one another. Time series forecasting was then used to predict sales for the 2012
end-of-year holiday season.
2 Business Questions
2.1 Question One
Retailers face several challenges when forecasting sales: the scale of the problem, erratic sales at
each individual store, seasonal changes, the constant introduction of new items, and repeated
promotional activity [1]. To address these issues, retailers have turned to large-scale demand
forecasting that can accommodate large amounts of transaction data. By collecting and mining these
data, retailers can project future customer behavior. The ability to forecast on such a large scale
gives retailers the opportunity to optimize their revenue systems, enabling better choices on
promotions and pricing. For our project we take on this challenge and attempt to forecast sales at
Walmart accurately. Given Walmart's reputation for competitive pricing, the ability to project sales
accurately is key to its ability to function.
Research from the University of Michigan has found that clustering prior to forecasting sales greatly
increases the accuracy of forecasts [2]. By clustering stores based on sales and on attributes such as
average temperature and fuel price, retailers can eliminate the need to control for seasonal indices
and classes (summer shoes versus winter shirts, etc.). By applying hierarchical clustering to the data,
we hope to determine which stores are similar in terms of both sales and store attributes, so that we
can ascertain which characteristics are key drivers of sales and thus generate more accurate forecasts.
FROM Train JOIN Store_Features USING(Store, Week, IsHoliday);
For analysis and visualization, the variables TEMPERATURE, FUEL_PRICE and WEEKLY_SALES were
categorized into the following classes (see Appendix A for the SQL queries):
Condition                                    TEMP_CLASS
TEMPERATURE < 32                             Freezing
TEMPERATURE >= 32 AND TEMPERATURE < 64       Cold
TEMPERATURE >= 64 AND TEMPERATURE < 79       Comfortable
TEMPERATURE >= 79 AND TEMPERATURE < 95       Hot
TEMPERATURE > 95                             Extremely Hot
3-1: TEMP_CLASS
Condition                                    FUEL_CLASS
FUEL_PRICE < 2.75                            Low
FUEL_PRICE >= 2.75 AND FUEL_PRICE < 3.12     Medium
FUEL_PRICE > 3.12                            High
3-2: FUEL_CLASS
Condition                                           SALES_CLASS
WEEKLY_SALES <= 0                                   Negative
WEEKLY_SALES > 0 AND WEEKLY_SALES <= 10000          Low
WEEKLY_SALES > 10000 AND WEEKLY_SALES <= 25000      Medium
WEEKLY_SALES > 25000 AND WEEKLY_SALES <= 100000     High
WEEKLY_SALES > 100000                               Very High
3-3: SALES_CLASS
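The three classifications above can be sketched in a few lines of Python. This is only an illustration of the thresholds in tables 3-1 to 3-3, not the queries the team ran (those are in Appendix A); the function names are hypothetical. Following the tables as written, a temperature of exactly 95 and a fuel price of exactly 3.12 match no row, so the sketch returns None for them.

```python
from typing import Optional


def temp_class(temperature: float) -> Optional[str]:
    """Label a temperature using the thresholds of table 3-1."""
    if temperature < 32:
        return "Freezing"
    if temperature < 64:
        return "Cold"
    if temperature < 79:
        return "Comfortable"
    if temperature < 95:
        return "Hot"
    if temperature > 95:
        return "Extremely Hot"
    return None  # exactly 95 matches no row of the table as written


def fuel_class(fuel_price: float) -> Optional[str]:
    """Label a fuel price using the thresholds of table 3-2."""
    if fuel_price < 2.75:
        return "Low"
    if fuel_price < 3.12:
        return "Medium"
    if fuel_price > 3.12:
        return "High"
    return None  # exactly 3.12 matches no row of the table as written


def sales_class(weekly_sales: float) -> str:
    """Label a weekly sales figure using the thresholds of table 3-3."""
    if weekly_sales <= 0:
        return "Negative"
    if weekly_sales <= 10000:
        return "Low"
    if weekly_sales <= 25000:
        return "Medium"
    if weekly_sales <= 100000:
        return "High"
    return "Very High"
```

Changing the last comparison in temp_class and fuel_class to >= would close the two boundary gaps, at the cost of no longer matching the tables exactly.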
To support visualization, further categorical attributes were added, including HOLIDAY (Super Bowl,
Labor Day, Thanksgiving, Christmas). The two weeks before each holiday were labelled Before Super
Bowl, Before Labor Day, Before Thanksgiving and Before Christmas.
Unemployment and CPI were categorized into Low, Medium and High, and store size into Small, Medium
and Large (see Appendix B for the SQL queries).
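The holiday labelling described above can be illustrated with a short Python sketch. The holiday week dates are the ones used in the Appendix B queries; the function name and the choice to treat "the two weeks before" as the 14 days preceding the holiday week (excluding the holiday week itself) are assumptions made for this example.

```python
from datetime import date, timedelta

# Holiday week dates as listed in the Appendix B queries.
HOLIDAY_WEEKS = {
    "Super Bowl": [date(2010, 2, 12), date(2011, 2, 11), date(2012, 2, 10), date(2013, 2, 8)],
    "Labor Day": [date(2010, 9, 10), date(2011, 9, 9), date(2012, 9, 7), date(2013, 9, 6)],
    "Thanksgiving": [date(2010, 11, 26), date(2011, 11, 25), date(2012, 11, 23), date(2013, 11, 29)],
    "Christmas": [date(2010, 12, 31), date(2011, 12, 30), date(2012, 12, 28), date(2013, 12, 27)],
}


def holiday_label(week: date) -> str:
    """Return the HOLIDAY label for a weekly date (hypothetical helper)."""
    # A holiday week keeps the holiday's own name.
    for name, weeks in HOLIDAY_WEEKS.items():
        if week in weeks:
            return name
    # Assumption: the 14 days before a holiday week count as "Before <holiday>".
    for name, weeks in HOLIDAY_WEEKS.items():
        for holiday in weeks:
            if holiday - timedelta(days=14) <= week < holiday:
                return "Before " + name
    return "Not Holiday"
```

With weekly dates, the 14-day window covers exactly the two weekly observations before each holiday week.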
4 Exploratory Analysis
4.1 Top 10 Stores by Sales
The above chart shows the top 10 stores by sales revenue and each store's percentage contribution to
the total sales generated among them. Store 20 was the highest contributor, with total sales of 301
million. The stores are a mix of 7 large and 3 medium-sized stores. Together, these 10 stores
accounted for 39% of the revenue generated by the 45 stores in the data set.
4.2
The above figure shows the top 5 departments across the 3 store types, A, B and C. Interestingly,
Department 72 showed a significant rise in sales in store types A and B. Store type A generated the
most sales and store type C the least.
The above figure shows the pre-holiday sales registered by the 3 store types. Sales were highest
before Christmas, followed by pre-Thanksgiving, pre-Labor Day and pre-Super Bowl sales. Store type A
registered the highest sales, followed by store type B and store type C.
No strong relationship was apparent when visualizing weekly sales against the CPI and the fuel price
for the same week.
5-2: Clustering
Each of these clusters represents a group of stores that share similar values of the attributes
clustered around the store ID. The initial results table shows that each cluster has different
averages for each attribute.
Comparing these averages via the input means plot allows us to draw conclusions about each individual
segment (see sections 5.2-5.4).
5-3: Clusters
Per the Variable Importance graph, CPI, unemployment rate, and store size are the three most important
variables for this cluster of stores.
Again, the consumer price index, unemployment rate, and store size are the important variables within
this cluster of stores. The same variables are important across Clusters A and B, but the averages of
the attributes differ slightly relative to the overall attribute averages.
This cluster of stores is grouped together because holidays have a large impact, with the variable
Holiday? dwarfing all other attributes in importance.
In sum, the initial clustering analysis reveals that stores relate differently to weekly sales
depending on the cluster to which they belong. Holidays appear to have an impact only within Cluster C,
while the other attributes of interest are more relevant to Clusters A and B. We now move on to our
second method, regression, to test whether the relationships seen above are statistically significant.
6-1: Regression
In this model, we keep all the variables (CPI, DEPT, FUEL_PRICE, ISHOLIDAY, MARKDOWN1-5, STORE_SIZE,
STORE_TYPE, TEMPERATURE, UNEMPLOYMENT) as inputs, with WEEK as the time ID, STORE as the ID and
WEEKLY_SALES as the target. We first use the Data Partition node to split the data into a 70% training
set and a 30% validation set. We then run the model with stepwise, forward and backward selection in
turn, using validation error as the selection criterion.
Effect          Pr > F
DEPT            <.0001
STORE_SIZE      <.0001
The results of the stepwise and forward models are very similar, while the backward model gives a
worse result, so we report the stepwise result here. This model has an average squared error of
2.0121E8 on the training set and 2.0451E8 on the validation set. Although the plot looks good,
especially at the beginning, the overall error statistics are poor. As the Type 3 Analysis of Effects
above shows, this is because the final model retains only two significant variables, DEPT and
STORE_SIZE. The linear regression model includes every value of DEPT, which means that the nominal
department values strongly affect the regression result. The average price of products may vary
considerably between departments. However, it does not make sense to predict sales from the department
alone. In addition, STORE_SIZE takes large values compared with the other variables, so it can mask
the effects of the other variables and reduce the accuracy of the model.
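For readers unfamiliar with the error statistic quoted above, the following sketch shows a 70/30 partition and the average squared error (the mean of the squared differences between actual and predicted values) computed on each part. It is a plain-Python illustration, not the SAS Enterprise Miner nodes used in the project; the function names, the toy sales values and the mean-only baseline "model" are all assumptions made for the example.

```python
import random


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly sales values, for illustration only.
sales = [1200.0, 900.0, 15000.0, 300.0, 7200.0, 4100.0, 250.0, 9800.0, 600.0, 3300.0]
train, valid = partition(sales)

# Baseline "model": predict the training mean for every observation.
baseline = sum(train) / len(train)
train_ase = average_squared_error(train, [baseline] * len(train))
valid_ase = average_squared_error(valid, [baseline] * len(valid))
```

Because the statistic is in squared units, its square root is easier to interpret: a validation error of 2.0451E8 corresponds to a typical error of roughly 14,300 in the units of WEEKLY_SALES.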
To improve the results, we imputed the missing values of MARKDOWN1-5 and took the log of each
interval variable to reduce skewness. This gave a better model, with an average squared error of
1.9549E8 on the training set and 1.9831E8 on the validation set. This model also makes more sense
than the previous one, because more attributes are involved.
From the screenshot of the model below, we can see that DEPT still has a large influence.
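The imputation and log transform described above can be sketched as follows. The report does not say which imputation method or which form of the logarithm was used, so mean imputation and log(1 + x) on non-negative values are assumptions of this example, as are the function names and the toy markdown values.

```python
import math


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def log_transform(values):
    """Apply log(1 + x); assumes every value is non-negative."""
    return [math.log1p(v) for v in values]


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical markdown values with gaps, for illustration only.
markdown = [50.0, None, 120.0, 80.0, None, 9000.0, 200.0, 60.0]
imputed = impute_mean(markdown)
logged = log_transform(imputed)
```

On this toy series the single large value dominates the raw distribution; after the log transform the skewness is much smaller, which is the effect the transformation is meant to achieve.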
To reduce the effect of DEPT, we removed the department variable, kept similar settings for all other
variables and refitted the model. However, the result is considerably worse: the average squared error
is 4.842E8 on the training set and 4.881E8 on the validation set. This shows that the department is an
important predictor in this data set, and that we need to keep it in the regression model for the
model to be accurate.
Hence, among the linear regression models, the models that use the full data (with DEPT retained) give
the best performance.
Number of Splitting Rules    Importance                 Validation Importance
72.0                         1.0                        1.0
41.0                         0.4964515440958596         0.4964821958816832
14.0                         0.04084201073998688        0.04144053541125881
7.0                          0.34207810450886517        0.3480075010929593
6.0                          0.09548333258131239        0.096222212274534
4.0                          0.03701878109843059        0.031383248828117584
4.0                          0.023291162585169126       0.021652254529342163
3.0                          0.012643930440875074       0.01080177216385246
2.0                          0.0037723260628764943      0.0011206103465749833
1.0                          0.034734597871631134       0.03245507329474942
1.0                          0.004704913825175917       0.00679092144395149
1.0                          0.0017796209442395006      0.0
0.0                          0.0                        0.0
0.0                          0.0                        0.0
A two-way split decision tree was generated on the training data set, using the weekly sales classes
generated earlier as the target classes. The model depended heavily on department (DEPT) and store
(STORE): the majority of the splitting rules were based on these two attributes. The two-way split
decision tree had an average squared error of 0.04222.
WEEKLY_SALES depends less on ISHOLIDAY than on STORE_SIZE, STORE_TYPE, UNEMPLOYMENT and TEMPERATURE.
Looking at the data from a broader perspective, the location of the store appears to be a major factor
in weekly sales. A store in a densely populated urban area would likely have more sales than one in a
rural area, regardless of whether the week includes a holiday. Holiday sales in a store far from a
city might still be lower than the average non-holiday sales of a store in the city. Stores in cities
would also tend to be larger and have higher sales. To explore this scenario, another approach was
pursued (see section 4).
Number of Splitting Rules    Importance                 Validation Importance
150.0                        1.0                        1.0
73.0                         0.7197389364808112         0.7172217848271354
39.0                         0.16985483324111295        0.17565253608886602
69.0                         0.13354907518567805        0.11875136590034423
49.0                         0.12228743722593886        0.1227097368514723
11.0                         0.10896237311981295        0.1091215150571583
43.0                         0.0870216497060246         0.08140314276386681
42.0                         0.0838513980255293         0.07108265278770555
29.0                         0.055805194471475056       0.045466506369912014
4.0                          0.02260829885174643        0.01494100486542834
4.0                          0.01614486406343703        0.013490146227052534
4.0                          0.014878670018523063       0.013747824483138145
3.0                          0.014085153665462117       0.008854538152353212
4.0                          0.012841667939644022       0.016799803214149107
In terms of variable importance, DEPT and STORE were again the most important variables. However, the
three-way split gave the model more flexibility in decision making, so, as expected, the error in
classifying observations into the weekly sales classes was lower. The average squared error was
0.02765.
To observe how weekly sales depend on the other features in the data set, the department and store ID
variables were rejected. The NEGATIVE and VERY HIGH classes of WEEKLY_SALES were filtered out, and the
data were then sampled so that the remaining classes (LOW, MEDIUM and HIGH) had the same number of
observations.
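The filtering and balanced sampling step can be sketched as below. The report does not describe how the sample was drawn, so random undersampling of each class down to the size of the smallest class is an assumption of this example; the function name and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_of, keep_labels, seed=0):
    """Keep only rows whose label is in keep_labels, then undersample each
    class at random so that all kept classes have the same number of rows."""
    groups = defaultdict(list)
    for row in rows:
        label = label_of(row)
        if label in keep_labels:
            groups[label].append(row)
    # Every class is cut down to the size of the smallest kept class.
    n = min(len(groups[label]) for label in keep_labels)
    rng = random.Random(seed)
    sample = []
    for label in keep_labels:
        sample.extend(rng.sample(groups[label], n))
    return sample


# Hypothetical (SALES_CLASS, row id) pairs, for illustration only.
rows = (
    [("Low", i) for i in range(8)]
    + [("Medium", i) for i in range(5)]
    + [("High", i) for i in range(3)]
    + [("Negative", 0), ("Very High", 0)]
)
sample = balanced_sample(rows, lambda r: r[0], ["Low", "Medium", "High"])
counts = Counter(label for label, _ in sample)
```

Here each of the three remaining classes ends up with 3 rows, the size of the smallest class, and the Negative and Very High rows are dropped.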
The decision trees modeled on this data returned the expected results: STORE_SIZE was one of the
major factors determining weekly sales and ended up as the most important variable for splitting
nodes.
Variable        Number of Splitting Rules    Importance                 Validation Importance
STORE_SIZE      16.0                         1.0                        1.0
CPI             4.0                          0.2651285102856137         0.24717493036369145
UNEMPLOYMENT    5.0                          0.24441028209455445        0.20442631071106554
STORE_TYPE      3.0                          0.15295892375537554        0.15072179548628872
MARKDOWN3       2.0                          0.052709576258346755       0.0315604300793954
FUEL_PRICE      1.0                          0.022610228455857168       0.012600386988009991
TEMPERATURE     1.0                          0.02168392343651502        0.010275651333624921
MARKDOWN2       0.0                          0.0                        0.0
MARKDOWN1       0.0                          0.0                        0.0
MARKDOWN5       0.0                          0.0                        0.0
ISHOLIDAY       0.0                          0.0                        0.0
MARKDOWN4       0.0                          0.0                        0.0
However, the average squared error for both trees (two-way split and three-way split) was 0.211,
higher than for the earlier trees, which indicates that the nodes of these trees are less pure.
The following table summarizes the decision tree models that were generated for the WALMART_TRAIN
data set.
Model                                             Average Squared Error
Two-way Split                                     0.042
Three-way Split                                   0.028
Two-way Split without DEPT and STORE              0.112
Two-way & Three-way Split on Sampled Data         0.211
Setting up this structure allowed flexibility in visualizing the data aggregated over different
dimensions.
The following plot shows the weekly sales for 100 of the store-department combinations. The plot shows
that sales were highest during the holiday seasons: Christmas in December and before summer in May.
Other notable peaks in sales occurred around Thanksgiving in November, the Super Bowl in February and
Labor Day in September.
For further analysis, the mean weekly sales for each store and for each department were plotted. Some
departments had very high average weekly sales compared to the others. Although Walmart does not
identify the departments, for privacy reasons, these might be departments selling products that people
need day to day, such as groceries, or high-grossing departments such as electronics.
The time series of the input variables, such as CPI, UNEMPLOYMENT, ISHOLIDAY, TEMPERATURE and the
MARKDOWN values, were used as inputs for clustering. The clustering used the mean squared error
between the total weekly sales of the stores as the similarity measure.
The following dendrogram shows the distance between the clusters that were generated. Using the
minimum distance between clusters, a cut at a distance of 0.1 gave three main clusters. Stores 7, 16,
17, 38 and 44 were clustered together as cluster A. Stores 28, 30, 33, 36, 37, 42 and 43 were
clustered together as cluster B. The rest of the stores belonged to cluster C. The features of these
clusters became more evident during the forecasting process.
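The idea of clustering stores by the mean squared error between their sales series, and cutting the hierarchy at a distance threshold, can be sketched as below. This is not the SAS Enterprise Miner procedure used in the project: the single-linkage merge rule, the function names, the store labels and the toy series are all assumptions of this example, and the meaning of a threshold such as 0.1 depends on how the series are scaled.

```python
def mse(a, b):
    """Mean squared error between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cluster(series_by_store, threshold):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the smallest between-cluster distance exceeds the threshold.
    Assumption: single linkage (distance between the closest members)."""
    clusters = [[store] for store in series_by_store]

    def distance(c1, c2):
        return min(mse(series_by_store[a], series_by_store[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical scaled weekly sales series for five stores, for illustration only.
toy_series = {
    "S1": [1.00, 0.90, 1.00, 0.95],
    "S2": [0.98, 0.92, 1.00, 0.93],
    "S3": [0.20, 0.25, 0.20, 0.22],
    "S4": [0.22, 0.24, 0.21, 0.20],
    "S5": [1.00, 0.70, 0.40, 0.10],
}
toy_clusters = cluster(toy_series, threshold=0.1)
```

On the toy data, the two pairs of near-identical stores merge, while the declining store S5 stays in a cluster of its own because its distance to every other store exceeds the threshold.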
The following graph shows how the stores were clustered according to their weekly sales trends and
the trends of the other attributes.
The differences between clusters became clearer when the sales trends of the stores were analyzed:
stores from the same cluster showed similar trends in weekly sales.
For each store, different models were used to forecast the sales. The model with the lowest standard
error was automatically selected as the best forecasting model for that store. The additive Winters
model and seasonal models proved to be the best fit for most stores. The following table lists the
model used for each store, with the parameter estimates and their standard errors.
Time Series ID   Store   Model        Parameter   Parameter Estimate       Standard Error
1.0              1.0     ADDWINTERS   LEVEL       0.0034631198964860067    0.00441693090198638
1.0              1.0     ADDWINTERS   SEASON      0.6055475095192919       0.040597985445483306
1.0              1.0     ADDWINTERS   TREND       0.001                    0.002625959422274718
2.0              2.0     WINTERS      LEVEL       0.16545582491041058      0.02507185069755535
2.0              2.0     WINTERS      SEASON      0.921247729943568        0.05853998565921424
2.0              2.0     WINTERS      TREND       0.001                    0.009608906293211008
3.0              3.0     ADDWINTERS   TREND       0.001                    0.004528332723662851
3.0              3.0     ADDWINTERS   SEASON      0.6250302553028629       0.04869015651345081
3.0              3.0     ADDWINTERS   LEVEL       0.18422291502403845      0.025950895825345977
4.0              4.0     ADDWINTERS   LEVEL       0.08795186376589817      0.019526521663305617
4.0              4.0     ADDWINTERS   TREND       0.001                    0.006350001834242006
4.0              4.0     ADDWINTERS   SEASON      0.7130415339734698       0.04124489642150956
5.0              5.0     ADDWINTERS   TREND       0.001                    0.01145787934036644
5.0              5.0     ADDWINTERS   SEASON      0.5774493510072217       0.04518008362353768
5.0              5.0     ADDWINTERS   LEVEL       0.1367460310044868       0.023629244732261773
6.0              6.0     SEASONAL     SEASON      0.7157152817395214       0.04219586666398573
6.0              6.0     SEASONAL     LEVEL       0.12046874034834318      0.016873522037056683
7.0              7.0     ADDWINTERS   LEVEL       0.1653571721391764       0.020244620448452346
7.0              7.0     ADDWINTERS   SEASON      0.7697969237235239       0.050169815195813205
7.0              7.0     ADDWINTERS   TREND       0.001                    0.004356459668509099
8.0              8.0     ADDWINTERS   SEASON      0.6850143638004          0.04029437117033705
8.0              8.0     ADDWINTERS   LEVEL       0.0662748336317954       0.013963823601356068
8.0              8.0     ADDWINTERS   TREND       0.001                    0.005333437425109881
9.0              9.0     ADDWINTERS   LEVEL       0.17980061535722747      0.02424450837665769
9.0              9.0     ADDWINTERS   TREND       0.001                    0.030250497941758804
9.0              9.0     ADDWINTERS   SEASON      0.814285370219351        0.05696269299246877
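The LEVEL, TREND and SEASON estimates in the table above appear to be the smoothing weights of the exponential smoothing models. As an illustration of how such weights are used, the sketch below implements the textbook additive Winters (Holt-Winters) recursion in plain Python. It is not the SAS implementation: the function name, the simple initialisation from the first two seasons, and the toy series are assumptions of this example.

```python
def additive_winters(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing.

    alpha, beta and gamma are the level, trend and season smoothing weights.
    Returns forecasts for the next `horizon` periods.
    Assumes the series covers at least two full seasons.
    """
    first = series[:period]
    second = series[period:2 * period]
    # Simple initialisation (an assumption): level is the mean of the first
    # season, trend is the average per-period change between the first two
    # seasons, and the seasonal terms are deviations from the initial level.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    season = [value - level for value in first]

    for t in range(period, len(series)):
        s = season[t % period]
        previous_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % period] = gamma * (series[t] - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + season[(n + h) % period] for h in range(horizon)]


# Hypothetical series with a repeating 4-period pattern, for illustration only.
toy_sales = [10.0, 20.0, 30.0, 40.0] * 3
toy_forecast = additive_winters(toy_sales, period=4, alpha=0.2, beta=0.001,
                                gamma=0.6, horizon=4)
```

For weekly data with a yearly cycle, period would be 52 and horizon 12 for a twelve-week forecast. A trend weight close to zero, such as the 0.001 seen throughout the table, means the trend estimate is updated very slowly from one week to the next.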
Based on these models, the weekly sales of each store were forecast for 12 weeks, covering the holiday
season in December (the forecast sales are shown after the vertical line on the graph). The following
graph shows the forecast sales of a store that is doing fairly well. Store 1 belongs to cluster C.
All stores in this cluster show a similar trend, with a very high peak in sales during Christmas.
The following graph shows the sales for Store 7. The store shows good sales from May to September and
from November to January, and could have good potential for future growth. This store was selected
from cluster A. All stores in this cluster have a similar trend, with a steady level of sales in
addition to higher sales during holidays. These can be considered stores with steady growth rates.
The following graph shows the sales for Store 36, which Walmart should focus on. The store has been
losing sales: its total sales fell by half over a period of 2 years, and if the trend continues it is
likely to go out of business over the next couple of years. Store 36 was taken from cluster B. Stores
from this cluster generally showed a declining trend.
9 Business Implications
Based on this analysis, Walmart should hire personnel a few weeks before the holiday seasons,
especially Thanksgiving and Christmas. This would allow stores to perform better as sales rise
gradually in the run-up to the holidays.
The cluster information from section 8.2 can be used in conjunction with sales forecasting to produce
more accurate predictions.
Walmart should keep a close eye on the stores that are at risk of going out of business. It should
also give other stores an incentive to improve their sales, and hire the right sales representatives.
10 References
[1] M. Gilliland, "Demand Forecasting in Retail," [Online]. Available:
http://www.sas.com/news/feature/retail/aug06forecast.html.
[2] M. K. & R. R. Nitin Patel, "Clustering Models to Improve Forecasts in Retails Merchandising,"
[Online]. Available: http://www.cytel.com/Papers/INFORMS_Prac_%2004.pdf.
[3] L. C.-L. & R. Dudley, "Wal-Mart Sees Profit at Low End of Forecast," [Online]. Available:
http://www.bloomberg.com/news/2014-01-31/wal-mart-sees-profit-at-low-end-of-forecast.html.
[4] R. Dudley, "Wal-Mart Cuts Annual Sales Forecast as Supercenters Struggle," [Online]. Available:
http://www.businessweek.com/news/2014-10-16/wal-mart-cuts-annual-sales-forecast-as-itssupercenters-.
[5] "Kaggle - Walmart Recruiting - Stores Sales Forecasting," [Online]. Available:
https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting.
[6] T. L. Sascha Schubert, "Time Series Data Mining with SAS Enterprise Miner," [Online]. Available:
http://support.sas.com/resources/papers/proceedings11/160-2011.pdf.
[7] S. J. Satyajit Dwivedi, "Time-series Data Mining," [Online]. Available:
http://www.iasri.res.in/sscnars/data_mining/10SAS%20Enterprise%20Miner%207.1%20Time%20Series%20Data%20Mining.pdf.
Appendix A
ALTER TABLE WALMART_TRAIN
  ADD TEMP_CLASS VARCHAR2(15);
UPDATE WALMART_TRAIN
SET TEMP_CLASS = (CASE
    WHEN TEMPERATURE < 32 THEN 'Freezing'
    WHEN TEMPERATURE >= 32 AND TEMPERATURE < 64 THEN 'Cold'
    WHEN TEMPERATURE >= 64 AND TEMPERATURE < 79 THEN 'Comfortable'
    WHEN TEMPERATURE >= 79 AND TEMPERATURE < 95 THEN 'Hot'
    WHEN TEMPERATURE > 95 THEN 'Extremely Hot'
    ELSE NULL
  END);
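-- http://www.gasbuddy.com/gb_gastemperaturemap.aspx
ALTER TABLE WALMART_TRAIN
  ADD FUEL_CLASS VARCHAR2(15);
-- Thresholds as listed in table 3-2
UPDATE WALMART_TRAIN
SET FUEL_CLASS = (CASE
    WHEN FUEL_PRICE < 2.75 THEN 'Low'
    WHEN FUEL_PRICE >= 2.75 AND FUEL_PRICE < 3.12 THEN 'Medium'
    WHEN FUEL_PRICE > 3.12 THEN 'High'
    ELSE NULL
  END);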
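-- http://www.statisticbrain.com/wal-mart-company-statistics/
ALTER TABLE WALMART_TRAIN
  ADD SALES_CLASS VARCHAR2(15);
-- Thresholds as listed in table 3-3
UPDATE WALMART_TRAIN
SET SALES_CLASS = (CASE
    WHEN WEEKLY_SALES <= 0 THEN 'Negative'
    WHEN WEEKLY_SALES > 0 AND WEEKLY_SALES <= 10000 THEN 'Low'
    WHEN WEEKLY_SALES > 10000 AND WEEKLY_SALES <= 25000 THEN 'Medium'
    WHEN WEEKLY_SALES > 25000 AND WEEKLY_SALES <= 100000 THEN 'High'
    WHEN WEEKLY_SALES > 100000 THEN 'Very High'
    ELSE NULL
  END);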
Appendix B
CREATE TABLE WALMART_TRAIN_HOLIDAY
AS
SELECT *
FROM
WALMART_TRAIN;
ALTER TABLE WALMART_TRAIN_HOLIDAY
ADD HOLIDAY VARCHAR2(25);
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Super Bowl'
WHERE WEEK IN (TO_DATE('12-Feb-10', 'DD-Mon-RR'), TO_DATE('11-Feb-11', 'DD-Mon-RR'),
               TO_DATE('10-Feb-12', 'DD-Mon-RR'), TO_DATE('08-Feb-13', 'DD-Mon-RR'));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Labor Day'
WHERE WEEK IN (TO_DATE('10-Sep-10', 'DD-Mon-RR'), TO_DATE('09-Sep-11', 'DD-Mon-RR'),
               TO_DATE('07-Sep-12', 'DD-Mon-RR'), TO_DATE('06-Sep-13', 'DD-Mon-RR'));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Thanksgiving'
WHERE WEEK IN (TO_DATE('26-Nov-10', 'DD-Mon-RR'), TO_DATE('25-Nov-11', 'DD-Mon-RR'),
               TO_DATE('23-Nov-12', 'DD-Mon-RR'), TO_DATE('29-Nov-13', 'DD-Mon-RR'));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Christmas'
WHERE WEEK IN (TO_DATE('31-Dec-10', 'DD-Mon-RR'), TO_DATE('30-Dec-11', 'DD-Mon-RR'),
               TO_DATE('28-Dec-12', 'DD-Mon-RR'), TO_DATE('27-Dec-13', 'DD-Mon-RR'));
-- Each range ends the day before the holiday week, so that the holiday
-- labels set by the earlier updates are not overwritten.
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Before Super Bowl'
WHERE (WEEK BETWEEN (TO_DATE('12-Feb-10', 'DD-Mon-RR') - 14) AND (TO_DATE('12-Feb-10', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('11-Feb-11', 'DD-Mon-RR') - 14) AND (TO_DATE('11-Feb-11', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('10-Feb-12', 'DD-Mon-RR') - 14) AND (TO_DATE('10-Feb-12', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('08-Feb-13', 'DD-Mon-RR') - 14) AND (TO_DATE('08-Feb-13', 'DD-Mon-RR') - 1));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Before Labor Day'
WHERE (WEEK BETWEEN (TO_DATE('10-Sep-10', 'DD-Mon-RR') - 14) AND (TO_DATE('10-Sep-10', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('09-Sep-11', 'DD-Mon-RR') - 14) AND (TO_DATE('09-Sep-11', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('07-Sep-12', 'DD-Mon-RR') - 14) AND (TO_DATE('07-Sep-12', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('06-Sep-13', 'DD-Mon-RR') - 14) AND (TO_DATE('06-Sep-13', 'DD-Mon-RR') - 1));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Before Thanksgiving'
WHERE (WEEK BETWEEN (TO_DATE('26-Nov-10', 'DD-Mon-RR') - 14) AND (TO_DATE('26-Nov-10', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('25-Nov-11', 'DD-Mon-RR') - 14) AND (TO_DATE('25-Nov-11', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('23-Nov-12', 'DD-Mon-RR') - 14) AND (TO_DATE('23-Nov-12', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('29-Nov-13', 'DD-Mon-RR') - 14) AND (TO_DATE('29-Nov-13', 'DD-Mon-RR') - 1));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY = 'Before Christmas'
WHERE (WEEK BETWEEN (TO_DATE('31-Dec-10', 'DD-Mon-RR') - 14) AND (TO_DATE('31-Dec-10', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('30-Dec-11', 'DD-Mon-RR') - 14) AND (TO_DATE('30-Dec-11', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('28-Dec-12', 'DD-Mon-RR') - 14) AND (TO_DATE('28-Dec-12', 'DD-Mon-RR') - 1))
   OR (WEEK BETWEEN (TO_DATE('27-Dec-13', 'DD-Mon-RR') - 14) AND (TO_DATE('27-Dec-13', 'DD-Mon-RR') - 1));
UPDATE WALMART_TRAIN_HOLIDAY
SET HOLIDAY ='Not Holiday'
WHERE HOLIDAY IS NULL;
ALTER TABLE WALMART_TRAIN_HOLIDAY
ADD STORE_SIZE_CLASS VARCHAR2(10);
UPDATE WALMART_TRAIN_HOLIDAY
SET
STORE_SIZE_CLASS = CASE
WHEN STORE_SIZE < 100000 THEN 'Small'
WHEN STORE_SIZE >= 100000 AND STORE_SIZE < 200000 THEN 'Medium'
WHEN STORE_SIZE >= 200000 THEN 'Large'
END;
ALTER TABLE WALMART_TRAIN_HOLIDAY
ADD UNEMPLOYMENT_CLASS VARCHAR2(10);
UPDATE WALMART_TRAIN_HOLIDAY
SET
UNEMPLOYMENT_CLASS = CASE
WHEN UNEMPLOYMENT < 7 THEN 'Low'
WHEN UNEMPLOYMENT >= 7 AND UNEMPLOYMENT < 11 THEN 'Medium'
WHEN UNEMPLOYMENT >= 11 THEN 'High'
END;
ALTER TABLE WALMART_TRAIN_HOLIDAY
ADD CPI_CLASS VARCHAR2(10);
UPDATE WALMART_TRAIN_HOLIDAY
SET
CPI_CLASS = CASE
WHEN CPI < 159 THEN 'Low'
WHEN CPI >= 159 AND CPI < 192 THEN 'Medium'
WHEN CPI >= 192 THEN 'High'
END;
UPDATE WALMART_TRAIN_HOLIDAY OH
SET DEPT_CLASS = 'Medium Sales'
WHERE DEPT IN (SELECT DEPT
               FROM (SELECT DEPT, MEDIAN(WEEKLY_SALES) MD
                     FROM WALMART_TRAIN_HOLIDAY
                     GROUP BY DEPT)
               WHERE MD >= 20000 AND MD < 40000);
UPDATE WALMART_TRAIN_HOLIDAY OH
SET DEPT_CLASS = 'High Sales'
WHERE DEPT IN (SELECT DEPT
               FROM (SELECT DEPT, MEDIAN(WEEKLY_SALES) MD
                     FROM WALMART_TRAIN_HOLIDAY
                     GROUP BY DEPT)
               WHERE MD >= 40000);