
1. Imagine you are given a plot of land to farm that is 10,000 acres. You know nothing about the land. You have 4 seasons a year in which you can plant and 20 different crops to test out to find the optimal planting time, crop and location. Each crop has a different price it sells for, the same cost to farm and a different output depending on the plot of land/time of year in which it is planted. You have the money to run up to 200 tests (one test is planting one acre with one crop in one season) per year for 10 years. How would you build an algorithm that would suggest the best 200 tests for each year/season period to maximize your profit [output x (price - cost)]? Please provide pseudocode for this exercise along with comments/explanation.

This problem can be solved using multi-armed bandits from reinforcement learning. Given N different arms or options to test (here, a crop planted in a given season and plot), each with an unknown reward, we need a strategy that explores to learn the value of each arm while exploiting our current knowledge to maximize profit.
We use an epsilon-greedy algorithm: a fraction (1 - epsilon) of the time we choose the arm with the largest estimated value (exploit), and the remaining fraction (epsilon) of the time we choose a random arm (explore). Epsilon has to be tuned to balance this exploration/exploitation tradeoff.

import random
import numpy as np

class Bandit:
    def __init__(self, keys, init=0.0, epsilon=0.1):
        # One arm per key, e.g. a (crop, season) combination. For each arm we
        # keep a pair (number of pulls, estimated value), initialised to init.
        self.epsilon = epsilon
        self.ActionValue = {}
        for key in keys:
            self.ActionValue[key] = (0, init)

    def get_reward(self, action):
        # The reward of one test is the observed profit of planting that acre:
        # output * (price - cost). In practice output, price and cost come from
        # running the field test; run_field_test is a placeholder for that step.
        output, price, cost = run_field_test(action)
        return output * (price - cost)

    def choose_action(self):
        """
        For 1-epsilon of the time, choose the action with the highest
        estimated value (exploit). For epsilon of the time, choose a
        random action (explore).
        """
        if np.random.rand() < self.epsilon:
            return random.choice(list(self.ActionValue.keys()))
        else:
            return max(self.ActionValue, key=lambda x: self.ActionValue[x][1])

    def update(self, action, reward):
        """
        Update the estimated value by keeping a running average of the
        rewards observed for each action.
        """
        K, Value = self.ActionValue[action]
        K += 1
        alpha = 1. / K
        Value += alpha * (reward - Value)
        self.ActionValue[action] = (K, Value)

A single experiment consists of pulling an arm 10,000 times; the outcome is then averaged over 200 independent experiments for each value of epsilon.

def experiment(bandit, Npulls):
    # One experiment: Npulls epsilon-greedy pulls, recording the reward
    # obtained at each step.
    history = []
    for i in range(Npulls):
        action = bandit.choose_action()
        R = bandit.get_reward(action)
        bandit.update(action, R)
        history.append(R)
    return np.array(history)

# Arms: every (crop, season) combination; with 20 crops and 4 seasons this
# gives 80 arms (the plot of land could be added as a third dimension).
keys = [(crop, season) for crop in range(20) for season in range(4)]

Nexp = 200
Npulls = 10000

avg_outcome_eps0p0 = np.zeros(Npulls)
avg_outcome_eps0p01 = np.zeros(Npulls)
avg_outcome_eps0p1 = np.zeros(Npulls)

for i in range(Nexp):
    bandit = Bandit(keys, epsilon=0.0)
    avg_outcome_eps0p0 += experiment(bandit, Npulls)
    bandit = Bandit(keys, epsilon=0.01)
    avg_outcome_eps0p01 += experiment(bandit, Npulls)
    bandit = Bandit(keys, epsilon=0.1)
    avg_outcome_eps0p1 += experiment(bandit, Npulls)

avg_outcome_eps0p0 /= float(Nexp)
avg_outcome_eps0p01 /= float(Nexp)
avg_outcome_eps0p1 /= float(Nexp)
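
To compare the three exploration rates, the averaged reward curves can be plotted against the pull number; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

plt.plot(avg_outcome_eps0p0, label="epsilon = 0")
plt.plot(avg_outcome_eps0p01, label="epsilon = 0.01")
plt.plot(avg_outcome_eps0p1, label="epsilon = 0.1")
plt.xlabel("Pull")
plt.ylabel("Average reward")
plt.legend()
plt.show()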

2. What is your favorite machine learning algorithm for predicting a user action in an app (assume the action choice space is discrete and you have access to all the app interaction data)? Why? What are the disadvantages of using this algorithm and how could they be overcome or lessened?

Predicting user action in an app can be modeled using Dynamic Bayesian Networks (DBNs), which extend standard Bayesian Networks with the concept of time.
A DBN is a generalization of hidden Markov models and Kalman filters that is more compact and easier to interpret. It is a graphical model in which the nodes represent the dimensions of the system and the arcs represent the conditional dependencies between them.
There are 4 phases for model induction in a DBN:
1) Identifying the domain variables, e.g. user actions and user locations
2) Identifying dependencies among the domain variables, e.g. if location-based data is available, there could be dependencies between the actions and the locations (a toy sketch of this structure follows the list)
3) Estimating the conditional probability distributions
4) Procedurally developing the belief update
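
As a toy illustration of phases 1 and 2, the two-slice structure of such a DBN can be written down as a parent map; the variable names below are assumptions rather than taken from a real app:

# Hypothetical domain variables for an app: the user's location and the
# action taken at each time step. Each time-t variable lists its parents.
dbn_structure = {
    "location_t": ["location_t-1"],              # locations change gradually
    "action_t":   ["action_t-1", "location_t"],  # actions depend on history and place
}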

One of the major disadvantages is that not all possible configurations of the variables can be observed, so the conditional probabilities of some variables cannot be easily computed.
To avoid zero probabilities, a small number called the flattening constant is added to each cell of the sparse conditional probability tables, as in the sketch below.
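
A minimal sketch of the smoothed estimation in phase 3; the action names and the value of the flattening constant are assumptions for illustration:

import numpy as np
from collections import Counter

def estimate_cpt(pairs, actions, alpha=0.5):
    """Estimate P(action_t | action_t-1) from observed (prev, curr) pairs,
    adding a small flattening constant alpha to every cell so that unseen
    configurations do not get zero probability."""
    counts = Counter(pairs)
    cpt = {}
    for prev in actions:
        row = np.array([counts[(prev, curr)] + alpha for curr in actions])
        cpt[prev] = dict(zip(actions, row / row.sum()))
    return cpt

# Example: three discrete actions observed in a short session log.
actions = ["open", "search", "purchase"]
log = [("open", "search"), ("search", "purchase"), ("open", "search")]
print(estimate_cpt(log, actions))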

3. How would you visualize feature weights for a high-dimensional, iterative, predictive algorithm? Can you provide an example (please cite your source if it is not your own visualization)? How would you explain the weight change over iterations to a marketing manager?

Neural networks are a great analytic tool for predictive modeling, for example finding which kinds of customers are more likely to respond to a marketing campaign. The NeuralNetTools package in R provides useful visualizations of the feature weights and of the hidden layers learned by the algorithm.
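
Staying in Python like the rest of this document, a minimal sketch of how the weight change over iterations could be recorded and plotted; the network size, toy data and importance score used here are assumptions for illustration:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier

# Toy data: 200 customers, 10 features, binary "responded to campaign" label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Train one iteration at a time (warm_start) so the input-layer weights can
# be recorded after every pass over the data.
net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1, warm_start=True,
                    random_state=0)
weight_history = []
for it in range(50):
    net.fit(X, y)
    # Mean absolute input->hidden weight per feature: a rough importance score.
    weight_history.append(np.abs(net.coefs_[0]).mean(axis=1))

weight_history = np.array(weight_history)
for j in range(X.shape[1]):
    plt.plot(weight_history[:, j], label="feature %d" % j)
plt.xlabel("Iteration")
plt.ylabel("Mean absolute weight")
plt.legend(fontsize="small")
plt.show()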

Breaking it down for a marketing manager:

Neural networks work like the brain. We get inputs, for example different food items; a hidden layer, which plays the role of the brain, processes them and gives an output, which is whether you liked the taste or not. The hidden layer assigns weights to the inputs, which act like a preference for some foods, and then calculates the output. Once an output is produced, the brain compares it to the result it should have got based on previous experience with similar food and adjusts the weights accordingly. This process keeps repeating until each food is classified correctly.

4. What decision rule would you use for stopping a Bayesian AB test? Why would you
use this rule and what are its advantages and disadvantages? Please provide
pseudocode for how you would implement the statistical calculations for this rule.

Source: http://doingbayesiandataanalysis.blogspot.com.au/2013/11/optional-stopping-in-data-collection-p.html

The rule that can be used for stopping a Bayesian A/B test, while guarding against the effects of peeking, is the precision method:
1) Given the minimum effect threshold t, we define a region of practical equivalence (ROPE) around zero difference as the interval [-t, t].
2) We compare the ROPE to the 95% Bayesian highest density interval (HDI) of the distribution of the difference between A and B.
3) We then set a precision target: the stop is triggered when the HDI is narrower than the precision multiplied by the width of the ROPE. (A sketch of the calculation follows the list.)
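
A minimal sketch of the statistical calculation, assuming Bernoulli conversion data with Beta(1, 1) priors on each variant; the threshold t, the precision and the example counts below are placeholders:

import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    s = np.sort(samples)
    n_keep = int(np.ceil(mass * len(s)))
    widths = s[n_keep - 1:] - s[:len(s) - n_keep + 1]
    i = np.argmin(widths)
    return s[i], s[i + n_keep - 1]

def should_stop(success_a, n_a, success_b, n_b, t=0.01, precision=0.8,
                n_samples=100_000, seed=0):
    """Stop when the 95% HDI of P(B) - P(A) is narrower than
    precision * width of the ROPE [-t, t]."""
    rng = np.random.default_rng(seed)
    post_a = rng.beta(1 + success_a, 1 + n_a - success_a, n_samples)
    post_b = rng.beta(1 + success_b, 1 + n_b - success_b, n_samples)
    lo, hi = hdi(post_b - post_a)
    return (hi - lo) < precision * (2 * t)

# Example: 4,800/50,000 conversions on A and 5,100/50,000 on B so far.
print(should_stop(4800, 50000, 5100, 50000))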

This rule has some distinct advantages over other commonly used rules such as the Bayes factor or the Bayesian HDI without a precision target: those rules give biased estimates of the parameter value and generally overestimate it when the null is not true, because they tend to stop at extreme values.
The disadvantages of the rule are that it requires a fairly large sample, and that the resulting curves can still be misleading at larger sample sizes even after data collection has stopped.
