http://hamzehal.blogspot.com/2014/06/adaboost-sparse-input-support.html

Hamzeh Alsalhi - Software Projects


Now at the end of this GSoC I have contributed four pull requests that have been merged into the code base. There is
one planned pull request that has not been started and another pull request nearing its final stages. The list below
gives details of each pull request and what was done or needs to be done in the future.
This GSoC has been an excellent experience. I want to thank the members of the scikit-learn community, most of all
Vlad, Gael, Joel, Olivier, and my mentor Arnaud, for their guidance and input, which improved the quality of my projects
immeasurably.

Sparse Input for Ensemble Methods


PR #3161 - Sparse Input for AdaBoost
Status: Completed and Merged
Summary of the work done: The ensemble/weighted_boosting class was edited to avoid densifying the input data
and to simply pass along sparse data to the base classifiers to allow them to proceed with training and prediction on
sparse data. Tests were written to validate correctness of the AdaBoost classifier and AdaBoost regressor when using
sparse data by making sure training and prediction on sparse and dense formats of the data gave identical results, as
well as verifying the data remained in sparse format when the base classifier supported it. Go to the AdaBoost blog
post to see the results of sparse input with AdaBoost visualized.
PR - Sparse input Gradient Boosted Regression Trees (GBRT)
Status: To be started
Summary of the work to be done: Very similar to sparse input support for AdaBoost, the classifier will need
modification to support passing sparse data to its base classifiers, and similar tests will be written to ensure
correctness of the implementation. The usefulness of this functionality depends on sparse support for decision
trees, which is a mature pending pull request, PR #3173.
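The kind of check described for the AdaBoost pull request can be sketched as follows. This is a minimal illustration on synthetic data, not the test from the pull request: the dataset, the thresholding used to create zeros, and all variable names are assumptions made for the example. It trains one AdaBoost classifier on a dense array and one on the same data in CSR format, then compares the predictions.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic data (an assumption for this sketch); small values are zeroed
# so that storing the matrix in a sparse format makes sense.
X_dense, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_dense[np.abs(X_dense) < 1.0] = 0.0
X_sparse = sparse.csr_matrix(X_dense)

# The default base estimator is a shallow decision tree, which accepts sparse input.
clf_dense = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X_dense, y)
clf_sparse = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X_sparse, y)

pred_dense = clf_dense.predict(X_dense)
pred_sparse = clf_sparse.predict(X_sparse)

# Fraction of samples on which the two runs agree.
agreement = float(np.mean(pred_dense == pred_sparse))
print("agreement between dense and sparse runs:", agreement)
```

The two runs are expected to agree because only the storage format of the input differs, not its content.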

Sparse Output Support


PR #3203 - Sparse Label Binarizer
Status: Completed and Merged
Summary of the work done: The label binarizing function in scikit-learn's label code was modified to support
conversion from sparse formats, and the helper functions it uses from the utils module were modified to be able to
detect the representation type of the target data when it is in sparse format. Read about the workings of the label
binarizer.
PR #3276 - Sparse Output One vs. Rest
Status: Completed and Merged
Summary of the work done: The fit and predict functions for one vs. rest classifiers were modified to detect sparse target
data and handle it without densifying the entire matrix at once. Instead, the fit function iterates over densified columns
of the target data and fits an individual classifier for each column, and the predict function uses binarization on the results from
each classifier individually before combining the results into a sparse representation. A test was written to ensure that
classifier accuracy was within a suitable range when using sparse target data.
PR #3438 - Sparse Output Dummy Classifier
Status: Completed and Merged
Summary of the work done: The fit and predict functions were adjusted to accept sparse format target data. To
reproduce the behavior of prediction on dense target data, first a sparse class distribution function was written to
get the classes of each column in the sparse matrix, and second a random sampling function was created to provide a
sparse matrix of randomly drawn values from a user specified distribution. Read the blog post to see detailed results
of the sparse output dummy pull request.
PR #3350 - Sparse Output KNN Classifier
Status: Nearing Completion
Summary of the work done: In the predict function of the classifier the dense target data is indexed one column at a
time. The main improvement made here is to leave the target data in sparse format and only convert a column to a
dense array when it is necessary. This results in lower peak memory consumption; the improvement is proportional
to the sparsity and overall size of the target matrix.

Future Directions
It is my goal for the Fall semester to support the changes I have made to the scikit-learn code base the best I can. I
also hope to see myself finalize the remaining two pull requests.
The scikit-learn dummy classifier is a simple way to get naive predictions based only on the target data of your
dataset. It has four strategies of operation.
constant - always predict a value manually specified by the user
uniform - label each example with a label chosen uniformly at random from the target data given
stratified - label each example by sampling from the class distribution seen in the training data
most-frequent - always predict the mode of the target data
The dummy classifier has built-in support for multilabel-multioutput data. This week I made pull request #3438,
which introduces support for sparsely formatted output data. This is useful because memory consumption can be
vastly improved when the data is highly sparse. Below I benchmark these changes, with two memory consumption
results graphed for each of the four strategies: once with sparsely formatted target data and once with densely
formatted data as the control.
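As a quick illustration of the four strategies, the sketch below fits a DummyClassifier on a tiny hand-made target vector. The data and the `predictions` dictionary are assumptions made for the example. Note that scikit-learn spells the last strategy `most_frequent`, with an underscore.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))              # features are ignored by the dummy classifier
y = np.array([0, 1, 1, 1, 2, 1])  # class 1 is the mode of the target data

predictions = {}
for strategy, kwargs in [
    ("constant", {"constant": 2}),        # always predict the given value
    ("uniform", {"random_state": 0}),     # draw labels uniformly at random
    ("stratified", {"random_state": 0}),  # draw labels from the training distribution
    ("most_frequent", {}),                # always predict the mode
]:
    clf = DummyClassifier(strategy=strategy, **kwargs).fit(X, y)
    predictions[strategy] = clf.predict(X)
    print(strategy, predictions[strategy])
```

The constant and most_frequent strategies are deterministic, while uniform and stratified depend on the random state.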

Benchmark and Dataset


I used the Eurlex eurovoc dataset available here in libsvm format for use with the following script. The benchmark
script will let you recreate the results in this post easily. When run with the python module memory_profiler it
measures the total memory consumed when doing an initialization of a dummy classifier, along with a fit and predict
on the Eurlex data.
The dataset used has approximately 17,000 samples and 4,000 classes for the training target data, and the test data
is similar. In both, only about 0.001 of the entries are nonzero.

Results Visualized
Constant Results: Dense 1250 MiB, Sparse 300 MiB
The constants used in the fit have a level of sparsity similar to the data because they were chosen as an arbitrary row
from the target data.

Uniform Results: Dense 1350 MiB, Sparse 1200 MiB

Stratified Results: Dense 2300 MiB, Sparse 1350 MiB

Most-Frequent Results: Dense 1300 MiB, Sparse 300 MiB

Conclusions
We can see that in all cases except Uniform we get significant memory improvements by supporting sparse
matrices. The sparse matrix implementation for uniform is not useful because of the dense nature of the output even
when the input shows high levels of sparsity. It is possible this case will be revised to warn the user or even throw an
error.

Remaining Work
There is work to be done on this pull request to make the predict function faster in the stratified and uniform cases
when using sparse matrices. Although the uniform case is not important in itself, the underlying code for generating
sparse random matrices is also used in the stratified case. Any improvements to uniform will come for free if the stratified
case speed is improved.
Another upcoming focus is to return to the sparse output knn pull request and make some improvements. There will
be code written in the sparse output dummy pull request for gathering a class distribution from a sparse target matrix
that can be abstracted to a utility function and will be reusable in the knn pull request.
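One way such a class distribution could be gathered from a sparse target matrix is sketched below. The helper name `column_class_distribution` is hypothetical and this is not the code from the pull request; it only illustrates the idea of counting the stored values of each column and treating every remaining entry as an implicit zero.

```python
import numpy as np
from scipy import sparse

def column_class_distribution(Y):
    """Hypothetical helper: classes and class frequencies per column of a sparse matrix."""
    Y = sparse.csc_matrix(Y)
    n_samples = Y.shape[0]
    classes, priors = [], []
    for j in range(Y.shape[1]):
        # Values explicitly stored for column j.
        col = Y.data[Y.indptr[j]:Y.indptr[j + 1]]
        col = col[col != 0]                     # ignore explicitly stored zeros
        values, counts = np.unique(col, return_counts=True)
        n_zeros = n_samples - col.size          # zeros are implicit in a sparse matrix
        if n_zeros > 0:
            pos = np.searchsorted(values, 0)    # keep the class list sorted
            values = np.insert(values, pos, 0)
            counts = np.insert(counts, pos, n_zeros)
        classes.append(values)
        priors.append(counts / n_samples)
    return classes, priors

Y = sparse.csc_matrix(np.array([[0, 2], [1, 0], [0, 2], [1, 2]]))
classes, priors = column_class_distribution(Y)
for j, (c, p) in enumerate(zip(classes, priors)):
    print("column", j, "classes", c, "priors", p)
```

The dense column is never materialized; only the stored values and the column length are needed.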
Going forward with sparse support for various classifiers, I have been working on a pull request for sparse one vs. rest
classifiers that will allow for sparse target data formats. This will result in a significant improvement in memory usage
when working with large amounts of sparse target data; a benchmark is given below to measure this. Ultimately what
this means for users is that with the same amount of system memory it will be possible to train and predict with an OvR
classifier on a larger target data set. A big thank you to both Arnaud and Joel for the close inspection of my code so far
and the suggestions for improving it!

Implementation
The One vs. Rest classifier works by binarizing the target data and fitting an individual classifier for each class. The
implementation of sparse target data support improves memory usage because it uses a sparse binarizer to give a
binary target data matrix that is highly space efficient.
By avoiding a dense binarized matrix we can slice out the one column each classifier requires and densify only
when necessary. At no point will the entire dense matrix be present in memory. The benchmark that follows illustrates
this.
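The column-at-a-time idea can be sketched as follows. This is an illustration under stated assumptions, not the scikit-learn implementation: the random data, the loop, and the variable names are all made up for the example. Each binary classifier sees one densified column of the sparse target matrix, and the predictions are reassembled into a sparse matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(50, 8))                        # non-negative counts
Y = sparse.csc_matrix((rng.rand(50, 6) < 0.3).astype(int))  # sparse indicator targets

estimators = []
for j in range(Y.shape[1]):
    # Densify only the single column needed for this binary problem.
    y_col = Y[:, j].toarray().ravel()
    estimators.append(MultinomialNB(alpha=1).fit(X, y_col))

# Predict with each classifier and stack the columns back into a sparse matrix.
columns = [sparse.csc_matrix(est.predict(X).reshape(-1, 1)) for est in estimators]
Y_pred = sparse.hstack(columns, format="csc")
print(Y_pred.shape, "stored entries:", Y_pred.nnz)
```

CSC format is used here because it makes slicing a single column cheap.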

Benchmark
A significant part of the work on this pull request has involved devising benchmarks to validate intuition about the
improvements provided. Arnaud contributed the benchmark that is presented here to showcase the memory
improvements.
By using the module memory_profiler we can see how the fit and predict functions of the OvR classifier affect
memory consumption. In the following examples we initialize a classifier and fit it to the training dataset we provide in
one step, then we predict on a test dataset. We first run a control benchmark which shows the state of one vs. rest
classifiers as they are without this pull request. The second benchmark repeats the same steps but instead of using
dense target data it passes the target data to the fit function in a sparse format.
The dataset used is generated with scikit-learn's make_multilabel_classification, using the following
call:
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse=True, return_indicator=True,
                                      n_samples=20000, n_features=100,
                                      n_classes=4000, n_labels=4,
                                      random_state=0)

This results in a densely formatted target dataset in which only about 0.001 of the entries are nonzero.

Control Benchmark
est = OneVsRestClassifier(MultinomialNB(alpha=1)).fit(X, y)
consumes 179.824 MB

est.predict(X)
consumes -73.969 MB. The negative value indicates that data has been deleted from memory.

Sparse OvR PR Benchmark


est = OneVsRestClassifier(MultinomialNB(alpha=1)).fit(X, y)
consumes 27.426 MB

est.predict(X)
consumes 0.180 MB

Improvement
Considering the memory consumption for each case as 180 MB and 30 MB we see a 6x improvement in peak
memory consumption with the data set we benchmarked.

Upcoming Pull Request


The next focus in my summer of code after finalizing the sparse one vs. rest classifier will be to introduce sparse
target data support for the KNN and dummy classifiers, which have built-in support for multiclass target data. I have
begun the KNN pull request here. Implementing a row-wise mode calculation for sparse matrices will be the main
challenge of the KNN PR.
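A row-wise mode for a sparse matrix could look roughly like the sketch below. The helper `sparse_row_mode` is hypothetical and is not the code from the KNN pull request. The point it illustrates is that zeros are never stored, so their count has to be inferred from the row length before it can compete with the stored values.

```python
import numpy as np
from scipy import sparse

def sparse_row_mode(M):
    """Hypothetical helper: most frequent value in each row of a sparse matrix."""
    M = sparse.csr_matrix(M)
    n_cols = M.shape[1]
    modes = np.zeros(M.shape[0], dtype=M.dtype)
    for i in range(M.shape[0]):
        # Values explicitly stored for row i.
        row = M.data[M.indptr[i]:M.indptr[i + 1]]
        row = row[row != 0]
        if row.size == 0:
            continue                      # the row is all zeros, mode stays 0
        n_zeros = n_cols - row.size       # implicit zeros in this row
        values, counts = np.unique(row, return_counts=True)
        best = counts.argmax()
        # Zero wins ties; this tie-breaking rule is an arbitrary choice.
        if counts[best] > n_zeros:
            modes[i] = values[best]
    return modes

M = sparse.csr_matrix(np.array([[0, 3, 3, 3],
                                [0, 0, 0, 5],
                                [2, 2, 7, 0]]))
modes = sparse_row_mode(M)
print(modes)
```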
The first half of my summer of code has resulted in the implementation of just under half of my goals: one of the
planned six pull requests has been completely finalized, two are on their way to being finalized, and two of the
remaining three will be started shortly. The final pull request is a more independent feature which I aim to start as
soon as I am confident the others are close to being wrapped up.

Thank you Arnaud, Joel, Olivier, Noel, and Lars for the time taken to give the constructive criticism that has vastly
improved my implementation of these changes to the code base.

Sparse Input for Ensemble Methods


Gradient Boosted Regression Trees is the one remaining ensemble method that needs work before all of scikit-learn's
ensemble classifiers support sparse input. The first pull request I made this summer was for sparse input on the
AdaBoost ensemble method, and it has been merged. After AdaBoost sparse input support was
completed I skipped ahead to later goals, with the intention of coming back to pick up work on GBRT once a pending
pull request for sparse input decision trees is merged, which will make it easier to continue work on sparse input
for the ensemble method.
PR #3161 - Sparse Input for AdaBoost
Status: Completed and Merged
Summary of the work done: The ensemble/weighted_boosting class was edited to avoid densifying the input data
and to simply pass along sparse data to the base classifiers to allow them to proceed with training and prediction on
sparse data. Tests were written to validate correctness of the AdaBoost classifier and AdaBoost regressor when using
sparse data by making sure training and prediction on sparse and dense formats of the data gave identical results, as
well as verifying the data remained in sparse format when the base classifier supported it. Go to the AdaBoost blog
post to see the results of sparse input with AdaBoost visualized.
PR - Sparse input Gradient Boosted Regression Trees (GBRT)
Status: To be started
Summary of the work to be done: Very similar to sparse input support for AdaBoost, the classifier will need
modification to support passing sparse data to its base classifiers, and similar tests will be written to ensure
correctness of the implementation. The usefulness of this functionality depends heavily on sparse support for
decision trees, which is a mature pending pull request, PR #3173.

Sparse Output Support


The Sparse Label Binarizer pull request has gone through numerous revisions after being based off existing code
written in PR and it contains a large part of the work necessary to support sparse output for One vs. Rest
classification. With this support in place many of the binary classifiers in scikit-learn can be used in a one vs. all
fashion on sparse target data. Support for sparse target data in the multilabel metrics will be implemented to provide
users with metrics while avoiding the need to densify the target data. Finally, in an attempt to push support for sparse
target data past one vs. rest methods, I will work on sparse target data support for decision trees.
PR #3203 - Sparse Label Binarizer
Status: Nearing Completion
Summary of the work done: The label binarizing function in scikit-learn's label code was modified to support
conversion from sparse formats, and the helper functions it uses from the utils module were modified to be able to
detect the representation type of the target data when it is in sparse format. Read about the workings of the label
binarizer.
PR #3276 - Sparse Output One vs. Rest
Status: Work In Progress
Summary of the work done: The fit and predict functions for one vs. rest classifiers were modified to detect sparse target
data and handle it without densifying the entire matrix at once. Instead, the fit function iterates over densified columns
of the target data and fits an individual classifier for each column, and the predict function uses binarization on the results from
each classifier individually before combining the results into a sparse representation. A test was written to ensure that
classifier accuracy was within a suitable range when using sparse target data.

PR - Sparse Metrics
Status: To be started
Summary of the work to be done: Modify the metrics and some miscellaneous tools to support sparse target data so sparsity can
be maintained throughout the entire learning cycle. The tools to be modified include precision score, accuracy score,
parameter search, and other metrics listed in scikit-learn's model evaluation documentation under the classification
metrics header.
PR - Decision Tree and Random Forest Sparse Output
Status: To be started
Summary of the work to be done: Make revisions in the tree code to support sparsely formatted target data and update
the random forest ensemble method to use the new sparse target data support.

Plan for the Coming Weeks


In hopes that the sparse label binarizer will be merged soon after making final revisions, early next week I will begin to
respond to the reviews of the sparse One vs. Rest pull request and we will also see the beginnings of the sparse
metrics pull request which should be wrapped up and ready for reviews in the same week. Following that the next
focus will be rewinding to sparse input for ensemble methods and putting a week of work into sparse support for
GBRT. Finally the sparse output decision tree pull request will be started when the remaining goals are nearing
completion.
There are different ways to represent target data. This week I worked on a system that converts the different formats
to an easy-to-use matrix format. The pull request is here. My work on this system introduced support for optionally
representing this data matrix sparsely. The final result when the pull request is completed will be support for
sparse target data ready to be used by the upcoming sparse One vs. Rest classifiers.
The function of this data converter is to take multiclass or multilabel target data and represent it in a binary fashion so
classifiers that work on binary data can be used with no modification. For example, target data might come from the
user like this, with integer classes: 1 for Car, 2 for Airplane, 3 for Boat, 4 for Helicopter. We label each of the following 5 images
with the appropriate class.

This data in list format would look like this; we list each image's label one after the other:

y = [2,1,3,2,4]
The label binarizer will give a matrix where each column is an indicator for a class and each row is an
image/example.

    [0,1,0,0]
    [1,0,0,0]
Y = [0,0,1,0]
    [0,1,0,0]
    [0,0,0,1]
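To make the mapping from labels to indicator rows concrete, here is a small pure Python sketch. The function `binarize_labels` is a hypothetical stand-in written for this example; it is not the scikit-learn label binarizer, and it only handles the multiclass case with one label per example.

```python
def binarize_labels(y, classes):
    """Return one indicator row per example, with one column per class."""
    column_of = {label: j for j, label in enumerate(classes)}
    rows = []
    for label in y:
        row = [0] * len(classes)
        row[column_of[label]] = 1   # switch on the column of this example's class
        rows.append(row)
    return rows

labels = [2, 1, 3, 2, 4]            # 1=Car, 2=Airplane, 3=Boat, 4=Helicopter
indicator = binarize_labels(labels, classes=[1, 2, 3, 4])
for row in indicator:
    print(row)
```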

Before my pull request all conversions from the label binarizer would give the above matrix in dense format as it appears.
My pull request lets the user specify whether they would like the matrix to be returned in sparse format. If
so, the matrix will be a sparse matrix, which has the potential to save a lot of space and runtime depending on how
sparse the target data is.
These two calls to the label binarizer illustrate how sparse output can be enabled. The first call returns a dense matrix
and the second call returns a sparse matrix.
Input:
from sklearn.preprocessing import label_binarize
Y_bin = label_binarize(y, classes=[1,2,3,4])
print(type(Y_bin))
print(Y_bin)
Output:
<type 'numpy.ndarray'>
[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 1 0 0]
 [0 0 0 1]]

Input:
Y_bin = label_binarize(y,classes=[1,2,3,4],sparse_output=True)
print(type(Y_bin))
print(Y_bin)
Output:
<class 'scipy.sparse.csr.csr_matrix'>
  (0, 1)    1
  (1, 0)    1
  (2, 2)    1
  (3, 1)    1
  (4, 3)    1
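The sparse result stores only the positions of the ones. As a rough sketch of how such a matrix could be built directly from the labels without ever creating the dense version, the snippet below assembles a CSR matrix from its three underlying arrays. This is an illustration using SciPy, not the code inside the label binarizer, and the variable names are assumptions.

```python
import numpy as np
from scipy import sparse

labels = np.array([2, 1, 3, 2, 4])
classes = np.array([1, 2, 3, 4])             # must be sorted for searchsorted

col_index = np.searchsorted(classes, labels)  # column of each example's label
data = np.ones(len(labels), dtype=int)        # one stored value per example
indptr = np.arange(len(labels) + 1)           # exactly one entry in every row

Y_sparse = sparse.csr_matrix((data, col_index, indptr),
                             shape=(len(labels), len(classes)))
print(Y_sparse.toarray())
```

Only five values and their column positions are stored here, however many classes there are.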

The next pull request, for sparse One vs. Rest support, is what motivated this update, because we want to overcome
the extreme runtime and space requirements caused by datasets with large numbers of labels.
Thank you to the reviewers Arnaud, Joel, and Olivier for their comments this week and to Rohit for starting the code
on which I based my changes.
This week as part of my work on the scikit-learn code base I implemented sparse input support with AdaBoost. This
work is being done in pull request 3161. I will give a demonstration of the value of AdaBoost and how my
contributions improved the scikit-learn implementation of the classifier. In addition, with the goal of implementing
sparse output support in scikit-learn, I have been working on pull request 3203 for sparse label binarization,
building off of code written previously by Rohit Sivaprasad. Of course I had help, and I would like to thank Arnaud Joly,
Joel Nothman, and Olivier Grisel for reviewing my code to help finalize it and verify its correctness!

What is AdaBoost?
AdaBoost is a meta classifier. It operates by repeatedly training many base classifiers that are not very accurate and
pooling their results together to make a more accurate classifier. This is a common ensemble method known as
boosting. In addition, AdaBoost looks for examples that most base classifiers are having trouble getting right and
increases the focus on these examples in hopes of improving overall prediction accuracy.
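The reweighting idea can be sketched with a tiny from-scratch version of discrete AdaBoost on one-dimensional data. This is a teaching sketch under stated assumptions, not scikit-learn's implementation: the base classifier is a simple threshold rule, labels are -1 and +1, and the toy data and all function names are made up for the example.

```python
import math

def stump_predict(x, threshold, sign):
    """Threshold rule: predict `sign` at or above the threshold, `-sign` below."""
    return sign if x >= threshold else -sign

def fit_stump(xs, ys, weights):
    """Pick the threshold/sign pair with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, n_rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, threshold, sign))
        # Raise the weight of misclassified samples and lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, -1, 1, 1, 1, 1]    # x=3 and x=4 break the otherwise clean pattern
ensemble, weights = adaboost(xs, ys)
hardest = max(range(len(xs)), key=lambda i: weights[i])
print("final weights:", [round(w, 3) for w in weights])
print("hardest sample: x =", xs[hardest])
```

The two samples that no single threshold can get right together end up carrying most of the weight, which is the effect examined with the digit images.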
We can demonstrate AdaBoost homing in on hard samples by running a demonstration where we train AdaBoost to
recognize the integer value from an image of a handwritten digit. After running AdaBoost we can see
which examples it had the most trouble with by examining the sample weights. Images with high sample weight are
harder for the classifier to get right.
The idea behind this experiment is that samples with high final sample weights after AdaBoost has finished training
were misclassified more often. The reasons for misclassification could be
subtle or they could be very obvious. A possible reason is that the image looks little like the
digit it is supposed to represent, so it gets classified as another, incorrect digit. It may therefore be more likely that
images with high sample weights are malformed examples, since so many classifiers get them wrong.
To test this I trained AdaBoost on the digits dataset. I then retrieved the sample weights from the AdaBoost classifier,
sorted them, and took the four highest sample weights. These sample weights correspond to the four samples I have
put in the top row of the following image. I also found the four samples with the lowest sample weights and put them on the
bottom row of the image.

Top Row: High Sample Weight - Hard to Get Right


Bottom Row: Low Sample Weight

In line with our intuition there is a very sloppy and vague example in the top row. The third image would be very hard
to identify as a two; the AdaBoost training process identified this sample and gave it a high sample weight. What is
interesting is that 1) most of the other digits in the top row look easy to identify, 2) the digits in the top row are all twos
and threes, and 3) the bottom row is all eights.
The way that I interpret this is that in this data set the eights are the most consistently portrayed digit. They all
look the same in the bottom row. This is very important for accurate classification since the classifiers make their
predictions based only on data they have seen before.
In the top row alone, however, we see two very different looking twos and two very different looking threes.
Understandably this makes these two digits hard to label correctly if the variation seen here is represented
throughout the entire data set.

Sparse Input Results


My improvements to AdaBoost came as runtime improvements by modifying the classifier to accept sparse input data
when the base classifiers do as well. See "Sparse Support for scikit-learn GSoC 2014" to read more about sparsity in
scikit-learn. Here I demonstrate the elapsed time for training the classifier and using it to make predictions, and the
differences sparse vs. dense data create.
We benchmark the performance using the 20 Newsgroups data; the dataset used has 200 features and 1000
samples. The AdaBoost classifier is made up of 50 SVM classifiers. Find the source here. Running the demonstration
and using Python to time the results, we find that the training and prediction times both come down considerably when
using the sparse input data feature.

Low training and prediction time is important because it allows us to iterate on experiments more rapidly, and it is also
necessary for realtime applications of prediction such as facial recognition or handwriting recognition from a
live video stream.

Sparse Label Binarization


My work on sparse label binarization has been a small part of a bigger goal to get sparse output support for One vs.
Rest classifiers in scikit-learn. This functionality is used to take target data, such as the categories or labels an
example falls under, and standardize it by transformation to a representation that uses only an on or off indicator
for each category or label. Typically this data before transformation is what is called a sequence of sequences. This
format is hard to work with and reason about efficiently. Sparse output support is an important part of my proposal,
which I will expand on in coming blog posts when I am further along and able to demonstrate some examples and
performance results that use the changes I made to the label binarizer.
This summer I am going to improve sparsity support in the scikit-learn project, a Python machine learning library,
through Google Summer of Code.
Large data sets are becoming ubiquitous. They come in text, image, audio, and in rarer cases even video
format. Text datasets are the most widely used because of their versatility: corporations use text data to store consumer
information, and scientific research projects can use text data to represent their experimental results. To give an
example of how data is generated for data analysis projects, we walk through the steps used in analyzing text data
and converting it to a form useful as input to machine learning algorithms.
We look at an example based on the Wikipedia bag-of-words entry. In our example we begin with three documents, and
we process them to get a numerical representation of the data.
David likes to paint. Mary likes to sail.
David also likes to write.
Mary also likes to paint.
We take every unique word as we encounter it and insert it into a dictionary as an indexed entry.
Entry    Index
David    1
likes    2
to       3
paint    4
Mary     5
sail     6
also     7
write    8

We now look back over each original document and count the number of times an entry occurs in it. We represent
each document as a set of eight numbers, one for the count of each dictionary entry.
[1, 2, 2, 1, 1, 1, 0, 0]
[1, 1, 1, 0, 0, 0, 1, 1]
[0, 1, 1, 1, 1, 0, 1, 0]
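The dictionary building and counting steps can be reproduced with a few lines of plain Python. This is a minimal sketch: the tokenization rule (runs of letters, case preserved) is an assumption chosen to match the example, and the indices here start at 0 whereas the table above counts from 1.

```python
import re

documents = [
    "David likes to paint. Mary likes to sail.",
    "David also likes to write.",
    "Mary also likes to paint.",
]

# Build the dictionary in order of first appearance.
vocabulary = {}
tokenized = []
for doc in documents:
    words = re.findall(r"[A-Za-z]+", doc)
    tokenized.append(words)
    for word in words:
        vocabulary.setdefault(word, len(vocabulary))

# Count how often each dictionary entry occurs in each document.
vectors = []
for words in tokenized:
    counts = [0] * len(vocabulary)
    for word in words:
        counts[vocabulary[word]] += 1
    vectors.append(counts)

print(list(vocabulary))
for vector in vectors:
    print(vector)
```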

We now look at a second example designed to illustrate data sparsity. Our three documents this time will be full
Wikipedia articles. Instead of pasting the text of the articles here I will put a link in place.

Tardigrade
Cinco de Mayo
Southeast Asia

Using the same approach as above we build a dictionary and find it has 2820 words.
Entry       Index
A           1
A.D         2
API         3
ASEAN       4
Abagatan    5
Abackay     6
...         ...
Zone        2820

Each document will now have significantly more zeros in the vector representation since there are many words in the
dictionary that a document might not use. As we keep adding more and more documents we will have a larger
dictionary, but the number of nonzero entries in our vectors will remain the same for each document since it still only
uses the words it contains.
[877 non-zero entries, 1943 zeros]
[706 non-zero entries, 2114 zeros]
[1790 non-zero entries, 1030 zeros]

This means the parts of our numeric data that encode the information in each document have become a smaller part
of the numerical representation. As data grows with more documents this trend continues until the number of empty
data or zeros would overwhelm the non-zero data that represents the words present.
In practical contexts where these documents correspond to text-heavy encyclopedia entries or news articles we also
have some metadata about the document, such as the topics or categories it falls under. One document could have the
labels biology, nature, and wilderness. Another document could fall under history, Britain, and nature.
With machine learning, the process of a computer improving with respect to a test as it sees more data, we would
want to teach software to recognize documents and be able to accurately label them. The matrix we generated above
to describe the documents would be called the input data. The output data, some of which we would provide initially
to give examples of correct answers, would be the labels or categories each document falls under.
A classifier is the piece of code we train to place data into categories. One such classifier is a decision tree, which
gives its answers to new instances by using the most influential signs it can, as identified from previous examples it has
seen. In machine learning a common way to get the best performance from training classifiers is to train many of
them and then combine them as a population to make one final decision; this is referred to as ensemble learning. The
first method for doing this is boosting, which builds a strong classifier by combining weak classifiers that
do only somewhat better than random guessing. Bagging is another ensemble method, where classifiers are trained independently
on random pieces of the data and then combined, resulting in increased accuracy.

Data Density
Consider a set of approximately 2.4 million documents where each document can be labeled with a set of labels from
a collection of 325,000 labels. This is exactly the data set behind the Large Scale Hierarchical Text
Classification (LSHTC) competition held on Kaggle, a host for machine learning competitions.
With data sets this large we accumulate a massive dictionary because of the vast vocabulary encountered across all
the different groups of the 2.4 million documents. Consequently each document uses only a small subset of the entries
in the dictionary, and the document's term counts for most of the words in the dictionary are zero. This means our data will
consist mostly of zeros. Sparsity is the fraction of zeros with respect to all elements.
The LSHTC training data set consists of 3.8 trillion elements, only 100 million of which are nonzero. Assuming 4 bytes
per element represented as an integer, a computer would need 15.31 TB to load this data set into memory. Since this
is not at all realistic, it is useful to represent the data in a sparse format by excluding all the zeros. This format
records only the nonzero elements and their locations. Since the data is more than 99.9% zeros, this shrinks the
memory required way down to 700 MB.
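The memory estimate can be checked with some back-of-the-envelope arithmetic. The sketch below uses the rounded counts from the text and assumes one 4-byte column index is stored alongside every nonzero value; that storage layout is an assumption made for illustration. Because the inputs are rounded, the results land near the figures quoted above rather than exactly on them.

```python
n_elements = 3.8e12      # total entries in the dense matrix (rounded figure from the text)
n_nonzero = 100e6        # nonzero entries (rounded figure from the text)
bytes_per_value = 4      # 4-byte integer, as assumed in the text
bytes_per_index = 4      # assumption: one 4-byte column index per stored value

dense_bytes = n_elements * bytes_per_value
sparse_bytes = n_nonzero * (bytes_per_value + bytes_per_index)
sparsity = 1 - n_nonzero / n_elements    # fraction of zeros among all elements

print("dense:    %.1f TB" % (dense_bytes / 1e12))
print("sparse:   %.0f MB" % (sparse_bytes / 1e6))
print("sparsity: %.5f" % sparsity)
```

Whatever the exact layout, the sparse footprint scales with the number of nonzero entries instead of the full matrix size, which is where the several orders of magnitude come from.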

Improvements With Sparsity


In the following example we see the runtime benefits of sparse representation by training a KNN classifier on the
same data once in sparse format and once in dense format.
We run an experiment on a subset of the 20 Newsgroups sparse data set; the source code is available here. Counting
the zeros in the input data and dividing by the total number of elements shows that the data consists of about 99%
zeros, with 39.9 million zeros out of 40 million elements. In the two following benchmarks we time the training
step of the classifier in addition to the classification of three new examples. The two code listings below differ only in
the first two lines of the dense version, X_train = X_train.toarray() and X_test = X_test.toarray(), which
convert the data to a matrix that explicitly includes all the zeros. The lines that follow are identical in both
cases: we initialize the classifier, begin a timer, train the classifier, predict on three examples (index selects
three examples), and finally stop the timer and return the result. Notice that the densification of the data is not
part of the timed section, so we are really only comparing the classifier's training and prediction time.

Dense Format

# Densify data, originally sparse
X_train = X_train.toarray()
X_test = X_test.toarray()

# Initialize K-nn
neigh = KNeighborsClassifier(n_neighbors=3)

# Begin Timing
start = time.clock()

# Train on data
neigh.fit(X_train, Y_train)

# Predict
neigh.predict(X_test[index])

return (time.clock() - start)

Result: 0.84 Seconds

Sparse Format

# Initialize K-nn
neigh = KNeighborsClassifier(n_neighbors=3)

# Begin Timing
start = time.clock()

# Train on data
neigh.fit(X_train, Y_train)

# Predict
neigh.predict(X_test[index])

return (time.clock() - start)

Result: 0.01 Seconds

The sparse format shows a speedup of close to two orders of magnitude on the same data. To see how sparse and
dense performance vary with data size we rerun the above experiment with different sizes of the data trained on and
plot the progression of performance.
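For readers who want to try a comparison like this themselves, here is a self-contained sketch. It uses randomly generated sparse data rather than 20 Newsgroups, so the timings it prints will not match the numbers above; the data sizes, density, and the helper `time_fit_predict` are all assumptions made for the example. It uses time.perf_counter because time.clock was removed in Python 3.8.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

# Random sparse data with 1% nonzero entries (an assumption for this sketch).
X_sparse = sparse.random(500, 2000, density=0.01, format="csr", random_state=0)
y = np.random.RandomState(0).randint(0, 3, size=500)

def time_fit_predict(X_train, y_train, X_test):
    """Time training plus prediction, excluding any format conversion."""
    neigh = KNeighborsClassifier(n_neighbors=3, algorithm="brute")
    start = time.perf_counter()
    neigh.fit(X_train, y_train)
    predictions = neigh.predict(X_test)
    return time.perf_counter() - start, predictions

X_dense = X_sparse.toarray()      # densify outside the timed region
t_sparse, pred_sparse = time_fit_predict(X_sparse, y, X_sparse[:3])
t_dense, pred_dense = time_fit_predict(X_dense, y, X_dense[:3])
print("sparse: %.4f s, dense: %.4f s" % (t_sparse, t_dense))
```

How large the gap is depends on the data size, the density, and the library version, so it is worth measuring on your own data.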

Sparsity also occurs in images, video, and audio through appropriate data representation: images can be encoded in
wavelets, and all formats can be encoded sparsely with key feature extraction.

GSoC 2014 - Key Points


My GSoC project will have two main goals.
1) Improve input sparsity support with ensemble methods and the underlying decision tree classifier.
2) Improve multioutput sparse support.
Sparse input support for decision trees is progressing well thanks to @fareshedayati (see pull request 2984).
Within the ensemble methods, sparse support for bagging has been completed by @glouppe (see pull request 2375)
and tested by @msalahi (see pull request 3076). Note that the documentation still needs to be updated. However, much
remains to be done for the boosting meta-estimators AdaBoost and gradient tree boosting.
For sparse multi-output, especially sparse multi-label, there is no support yet. This will be the next challenge during
my GSoC. Finally, I will improve sample weight support, which is important in the ensemble context.
