1. Simultaneous Detection and Segmentation

We aim to detect all instances of a category in an image and, for each instance, mark the pixels that
belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical
bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic
segmentation, we require individual object instances. We build on recent work that uses
convolutional neural networks to classify category-independent region proposals (R-CNN [16]),
introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-
ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over
our baselines on SDS, a 5 point boost (10% relative) over state-of-the-art on semantic segmentation,
and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that
unpack performance and provide directions for future work.
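
As an illustration only (not the authors' implementation), a minimal sketch of the kind of region scoring the abstract describes, under the assumption that each proposal is scored from two pathways: features of the plain box crop, as in R-CNN, and features of the same crop with background pixels masked out so the classifier sees the proposed figure-ground shape. All network and helper names are hypothetical:

    import numpy as np

    def masked_crop(image, box, mask, fill):
        """Crop `box` from `image`, blanking pixels outside the proposed region."""
        x0, y0, x1, y1 = box
        crop = image[y0:y1, x0:x1].copy()
        crop[~mask[y0:y1, x0:x1]] = fill  # hide background so only the region shape remains
        return crop

    def score_proposal(image, box, mask, box_net, region_net, classifier):
        # Pathway 1: features from the ordinary box crop (standard R-CNN input).
        f_box = box_net(image[box[1]:box[3], box[0]:box[2]])
        # Pathway 2: features from the background-masked crop, so the score can
        # exploit the bottom-up figure-ground proposal, not just the box.
        f_region = region_net(masked_crop(image, box, mask, fill=image.mean()))
        # Joint category scores from the concatenated representation.
        return classifier(np.concatenate([f_box, f_region]))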
2. One Millisecond Face Alignment with an Ensemble of Regression Trees

This paper addresses the problem of Face Alignment for a single image. We show how an ensemble
of regression trees can be used to estimate the face’s landmark positions directly from a sparse
subset of pixel intensities, achieving super-realtime performance with high quality predictions. We
present a general framework based on gradient boosting for learning an ensemble of regression
trees that optimizes the sum of squared error loss and naturally handles missing or partially labelled
data. We show how using appropriate priors exploiting the structure of image data helps with
efficient feature selection. Different regularization strategies and their importance in combating
overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on
the accuracy of the predictions and explore the effect of data augmentation using synthesized data.
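
For concreteness, a minimal sketch of gradient boosting with a squared-error loss, the training scheme this abstract builds on: each regression tree is fit to the current residuals and added with a shrinkage factor. Real face alignment regresses landmark coordinates from sparse pixel-intensity features in a cascade; here X and y are generic arrays and scikit-learn trees stand in for the paper's own trees:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_boosted_trees(X, y, n_trees=500, depth=4, lr=0.1):
        """y has shape (n_samples, n_targets), e.g. flattened landmark shapes."""
        init = y.mean(axis=0)                     # start from the mean shape
        pred = np.tile(init, (len(y), 1))
        trees = []
        for _ in range(n_trees):
            residual = y - pred                   # negative gradient of the squared error
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
            pred += lr * tree.predict(X)          # shrunken additive update
            trees.append(tree)
        return init, trees

    def predict(X, init, trees, lr=0.1):
        pred = np.tile(init, (len(X), 1))
        for tree in trees:
            pred += lr * tree.predict(X)
        return pred

Because each tree only needs to evaluate a handful of intensity comparisons at test time, an ensemble like this can be evaluated in well under a millisecond, which is what makes the super-realtime claim plausible.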
3. Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to
ease the training of networks that are substantially deeper than those used previously. We explicitly
reformulate the layers as learning residual functions with reference to the layer inputs, instead of
learning unreferenced functions. We provide comprehensive empirical evidence showing that these
residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper
than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves
3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification
task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is
of central importance for many visual recognition tasks. Solely due to our extremely deep
representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep
residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we
also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection,
and COCO segmentation.
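
The residual reformulation itself is compact; below is a minimal sketch of one block in PyTorch (an assumed framework, chosen for brevity): the stacked layers learn a residual F(x), and an identity shortcut adds the input back, so the block outputs F(x) + x rather than an unreferenced mapping:

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
            return self.relu(residual + x)  # identity shortcut: y = F(x) + x

If the residual is easier to fit than the full mapping (for instance, when the identity is already close to optimal), extra depth costs little to optimize, which is the intuition behind training networks of 100+ layers.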
4. How transferable are features in deep neural networks?

Many deep neural networks trained on natural images exhibit a curious phenomenon in common:
on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features
appear not to be specific to a particular dataset or task, but general in that they are applicable to
many datasets and tasks. Features must eventually transition from general to specific by the last
layer of the network, but this transition has not been studied extensively. In this paper we
experimentally quantify the generality versus specificity of neurons in each layer of a deep
convolutional neural network and report a few surprising results. Transferability is negatively
affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at
the expense of performance on the target task, which was expected, and (2) optimization difficulties
related to splitting networks between co-adapted neurons, which was not expected. In an example
network trained on ImageNet, we demonstrate that either of these two issues may dominate,
depending on whether features are transferred from the bottom, middle, or top of the network. We
also document that the transferability of features decreases as the distance between the base task
and target task increases, but that transferring features even from distant tasks can be better than
using random features. A final surprising result is that initializing a network with transferred features
from almost any number of layers can produce a boost to generalization that lingers even after fine-
tuning to the target dataset.
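
The transfer protocol studied here is easy to state in code. A minimal PyTorch sketch (the framework and the toy layers are illustrative, not the paper's AlexNet setup) of copying the first n layers from a base-task network and optionally freezing them before training on the target task:

    import torch.nn as nn

    def transfer_first_n(base_layers, target_layers, n, freeze=True):
        """Copy weights of the first n layers; freeze them to forbid fine-tuning."""
        for i in range(n):
            target_layers[i].load_state_dict(base_layers[i].state_dict())
            if freeze:
                for p in target_layers[i].parameters():
                    p.requires_grad = False   # transferred features stay fixed
        return target_layers

    # Illustrative usage with toy layers standing in for conv stages:
    base = nn.ModuleList([nn.Linear(8, 8) for _ in range(5)])
    target = nn.ModuleList([nn.Linear(8, 8) for _ in range(5)])
    transfer_first_n(base, target, n=3, freeze=True)   # transfer-and-freeze variant

Comparing the frozen variant against the fine-tuned one is what separates the two issues the abstract identifies: freezing exposes both specialization and co-adaptation costs, while fine-tuning recovers from them to different degrees.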
5. A Review on Multi-Label Learning Algorithms

Multi-label learning studies the problem where each example is represented by a single
instance while being associated with a set of labels simultaneously. During the past decade,
a significant amount of progress has been made toward this emerging machine learning
paradigm. This paper aims to provide a timely review of this area, with an emphasis on
state-of-the-art multi-label learning algorithms. Firstly, the fundamentals of multi-label
learning, including its formal definition and evaluation metrics, are given. Secondly and
primarily, eight representative multi-label learning algorithms are scrutinized under common
notations with relevant analyses and discussions. Thirdly, several related learning settings
are briefly summarized. In conclusion, online resources and open research problems on
multi-label learning are outlined for reference purposes.
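
As a concrete anchor for the problem setting, a minimal sketch of Binary Relevance, the simplest decomposition strategy such reviews cover: the multi-label task is reduced to one independent binary classifier per label (scikit-learn is an assumed dependency here; any base binary classifier would do):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class BinaryRelevance:
        def fit(self, X, Y):        # Y is a 0/1 matrix of shape (n_samples, n_labels)
            self.models = [LogisticRegression(max_iter=1000).fit(X, Y[:, j])
                           for j in range(Y.shape[1])]
            return self

        def predict(self, X):       # one predicted column per label
            return np.column_stack([m.predict(X) for m in self.models])

Binary Relevance ignores correlations between labels, which is precisely the limitation that the more sophisticated algorithms surveyed in such a review aim to address.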
