Weakly Supervised Object Localization with ConvNet Feature and Multi-fold MIL Training


Object localization is the problem of placing a bounding box around an object in an image. Weakly supervised learning refers to training on incompletely annotated data, in this case images without object bounding box annotations. This study focuses on performing object localization in a weakly supervised fashion. The main contribution is the use of features extracted from a pre-trained convolutional neural network (ConvNet) together with multi-fold MIL training [1]. ConvNet features have been shown to be effective descriptors for various vision tasks [2]. Selective Search is used as the region proposal algorithm to generate training data and candidate object locations in the test phase [3].

[1] Cinbis, Ramazan Gokberk, Jakob Verbeek, and Cordelia Schmid. "Multi-fold mil training for weakly supervised object localization." Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.
[2] Razavian, Ali Sharif, et al. "CNN Features off-the-shelf: an Astounding Baseline for Recognition." Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. IEEE, 2014.
[3] Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.

object localization, weakly supervised, multiple instance learning, convolutional neural network

The objective is to improve the accuracy of weakly supervised object localization in images by combining ConvNet features with multi-fold MIL training.


In the first phase, a region proposal algorithm such as Selective Search generates candidate object windows from each input image.
ConvNet features are then extracted from each region.
These features form the training data for a detector trained with multi-fold weakly supervised (MIL) training.
The trained detector can then be used to predict object locations on the test set.
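
The multi-fold re-localization loop at the core of this pipeline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of [1]: the detector is replaced by a simple difference-of-means linear classifier instead of an SVM, each positive image is assumed to be given as a list of per-window feature vectors, the initial window is arbitrarily taken to be the first one in each image, and all function and parameter names (`multifold_mil`, `train_linear`, `n_folds`, ...) are hypothetical.

```python
import random

def mean_vec(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train_linear(pos, neg):
    """Stand-in detector (assumption): difference-of-means linear classifier."""
    mp, mn = mean_vec(pos), mean_vec(neg)
    w = [p - q for p, q in zip(mp, mn)]
    # Place the decision boundary halfway between the two class means.
    b = -sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, mp, mn))
    return w, b

def score(model, x):
    """Linear detector score of one window feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def multifold_mil(pos_bags, neg_windows, n_folds=2, n_iters=3, seed=0):
    """pos_bags[i] holds the window features of positive image i;
    neg_windows holds window features taken from negative images.
    Returns (final_model, selected), where selected[i] is the index of
    the window chosen as the object location in positive image i."""
    n = len(pos_bags)
    if not 2 <= n_folds <= n:
        raise ValueError("need 2 <= n_folds <= number of positive images")
    selected = [0] * n  # assumption: start from the first window of each image
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    for _ in range(n_iters):
        updated = list(selected)
        for held_out in folds:
            held = set(held_out)
            # Train only on images outside the held-out fold ...
            train_pos = [pos_bags[i][selected[i]] for i in range(n) if i not in held]
            model = train_linear(train_pos, neg_windows)
            # ... and re-localize the object in the held-out images.
            for i in held_out:
                scores = [score(model, x) for x in pos_bags[i]]
                updated[i] = scores.index(max(scores))
        selected = updated
    # Final detector trained on the selected windows of all positive images.
    final = train_linear([pos_bags[i][selected[i]] for i in range(n)], neg_windows)
    return final, selected
```

The point of the fold split is that a window is never re-selected by a detector that was trained on that same window, which is intended to reduce the tendency of standard MIL to lock onto its initial guess.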

The performance will then be compared with other state-of-the-art methods for weakly supervised object localization on the PASCAL VOC 2007 dataset, using average precision (AP) as the metric.
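
As a rough illustration of the evaluation, the sketch below computes the intersection-over-union (IoU) between two boxes and the 11-point interpolated AP used in the PASCAL VOC 2007 protocol. It is a simplified sketch, not the official development kit code: boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples in continuous coordinates, the matching of detections to ground-truth boxes (normally an IoU above 0.5, with each ground-truth box matched at most once) is assumed to have been done already, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision_11pt(scored_hits, n_gt):
    """scored_hits: list of (detector_score, is_true_positive) pairs.
    n_gt: total number of ground-truth objects of the class."""
    ranked = sorted(scored_hits, key=lambda t: -t[0])
    tp = 0
    precisions, recalls = [], []
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    ap = 0.0
    for k in range(11):  # recall thresholds 0.0, 0.1, ..., 1.0
        t = k / 10.0
        # Interpolated precision: best precision at recall >= t.
        best = max((p for p, r in zip(precisions, recalls) if r >= t),
                   default=0.0)
        ap += best / 11.0
    return ap
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7, since the boxes overlap in a unit square and cover a combined area of 7.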


I will conduct the research under the supervision of my master's thesis supervisor, Professor Josephine Sullivan, and her PhD student.

6.Computation plan (required processor core hours, data storage, software, etc)

A dual-core processor would suffice to run the experiments. More than 100 GB of storage is needed for the dataset. Ubuntu 12.04 is the preferred OS. An NVIDIA GPU is an important requirement. As the dataset is quite large, more than 8 GB of RAM is needed.

7.Source of funding
1 Master Thesis with possible extension to conference paper
9.Date of usage
06/02/2015 - 01/06/2015
10.Gpu usage
use gpu
11.Supporting files
12.Created at
13.Approval status