A smart JPEG camera for home security

Posted in raspberry-pi

First Prize IoT Builders Contest 2016 (IBM Watson IoT)


Introduction

This instructable covers the basic steps you need to follow to get started with open-source tools such as the Watson nodes (Visual Recognition V3, Text To Speech) for IBM Bluemix, Node-RED, OpenCV, and MQTT v3.1. MQTT (Message Queuing Telemetry Transport) is a machine-to-machine (M2M) / Internet of Things (IoT) connectivity protocol that was designed to be extremely lightweight, which makes it useful when battery power consumption and network bandwidth are at a premium. It was invented in 1999 by Dr. Andy Stanford-Clark and Arlen Nipper and is now an OASIS standard.

I have already published an instructable on the Smart Gas Valve For Safety. In this project, the Smart JPEG Camera and the Smart Gas Valve communicate machine-to-machine over MQTT. Specifically, this instructable covers how to program Node-RED on a Raspberry Pi2 as an MQTT client connected to your home wireless network, and how to send sensor data.

Step 1: Table of Contents

  • Step 0: Introduction
  • Step 1: Table of Contents
  • Step 2: Bill of Materials
  • Step 3: Setting up the Camera & PIR Sensor with Raspberry Pi
  • Step 4: Programming NodeRED on Raspberry Pi2
  • Step 5: Setting up MQTT v3.1 on Raspberry Pi2
  • Step 6: Checking your NodeRED codes with MQTT on Raspberry Pi2
  • Step 7: Programming Python JPEG Camera
  • Step 8: Adding IBM Watson, IBM NoSQL DB, Play-Audio, and Twilio
  • Step 9: Adding autostart files for every boot
  • Step 10: Testing M2M Communication
  • Step 11: (Optional) Using OpenCV
  • Step 12: Download list
  • Step 13: List of references

Step 2: Bill of Materials

  • Wifi dongle X 1ea
  • PIR motion sensor X 1ea
  • Android smartphone’s portable battery X 2ea
  • Node-RED software X 1ea
    • Free open source
    • Use the version pre-installed in Raspbian Jessie image since November 2015
    • Installation guide
  • MQTT v3.1 software X 1ea
    • Free open source
    • Installation guide included in Step 5
  • NodeRED’s IBM Watson Nodes for Bluemix
    • Text to speech node X 1ea
    • Visual Recognition X 1ea
  • Speaker X 1ea
  • Minion X 1ea
    • You can easily buy it from eBay.

Step 3: Setting up the Camera & PIR Sensor with Raspberry Pi


Assembly steps for Smart JPEG Camera

(1) Connect the Raspberry Pi2 with the PIR motion sensor as shown above in the circuit diagram.

(2) Wire the PIR motion sensor to the Raspberry Pi2:

  • Raspberry Pi2 → PIR motion sensor
    • 5V → VCC
    • GND → GND
    • GPIO 18 → OUT

(3) Carefully assemble the Pi camera with the Raspberry Pi2.

(4) Connect a portable battery to the Raspberry Pi2. (Use any portable battery that fits the same size connector cable on the Raspberry Pi2.)
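Before moving on, you can sanity-check the PIR wiring with a few lines of Python. This is a minimal sketch, assuming the sensor's OUT pin is on GPIO 18 (BCM numbering) as wired above:

import time
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(18, GPIO.IN)  # PIR OUT pin, as wired above

try:
    while True:
        print('PIR output: %d' % GPIO.input(18))  # prints 1 while motion is detected
        time.sleep(1)
finally:
    GPIO.cleanup()  # release the pin on exit

Wave a hand in front of the sensor and the printed value should change to 1.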

Assembly steps for Smart Gas Valve : here

Step 4: Programming NodeRED on Raspberry Pi2


How to start Node-RED in a web browser.

(1) Enter the command shown below in a terminal window:

node-red-start

(2) The terminal then prints a line such as 'Once Node-RED has started, point a browser at http://169.254.170.40:1880'; this tells you the editor's address (it depends on your network).

(3) Open your web browser.

(4) Copy the address and paste it into the web browser.

(5) The Node-RED visual editor will be displayed in the browser.

(6) You can start coding in the visual editor.

(7) Try dragging & dropping any node from the left-hand palette onto the workspace. It's really easy to code. (You can conveniently use the visual editor offline as well as online.)

Importing the flow

Download the ‘SmartGasValve_NodeRED.txt’ file, then:

(1) Click the menu button at the top right-hand corner of the Node-RED editor (marked (1) in the screenshot).

(2) Click the Import button in the drop-down menu.

(3) Open the Clipboard option shown in the above 1st picture.

(4) Lastly, paste the given JSON format text of ‘SmartJPGCameraNoCredits_NodeRED_ver0.1.txt‘ into the Import nodes editor.

Step 5: Setting up MQTT v3.1 on Raspberry Pi2


The Mosquitto message broker supports MQTT v3.1. It is easily installed on the Raspberry Pi, but somewhat less easy to configure. In this step we install and configure the Mosquitto broker, then test the MQTT "mosquitto" tools in the terminal window.

curl -O http://repo.mosquitto.org/debian/mosquitto-repo.gpg.key
sudo apt-key add mosquitto-repo.gpg.key
rm mosquitto-repo.gpg.key
cd /etc/apt/sources.list.d/
sudo curl -O http://repo.mosquitto.org/debian/mosquitto-jessie.list
sudo apt-get update

Next install the broker and command line clients:

  • mosquitto – the MQTT broker (or in other words, a server)
  • mosquitto-clients – command line clients, very useful in debugging
  • python-mosquitto – the Python language bindings
sudo apt-get install mosquitto mosquitto-clients python-mosquitto

As is the case with most packages from Debian, the broker is immediately started. Since we have to configure it first, stop it.

sudo /etc/init.d/mosquitto stop

Now that the MQTT broker is installed on the Pi we will add some basic security.
Create a config file:

cd /etc/mosquitto/conf.d/

sudo nano mosquitto.conf

Let’s stop anonymous clients connecting to our broker by adding a few lines to your config file. To control client access to the broker we also need to define valid client names and passwords. Add the lines:

allow_anonymous false

password_file /etc/mosquitto/conf.d/passwd

require_certificate false

Save and exit your editor (nano in this case).
From the current /conf.d directory, create an empty password file:

sudo touch passwd

We will use the mosquitto_passwd tool to create a password hash for user pi:

sudo mosquitto_passwd -c /etc/mosquitto/conf.d/passwd pi

You will be asked to enter your password twice. Enter the password you wish to use for the user you defined.

Testing Mosquitto on Raspberry Pi

Now that Mosquitto is installed we can perform a local test to see if it is working:
Open three terminal windows. In one, make sure the Mosquitto broker is running:

mosquitto

In the next terminal, run the command line subscriber:

mosquitto_sub -v -t 'topic/test'

You should see the first terminal window echo that a new client is connected.
In the next terminal, run the command line publisher:

mosquitto_pub -t 'topic/test' -m 'helloWorld'

You should see another message in the first terminal window saying another client is connected. You should also see this message in the subscriber terminal:

topic/test helloWorld

We have shown that Mosquitto is configured correctly and we can both publish and subscribe to a topic.
When you have finished testing, start the broker again:

sudo /etc/init.d/mosquitto start
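You can run the same publish test from Python using the Eclipse Paho client (installed with 'pip install paho-mqtt'). This is just a minimal sketch, assuming the broker runs on this Pi with the 'pi' user created above; replace 'your-password' with the password you set with mosquitto_passwd:

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.username_pw_set('pi', 'your-password')  # credentials created with mosquitto_passwd
client.connect('localhost', 1883, 60)          # broker host, port, keepalive
client.publish('topic/test', 'helloFromPython')
client.disconnect()

If the mosquitto_sub window from the test above is still open, it should print 'topic/test helloFromPython'.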

Step 6: Checking your NodeRED codes with MQTT on Raspberry Pi2


If you have already imported the JSON flow of the ‘SmartGasValve_NodeRED.txt’ into Node-RED, everything is set up automatically; I have already configured the data in each node.

(1) Click each node.

(2) Check that the information inside each node has been prefilled.

(3) Please don’t change the set data.

(The above can be customized for more advanced users.)

Step 7: Programming Python JPEG Camera


First of all, you should test the camera module in the terminal window.

raspistill -o test.jpg

You should see the test.jpg in ‘/home/pi’

cd /home/pi
mkdir pythonPir
cd pythonPir
sudo nano pircameraNodeRED.py

Type in the code below, or copy the enclosed ‘pircameraNodeRED.py’ file into the ‘/home/pi/pythonPir’ folder.

import time
import datetime

import picamera
import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)
GPIO.setup(17, GPIO.IN)  # M2M trigger signal from the Smart Gas Valve
GPIO.setup(18, GPIO.IN)  # PIR motion sensor output
camera = picamera.PiCamera()

while True:
    input17 = GPIO.input(17)  # gas valve signal on GPIO 17
    input18 = GPIO.input(18)  # PIR output on GPIO 18
    now = datetime.datetime.now()
    timeFormat = now.strftime("%Y%m%d_%H%M_%S")  # date and time for the image file name

    if input17 or input18:  # either trigger fires the camera
        print('Motion_Detected_%s' % timeFormat)
        camera.capture('image_%s.jpg' % timeFormat)  # take a picture

    time.sleep(1)  # poll once per second instead of busy-waiting

When you finish typing, you should press the keys ‘Control‘ + ‘x‘ and press ‘y‘ to save this file.
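You can test the script right away before adding it to autostart:

python /home/pi/pythonPir/pircameraNodeRED.py

Trigger the PIR sensor and check that image_*.jpg files appear in the folder you ran it from.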

Making an image file server

cd /home/pi
mkdir camserver
cd camserver
sudo nano requirements.txt

Type in the lines below, or copy the enclosed ‘requirements.txt’ file into the ‘/home/pi/camserver’ folder.

numpy==1.10.1
websocket-client==0.35.0
websocket-server==0.4
ibmiotf==0.2.3

Then install the listed packages:

pip install --user -r requirements.txt

Start the image file server from /home/pi as shown below.

cd /home/pi
python -m SimpleHTTPServer 7000
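(If your default python is Python 3, the module was renamed; the equivalent command is python3 -m http.server 7000.)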

Step 8: Adding IBM Watson, IBM NoSQL DB, Play-Audio, and Twilio


Searching the Nodes

Node-RED comes with a core set of useful nodes, but there are a growing number of additional nodes available for installing from both the Node-RED project as well as the wider community. You can search for available nodes in the Node-RED library or on the npm repository.

  • For example, we are going to search for Twilio on the npm website.
  • Then we are going to install the Twilio node on the Raspberry Pi.

Installing npm packaged node

To add additional nodes you must first install the npm tool, as it is not included in the default installation. The following commands install npm and then upgrade it to the latest 2.x version.

sudo apt-get update
sudo apt-get install npm
sudo npm install -g npm@2.x
hash -r
cd /home/pi/.node-red
  • For example, ‘npm install node-red-{example node name}’
  • Copy ‘npm install node-red-node-twilio’ from the npm page and paste it into a terminal window.
  • Ex: node-red-node-watson, node-red-contrib-play-audio, node-red-dashboard, and node-red-node-pidcontrol.
npm install node-red-node-twilio
  • You will need to restart Node-RED for it to pick up the new nodes.
node-red-stop

node-red-start
  • Close your web browser and reopen the web browser.

Step 9: Adding autostart files for every boot

How to make autostart files at every boot.

  • Mosquitto
cd /etc/xdg/autostart/
sudo nano flyMosquitto.desktop

Type in the entry below (note that a .desktop file honors only a single Exec line, so the two commands are combined into one), or copy the enclosed ‘flyMosquitto.desktop’ file into the autostart folder.

[Desktop Entry]
Type=Application
Name=flyMosquitto
Comment=Fly my mosquitto
Exec=sh -c "cd /etc/mosquitto/conf.d/ && mosquitto"
  • Node-RED
sudo systemctl enable nodered.service
  • Python JPEG Camera
cd /etc/xdg/autostart/
sudo nano pircameraNodeRED.desktop

Type the description below or put the ‘pircameraNodeRED.desktop’ file into /etc/xdg/autostart/ folder.

[Desktop Entry]
Type=Application
Name=pircameraNodeRED.py
Comment=Start my security camera
NoDisplay=false
Exec=python /home/pi/pythonPir/pircameraNodeRED.py
NotShowIn=GNOME;KDE;XFCE;
Name[en_US]=pircamera.py
  • Image file Server
cd /etc/xdg/autostart/
sudo nano imageFileServer.desktop

Type in the description below (again combining the two commands into a single Exec line), or put the ‘imageFileServer.desktop’ file into the /etc/xdg/autostart/ folder.

[Desktop Entry]
Type=Application
Name=imageFileServer
Comment=Start an image file server
NoDisplay=false
Exec=sh -c "cd /home/pi && python -m SimpleHTTPServer 7000"

Step 10: Testing M2M Communication

Importing the enclosed files into each device's Node-RED.

(1) Using a smart JPEG camera

Import the ‘M2M_SmartJPGCamera.txt‘ into the NodeRED of the smart JPEG camera.

(2) Using a smart gas valve

Import the ‘M2M_SmartGasValve.txt‘ into the NodeRED of the smart gas valve.

(3) Check the IP address of the smart gas valve's Raspberry Pi2.

Type 'ifconfig' in a terminal window as shown below.

ifconfig

When you see the IP address, copy it from the terminal window.

(4) Put the IP address into the MQTT node on the other Raspberry Pi2.

  1. Click the MQTT node.
  2. Put the IP address into Server.
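If the second Raspberry Pi2 is not handy yet, you can simulate the gas valve's message from any machine using the Paho client. This is only a sketch: 'gasvalve/alert' and the address below are placeholders, not the flow's real settings; use the topic configured in your imported MQTT node and the IP address you found with ifconfig.

import paho.mqtt.publish as publish

# 'gasvalve/alert' is a hypothetical topic name -- check your MQTT node's settings
publish.single('gasvalve/alert', 'GasLeak_Detected',
               hostname='192.168.0.xxx',  # placeholder: the other Pi's IP from ifconfig
               auth={'username': 'pi', 'password': 'your-password'})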

Step 11: (Optional) Using OpenCV


Installing & Using OpenCV on Raspberry Pi2

We have already used IBM Watson Visual Recognition. Watson Visual Recognition works very well, but it can't be used without a network connection. OpenCV works without an internet connection, but it is not easy for a beginner to install and program. So, in this step, we're going to install OpenCV.

  • Download ‘opencv-3.1.0.zip’ from opencv.org
  • Install dependencies
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev python-dev python-numpy libjpeg-dev libpng-dev libtiff-dev libjasper-dev
  • (Optional) Install OpenCV 2
sudo apt-get install python-opencv
  • Install OpenCV 3
unzip ~/Downloads/opencv-3.1.0.zip
cd opencv-3.1.0/
mkdir build
cd build/
cmake -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=NO -DBUILD_PERF_TESTS=NO ..
make -j3
sudo make install
sudo ldconfig
  • Check which version of OpenCV you have in Python
python
import cv2
cv2.__version__
  • Run the simple face-detect sample, and look at its code to see how it works:
  • Before running it, connect a USB camera to the Raspberry Pi2
cd /home/pi/opencv-3.1.0/samples/python
python ./facedetect.py
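If the bundled sample is hard to follow, here is a minimal face-detection sketch of my own. It assumes the source was unzipped at /home/pi/opencv-3.1.0, which is where the Haar cascade file below lives; adjust the paths if yours differ.

import cv2

# stock frontal-face cascade from the unpacked source tree (path is an assumption)
CASCADE = '/home/pi/opencv-3.1.0/data/haarcascades/haarcascade_frontalface_default.xml'

face_cascade = cv2.CascadeClassifier(CASCADE)
cap = cv2.VideoCapture(0)  # first USB camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # box each face
    cv2.imshow('faces', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()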

[ CUDA ]

Motivation
Modern GPU accelerators have become powerful and featured enough to perform general-purpose computations (GPGPU). It is a fast-growing area that generates a lot of interest from scientists, researchers, and engineers who develop computationally intensive applications. Despite the difficulties of reimplementing algorithms on the GPU, many people are doing it to see how fast they can be. To support such efforts, many advanced languages and tools have become available, such as CUDA, OpenCL, C++ AMP, debuggers, profilers, and so on.

A significant part of computer vision is image processing, the area that graphics accelerators were originally designed for. Other parts also involve massively parallel computations and often map naturally to GPU architectures. So it is challenging but very rewarding to exploit these advantages and accelerate OpenCV on graphics processors.

History
OpenCV includes a GPU module that contains all of the GPU-accelerated functionality. Supported by NVIDIA, work on the module started in 2010, prior to its first release in spring 2011. It includes accelerated code for a significant part of the library, keeps growing, and is being adapted for new computing technologies and GPU architectures.

Goals
1. Provide developers with a convenient computer vision framework on the GPU, maintaining conceptual consistency with the current CPU functionality.
2. Achieve the best performance with GPUs (efficient kernels tuned for modern architectures, optimized dataflow such as asynchronous execution, copy overlaps, and zero-copy).
3. Completeness (implement as much as possible, even if the speed-up is not dramatic; this allows running an algorithm entirely on the GPU and saving on copying overheads).

 

Performance

[Figure: GPU vs. CPU performance comparison]

Design considerations
The OpenCV GPU module is written using CUDA and therefore benefits from the CUDA ecosystem: a large community, conferences, publications, and many tools and libraries such as NVIDIA NPP, CUFFT, and Thrust.
The GPU module is designed as a host API extension. This design gives the user explicit control over how data is moved between CPU and GPU memory. Although the user has to write some additional code to start using the GPU, this approach is both flexible and allows for more efficient computations.
The GPU module includes the class cv::gpu::GpuMat, the primary container for data kept in GPU memory. Its interface is very similar to cv::Mat, its CPU counterpart. All GPU functions take GpuMat as input and output arguments, which makes it possible to chain several GPU algorithms without downloading data. The GPU module API is also kept as similar as possible to the CPU interface, so developers who are familiar with OpenCV on the CPU can start using the GPU straight away.

Short sample
In the sample below, an image is loaded from a PNG file, uploaded to the GPU, thresholded, downloaded, and displayed.

 

#include <iostream>
#include "opencv2/opencv.hpp"
#include "opencv2/gpu/gpu.hpp"

int main (int argc, char* argv[])
{
    try
    {
        cv::Mat src_host = cv::imread("file.png", CV_LOAD_IMAGE_GRAYSCALE);
        cv::gpu::GpuMat dst, src;
        src.upload(src_host);

        cv::gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);

        cv::Mat result_host;
        dst.download(result_host);
        cv::imshow("Result", result_host);
        cv::waitKey();
    }
    catch(const cv::Exception& ex)
    {
        std::cout << "Error: " << ex.what() << std::endl;
    }
    return 0;
}
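To compile a sample like this one, link against the OpenCV libraries; a minimal sketch, assuming pkg-config can find your OpenCV build (the file name is hypothetical):

g++ gpu_threshold.cpp -o gpu_threshold `pkg-config --cflags --libs opencv`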

 

References

 

CUDA Video Lecture (NVIDIA Developer)

[ Octave bindings for OpenCV ]
Here are Octave bindings for OpenCV, a collection of over 500 functions implementing computer vision, image processing, machine learning, and general-purpose numeric algorithms. The library can optionally use Intel’s IPP library for better performance (via MMX, SSE, SSE2, etc).


Installing
This is a port of the Python bindings for OpenCV, and as such lives in the OpenCV source tree. When configured properly, the tree will produce an Octave binary package.

You must have a recent version of SWIG in your PATH (for example from SVN, so that it supports Octave) and configure the tree with --with-octave and --with-swig. The build will produce an Octave package at $prefix/share/opencv/opencv-1.0.tar.gz. For example (assuming the loader sees /usr/local/lib),

$ tar xzf octave-opencv-031808.tar.gz
$ cd opencv-031808
$ ./configure --with-swig --with-octave
$ make
$ make install
$ octave -q
octave:1> pkg install /usr/local/share/opencv/octave/opencv-1.0.tar.gz
octave:2> pkg list
Package Name  | Version | Installation directory
--------------+---------+-----------------------
      opencv *|   1.0.0 | /home/x/octave/opencv-1.0.0
octave:3> I=cvLoadImage("image002.png");

 


Using the library
There is extensive documentation on the library available here, and online help is available using the help command in Octave. The online help gives only the function prototypes, but when used in conjunction with the OpenCV manual, it should be sufficient to use the library.

Interoperability with Octave matrices and images is provided via the cv2mat, mat2cv, cv2im, and im2cv functions. There is online help available for these.

 


Releases
3/27/08: These bindings have been integrated into OpenCV. Latest sources are now available only from OpenCV CVS.

3/17/08: first release; alpha quality. Samples translated but not all running.
octave-opencv-031808.tar.gz (patched opencv, based on recent cvs snapshot)
octave-opencv-031808.patch.gz (just the patch)

octave-opencv was written by Xavier Delacour. Please send feedback, bugs, and/or patches to xavier dot delacour at gmail dot com.

[ Configuring Qt for OpenCV on OSX ]

In this tutorial we will learn how to configure Qt to use OpenCV. Although the tutorial is targeted for OSX users, you can modify my suggestions for use in Linux and Windows. I assume you have a working knowledge of Qt and you have at least built a “hello world” application using it.

Of the few different ways of configuring Qt for OpenCV, we will use the one that involves pkg-config.

Install OpenCV

For this tutorial I am assuming you have installed OpenCV 2 or OpenCV 3. If you have not, you can install them using Homebrew. The basic installation commands are shown below.

 

Install OpenCV 2.4.x on OSX using Homebrew

brew tap homebrew/science
brew install opencv

 

Install OpenCV 3 on OSX using Homebrew

brew tap homebrew/science
brew install opencv3

You can find detailed instructions for installing OpenCV using Homebrew by clicking here. Using OpenCV 3 in a Qt application is a bit tricky for OSX because opencv and opencv3 packages contain the same libraries and so opencv3 is not deployed to /usr/local/lib like other packages. I will help you navigate through these complexities.

Install Qt Creator

You can download Qt Creator and follow the onscreen instructions for installing it. There are some restrictions on using Qt in a commercial application and you should make sure you know about licensing issues. You may find this discussion helpful.

Build settings for Qt / OpenCV project

The very first thing to do is instruct Qt where to find pkg-config. The default location for pkg-config is /usr/local/bin.

which pkg-config
# returns /usr/local/bin/pkg-config

We need to add /usr/local/bin to PATH. Go to Project, expand Build Environment and add /usr/local/bin to PATH. Don’t forget to add a colon (:) before appending /usr/local/bin! See the screenshot below.

[Screenshot: Qt build settings for OpenCV]

If you are using OpenCV 3 you also need to add a new variable called PKG_CONFIG_PATH and set it to the directory that contains opencv.pc for your OpenCV 3 installation. You can find it using the following command

find /usr/local -name "opencv.pc"
# /usr/local/Cellar/opencv3/3.1.0_3/lib/pkgconfig/opencv.pc

You may not need the above step for OpenCV 2.4.x.

How to modify Qt project file ( .pro ) for OpenCV

Now that the paths are correctly set up, we need to add a few lines to our project file (.pro) to tell qmake to use pkg-config for OpenCV.

# The following lines tell qmake to use pkg-config for OpenCV
QT_CONFIG -= no-pkg-config
CONFIG += link_pkgconfig
PKGCONFIG += opencv

If you are using OpenCV 3, the compiler may complain

ld: library not found for -lippicv

 

How to fix compiler error “ld: library not found for -lippicv”

The ippicv library the compiler is complaining about is actually included in your OpenCV installation, but it is not in the right path. Let’s first locate the library

find /usr/local/Cellar -name "libippicv*"
# /usr/local/Cellar/opencv3/3.1.0_3/share/OpenCV/3rdparty/lib/libippicv.a

You can tell the compiler where to look for libippicv.a by adding the following line to the .pro file

LIBS += -L/usr/local/Cellar/opencv3/3.1.0_3/share/OpenCV/3rdparty/lib/

Alternatively, you can symlink libippicv.a to /usr/local/lib using the following command on the terminal

ln -s /usr/local/Cellar/opencv3/3.1.0_3/share/OpenCV/3rdparty/lib/libippicv.a /usr/local/lib/

and then add the following to the .pro file

LIBS += -L/usr/local/lib/
# See cautionary note below

 

Caution : When both opencv and opencv3 packages are installed

When both opencv and opencv3 packages are installed, and you are using opencv3 in your Qt application, adding /usr/local/lib to your path ( as shown in the previous section ) can lead to linking errors. The best thing to do in such a case is unlink opencv.

brew unlink opencv

I have spent an embarrassing amount of time debugging the above problem on El Capitan. It does not show up on Yosemite. That's all you need to know to build a Qt application with OpenCV, but let me throw in another goodie.

 

How to convert OpenCV Mat to QImage

The following piece of code converts OpenCV Mat to QImage. This will come in handy when you build your first Qt based OpenCV application.

// Create an OpenCV image.
Mat image(320, 240, CV_8UC3, Scalar(0,0,0));

// Convert it to QImage.
QImage qImage = QImage((const unsigned char*)(image.data), image.cols, image.rows, QImage::Format_RGB888).rgbSwapped();

 


[ TUTORIAL ON DEEP LEARNING FOR VISION BY CVPR 2014 ]

A tutorial held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014.
Monday June 23, 2014
Grand Ballroom 2
Columbus, Ohio

Schedule

Morning Session: foundations

12.30-1.30 Lunch

Afternoon Session: advanced topics

 
Organizers
Graham Taylor, University of Guelph
Marc’Aurelio Ranzato, Facebook AI Research
Honglak Lee, University of Michigan

[ Awesome Deep Vision ]

A curated list of deep learning resources for computer vision, inspired by awesome-php and awesome-computer-vision.

Maintainers – Jiwon Kim, Heesoo Myeong, Myungsub Choi, Jung Kwon Lee, Taeksoo Kim

We are looking for a maintainer! Let me know (jiwon@alum.mit.edu) if interested.

Contributing

Please feel free to send pull requests to add papers.

Join the chat at https://gitter.im/kjw0612/awesome-deep-vision

Table of Contents

Papers

ImageNet Classification

(Image from: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.)

  • Microsoft (Deep Residual Learning) [Paper][Slide]
    • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385.
  • Microsoft (PReLu/Weight Initialization) [Paper]
    • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, arXiv:1502.01852.
  • Batch Normalization [Paper]
    • Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167.
  • GoogLeNet [Paper]
    • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR, 2015.
  • VGG-Net [Web] [Paper]
  • Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Visual Recognition, ICLR, 2015.
  • AlexNet [Paper]
    • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.

Object Detection

(Image from: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.)

  • OverFeat, NYU [Paper]
  • OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR, 2014.
  • R-CNN, UC Berkeley [Paper-CVPR14] [Paper-arXiv14]
  • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
  • SPP, Microsoft Research [Paper]
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014.
  • Fast R-CNN, Microsoft Research [Paper]
  • Ross Girshick, Fast R-CNN, arXiv:1504.08083.
  • Faster R-CNN, Microsoft Research [Paper]
  • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
  • R-CNN minus R, Oxford [Paper]
  • Karel Lenc, Andrea Vedaldi, R-CNN minus R, arXiv:1506.06981.
  • End-to-end people detection in crowded scenes [Paper]
  • Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.
  • You Only Look Once: Unified, Real-Time Object Detection [Paper]
  • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640
  • Inside-Outside Net [Paper]
  • Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick, Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
  • Deep Residual Network (Current State-of-the-Art) [Paper]
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition
  • Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning [Paper]

Video Classification

  • Nicolas Ballas, Li Yao, Pal Chris, Aaron Courville, “Delving Deeper into Convolutional Networks for Learning Video Representations”, ICLR 2016. [Paper]
  • Michael Mathieu, camille couprie, Yann Lecun, “Deep Multi Scale Video Prediction Beyond Mean Square Error”, ICLR 2016. [Paper]

Object Tracking

  • Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han, Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network, arXiv:1502.06796. [Paper]
  • Hanxi Li, Yi Li and Fatih Porikli, DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking, BMVC, 2014. [Paper]
  • N Wang, DY Yeung, Learning a Deep Compact Image Representation for Visual Tracking, NIPS, 2013. [Paper]
  • Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang, Hierarchical Convolutional Features for Visual Tracking, ICCV 2015 [GitHub]
  • Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu, Visual Tracking with fully Convolutional Networks, ICCV 2015 [GitHub] [Paper]
    • Hyeonseob Nam and Bohyung Han, Learning Multi-Domain Convolutional Neural Networks for Visual Tracking, [Paper] [Code] [Project Page]

Low-Level Vision

Super-Resolution

  • Super-Resolution (SRCNN) [Web] [Paper-ECCV14] [Paper-arXiv15]
    • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.
    • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.
  • Very Deep Super-Resolution
  • Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015. [Paper]
  • Deeply-Recursive Convolutional Network
  • Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015. [Paper]
  • Cascade-Sparse-Coding-Network
  • Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015. [Paper] [Code]
  • Perceptual Losses for Super-Resolution
  • Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016. [Paper] [Supplementary]
  • Others
    • Osendorfer, Christian, Hubert Soyer, and Patrick van der Smagt, Image Super-Resolution with Fast Approximate Convolutional Sparse Coding, ICONIP, 2014. [Paper ICONIP-2014]

Other Applications

  • Optical Flow (FlowNet) [Paper]
  • Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.
  • Compression Artifacts Reduction [Paper-arXiv15]
    • Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.
  • Blur Removal
  • Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444 [Paper]
  • Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015 [Paper]
  • Image Deconvolution [Web] [Paper]
  • Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.
  • Deep Edge-Aware Filter [Paper]
  • Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.
  • Computing the Stereo Matching Cost with a Convolutional Neural Network [Paper]
  • Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.

Edge Detection

(Image from: Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.)

  • Holistically-Nested Edge Detection [Paper] [Code]
  • Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.
  • DeepEdge [Paper]
  • Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.
  • DeepContour [Paper]
  • Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

Semantic Segmentation

(Image from: Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640.)

  • PASCAL VOC2012 Challenge Leaderboard (02 Dec. 2015) (image from the PASCAL VOC2012 leaderboards)
  • Adelaide
  • Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. [Paper] (1st ranked in VOC2012)
  • Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. [Paper] (4th ranked in VOC2012)
  • Deep Parsing Network (DPN)
  • Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 [Paper] (2nd ranked in VOC 2012)
  • CentraleSuperBoundaries, INRIA [Paper]
  • Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1411.07386 (4th ranked in VOC 2012)
  • BoxSup [Paper]
  • Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)
  • POSTECH
  • Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. [Paper] (7th ranked in VOC2012)
  • Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924. [Paper]
  • Conditional Random Fields as Recurrent Neural Networks [Paper]
  • Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)
  • DeepLab
  • Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. [Paper] (9th ranked in VOC2012)
  • Zoom-out [Paper]
  • Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015
  • Joint Calibration [Paper]
  • Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.
  • Fully Convolutional Networks for Semantic Segmentation [Paper-CVPR15] [Paper-arXiv15]
  • Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.
  • Hypercolumn [Paper]
  • Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.
  • Deep Hierarchical Parsing
  • Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.[Paper]
  • Learning Hierarchical Features for Scene Labeling [Paper-ICML12] [Paper-PAMI13]
  • Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.
  • Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.
  • University of Cambridge [Web]
  • Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015. [Paper]
  • Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015. [Paper]
  • POSTECH
    • Seunghoon Hong,Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928 [Paper] [Project Page]
  • Princeton
  • Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016, [Paper]
  • Univ. of Washington, Allen AI
  • Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015, [Paper]
  • INRIA
  • Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016, [Paper]
  • UCSB
  • Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015, [Paper]

Visual Attention and Saliency

(Image from: Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu, Predicting Eye Fixations using Convolutional Neural Networks, CVPR, 2015.)

  • Mr-CNN [Paper]
  • Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu, Predicting Eye Fixations using Convolutional Neural Networks, CVPR, 2015.
  • Learning a Sequential Search for Landmarks [Paper]
  • Saurabh Singh, Derek Hoiem, David Forsyth, Learning a Sequential Search for Landmarks, CVPR, 2015.
  • Multiple Object Recognition with Visual Attention [Paper]
  • Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, Multiple Object Recognition with Visual Attention, ICLR, 2015.
  • Recurrent Models of Visual Attention [Paper]
  • Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, Recurrent Models of Visual Attention, NIPS, 2014.

Object Recognition

  • Weakly-supervised learning with convolutional neural networks [Paper]
  • Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic, Is object localization for free? – Weakly-supervised learning with convolutional neural networks, CVPR, 2015.
  • FV-CNN [Paper]
  • Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, Deep Filter Banks for Texture Recognition and Segmentation, CVPR, 2015.

Understanding CNN

(Image from: Aravindh Mahendran, Andrea Vedaldi, Understanding Deep Image Representations by Inverting Them, CVPR, 2015.)

  • Karel Lenc, Andrea Vedaldi, Understanding image representations by measuring their equivariance and equivalence, CVPR, 2015. [Paper]
  • Anh Nguyen, Jason Yosinski, Jeff Clune, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, CVPR, 2015. [Paper]
  • Aravindh Mahendran, Andrea Vedaldi, Understanding Deep Image Representations by Inverting Them, CVPR, 2015.[Paper]
  • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, Object Detectors Emerge in Deep Scene CNNs, ICLR, 2015. [arXiv Paper]
  • Alexey Dosovitskiy, Thomas Brox, Inverting Visual Representations with Convolutional Networks, arXiv, 2015. [Paper]
  • Matthrew Zeiler, Rob Fergus, Visualizing and Understanding Convolutional Networks, ECCV, 2014. [Paper]

Image and Language

Image Captioning

(Image from: Andrej Karpathy, Li Fei-Fei, Deep Visual-Semantic Alignments for Generating Image Description, CVPR, 2015.)

  • UCLA / Baidu [Paper]
    • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, Explain Images with Multimodal Recurrent Neural Networks, arXiv:1410.1090.
  • Toronto [Paper]
    • Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, arXiv:1411.2539.
  • Berkeley [Paper]
    • Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, arXiv:1411.4389.
  • Google [Paper]
    • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555.
  • Stanford [Web] [Paper]
    • Andrej Karpathy, Li Fei-Fei, Deep Visual-Semantic Alignments for Generating Image Description, CVPR, 2015.
  • UML / UT [Paper]
    • Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL-HLT, 2015.
  • CMU / Microsoft [Paper-arXiv] [Paper-CVPR]
    • Xinlei Chen, C. Lawrence Zitnick, Learning a Recurrent Visual Representation for Image Caption Generation, arXiv:1411.5654.
    • Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015
  • Microsoft [Paper]
    • Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, From Captions to Visual Concepts and Back, CVPR, 2015.
  • Univ. Montreal / Univ. Toronto [Web] [Paper]
    • Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention, arXiv:1502.03044 / ICML 2015
  • Idiap / EPFL / Facebook [Paper]
    • Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, Phrase-based Image Captioning, arXiv:1502.03671 / ICML 2015
  • UCLA / Baidu [Paper]
    • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images, arXiv:1504.06692
  • MS + Berkeley
    • Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, Exploring Nearest Neighbor Approaches for Image Captioning, arXiv:1505.04467 [Paper]
    • Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, Language Models for Image Captioning: The Quirks and What Works, arXiv:1505.01809 [Paper]
  • Adelaide [Paper]
    • Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, Image Captioning with an Intermediate Attributes Layer, arXiv:1506.01144
  • Tilburg [Paper]
    • Grzegorz Chrupala, Akos Kadar, Afra Alishahi, Learning language through pictures, arXiv:1506.03694
  • Univ. Montreal [Paper]
    • Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053
  • Cornell [Paper]
    • Jack Hessel, Nicolas Savva, Michael J. Wilber, Image Representations and New Domains in Neural Image Captioning, arXiv:1508.02091
  • MS + City Univ. of HongKong [Paper]
    • Ting Yao, Tao Mei, and Chong-Wah Ngo, “Learning Query and Image Similarities with Ranking Canonical Correlation Analysis”, ICCV, 2015

Video Captioning

  • Berkeley [Web] [Paper]
    • Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.
  • UT / UML / Berkeley [Paper]
    • Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.
  • Microsoft [Paper]
    • Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.
  • UT / Berkeley / UML [Paper]
    • Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.
  • Univ. Montreal / Univ. Sherbrooke [Paper]
    • Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029
  • MPI / Berkeley [Paper]
    • Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698
  • Univ. Toronto / MIT [Paper]
    • Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724
  • Univ. Montreal [Paper]
    • Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

Question Answering

(Image from: Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, CVPR, 2015 SUNw: Scene Understanding workshop)

  • Virginia Tech / MSR [Web] [Paper]
    • Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, CVPR, 2015 SUNw:Scene Understanding workshop.
  • MPI / Berkeley [Web] [Paper]
    • Mateusz Malinowski, Marcus Rohrbach, Mario Fritz, Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, arXiv:1505.01121.
  • Toronto [Paper] [Dataset]
    • Mengye Ren, Ryan Kiros, Richard Zemel, Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 / ICML 2015 deep learning workshop.
  • Baidu / UCLA [Paper] [Dataset]
    • Hauyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu, Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, arXiv:1505.05612.
  • POSTECH [Paper] [Project Page]
    • Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han, Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, arXiv:1511.05765
  • CMU / Microsoft Research [Paper]
    • Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015). Stacked Attention Networks for Image Question Answering. arXiv:1511.02274.
  • MetaMind [Paper]
    • Xiong, Caiming, Stephen Merity, and Richard Socher. “Dynamic Memory Networks for Visual and Textual Question Answering.” arXiv:1603.01417 (2016).

Other Topics

  • Visual Analogy [Paper]
    • Scott Reed, Yi Zhang, Yuting Zhang, Honglak Lee, Deep Visual Analogy Making, NIPS, 2015
  • Surface Normal Estimation [Paper]
  • Xiaolong Wang, David F. Fouhey, Abhinav Gupta, Designing Deep Networks for Surface Normal Estimation, CVPR, 2015.
  • Action Detection [Paper]
  • Georgia Gkioxari, Jitendra Malik, Finding Action Tubes, CVPR, 2015.
  • Crowd Counting [Paper]
  • Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang, Cross-scene Crowd Counting via Deep Convolutional Neural Networks, CVPR, 2015.
  • 3D Shape Retrieval [Paper]
  • Fang Wang, Le Kang, Yi Li, Sketch-based 3D Shape Retrieval using Convolutional Neural Networks, CVPR, 2015.
  • Generate image [Paper]
  • Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox, Learning to Generate Chairs with Convolutional Neural Networks, CVPR, 2015.
  • Weakly-supervised Classification
  • Samaneh Azadi, Jiashi Feng, Stefanie Jegelka, Trevor Darrell, “Auxiliary Image Regularization for Deep CNNs with Noisy Labels”, ICLR 2016, [Paper]
  • Weakly-supervised Object Detection
  • Generate Image with Adversarial Network
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, Generative Adversarial Networks, NIPS, 2014. [Paper]
  • Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS, 2015. [Paper]
  • Lucas Theis, Aäron van den Oord, Matthias Bethge, “A note on the evaluation of generative models”, ICLR 2016. [Paper]
  • Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence, “Variationally Auto-Encoded Deep Gaussian Processes”, ICLR 2016. [Paper]
  • Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov, “Generating Images from Captions with Attention”, ICLR 2016, [Paper]
  • Jost Tobias Springenberg, “Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks”, ICLR 2016, [Paper]
  • Harrison Edwards, Amos Storkey, “Censoring Representations with an Adversary”, ICLR 2016, [Paper]
  • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii, “Distributional Smoothing with Virtual Adversarial Training”, ICLR 2016, [Paper]
  • Artistic Style [Paper] [Code]
  • Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, A Neural Algorithm of Artistic Style.
  • Human Gaze Estimation
  • Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling, Appearance-Based Gaze Estimation in the Wild, CVPR, 2015. [Paper] [Website]
  • Face Recognition
  • Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR, 2014. [Paper]
  • Yi Sun, Ding Liang, Xiaogang Wang, Xiaoou Tang, DeepID3: Face Recognition with Very Deep Neural Networks, 2015.[Paper]
  • Florian Schroff, Dmitry Kalenichenko, James Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR, 2015. [Paper]

Courses

Books

Videos

Software

Framework

  • Tensorflow: An open source software library for numerical computation using data flow graph by Google [Web]
  • Torch7: Deep learning library in Lua, used by Facebook and Google Deepmind [Web]
  • Caffe: Deep learning framework by the BVLC [Web]
  • Theano: Mathematical library in Python, maintained by LISA lab [Web]
  • MatConvNet: CNNs for MATLAB [Web]

Applications

  • Adversarial Training
  • Code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
  • Understanding and Visualizing
  • Source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
  • Semantic Segmentation
  • Source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web]
  • Source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
  • Super-Resolution
  • Image Super-Resolution for Anime-Style-Art [Web]
  • Edge Detection
  • Source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
  • Source code for the paper “Holistically-Nested Edge Detection”, ICCV 2015. [Web]

Tutorials

Blogs

[ Awesome Computer Vision ]

A curated list of awesome computer vision resources, inspired by awesome-php.

For a list of people in computer vision with their academic genealogy, please visit here.

Contributing

Please feel free to send me pull requests or email (jbhuang1@illinois.edu) to add links.

Table of Contents

Books

Computer Vision

OpenCV Programming

Machine Learning

Fundamentals

Courses

Computer Vision

Computational Photography

Machine Learning and Statistical Learning

Optimization

Papers

Conference papers on the web

Survey Papers

Tutorials and talks

Computer Vision

Recent Conference Talks

3D Computer Vision

Internet Vision

Computational Photography

Learning and Vision

Object Recognition

Graphical Models

Machine Learning

Optimization

Deep Learning

Software

External Resource Links

General Purpose Computer Vision Library

Multiple-view Computer Vision

Feature Detection and Extraction

  • VLFeat
  • SIFT
    • David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
  • SIFT++
  • BRISK
    • Stefan Leutenegger, Margarita Chli and Roland Siegwart, “BRISK: Binary Robust Invariant Scalable Keypoints”, ICCV 2011
  • SURF
    • Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, “SURF: Speeded Up Robust Features”, Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346–359, 2008
  • FREAK
    • A. Alahi, R. Ortiz, and P. Vandergheynst, “FREAK: Fast Retina Keypoint”, CVPR 2012
  • AKAZE
    • Pablo F. Alcantarilla, Adrien Bartoli and Andrew J. Davison, “KAZE Features”, ECCV 2012
  • Local Binary Patterns

High Dynamic Range Imaging

Semantic Segmentation

Low-level Vision

Stereo Vision
Optical Flow
Image Denoising

BM3D, KSVD,

Super-resolution
  • Multi-frame image super-resolution
    • Pickup, L. C. Machine Learning in Multi-frame Image Super-resolution, PhD thesis 2008
  • Markov Random Fields for Super-Resolution
    • W. T Freeman and C. Liu. Markov Random Fields for Super-resolution and Texture Synthesis. In A. Blake, P. Kohli, and C. Rother, eds., Advances in Markov Random Fields for Vision and Image Processing, Chapter 10. MIT Press, 2011
  • Sparse regression and natural image prior
    • K. I. Kim and Y. Kwon, “Single-image super-resolution using sparse regression and natural image prior”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 6, pp. 1127-1133, 2010.
  • Single-Image Super Resolution via a Statistical Model
    • T. Peleg and M. Elad, A Statistical Prediction Model Based on Sparse Representations for Single Image Super-Resolution, IEEE Transactions on Image Processing, Vol. 23, No. 6, Pages 2569-2582, June 2014
  • Sparse Coding for Super-Resolution
    • R. Zeyde, M. Elad, and M. Protter On Single Image Scale-Up using Sparse-Representations, Curves & Surfaces, Avignon-France, June 24-30, 2010 (appears also in Lecture-Notes-on-Computer-Science – LNCS).
  • Patch-wise Sparse Recovery
    • Jianchao Yang, John Wright, Thomas Huang, and Yi Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing (TIP), vol. 19, issue 11, 2010.
  • Neighbor embedding
    • H. Chang, D.Y. Yeung, Y. Xiong. Super-resolution through neighbor embedding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol.1, pp.275-282, Washington, DC, USA, 27 June – 2 July 2004.
  • Deformable Patches
    • Yu Zhu, Yanning Zhang and Alan Yuille, Single Image Super-resolution using Deformable Patches, CVPR 2014
  • SRCNN
    • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, in ECCV 2014
  • A+: Adjusted Anchored Neighborhood Regression
    • R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution, ACCV 2014
  • Transformed Self-Exemplars
    • Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja, Single Image Super-Resolution using Transformed Self-Exemplars, IEEE Conference on Computer Vision and Pattern Recognition, 2015
Image Deblurring

Non-blind deconvolution

Blind deconvolution

Non-uniform Deblurring

Image Completion
Image Retargeting
Alpha Matting
Image Pyramid
Edge-preserving image processing

Intrinsic Images

Contour Detection and Image Segmentation

Interactive Image Segmentation

Video Segmentation

Camera calibration

Simultaneous localization and mapping

SLAM community:
Tracking/Odometry:
Graph Optimization:
Loop Closure:
Localization & Mapping:

Single-view Spatial Understanding

Object Detection

Nearest Neighbor Search

General purpose nearest neighbor search
Nearest Neighbor Field Estimation

Visual Tracking

Saliency Detection

Attributes

Action Recognition

Egocentric cameras

Human-in-the-loop systems

Image Captioning

Optimization

  • Ceres Solver – Nonlinear least-squares and unconstrained optimization solver
  • NLopt – Library for nonlinear optimization
  • OpenGM – Factor graph based discrete optimization and inference solver
  • GTSAM – Factor graph based least-squares optimization solver

Deep Learning

Machine Learning

Datasets

External Dataset Link Collection

Low-level Vision

Stereo Vision
Optical Flow
Image Super-resolutions

Intrinsic Images

Material Recognition

Multi-view Reconstruction

Saliency Detection

Visual Tracking

Visual Surveillance

Saliency Detection

Change detection

Visual Recognition

Image Classification
Scene Recognition
Object Detection
Semantic labeling
Multi-view Object Detection
Fine-grained Visual Recognition
Pedestrian Detection

Action Recognition

Image-based
Video-based
Image Deblurring

Image Captioning

Scene Understanding

  • SUN RGB-D – A RGB-D Scene Understanding Benchmark Suite
  • NYU depth v2 – Indoor Segmentation and Support Inference from RGBD Images

Resources for students

Resource link collection

Writing

Presentation

Research

Time Management

Blogs

Links

Songs

Licenses

License

CC0

To the extent possible under law, Jia-Bin Huang has waived all copyright and related or neighboring rights to this work.