deepdetect/demo/imgsearch at master · jolibrain/deepdetect


Image similarity search demo

This is a demo of an image similarity search application, provided in two variants:

Similarity search using the DD server's built-in similarity search (imgsearch_dd.py)
Using DD for deep learning inference only and building the similarity search component in Python (imgsearch.py)

The two techniques are equivalent. The first one is more practical, since it embeds everything within the C++ server.

The first variant, imgsearch_dd.py, indexes images and retrieves similar images through the server's similarity search.

To run the code on your own collection of images:

Build DeepDetect with the similarity search component, i.e. using the -DUSE_SIMSEARCH=ON flag at build time.
Start a DeepDetect server:
```
./dede
```
Create a model repository with the pre-trained image classification network of your choice. Here we use a pre-trained GoogleNet, but you can also use a built-in ResNet or other provided models:
```
mkdir model
cd model
wget http://www.deepdetect.com/models/ggnet/bvlc_googlenet.caffemodel
```
Make sure that the model repository is in the same directory as the script imgsearch_dd.py.
Index your collection of images:
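DeepDetect reaches the model repository through a service that points at it. Below is a minimal sketch of a service creation request over the DeepDetect REST API. Several details are assumptions and may differ from what the demo scripts actually send: the server address (dede's default port 8080), the 224x224 input size, and the parameter set itself.

```python
import json
import urllib.request

# Assumption: dede listening on its default host and port.
DD_URL = "http://localhost:8080"


def service_payload(model_dir: str, nclasses: int = 1000) -> dict:
    """Body of a PUT /services/<name> call for a Caffe image classifier (sketch)."""
    return {
        "mllib": "caffe",
        "description": "image similarity search demo",
        "type": "supervised",
        "parameters": {
            # GoogleNet takes 224x224 inputs
            "input": {"connector": "image", "width": 224, "height": 224},
            "mllib": {"nclasses": nclasses},  # 1000 ImageNet classes
        },
        "model": {"repository": model_dir},  # absolute path to the model repository
    }


def create_service(name: str, model_dir: str) -> dict:
    """Send the service creation request and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{DD_URL}/services/{name}",
        data=json.dumps(service_payload(model_dir)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Some names are hypothetical, chosen only for illustration: a call such as `create_service("imgserv", "/abs/path/to/model")` uses a made-up service name and path.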
```
python imgsearch_dd.py --index /path/to/your/images --index-batch-size 64
```
Here, --index-batch-size controls the number of images processed at once. The index is then written to index.ann in the repository, and names.bin maps indexed items to their filenames.

The index and name files are erased on every new indexing call.
Search for similar images:
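With the server's built-in similarity search, indexing amounts to /predict calls that ask DD to store the extracted features. Below is a minimal sketch of the request bodies, batched the way --index-batch-size suggests.

Several details are assumptions based on the DeepDetect API and are not taken from imgsearch_dd.py:
- the output keys `index` and `build_index`
- the `extract_layer` value
- the input size
- the service name

Check them against your DD version.

```python
def index_payload(service, images, layer="pool5/7x7_s1", build=False):
    """Body of a POST /predict call that indexes the features of `images` (sketch)."""
    return {
        "service": service,
        "parameters": {
            "input": {"width": 224, "height": 224},
            "mllib": {"extract_layer": layer},  # layer whose output is the image code
            "output": {"index": True, "build_index": build},
        },
        "data": images,  # file paths or URLs, as seen by the server
    }


def index_requests(service, images, batch_size=64, layer="pool5/7x7_s1"):
    """Split `images` into batches; only the last request asks to build the index."""
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]
    return [
        index_payload(service, batch, layer, build=(k == len(batches) - 1))
        for k, batch in enumerate(batches)
    ]
```

Each body would be POSTed to /predict in order. An Annoy index cannot take new items once it has been built, which is why only the final batch sets `build_index` in this sketch.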
```
python imgsearch_dd.py --search /path/your/image.png --search-size 10
```
Here, --search-size controls the number of approximate nearest neighbors returned.
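A search is also a /predict call, this time asking the server to query its index. The sketch below builds such a request and posts it.

It rests on assumptions drawn from the DeepDetect API, not from imgsearch_dd.py: the output keys `search` and `search_nn`, the default layer, and the server address. The extract layer presumably has to match the one used at indexing time.

```python
import json
import urllib.request

DD_URL = "http://localhost:8080"  # assumption: default dede address


def search_payload(service, image, search_size=10, layer="pool5/7x7_s1"):
    """Body of a POST /predict call that queries the server-side index (sketch)."""
    return {
        "service": service,
        "parameters": {
            "input": {"width": 224, "height": 224},
            "mllib": {"extract_layer": layer},  # same layer as at indexing time
            "output": {"search": True, "search_nn": search_size},
        },
        "data": [image],
    }


def post_predict(body):
    """POST a body to /predict and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{DD_URL}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```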

The second variant, imgsearch.py, is a small Python image similarity search application.

It does three things:

uses a DeepDetect image classification service to generate a numerical or binary code for every image
indexes images with Annoy, an approximate nearest neighbors C++/Python library
searches by image, including new images that were not previously indexed, and returns the closest images
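A binary code can be derived from a numerical one by thresholding each component. The sketch below shows one way to do that and how to compare two binary codes with the Hamming distance. The thresholding rule, which defaults to the vector's own mean, is a hypothetical choice and not necessarily the one imgsearch.py uses.

```python
from statistics import fmean


def binarize(code, threshold=None):
    """Turn a float feature vector into a 0/1 code.

    Hypothetical rule: a component maps to 1 when it exceeds the threshold,
    which defaults to the mean of the vector.
    """
    t = fmean(code) if threshold is None else threshold
    return [1 if x > t else 0 for x in code]


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(binarize([0.0, 1.0, 0.25, 0.75]))   # [0, 1, 0, 1]
print(hamming([0, 1, 0, 1], [1, 1, 0, 0]))  # 2
```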

To run the code on your own collection of images:

Install Annoy:
```
pip install annoy
```
or see https://github.com/spotify/annoy
Create a model repository with the pre-trained image classification network of your choice. Here we use a pre-trained GoogleNet, but you can also use a built-in ResNet or other provided models:
```
mkdir model
cd model
wget http://www.deepdetect.com/models/ggnet/bvlc_googlenet.caffemodel
```
Make sure that the model repository is in the same directory as the script imgsearch.py.
Start a DeepDetect server:
```
./dede
```
Index your collection of images:
```
python imgsearch.py --index /path/to/your/images --index-batch-size 64
```
Here, --index-batch-size controls the number of images processed at once. The index is then written to index.ann in the repository, and names.bin maps indexed items to their filenames.

The index and name files are erased on every new indexing call.
Search for similar images:
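The client-side bookkeeping around indexing can be sketched in a few helpers:
- collecting image paths
- feeding them in batches of --index-batch-size
- keeping a map from item ids to filenames

Several details are hypothetical and not taken from imgsearch.py: the accepted file extensions and the names-file format (a pickled dict).

```python
import os
import pickle

# Assumption: which file extensions count as images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_images(root):
    """Recursively collect image paths under root, in a stable order."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def save_names(paths, filename="names.bin"):
    """Hypothetical names file: a pickled {item id: path} dict.

    Opening in "wb" mode overwrites any previous file, as a fresh indexing run does.
    """
    with open(filename, "wb") as fh:
        pickle.dump(dict(enumerate(paths)), fh)


def load_names(filename="names.bin"):
    """Read back the {item id: path} dict written by save_names."""
    with open(filename, "rb") as fh:
        return pickle.load(fh)
```

Item ids here are simply positions in the sorted path list. The same ids would then be used when adding each image's code to the index.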
```
python imgsearch.py --search /path/your/image.png --search-size 10
```
Here, --search-size controls the number of approximate nearest neighbors returned.
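To sanity-check the approximate neighbors for a query, it can help to compare them with an exact brute-force search over the same codes. The sketch below is such a baseline; it is not what imgsearch.py does. It assumes the codes are plain lists of floats, with filenames aligned to them by position.

```python
import heapq
import math


def exact_search(query, codes, names, k=10):
    """Return the k (distance, name) pairs closest to query, by Euclidean distance.

    Every stored code is scanned, so the result is exact but costs O(n) per query;
    Annoy trades some accuracy for not scanning everything.
    """
    scored = ((math.dist(query, code), names[i]) for i, code in enumerate(codes))
    return heapq.nsmallest(k, scored)


codes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
names = ["a.jpg", "b.jpg", "c.jpg"]
print([n for _, n in exact_search([0.9, 0.0], codes, names, k=2)])  # ['b.jpg', 'a.jpg']
```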

The search uses a layer of a deep convolutional net as the code for every image. Top layers (e.g. loss3/classifier with GoogleNet) carry high-level features, so image similarity is based on high-level concepts such as whether the image contains a lakeshore, a bottle, etc. Bottom or mid-range layers (e.g. pool5/7x7_s1 with GoogleNet) base image similarity on lower-level, potentially invariant, universal features such as lighting conditions, basic shapes, etc. Experiment to see what works best for your application.
Annoy is a nice piece of code, but in experiments the index-building step becomes very memory-inefficient and time-consuming at around a million images. If this is an issue, get in touch: there are other, more complex ways to index, search, and scale.
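To see why scale matters, it helps to estimate how much memory the raw codes alone take. The back-of-the-envelope sketch below gives only a lower bound; it does not measure Annoy's own overhead. It assumes float32 codes and the 1024-dimensional output of GoogleNet's pool5/7x7_s1 layer.

```python
import math


def float_code_bytes(n_images, dim, bytes_per_value=4):
    """Raw storage for float32 codes, before any index overhead."""
    return n_images * dim * bytes_per_value


def binary_code_bytes(n_images, dim):
    """Raw storage when each component is packed into a single bit."""
    return n_images * math.ceil(dim / 8)


n, dim = 1_000_000, 1024  # dim: output size of GoogleNet's pool5/7x7_s1
print(float_code_bytes(n, dim) / 2**30)   # 3.814697265625 (GiB)
print(binary_code_bytes(n, dim) / 2**20)  # 122.0703125 (MiB)
```

Binarized codes shrink this raw footprint by a factor of 32 compared with float32 codes.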
The code in imgsearch.py offers more options, such as whether to use binarized codes, an angular or Euclidean metric for similar-image retrieval, and control over search accuracy through ntrees.
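The choice between an angular and a Euclidean metric matters because angular distance ignores the magnitude of a code. The sketch below computes both. Annoy documents its angular distance as sqrt(2(1 - cos(u, v))); the helper names here are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def angular_distance(u, v):
    """Angular distance as Annoy defines it: sqrt(2 * (1 - cos(u, v)))."""
    # clamp tiny negative values caused by rounding when u and v are parallel
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(u, v))))


def euclidean_distance(u, v):
    return math.dist(u, v)


u, v = [1.0, 0.0], [3.0, 0.0]
print(angular_distance(u, v))    # 0.0: same direction, scale ignored
print(euclidean_distance(u, v))  # 2.0: magnitude matters
```

In Annoy, the metric is fixed when the index is created, e.g. `AnnoyIndex(f, "angular")` or `AnnoyIndex(f, "euclidean")`. The number of trees is passed to `build(n_trees)`: more trees give more accurate results at the cost of a larger index.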