This is a demo of an image similarity search application. Below are two examples:
- Similarity search using the DD server's built-in similarity search
- Using DD for deep learning inference only and building the similarity search component with Python
The two techniques above are equivalent, but the first one is more practical since it embeds everything within the C++ server, which both indexes images and retrieves similar images via similarity search.
To run the code on your own collection of images:
- build DeepDetect with the similarity search component, i.e. using the `-DUSE_SIMSEARCH=ON` flag at build time.
- start a DeepDetect server:

      ./dede

- create a model repository with the pre-trained image classification network of your choice. Here we are using a pre-trained GoogleNet, but you can also use a built-in ResNet or other provided models:

      mkdir model
      cd model
      wget http://www.deepdetect.com/models/ggnet/bvlc_googlenet.caffemodel

  make sure that the `model` repository is in the same directory as the script `imgsearch.py`
- index your collection of images:

      python imgsearch_dd.py --index /path/to/your/images --index-batch-size 64

  Here `index-batch-size` controls the number of images that are processed at once. The index file is then `index.ann` in the repository. `names.bin` indexes the filenames. Index and name files are erased upon every new indexing call.
- search for similar images:

      python imgsearch_dd.py --search /path/your/image.png --search-size 10

  Here `search-size` controls the number of approximate neighbors.
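As a rough illustration of what a batch size such as `index-batch-size` means in practice, the sketch below splits a list of image paths into consecutive batches. The helper `batches` and the file names are hypothetical and are not taken from `imgsearch_dd.py`; the actual script may batch its requests differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(paths: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(paths)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical file names, for illustration only.
image_paths = [f"img_{i:03d}.jpg" for i in range(150)]

# With a batch size of 64, 150 images are processed in three requests.
batch_sizes = [len(b) for b in batches(image_paths, 64)]
print(batch_sizes)  # [64, 64, 22]
```

A larger batch means fewer round trips to the server, at the cost of more memory per call.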
This is a small Python demo of an image similarity search application.
It does three things:
- uses a DeepDetect image classification service to generate a numerical or binary code for every image
- indexes images with Annoy, an approximate nearest neighbors C++/Python library
- searches by image, including new images that were not previously indexed, and returns the closest images
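The steps above (one code per image, an index over the codes, then a search with a new image's code) can be sketched in a few lines. To stay dependency-free, this sketch swaps Annoy's approximate index for an exact brute-force search, and it uses made-up 3-dimensional codes instead of features extracted by DeepDetect. The class `BruteForceIndex` and all file names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two codes of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class BruteForceIndex:
    """Exact nearest-neighbor index; a stand-in for Annoy's approximate one."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self.codes: List[List[float]] = []

    def add(self, name: str, code: Sequence[float]) -> None:
        # Keep file names and codes aligned by position.
        self.names.append(name)
        self.codes.append(list(code))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Score every indexed code, then keep the k closest.
        scored = [(n, euclidean(query, c)) for n, c in zip(self.names, self.codes)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


# Made-up codes; in the demo these would come from a network layer.
toy_codes = {
    "lake_1.jpg": [0.9, 0.1, 0.0],
    "lake_2.jpg": [0.8, 0.2, 0.1],
    "bottle_1.jpg": [0.0, 0.1, 0.9],
}

index = BruteForceIndex()
for name, code in toy_codes.items():
    index.add(name, code)

# Query with the code of a new image that was never indexed.
results = index.search([0.88, 0.1, 0.0], k=2)
print([name for name, _ in results])  # ['lake_1.jpg', 'lake_2.jpg']
```

Brute force compares the query against every indexed code, which is exactly the cost an approximate index such as Annoy is meant to avoid on large collections.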
To run the code on your own collection of images:
- install Annoy:

      pip install annoy

  or go look at https://github.com/spotify/annoy
- create a model repository with the pre-trained image classification network of your choice. Here we are using a pre-trained GoogleNet, but you can also use a built-in ResNet or other provided models:

      mkdir model
      cd model
      wget http://www.deepdetect.com/models/ggnet/bvlc_googlenet.caffemodel

  make sure that the `model` repository is in the same directory as the script `imgsearch.py`
- start a DeepDetect server:

      ./dede

- index your collection of images:

      python imgsearch.py --index /path/to/your/images --index-batch-size 64

  Here `index-batch-size` controls the number of images that are processed at once. The index file is then `index.ann` in the repository. `names.bin` indexes the filenames. Index and name files are erased upon every new indexing call.
- search for similar images:

      python imgsearch.py --search /path/your/image.png --search-size 10

  Here `search-size` controls the number of approximate neighbors.
- The search uses a deep convolutional net layer as a code for every image. Using top layers (e.g. `loss3/classifier` with GoogleNet) relies on high-level features, so image similarity is based on high-level concepts such as whether the image contains a lakeshore, a bottle, etc. Using bottom or mid-range layers (e.g. `pool5/7x7_s1` with GoogleNet) bases image similarity on lower-level, potentially invariant, universal features such as lighting conditions, basic shapes, etc. Experiment and see what is best for your application.
- Annoy is a nice piece of code, but in experiments the index building step becomes very memory-inefficient and time-consuming at around a million images. If this is an issue, get in touch, as there are other, more complicated ways to index, perform the search, and scale.
- The code in `imgsearch.py` allows for more options, such as whether to use `binarized` codes, the `angular` or `euclidean` metric for similar image retrieval, and control of the accuracy of the search through `ntrees`.
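To make the difference between these options more concrete, here is a small, self-contained sketch of binarized codes and of the two metrics. It is not the code from `imgsearch.py`: the thresholding rule (values above 0 become 1) is an assumption, the function names are hypothetical, and the angular distance follows the definition given in Annoy's documentation, sqrt(2 * (1 - cos(u, v))).

```python
import math
from typing import List, Sequence


def binarize(code: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Turn a real-valued code into bits (assumed rule: value > threshold)."""
    return [1 if x > threshold else 0 for x in code]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angular(u: Sequence[float], v: Sequence[float]) -> float:
    """sqrt(2 * (1 - cos(u, v))); assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


# Same direction, different magnitude: far apart for euclidean,
# identical for angular.
u, v = [1.0, 0.0], [2.0, 0.0]
print(euclidean(u, v), angular(u, v))

# Binarized codes are compared bit by bit.
bits_a = binarize([0.7, -0.2, 0.0, 1.3])
bits_b = binarize([0.4, 0.9, -1.0, -0.1])
print(bits_a, bits_b, hamming(bits_a, bits_b))
```

The angular metric ignores vector length, which can matter when layer activations vary in scale between images. Binary codes are more compact but discard magnitude information.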