T DAT 902 - Project
T-DAT-902
Nostradamovies
posters standardisation you say?
1.2.5
delivery method: Github
repository name: $CourseCode-$GroupName.git
language: Python or R is advised
• All of your source files must be included in your delivery, excluding unnecessary files
(binaries, temporary files, object files, ...).
From an 80,000-row dataset containing movie posters, movie synopses and full IMDb website information,
you are asked to predict movie genres based on posters only.
You may modify the dataset genres at your discretion, e.g. replacing “comedy horror, action” with only
“comedy horror”. You can also add your own genres, such as blockbuster, teen movie or Cannes
Palme winner.
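The genre clean-up described above could be sketched as follows. This is only an illustration: the GENRE_MAP table, the normalise_genres helper and the comma-separated input format are assumptions, not part of the dataset specification.

```python
# Hypothetical mapping from a raw genre combination to the labels to predict.
# Keys and values are illustrative; adapt them to the real dataset columns.
GENRE_MAP = {
    frozenset({"comedy horror", "action"}): ("comedy horror",),
}


def normalise_genres(raw):
    """Split a comma-separated genre string, clean it, and apply GENRE_MAP."""
    genres = frozenset(g.strip().lower() for g in raw.split(",") if g.strip())
    mapped = GENRE_MAP.get(genres)
    if mapped is not None:
        return tuple(mapped)
    # Unmapped combinations are kept, sorted so equal sets give equal labels.
    return tuple(sorted(genres))


print(normalise_genres("Comedy Horror, Action"))
print(normalise_genres("Drama"))
```

Keying the table on a frozenset makes the lookup independent of the order in which the genres are listed.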
Neural networks, deep learning and every possible algorithm are welcome, but do not
spend time on them.
Use bullet-proof libraries instead of reinventing the wheel, and focus on data.
A document synthesizing visualizations and statistics is also required.
It must also include the methodology and algorithms used to make your extractions/predictions, and
feature importance for both posters and synopses (for instance, a black color for horror movies, or a large
rounded title font for comedies).
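As an illustration of a poster feature such as color, here is a minimal sketch that names the dominant color of a list of RGB pixels. It assumes the pixels have already been read from the image with an imaging library of your choice; the function names and the hue and brightness thresholds are arbitrary choices, not prescribed values.

```python
import colorsys
from collections import Counter

# Upper hue bounds in degrees; the thresholds are illustrative assumptions.
HUE_BINS = ((30, "red"), (90, "yellow"), (150, "green"),
            (210, "cyan"), (270, "blue"), (330, "magenta"))


def color_name(rgb):
    """Map an (R, G, B) pixel with 0-255 channels to a coarse color name."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v < 0.2:          # very dark pixels, whatever their hue
        return "black"
    if s < 0.15:         # unsaturated pixels
        return "white" if v > 0.8 else "grey"
    hue_deg = h * 360
    for upper, name in HUE_BINS:
        if hue_deg < upper:
            return name
    return "red"         # hues above 330 degrees wrap back to red


def dominant_color(pixels):
    """Return the most frequent coarse color name among the pixels."""
    counts = Counter(color_name(p) for p in pixels)
    return counts.most_common(1)[0][0]


sample = [(255, 255, 0)] * 3 + [(0, 0, 0)]
print(dominant_color(sample))
```

The share of each color name (for instance the fraction of "black" pixels) can then be used as a numeric feature whose importance is reported per genre.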
The relevance of your visualizations is of prime importance; display any data you consider
meaningful.
Last but not least, you must extract archetypal posters from your feature importance classifications.
You are expected to display the most typical poster from your database, based on the feature importance
for each movie genre.
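One possible way to pick such a poster is to score every poster by an importance-weighted sum of its features and keep the highest score. This is a minimal sketch of that heuristic, not a required method: the feature names, the importance values and the poster ids are hypothetical, and features are assumed to be scaled to the range 0 to 1.

```python
def typicality(features, importances):
    """Importance-weighted score: sum of importance * feature value."""
    return sum(w * features.get(name, 0.0) for name, w in importances.items())


def archetypal_poster(posters, importances):
    """Return the id of the poster with the highest typicality score."""
    return max(posters, key=lambda pid: typicality(posters[pid], importances))


# Hypothetical feature importances for one genre.
importances = {"dark_ratio": 0.6, "red_ratio": 0.3, "n_faces": 0.1}

# Hypothetical posters with features scaled to [0, 1].
posters = {
    "poster_a": {"dark_ratio": 0.9, "red_ratio": 0.5, "n_faces": 0.0},
    "poster_b": {"dark_ratio": 0.2, "red_ratio": 0.1, "n_faces": 1.0},
}

print(archetypal_poster(posters, importances))
```

Other choices, such as the poster closest to the genre centroid in feature space, are equally defensible; whichever you use should be explained in your document.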
For example, you might obtain this kind of feature importance for
“blockbusters”:
Your final algorithm will be tested on recent movies. The program should contain a function able to process
a PNG or JPEG image and make the prediction along with the features extracted.
e.g. for this independent comedy drama named Little Miss Sunshine, your output could be:
~/T-DAT-902> python genre_prediction.py little_miss_sunshine.jpg
Genre predicted : Comedy Drama
Probability : 0.76
Features extracted ->
Number of faces : 5
Colorimetry : Yellow
Similar poster : xxx
...
Feature importance for the genre predicted ->
...
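Since the prediction function must accept PNG or JPEG input, it can be useful to check the file type before extracting features. The following sketch inspects the leading bytes of the file rather than trusting its extension; the function names are hypothetical, and the byte signatures are the standard PNG and JPEG file headers.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"       # start-of-image marker of a JPEG file


def detect_image_format(header):
    """Return "png", "jpeg" or None from the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def read_image_format(path):
    """Open a file in binary mode and detect its format from its header."""
    with open(path, "rb") as handle:
        return detect_image_format(handle.read(8))
```

A script such as genre_prediction.py could call read_image_format on its argument and exit with a clear error message when the result is None.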
Rather than giving a single prediction, algorithms usually output a probability for each
possible genre. You can, for example, display the top 3 predictions with their associated
probabilities.
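A minimal sketch of displaying the top 3 predictions, assuming the classifier returns a dictionary mapping each genre to its probability. Apart from the 0.76 taken from the example above, the probabilities below are made up for illustration.

```python
def top_predictions(probabilities, k=3):
    """Return the k (genre, probability) pairs with the highest probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical classifier output for one poster.
probs = {"Comedy Drama": 0.76, "Drama": 0.12, "Comedy": 0.07, "Horror": 0.05}

for genre, p in top_predictions(probs):
    print(f"{genre:<14}{p:.2f}")
```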