The goal of this project is to apply the concepts and technologies seen previously. To this end, you have to choose one of the following datasets.
On the chosen dataset, you either provide insights for one of the suggested tasks or combinations of tasks (e.g., clustering, pattern mining), or define the general aims yourself and discover knowledge from the data (produce added value from the data). To this end, you can use any data mining/machine learning method, as well as any algorithm or software (KNIME, scikit-learn (Python), web APIs (Google, Bing, Yahoo, …)).
Datasets | Possible Mining Tasks |
Collection of tweets in N.Y. | Detection of geolocated events |
Foursquare datasets (several cities) | Characterization of a city (e.g., I like Croix Rousse: where should I live in N.Y. or S.F.?). Characterization of food supply in a city. |
Flickr data | Discovery and characterization of points of interest. |
European parliament votes | What are the subjects of votes that are consensual or polarizing? |
French presidential election candidates tweets | What are the terms (through time) that characterize the candidates? |
Esport data (e.g., LoL Picks and Bans) | Which choices lead to victory or defeat? |
Telematic data | User Re-identification |
Keystroke analytics | User identification based on typing patterns (create your dataset capturing keyboard signals) |
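As an illustration of the keystroke task listed above, here is a minimal sketch of one possible starting point. It is not a prescribed method: it assumes each captured event is a hypothetical `(key, press_time, release_time)` tuple, and it summarizes typing by two common timing features, dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The function names `keystroke_features` and `nearest_user` and the nearest-profile matching are assumptions made for the example.

```python
from statistics import mean


def keystroke_features(events):
    """Summarize a typing sample.

    `events` is a non-empty list of (key, press_time, release_time) tuples,
    with times in seconds, ordered by press time (hypothetical format).
    """
    # Dwell time: how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return {
        "mean_dwell": mean(dwell),
        "mean_flight": mean(flight) if flight else 0.0,
    }


def nearest_user(profiles, sample):
    """Return the user whose stored profile is closest to `sample`.

    `profiles` maps a user name to a feature dict with the same keys as
    `sample`; closeness is the squared Euclidean distance.
    """
    def distance(profile):
        return sum((profile[k] - sample[k]) ** 2 for k in sample)

    return min(profiles, key=lambda user: distance(profiles[user]))


# Made-up data, for illustration only.
events = [("a", 0.00, 0.10), ("b", 0.25, 0.35), ("c", 0.50, 0.60)]
profiles = {
    "alice": {"mean_dwell": 0.10, "mean_flight": 0.15},
    "bob": {"mean_dwell": 0.20, "mean_flight": 0.05},
}
print(nearest_user(profiles, keystroke_features(events)))
```

A real project would likely need richer features (per-key or per-digraph timings) and a proper classifier evaluated on held-out typing sessions; the two averages here only show the shape of the pipeline.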
GID | Subject | Time |
M. Philibert and P.-E. Polet | Flickr | 13h30 |
E. Kerinec and N. Derumigny | French presidential elections | 13h50 |
X. Badin de Montjoye, H. Menet, L. Paulin and Y. Gaziello | Keystroke analysis and user re-identification | 14h10 |
R. Cerda, N. Levy, A. Slowik and D. Sintiari | Credit card fraud detection | 14h30 |
E. Prebet and R. Coudert | League of Legends | 14h50 |
A. Martin, F. Lecuyer, T. Sterin, S.-M. Mutsotso and T. Nguyen | Twitter event detection | 15h10 |
Etienne Desbois and P. Mangold | Flickr | 15h30 |
Using the concepts seen during the lectures (but not only those), you have to produce added value from the data (answer a specific question, discover knowledge, …). You can use any tools, technologies, or algorithms. These datasets can also serve as a basis for developing your own algorithms (pattern sampling approach, interactive exploration, …).
You have to:
<note important> The report, presentation, and source code must be sent by email (marc.plantevit-at-liris.cnrs.fr, cc: marc.plantevit@univ-lyon1.fr) before May 22nd, 2016 (23h59). </note>
<note important> You can work in groups of at most 5 people.
</note>