Knowledge Discovery Group

Classification and Clustering of Large-scale Datasets of User Feedback


Users of online shopping portals express comments and feedback about the platform in many ways. Their opinions are collected via surveys, feedback forms, and email, among other channels. Regardless of the source, the feedback is typically unstructured and therefore varies in length, quality, and category.

The goal of this thesis is to apply techniques from machine learning and data mining to better understand a large-scale text corpus of user feedback. To this end, features are identified that make it possible to classify the feedback items and to cluster them along different criteria. The methods will be evaluated against a manually created gold standard.

In more detail, the work should cover:

- Development of different approaches for classifying and clustering a large-scale text corpus
- Evaluation of the approaches on the data set with respect to a gold standard
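
To make the evaluation step more concrete, the following is a minimal sketch of how predicted categories and cluster assignments could be compared against a gold standard. It is only an illustration under stated assumptions: the function names `per_category_scores` and `cluster_purity`, the category labels, and the choice of precision/recall/F1 and purity as measures are all hypothetical and not prescribed by the thesis topic.

```python
from collections import Counter, defaultdict


def per_category_scores(gold, predicted):
    """Precision, recall, and F1 per category for a classifier's output."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted category p, but the gold label differs
            fn[g] += 1  # gold category g was missed
    scores = {}
    for label in set(gold) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def cluster_purity(gold, cluster_ids):
    """Fraction of items carrying the majority gold label of their cluster."""
    if len(gold) != len(cluster_ids):
        raise ValueError("gold and cluster_ids must have the same length")
    if not gold:
        return 0.0
    members = defaultdict(Counter)
    for g, c in zip(gold, cluster_ids):
        members[c][g] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in members.values())
    return majority / len(gold)


# Hypothetical gold labels and system outputs for six feedback items.
gold = ["delivery", "delivery", "payment", "payment", "website", "website"]
predicted = ["delivery", "payment", "payment", "payment", "website", "delivery"]
clusters = [0, 0, 1, 1, 1, 2]

scores = per_category_scores(gold, predicted)
purity = cluster_purity(gold, clusters)
```

Purity is only one of several possible clustering measures and tends to favor many small clusters, so in practice it would likely be reported alongside other measures.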

This work will be conducted in collaboration with Dr. Andreas Lattner of the Otto Group in Hamburg.


- Good programming skills
- Knowledge of methods in data mining/machine learning is an advantage

