Knowledge Discovery Group

Quadflor text-processing and classification pipeline

Quadflor is a text-processing pipeline for the multi-label classification of documents and for evaluating such classifications. Given a domain-specific thesaurus of descriptor labels, its algorithms learn from a training set how to assign these labels to documents. The framework supports concept extraction, synonym set resolution, and spreading activation, including hierarchical re-weighting. The resulting features are then processed by a classifier. Quadflor provides the following built-in classifiers:

  • Naive Bayes (two variants)
  • Logistic Regression
  • Linear Support Vector Machine
  • K-Nearest Neighbors (two multi-label adaptation variants, as well as Rocchio)
  • Stochastic Gradient Descent
  • Stacked Decision Tree Classifier
  • Learning2Rank
  • Multilayer Perceptron (MLP)
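The feature stages named above (concept extraction, synonym set resolution, spreading activation) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption for illustration: the toy thesaurus, its descriptors, and the functions `extract_concepts` and `spread_activation` are hypothetical and not part of Quadflor's API. Concept extraction and synonym resolution are approximated by whole-phrase matching of each synonym, counted under its descriptor. Spreading activation is approximated by passing a decayed share of each descriptor's weight to its broader descriptors for a fixed number of hops.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: descriptor -> synonyms and broader descriptor.
# Names and structure are illustrative assumptions, not Quadflor's format.
THESAURUS = {
    "Inflation": {"synonyms": ["inflation", "price increase"], "broader": "Monetary policy"},
    "Interest rate": {"synonyms": ["interest rate", "interest rates"], "broader": "Monetary policy"},
    "Monetary policy": {"synonyms": ["monetary policy"], "broader": "Economic policy"},
    "Economic policy": {"synonyms": ["economic policy"], "broader": None},
}


def extract_concepts(text, thesaurus):
    """Count descriptor mentions, resolving every synonym to its descriptor."""
    lowered = text.lower()
    counts = Counter()
    for descriptor, entry in thesaurus.items():
        for synonym in entry["synonyms"]:
            # Whole-phrase match, so "interest rate" does not also match inside "interest rates".
            pattern = r"\b" + re.escape(synonym) + r"\b"
            counts[descriptor] += len(re.findall(pattern, lowered))
    # Keep only descriptors that actually occur in the text.
    return Counter({d: c for d, c in counts.items() if c > 0})


def spread_activation(counts, thesaurus, decay=0.5, max_hops=2):
    """Pass a decayed share of each descriptor's weight up to its broader descriptors."""
    activation = Counter(counts)
    for descriptor, weight in counts.items():
        broader = thesaurus[descriptor]["broader"]
        hops = 0
        # max_hops also guards against cycles in a malformed hierarchy.
        while broader is not None and hops < max_hops:
            weight *= decay  # the contribution shrinks by `decay` at every hop upwards
            activation[broader] += weight
            broader = thesaurus[broader]["broader"]
            hops += 1
    return activation


if __name__ == "__main__":
    text = ("Rising inflation forced the central bank to raise the interest rate. "
            "Inflation expectations matter.")
    concepts = extract_concepts(text, THESAURUS)
    activation = spread_activation(concepts, THESAURUS)
    print(activation["Monetary policy"])  # 1.5
    print(activation["Economic policy"])  # 0.75
```

In this example, two mentions of inflation and one of the interest rate activate `Monetary policy` (1.5) and `Economic policy` (0.75), although neither phrase occurs in the text. The decay factor and hop limit are illustrative choices; Quadflor's actual spreading activation and hierarchical re-weighting may differ.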

While the text-processing pipeline is designed for the automatic evaluation of novel classification strategies, it can also be employed in a practical setting: all known documents serve as training data, and new, unseen documents are classified on-the-fly.
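The practical setting described above (train on every known document, then label new ones as they arrive) can be sketched with one of the listed classifier families, k-nearest neighbors. This is a minimal sketch under stated assumptions: the class `KNNMultiLabel`, the helpers `bag_of_words` and `cosine`, the raw term-count features, and the voting rule (a label is assigned if at least `threshold` of the k neighbors carry it) are hypothetical and not necessarily one of Quadflor's two multi-label kNN variants.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term counts; a simple stand-in for richer feature extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KNNMultiLabel:
    """Hypothetical k-nearest-neighbor multi-label classifier (not Quadflor's API).

    A label is assigned if at least `threshold` of the k neighbors carry it.
    """

    def __init__(self, k=3, threshold=0.5):
        self.k = k
        self.threshold = threshold

    def fit(self, documents, label_sets):
        # "Training" stores all known documents together with their label sets.
        self.vectors = [bag_of_words(doc) for doc in documents]
        self.label_sets = [set(labels) for labels in label_sets]
        return self

    def predict(self, document):
        query = bag_of_words(document)
        # Rank stored documents by similarity; ties keep training order (stable sort).
        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: cosine(query, self.vectors[i]), reverse=True)
        neighbors = ranked[: self.k]
        votes = Counter(label for i in neighbors for label in self.label_sets[i])
        return {label for label, n in votes.items() if n / len(neighbors) >= self.threshold}


if __name__ == "__main__":
    known_documents = [
        "central bank raises interest rates to fight inflation",
        "inflation and monetary policy in the euro area",
        "football club wins the national championship",
        "new stadium built for the football championship",
    ]
    known_labels = [
        {"Monetary policy", "Interest rate"},
        {"Monetary policy", "Inflation"},
        {"Sports"},
        {"Sports"},
    ]
    model = KNNMultiLabel(k=3).fit(known_documents, known_labels)
    print(model.predict("football championship final"))  # {'Sports'}
```

Because kNN only stores the training data, adding newly labeled documents amounts to calling `fit` again on the enlarged collection; the other classifiers in the list would need to be retrained instead.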

For the source code and more information, please refer to the repository on GitHub:


L. Galke, F. Mai, A. Schelten, D. Brunsch, and A. Scherp: Using Titles vs. Full-text as Source for Automated Semantic Document Annotation, Knowledge Capture (KCAP); Austin, TX, USA, 2017.

