Knowledge Discovery Group

Current Projects

DigitalChampions_SH

Duration: 2017-2018

Project Members: NN, Falk Böschen, Tilman Beck, Ansgar Scherp

Funding: EU

Description: The DigitalChampions_SH project makes employees from companies in Schleswig-Holstein fit for the digital world. Employees are recommended individual training courses relevant to their career path. ZBW develops advanced machine learning and data mining methods for document indexing and document management.

DyeSE - Dynamic Semantic Sensor Networks

Duration: 2016-2017

Project Members: Prof. Thomas Plagemann, University of Oslo and Prof. Ansgar Scherp, Kiel University and ZBW--Leibniz Information Center for Economics

Funding: German Norwegian Study Center

Description: This project brings together researchers from the University of Oslo working on sensor networks with researchers in Kiel working on semantic technologies. The goal of the joint collaboration is to establish a new field, DyeSE (short for "dynamic semantic sensor networks"). DyeSE is becoming highly relevant due to the widespread adoption of mobile devices and the increasing availability of sensor data on the web.

LOC-DB - Linked Open Citation Database

Duration: 2016-2018

Project Members: NN, Ansgar Scherp

Funding: DFG

Description: The LOC-DB project develops ready-to-use tools and processes based on linked-data technology that make it possible for a single library to meaningfully contribute to an open, distributed infrastructure for the cataloguing of citations. The project aims to prove that, by widely automating cataloguing processes, it is possible to add a substantial benefit to academic search tools by regularly capturing citation relations. These data will be made available in the semantic web to make future reuse possible. Moreover, we document the effort as well as the number and quality of the data in a well-founded cost-benefit analysis. The project will use well-known methods of information extraction and adapt them to work for arbitrary layouts of reference lists in electronic and print media. The obtained raw data will be aligned and linked with existing metadata sources. Moreover, it will be shown how these data can be integrated into library catalogues. The system will be deployable for productive use by a single library, but in principle it will also scale to use in a network.

Official website: https://locdb.bib.uni-mannheim.de/

MOVING - Training Towards a Society of Data-savvy Information Professionals to Enable Open Leadership Innovation

Duration: April 2016 - March 2019

Project Members: Till Blume, Ahmed Saleh, Ansgar Scherp

Funding: EU H2020 INSO-4-2015 - No 693092

Description: MOVING is an innovative training platform that enables users from all societal sectors to fundamentally improve their information literacy by learning how to choose, use, reflect on, and evaluate data mining methods in connection with their daily research tasks and to become data-savvy information professionals. The platform provides users with technical support as well as social advice and learning input to organise, filter and exploit information in a more efficient and sustainable way. Thus, we tackle the core challenge of the knowledge society: managing large amounts of information in a professional way. The ability to understand, use and develop data mining strategies will become a basic cultural technique. In fact, information management is one of the basic competences today. The open innovation training platform MOVING is both a working environment for the quality analysis of large data collections with data mining methods and a training environment with information, learning and exchange offers for digital information management. This connection of technical application and curriculum overcomes any artificial distinction between training and practice. The MOVING platform provides beyond state-of-the-art semantic search and analysis of large data sets. It makes its own functioning understandable to the users and offers individually configurable training programmes and guidance based on a proven qualification concept. The MOVING platform will be implemented in two use cases: EY provides the use case of compliance officers with its worldwide operating public administrators. TUD provides a use case on educating young researchers on how to apply and interpret data-intensive research tasks.

Official website: http://www.moving-project.eu/

SGD4ML - Approximating Multi-label Classification with Stochastic Gradient Descent

Duration: 2016/4-2016/7

Project Members: Florian Mai, Ansgar Scherp

Funding: CAU Kiel

Description: Multi-label classification is the machine learning task of assigning a set of k labels to an item d. A particular challenge is that the k labels are to be chosen from a set of n labels, where k is typically two or more orders of magnitude smaller than n. In the past, we have developed various techniques for addressing this task. However, with the increasing size of document corpora, expensive classification techniques require approximations. In this work, we systematically investigate the usefulness of Stochastic Gradient Descent (SGD) as a promising technique for approximating the classification task on very large real-world data sets. The results of this research will provide important insights to help the development of future scalable methods for large-scale document analysis. We collaborate with Prof. Lee Giles from the Pennsylvania State University. Lee is principal investigator and director of the scientific search engine CiteSeerX.
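
To make the task concrete, the following is a minimal sketch of multi-label classification trained with SGD. It is not the project's method: it assumes a one-vs-rest logistic regression model (one weight vector per label), a toy data set, and hypothetical names (train_sgd, predict_top_k) and hyperparameters chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x):
    # Linear score of one label for feature vector x.
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_sgd(docs, n_features, n_labels, epochs=200, lr=0.5, seed=0):
    """Train one logistic model per label with plain SGD.

    docs is a list of (feature_vector, set_of_label_ids) pairs.
    epochs, lr and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_labels)]
    b = [0.0] * n_labels
    order = list(range(len(docs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit documents in random order
        for i in order:
            x, labels = docs[i]
            for j in range(n_labels):
                # Gradient of the log loss w.r.t. the score of label j.
                g = sigmoid(score(W[j], b[j], x)) - (1.0 if j in labels else 0.0)
                for f in range(n_features):
                    W[j][f] -= lr * g * x[f]
                b[j] -= lr * g
    return W, b


def predict_top_k(W, b, x, k):
    # Return the ids of the k highest-scoring labels, sorted ascending.
    scores = [score(W[j], b[j], x) for j in range(len(W))]
    best = sorted(range(len(W)), key=lambda j: -scores[j])[:k]
    return sorted(best)


# Toy corpus (assumption): label j applies exactly when feature j is set.
docs = [
    ([1.0, 0.0, 0.0], {0}),
    ([0.0, 1.0, 0.0], {1}),
    ([0.0, 0.0, 1.0], {2}),
    ([1.0, 1.0, 0.0], {0, 1}),
    ([0.0, 1.0, 1.0], {1, 2}),
    ([1.0, 0.0, 1.0], {0, 2}),
]
W, b = train_sgd(docs, n_features=3, n_labels=3)
print(predict_top_k(W, b, [1.0, 0.0, 1.0], k=2))
```

Each update touches only one document at a time, which is what makes SGD attractive when the corpus is too large for batch training. With n in the thousands, the inner loop over all labels would itself need approximation, which this sketch does not attempt.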

SurveyCAU - Methods of Empirical Social Research: Theory and Application with Digital Media

Duration: 2015/3-2015/12

Project Members: Sakura Yamamura, Kai-Phillip Otte, Thomas Slawig, Ansgar Scherp

Funding: CAU Kiel (through PerLe/BMBF)

Description: The project aims to improve knowledge transfer through blended learning that combines the on-campus programme with off-campus e-learning tools. To this end, the university's e-learning platform OLAT with the iSpring QuizMaker is extended with more advanced features for exams. Furthermore, a WebApp is developed in an interdisciplinary collaboration between economic geography, psychology, and computer science that allows users to conduct social science surveys on mobile devices (tablets, smartphones). The WebApp enables students to create their own surveys to acquire quantitative data and to acquire skills through digital media. The Knowledge Discovery group provides its expertise in quantitative methods for experiments involving human subjects as well as in developing mobile applications.

Official website: http://www.surveycau.uni-kiel.de/

TempoDeg - Cross-system User Interests Modeling from Social Media Sources Considering Temporal Degradation

Duration: April 2014 - March 2017

Project Members: Chifumi Nishioka, Ansgar Scherp

Funding: DAAD, Leibniz Association

Description: User-adaptation and personalization are vital issues due to an increasing information overload on the Internet. User profiling is an indispensable task in order to achieve personalization. In this research project, we investigate how to extract user interests from different social media sources. To this end, we apply the user profile to make recommendations. User profiling from social media is promising, since many users disseminate their thoughts and ideas on social media platforms on a daily basis. Further, cross-system user profiling, which derives user interests from several sources, is beneficial, because users reveal different facets in different social media platforms, web communities, and other data sources. In this work, a user profile is a set of user interests and each user interest has a weight which represents how important it is among the whole user profile. A simple way to compute a weight of a user interest is counting the number of occurrences (e.g., a weight of the user interest “tennis”, which appears three times in a user’s social media items, is three). So far, no study has taken into account the temporal decay of user interests. The temporal decay is the notion that a weight (importance) of information decreases gradually as time passes. This research project aims at constructing a user profile from multiple social media sources, which takes into account the temporal decay of information. The proposed user profile is evaluated in the context of a recommender system.
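
The difference between count-based weights and temporally decayed weights can be sketched as follows. The description does not state which decay function the project uses; this sketch assumes exponential decay with a hypothetical half-life parameter (half_life_days), and the function name build_profile and the sample items are illustrative only.

```python
from collections import defaultdict


def build_profile(items, now, half_life_days=None):
    """Build a user profile as a mapping from interest to weight.

    items is an iterable of (interest, day) pairs, where day is the day
    on which the interest occurred in a social media item.
    Without half_life_days, every occurrence counts 1 (plain counting).
    With half_life_days, an occurrence loses half of its weight every
    half_life_days days (assumed exponential decay).
    """
    profile = defaultdict(float)
    for interest, day in items:
        if half_life_days is None:
            weight = 1.0
        else:
            age = now - day
            weight = 0.5 ** (age / half_life_days)
        profile[interest] += weight
    return dict(profile)


# Hypothetical items: "tennis" appears three times, but mostly long ago;
# "jazz" appears only twice, but very recently.
items = [
    ("tennis", 0),
    ("tennis", 30),
    ("tennis", 60),
    ("jazz", 58),
    ("jazz", 60),
]

counts = build_profile(items, now=60)
decayed = build_profile(items, now=60, half_life_days=30)

print(counts)   # {'tennis': 3.0, 'jazz': 2.0}
print(decayed["tennis"])  # 1.75
print(decayed["jazz"] > decayed["tennis"])  # True
```

Under plain counting, "tennis" is the strongest interest. Once decay is applied, the two recent "jazz" items outweigh the three older "tennis" items, which is the effect the temporal decay is meant to capture.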

Multi-class Labeling Approaches for Automated Subject Indexing of Scientific Documents

Duration: 2014-2017

Project Members: Tobias Rebholz, Alexander Prange, Steffen Goos, Karin Wortmann, Ansgar Scherp

Funding: Leibniz Association

Description: In contrast to the notion of indexing in the database community, subject indexing in library sciences refers to the task of selecting multiple labels for the classification of documents such as scientific publications. In the past, the subject indexing of scientific publications has been conducted by library scientists of ZBW for more than 1.6 million economic documents using the ZBW's economics thesaurus STW. On average, each of the 1.6 million scientific publications has been annotated with five STW descriptors. The increasing number of publications in the scientific disciplines and the digitization of science also change the subject indexing of scientific publications. In order to ensure a careful selection of descriptors and high-quality subject indexing at maximal coverage of the published scientific works, automated techniques are developed to support the intellectual subject indexing. Thus, we develop novel machine learning methods for multi-class labeling of scientific documents and empirically compare their performance. While a single method is unlikely to solve the problem, we aim at developing a framework and creating ensemble methods that combine the results from different methods. The performance of the labeling methods is evaluated using a data corpus of 62,000 documents of economics literature and 28,000 documents of political literature with gold standard annotations from manual subject indexing. The techniques will be implemented in a web-based review system that provides insights into the automatically suggested subject indexing and allows manual quality control.

Past Projects

SeMuDocs - Semantic Annotation, Indexing, and Search of Multimedia Documents

Duration: 2013-2015

Project Members: Lydia Weiland, Ansgar Scherp

Funding: Ministry for Science, Research, and Arts, Baden-Wuerttemberg, Germany

Description: There has been enormous progress regarding media search of a single media type such as images and videos in the past. In contrast, search for multimedia documents such as PowerPoint presentations, Flash documents, and Adobe's Edge documents (HTML 5) is still very limited or not supported at all by the related work. Multimedia documents are composed of media objects like images, videos, audio, and text objects, which are coherently organized in time, space, and interaction. The information about the structure of multimedia documents, however, is not used for indexing and retrieving the content. Using it is desirable to improve the visibility of the multimedia documents in the repositories and to ease reuse of the multimedia documents and the media objects they contain.
The goal of the proposed research project is to improve the search of multimedia documents. To this end, a formal object model and query language for multimedia documents will be developed. The object model allows for representing the arrangement of the media objects in time, space, and interaction and making it available for retrieval tasks. By this, complex queries against the multimedia document repository can be stated that make use of the relations defined between the media objects. In addition, new methods for the semantic enrichment of multimedia documents will be developed. By semantic enrichment, one can derive, e.g., that a text object is a title of a multimedia document or a caption of an image. In combination with existing text classifiers, it is possible to state queries regarding semantic concepts against the multimedia document repository. For example, one can search for topics in domains like sports and entertainment but also look for entities like places, organizations, and persons.
A prototypical implementation of the multimedia search will be evaluated with participants of different age groups and background (including pupils). It allows for deriving new insights into how multimedia document search is used, which was not possible before. As such, the technical as well as empirical results of the proposed research project will significantly influence the development of future multimedia information systems.

SMILER - Social Mobile Location and Event Finder

Duration: 2013-2014

Project Members: Jun.-Prof. Dr. Ansgar Scherp, Lydia Weiland, Dhaval Ranjane, Julian Abe, Julian Seitner, Linh Hoang, Martina Dukadinova, Ramana Dasari, Sven Rullmann

Industry cooperation: Telegate Media AG, Germany

Description: This project considers the question of how a mobile application like mobEx (see project description below) can be designed such that it allows for the collaborative creation, modification, and sharing of places and events on the social web. A particular challenge here is that the mobile application SMILER does not rely on a single data source in the backend system. Rather, a set of different data sources is employed that are curated by different organizations and offer different types of data as well as functionalities. This set of data sources is not assumed to be static. Rather, we also address the challenge of what happens when new data sources appear or existing ones are discontinued.

mobEx - Mobile location and event finder

Duration: 2013

Project Members: Jun.-Prof. Dr. Ansgar Scherp, Michael Jess, Christian Bikar, Florian Knip, Bernd Opitz, Bernd Pfister, Timo Sztyler

Industry cooperation: Telegate Media AG, Germany

Description: Existing mobile applications do not allow for efficiently and intuitively exploring events such as concerts, weekly markets, opening hours, etc. and at the same time exploring places such as sights, restaurants, organizations, and persons. Examples are the Christmas market in Mannheim or the opening hours of the bakery "around the corner", where you want to buy some bread rolls on Sunday. Such information is available on the Web from open APIs like that of Telegate AG or from directories for events such as Eventful, OpenPOI, Last.fm, Qype, and Flickr. The challenge addressed in the mobEx project is to integrate in real time the data retrieved from the different sources of social media data and make it available to the mobile user. To this end, we develop an efficient matching approach specialized for dealing with spatio-temporal data in the social media context such as events, persons, organizations, and places and the digital media associated with them.

You can try mobEx on your mobile phone. Simply install mobEx from the Google Play store.

SocialSensor - Sensing User Generated Input for Improved Media Discovery and Experience

Duration: 2011-2014

Funding: Integrated Project, European Union

Description: The goal of the SocialSensor project is to analyze user-generated content and user interactions on social networking platforms by means of data mining and aggregation. From this analysis, new information and recommendations for the users will be derived. To this end, the SocialSensor framework will be developed that allows for indexing and searching textual and in particular non-textual multimedia content from the social web in almost real time. Information about the interaction behavior and the activities of the users on the social networking platforms is directly integrated and used in the multimedia analysis and multimedia search tasks. For example, user contributions in the form of comments and ratings will be analyzed in order to derive trends and detect events. This information will be used together with the social network relationships to provide recommendations to other users. Novel user interfaces will be developed following the user-centric approach in order to better visualize and explore the social media.

Involvement: Scientific leader for the workpackages and person months of the University of Koblenz-Landau

Linked Networked Graphs - Semantic technologies for the next-generation media management solutions

Duration: 2010-2012

Project Members: Raphael Feld, Dr. Thomas Franz, Carsten Saathoff, Simon Schenk, Ansgar Scherp

Funding: EXIST research transfer, BMWI

Description: The goal of the project Linked Networked Graphs is the development of an infrastructure for semantic media management. To this end, the different processes for media management such as acquisition, distributed storage and search are supported. In contrast to existing solutions, ontologies and background knowledge in the form of Linked Data are used. The EXIST research transfer has the goal of founding a start-up company. The start-up company is called Kreuzverweis Solutions GmbH. The founding members are three former scientific assistants of the university, T. Franz, C. Saathoff, and S. Schenk, an economist, R. Feld, and Mr. Scherp.

EventMedia IRP

Duration: 2010-2011

Project Partners: CWI, Amsterdam, Netherlands and Eurecom, Sophia Antipolis, France

Funding: Network of Excellence, European Union

Description: The goal of the EventMedia IRP project of the Network of Excellence PetaMedia is the exploration, extraction and storage of events. Events are occurrences in the real world in which humans participate. Following a user-centric approach, the semantic storage infrastructure SemaPlorer, winner of the Billion Triples Challenge 2008 and developed at the University of Koblenz, is combined with a novel user interface from CWI. Eurecom provides the integration of event-related information from various data sources.

Involvement: Project lead for the University of Koblenz

SemanticMM4U - Emergent Semantics in Personalized Multimedia Content

Duration: 2007-2008

Funding: Marie Curie Fellowship, European Union

Description: Deriving semantic information for multimedia presentations and the media assets used for the presentation during the authoring process of the presentation. Application of semantics derivation in the domain of authoring photo albums in cooperation with CeWe Color Digital, Oldenburg, Germany.
