Data Search for Data Mining

Semi-automated data enrichment and data table extraction from non-trivial document formats


Below is a list of past presentations where you can learn more about the DS4DM project.

  • Project overview presentation (delivered at the 2nd Smart Data Innovation Conference, Karlsruhe, Germany, 2017)

    • Download the presentation: here

  • Project poster (presented at KMU Conference, Berlin, 2018)

    • Download the poster: here

  • Project poster (presented at GOR Conference, Berlin, 2017)

    • Download the poster: here

Project Website

Project Agenda for IDS Expo

DS4DM is a former project sponsored by the German government. It was completed in July 2018.

For the IDS Expo, our aim is to share some of the past presentations on this project, which resulted in novel extensions for the RapidMiner platform.

Short Description of Project

The following is a short description of the DS4DM project.


The research project DS4DM (Data Search for Data Mining) was sponsored by the German Federal Ministry of Education and Research. The project ran from August 2015 to July 2018. The project partners were RapidMiner GmbH and the University of Mannheim (Data and Web Science Group). The project developed novel methods for enriching data in an automatic or semi-automatic manner and for extracting data tables from non-trivial document formats. These challenging goals were reached with varying degrees of generality. The outcome of the project included the following RapidMiner extensions:

  • Data Search for Data Mining extension
  • Web Table Extraction extension
  • PDF Table Extraction extension
  • Spreadsheet Table Extraction extension
  • SharePoint Connector extension

Additionally, a web connector for the Informatica Cloud was developed so that RapidMiner processes can be accessed as web services from within Informatica mappings.

More information on the project is available at its official website located at:

Some past publications from DS4DM are listed below for your reference.