Semi-automated data enrichment and data table extraction from non-trivial document formats
Below is a list of past presentations where you can learn more about the DS4DM project.
Project overview presentation (delivered at the 2nd Smart Data Innovation Conference, Karlsruhe, Germany, 2017)
Project poster (presented at the KMU Conference, Berlin, 2018)
Project poster (presented at the GOR Conference, Berlin, 2017)
Project Agenda for IDS Expo
DS4DM was a research project sponsored by the German government. It was completed in July 2018.
For the IDS Expo, our aim is to share some of our past presentations on the work that resulted in novel extensions for the RapidMiner platform.
Short Description of the Project
The following is a short description of the DS4DM project.
The research project DS4DM (Data Search for Data Mining) was sponsored by the German Ministry of Education and Research. The project ran from August 2015 until July 2018. The project partners were RapidMiner GmbH and the University of Mannheim (Data and Web Science Group). The project developed novel methods for enriching data automatically and semi-automatically, and for extracting data tables from non-trivial document formats. These goals were reached with varying degrees of generalization. The outcomes of the project include the following RapidMiner extensions:
- Data Search for Data Mining extension
- Web Table Extraction extension
- PDF Table Extraction extension
- Spreadsheet Table Extraction extension
- SharePoint Connector extension
Additionally, a web connector for Informatica Cloud was developed to access RapidMiner processes as web services from within Informatica mappings.
More information on the project is available on its official website: http://ds4dm.de
Links to Project Resources
Some past publications from DS4DM are listed below for your reference.
Past Publications, Blogs, and Articles
Data Search for Data Mining extension to enrich a data table using a collection of relevant tables
Web Table Extraction extension to easily extract web tables in RapidMiner
PDF Table Extraction extension to easily extract data tables from PDF documents
Spreadsheet Table Extraction extension to easily extract data tables from online spreadsheets