
BAW-Datenrepository starts productive operations

BAW-Datenrepository home page

Bundesanstalt für Wasserbau (BAW) supports the principles of Open Access by providing open access to the text publications and scientific data produced by its employees. As part of this initiative, scientific papers by BAW staff members have been available via the Hydraulic ENgineering RepositorY (Henry) since 2017. To extend this scope beyond text publications to BAW's scientific data, a second online platform, the BAW-Datenrepository, has now been made available.

The primary goal of the BAW-Datenrepository is to provide open and unrestricted access to the scientific data produced at BAW. In doing so, BAW seeks to give the general public insight into its role as an advisor to the federal government on hydraulic engineering issues. Another important purpose is to enable the scientific community to reuse the published data in their own work and to add value to it. This in turn improves the visibility of BAW's scientific activities and gives recognition to the employees involved in creating the data published on BAW-Datenrepository. Finally, the BAW-Datenrepository is one of the platforms through which BAW fulfils its legal obligation to provide open access to its data, as stipulated by laws such as the E-Government Act.

In the case of the BAW-Datenrepository, BAW is not just the curator of the data but also its publisher. The data publication process takes place entirely in-house, and the published data and metadata are hosted on our own IT infrastructure.

A hybrid multipurpose data platform

BAW-Datenrepository has been built using the open source software InGrid. All published datasets are described using metadata conforming to national and international metadata standards such as GDI-DE and INSPIRE. As a result, external geoinformation platforms such as Geoportal.de and mCLOUD can harvest this metadata and make it searchable on their own portals. They can in turn pass this metadata on to other platforms such as the German GovData portal and the INSPIRE Geoportal, further improving the visibility of the data published through the BAW-Datenrepository.
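The article does not say which protocol Geoportal.de or mCLOUD use to harvest the metadata. Assuming the catalogue is exposed through an OGC Catalogue Service for the Web (CSW) interface, a common choice for GDI-DE and INSPIRE discovery services, a harvester could query it with a CSW 2.0.2 GetRecords request such as the one built below. The endpoint URL is a placeholder, and the existence of such an interface on BAW-Datenrepository is an assumption, not something stated in the source.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the real catalogue address of
# BAW-Datenrepository is not given in the article.
CSW_ENDPOINT = "https://example.org/csw"


def build_getrecords_url(endpoint: str, search_text: str, max_records: int = 10) -> str:
    """Build an OGC CSW 2.0.2 GetRecords request in KVP encoding that asks
    for summary records whose full text matches search_text."""
    # Double any single quotes so the value stays a valid CQL string literal.
    escaped = search_text.replace("'", "''")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{escaped}%'",
    }
    # urlencode percent-encodes quotes and '%' and turns spaces into '+'.
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_getrecords_url(CSW_ENDPOINT, "Elbe"))
```

A harvester would typically page through the results by adding a `startPosition` parameter; that loop is omitted to keep the sketch short.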

The metadata of the published data gives users of BAW-Datenrepository an initial insight into the data before they download and use it themselves. The metadata is also the basis of the text-based search for finding datasets relevant to one's own purposes. Metadata-based free-text search is essential for finding suitable data, since searching the contents of binary datasets stored in different formats is far from trivial. The decision to use InGrid for BAW-Datenrepository brings it in line with the systems being built for internal data management at BAW and is intended to streamline the publication of data from the internal infrastructure to the openly accessible BAW-Datenrepository.
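To illustrate why the search relies on metadata rather than on the data files themselves, here is a minimal sketch of free-text search over a few descriptive fields. The record structure and the example records are hypothetical and far simpler than InGrid's actual metadata model and search index; matching is plain case-insensitive substring search, where every query term must occur somewhere in the record.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, minimal subset of a dataset's descriptive metadata."""
    title: str
    abstract: str
    keywords: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Concatenate all text fields into one lower-cased string to search in.
        return " ".join([self.title, self.abstract, *self.keywords]).lower()


def free_text_search(records: list[MetadataRecord], query: str) -> list[MetadataRecord]:
    """Return the records whose metadata contains every term of the query."""
    terms = query.lower().split()
    return [r for r in records if all(t in r.searchable_text() for t in terms)]


if __name__ == "__main__":
    # Hypothetical example records, not real BAW datasets.
    catalogue = [
        MetadataRecord("Water levels Elbe estuary", "Tide gauge time series", ["tide"]),
        MetadataRecord("Bathymetry Rhine", "Echo sounding survey of the river bed", ["bathymetry"]),
    ]
    for hit in free_text_search(catalogue, "elbe tide"):
        print(hit.title)
```

The search never has to open a NetCDF file, a raster or any other binary format; it only needs the text that describes each dataset.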

Bibliographical information for a dataset with a DOI

For BAW as a government research institute, two customisations extending the standard InGrid functionality are particularly important: persistent read-only data storage and the ability to assign persistent identifiers in the form of Digital Object Identifiers (DOIs) to the published data. This brings BAW-Datenrepository in line with data publication platforms such as Zenodo, which offer similar functionality. Because DOIs are widely used in the scientific community, they make it easy for other scientists and engineers to cite the data when they use it in their own work. This is especially important for young scientists in the early stages of their careers, as it gives them recognition for their contributions to creating the data. Additionally, the combination of persistent read-only storage and persistent identifiers ensures that the published data remains available long-term without any changes. This is vital for the reproducibility of scientific works that use data published on BAW-Datenrepository.
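As a rough illustration of how a DOI turns dataset metadata into a stable citation, the sketch below formats a citation string in the general style commonly used for research data (creators, year, title, publisher, resolvable DOI link). The exact citation format shown on BAW-Datenrepository is not specified here; the field set, the author names and the DOI `10.5072/example` are hypothetical. The metadata class is frozen, loosely mirroring the idea that a published dataset does not change after publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DatasetMetadata:
    """Hypothetical subset of the metadata needed to cite a dataset."""
    creators: tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(md: DatasetMetadata) -> str:
    """Return 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    creators = "; ".join(md.creators)
    return f"{creators} ({md.year}): {md.title}. {md.publisher}. https://doi.org/{md.doi}"


if __name__ == "__main__":
    example = DatasetMetadata(
        creators=("Mustermann, Max", "Musterfrau, Erika"),
        year=2020,
        title="Example water level dataset",
        publisher="Bundesanstalt für Wasserbau",
        doi="10.5072/example",
    )
    print(format_citation(example))
```

Because the DOI resolves through https://doi.org/ independently of where the data is hosted, the citation stays valid even if the repository's own URLs change.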

Outlook

Bringing together functions for metadata standardisation and data curation has many advantages, such as those listed above. However, it also comes with its own challenges. One example is that the published metadata is not yet fully multilingual. This is a known issue, and the InGrid community is working to improve the situation, so that we can hopefully soon publish bilingual metadata in German and English, with the appropriate language displayed automatically based on the visitor's browser settings.

Currently, BAW-Datenrepository also doesn’t make use of the InGrid-Mapclient. In the future we plan to reactivate this feature, so that the geographic services published on the platform can be viewed directly within BAW-Datenrepository with a single click.