|Main authors:||Marc Laurencelle, Nicolas Surdyk, Matjaž Glavan, Birgitte Hansen, Claudia Heidecke, Hyojin Kim, Susanne Klages|
|FAIRWAYiS Editor:||Jane Brandt|
|Source document:||»Laurencelle, M. et al 2021. (Short note for the) database containing harmonised datasets, 28 pp. FAIRWAY Project Deliverable 3.3|
Download the database
Indicators database: tabular data (public)
This excel database contains all "tabular" (non-GIS) data related to the 13 case studies that was gathered for the purposes of FAIRWAY's Monitoring & Indicators research theme. It is structured as one "data sheet" and one "summary sheet" per case study. (Note: this is a PUBLIC VERSION: restricted (confidential) data rows have been removed from the data sheets.)
Indicators database: GIS data
To complement the excel indicators database, this zipped folder contains all GIS data gathered for the FAIRWAY's Monitoring & Indicators research theme. The GIS files are grouped in subfolders, by case study, and then by keywords describing the nature of the spatial data.
|1. Database architecture|
|2. Overall structure of the Excel database|
|3. Organization of the GIS data files|
|4. Linking between the Excel database and the GIS data|
A simple architecture was chosen for the database:
- An Excel file consisting of one sheet of data and one sheet of summary per case study.
- A zipped folder containing GIS files from all case studies.
The choice of developing an “Excel database”, rather than a more rigid-structured SQL relational database, was made soon after inspection of the datasets provided by Fairway case-study leaders, which revealed a very large disparity in the types and amounts of data provided between the case studies. That is indeed the main reason why it is considered more appropriate to present the data in separate data tables (Excel “sheets”) and separate subfolders of GIS data for each case study. Other reasons notably include the advantages that an Excel database is easier to disseminate, use, and modify locally for instance to add more recent monitoring data for a given case study. That being said, it remains possible to merge all data sheets into one large table, thanks to the uniform scheme used for the main columns in all case study data sheets.
The Excel database contains all “tabular” (non-GIS) data related to the 13 case studies of Fairway that was gathered for the purposes of WP3. It is structured as one “data sheet” and one “summary sheet” per case study. Specific names are assigned to the sheets to indicate the case study they are related to:
- The “data sheets” are named as follows: CS_[case study number] [abbreviated country name](_[suffix]). Moreover, a few additional characters (suffix) are appended to the abbr. country name to better distinguish sites better when there are more than one CS for the same country. For instance, datasheet “CS_4 FR” contains the data from case study #4 from France (one site only), while datasheets “CS_1 DK_it” and “CS_2 DK_a” respectively contain the data from the “Island Tunoe” and “Aalborg” case studies in Denmark.
- The “summary sheets”, which list the available parameters that are available for a given CS, are named simply by inserting “list_” before the name of its corresponding datasheet. For instance, the summary sheet “list_CS_1 DK_IT” lists the parameters that are available in the datasheet “CS_1 DK_IT” for the “Aalborg” case study from Denmark.
Figure 1 shows screenshots of the tabs for the first (a) and last (b) summary and data worksheets of the Excel database.
In parallel to the Excel database, a zipped folder contains all “GIS” data gathered for FAIRWAY. The zip file must first be unzipped (decompressed) to a local folder or server before use, in most cases. The GIS files are grouped in subfolders, by case study, and then by keywords describing the nature of the spatial data. For instance, a GIS file delineating river segments within the area of interest of Derg Catchment, UKis stored in a subfolder “river_segments” placed in the case study’s subfolder “CS07_UK_DergCatchment” (Figure 2). Moreover, basic QGIS project files (.qgz) offer an easy and straightforward way to explore the GIS data from each case study. QGIS is a free and open source geographic information system. (For more information see https://qgis.org/)
The vast majority of the tabular data in the Excel database properly refers to a geographical object. The information required to determine which spatial feature a tabular data belongs to is provided in the Excel database and consists of:
(In the case study’s datasheet)
- The text value in the field “Sampling_Point_ID”, which gives the unique identifier to the geographical object that was sampled (or characterized by other means) to produce the value that is reported in the “Results” field (of a given row);
(In the case study’s summary sheet)
- The “GIS_Related_Files” column, which gives the relative path and name of the GIS file containing spatial features related to the parameters, and;
- The “GIS_Key_Field” column, which specifies the name of the field of the GIS file’s attribute table that should contain the text IDs, for a given parameter;
- (And the optional “GIS_Key_Value” column, which is also required in some cases to target a specific spatial feature of the GIS file.)
The steps one needs to follow to trace back the link from a “Results” tabular data to its corresponding geographical object consist of (Figure 3):
- Finding the row describing the parameter in the Summary sheet;
- Reading the “GIS_Related_Files” and “GIS_Key_Field” column values of that row;
- Opening the GIS file(s) corresponding to [“GIS_Related_Files”]…;
- Finding the spatial feature in the opened GIS file(s), where the attribute [“GIS_Key_Field”] is equal to [“Sampling_Point_ID”]. That can be done in several ways depending on the GIS software that is used. In a manual approach, it can be carried out using selection/filtering tools with the attribute table of the opened GIS file visible on screen;
- Viewing the result of the feature selection/filtering on a map (if relevant).
- once the linking process has been explored and tested for the parameters of interest, it can be automated quite easily in a custom computer program. For instance, if one wants to relate all tabular data of a given case study to their corresponding geographical objects (from different GIS files). In this process, links will be made between the content of the Excel sheets of the case study of interest and the attribute tables of the relevant GIS files. Only in the end, the spatial features themselves are considered, for instance, to get their X and Y coordinates and/or calculated areas (in case of polygon features).
- linking the data of interest to their corresponding geographical objects is not always necessary, in practical contexts. It is especially the case when catchment-scale data is considered. Generally, one may select and extract the tabular data from a couple of related parameters in the Excel database, calculate general statistics, and then perform some analyses (such as trend analysis, cross correlation analysis, etc.) without having to worry about the actual location of the sampling sites. Nevertheless, even in such cases, one should still ensure that the data selected for those analyses are consistent in terms of sampling sites, which can be done by carefully checking the “Sampling_Point_ID” identifiers for the selected rows.
Note: For full references to papers quoted in this article see