Detailed structure of the database

Main authors:	Marc Laurencelle, Nicolas Surdyk, Matjaž Glavan, Birgitte Hansen, Claudia Heidecke, Hyojin Kim, Susanne Klages
FAIRWAYiS Editor:	Jane Brandt
Source document:	»Laurencelle, M. et al 2021. (Short note for the) database containing harmonised datasets, 28 pp. FAIRWAY Project Deliverable 3.3

In this article we define all fields that are present in the Excel database (and which can be downloaded from »Indicator database architecture).

Contents table
1. Datasheets: detailed description of the fields (data columns)
2. Summary sheets

1. Datasheets: detailed description of the fields (data columns)

The data sheets of the Excel database use a uniform scheme in terms of field (column) names and order. This data scheme is documented in the following two tables, which describe the mandatory fields (Table 2), and the optional fields (Table 3) added to the right end of a few CS datasheets. Note that mandatory fields marked by an asterisk must always be filled with a value and that the optional fields (columns) are always placed after the mandatory ones, so that it remains easy to merge several data sheets when necessary..

Table 2: Description of the mandatory fields in all data sheets of the Excel database. Fields are presented in the order they must follow. The asterisk put on a field name indicates that this field must always have a value (cannot be blank). The leftmost column of the table gives the Excel column letter (index) for that field in an Excel worksheet environment.

	Field name	Description
A	Case_Study*	Number-based identifier of the case study Example: “CS_1” for case study n°1 Note: This field would be very useful to one who wants to prepare a large table with all data from the Excel database by merging all CS datasheets.
B	Date_Format*	Symbolic format of the “Date” values: several formats are allowed including: “yyyy”: year only (yearly data), “yyyy-mm”: year and month (monthly data), “yyyy-mm-dd”: full date (daily data), “yyyy-mm-dd HH:MM”: full date & time (instant data), “mm”: (average monthly data: e.g., 30-year climatic average monthly precipitation data) Note: The “Date_Format” must be uniform for a given parameter / sub-parameter of a CS.
C	Data_Type*	Tells if the reported “Results” value is of a “numerical” or “categorical” (any text) type This greatly facilitates the reading and processing of the “Results” field values. Note: The “Data_Type” must be uniform for a given parameter / sub-parameter of a CS.
D	Date*	Time (instant time or period of time) at/during which the reported “Results” value is deemed representative The “Date” value must strictly respect the associated “Date_Format”.
E	Time_Scale*	The representative time scale for the reported value. Can be: “independent” (~assumed constant), “year” (or “multiyear” if the value applies during several consecutive years), “month”, “week”, “day”, or “instant”. To better understand: an “instant” time scale is for a measurement representing an instantaneous state (typically observed at a precise date & time), whereas a “day” time scale is for a result value that is related to a full day (e.g., daily precipitation in mm). Non-instant time scales are most often used for parameter values that are totals, averages, or temporary constants (e.g., the main crop of a field plot for a given year).
F	Sampling_Point_ID*	Identifier of the sampling site location, or more precisely: unique identifier to the geographical object that was sampled to produce the value reported in the “Results” field. The same ID can be found in the corresponding GIS file (in general). Note: The ID for catchment-scale results can be either “Catchment”, the full name of the catchment or study area, or some other variant.
G	Spatial_Scale*	Representative spatial scale for the reported value. Can be: “catchment”, “sub-catchment”, “local…” (plot, piezometer, groundwater well, spring, along with/near stream…), “national”, “international”, etc. International or national scales are for some reference values (e.g., of Crop yields).
H	Parameter_Group*	Group (family) of parameters, which is mostly there to facilitate exploration of a CS’ data or summary sheets Possible values: “Site data”, “Quality data from farmers”, “Simple quantit farm data”, “Compound or calculated quantit farm data”, “State indicator”, “Link indicator”. (Note: “quantit” means “quantitative”.)
I	Parameter_Name*	Parameter, which sometimes also acts as a group subdivided into several sub-parameters (see below) Note: Some parameter names are implicitly defining subgroups by using a colon (“:”) to separate the main name (left part) and the subgroup names (right part), in special cases where the “Subparameter_Name” field is already used.
J	Subparameter_Name (can be blank)	Subparameter: Most often “” (blank) whenever a parameter name suffices. A Subparameter can either be a subdivision of a Parameter, or provide additional information on the Parameter. Note: Some sub-parameter names implicitly define subgroups by using a colon (“:”) or a comma (“,”) to separate a group name (left part) and its subgroup names (right part), or to provide additional information.
K	CAS_Number (can be blank)	CAS Registry Number ideally provided for all non-element chemical substances. The CAS number may prevent some confusion when reading substance names. Search for chemicals using EACH s. engine: https://echa.europa.eu/search-for-chemicals
L	Results*	Measured, reported, or calculated value for the Parameter (and Subparameter) at the given location (Sampling_Point_ID) and time (Date) Value: number or text (consistently with the specified “Data_Type”)
M	Unit*	Measurement unit for the reported “Results” value. Mandatory. Specified as simple text avoiding special characters (e.g., “μ” →”u”). Examples: “%”, “ug/L”, “boolean”, “kg N/ha”, “mg/L”, “m³/s”, “uS/cm”, “text”, “ha” …
N	Below_LQ (can be blank)	Boolean (yes/no) values telling whether the reported “Results” value is below the Limit of Quantification. This column contains only the reported information because it is difficult and risky to assume something missing. Therefore, this field is blank most of the time. Possible values: “Y”, “N” or “” (unknown)
O	LQ_Value (can be blank)	Limit of Quantification value: rarely reported in most of the CS. No attempt was made to fill the blanks (for the same reasons as for the “Below_LQ” field). When it is reported, it is expressed in the measurement unit (“Unit”).
P	Origin (should not be blank)	Information about where the data comes from: “as reported” (in most cases) means that the data in the database comes from the datasets initially provided by the CS leaders during the surveys or first effective data exchanges; “added in 20XX…” indicates additional data that was added to the database later on; “reformatted from reported” indicates data that needed major changes in terms of formatting; “translated from reported” indicates data that is not expressed in the language originally used. Note: The “Origin” field was used quite freely to keep a trace of the changes made in the database. It is for information only.
Q	Analytical_Method (can be blank)	Analytical method or instrument used for the measurement. This information, primarily used for chemical measurements, is rarely reported. Accordingly, the field was also used to store various other information on the procedures used to prepare/obtain some of the calculated values (e.g., “Calculated with ArcGIS based on…”) or on the detailed source of the data.
R	Top_of_screen_Depth	(can be blank) These 3 fields contain additional information about the precise Z location of the sampling: The top-of-screen depth usually gives this information. The bottom-of-screen depth is used only to give this specific information. The approximate water-table depth, used only in CS 6 of Greece, was otherwise not used . The “…screen…” fields were used to store screen-related information in CS 1 and 2 of Denmark only. In CS 10 of Norway, the “Top_of_screen_Depth” field was rather used to store information about the sampling depth in a lake or river (m below the water surface). The 3 fields are completely blank in 9 of the 13 case studies. Unit: meters below ground surface (general), or meters below the water surface (in CS 10) Note: The fact that “approximate water-table depth” information was almost never provided by the CS leaders most likely means that such information was not readily available to them in any of the datasets they had access to. We think that in many cases this additional information did not already exist in the monitoring data sets, so that obtaining this information would have required a lot of extra work. In other words, this blank field tells us that this information is virtually never stored alongside the water-quality monitoring data, in practice.
S	Bottom_of_screen_Depth
T	Approx_WaterTable_Depth
U	Place (can be blank)	U Indicates where the analysis was carried out (mainly for chemical substances), or gives another name for the sampling site, as the place of analysis was very rarely reported. Typical values: “Field”, “Lab”, or “Unknown”.
V	Confidentiality	Level of confidentiality of the reported “Results” data. Possible values: “public” or “restricted”. Only the “public” data can be made publicly available online. The “restricted” data should not be shared.

Table 3: Description of the optional fields that may be present to the right of the mandatory fields (columns) in some of the datasheets of the Excel database. These fields provide extra information that may be useful in specific situations.

Optional field name	Description
(…) Sampling_Point_NAME (optional field)	This optional field is sometimes used in some CS data sheets to store the long name of the sampling site when the “Sampling_Point_ID” value is a coded identifier. It plays a role similar to the “Place” field described above. For instance: In CS 10 of Norway, “003-38229” is the coded identifier (in “Sampling_Point_ID”) corresponding to the long name “Saebyvannet” (stored in the “Sampling_Point_NAME” field)
(…) EIONET_Source_URL	This optional field is used in the CS data sheets where public data from the European database EIONET has been imported. It gives a link (URL) to the web page that was accessed in order to download the data.
(…) Orig_Parameter_Name (…) Orig_Sampling_Point_ID (…) Orig_Results (…) Orig_Analytical_Method (optional fields)	Extra fields added to a CS datasheet to facilitate backward linking with the previous (more raw) versions of the dataset. They are used notably when numerous texts had to be translated to English, when parameter names changed profoundly in terms of format or language, or when some Sampling_Point_ID had to be corrected for consistency reasons.

2. Summary sheets

The summary sheets in the Excel database provide an overall picture of the content of each CS datasheet. The different parameters present in the datasheet are listed, along with the measurement units, scales of observation, and data sources. Some basic statistics including the number of reported results and different sampling sites, and the minimum & maximum dates of observation, are given for each parameter. The GIS-related information required to link the tabular data to the GIS data is also provided in two complementary columns (Table 4). An illustration of the linking process between a parameter of a CS in the Excel database and the GIS data is presented again in (Figure 3).

Table 4: Description of the columns providing GIS related information in a summary sheet of the Excel database

Optional field name	Description
GIS_Related_Files	This column gives the filename (and relative path when relevant) of the GIS file containing the spatial features related to the parameter. If no GIS data is linked to the parameter, GIS_Related_Files = “NONE”. Note: If there is more than one GIS file linked to the parameter, the multiple filenames listed in the column are separated by a vertical bar symbol (“ \| ”).
GIS_Key_Field	This column specifies the name of the field of the GIS file’s attribute table that should contain the IDs that are present in the datasheet for that parameter.
GIS_Key_Value	This column specifies the ID of a specific spatial feature of the GIS file. Not needed in most cases.

D3.3 fig03
Figure 3

Note: For full references to papers quoted in this article see

» References

Go To Top