---------- User Guide ---------- ``hesseflux`` collects functions used for processing Eddy covariance data of the ICOS_ ecosystem site FR-Hes_. The post-processing functionality for Eddy flux data is similar to the R-package REddyProc_ and includes basically the steps described in `Papale et al. (Biogeosciences, 2006)`_ plus some extensions such as the daytime method of flux partitioning (`Lasslop et al., Global Change Biology 2010`_) and the estimation of uncertainties on the fluxes as in `Lasslop et al. (Biogeosci, 2008)`_. Only the post-processing steps are described here. We are happy to discuss any processing or post-processing directly. Contact us at mc (at) macu (dot) de. europe-fluxdata.eu file format ============================== The first processing steps at the ICOS ecosystem site FR-Hes (not shown) brings the data in a format that can be submitted to the database `europe-fluxdata.eu`_. The database predates ICOS and is somewhat a precursor of the ICOS data processing. The file format of `europe-fluxdata.eu` is hence very similar to the ICOS format. The only known difference to us is the unit of atmospheric pressure, which is in hPa in `europe-fluxdata.eu`_ and in kPa in `ICOS ecosystems`_. The file format has notably one header line with variable names. There are no units in the file. ``hesseflux`` provides a little helper script `europe-fluxdata_units.py` in the `bin` directory that adds a second header line with units. The script can be run on the output as: .. code-block:: bash python europe-fluxdata_units.py output_file_of_postproc_europe-fluxdata.csv Post-processing Eddy covariance data ==================================== The script `postproc_europe-fluxdata.py` in the `example` directory provides a template for post-processing data that is in the `europe-fluxdata.eu`_ file format. It basically makes all steps described in `Papale et al. (Biogeosciences, 2006)`_. The script is governed by a configuration file in Python's standard :mod:`configparser` format. The example configuration file `hesseflux_example.cfg` in the `example` directory is highly commented and should be (almost) self-explanatory. The script is called like: .. code-block:: bash python postproc_europe-fluxdata.py hesseflux_example.cfg This script should be taken as a template for one's own post-processing but includes most standard post-processing steps. Here we describe the main parts of the post-processing script. Reading the configuration file ------------------------------ The script `postproc_europe-fluxdata.py` starts by reading the configuration file `hesseflux_example.cfg`: .. code-block:: python import sys import configparser # Read config file configfile = sys.argv[1] config = configparser.ConfigParser(interpolation=None) config.read(configfile) It then analyses the configuration options. The first section in the configuration file are the options controlling which steps shall be performed by the script. The section in the `hesseflux_example.cfg` looks like: .. code-block:: python [POSTSWITCH] # spike detection (Papale et al., Biogeoci 2006) # bool outlier = True # ustar filtering (Papale et al., Biogeoci 2006) # bool ustar = True # flux partitioning (Reichstein et al., GCB 2005; Lasslop et al., GCB 2010) # bool partition = True # gap filling (Reichstein et al., GCB 2005) # bool fill = True # error estimate of Eddy fluxes (Lasslop et al., Biogeosci 2008) # bool fluxerr = False And the code in `postproc_europe-fluxdata.py` is: .. code-block:: python # program switches outlier = config['POSTSWITCH'].getboolean('outlier', True) ustar = config['POSTSWITCH'].getboolean('ustar', True) partition = config['POSTSWITCH'].getboolean('partition', True) fill = config['POSTSWITCH'].getboolean('fill', True) fluxerr = config['POSTSWITCH'].getboolean('fluxerr', True) All options are boolean and set to `True` by default if they are not given in the configuration file. All post-processing steps except uncertainty estimation of flux data would be performed in the given example. Read the data ------------- The script would then read in the data. The section in the configuration file is: .. code-block:: python [POSTIO] # can be comma separated list or single file # str inputfile = FR-Hes_europe-fluxdata_2016.txt # see strftime documentation of Python's datetime module # https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior # str timeformat = %Y%m%d%H%M # Delimiter to use with pandas.read_csv. # If None, Python’s builtin sniffer tool is used (slow) # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html # str sep = , # Line numbers to skip (0-indexed) or number of lines to skip (int) # at the start of the file. # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html # list-like, int skiprows = None # values being NaN and undef will be ignored # float undef = -9999. # threshold of shortwave radiation for determining day/night. # day is SW_IN > swthr # float swthr = 10. The analysis of the options in `postproc_europe-fluxdata.py` is: .. code-block:: python # input file inputfile = config['POSTIO'].get('inputfile', '') timeformat = config['POSTIO'].get('timeformat', '%Y%m%d%H%M') sep = config['POSTIO'].get('sep', ',') skiprows = config['POSTIO'].get('skiprows', None) undef = config['POSTIO'].getfloat('undef', -9999.) swthr = config['POSTIO'].getfloat('swthr', 10.) Note that strings are given without quotes in the configuration file. `inputfile` can be a single filename or a comma-separated list of filenames. If it is missing or empty, the script will try to open a GUI, where one can choose input files. The data will be appended if several input files are given. The (first) input file is read as: .. code-block:: python import pandas as pd infile = inputfile[0] df = pd.read_csv(infile, sep, skiprows=skiprows, parse_dates=[0], date_format=timeformat, index_col=0, header=0) :mod:`pandas` will use the first column as index (`index_col=0`), assuming that these are dates (`parse_dates=[0]`) in the format `timeformat`, where columns are separated by `sep`. The defaults follow the `europe-fluxdata.eu`_ format but similar formats may be used, and script and/or configuration file can be adapted easily. Only variable names (still) have to follow `europe-fluxdata.eu`_, `ICOS`_ or `Ameriflux`_ format at the moment. If the input file has a second header line with units, one can skip it giving `skiprows=[1]` (not `skiprows=1`). All input files are supposed to be in the same format if `inputfile` is a comma-separated list of filenames, and they will be read with the same command above. The :mod:`pandas` dataframes (`df`) will simply be appended. The flag dataframe ------------------ All Not-a-Number (NaN) values will be set to `undef` and will be ignored in the following. This happens via a second dataframe (`dff`), having the same columns and index as the input dataframe `df`, representing quality flags. All cells that have a value other than `0` in the flag dataframe `dff` will be ignored in the dataframe `df`. This means all cells of `df` with `undef` will be set to `2` in `dff` immediately: .. code-block:: python # NaN -> undef df.fillna(undef, inplace=True) # Flag dff = df.copy(deep=True).astype(int) dff[:] = 0 dff[df == undef] = 2 Day / night ----------- Most post-processing routines differentiate between daytime and nighttime data. `Papale et al. (Biogeosciences, 2006)`_ use a threshold of 20 W m\ :sup:`-2` of global radiation to distinguish between day and night. `REddyProc`_ uses incoming shortwave radiation greater than 10 W m\ :sup:`2` as daytime. The shortwave radiation threshold `swthr` (same name as in ReddyProc) can be used to define the appropriate threshold. The default is 10 W m\ :sup:`2`. The column `SW_IN_1_1_1` has to exist in the input data. .. code-block:: python # day / night isday = df['SW_IN_1_1_1'] > swthr Data check ---------- `postproc_europe-fluxdata.py` checks the units of air temperature (i.e. the first column starting with `TA_`). .. code-block:: python # Check Ta in Kelvin hta = ['TA_'] hout = _findfirststart(hta, df.columns) if df[hout[0]].max() < 100.: tkelvin = 273.15 else: tkelvin = 0. df.loc[dff[hout[0]]==0, hout[0]] += tkelvin :func:`_findfirststart(starts, names)` is a helper function that finds the first occurrence in `names` that starts with the string `starts`. This helper function is used for the moment until ``hesseflux`` has the functionality that the user can give individual variable names. The script calculates air vapour pressure deficit `VPD_PI_1_1_1` from air temperature and relative humidity (i.e. the first column starting with `RH_`) if not given in input data using the function :func:`esat` of `pyjams`_ for saturation vapour pressure: .. code-block:: python import numpy as np import pyjams as pj # add VPD if not given hvpd = ['VPD'] hout = _findfirststart(hvpd, df.columns) if len(hout) == 0: hvpd = ['TA_', 'RH_'] hout = _findfirststart(hvpd, df.columns) if len(hout) != 2: raise ValueError('Cannot calculate VPD.') ta_id = hout[0] rh_id = hout[1] # TA [K] if df[ta_id].max() < 100.: tk = df[ta_id] + 273.15 else: tk = df[ta_id] # rh [0-1] if df[rh_id].max() > 10.: rh = df[rh_id] / 100. else: rh = df[rh_id] vpd = (1. - rh) * pj.esat(tk) vpd_id = 'VPD_PI_1_1_1' df[vpd_id] = vpd df[vpd_id].where((df[ta_id] != undef) | (df[rh_id] != undef), other=undef, inplace=True) dff[vpd_id] = np.where((dff[ta_id] + dff[rh_id]) > 0, 2, 0) df.loc[dff[vpd_id] == 0, vpd_id] /= 100. It further assures that VPD is in Pa for further calculations. .. code-block:: python # Check VPD in Pa hvpd = ['VPD'] hout = _findfirststart(hvpd, df.columns) if df[hout[0]].max() < 10.: # kPa vpdpa = 1000. elif df[hout[0]].max() < 100.: # hPa vpdpa = 100. else: vpdpa = 1. # Pa df.loc[dff[hout[0]] == 0, hout[0]] *= vpdpa And finally determines the time intervals of the input data `dtsec` (s) and the number of time steps per day `ntday`. .. code-block:: python # time stepping dsec = (df.index[1] - df.index[0]).seconds ntday = np.rint(86400 / dsec).astype(int) Spike / outlier flagging ------------------------ If `outlier=True` is set in the configuration file, spikes will be detected with the method given in `Papale et al. (Biogeosciences, 2006)`_. A median absolute deviation (MAD) filter will be used on the second derivatives of the time series in two-week chunks. The section in `hesseflux_example.cfg` looks like: .. code-block:: python [POSTMAD] # spike / outlier detection, see help(hesseflux.madspikes) # scan window in days for spike detection # int nscan = 15 # fill window in days for spike detection # int nfill = 1 # spike will be set for values above z absolute deviations # float z = 7. # 0: mad on raw values; 1, 2: mad on first or second derivatives # int deriv = 2 `nfill` is the number of days that are treated at once. `nfill=1` means that the time series will be stepped through day by day. `nscan` are the days to be considered when calculating the mean absolute deviations. `nscan=15` means that 7 days before the fill day, the fill day itself, and 7 days after the fill day will be used for the robust statistic. However, only spikes detected within the inner `nfill` days will be flagged in the `nscan` days. Spikes will be detected if they deviate more than `z` mean absolute deviations from the median. For example, `nfill=3`, `nscan=15`, and `z=7` means that the time series will be treated in steps of 3 days. Each 3 days, MAD statistics will be calculated using 15 days around the middle of the 3 days. Then all values within the 3 days that deviate more 7 mean absolute deviations from the median of the 15 days will be flagged. `deriv=2` applies the MAD filter to the second derivatives. A spike has normally a strong curvature and hence a large second derivative. `deriv=1` is currently not implemented. `deriv=0` applies the filter to the raw time series. This might be useful to find outliers in smooth time series such as soil moisture. `deriv=0` is also used on the 20 Hz Eddy raw data in the quality and uncertainty strategy of `Mauder et al. (Agric Forest Meteo, 2013)`_. The default values, if options are not given in the configuration file, are `nscan=15`, `nfill=1`, `z=7`, and `deriv=2`. `postproc_europe-fluxdata.py` calls the spike detection like this: .. code-block:: python # assume *_PI variables after raw variables, e.g. LE before LE_PI, # if available houtlier = ['H_', 'LE', 'FC', 'H_PI', 'LE_PI', 'NEE'] hout = _findfirststart(houtlier, df.columns) sflag = hf.madspikes(df[hout], flag=dff[hout], isday=isday, undef=undef, nscan=nscan * ntday, nfill=nfill * ntday, z=z, deriv=deriv, plot=False) for ii, hh in enumerate(hout): dff.loc[sflag[hh] == 2, hh] = 3 The function :func:`~hesseflux.madspikes.madspikes` returns flag columns for the input variables where spiked data is flagged as 2. The scripts sets the corresponding columns in the flag dataframe `dff` to 3 (3 is used just to keep track where the flag was set). u* filtering ------------ If `ustar=True` is set in the configuration file, a u*-filter will be applied following `Papale et al. (Biogeosciences, 2006)`_. The section in `hesseflux_example.cfg` looks like: .. code-block:: python [POSTUSTAR] # ustar filtering, see help(hesseflux.ustarfilter) # min ustar value. Papale et al. (Biogeosci 2006): 0.1 forest, 0.01 else # float ustarmin = 0.1 # number of boostraps for determining uncertainty of ustar threshold. # 1 = no bootstrap # int nboot = 1 # significant difference between ustar class and mean of the above classes # float plateaucrit = 0.95 # if True, return u* thresholds for each season, # otherwise max(u*) as in Papale et al. (Biogeoci 2006) # bool seasonout = True # if True, ustar flags are set, # otherwise ustar is calculated by not set, for example to check the influence # of the above parameters # bool applyustarflag = True A minimum threshold `ustarmin` is defined under which data is flagged by default. `Papale et al. (Biogeosciences, 2006)`_ suggest 0.1 for forests and 0.01 for other land cover types. `postproc_europe-fluxdata.py` sets 0.01 as its default value. Uncertainty of the u* threshold is calculated via bootstrapping in Papale et al. `nboot` gives the number of bootstrapping for the uncertainty estimate of the u* threshold. The algorithm divides the input data in 6 temperature classes and 20 u* classes within each temperature class per season. It then determines the threshold for each season as the average u* of the u* class where the average CO2 flux is less than `plateaucrit` times the average of all CO2 fluxes with u* greater than the u* class. `Papale et al. (Biogeosciences, 2006)`_ took 6 temperature classes and `plateaucrit=0.99`, while `REddyProc`_ takes 7 temperature classes and `plateaucrit=0.95`, which are also the defaults in ``hesseflux``. `Papale et al. (Biogeosciences, 2006)`_ also used the maximum of the four seasonal u* thresholds as the threshold applied to all the year. If `seasonout=True`, the seasonal u* thresholds will be applied instead of the maximum of four seasonal u* thresholds. One can also set `applyustarflag=False` to just calculate the u* thresholds without applying them to experiment with different parameter values. The u*-filtering is then performed as: .. code-block:: python hfilt = ['NEE', 'USTAR', 'TA_'] hout = _findfirststart(hfilt, df.columns) # take FC if NEE not in input file if len(hout) == 2: hfilt = ['FC', 'USTAR', 'TA_'] hout = _findfirststart(hfilt, df.columns) assert len(hout) == 3, ('Could not find CO2 flux (NEE, FC),' ' USTAR or TA in input file.') hout = _findfirststart(hfilt, df.columns) ffsave = dff[hout[0]].to_numpy() dff.loc[(~isday) & (df[hout[0]] < 0.), hout[0]] = 4 ustars, flag = hf.ustarfilter(df[hout], flag=dff[hout], isday=isday, undef=undef, ustarmin=ustarmin, nboot=nboot, plateaucrit=plateaucrit, seasonout=seasonout, plot=True) dff[hout[0]] = ffsave df = df.assign(USTAR_TEST_1_1_1=flag) dff = dff.assign(USTAR_TEST_1_1_1=np.zeros(df.shape[0], dtype=int)) if applyustarflag: # assume *_PI variables after raw variables, # e.g. LE before LE_PI if available hustar = ['H_', 'LE', 'FC', 'H_PI', 'LE_PI', 'NEE'] hout = _findfirststart(hustar, df.columns) print(' Using', hout) for ii, hh in enumerate(hout): dff.loc[flag == 2, hh] = 5 The function :func:`~hesseflux.ustarfilter.ustarfilter` returns the u* 5, 50 and 95 percentiles of the bootstrapped u* thresholds as well as flag columns, which is 0 except where u* is smaller than the median u*-threshold. The scripts sets the columns of the Eddy fluxes in the flag dataframe `dff` to 5 (5 to keep track where the flag was set). One might not want to do u* filtering, but use for example Integral Turbulence Characteristics (ITC) that were calculated, for example, with `EddyPro`_\ :sup:`(R)`. These should be set right at the start after reading the input data into the dataframe `df` and producing the flag dataframe `dff` like: .. code-block:: python dff.loc[df['FC_SSITC_TEST_1_1_1']>0, 'FC_1_1_1'] = 2 Partitioning of Net Ecosystem Exchange -------------------------------------- If `partition=True` is set in the configuration file, two estimates of Gross Primary Productivity (GPP) and Ecosystem Respiration (RECO) are calculated: firstly with the method of `Reichstein et al. (Glob Change Biolo, 2005)`_ using nighttime data only, and secondly with the method of `Lasslop et al. (Glob Change Biolo, 2010)`_ using a light-response curve on 'daytime' data. The configuration `hesseflux_example.cfg` gives only one option in this section: .. code-block:: python [POSTPARTITION] # partitioning, see help(hesseflux.nee2gpp) # if True, set GPP=0 at night # bool nogppnight = False Many people find it unaesthetic that the 'daytime' method gives negative GPP at night. We esteem this the correct behaviour, reflecting the uncertainty in the gross flux estimates. However, one can set `nogppnight=True` to set GPP=0 at night and RECO=NEE in this case, the latter having then all variability of the net fluxes. The partitioning is calculated as: .. code-block:: python hpart = ['NEE', 'SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hpart, df.columns) if len(hout) == 3: # take FC if NEE not in input file hpart = ['FC', 'SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hpart, df.columns) print(' Using', hout) astr = ('Could not find CO2 flux (NEE, FC), SW_IN, TA,' ' or VPD in input file.') assert len(hout) == 4, astr # nighttime method dfpartn = hf.nee2gpp(df[hout], flag=dff[hout], isday=isday, undef=undef, method='reichstein', nogppnight=nogppnight) if hout[0].startswith('NEE'): suff = hout[0][3:-1] else: suff = hout[0][2:-1] dfpartn.rename(columns=lambda c: c + suff + '1', inplace=True) # daytime method dfpartd = hf.nee2gpp(df[hout], flag=dff[hout], isday=isday, undef=undef, method='lasslop', nogppnight=nogppnight) dfpartd.rename(columns=lambda c: c + suff + '2', inplace=True) df = pd.concat([df, dfpartn, dfpartd], axis=1) # take flags from NEE for dn in ['1', '2']: for gg in ['GPP', 'RECO']: dff[gg + suff + dn] = dff[hout[0]] # flag if partitioning was not possible for dn in ['1', '2']: for gg in ['GPP', 'RECO']: dff[df[gg + suff + dn] == undef] = 2 Gap-filling / Imputation ------------------------ Marginal Distribution Sampling (MDS) of `Reichstein et al. (Glob Change Biolo, 2005)`_ is implemented as imputation or so-called gap-filling algorithm. The algorithm looks for similar conditions in the vicinity of a missing data point, if option `fill=True`. The configuration file is: .. code-block:: python [POSTGAP] # gap-filling with MDS, see help(hesseflux.gapfill) # max deviation of SW_IN # float sw_dev = 50. # max deviation of TA # float ta_dev = 2.5 # max deviation of VPD # float vpd_dev = 5.0 # avoid extrapolation in gaps longer than longgap days longgap = 60 If a flux data point is missing, times with incoming shortwave radiation in the range of `sw_dev` around the actual shortwave radiation will be looked for, as well as air temperatures within `ta_dev` and air vapour pressure deficit within `vpd_dev`. The mean of flux values at the similar conditions is then taken as fill value. The function does not fill long gaps longer than `longgap` days. A good summary is given in Fig. A1 of `Reichstein et al. (Glob Change Biolo, 2005)`_. The script invokes MDS as: .. code-block:: python hfill = ['SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hfill, df.columns) assert len(hout) == 3, 'Could not find SW_IN, TA or VPD in input file.' # assume *_PI variables after raw variables, e.g. LE before LE_PI # if available hfill = ['H_', 'LE', 'FC', 'H_PI', 'LE_PI', 'NEE', 'GPP_1_1_1', 'RECO_1_1_1', 'GPP_1_1_2', 'RECO_1_1_2', 'GPP_PI_1_1_1', 'RECO_PI_1_1_1', 'GPP_PI_1_1_2', 'RECO_PI_1_1_2', 'SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hfill, df.columns) print(' Using', hout) df_f, dff_f = hf.gapfill(df[hout], flag=dff[hout], sw_dev=sw_dev, ta_dev=ta_dev, vpd_dev=vpd_dev, longgap=longgap, undef=undef, err=False, verbose=1) hdrop = ['SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hdrop, df.columns) df_f.drop(columns=hout, inplace=True) dff_f.drop(columns=hout, inplace=True) def _add_f(c): return '_'.join(c.split('_')[:-3] + ['f'] + c.split('_')[-3:]) df_f.rename(columns=_add_f, inplace=True) dff_f.rename(columns=_add_f, inplace=True) df = pd.concat([df, df_f], axis=1) dff = pd.concat([dff, dff_f], axis=1) The function :func:`~hesseflux.gapfill.gapfill` returns the filled columns `df_f` as well as flag columns `dff_f` indicating fill quality. Fill quality A-C of `Reichstein et al. (Glob Change Biolo, 2005)`_ are translated to quality flags 1-3. Uncertainty estimates of flux data ---------------------------------- `Lasslop et al. (Biogeosci, 2008)`_ presented an algorithm to estimate uncertainties of Eddy covariance fluxes using Marginal Distribution Sampling (MDS). The gap-filling function :func:`~hesseflux.gapfill.gapfill` can be used for uncertainty estimation giving the keyword `err=True`. The same thresholds as for gap-filling are used. The script `postproc_europe-fluxdata.py` uses the function :func:`~hesseflux.gapfill.gapfill` to calculate flux uncertainties like: .. code-block:: python hfill = ['SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hfill, df.columns) assert len(hout) == 3, 'Could not find SW_IN, TA or VPD in input file.' # assume *_PI variables after raw variables, e.g. LE before LE_PI # if available hfill = ['H_', 'LE', 'FC', 'H_PI', 'LE_PI', 'NEE', 'H_f', 'LE_f', 'FC_f', 'H_PI_f', 'LE_PI_f', 'NEE_f', 'NEE_PI_f', 'GPP_1_1_1', 'RECO_1_1_1', 'GPP_1_1_2', 'RECO_1_1_2', 'GPP_f_1_1_1', 'RECO_f_1_1_1', 'GPP_f_1_1_2', 'RECO_f_1_1_2', 'GPP_PI_1_1_1', 'RECO_PI_1_1_1', 'GPP_PI_1_1_2', 'RECO_PI_1_1_2', 'GPP_PI_f_1_1_1', 'RECO_PI_f_1_1_1', 'GPP_PI_f_1_1_2', 'RECO_PI_f_1_1_2', 'SW_IN', 'TA_', 'VPD'] # hfill = ['NEE', 'GPP', 'SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hfill, df.columns) df_f = hf.gapfill(df[hout], flag=dff[hout], sw_dev=sw_dev, ta_dev=ta_dev, vpd_dev=vpd_dev, longgap=longgap, undef=undef, err=True, verbose=1) hdrop = ['SW_IN', 'TA_', 'VPD'] hout = _findfirststart(hdrop, df.columns) df_f.drop(columns=hout, inplace=True) colin = list(df_f.columns) # names such as: NEE_PI_err_1_1_1 df_f.rename(columns=_add_f, inplace=True) colout = list(df_f.columns) df = pd.concat([df, df_f], axis=1) # take flags of non-error columns for cc in range(len(colin)): dff[colout[cc]] = dff[colin[cc]] We recommend, however, to calculate flux uncertainties with the Eddy covariance raw data as described in `Mauder et al. (Agric Forest Meteo, 2013)`_. This is for example implemented in the processing softwares `EddyPro`_\ :sup:`(R)` or `TK3`_. Writing the output file ----------------------- The dataframe is written to the output file with :mod:`pandas` :func:`to_csv` method: .. code-block:: python df.to_csv(outputfile, sep=sep, na_rep=str(undef), index=True, date_format=timeformat) using the same `sep` and `timeformat` as the input. The configuration for output is: .. code-block:: python [POSTIO] # if empty, write will ask for output path using the name of # this config file with the suffix .csv # str outputfile = FR-Hes_europe-fluxdata_2016-post.txt # if True, set variable to undef where flagged in output # bool outundef = True # if True, add flag columns prepended with flag_ for each variable # bool outflagcols = False If `outputfile` is missing or empty, the script will try to open a GUI, where one can choose an output directory and the filename will then be name of the configuration file with the suffix '.csv'. If `outundef=True` then all values in `df` with a flag value in `dff` greater than zero will be set to `undef`. The script can also add flag columns, prefixed with `flag_`, for each column in `df`, if `outflagcols=True`. The script will always output the columns with the flags for fill quality if gap-filling was performed: option `fill=True`. The whole code to write the output file is: .. code-block:: python # Back to original units hta = ['TA_'] hout = _findfirststart(hta, df.columns) df.loc[dff[hout[0]] == 0, hout[0]] -= tkelvin hvpd = ['VPD'] hout = _findfirststart(hvpd, df.columns) df.loc[dff[hout[0]] == 0, hout[0]] /= vpdpa if outundef: for cc in df.columns: if cc.split('_')[-4] != 'f': # exclude gap-filled columns df[cc] = df[cc].where(dff[cc] == 0, other=undef) if outflagcols: def _add_flag(c): return 'flag_' + c dff.rename(columns=_add_flag, inplace=True) # no flag columns for flags dcol = [] for hh in dff.columns: if '_TEST_' in hh: dcol.append(hh) if dcol: dff.drop(columns=dcol, inplace=True) df = pd.concat([df, dff], axis=1) else: occ = [] for cc in df.columns: if cc.split('_')[-4] == 'f': occ.append(cc) dff1 = dff[occ].copy(deep=True) dff1.rename(columns=lambda c: 'flag_' + c, inplace=True) df = pd.concat([df, dff1], axis=1) df.to_csv(outputfile, sep=sep, na_rep=str(undef), index=True, date_format=timeformat) That's all Folks! .. _Ameriflux: https://ameriflux.lbl.gov .. _EddyPro: https://www.licor.com/env/products/eddy_covariance/eddypro .. _FR-Hes: https://www.icos-france.fr/en/static3/the-network .. _ICOS: https://www.icos-cp.eu/ .. _ICOS ecosystems: https://www.icos-etc.eu .. _Lasslop et al. (Biogeosci, 2008): https://doi.org/10.5194/bg-5-1311-2008 .. _Lasslop et al. (Glob Change Biolo, 2010): https://doi.org/10.1111/j.1365-2486.2009.02041.x .. _Lasslop et al., Global Change Biology 2010: https://doi.org/10.1111/j.1365-2486.2009.02041.x .. _Mauder et al. (Agric Forest Meteo, 2013): https://doi.org/10.1016/j.agrformet.2012.09.006 .. _Papale et al. (Biogeosciences, 2006): https://doi.org/10.5194/bg-3-571-2006 .. _REddyProc: https://cran.r-project.org/web/packages/REddyProc/index.html .. _Reichstein et al. (Glob Change Biolo, 2005): https://doi.org/10.1111/j.1365-2486.2005.001002.x .. _TK3: https://zenodo.org/record/20349 .. _europe-fluxdata.eu: https://www.europe-fluxdata.eu .. _pyjams: https://github.com/mcuntz/pyjams