```{r echo=FALSE} library ("knitr") ``` ```{r} library ("AeRobiology") ``` ```{r eval=FALSE, echo = TRUE} install.packages("AeRobiology") library (AeRobiology) ``` # Using the attached data During this tutorial we are going to use the **pollen data from Munich** which are **integrated in the package**. This will allow you to follow the tutorial obtaining the same results. If you want to follow the tutorial by using your own data, check the next section.

**`munich_pollen`** is a data set containing information of daily concentrations of pollen in the atmosphere of Munich during the years 2010_2015. Pollen types included: "Alnus", "Betula", "Taxus", "Fraxinus", "Poaceae", "Quercus", "Ulmus" and "Urtica". Data were obtained at Munich (Zentrum Allergie und Umwelt, ZAUM) using a Hirst_type volumetric pollen trap (Hirst, 1952) following the standard methodology (VDI4252_4_2016). Some gaps have been added to test some functions of the package (e.g. quality_control(), interpollen()).
The data were obtained by the research team of Prof. Jeroen Buters (Christine Weil & Ingrid Weichenmeier). At the Zentrum Allergie und Umwelt ([ZAUM](https://www.zaum-online.de/), directed by Prof. Carsten B. Schmidt_Weber).

You can **load the data** in your working environment by typing: ```{r echo = TRUE} data("munich_pollen") ``` # Loading your database from Excel This section has been developed for people who are not familiar with R. **If you have experience in loading databases in R, you might not be interested in this section and you can skip it.** There are several functions to import data from Excel. Arbitrarily, we are going to use the package **readxl**. ```{r eval=FALSE, echo = TRUE} install.packages("readxl") library (readxl) ``` Now we can import the data with the function **read_xlsx()**. I strongly recommend you to **have your Excel file in the same folder in which you are working with R**. This will avoid long paths to your files and also will prevent broken paths when some files have been moved. If you are working with an old database, you might be interested in the **read_xls()** function. ```{r eval=FALSE, echo = TRUE} Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx") ``` You can also **select the sheet** you want to import: ```{r eval=FALSE, echo = TRUE} Mydata<-read_xlsx("C:/Users/Antonio/Desktop/Prueba Markdown/mydata.xlsx", sheet=2) ```

**IMPORTANT**: You must remove all the symbols or letters from your database. **If you have missing gaps within your database don't put "-" or "No Data" or any letter in the gaps**, keep the cell empty or remove the cell. The dates should be in a complete format (E.g.:"14/02/2019"). There must be a first column with the date followed by as many columns as pollen types with only numeric values (**check decimal separation: commas must be replaced by dots**).

Now you should have loaded your data in the object `Mydata`. Let's check if there is any mistake: ```{r echo=TRUE, results='hold'} str(munich_pollen) ``` **Note**: I'm using `munich_pollen` database as example but you should use `Mydata`. As you can see, the object is a `data.frame`, the first column is "Date" type and the rest are "num", which means numeric. This is the format you database **must** have. Normally it is automatically recognized by the import function, but if not, check the functions: **as.Date()**, **as.numeric()** and **as.data.frame()**.

**Your database must have this format to start working with the package**. The functions are designed to warn you in case the format is not correct but there may be some exceptions. **This is the most important step when using the package and some strange errors reported by the functions can be solved by doing this**. Sometimes the date column has 2 different types simultaneously ("POSIXlt" and "POSIXct") and it might cause mistakes. You can use one or another, but not both. Furthermore, it is strongly recommended to use "Date" format instead of "POSIXlt" or "POSIXct" despite theoretically there may not be problems by using them.

**Please, don`t exasperate**. This is the **slowest step of using the package and the most important**. Once you have your data loaded, you only have to spend a few minutes in each function to have your results. If you have some unsolving problems, search on the internet. Some other people must have had your same problem and it should be solved in some forums. If not, write us and we will help you as soon as possible: aerobiology.package@gmail.com

If you want to import your data from **csv files**, you might be interested in **read.csv()** function.

# Function quality_control() **This function was designed to check the quality of an historical database of several pollen types**. Since many of the quality requirements depend on the criteria selected to establish the main pollen season, this function has the **calculate_ps() function integrated**. You can insert arguments of calculate_ps() in this function to select the method you want for calculating the main pollen season. calculate_ps() details can be consulted later in this same manuscript.

If **result = "plot"**, this function returns a **graphical resume** of the quality of the database marking with different red intensities the "weak-points" of it, and if **result = "table"** a `data.frame` with the detailed reasons of their "weakness".

**It establishes the quality according to 5 different criteria**: * **It checks if the main pollen season cannot be calculated according to the minimal requirements:** lack of data for this pollen type and year. This criteria appears as **"Complete"** in the generated data.frame. * **If the start, end or peak dates of the main pollen season have been interpolated** (or a date near them specified by **`int.window`** argument). It is based on the following premise: If this day or a nearby date is missing/interpolated, the selected date as start/peak/end might not be the real one. These criteria appear as **"Start"**, **"Peak"** and **"End"** in the generated data.frame. * **The percentage of missing data within the main pollen season.** It calculates the number of days which have been interpolated by interpollen() function and their percentage of the main pollen season. If a high percentage of the main pollen season has been interpolated, the information of this season might not be reliable. The maximal percentage allowed can be specified by the **`perc.miss`** argument. This filter appears as **"Comp.MPS"** in the generated data.frame. When running the function, it gives you a **graphical resume** with different color intensity depending on the risk of including each pollen/spore type for a concrete year and a **data.frame** with detailed information about the reasons of this evaluation.

If **result = "plot"**, the function returns a list of objects of class *ggplot2*; if **result = "table"**, the function returns a *data.frame*. By default, **result = "table"**.

```{r echo=TRUE, results='hide', fig.keep='first'} QualityControl<-quality_control(munich_pollen, result = "table") ``` If the filter has been successfully passed, it appears as `TRUE` in the generated `data.frame`. If not, it appears as `FALSE`. ```{r echo=TRUE, results='hold'} head(QualityControl) ``` ```{r echo=TRUE, fig.keep='first', results='hide'} quality_control(munich_pollen, result = "plot") ``` There are lots of arguments for this function. You can consult them in the "help" section of the function in R or in the [official document](https://CRAN.R-project.org/package=AeRobiology) of the package. Let's suppose we want to check the quality of our database, but we want to calculate it for a **80% defined Main Pollen Season** (arguments **`ps.method`** and **`perc`**). We don't want to take into account years with a missing data less than **4 days** away from the start, peak or end dates (argument **`int.window`**). Furthermore, we only want to exclude years with more than **50% of interpolated data** within the main pollen season (argument **`perc.miss`**): ```{r echo=TRUE, results='hide', fig.keep='first'} quality_control(munich_pollen, int.window = 4, perc.miss = 50, ps.method = "percentage", perc = 80) ``` **As you would have noticed, there are several differences between this result and the previous one.**

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

# Function interpollen() The function **interpollen()** was designed to complete all the missing gaps within a database in just one step. All the gaps of each pollen/spore type are completed simultaneously. **There are different methods to do so which may be more or less appropriate according to the particularities of your database.** The arguments are: * **data**: A `data.frame` object. This data.frame should include a first column in "Date" format and the rest of the columns in "numeric" format belonging to each pollen/spore type by column. * **method**: A `character` string specifying the method applied to calculate and generate the pollen missing data. The implemented methods that can be used are: **"lineal"**, **"movingmean"**, **"spline"**, **"tseries"** or **"neighbour"**. The method argument will be "lineal" by default. * **maxdays**: A `numeric (integer)` value specifying the maximum number of consecutive days with missing data that the algorithm is going to interpolate. If the gap is bigger than the argument value, the gap will not be interpolated. Not valid with "tseries" method. The maxdays argument will be 30 by default. This argument might be very interesting to avoid long interpolated gaps. * **plot**: A `logical (integer)` argument. If `TRUE`, graphical previews of the input database will be plot at the end of the interpolation process. All the interpolated gaps will be marked in red. The `plot` argument will be `TRUE` by default. * Others... ```{r echo=TRUE, results='hide', fig.keep='first'} Interpolated<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = TRUE, result = "long") ``` As you can see above, when **result = "long"**, **the missing gaps which have been completed by the algorithm are marked in red.** It displays similar graphs for each pollen/spore type within your database. Furthermore, the **"Interpolated"** data.frame is produced as "long" with another new column indicating if each data has been interpolated or not (1 means interpolated and 0 means original data). ```{r echo=TRUE, results='hold'} head(Interpolated) ``` If you want to integrate all the new data in your database, **you can assign the function to a new object** and specify the argument **result = "wide"**: ```{r echo=TRUE, results='hide'} CompleteData<-interpollen(munich_pollen[,c(1,6)], method="lineal", plot = FALSE, result = "wide") ``` Now your database without gaps appears as "CompleteData". **As you might have noticed, if you add `plot=FALSE`, the graphs are not shown.** You don't have to do this before applying another function of this package. **"interpollen" function has been integrated in the main functions of the package** so you can choose whether you want your data to be completed or not by a specific argument in each function.E.g.: ```{r echo=TRUE, eval=FALSE} calculate_ps(munich_pollen, method="percentage", interpolation=TRUE, int.method = "lineal", plot = F) ```

**FREQUENT ERROR:** Sometimes when applying interpollen you receive the following error: **"Error in rep(NA, ncasos): invalid 'times' argument"**. This is due to a mistake in your database: **you have some dates repeated or disorganized**. The algorithm of `interpollen()` searches for correlative dates and, when they are not consecutive natural days, it carries out the interpolation. When the second date is an earlier date than the previous one, it reports an error. You have to correct these issues in your database before applying the interpollen function.

## Methods of interpolation In this function there are **5 different methods to interpolate missing gaps**. Each method can be more appropriate than the others under some specific circumstances. Interpolated data are not real data, but including such estimations can reduce the error of some calculations such as the main pollen season by percentage or a pollen calendar, i.e.: if you keep the gap, these days would count as 0 for the calculations and this would suppose a bigger error than estimating pollen concentrations for these days. Obviously, this will also suppose a bigger error than having the real data. ### "lineal" method As mentioned above, there are many methods to interpolate your missing data. The simplest one was the showed in the previous example: "lineal". **It traces a straight line between each extreme of the gap.** The pollen/spore concentration of the missing days is calculated according to this line. This method may be appropriate for dates without pollen/spores or for small gaps. **Nevertheless, there are other method which can be more effective during the pollen season.** ### "movingmean" method This method calculates the moving mean of the pollen daily concentrations. The gaps are fulfilled by **calculating the moving mean of the concentrations for this particular day.** The window of the moving mean is centered on this day and its size is the result of multiplicating the gap size by the **`factor` argument**. E.g.: If you have a gap of 3 days and your factor is 2, the gaps are replaced by the value of a moving mean of 6 days (3 x 2) centered in this particular day and not taking into account the missing days for the mean value. Furthermore, it is a dynamic function: **for each gap of the database the window size of the moving mean changes depending of each gap size.** ```{r echo=TRUE, results='hide', fig.keep='first'} i<-interpollen(munich_pollen[,c(1,6)], method="movingmean", factor = 2, plot = TRUE) ``` ### "spline" method This method carries out the interpolation by tracing a **polinomic function to link both extremes of the gap.** The polinomic function is chosen according to the best fitting equation to the previous and posterior days of the gap (second, third, fourth degree...). The number of days of each side of the gap which will be taken into account for calculating the spline regression are specified by **`ndays` argument**. The smoothness of the adjustment can be specified by the **`spar` argument**. ```{r echo=TRUE, results='hide', fig.keep='first'} i2<-interpollen(munich_pollen[,c(1,6)], method="spline", ndays=3, spar=0.7, plot = FALSE) ``` **Note**: By changing the `ndays` and `spar` argument, very different results can be obtained. E.g.: ```{r echo=TRUE, results='hide', fig.keep='first'} i3<-interpollen(munich_pollen[,c(1,6)], method="spline", ndays=5, spar=0.2, plot = TRUE) ``` ### "tseries" method If you have long time series of data, this might be the most suitable method for you. This method **analyses the time series of the pollen/spore database and performs a seasonal-trend decomposition based on LOESS** ([Cleveland et al., 1990](http://www.nniiem.ru/file/news/2016/stl-statistical-model.pdf)). It extracts the **seasonality of the historical database** and uses it to predict the missing data by performing a linear regression with the target year. ```{r echo=TRUE, results='hide', fig.keep='last'} i4<-interpollen(munich_pollen, method="tseries", plot = TRUE) ``` **Seasonality is represented in grey, real data in black and interpolated data in red.** This method improves with long time series of years. Strange results may be caused by lack of years or small sampling periods. ### "neighbour" method **Other near stations provided by the user are used to interpolate the missing data of the target station.** First of all, a Spearman correlation is performed between the target station and the neighbour stations to discard the neighbour stations with a correlation coefficient smaller than **`mincorr` value**. For each gap, a linear regression is performed between the neighbour stations and the target stations to determine the equation which converts the pollen concentrations of the neighbour stations into the pollen concentration of the target station. Only neighbour stations without any missing data during the gap period are taken into account for each gap.

You can include 4 different databases of neighbour stations with the arguments **`data2`, `data3`, `data4` and `data5`.** **The format of these databases must be the same than the original one and the name of the pollen/spore types must be exactly the same.**

With the **`mincorr` argument** you can specify the minimal correlation coefficient (Spearman correlation) that the neighbour stations must have with the target station to be taken into account for a concrete gap. This process is completely independent for each gap of each pollen type. # Function calculate_ps() The function **calculate_ps()** is one of the core functions of the AeRobiology package. It was designed to **calculate the main parameters of the pollen season** with regard to phenology and pollen intensity from a historical database of several pollen types. The function can use the most common methods to define the main pollen season.

**Please, be patient:** This function has tons of arguments. Some of them are only useful for specific situations. You can use the function without knowing all of them. Nevertheless, **we decided to include all the possible arguments in order to provide specific tools for the advanced users**.

**This function has several arguments, but don't worry, we are going to perform examples later**. The main arguments are: * **data**: A `data.frame` object including the general database where calculation of the pollen season must be applied. This data.frame must include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column. * **method**: A `character` string specifying the method applied to calculate the pollen season and the main parameters. The implemented methods that can be used are: "percentage", "logistic", "moving", "clinical" or "grains". A more detailed information about the different methods for defining the pollen season may be consulted [here](https://CRAN.R-project.org/package=AeRobiology)) or later in this same document. * **th.day**: A `numeric` value. The number of days whose pollen concentration is bigger than this threshold is calculated for each year and pollen type. This value will be obtained in the results of the function. The `th.day` argument will be 100 by default. * **perc**: A `numeric` value ranging 0_100. This argument is valid only for `method = "percentage"`. This value represents the percentage of the total annual pollen included in the pollen season, removing (100_perc)/2% of the total pollen before and after of the pollen season. The `perc` argument will be 95 by default. * **def.season**: A `character` string specifying the method for selecting the best annual period to calculate the pollen season. The pollen seasons may occur within the natural year between two years. The implemented options that can be used are: "natural", "interannual" or "peak". The `def.season` argument will be "natural" by default. A more detailed information about the different methods for selecting the best annual period to calculate the pollen season may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). * **reduction**: A `logical` value. This argument is valid only for the "logistic" method. If `FALSE` the reduction of the pollen data is not applicable. If `TRUE` a reduction of the peaks above a certain level (`red.level` argument) will be carried out before the definition of the pollen season. The reduction argument will be `FALSE` by default. A more detailed information about the reduction process may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). * **red.level**: A `numeric` value ranging 0_1 specifying the percentile used as level to reduce the peaks of the pollen series before the definition of the pollen season. This argument is valid only for the "logistic" method. The `red.level` argument will be 0.90 by default, specifying the percentile 90. * **derivative**: A `numeric (integer)` value belonging to options of 4, 5 or 6 specifying the derivative that will be applied to calculate the asymptotes which determines the pollen season using the "logistic" method. This argument is valid only for the "logistic" method. The `derivative` argument will be 5 by default. More information may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). * **man**: A `numeric (integer)` value specifying the order of the moving average applied to calculate the pollen season using the "moving" method. This argument is valid only for the "moving" method. The `man` argument will be 11 by default. * **th.ma**: A `numeric` value specifying the threshold used for the "moving" method for defining the beginning and the end of the pollen season. This argument is valid only for the "moving" method. The `th.ma` argument will be 5 by default. * **n.clinical**: A `numeric (integer)` value specifying the number of days which must exceed a given threshold (th.pollen argument) for defining the beginning and the end of the pollen season. This argument is valid only for the "clinical" method. The `n.clinical` argument will be 5 by default. * **window.clinical**: A `numeric (integer)` value specifying the time window during which the conditions must be evaluated for defining the beginning and the end of the pollen season using the "clinical" method. This argument is valid only for the "clinical" method. The `window.clinical` argument will be 7 by default. * **window.grains**: A `numeric (integer)` value specifying the time window during which the conditions must be evaluated for defining the beginning and the end of the pollen season using the "grains" method. This argument is valid only for the "grains" method. The `window.grains` argument will be 5 by default. * **th.pollen**: A `numeric` value specifying the threshold that must be exceeded during a given number of days (n.clinical or window.grains arguments) for defining the beginning and the end of the pollen season using the "clinical" or "grains" methods. This argument is valid only for the "clinical" or "grains" methods. The `th.pollen` argument will be 10 by default. * **th.sum**: A `numeric` value specifying the pollen threshold that must be exceeded by the sum of daily pollen during a given number of days (`n.clinical` argument) exceeding a given daily threshold (`th.pollen` argument) for defining the beginning and the end of the pollen season using the "clinical" method. This argument is valid only for the "clinical" method. The `th.sum` argument will be 100 by default. * **type**: A `character` string specifying the parameters considered according to a specific pollen type for calculating the pollen season using the "clinical" method. The implemented pollen types that may be used are: "birch", "grasses", "cypress", "olive" or "ragweed". As result for selecting any of these pollen types the parameters `n.clinical`, `window.clinical`, `th.pollen` and `th.sum` will be automatically adjusted for the "clinical" method. If no pollen types are specified (`type = "none"`), these parameters will be considered by the user. This argument is valid only for the "clinical" method. The `type` argument will be "none" by default. * **interpolation**: A `logical` value. If `FALSE` the interpolation of the pollen data is not applicable. If `TRUE` an interpolation of the pollen series will be applied to complete the gaps with no data before the calculation of the pollen season. The `interpolation` argument will be `TRUE` by default. A more detailed information about the interpolation method may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). * **int.method**: A `character` string specifying the method selected to apply the interpolation method in order to complete the pollen series. The implemented methods that may be used are: "lineal", "movingmean", "spline" or "tseries". The `int.method` argument will be "lineal" by default. * **maxdays**: A `numeric` (integer value) specifying the maximum number of consecutive days with missing data that the algorithm is going to interpolate. If the gap is bigger than the argument value, the gap will not be interpolated. Not valid with `int.method = "tseries"`. The `maxdays` argument will be 30 by default. * **export.plot**: A `logical` value specifying if a set of plots based on the definition of the pollen season and saved in the working directory will be required or not. If `FALSE` graphical results will not be saved. If `TRUE` a pdf file for each pollen type showing graphically the definition of the pollen season for each studied year will be saved within the plot_AeRobiology directory created in the working directory. The `export.plot` argument will be `FALSE` by default. * **export.result**: A `logical` value specifying if a excel file including all parameters for the definition of the pollen season saved in the working directory will be required or not. If `FALSE` the results will not exported. If `TRUE` the results will be exported as xlsx file including all parameters calculated from the definition of the pollen season within the table_AeRobiology directory created in the working directory. The `export.result` argument will be `FALSE` by default. * **plot**: A `logical` value specifying if the plots are generated in the plot history. The `plot` argument will be `TRUE` by default. * **result**: A `character` string specifying the output for the function. The implemented outputs that may be obtained are: `"table"` and `"list"`. The argument `result` will be `"table"` by default.

**IMPORTANT :** if `export.plot=TRUE`, the plots are **EXPORTED** in "plot_AeRobiology" folder.

## Methods of calculating the Main Pollen Season This function allows to calculate the pollen season using **five different methods** which are described below. After calculating the **start_date**, **end_date** and **peak_date** for the pollen season all rest of **parameters** are calculated: * **type**: pollen type * **seasons**: year of the beginning of the season * **st.dt**: start_date (date) * **st.jd**: start_date (day of the year) * **en.dt**: end_date (date) * **en.jd**: end_date (day of the year) * **ln.ps**: length of the season * **sm.tt**: total sum * **sm.ps**: pollen integral * **pk.val**: peak value * **pk.dt**: peak_date (date) * **pk.jd**: peak_date (day of year) * **ln.prpk**: length of the pre_peak period * **sm.prpk**: pollen integral of the pre_peak period * **ln.pspk**: length of the post_peak period * **sm.pspk**: pollen integral of the post_peak period * **daysth**: number of days with more than 100 pollen grains * **st.dt.hs**: start_date of the High pollen season (date, only for clinical method) * **st.jd.hs**: start_date of the High pollen season (day of the year, only for clinical method) * **en.dt.hs**: end_date of the High pollen season (date, only for clinical method) * **en.jd.hs**: end_date of the High pollen season (day of the year, only for clinical method)

**IMPORTANT :** If `export.result=TRUE`, these objects will be exported as xlsx file within the "table_AeRobiology" directory created in your working directory. This Excel file will have a sheet for each pollen type and a last sheet with the legend of all the abbreviations and the method selected to calculate them. **We strongly recommend to use `export.result=TRUE`**

Although the results can be extracted in the working directory in a xlsx file, **they can also be assigned to an object, and visualized**, for example (an extract): ```{r echo = TRUE, results='hide', fig.keep='last', warning=FALSE} pollen_season <- calculate_ps(munich_pollen) ``` ```{r echo = FALSE, fig.keep='all', warning=FALSE} knitr::kable(pollen_season[24:31, ] , format = "html", booktabs = TRUE) ``` The result is also plotted by default (*plot = TRUE*). ```{r echo = TRUE, fig.keep='last', warning=FALSE} calculate_ps(munich_pollen[,c(1,6)], plot = TRUE) ``` ### "percentage" method This is a commonly used method for defining the pollen season based on the **elimination of a certain percentage in the beginning and the end of the pollen season** ( [Nilsson and Persson, 1981](https://www.tandfonline.com/doi/abs/10.1080/00173138109427661); [Andersen, 1991](https://www.tandfonline.com/doi/pdf/10.1080/00173139109427810)). For example, if the pollen season is based on the 95% of the total annual pollen (`perc = 95`), the start_date of the pollen season is marked as the day in which 2.5% of the total pollen is registered and the end_date of the pollen season is marked as the day in which 97.5% of the total pollen is registered. ```{r echo=TRUE, results='hide', fig.keep='first', eval=FALSE} calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 90, export.result = TRUE) ``` In this case we have calculated the main pollen season based on the 90% of the total annual pollen. **Results are stored in the "table_AeRobiology" folder since `export.result=TRUE`**. You can select different methods of interpolation or even to not interpolate gaps: ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 90, export.result = FALSE, int.method = "spline") ``` ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 75, export.result = FALSE, interpolation = FALSE) ```

**IMPORTANT :** if `export.plot=TRUE`, the plots are **EXPORTED** in "plot_AeRobiology" folder.

### "logistic" method This method was developed by Ribeiro et al. (2007) and modified by Cunha et al. (2015). It is based on fitting annually a **non-linear logistic regression model** to the daily accumulated curve for each pollen type. This logistic function and the **different derivatives** were considered to calculate the **start_date** and **end_date** of the pollen season, based on the **asymptotes when pollen amounts are stabilized on the beginning and the end of the accumulated curve**. For more information about the method, see [Ribeiro et al. (2007)](https://www.ncbi.nlm.nih.gov/pubmed/18247462) and [Cunha et al. (2015)](https://link.springer.com/article/10.1007%2Fs10453-014-9345-3). **Three different derivatives** may be used (`derivative` argument) 4, 5 or 6 that represent **from higher to lower restrictive criterion** for defining the pollen season. This method may be **complemented with an optional method for reduction the peaks values** (`reduction = TRUE`), thus avoiding the effect of the great influence of extreme peaks. In this sense, **peaks values will be cut below a certain level** that the user may select based on a percentile analysis of peaks. For example, `red.level = 0.90` will cut all peaks above the percentile 90. ```{r echo=TRUE, results='hide', fig.keep='first', warning=FALSE} pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 5, reduction=FALSE) ``` ```{r echo = FALSE, fig.keep='all', warning=FALSE} knitr::kable(pollen_season, format = "html", booktabs = TRUE ) ``` In the previous case, the reduction wasn't carried out (`reduction=FALSE`) and all the peaks were conserved. We can cut some peaks and change the `derivative`: ```{r echo=TRUE, results='hide', fig.keep='first', warning=FALSE} pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 6, reduction=FALSE, red.level = 0.8) ``` ```{r echo = FALSE, fig.keep='all', warning=FALSE} knitr::kable(pollen_season, format = "html", booktabs = TRUE) ``` As you can observe, results are a bit different.

**IMPORTANT :** if `export.plot=TRUE`, the plots are **EXPORTED** in "plot_AeRobiology" folder.

### "clinical" method This method was proposed by [Pfaar et al. (2017)](https://www.ncbi.nlm.nih.gov/pubmed/27874202). It is based on the expert consensus in relation to pollen exposure and the **relationship with allergic symptoms** derived of the literature. Different periods may be defined by this method: the **pollen season**, the **high pollen season** and the **high pollen days**: 1) The **start_date** and **end_date** of the **pollen season** were defined as a **certain number of days (`n.clinical` argument) within a time window period (`window.clinical` argument) exceeding a certain pollen threshold (`th.pollen` argument) which summation is above a certain pollen sum (`th.sum` argument)**.
All these parameters are established for each pollen type according to Pfaar et al. (2017) and using the **`type`** argument these parameters may be automatically adjusted for the specific pollen types ("birch", "grasses", "cypress", "olive" or "ragweed"). Furthermore, **the user may change all parameters to do a customized definition of the pollen season**. 2) The **start_date** and **end_date** of the **high pollen season** were defined as **three consecutive days exceeding a certain pollen threshold (`th.day` argument)**. 3) The number of **high pollen days** will also be calculated exceeding this pollen threshold (`th.day`). For more information about the method, see [Pfaar et al. (2017)](https://www.ncbi.nlm.nih.gov/pubmed/27874202). Running the following example, the main pollen season will be established according to the birch requirements: more than 5 days within a week in which more than 10 pollen grains/m3 are registered and whose sum exceeds 100 pollen grains/m3. The high pollen days are those which exceed 100 pollen grains/m3 (`n.clinical=5, window.clinical=7, th.pollen=10, th.sum=100, th.day=100`) ```{r echo=TRUE, results='hide', fig.keep='first', warning=FALSE} pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "clinical", type = "birch") ``` ```{r echo = FALSE, fig.keep='all', warning=FALSE} knitr::kable(pollen_season, format = "html", booktabs = TRUE) ``` The code above returns the same result than: ```{r echo=TRUE, results='hide', fig.keep='first', warning=FALSE} pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "clinical", n.clinical = 5, window.clinical = 7, th.pollen = 10, th.sum = 100, th.day = 100) ``` ```{r echo = FALSE, fig.keep='all', warning=FALSE} knitr::kable(pollen_season, format = "html", booktabs = TRUE) ``` We have resumed all these parameters under the argument `type=birch` to facilitate its application.

**IMPORTANT :** if **`export.plot=TRUE`**, the plots are **EXPORTED** in "plot_AeRobiology" folder.

### "grains" method This method was proposed by Galan et al. (2001) originally in olive pollen but after also applied in other pollen types. The **start_date** and **end_date** of the pollen season were defined as a **certain number of days (`window.grains` argument) exceeding a certain pollen threshold (`th.pollen` argument)**. For more information about the method, see [Galan et al. (2001)](https://link.springer.com/article/10.1007/s004840000081). We want to establish the main pollen season start in the first day when more than 3 consecutive days have more than 2 pollen grains/m3, and the end in the last day in which these conditions are fulfilled: ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,6)], method = "grains", window.grains = 3, th.pollen = 2 ) ``` ### "moving" method

**This method is proposed the first time by the authors of this package. We are developing a research paper explaining the method in detail.**

The definition of the pollen season is based on the application of a **moving average** to the pollen series in order to obtain the general seasonality of the pollen curve avoiding the great variability of the daily fluctuations. Thus, **the start_date and the end_date will be established when the curve of the moving average reaches a given pollen threshold** (**`th.ma`** argument). Also the order of the moving average may be customized by the user (**`man`** argument). By default, `man = 11` and `th.ma = 5`.

The idea of this method is to be able to calculate the start of the main pollen season when the season has not finished yet. Moreover, it allows to establish the start and end even if there is a huge amount of lacking data within the main pollen season. Its similar to the "grains" method but taking into account the moving mean to avoid daily variability.

You might understand it better by consulting the plots obtained.

**IMPORTANT :** if **`export.plot=TRUE`**, the plots are **EXPORTED** in "plot_AeRobiology" folder.

Let's calculate it with a moving average of 7 days. We are going to establish the start and end of the main pollen season when the moving average reaches 4 pollen grains/m3: ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,6)], method = "moving", man = 7, th.ma = 4) ``` ## Advanced examples ### Southern Hemisphere & interannual types As you may have noticed, we have been using the function under an "European" point of view: the calculations are from 1st January to 31th December. The researchers of the **Southern Hemisphere** are used to work with **interanual pollen seasons**. Don't worry, we haven't forgotten you! 1) You can work **from 1st June to 31th May** by means of the argument **`def.season="interannual"`**: ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,3)], method = "moving", man = 7, th.ma = 4, def.season = "interannual") ```

In this method, the season belongs to the first year of the pair of years, i.e: from June 2017 to May 2018 -> season "2017".

2) You can **center the main pollen season in the average peak day** (182 days before and after the average date of the peak): ```{r echo=TRUE, results='hide', fig.keep='first'} calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc=95, def.season = "peak") ```

In this last method, the season belongs to the year in which the average peak date - 182 days is located, i.e.: if the average peak date is in January 2013, the season is called "2012" in the data.frames.

### Interpolation in detail Pollen time series frequently have different gaps with no data and this fact could be a problem for the calculation of specific methods for defining the pollen season even providing incorrect results. In this sense **by default a linear interpolation will be carried out to complete these gaps before to define the pollen season (`interpolation = TRUE`)**. Additionally, the users may select other interpolation methods using the **`int.method` argument**, as "lineal", "movingmean", "spline" or "tseries". Some advanced users may have noticed that you can't use directly all the arguments of **interpollen()** through **calculate_ps()**. You only are able to select the interpolation method. Nevertheless, it is not impossible to use them: 1) **Use interpollen()** with all the arguments you want and **overwrite your interpolated database in an object**: ```{r echo=TRUE, results='hide', fig.keep='first'} CompleteData<-interpollen(munich_pollen, method="spline", ndays=3, spar=0.7, plot = TRUE, maxdays = 3, result = "wide") ``` 2) Then, **use "CompleteData" instead of "munich_pollen" (or your database) in the following steps**: ```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(CompleteData, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units") ```

**Note**: You might select **`interpolation=FALSE`** in the other functions you use after interpolating manually. In some cases it doesn't matter, but if you want to interpolate only gaps with less than 3 days and you did in the first step, if you don't use `interpolation=FALSE` in calculate_ps function, a second interpolation will be carried out by using the default method (gaps with less than 30 days and lineal interpolation). You will obtain a database with the gaps of less than 3 days interpolated by "spline" method and the rest with the "lineal" method.

# Function pollen_calendar This function **calculates the pollen calendar from a historical database** of several pollen types using different designs. The main arguments are: * **data**: A `data.frame` with the first column in "Date" format and the rest of the columns in "numeric" format (pollen types). * **method**: for choosing the method to generate the pollen calendar. The options are "heatplot", "violinplot" and "phenological". * **n.types**: indicating the number of the most abundant pollen types showed in the pollen calendar * **start.month**: ranging 1-12 indicates the number of the month (January-December) when the beginning of the pollen calendar must be considered, see more details [here](https://CRAN.R-project.org/package=AeRobiology). * **export.plot**: specifying if a plot with the pollen calendar saved in the working directory will be required or not. Other arguments have been incorporated in relation to the format to export the plot (`export.format`). * **result**: **"plot"** or **"table"**. **Data exportation:** - If `export.plot = TRUE` this plot displaying the pollen calendar will be exported within the **Plot_AeRobiology** directory created in the working directory. - If `export.plot = TRUE` and `export.format = pdf`, a pdf file with the pollen calendar will be saved within the **plot_AeRobiology** directory, created in the working directory. Additional characteristics may be incorporated to the exportation as pdf file. - If `export.plot = TRUE` and `export.format = png`, a png file with the pollen calendar will be saved within the **plot_AeRobiology** directory, created in the working directory. Additional characteristics may be incorporated to the exportation as png file. **Exclusive arguments** for pollen calendars generated as **"heatplot"**: * **period**: specifying the interval time considered to generate the pollen calendar. The options are "weekly" and "daily." * **method.classes**: indicating the method to define the classes used for classifying the average pollen concentrations. The options are "exponential" and "custom". * **n.classes**: specifying the number of classes that will be used when `method.classes = custom`. * **classes**: specifying a numeric vector with the desired thresholds to define the different classes. * **color**: choosing different options such as "green", "red", "blue", "purple"or "black". **Exclusive arguments** for pollen calendars generated as **"phenological"**: * **perc1, perc2**: both ranging 0-100. These values represent the percentage of the total annual pollen included in the pollen season, removing (100-percentage)/2% of the total pollen before and after the pollen season. Two percentages must be specified to define the "main pollination period" (`perc1`) and the "early/late pollination" (`perc2`) based on the "phenological" method proposed by [Werchan et al. (2018)](https://link.springer.com/article/10.1007/s40629-018-0055-1). * **th.pollen**: specifying the minimum threshold of the average pollen concentration which will be used to generate the pollen calendar. Days below this threshold will not be considered and days above this threshold will be considered as "possible occurrence" as proposed by [Werchan et al. (2018)](https://link.springer.com/article/10.1007/s40629-018-0055-1). This argument also works for "violinplot" pollen calendar

By default the database will be interpolated according to the **interpollen** function, and also other arguments for this function are incorporated in the **pollen_calendar** function.

## "heatplot" method This pollen calendar is constructed based on the **daily or weekly average of pollen concentrations** (depending on the preferences of the user, who may select "daily" or "weekly" as **`period` argument**). Then, these averages may be classified in different categories following different methods selected by the user. An example of this pollen calendar may be consulted in [Rojo et al. (2016)](https://link.springer.com/article/10.1007/s10661-016-5129-2). This method to design pollen calendars is an **adaptation from the pollen calendar proposed by Spieksma (1991)**, who considered 10-day periods instead of daily or weekly periods. First, we are going to generate a pollen calendar based on the **heatplot**, designed with **green** color and constructed with **daily** averages: ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE) ``` In all cases, the table used by the pollen calendar with the averaged values will be created. This table can be visualized. To do that we have to set the argument **result = "table"** For example (an extract): ```{r echo = TRUE, fig.keep='all', warning=FALSE} average_values<-pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE, result = "table") knitr::kable(average_values[82:90, ], format = "html", booktabs = TRUE) ``` By default the method for defining the classes for the pollen calendar is according to the **exponential** method. Nevertheless, the classes can be customized by **`classes` argument**: ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "red", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), interpolation = FALSE) ``` In addition, for species with their pollen season occurring between two natural years, the start of the pollen calendar can be selected by the **`start.month` argument**: ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "purple", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), start.month = 11, na.remove = FALSE, interpolation = FALSE) ```

**`NA` (no data) can be removed by using `na.remove` argument.**

Also we can generate a pollen calendar based on the **heatplot** design with **weekly** averages. ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "heatplot", period = "weekly", color = "blue", method.classes = "exponential", n.types = 4, y.start = 2011, y.end = 2014, interpolation = FALSE) ``` In this case, we have included other restrictive arguments as **`n.types`** limiting the number of pollen types and **`y.start`** and **`y.end`**, limiting the period to be considered for the pollen calendar. ## "phenological" method This pollen calendar is based on the phenological definition of the pollen season and adapted from the methodology proposed by [Werchan et al. (2018)](https://link.springer.com/article/10.1007/s40629-018-0055-1). After obtaining the **daily average pollen concentrations** for the most abundant pollen types, different pollination periods are calculated. The **main pollination period** is calculated according to the percentage defined by `perc1` argument (selected by the user, 80% by default; red) of the annual total pollen. For example, if `perc1 = 80`, the beginning of the high season is marked when the 10% of the annual value is reached; the end is selected when 90% is reached. The **early/late pollination period** is defined with the `perc2` argument (selected by the user, 99% of the total annual pollen by default; orange), i.e.: the start of this period will be when the 0.5% is reached and the end when the 99.5% is reached. For this kind of pollen calendar the `th.pollen` argument defines the **possible occurrence** period as adapted by [Werchan et al. (2018)](https://link.springer.com/article/10.1007/s40629-018-0055-1), considering the entire period between the first and the last day when this pollen level is reached (yellow). ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "phenological", n.types = 5, y.start = 2011, y.end = 2014, interpolation = FALSE) ``` Furthermore, different criteria can be customized: ```{r echo = TRUE, results='hide',fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "phenological", perc1 = 90, perc2 = 95, th.pollen = 5, interpolation = FALSE) ``` In this last case, the pollen calendar has been generated with more restrictive criteria for `perc1`, `perc2` and `th.pollen`. ## "violinplot" method This pollen calendar is based on the **pollen intensity**, and adapted from the pollen calendar published by [ORourke (1990)](https://link.springer.com/article/10.1007/BF02539105). At first, the daily averages of the pollen concentrations are calculated and then, these averages are represented by using the violin plot graph.

The shape of the violin plot represents the pollen intensity of the pollen types in a relative way, i.e.: the values will be calculated as **relative measurements** regarding to the most abundant pollen type in annual amounts. Therefore, this pollen calendar shows a relative comparison between the pollen intensity of the pollen types but **without scales and units**.

```{r echo = TRUE, results='hide', fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "violinplot", y.start = 2012, y.end = 2015, interpolation = FALSE) ``` In addition, **`th.pollen`** can be established, specifying the minimum pollen concentration considered (E.g.: 10 pollen grains/m3): ```{r echo = TRUE, results='hide', fig.keep='all', warning=FALSE} pollen_calendar(data = munich_pollen, method = "violinplot", th.pollen = 10, na.rm = FALSE, interpolation = FALSE) ``` # Functions iplot_pollen() and iplot_years() These functions have been designed for a **quick view** of your data for discussions or interpretations of them. **Interactive plots are displayed**. It might be interesting for group meetings or real time exposition of results. The functions create a emergent screen in which you can select the pollen/spore type or the years you want to plot.

**IMPORTANT :** To stop the real time visualization and continue using the package you must click in the **"Stop" signal.**

The function **iplot_pollen()** is to plot the pollen data during one season. The arguments are: * **data**: A `data.frame` object. This data.frame should include a first column in "Date" format and the rest of columns in format "numeric" belonging to each pollen type by column. * **year:** An `integer` value specifying the year to display. This is a mandatory argument. The function **iplot_years()** is to plot the pollen data during one season. The arguments are: * **data**: A `data.frame` object. This data.frame should include a first column in "Date" format and the rest of columns in format "numeric" belonging to each pollen type by column. * **pollen:** A `character` string with the name of the particle to show. This character must match with the name of a column in the input database. This is a mandatory argument. ```{r echo = TRUE, results='hide',fig.keep='all', eval=FALSE} iplot_pollen(munich_pollen, year = 2012) iplot_years(munich_pollen, pollen = "Betula") ```

**Note:** We are not able to plot interactive figures in this document. Please, run the codes above in your R session.

# Function plot_summary() The function **plot_summary()** is to plot the pollen data during several seasons. Also to plot the averaged pollen season over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments are: * **data**: A `data.frame` object. This data.frame should include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column. * **pollen:** A `character` string with the name of the particle to show. This character must match with the name of a column in the input database. This is a mandatory argument. * **mave:** An `integer` value specifying the order of the moving average applied to the data. By default, `mave = 1`. * **normalized:** A `logical` value specifying if the visualization shows real pollen data (`normalized = FALSE`) or the percentage of every day over the whole pollen season (`normalized = TRUE`). By default, `normalized = FALSE`. * **axisname:** A `character` string specifying the title of the y axis. By default, `axisname = "Pollen grains / m3"` * Others... ```{r echo = TRUE, results='hide',fig.keep='all'} plot_summary(munich_pollen, pollen = "Betula") ``` In some cases the user may want to reduce the "noise" of the daily values by **calculating moving means** (E.g.: 5 days moving mean; `mave = 5`): ```{r echo = TRUE, results='hide',fig.keep='all'} plot_summary(munich_pollen, pollen = "Betula", mave = 5) ``` You might be also interested in representing as background the **percentage** each day suppose to the main pollen season (`normalized = TRUE`): ```{r echo = TRUE, results='hide',fig.keep='all'} plot_summary(munich_pollen, pollen = "Betula", mave = 5, normalized = TRUE) ```

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

# Function plot_normsummary() The function **plot_normsummary()** has been designed to plot the pollen data amplitude during several seasons: daily average pollen concentration over the study period, maximum pollen concentration of each day over the study period and minimum pollen concentration of each day value over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments similar to **plot_summary()**, but as a result **you will obtain the the max-min range of the study period** instead of the values for every year. ```{r echo = TRUE, results='hide',fig.keep='all'} plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "red") ``` The **maximum values** are marked in **red** and the **minimum values in white**. The **average** is the **black line**. Of course, you can change the color (`color.plot` argument): ```{r echo = TRUE, results='hide',fig.keep='all'} plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "green", mave = 5, normalized = TRUE) ```

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

# Function analyse_trend() The function **analyse_trend()** has been created to calculate the main seasonal indexes of the pollen season ("Start Date", "Peak Date", "End Date" and "Pollen Integral"), as well as **trends analysis** of those parameters over the seasons. It is a summary dot plot showing the distribution of the main seasonal indexes over the years. The results can also be stored in two folders termed **"plot_AeRobiology"** and **"table_AeRobiology"**, which will be created in your working directory (only if `export.result=TRUE` & `export.plot=TRUE`). You can decide which result to be returned from the function by setting the argument **result**: *result = "table"* or *result = "plot"*. The function allows you to decide if you want to interpolate the data or not, by the argument **`interpolation`**, it also allows you to select the interpolation method by the argument **`int.method`**. Furthermore, it allows you to select the pollen season definition method, by the argument **`method`** and additional arguments for the function **calculate_ps()**. Some arguments about the visualization are: * **split**: A `logical` argument. If `split = TRUE`, the plot is separated in two according to the nature of the variables (i.e. dates or pollen concentrations). **This argument was a solution to reduce the scale of the x-axis when "total pollen"" variable has a very high/low slope**. By default, `split = TRUE`. * **quantil**: A `numeric` value (between 0 and 1) indicating the quantile of data to be displayed in the graphical output of the function. `quantil = 1` would show all the values, however a lower quantile will exclude the most extreme values of the sample. **This argument was designed after realizing of a common problem: when plotting the results with `split=FALSE`, some "outlier" results increased a lot the scale of the x-axis, making unreadable the rest of results**. Our solution was to create an argument to exclude these outliers results and to be able to observe the main results in an appropriate scale. Furthermore, this argument may be used to split the parameters using a different sampling units (e.g. dates and pollen concentrations) can be used low vs high values of quantil argument (e.g. 0.5 vs 1). Also can be used an extra argument: `split`. By default, `quantil = 0.75`. **This argument only works when `split=FALSE`.** * **significant**: A `numeric` value indicating the significant level to be considered in the linear trends analysis. This p level is displayed in the graphical output of the function (as a number in the legend and as a black ring in the graphical representation). By default, `significant = 0.05`. ```{r echo = TRUE, results='hide',fig.keep='all'} analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, result="plot") ``` Lets see what happen if we don't split the graphics: ```{r echo = TRUE, results='hide',fig.keep='all'} analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split = FALSE, quantil = 1, result="plot") ``` Now the results are less readable, but this kind of graphical representation may be interesting for some people. When plotting all the results together, it might be useful to exclude some outliers: ```{r echo = TRUE, results='hide',fig.keep='all'} analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split=FALSE, quantil = 0.5, result="plot") ``` As you can appreciate, a lot of points have been omitted. You can also change the significance level as mentioned above. Why don`t we try?: ```{r echo = TRUE, results='hide',fig.keep='all'} analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result = "plot") ``` Now everything is significant! Have a look to the numbers by setting *result = "table"* (the default output). ```{r echo = TRUE, results='hide',fig.keep='all'} analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result="table") ```

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

If **result = "plot"**, the function returns a list of objects of class *ggplot2*; if **result = "table"**, the function returns a *data.frame*. By default, **result = "table"**.

# Function plot_trend() The function **plot_trend()** has been created to calculate the main seasonal indexes of the pollen season ("Start Date", "Peak Date", "End Date" and "Pollen Integral") and their trends analysis over the seasons. It produces plots showing the distribution of the main seasonal indexes over the years. The **results are stored in two folders termed "plot_AeRobiology" and "table_AeRobiology"** which will be located in your working directory (only if `export.result=TRUE` & `export.plot=TRUE`). ```{r echo = TRUE, results='hide',fig.keep='all', eval=FALSE} plot_trend(munich_pollen, interpolation = FALSE, export.plot = TRUE, export.result = TRUE) ```

**NOTE:** The plots are not shown in your R environment. Because of the high amount of graphs, they are stored in new folders created in your working directory (search were you have saved the R project and you will find the folders).

The **interval of confidence** only appears if more than 6 dots are plotted. If not, a line crossing all the dots is plotted.

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

# Function iplot_abundance() The function **iplot_abundance()** generates a barplot based on the relative abundance in the air (as percentage) of each pollen/spore type with respect to the total amounts. The main arguments are: * **n.types:** A `numeric` (integer) value specifying the number of the most abundant pollen types that must be represented in the plot of the relative abundance. A more detailed information about the selection of the considered pollen types may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). The `n.types` argument will be 15 types by default. * **y.start/y.end:** A `numeric` (integer) value specifying the period selected to calculate relative abundances of the pollen types (start year - end year). If `y.start` and `y.end` are not specified (`NULL`), the entire database will be used to generate the pollen calendar. The `y.start` and `y.end` arguments will be `NULL` by default. * **col.bar:** A `character` string specifying the color of the bars to generate the graph showing the relative abundances of the pollen types. The `color` argument will be #E69F00 by default, but any color may be selected. * **type.plot: ** A `character` string specifying the type of plot selected to show the plot showing the relative abundance of the pollen types. The implemented types that may be used are: "static" (generates a static ggplot object) and "dynamic" (generates a dynamic plotly object). * **exclude:** A character string vector with the names of the pollen types to be excluded from the plot. * Others... ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE) ``` Now we are going to reduce the number of types: ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3) ``` We can also select the abundance for only one year and change the color: ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131") ``` Furthermore, we can make it interactive: ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131", type.plot = "dynamic") ```

**interpollen() function** is integrated in this function. Consult the "help" section of this function to use it.

# Function iplot_pheno() The function **iplot_pheno()** generates a boxplot based on phenological parameters (start_dates and end_dates) which are calculated by the estimation of the main parameters of the pollen season. The main arguments are: * **data**: A `data.frame` object including the general database where calculation of the pollen season must be applied in order to generate the phenological plot based on the start_dates and end_dates. This data.frame must include a first column in "Date" format and the rest of columns in "numeric" format belonging to each pollen type by column. * **method**: A `character` string specifying the method applied to calculate the pollen season and the main parameters. The implemented methods that can be used are: "percentage", "logistic", "moving", "clinical" or "grains". A more detailed information about the different methods for defining the pollen season may be consulted in calculate_ps function. * **n.types**: A `numeric` (integer) value specifying the number of the most abundant pollen types that must be represented. A more detailed information about the selection of the considered pollen types may be consulted [here](https://CRAN.R-project.org/package=AeRobiology). The `n.types` argument will be 15 by default. * **type.plot**: A `character` string specifying the type of plot selected to show the phenological plot. The implemented types that may be used are: "static" (generates a static ggplot object) and "dynamic" (generates a dynamic plotly object). * Other arguments related to **calculate_ps() function** ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_pheno(munich_pollen, method= "percentage", perc=80, int.method="spline", n.types = 8) ``` We can change the method to establish the main pollen season and the number of pollen types to show: ```{r echo = TRUE, results='hide',fig.keep='all'} iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4) ``` Furthermore, we can make the plot interactive to obtain more information by clicking in each object: ```{r echo = TRUE, results='hide',fig.keep='all', eval=FALSE} iplot_pheno(munich_pollen, method= "clinical", n.clinical = 3, int.method="spline", n.types = 4, type.plot = "dynamic") ``` # Function plot_ps() The function **plot_ps() function** was designed to plot the main pollen season of a single pollen type and year. Some of the arguments are: * **data**: A `data.frame` object including the general database where interpolation must be performed. This data.frame must include a first column in "Date" format and the rest of the columns in "numeric" format. Each column must contain information of one pollen type. It is not necessary to insert missing gaps, the function will automatically detect them. * **pollen.type**: A `character` string specifying the name of the pollen type which will be plotted. The name must be exactly the same that appears in the column name. Mandatory argument with no default. * **year**: A `numeric` (integer) value specifying the season to be plotted. The season does not necessary fit a natural year. See calculate_ps for more details. Mandatory argument with no default. * **days**: A `numeric` (integer) specifying the number of days beyond each side of the main pollen season that will be represented. The `days` argument will be 30 by default. * **fill.col**: A `character` string specifying the name of the color to fill the main pollen season in the plot. See ggplot function for more details. The `fill.col` argument will be "turquoise4" by default. It uses the ggplot color codes. * **axisname**:A `character` string specifying the title of the y axis. By default, `axisname = expression(paste("Pollen grains / m"^"3"))` * Others...

**calculate_ps function** is integrated in this function. Consult [here](https://CRAN.R-project.org/package=AeRobiology) for more information. **Interpolation is mandatory** for this function. Due to technical encoding, you must use interpollen function through calculate_ps function. Consult calculate_ps for more information.

```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(munich_pollen, pollen.type="Alnus", year=2011) ``` If you misspell some pollen type name, the function will tell you: ```{r echo = TRUE, results='hold', error=TRUE} plot_ps(munich_pollen, pollen.type="Alnuscdscscr", year=2011) ``` **Let`s test more arguments:** ```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "lineal") ``` As you have noticed, the arguments `method`, `perc` and `int.method` are from **calculate_ps function**. What about changing the interpolation method? ```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "movingmean") ``` Do you want a larger scale? ```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(munich_pollen, pollen.type="Alnus", year=2013, days = 90) ``` Maybe a different color and y-axis name? ```{r echo = TRUE, results='hide',fig.keep='all'} plot_ps(munich_pollen, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units") ``` # Function plot_hour() Please keep in mind that the function *plot_hour()* is only available after package **version 2.0**. The input data must be a data.frame object with the structure long. Where the first two columns are factors indicating the pollen and the location. The 3 and 4 columns are POSIXct, showing the date with the hour. Where the third column is the beginning of the concentration *from* and the fourth column is the end time of the concentration data *to*. The fifth column shows the concentrations of the different pollen types as numeric. Please see the example 3-hourly data from the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data("POMO_pollen"), supplied by ePIN Network supported by the Bavarian Government. We will load an example dataset about 3-hourly data from the ePIN network in Bavaria (Germany). The dataset contains information of 3-hourly concentrations of pollen in the atmosphere of Munich (DEBIED) and Viechtach (DEVIEC) during the year 2018. Pollen types included: "Poaceae" and "Pinus". The data were obtained by the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data("POMO_pollen"), supplied by the public ePIN Network supported by the Bavarian Government. The ePIN Network was built by Das Bayerische Landesamt für Gesundheit und Lebensmittelsicherheit (LGL) in collaboration with Zentrum Allergie und Umwelt (ZAUM). ```{r echo = FALSE, warning=FALSE, message=FALSE} library(ggplot2) library(dplyr) ``` ```{r echo = TRUE} data("POMO_pollen") ``` The function plots pollen data expressed in concentrations with time resolution higher than 1 day (e.g. hourly, bi-hourly concentrations). If the argument *result = "plot"*, the function returns a list of objects of class **ggplot2**; if *result = "table"*, the function returns a **data.frame** with the hourly patterns. ```{r echo = TRUE, message=FALSE} plot_hour(POMO_pollen) ``` To display a table we have to set the argument *result = "table"*. ```{r message=FALSE} TO<-plot_hour(POMO_pollen, result ="table") knitr::kable(TO[1:10,], caption = "3-Hourly patterns", row.names = FALSE, digits = 1, format = "html", booktabs = TRUE) ``` We can also split the different stations by setting the argument *locations = TRUE*. ```{r, message=FALSE, echo=TRUE} plot_hour(POMO_pollen, locations = TRUE) ``` # Function plot_heathour() An alternative to *plot_hour()* is *plot_heathour()*, which shows a summary of all particles with a heatplot. The input data should have the same format as for *plot_hour()*. ```{r, message=FALSE, echo=TRUE} plot_heathour(POMO_pollen) ``` You can also set the colors by the arguments: *low.col*, *mid.col* and *high.col*. E.g. ```{r, message=FALSE, echo=TRUE} plot_heathour(POMO_pollen, low.col = "darkgreen", mid.col = "moccasin", high.col = "brown") ``` By setting *locations = TRUE* you can split the result by locations: ```{r, message=FALSE, echo=TRUE} plot_heathour(POMO_pollen, locations = TRUE) ```