This package gathers different tools for managing aerobiological databases, elaborating the main calculations and visualization of results. In a first step, data may be checked using tools for quality control and all missing gaps can be completed. Then, the main parameters of the pollen season can be calculated and represented graphically. Multiple graphical tools are available: pollen calendars, phenological plots, time series, tendencies, interactive plots, abundance plots…
During this tutorial we are going to use the pollen data from Munich which are integrated in the package. This will allow you to follow the tutorial obtaining the same results. If you want to follow the tutorial by using your own data, check the next section.
munich_pollen
is a data set containing
information of daily concentrations of pollen in the atmosphere of
Munich during the years 2010_2015. Pollen types included: “Alnus”,
“Betula”, “Taxus”, “Fraxinus”, “Poaceae”, “Quercus”, “Ulmus” and
“Urtica”.
Data were obtained at Munich (Zentrum Allergie und Umwelt, ZAUM) using a
Hirst_type volumetric pollen trap (Hirst, 1952) following the standard
methodology (VDI4252_4_2016). Some gaps have been added to test some
functions of the package
(e.g. quality_control(),
interpollen()).
The data were
obtained by the research team of Prof. Jeroen Buters (Christine Weil
& Ingrid Weichenmeier). At the Zentrum Allergie und Umwelt (ZAUM, directed by Prof. Carsten
B. Schmidt_Weber).
You can load the data in your working environment by typing:
This section has been developed for people who are not familiar with R. If you have experience in loading databases in R, you might not be interested in this section and you can skip it.
There are several functions to import data from Excel. Arbitrarily, we are going to use the package readxl.
Now we can import the data with the function read_xlsx(). I strongly recommend you to have your Excel file in the same folder in which you are working with R. This will avoid long paths to your files and also will prevent broken paths when some files have been moved. If you are working with an old database, you might be interested in the read_xls() function.
You can also select the sheet you want to import:
Now you should have loaded your data in the object
Mydata
. Let’s check if there is any mistake:
## 'data.frame': 2191 obs. of 9 variables:
## $ date : Date, format: "2010-01-01" "2010-01-02" ...
## $ Alnus : num NA NA NA NA NA NA NA NA NA NA ...
## $ Betula : num NA NA NA NA NA NA NA NA NA NA ...
## $ Taxus : num NA NA NA NA NA NA NA NA NA NA ...
## $ Fraxinus: num NA NA NA NA NA NA NA NA NA NA ...
## $ Poaceae : num NA NA NA NA NA NA NA NA NA NA ...
## $ Quercus : num NA NA NA NA NA NA NA NA NA NA ...
## $ Ulmus : num NA NA NA NA NA NA NA NA NA NA ...
## $ Urtica : num NA NA NA NA NA NA NA NA NA NA ...
Note: I’m using munich_pollen
database
as example but you should use Mydata
.
As you can see, the object is a data.frame
, the first
column is “Date” type and the rest are “num”, which means numeric. This
is the format you database must have. Normally it is
automatically recognized by the import function, but if not, check the
functions:
as.Date(),
as.numeric() and
as.data.frame().
Your database must have this format to start working with the package. The functions are designed to warn you in case the format is not correct but there may be some exceptions. This is the most important step when using the package and some strange errors reported by the functions can be solved by doing this. Sometimes the date column has 2 different types simultaneously (“POSIXlt” and “POSIXct”) and it might cause mistakes. You can use one or another, but not both. Furthermore, it is strongly recommended to use “Date” format instead of “POSIXlt” or “POSIXct” despite theoretically there may not be problems by using them.
Please, don`t exasperate. This is the slowest step of using the package and the most important. Once you have your data loaded, you only have to spend a few minutes in each function to have your results. If you have some unsolving problems, search on the internet. Some other people must have had your same problem and it should be solved in some forums. If not, write us and we will help you as soon as possible: [email protected]
If you want to import your data from csv files, you might be interested in read.csv() function.
This function was designed to check the quality of an historical database of several pollen types. Since many of the quality requirements depend on the criteria selected to establish the main pollen season, this function has the calculate_ps() function integrated. You can insert arguments of calculate_ps() in this function to select the method you want for calculating the main pollen season. calculate_ps() details can be consulted later in this same manuscript.
If result = “plot”, this function returns a
graphical resume of the quality of the database marking
with different red intensities the “weak-points” of it, and if
result = “table” a data.frame
with the
detailed reasons of their “weakness”.
It establishes the quality according to 5 different criteria:
It checks if the main pollen season cannot be calculated according to the minimal requirements: lack of data for this pollen type and year. This criteria appears as “Complete” in the generated data.frame.
If the start, end or peak dates of the main pollen season
have been interpolated (or a date near them specified by
int.window
argument). It is based on the
following premise: If this day or a nearby date is missing/interpolated,
the selected date as start/peak/end might not be the real one. These
criteria appear as “Start”, “Peak” and
“End” in the generated data.frame.
The percentage of missing data within the main pollen
season. It calculates the number of days which have been
interpolated by interpollen() function and their percentage of the main
pollen season. If a high percentage of the main pollen season has been
interpolated, the information of this season might not be reliable. The
maximal percentage allowed can be specified by the
perc.miss
argument. This filter appears as
“Comp.MPS” in the generated data.frame.
When running the function, it gives you a graphical resume with different color intensity depending on the risk of including each pollen/spore type for a concrete year and a data.frame with detailed information about the reasons of this evaluation.
If the filter has been successfully passed, it appears as
TRUE
in the generated data.frame
. If not, it
appears as FALSE
.
## type seasons Complete Start Peak End Comp.MPS Risk
## 1 Alnus 2010 TRUE TRUE TRUE FALSE FALSE 2
## 2 Alnus 2011 TRUE FALSE TRUE TRUE FALSE 2
## 3 Alnus 2012 TRUE TRUE TRUE TRUE TRUE 0
## 4 Alnus 2013 TRUE FALSE TRUE FALSE TRUE 2
## 5 Alnus 2014 TRUE FALSE TRUE FALSE FALSE 3
## 6 Alnus 2015 TRUE FALSE FALSE TRUE FALSE 3
There are lots of arguments for this function. You can consult them in the “help” section of the function in R or in the official document of the package.
Let’s suppose we want to check the quality of our database, but we
want to calculate it for a 80% defined Main Pollen
Season (arguments ps.method
and
perc
). We don’t want to take into account
years with a missing data less than 4 days away from
the start, peak or end dates (argument
int.window
). Furthermore, we only want to
exclude years with more than 50% of interpolated data
within the main pollen season (argument
perc.miss
):
As you would have noticed, there are several differences between this result and the previous one.
The function interpollen() was designed to complete all the missing gaps within a database in just one step. All the gaps of each pollen/spore type are completed simultaneously. There are different methods to do so which may be more or less appropriate according to the particularities of your database. The arguments are:
data.frame
object. This
data.frame should include a first column in “Date” format and the rest
of the columns in “numeric” format belonging to each pollen/spore type
by column.character
string specifying
the method applied to calculate and generate the pollen missing data.
The implemented methods that can be used are: “lineal”,
“movingmean”, “spline”,
“tseries” or “neighbour”. The method
argument will be “lineal” by default.numeric (integer)
value
specifying the maximum number of consecutive days with missing data that
the algorithm is going to interpolate. If the gap is bigger than the
argument value, the gap will not be interpolated. Not valid with
“tseries” method. The
maxdays argument will be 30 by
default. This argument might be very interesting to avoid long
interpolated gaps.logical (integer)
argument. If
TRUE
, graphical previews of the input database will be plot
at the end of the interpolation process. All the interpolated gaps will
be marked in red. The plot
argument will be
TRUE
by default.As you can see above, when result = “long”, the missing gaps which have been completed by the algorithm are marked in red. It displays similar graphs for each pollen/spore type within your database. Furthermore, the “Interpolated” data.frame is produced as “long” with another new column indicating if each data has been interpolated or not (1 means interpolated and 0 means original data).
## Date Type Pollen Interpolated
## 1 2010-02-24 Poaceae 0 0
## 2 2010-02-25 Poaceae 0 0
## 3 2010-02-26 Poaceae 0 0
## 4 2010-02-27 Poaceae 0 0
## 5 2010-02-28 Poaceae 0 0
## 6 2010-03-01 Poaceae 0 0
If you want to integrate all the new data in your database, you can assign the function to a new object and specify the argument result = “wide”:
Now your database without gaps appears as “CompleteData”.
As you might have noticed, if you add
plot=FALSE
, the graphs are not shown.
You don’t have to do this before applying another function of this package. “interpollen” function has been integrated in the main functions of the package so you can choose whether you want your data to be completed or not by a specific argument in each function.E.g.:
calculate_ps(munich_pollen, method="percentage", interpolation=TRUE, int.method = "lineal", plot = F)
interpollen()
searches for correlative dates
and, when they are not consecutive natural days, it carries out the
interpolation. When the second date is an earlier date than the previous
one, it reports an error. You have to correct these issues in your
database before applying the interpollen function.
In this function there are 5 different methods to interpolate missing gaps. Each method can be more appropriate than the others under some specific circumstances. Interpolated data are not real data, but including such estimations can reduce the error of some calculations such as the main pollen season by percentage or a pollen calendar, i.e.: if you keep the gap, these days would count as 0 for the calculations and this would suppose a bigger error than estimating pollen concentrations for these days. Obviously, this will also suppose a bigger error than having the real data.
As mentioned above, there are many methods to interpolate your missing data. The simplest one was the showed in the previous example: “lineal”. It traces a straight line between each extreme of the gap. The pollen/spore concentration of the missing days is calculated according to this line. This method may be appropriate for dates without pollen/spores or for small gaps.
Nevertheless, there are other method which can be more effective during the pollen season.
This method calculates the moving mean of the pollen daily
concentrations. The gaps are fulfilled by calculating the moving
mean of the concentrations for this particular day. The window
of the moving mean is centered on this day and its size is the result of
multiplicating the gap size by the factor
argument. E.g.: If you have a gap of 3 days and your factor is
2, the gaps are replaced by the value of a moving mean of 6 days (3 x 2)
centered in this particular day and not taking into account the missing
days for the mean value. Furthermore, it is a dynamic function:
for each gap of the database the window size of the moving mean
changes depending of each gap size.
This method carries out the interpolation by tracing a
polinomic function to link both extremes of the gap.
The polinomic function is chosen according to the best fitting equation
to the previous and posterior days of the gap (second, third, fourth
degree…). The number of days of each side of the gap which will be taken
into account for calculating the spline regression are specified by
ndays
argument. The smoothness of the
adjustment can be specified by the spar
argument.
Note: By changing the ndays
and
spar
argument, very different results can be obtained.
E.g.:
If you have long time series of data, this might be the most suitable method for you. This method analyses the time series of the pollen/spore database and performs a seasonal-trend decomposition based on LOESS (Cleveland et al., 1990). It extracts the seasonality of the historical database and uses it to predict the missing data by performing a linear regression with the target year.
Seasonality is represented in grey, real data in black and interpolated data in red. This method improves with long time series of years. Strange results may be caused by lack of years or small sampling periods.
Other near stations provided by the user are used to
interpolate the missing data of the target station. First of
all, a Spearman correlation is performed between the target station and
the neighbour stations to discard the neighbour stations with a
correlation coefficient smaller than mincorr
value. For each gap, a linear regression is performed between
the neighbour stations and the target stations to determine the equation
which converts the pollen concentrations of the neighbour stations into
the pollen concentration of the target station. Only neighbour stations
without any missing data during the gap period are taken into account
for each gap.
You can include 4 different databases of neighbour stations with the
arguments data2
, data3
,
data4
and data5
. The format
of these databases must be the same than the original one and the name
of the pollen/spore types must be exactly the same.
With the mincorr
argument you can
specify the minimal correlation coefficient (Spearman correlation) that
the neighbour stations must have with the target station to be taken
into account for a concrete gap. This process is completely independent
for each gap of each pollen type.
The function calculate_ps() is one of the core functions of the AeRobiology package. It was designed to calculate the main parameters of the pollen season with regard to phenology and pollen intensity from a historical database of several pollen types. The function can use the most common methods to define the main pollen season.
This function has several arguments, but don’t worry, we are going to perform examples later. The main arguments are:
data.frame
object including
the general database where calculation of the pollen season must be
applied. This data.frame must include a first column in “Date” format
and the rest of columns in “numeric” format belonging to each pollen
type by column.character
string specifying
the method applied to calculate the pollen season and the main
parameters. The implemented methods that can be used are: “percentage”,
“logistic”, “moving”, “clinical” or “grains”. A more detailed
information about the different methods for defining the pollen season
may be consulted here) or later
in this same document.numeric
value. The number of
days whose pollen concentration is bigger than this threshold is
calculated for each year and pollen type. This value will be obtained in
the results of the function. The th.day
argument will be
100 by default.numeric
value ranging 0_100.
This argument is valid only for method = "percentage"
. This
value represents the percentage of the total annual pollen included in
the pollen season, removing (100_perc)/2% of the total pollen before and
after of the pollen season. The perc
argument will be 95 by
default.character
string
specifying the method for selecting the best annual period to calculate
the pollen season. The pollen seasons may occur within the natural year
between two years. The implemented options that can be used are:
“natural”, “interannual” or “peak”. The def.season
argument
will be “natural” by default. A more detailed information about the
different methods for selecting the best annual period to calculate the
pollen season may be consulted here.logical
value. This
argument is valid only for the “logistic” method. If FALSE
the reduction of the pollen data is not applicable. If TRUE
a reduction of the peaks above a certain level (red.level
argument) will be carried out before the definition of the pollen
season. The reduction argument will be FALSE
by default. A
more detailed information about the reduction process may be consulted
here.numeric
value ranging 0_1
specifying the percentile used as level to reduce the peaks of the
pollen series before the definition of the pollen season. This argument
is valid only for the “logistic” method. The red.level
argument will be 0.90 by default, specifying the percentile 90.numeric (integer)
value
belonging to options of 4, 5 or 6 specifying the derivative that will be
applied to calculate the asymptotes which determines the pollen season
using the “logistic” method. This argument is valid only for the
“logistic” method. The derivative
argument will be 5 by
default. More information may be consulted here.numeric (integer)
value
specifying the order of the moving average applied to calculate the
pollen season using the “moving” method. This argument is valid only for
the “moving” method. The man
argument will be 11 by
default.numeric
value specifying the
threshold used for the “moving” method for defining the beginning and
the end of the pollen season. This argument is valid only for the
“moving” method. The th.ma
argument will be 5 by
default.numeric (integer)
value
specifying the number of days which must exceed a given threshold
(th.pollen argument) for defining the beginning and the end of the
pollen season. This argument is valid only for the “clinical” method.
The n.clinical
argument will be 5 by default.numeric (integer)
value specifying the time window during which the conditions must be
evaluated for defining the beginning and the end of the pollen season
using the “clinical” method. This argument is valid only for the
“clinical” method. The window.clinical
argument will be 7
by default.numeric (integer)
value specifying the time window during which the conditions must be
evaluated for defining the beginning and the end of the pollen season
using the “grains” method. This argument is valid only for the “grains”
method. The window.grains
argument will be 5 by
default.numeric
value specifying
the threshold that must be exceeded during a given number of days
(n.clinical or window.grains arguments) for defining the beginning and
the end of the pollen season using the “clinical” or “grains” methods.
This argument is valid only for the “clinical” or “grains” methods. The
th.pollen
argument will be 10 by default.numeric
value specifying the
pollen threshold that must be exceeded by the sum of daily pollen during
a given number of days (n.clinical
argument) exceeding a
given daily threshold (th.pollen
argument) for defining the
beginning and the end of the pollen season using the “clinical” method.
This argument is valid only for the “clinical” method. The
th.sum
argument will be 100 by default.character
string specifying
the parameters considered according to a specific pollen type for
calculating the pollen season using the “clinical” method. The
implemented pollen types that may be used are: “birch”, “grasses”,
“cypress”, “olive” or “ragweed”. As result for selecting any of these
pollen types the parameters n.clinical
,
window.clinical
, th.pollen
and
th.sum
will be automatically adjusted for the “clinical”
method. If no pollen types are specified (type = "none"
),
these parameters will be considered by the user. This argument is valid
only for the “clinical” method. The type
argument will be
“none” by default.logical
value. If
FALSE
the interpolation of the pollen data is not
applicable. If TRUE
an interpolation of the pollen series
will be applied to complete the gaps with no data before the calculation
of the pollen season. The interpolation
argument will be
TRUE
by default. A more detailed information about the
interpolation method may be consulted here.character
string
specifying the method selected to apply the interpolation method in
order to complete the pollen series. The implemented methods that may be
used are: “lineal”, “movingmean”, “spline” or “tseries”. The
int.method
argument will be “lineal” by default.numeric
(integer value)
specifying the maximum number of consecutive days with missing data that
the algorithm is going to interpolate. If the gap is bigger than the
argument value, the gap will not be interpolated. Not valid with
int.method = "tseries"
. The maxdays
argument
will be 30 by default.logical
value
specifying if a set of plots based on the definition of the pollen
season and saved in the working directory will be required or not. If
FALSE
graphical results will not be saved. If
TRUE
a pdf file for each pollen type showing graphically
the definition of the pollen season for each studied year will be saved
within the plot_AeRobiology directory created in the working directory.
The export.plot
argument will be FALSE
by
default.logical
value
specifying if a excel file including all parameters for the definition
of the pollen season saved in the working directory will be required or
not. If FALSE
the results will not exported. If
TRUE
the results will be exported as xlsx file including
all parameters calculated from the definition of the pollen season
within the table_AeRobiology directory created in the working directory.
The export.result
argument will be FALSE
by
default.logical
value specifying if
the plots are generated in the plot history. The plot
argument will be TRUE
by default.character
string specifying
the output for the function. The implemented outputs that may be
obtained are: "table"
and "list"
. The argument
result
will be "table"
by default.export.plot=TRUE
, the plots
are EXPORTED in “plot_AeRobiology” folder.
This function allows to calculate the pollen season using five different methods which are described below. After calculating the start_date, end_date and peak_date for the pollen season all rest of parameters are calculated:
export.result=TRUE
, these
objects will be exported as xlsx file within the “table_AeRobiology”
directory created in your working directory. This Excel file will have a
sheet for each pollen type and a last sheet with the legend of all the
abbreviations and the method selected to calculate them. We
strongly recommend to use export.result=TRUE
Although the results can be extracted in the working directory in a xlsx file, they can also be assigned to an object, and visualized, for example (an extract):
type | seasons | st.dt | st.jd | en.dt | en.jd | ln.ps | sm.tt | sm.ps | pk.val | pk.dt | pk.jd | ln.prpk | sm.prpk | ln.pspk | sm.pspk | daysth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
24 | Poaceae | 2015 | 2015-05-12 | 132 | 2015-08-10 | 222 | 91 | 4175.0 | 3998.0 | 322 | 2015-06-04 | 155 | 24 | 2089.0 | 67 | 1909.0 | 9 |
25 | Quercus | 2010 | 2010-04-08 | 98 | 2010-05-26 | 146 | 49 | 737.5 | 701.5 | 189 | 2010-04-30 | 120 | 23 | 416.0 | 26 | 285.5 | 1 |
26 | Quercus | 2011 | 2011-04-07 | 97 | 2011-05-16 | 136 | 40 | 2159.0 | 2079.0 | 271 | 2011-04-23 | 113 | 17 | 990.0 | 23 | 1089.0 | 6 |
27 | Quercus | 2012 | 2012-04-04 | 95 | 2012-05-19 | 140 | 46 | 1281.0 | 1219.0 | 210 | 2012-05-02 | 123 | 29 | 789.0 | 17 | 430.0 | 3 |
28 | Quercus | 2013 | 2013-04-27 | 117 | 2013-05-21 | 141 | 25 | 1987.0 | 1889.0 | 373 | 2013-05-09 | 129 | 13 | 1368.0 | 12 | 521.0 | 5 |
29 | Quercus | 2014 | 2014-04-05 | 95 | 2014-05-19 | 139 | 45 | 1334.0 | 1270.0 | 134 | 2014-04-24 | 114 | 20 | 653.0 | 25 | 617.0 | 2 |
30 | Quercus | 2015 | 2015-04-11 | 101 | 2015-05-14 | 134 | 34 | 1250.5 | 1200.5 | 224 | 2015-05-05 | 125 | 25 | 950.5 | 9 | 250.0 | 4 |
31 | Taxus | 2010 | 2010-03-21 | 80 | 2010-04-18 | 108 | 29 | 5665.0 | 5389.0 | 1340 | 2010-03-24 | 83 | 4 | 2123.0 | 25 | 3266.0 | 9 |
The result is also plotted by default (plot = TRUE).
## [1] "2010 Poaceae"
## [1] "2011 Poaceae"
## [1] "2012 Poaceae"
## [1] "2013 Poaceae"
## [1] "2014 Poaceae"
## [1] "2015 Poaceae"
## type seasons st.dt st.jd en.dt en.jd ln.ps sm.tt sm.ps pk.val
## 1 Poaceae 2010 2010-05-18 138 2010-08-25 237 100 1861 1773 179
## 2 Poaceae 2011 2011-05-02 122 2011-08-24 236 115 2244 2134 163
## 3 Poaceae 2012 2012-05-07 128 2012-08-17 230 103 2127 2023 84
## 4 Poaceae 2013 2013-05-18 138 2013-09-13 256 119 2481 2367 166
## 5 Poaceae 2014 2014-05-06 126 2014-08-09 221 96 2413 2306 97
## 6 Poaceae 2015 2015-05-12 132 2015-08-10 222 91 4175 3998 322
## pk.dt pk.jd ln.prpk sm.prpk ln.pspk sm.pspk daysth
## 1 2010-06-11 162 25 705 75 1068 4
## 2 2011-05-25 145 24 588 91 1546 2
## 3 2012-05-25 146 19 349 84 1674 0
## 4 2013-06-13 164 27 773 92 1594 4
## 5 2014-06-11 162 37 1218 59 1088 0
## 6 2015-06-04 155 24 2089 67 1909 9
This is a commonly used method for defining the pollen season based
on the elimination of a certain percentage in the beginning and
the end of the pollen season ( Nilsson
and Persson, 1981; Andersen,
1991). For example, if the pollen season is based on the 95% of the
total annual pollen (perc = 95
), the start_date of the
pollen season is marked as the day in which 2.5% of the total pollen is
registered and the end_date of the pollen season is marked as the day in
which 97.5% of the total pollen is registered.
In this case we have calculated the main pollen season based on the
90% of the total annual pollen. Results are stored in the
“table_AeRobiology” folder since
export.result=TRUE
.
You can select different methods of interpolation or even to not interpolate gaps:
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 90, export.result = FALSE, int.method = "spline")
calculate_ps(munich_pollen[,c(1,6)], method = "percentage", perc = 75, export.result = FALSE, interpolation = FALSE)
export.plot=TRUE
, the plots
are EXPORTED in “plot_AeRobiology” folder.
This method was developed by Ribeiro et al. (2007) and modified by
Cunha et al. (2015). It is based on fitting annually a
non-linear logistic regression model to the daily
accumulated curve for each pollen type. This logistic function and the
different derivatives were considered to calculate the
start_date and end_date of the pollen
season, based on the asymptotes when pollen amounts are
stabilized on the beginning and the end of the accumulated
curve. For more information about the method, see Ribeiro et
al. (2007) and Cunha
et al. (2015). Three different derivatives may be
used (derivative
argument) 4, 5 or 6 that represent
from higher to lower restrictive criterion for defining
the pollen season.
This method may be complemented with an optional method for
reduction the peaks values (reduction = TRUE
),
thus avoiding the effect of the great influence of extreme peaks. In
this sense, peaks values will be cut below a certain
level that the user may select based on a percentile analysis
of peaks. For example, red.level = 0.90
will cut all peaks
above the percentile 90.
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 5, reduction=FALSE)
type | seasons | st.dt | st.jd | en.dt | en.jd | ln.ps | sm.tt | sm.ps | pk.val | pk.dt | pk.jd | ln.prpk | sm.prpk | ln.pspk | sm.pspk | daysth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Poaceae | 2010 | 2010-05-16 | 136 | 2010-07-22 | 203 | 68 | 1861 | 1666.0 | 179 | 2010-06-11 | 162 | 27 | 724 | 41 | 942.0 | 4 |
Poaceae | 2011 | 2011-04-27 | 117 | 2011-07-16 | 197 | 81 | 2244 | 1994.0 | 163 | 2011-05-25 | 145 | 29 | 617 | 52 | 1377.0 | 2 |
Poaceae | 2012 | 2012-04-30 | 121 | 2012-07-23 | 205 | 85 | 2127 | 1932.0 | 84 | 2012-05-25 | 146 | 26 | 390 | 59 | 1542.0 | 0 |
Poaceae | 2013 | 2013-02-16 | 47 | NA | NA | NA | 2481 | NA | 166 | 2013-06-13 | 164 | NA | NA | NA | NA | NA |
Poaceae | 2014 | 2014-05-07 | 127 | 2014-07-18 | 199 | 73 | 2413 | 2149.0 | 97 | 2014-06-11 | 162 | 36 | 1206 | 37 | 943.0 | 0 |
Poaceae | 2015 | 2015-05-08 | 128 | 2015-07-08 | 189 | 62 | 4175 | 3491.5 | 322 | 2015-06-04 | 155 | 28 | 2132 | 34 | 1359.5 | 9 |
In the previous case, the reduction wasn’t carried out
(reduction=FALSE
) and all the peaks were conserved. We can
cut some peaks and change the derivative
:
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "logistic", derivative = 6, reduction=FALSE, red.level = 0.8)
type | seasons | st.dt | st.jd | en.dt | en.jd | ln.ps | sm.tt | sm.ps | pk.val | pk.dt | pk.jd | ln.prpk | sm.prpk | ln.pspk | sm.pspk | daysth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Poaceae | 2010 | 2010-05-08 | 128 | 2010-07-30 | 211 | 84 | 1861 | 1731 | 179 | 2010-06-11 | 162 | 35 | 748 | 49 | 983 | 4 |
Poaceae | 2011 | 2011-04-17 | 107 | 2011-07-26 | 207 | 101 | 2244 | 2073 | 163 | 2011-05-25 | 145 | 39 | 640 | 62 | 1433 | 2 |
Poaceae | 2012 | 2012-04-19 | 110 | 2012-08-03 | 216 | 107 | 2127 | 2035 | 84 | 2012-05-25 | 146 | 37 | 400 | 70 | 1635 | 0 |
Poaceae | 2013 | 2013-02-20 | 51 | NA | NA | NA | 2481 | NA | 166 | 2013-06-13 | 164 | NA | NA | NA | NA | NA |
Poaceae | 2014 | 2014-04-28 | 118 | 2014-07-27 | 208 | 91 | 2413 | 2270 | 97 | 2014-06-11 | 162 | 45 | 1240 | 46 | 1030 | 0 |
Poaceae | 2015 | 2015-05-01 | 121 | 2015-07-15 | 196 | 76 | 4175 | 3678 | 322 | 2015-06-04 | 155 | 35 | 2158 | 41 | 1520 | 9 |
As you can observe, results are a bit different.
export.plot=TRUE
, the plots
are EXPORTED in “plot_AeRobiology” folder.
This method was proposed by Pfaar et al. (2017). It is based on the expert consensus in relation to pollen exposure and the relationship with allergic symptoms derived of the literature.
Different periods may be defined by this method: the pollen season, the high pollen season and the high pollen days:
The start_date and end_date of
the pollen season were defined as a certain
number of days (n.clinical
argument) within a time window
period (window.clinical
argument) exceeding a certain
pollen threshold (th.pollen
argument) which summation is
above a certain pollen sum (th.sum
argument).
All these parameters are established for each pollen type according to
Pfaar et al. (2017) and using the type
argument these parameters may be automatically adjusted for the specific
pollen types (“birch”, “grasses”, “cypress”, “olive” or “ragweed”).
Furthermore, the user may change all parameters to do a
customized definition of the pollen season.
The start_date and end_date of
the high pollen season were defined as three
consecutive days exceeding a certain pollen threshold
(th.day
argument).
The number of high pollen days will also be
calculated exceeding this pollen threshold (th.day
). For
more information about the method, see Pfaar et
al. (2017).
Running the following example, the main pollen season will be
established according to the birch requirements: more than 5 days within
a week in which more than 10 pollen grains/m3 are registered and whose
sum exceeds 100 pollen grains/m3. The high pollen days are those which
exceed 100 pollen grains/m3
(n.clinical=5, window.clinical=7, th.pollen=10, th.sum=100, th.day=100
)
type | seasons | st.dt | st.jd | en.dt | en.jd | ln.ps | sm.tt | sm.ps | pk.val | pk.dt | pk.jd | ln.prpk | sm.prpk | ln.pspk | sm.pspk | daysth | st.dt.hs | st.jd.hs | en.dt.hs | en.jd.hs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Poaceae | 2010 | 2010-06-04 | 155 | 2010-07-15 | 196 | 42 | 1861 | 1475 | 179 | 2010-06-11 | 162 | 8 | 598 | 34 | 877 | 4 | 2010-06-10 | 161 | 2010-06-13 | 164 |
Poaceae | 2011 | 2011-05-17 | 137 | 2011-07-04 | 185 | 49 | 2244 | 1673 | 163 | 2011-05-25 | 145 | 9 | 436 | 40 | 1237 | 2 | NA | NA | NA | NA |
Poaceae | 2012 | 2012-05-19 | 140 | 2012-07-19 | 201 | 62 | 2127 | 1799 | 84 | 2012-05-25 | 146 | 7 | 284 | 55 | 1515 | 0 | NA | NA | NA | NA |
Poaceae | 2013 | 2013-06-05 | 156 | 2013-07-27 | 208 | 53 | 2481 | 1910 | 166 | 2013-06-13 | 164 | 9 | 708 | 44 | 1202 | 4 | NA | NA | NA | NA |
Poaceae | 2014 | 2014-05-19 | 139 | 2014-07-07 | 188 | 50 | 2413 | 1948 | 97 | 2014-06-11 | 162 | 24 | 1107 | 26 | 841 | 0 | NA | NA | NA | NA |
Poaceae | 2015 | 2015-05-07 | 127 | 2015-07-27 | 208 | 82 | 4175 | 3893 | 322 | 2015-06-04 | 155 | 29 | 2144 | 53 | 1749 | 9 | 2015-05-29 | 149 | 2015-06-06 | 157 |
The code above returns the same result than:
pollen_season<-calculate_ps(munich_pollen[,c(1,6)], method = "clinical", n.clinical = 5, window.clinical = 7, th.pollen = 10, th.sum = 100, th.day = 100)
type | seasons | st.dt | st.jd | en.dt | en.jd | ln.ps | sm.tt | sm.ps | pk.val | pk.dt | pk.jd | ln.prpk | sm.prpk | ln.pspk | sm.pspk | daysth | st.dt.hs | st.jd.hs | en.dt.hs | en.jd.hs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Poaceae | 2010 | 2010-06-04 | 155 | 2010-07-15 | 196 | 42 | 1861 | 1475 | 179 | 2010-06-11 | 162 | 8 | 598 | 34 | 877 | 4 | 2010-06-10 | 161 | 2010-06-13 | 164 |
Poaceae | 2011 | 2011-05-17 | 137 | 2011-07-04 | 185 | 49 | 2244 | 1673 | 163 | 2011-05-25 | 145 | 9 | 436 | 40 | 1237 | 2 | NA | NA | NA | NA |
Poaceae | 2012 | 2012-05-19 | 140 | 2012-07-19 | 201 | 62 | 2127 | 1799 | 84 | 2012-05-25 | 146 | 7 | 284 | 55 | 1515 | 0 | NA | NA | NA | NA |
Poaceae | 2013 | 2013-06-05 | 156 | 2013-07-27 | 208 | 53 | 2481 | 1910 | 166 | 2013-06-13 | 164 | 9 | 708 | 44 | 1202 | 4 | NA | NA | NA | NA |
Poaceae | 2014 | 2014-05-19 | 139 | 2014-07-07 | 188 | 50 | 2413 | 1948 | 97 | 2014-06-11 | 162 | 24 | 1107 | 26 | 841 | 0 | NA | NA | NA | NA |
Poaceae | 2015 | 2015-05-07 | 127 | 2015-07-27 | 208 | 82 | 4175 | 3893 | 322 | 2015-06-04 | 155 | 29 | 2144 | 53 | 1749 | 9 | 2015-05-29 | 149 | 2015-06-06 | 157 |
We have resumed all these parameters under the argument
type=birch
to facilitate its application.
export.plot=TRUE
, the plots are
EXPORTED in “plot_AeRobiology” folder.
This method was proposed by Galan et al. (2001) originally in olive pollen but after also applied in other pollen types.
The start_date and end_date of the
pollen season were defined as a certain number of days
(window.grains
argument) exceeding a certain pollen
threshold (th.pollen
argument). For more
information about the method, see Galan et
al. (2001).
We want to establish the main pollen season start in the first day when more than 3 consecutive days have more than 2 pollen grains/m3, and the end in the last day in which these conditions are fulfilled:
The definition of the pollen season is based on the application of a
moving average to the pollen series in order to obtain
the general seasonality of the pollen curve avoiding the great
variability of the daily fluctuations. Thus, the start_date and
the end_date will be established when the curve of the moving average
reaches a given pollen threshold
(th.ma
argument). Also the order of the
moving average may be customized by the user
(man
argument). By default,
man = 11
and th.ma = 5
.
The idea of this method is to be able to calculate the start of the main pollen season when the season has not finished yet. Moreover, it allows to establish the start and end even if there is a huge amount of lacking data within the main pollen season. Its similar to the “grains” method but taking into account the moving mean to avoid daily variability.
You might understand it better by consulting the plots obtained.
export.plot=TRUE
, the plots are
EXPORTED in “plot_AeRobiology” folder.
Let’s calculate it with a moving average of 7 days. We are going to establish the start and end of the main pollen season when the moving average reaches 4 pollen grains/m3:
As you may have noticed, we have been using the function under an “European” point of view: the calculations are from 1st January to 31th December. The researchers of the Southern Hemisphere are used to work with interanual pollen seasons. Don’t worry, we haven’t forgotten you!
def.season="interannual"
:calculate_ps(munich_pollen[,c(1,3)], method = "moving", man = 7, th.ma = 4, def.season = "interannual")
In this method, the season belongs to the first year of the pair of years, i.e: from June 2017 to May 2018 -> season “2017”.
In this last method, the season belongs to the year in which the average peak date - 182 days is located, i.e.: if the average peak date is in January 2013, the season is called “2012” in the data.frames.
Pollen time series frequently have different gaps with no data and
this fact could be a problem for the calculation of specific methods for
defining the pollen season even providing incorrect results. In this
sense by default a linear interpolation will be carried out to
complete these gaps before to define the pollen season
(interpolation = TRUE
). Additionally, the users
may select other interpolation methods using the
int.method
argument, as “lineal”,
“movingmean”, “spline” or “tseries”.
Some advanced users may have noticed that you can’t use directly all the arguments of interpollen() through calculate_ps(). You only are able to select the interpolation method. Nevertheless, it is not impossible to use them:
CompleteData<-interpollen(munich_pollen, method="spline", ndays=3, spar=0.7, plot = TRUE, maxdays = 3, result = "wide")
plot_ps(CompleteData, pollen.type="Alnus", year=2013, fill.col = "orange", axisname = "AeRobiology custom units")
Note: You might select
interpolation=FALSE
in the other functions
you use after interpolating manually. In some cases it doesn’t matter,
but if you want to interpolate only gaps with less than 3 days and you
did in the first step, if you don’t use interpolation=FALSE
in calculate_ps function, a second interpolation will be carried out by
using the default method (gaps with less than 30 days and lineal
interpolation). You will obtain a database with the gaps of less than 3
days interpolated by “spline” method and the rest with the “lineal”
method.
This function calculates the pollen calendar from a historical database of several pollen types using different designs. The main arguments are:
data.frame
with the first
column in “Date” format and the rest of the columns in “numeric” format
(pollen types).export.format
).Data exportation:
export.plot = TRUE
this plot displaying the pollen
calendar will be exported within the Plot_AeRobiology
directory created in the working directory.export.plot = TRUE
and
export.format = pdf
, a pdf file with the pollen calendar
will be saved within the plot_AeRobiology directory,
created in the working directory. Additional characteristics may be
incorporated to the exportation as pdf file.export.plot = TRUE
and
export.format = png
, a png file with the pollen calendar
will be saved within the plot_AeRobiology directory,
created in the working directory. Additional characteristics may be
incorporated to the exportation as png file.Exclusive arguments for pollen calendars generated as “heatplot”:
method.classes = custom
.Exclusive arguments for pollen calendars generated as “phenological”:
perc1
) and the “early/late
pollination” (perc2
) based on the “phenological” method
proposed by Werchan
et al. (2018).This pollen calendar is constructed based on the daily or
weekly average of pollen concentrations (depending on the
preferences of the user, who may select “daily” or “weekly” as
period
argument). Then, these averages may
be classified in different categories following different methods
selected by the user. An example of this pollen calendar may be
consulted in Rojo
et al. (2016). This method to design pollen calendars is an
adaptation from the pollen calendar proposed by Spieksma
(1991), who considered 10-day periods instead of daily or
weekly periods.
First, we are going to generate a pollen calendar based on the heatplot, designed with green color and constructed with daily averages:
pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE)
In all cases, the table used by the pollen calendar with the averaged values will be created. This table can be visualized. To do that we have to set the argument result = “table” For example (an extract):
average_values<-pollen_calendar(munich_pollen, method = "heatplot", period = "daily", color = "green", interpolation = FALSE, result = "table")
knitr::kable(average_values[82:90, ], format = "html", booktabs = TRUE)
jd | Alnus | Betula | Taxus | Fraxinus | Poaceae | Quercus | Ulmus | Urtica | |
---|---|---|---|---|---|---|---|---|---|
82 | 82 | 33.166667 | 0.000000 | 416.1667 | 0.1666667 | 0.0000000 | 0.0000000 | 8.333333 | 0 |
83 | 83 | 21.000000 | 1.833333 | 443.4167 | 10.5833333 | 0.0000000 | 0.0000000 | 17.666667 | 0 |
84 | 84 | 15.000000 | 3.333333 | 387.7500 | 32.5833333 | 0.0000000 | 0.8333333 | 17.500000 | 0 |
85 | 85 | 3.800000 | 6.333333 | 263.5833 | 62.5000000 | 0.1666667 | 0.0000000 | 12.166667 | 0 |
86 | 86 | 10.400000 | 11.833333 | 116.2000 | 7.3333333 | 0.0000000 | 1.3333333 | 8.000000 | 0 |
87 | 87 | 8.400000 | 26.666667 | 37.3000 | 30.0000000 | 0.0000000 | 1.1666667 | 6.000000 | 0 |
88 | 88 | 4.800000 | 52.833333 | 205.7000 | 68.6000000 | 0.0000000 | 0.6666667 | 13.416667 | 0 |
89 | 89 | 7.000000 | 118.333333 | 335.2500 | 240.2000000 | 0.0000000 | 2.6666667 | 34.750000 | 0 |
90 | 90 | 7.333333 | 121.200000 | 480.1667 | 185.6000000 | 0.0000000 | 1.1666667 | 17.500000 | 0 |
By default the method for defining the classes for the pollen
calendar is according to the exponential method.
Nevertheless, the classes can be customized by
classes
argument:
pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "red", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), interpolation = FALSE)
In addition, for species with their pollen season occurring between
two natural years, the start of the pollen calendar can be selected by
the start.month
argument:
pollen_calendar(data = munich_pollen, method = "heatplot", period = "daily", color = "purple", method.classes = "custom", n.classes = 5, classes = c(5, 25, 50, 200), start.month = 11, na.remove = FALSE, interpolation = FALSE)
NA
(no data) can be removed by using
na.remove
argument.
Also we can generate a pollen calendar based on the heatplot design with weekly averages.
pollen_calendar(data = munich_pollen, method = "heatplot", period = "weekly", color = "blue", method.classes = "exponential", n.types = 4, y.start = 2011, y.end = 2014, interpolation = FALSE)
In this case, we have included other restrictive arguments as
n.types
limiting the number of pollen
types and y.start
and
y.end
, limiting the period to be
considered for the pollen calendar.
This pollen calendar is based on the phenological definition of the pollen season and adapted from the methodology proposed by Werchan et al. (2018).
After obtaining the daily average pollen
concentrations for the most abundant pollen types, different
pollination periods are calculated. The main pollination
period is calculated according to the percentage defined by
perc1
argument (selected by the user, 80% by default; red)
of the annual total pollen. For example, if perc1 = 80
, the
beginning of the high season is marked when the 10% of the annual value
is reached; the end is selected when 90% is reached. The
early/late pollination period is defined with the
perc2
argument (selected by the user, 99% of the total
annual pollen by default; orange), i.e.: the start of this period will
be when the 0.5% is reached and the end when the 99.5% is reached. For
this kind of pollen calendar the th.pollen
argument defines
the possible occurrence period as adapted by Werchan
et al. (2018), considering the entire period between the first and
the last day when this pollen level is reached (yellow).
pollen_calendar(data = munich_pollen, method = "phenological", n.types = 5, y.start = 2011, y.end = 2014, interpolation = FALSE)
Furthermore, different criteria can be customized:
pollen_calendar(data = munich_pollen, method = "phenological", perc1 = 90, perc2 = 95, th.pollen = 5, interpolation = FALSE)
In this last case, the pollen calendar has been generated with more
restrictive criteria for perc1
, perc2
and
th.pollen
.
This pollen calendar is based on the pollen intensity, and adapted from the pollen calendar published by ORourke (1990). At first, the daily averages of the pollen concentrations are calculated and then, these averages are represented by using the violin plot graph.
pollen_calendar(data = munich_pollen, method = "violinplot", y.start = 2012, y.end = 2015, interpolation = FALSE)
In addition, th.pollen
can be
established, specifying the minimum pollen concentration considered
(E.g.: 10 pollen grains/m3):
These functions have been designed for a quick view of your data for discussions or interpretations of them. Interactive plots are displayed. It might be interesting for group meetings or real time exposition of results. The functions create a emergent screen in which you can select the pollen/spore type or the years you want to plot.
The function iplot_pollen() is to plot the pollen data during one season. The arguments are:
data.frame
object. This
data.frame should include a first column in “Date” format and the rest
of columns in format “numeric” belonging to each pollen type by
column.integer
value specifying the
year to display. This is a mandatory argument.The function iplot_years() is to plot the pollen data during one season. The arguments are:
data.frame
object. This
data.frame should include a first column in “Date” format and the rest
of columns in format “numeric” belonging to each pollen type by
column.character
string with the
name of the particle to show. This character must match with the name of
a column in the input database. This is a mandatory argument.Note: We are not able to plot interactive figures in this document. Please, run the codes above in your R session.
The function plot_summary() is to plot the pollen data during several seasons. Also to plot the averaged pollen season over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments are:
data.frame
object. This
data.frame should include a first column in “Date” format and the rest
of columns in “numeric” format belonging to each pollen type by
column.character
string with the
name of the particle to show. This character must match with the name of
a column in the input database. This is a mandatory argument.integer
value specifying the
order of the moving average applied to the data. By default,
mave = 1
.logical
value specifying
if the visualization shows real pollen data
(normalized = FALSE
) or the percentage of every day over
the whole pollen season (normalized = TRUE
). By default,
normalized = FALSE
.character
string
specifying the title of the y axis. By default,
axisname = "Pollen grains / m3"
In some cases the user may want to reduce the “noise” of the daily
values by calculating moving means (E.g.: 5 days moving
mean; mave = 5
):
You might be also interested in representing as background the
percentage each day suppose to the main pollen season
(normalized = TRUE
):
The function plot_normsummary() has been designed to plot the pollen data amplitude during several seasons: daily average pollen concentration over the study period, maximum pollen concentration of each day over the study period and minimum pollen concentration of each day value over the study period. It is possible to plot the relative abundance per day and smoothing the pollen season by calculating a moving average. The main arguments similar to plot_summary(), but as a result you will obtain the the max-min range of the study period instead of the values for every year.
The maximum values are marked in red and the minimum values in white. The average is the black line.
Of course, you can change the color (color.plot
argument):
plot_normsummary(munich_pollen, pollen = "Betula", color.plot = "green", mave = 5, normalized = TRUE)
The function analyse_trend() has been created to calculate the main seasonal indexes of the pollen season (“Start Date”, “Peak Date”, “End Date” and “Pollen Integral”), as well as trends analysis of those parameters over the seasons. It is a summary dot plot showing the distribution of the main seasonal indexes over the years.
The results can also be stored in two folders termed
“plot_AeRobiology” and
“table_AeRobiology”, which will be created in your
working directory (only if export.result=TRUE
&
export.plot=TRUE
). You can decide which result to be
returned from the function by setting the argument
result: result = “table” or result =
“plot”.
The function allows you to decide if you want to interpolate the data
or not, by the argument interpolation
, it
also allows you to select the interpolation method by the argument
int.method
. Furthermore, it allows you to
select the pollen season definition method, by the argument
method
and additional arguments for the
function
calculate_ps().
Some arguments about the visualization are:
logical
argument. If
split = TRUE
, the plot is separated in two according to the
nature of the variables (i.e. dates or pollen concentrations).
This argument was a solution to reduce the scale of the x-axis
when “total pollen”” variable has a very high/low slope. By
default, split = TRUE
.numeric
value (between 0
and 1) indicating the quantile of data to be displayed in the graphical
output of the function. quantil = 1
would show all the
values, however a lower quantile will exclude the most extreme values of
the sample. This argument was designed after realizing of a
common problem: when plotting the results with split=FALSE
,
some “outlier” results increased a lot the scale of the x-axis, making
unreadable the rest of results. Our solution was to create an
argument to exclude these outliers results and to be able to observe the
main results in an appropriate scale. Furthermore, this argument may be
used to split the parameters using a different sampling units
(e.g. dates and pollen concentrations) can be used low vs high values of
quantil argument (e.g. 0.5 vs 1). Also can be used an extra argument:
split
. By default, quantil = 0.75
.
This argument only works when
split=FALSE
.numeric
value
indicating the significant level to be considered in the linear trends
analysis. This p level is displayed in the graphical output of the
function (as a number in the legend and as a black ring in the graphical
representation). By default, significant = 0.05
.analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, result="plot")
Lets see what happen if we don’t split the graphics:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split = FALSE, quantil = 1, result="plot")
Now the results are less readable, but this kind of graphical representation may be interesting for some people. When plotting all the results together, it might be useful to exclude some outliers:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, split=FALSE, quantil = 0.5, result="plot")
As you can appreciate, a lot of points have been omitted.
You can also change the significance level as mentioned above. Why don`t we try?:
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result = "plot")
Now everything is significant!
Have a look to the numbers by setting result = “table” (the default output).
analyse_trend(munich_pollen, interpolation = FALSE, export.result = FALSE, export.plot = FALSE, significant = 1, result="table")
The function plot_trend() has been created to calculate the main seasonal indexes of the pollen season (“Start Date”, “Peak Date”, “End Date” and “Pollen Integral”) and their trends analysis over the seasons. It produces plots showing the distribution of the main seasonal indexes over the years.
The results are stored in two folders termed
“plot_AeRobiology” and “table_AeRobiology” which will be
located in your working directory (only if
export.result=TRUE
&
export.plot=TRUE
).
The interval of confidence only appears if more than 6 dots are plotted. If not, a line crossing all the dots is plotted.
The function iplot_abundance() generates a barplot based on the relative abundance in the air (as percentage) of each pollen/spore type with respect to the total amounts.
The main arguments are:
numeric
(integer) value
specifying the number of the most abundant pollen types that must be
represented in the plot of the relative abundance. A more detailed
information about the selection of the considered pollen types may be
consulted here. The
n.types
argument will be 15 types by default.numeric
(integer)
value specifying the period selected to calculate relative abundances of
the pollen types (start year - end year). If y.start
and
y.end
are not specified (NULL
), the entire
database will be used to generate the pollen calendar. The
y.start
and y.end
arguments will be
NULL
by default.character
string specifying
the color of the bars to generate the graph showing the relative
abundances of the pollen types. The color
argument will be
#E69F00 by default, but any color may be selected.character
string
specifying the type of plot selected to show the plot showing the
relative abundance of the pollen types. The implemented types that may
be used are: “static” (generates a static ggplot object) and “dynamic”
(generates a dynamic plotly object).Now we are going to reduce the number of types:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3)
We can also select the abundance for only one year and change the color:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131")
Furthermore, we can make it interactive:
iplot_abundance(munich_pollen, interpolation = FALSE, export.plot = FALSE, export.result = FALSE, n.types = 3, y.start = 2011, y.end = 2011, col.bar = "#d63131", type.plot = "dynamic")
The function iplot_pheno() generates a boxplot based on phenological parameters (start_dates and end_dates) which are calculated by the estimation of the main parameters of the pollen season. The main arguments are:
data.frame
object including
the general database where calculation of the pollen season must be
applied in order to generate the phenological plot based on the
start_dates and end_dates. This data.frame must include a first column
in “Date” format and the rest of columns in “numeric” format belonging
to each pollen type by column.character
string specifying
the method applied to calculate the pollen season and the main
parameters. The implemented methods that can be used are: “percentage”,
“logistic”, “moving”, “clinical” or “grains”. A more detailed
information about the different methods for defining the pollen season
may be consulted in calculate_ps function.numeric
(integer) value
specifying the number of the most abundant pollen types that must be
represented. A more detailed information about the selection of the
considered pollen types may be consulted here. The
n.types
argument will be 15 by default.character
string
specifying the type of plot selected to show the phenological plot. The
implemented types that may be used are: “static” (generates a static
ggplot object) and “dynamic” (generates a dynamic plotly object).We can change the method to establish the main pollen season and the number of pollen types to show:
Furthermore, we can make the plot interactive to obtain more information by clicking in each object:
The function plot_ps() function was designed to plot the main pollen season of a single pollen type and year. Some of the arguments are:
data.frame
object including
the general database where interpolation must be performed. This
data.frame must include a first column in “Date” format and the rest of
the columns in “numeric” format. Each column must contain information of
one pollen type. It is not necessary to insert missing gaps, the
function will automatically detect them.character
string
specifying the name of the pollen type which will be plotted. The name
must be exactly the same that appears in the column name. Mandatory
argument with no default.numeric
(integer) value
specifying the season to be plotted. The season does not necessary fit a
natural year. See calculate_ps for more details. Mandatory argument with
no default.numeric
(integer) specifying
the number of days beyond each side of the main pollen season that will
be represented. The days
argument will be 30 by
default.character
string
specifying the name of the color to fill the main pollen season in the
plot. See ggplot function for more details. The fill.col
argument will be “turquoise4” by default. It uses the ggplot color
codes.character
string specifying
the title of the y axis. By default,
axisname = expression(paste("Pollen grains / m"^"3"))
If you misspell some pollen type name, the function will tell you:
## Error in plot_ps(munich_pollen, pollen.type = "Alnuscdscscr", year = 2011): pollen.type: Please, insert only a pollen type which is in your database (type the exact name)
Let`s test more arguments:
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "lineal")
As you have noticed, the arguments method
,
perc
and int.method
are from
calculate_ps
function.
What about changing the interpolation method?
plot_ps(munich_pollen, pollen.type="Alnus", year=2013, method= "percentage", perc=95 ,int.method = "movingmean")
Do you want a larger scale?
Maybe a different color and y-axis name?
Please keep in mind that the function plot_hour() is only available after package version 2.0.
The input data must be a data.frame object with the structure long. Where the first two columns are factors indicating the pollen and the location. The 3 and 4 columns are POSIXct, showing the date with the hour. Where the third column is the beginning of the concentration from and the fourth column is the end time of the concentration data to. The fifth column shows the concentrations of the different pollen types as numeric. Please see the example 3-hourly data from the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data(“POMO_pollen”), supplied by ePIN Network supported by the Bavarian Government.
We will load an example dataset about 3-hourly data from the ePIN network in Bavaria (Germany). The dataset contains information of 3-hourly concentrations of pollen in the atmosphere of Munich (DEBIED) and Viechtach (DEVIEC) during the year 2018. Pollen types included: “Poaceae” and “Pinus”.
The data were obtained by the automatic pollen monitor BAA500 from Munich and Viechtach in Bavaria (Germany) data(“POMO_pollen”), supplied by the public ePIN Network supported by the Bavarian Government. The ePIN Network was built by Das Bayerische Landesamt für Gesundheit und Lebensmittelsicherheit (LGL) in collaboration with Zentrum Allergie und Umwelt (ZAUM).
The function plots pollen data expressed in concentrations with time resolution higher than 1 day (e.g. hourly, bi-hourly concentrations). If the argument result = “plot”, the function returns a list of objects of class ggplot2; if result = “table”, the function returns a data.frame with the hourly patterns.
## [[1]]
##
## [[2]]
##
## [[3]]
To display a table we have to set the argument result = “table”.
pollen | location | from | to | value | date | hour | hour2 | Hour | percent |
---|---|---|---|---|---|---|---|---|---|
Pinus | DEVIEC | 2018-05-09 03:00:00 | 2018-05-09 04:00:00 | 276.6 | 2018-05-09 | 3 | 4 | 3-4 | 12.5 |
Pinus | DEVIEC | 2018-05-10 03:00:00 | 2018-05-10 04:00:00 | 99.9 | 2018-05-10 | 3 | 4 | 3-4 | 4.9 |
Pinus | DEVIEC | 2018-05-11 03:00:00 | 2018-05-11 04:00:00 | 52.5 | 2018-05-11 | 3 | 4 | 3-4 | 3.6 |
Pinus | DEVIEC | 2018-05-12 03:00:00 | 2018-05-12 04:00:00 | 195.3 | 2018-05-12 | 3 | 4 | 3-4 | 9.2 |
Pinus | DEVIEC | 2018-05-13 03:00:00 | 2018-05-13 04:00:00 | 24.0 | 2018-05-13 | 3 | 4 | 3-4 | 1.5 |
Pinus | DEVIEC | 2018-05-14 03:00:00 | 2018-05-14 04:00:00 | 85.8 | 2018-05-14 | 3 | 4 | 3-4 | 5.2 |
Pinus | DEVIEC | 2018-05-15 03:00:00 | 2018-05-15 04:00:00 | 104.7 | 2018-05-15 | 3 | 4 | 3-4 | 17.0 |
Pinus | DEVIEC | 2018-05-16 03:00:00 | 2018-05-16 04:00:00 | 33.3 | 2018-05-16 | 3 | 4 | 3-4 | 20.6 |
Pinus | DEVIEC | 2018-05-17 03:00:00 | 2018-05-17 04:00:00 | 19.1 | 2018-05-17 | 3 | 4 | 3-4 | 8.7 |
Pinus | DEVIEC | 2018-05-18 03:00:00 | 2018-05-18 04:00:00 | 57.2 | 2018-05-18 | 3 | 4 | 3-4 | 7.3 |
We can also split the different stations by setting the argument locations = TRUE.
## [[1]]
##
## [[2]]
##
## [[3]]
An alternative to plot_hour() is plot_heathour(), which shows a summary of all particles with a heatplot. The input data should have the same format as for plot_hour().
You can also set the colors by the arguments: low.col, mid.col and high.col. E.g.
By setting locations = TRUE you can split the result by locations:
## [[1]]
##
## [[2]]