# basics

require(foieGras)
#> Loading required package: foieGras

## Disclaimer

this vignette is an extended set of examples to highlight foieGras’s functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!

### foieGras models

this vignette provides a (very) brief overview of how to use foieGras to filter animal track locations obtained via the Argos satellite system or via processed light-level geolocation (GLS). foieGras provides two state-space models (SSM’s) for filtering (ie. estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):

• a simple Random Walk model, rw
• a Correlated Random Walk model, crw

both models are continuous-time models, that is, they account for the time intervals between successive observations, thereby naturally accounting for the commonly irregularly-timed nature of animal tracking data. We won’t dwell on the details of the models here (see Jonsen et al. 2020 for details on the crw model), except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and appropriately smooth through noisy data but tends to estimate nonsensical movement through larger data gaps.

additionally, foieGras provides fast models (mpm, jmpm) for estimating a behavioural index along animals’ tracks (see Jonsen et al. 2019 for details). The mpm is fit to individual tracks, whereas the jmpm is fit to multiple tracks simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model can often better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements. Now, both models can be fit to time-regularized locations (discrete-time models) or to time-irregular locations (continuous-time models). See Auger-Méthé et al. 2017 for an example of the latter.

### input data

foieGras expects data to be provided in one of several possible formats.

1. a data.frame or tibble that looks like this
#>      id                date lc      lon       lat  smaj smin eor
#> 1 54591 2012-03-05 05:09:33  1 110.5707 -66.42752  2442  416  42
#> 2 54591 2012-03-06 04:55:14  0 110.3402 -66.39579 49660  391  90
#> 3 54591 2012-03-07 04:23:10  A 110.4778 -66.45266  5032  472  93
#> 4 54591 2012-03-07 21:23:06  A 110.3749 -66.39622  4007  286 116
#> 5 54591 2012-03-09 04:27:49  B 110.4732 -66.48743 13063  956  82
#> 6 54591 2012-03-10 00:10:41  A 110.5014 -66.43516  5099  478  79

where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.

1. a data.frame or tibble that looks like this
#>             id                date lc   lon    lat
#> 1 ct109-085-14 2015-02-03 00:11:02  B 70.45 -49.93
#> 2 ct109-085-14 2015-02-03 13:26:37  B 71.00 -50.21
#> 3 ct109-085-14 2015-02-03 21:53:15  B 71.31 -50.37
#> 4 ct109-085-14 2015-02-04 04:05:35  A 71.64 -50.43
#> 5 ct109-085-14 2015-02-04 17:12:42  B 72.04 -50.46
#> 6 ct109-085-14 2015-02-05 02:05:44  B 72.44 -50.47

where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.

1. a data.frame or tibble that includes observations with missing KF error ellipse information
#>      id                date lc      lon       lat  smaj smin eor
#> 1 54591 2012-03-05 05:09:33  1 110.5707 -66.42752  2442  416  42
#> 2 54591 2012-03-06 04:55:14  0 110.3402 -66.39579 49660  391  90
#> 3 54591 2012-03-07 04:23:10  A 110.4778 -66.45266    NA   NA  NA
#> 4 54591 2012-03-07 21:23:06  A 110.3749 -66.39622    NA   NA  NA
#> 5 54591 2012-03-09 04:27:49  B 110.4732 -66.48743    NA   NA  NA
#> 6 54591 2012-03-10 00:10:41  A 110.5014 -66.43516  5099  478  79

in this situation, foieGras treats observations with missing error ellipse information as though they are LS-based observations.

1. an sf-tibble where observations have any of the previous 3 structures and also include CRS information
#>      id                date lc  smaj smin eor                   geometry
#> 1 54591 2012-03-05 05:09:33  1  2442  416  42 POINT (2430.943 -912.3109)
#> 2 54591 2012-03-06 04:55:14  0 49660  391  90  POINT (2437.96 -903.7763)
#> 3 54591 2012-03-07 04:23:10  A  5032  472  93 POINT (2429.753 -907.3739)
#> 4 54591 2012-03-07 21:23:06  A  4007  286 116 POINT (2437.367 -905.2335)
#> 5 54591 2012-03-09 04:27:49  B 13063  956  82 POINT (2426.138 -905.8045)
#> 6 54591 2012-03-10 00:10:41  A  5099  478  79 POINT (2431.234 -909.0692)
1. a data.frame, tibble or sf-tibble where processed GLS data are provided and include longitude and latitude error SD’s (in degrees). In this case, the lc class is set to GL for all GLS locations.
#>      id                date lc   lon lat     lonerr     laterr
#> 1 54632 2021-04-26 14:51:25 GL 100.0 -55 0.80145678 0.24680960
#> 2 54632 2021-04-27 02:51:25 GL 100.5 -54 1.03366929 1.34625683
#> 3 54632 2021-04-27 14:51:25 GL 101.0 -53 0.03611925 2.84626597
#> 4 54632 2021-04-28 02:51:25 GL 101.5 -52 1.15502022 1.67499383
#> 5 54632 2021-04-28 14:51:25 GL 102.0 -51 0.98906118 0.05397196
1. a data.frame, tibble or sf-tibble where GPS data are provided. In this case, the lc class is set to G for all GPS locations.
#>         id                date lc  lon   lat
#> 1 F02-B-17 2021-04-26 14:51:25  G 70.1 -49.2
#> 2 F02-B-17 2021-04-26 15:51:25  G 70.6 -48.2
#> 3 F02-B-17 2021-04-26 16:51:25  G 71.1 -47.2
#> 4 F02-B-17 2021-04-26 17:51:25  G 71.6 -46.2
#> 5 F02-B-17 2021-04-26 18:51:25  G 72.1 -45.2
1. a data.frame, tibble or sf-tibble where any combination of Argos, GLS or GPS locations can be intermixed - though, most typically this would be a combination of Argos and GPS locations.
#>         id                date lc  lon   lat  smaj smin eor
#> 1 F02-B-17 2017-09-17 05:20:00  G 70.1 -49.2    NA   NA  NA
#> 2 F02-B-17 2017-10-04 14:35:01  2 70.2 -49.1  1890   45  77
#> 3 F02-B-17 2017-10-05 04:03:25  G 70.1 -49.3    NA   NA  NA
#> 4 F02-B-17 2017-10-05 06:28:20  A 71.1 -48.7 28532 1723 101
#> 5 F02-B-17 2017-10-05 10:21:18  B 70.8 -48.5 45546 3303  97

### fitting a foieGras model

model fitting for quality control of locations is comprised of 2 steps: a prefilter step where a number of checks are made on the input data (see ?foieGras::fit_ssm for details), including applying the trip::sda filter to identify extreme outlier observations. Additionally, if the input data are not supplied as an sf object,the prefilter guesses at an appropriate projection (typically world mercator, EPSG 3395) to apply to the data. The SSM is then fit to this projected version of the data. Users invoke this process via the fit_ssm function:

## prefilter and fit Random Walk SSM using a 24 h time step
fit <-
fit_ssm(
sese1,
model = "rw",
time.step = 24,
control = ssm_control(verbose = 0)
)

these are the minimum arguments required: the input data, the model (rw or crw) and the time.step (in h) to which locations are predicted (the argument control = ssm_control(verbose = 0) is included for vignette tidyness). Additional control can be exerted over the prefiltering step, via the vmax, ang, distlim, spdf and min.dt arguments. see ?fit_ssm for details, the defaults for these arguments are quite conservative (for non-flying species), usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted via the control = ssm_control() argument, see ?ssm_control for details.

fit_ssm can be applied to single tracks or to multiple tracks, as shown above. The SSM is fit to each individual separately and the resulting output is a compound tibble with rows corresponding to each individual fG_ssm fit object. The converged column indicates whether each model fit converged successfully.

## list fit outcomes for both seals
fit
#> # A tibble: 1 x 5
#>   id        ssm          converged pdHess pmodel
#>   <chr>     <named list> <lgl>     <lgl>  <chr>
#> 1 ct36-E-09 <ssm [15]>   TRUE      TRUE   rw

individual id is displayed in the 1st column, all fit output (ssm) in the 2nd column, convergence status (whether the optimizer found a global minimum) of each model fit is displayed in the 3rd column, whether the Hessian matrix was positive-definite and could be solved to obtain standard errors (pdHess) is displayed in the 4th column, and the specified process model (rw or crw) in the 5th column. In some cases, the optimizer will converge but the Hessian matrix is not positive-definite, which typically indicates the optimizer converged on a local minimum. In this case, some standard errors can often be calculated but not all. One possible solution is to try specifying a longer time.step or set time.step = NA to turn off predictions and return only fitted values (location estimates at the pre-filtered observation times). If pdHess = FALSE persists then careful inspection of the supplied data is warranted to determine if suspect observations not identified by prefilter are present. The excellent glmmTMB troubleshooting vignette may also provide hints at solutions. Convergence failures should be examined for potential data issues, however, in some cases changes to the optimization parameters via ssm_control() (see ?fit_ssm and ?ssm_control on usage) may overcome mild issues (see ?nlminb or ?optim for details on optimization control parameters).

### accessing and visualizing model fit objects

simple summary information about the fit can be obtained by calling the individual fit objects:

fit\$ssm[[1]]
#> Process model: rw
#> Time interval: 24 hours
#> number of original observations: 678
#> number of observations: 497
#> number of fitted states: 497
#> number of predicted states: 58
#>
#> parameter estimates
#> -------------------
#>         Estimate Std. Error
#> rho_p   -0.57210    0.05063
#> sigma_x  9.22751    0.44432
#> sigma_y  7.38602    0.32627
#> rho_o    0.46945    0.08541
#> tau_x    0.34763    0.02734
#> tau_y    0.32041    0.02210
#> -------------------
#> negative log-likelihood: 4409.909
#> convergence: yes

here, the [[1]] denotes the first individual and so on. The summary table lists all estimated parameters, the specific ones listed depend on the process model selected and the data type. Here, sigma_x and sigma_y are the process error standard deviations in the x and y directions, rho_p is the correlation parameter in the covariance term, tau_x and tau_y are the observation error standard deviations, and rho_o is the correlation parameter. The Std. Error column lists the standard errors, calculated via the Delta method (see TMB documentation for details), for each estimated parameter.

a summary plot method allows a quick visual of the SSM fit to the data:

# plot time-series of the predicted values
plot(fit, what = "predicted", type = 1, pages = 1)

plot(fit, what = "fitted", type = 2, pages = 1)

the predicted values (red) are the state estimates predicted at regular time intervals, specified by time.step (here every 24 h). These estimates are plotted on top of the observations that passed the prefilter stage (blue points and blue rug at bottom). Fitted values are the state estimates corresponding to the time of each observation; their time series are plotted by default - plot(fit). A 2-D time series plot of the track is invoked by the argument type = 2.

Assessing goodness-of-fit is an important component of any model fitting exercise. Residual plots are important for validating models, but classical Pearson residuals, for example, are not appropriate for state-space models. Instead, a prediction residual, or one-step-ahead residual, provides a useful alternative - albeit more computationally demanding to calculate. In foieGras, prediction residuals from state-space model fits are calculated using the osar function and can be visualised as time-series plots, Q-Q plots, or autocorrelation functions:

require(patchwork)
# calculate & plot residuals
res <- osar(fit)

(plot(res, type = "ts") | plot(res, type = "qq")) /
(plot(res, type = "acf") | plot_spacer())

As calculation of these residuals is computationally demanding, especially for multiple individual tracks, the osar function is automatically implemented in parallel when calculating residuals for more than 2 tracks.

### inferring behaviour

A behavioural index, movement persistence, can be estimated along SSM-fitted tracks by using the fit_mpm function. This index ranges continuously from 0, associated with frequent changes in direction and/or speed, to 1, associated with infrequent changes (persistence) in direction and/or speed. Here we use the fG_ssm fit object from earlier as data to estimate and visualise the time-varying movement persistence as a 1-D time-series and along an animal’s ssm-filtered track:

fmp <- fit_mpm(fit, what = "predicted", model = "mpm", control = mpm_control(verbose = 0))

plot(fmp, pages = 1, pal = "Zissou1", rev = TRUE)


plot(fmp, fit, pages = 1, pal = "Cividis")