Calibrating trade-offs tutorial


Systematic conservation planning requires making trade-offs (Margules & Pressey 2000; Vane-Wright et al. 1991). Since different criteria may conflict with one another – or not align perfectly – prioritizations need to make trade-offs between different criteria (Klein et al. 2013). Although some criteria can easily be accounted for by using locked constraints or representation targets (e.g., Dorji et al. 2020; Hermoso et al. 2018), this is not always the case (e.g., Beger et al. 2010). For example, prioritizations often need to balance overall cost with the overall level spatial fragmentation among reserves (Hermoso et al. 2011; Stewart & Possingham 2005). Additionally, prioritizations often need to balance the overall level of connectivity among reserves against other criteria (Hermoso et al. 2012). Since the best trade-off depends on a range of factors – such as available budgets, species’ connectivity requirements, and management capacity – finding the best balance can be challenging.

The prioritizr R package provides multi-objective optimization methods to help identify the best trade-offs between different criteria. To achieve this, a conservation planning problem can be formulated with a primary objective (e.g., add_min_set_objective()) and penalties (e.g., add_boundary_penalties()) that relate to such criteria. When building the problem, the nature of the trade-offs can be specified using certain parameters (e.g., the penalty parameter of the add_boundary_penalties() function). To identify a prioritization that finds the best balance between different criteria, the trade-off parameters can be tuned using a calibration analysis. These analyses – in the context of systematic conservation planning – typically involve generating a set of candidate prioritizations based on different parameters, measuring their performance according to each of the criteria, and then selecting a prioritization (or set of prioritizations) based on how well they achieve the criteria (Hermoso et al. 2011, 2012; Stewart & Possingham 2005). For example, the Marxan decision support tool has a range of parameters (e.g., species penalty factors, boundary length modifier) that are calibrated to balance cost, species’ representation, and spatial fragmentation (Ardron et al. 2010).

The aim of this tutorial is to provide guidance on calibrating trade-offs when using the prioritizr R package. Here we will explore a couple of different approaches for generating candidate prioritizations, and methods for finding the best balance between different criteria. Specifically, we will try to generate prioritizations that strike the best balance between total cost and spatial fragmentation (measured as total boundary length). As such, the code used in this vignette will be directly applicable when performing a boundary length calibration analysis.


Let’s load the packages and dataset used in this tutorial. Since this tutorial uses the prioritizrdata R package along with several other R packages (see below), please ensure that they are all installed. This particular dataset comprises two object: tas_pu and tas_features. Although we will briefly discuss this dataset below, please refer to the Tasmania Tutorial vignette for further details.

# load packages

# load planning unit data

# convert planning units to sf format
tas_pu <- st_as_sf(tas_pu)

# load feature data

# print planning unit data
## Simple feature collection with 1130 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 298809.6 ymin: 5167775 xmax: 613818.8 ymax: 5502544
## Projected CRS: WGS 84 / UTM zone 55S
## First 10 features:
##   id      cost status locked_in locked_out                       geometry
## 0  1 60.246377      0     FALSE      FALSE MULTIPOLYGON (((328497 5497...
## 1  2 19.863008      0     FALSE      FALSE MULTIPOLYGON (((307121.6 54...
## 2  3 59.680513      0     FALSE      FALSE MULTIPOLYGON (((321726.1 54...
## 3  4 32.416138      0     FALSE      FALSE MULTIPOLYGON (((304314.5 54...
## 4  5 26.177062      0     FALSE      FALSE MULTIPOLYGON (((314958.5 54...
## 5  6 51.262177      0     FALSE      FALSE MULTIPOLYGON (((327904.3 54...
## 6  7 32.299112      0     FALSE      FALSE MULTIPOLYGON (((308194.1 54...
## 7  8 38.404063      0     FALSE      FALSE MULTIPOLYGON (((322792.7 54...
## 8  9  3.554745      0     FALSE      FALSE MULTIPOLYGON (((334896.6 54...
## 9 10  1.834921      0     FALSE      FALSE MULTIPOLYGON (((356377.1 54...
# print feature data
## class      : RasterStack 
## dimensions : 398, 359, 142882, 62  (nrow, ncol, ncell, nlayers)
## resolution : 1000, 1000  (x, y)
## extent     : 288801.7, 647801.7, 5142976, 5540976  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=55 +south +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## names      : tas_features.1, tas_features.2, tas_features.3, tas_features.4, tas_features.5, tas_features.6, tas_features.7, tas_features.8, tas_features.9, tas_features.10, tas_features.11, tas_features.12, tas_features.13, tas_features.14, tas_features.15, ... 
## min values :              0,              0,              0,              0,              0,              0,              0,              0,              0,               0,               0,               0,               0,               0,               0, ... 
## max values :              1,              1,              1,              1,              1,              1,              1,              1,              1,               1,               1,               1,               1,               1,               1, ...

The tas_pu object contains planning units represented as spatial polygons (i.e., converted to a sf::st_sf() object). This object has three columns that denote the following information for each planning unit: a unique identifier (id), unimproved land value (cost), and current conservation status (locked_in). Specifically, the conservation status column indicates if at least half the area planning unit is covered by existing protected areas (denoted by a value of 1) or not (denoted by a value of zero).

# plot map of planning unit costs
plot(tas_pu[, "cost"], main = "Planning unit costs")

# plot map of planning unit statuses
plot(tas_pu[, "locked_in"], main = "Planning unit status")

The tas_features object describes the spatial distribution of different vegetation communities (using presence/absence data). We will use the vegetation communities as the biodiversity features for the prioritization.

# plot map of the first four vegetation classes
plot(tas_features[[1:4]], main = paste("Feature", 1:4))

We can use this dataset to generate a prioritization. Specifically, we will use the minimum set objective so that the optimization process minimizes total cost. We will add representation targets to ensure that prioritizations cover 17% of each vegetation community. Additionally, we will add constraints to ensure that planning units covered by existing protected areas are selected (i.e., locked in). Finally, we will specify that the conservation planning exercise involves binary decisions (i.e., selecting or not selecting planning units for protected area establishment).

# define a problem
p0 <- problem(tas_pu, tas_features, cost_column = "cost") %>%
      add_min_set_objective() %>%
      add_relative_targets(0.17) %>%
      add_locked_in_constraints("locked_in") %>%

# print problem
## Conservation Problem
##   planning units: sf (1130 units)
##   cost:           min: 0.19249, max: 61.92727
##   features:       tas_features.1, tas_features.2, tas_features.3, ... (62 features)
##   objective:      Minimum set objective 
##   targets:        Relative targets [targets (min: 0.17, max: 0.17)]
##   decisions:      Binary decision 
##   constraints:    <Locked in planning units [257 locked units]>
##   penalties:      <none>
##   portfolio:      default
##   solver:         default
# solve problem
s0 <- solve(p0)

# print result

# create column for making a map of the prioritization
s0$map_1 <- case_when(
  s0$locked_in > 0.5 ~ "locked in",
  s0$solution_1 > 0.5 ~ "priority",
  TRUE ~ "other"

# plot map of prioritization
  s0[, "map_1"], pal = c("purple", "grey90", "darkgreen"),
  main = NULL, key.pos = 1

We can see that the priority areas identified by the prioritization are scattered across the study area (shown in green). Indeed, none of the priority areas are connect to existing protected areas (shown in purple), and very of them are connect with other priority areas. As such, the prioritization has a high level of spatial fragmentation. If it is important avoid such levels of spatial fragmentation, then we will need to explicitly account spatial fragmentation in the optimization process.

Preliminary processing

We need to conduct some preliminary processing procedures to prepare the data for subsequent analysis. This is important to help make it easier to find suitable trade-off parameters, and avoid numerical scaling issues that can result in overly long run times (see presolve_check() for further information). These processing steps are akin to data scaling (or normalization) procedures that are applied in statistical analysis to improve model convergence.

The first processing procedure involves setting the cost values for all locked in planning units to zero. This is so that the total cost estimates of the prioritization reflects the total cost of establishing new protected areas – not just total land value. In other words, we want the total cost estimate for a prioritization to reflect the cost of implementing conservation actions. This procedure is especially important when using the hierarchical approach described below, so that cost thresholds are based on percentage increases in the cost of establishing new protected areas.

# set costs for planning units covered by existing protected areas to zero
tas_pu$cost[tas_pu$locked_in > 0.5] <- 0

# plot map of planning unit costs
plot(tas_pu[, "cost"], main = "Planning unit cost")

The second procedure involves pre-computing the boundary length data and manually re-scaling the boundary length values. This procedure is important because boundary length values are often very large that, in turn, can cause numerical issues that result in excessive run times (see presolve_check() for further details).

# generate boundary length data for the planning units
tas_bd <- boundary_matrix(tas_pu)

# manually re-scale the boundary length values
tas_bd@x <- rescale(tas_bd@x, to = c(0.01, 100))

After applying these procedures, our data is ready for subsequent analysis.

Generating candidate prioritizations

Here we will start the calibration analysis by generating a set of candidate prioritizations. Specifically, these prioritizations will be generated using different parameters to specify different trade-offs between the different criteria. Since this tutorial involves navigating trade-offs between the overall cost of a prioritization and the level of spatial fragmentation associated with a prioritization (as measured by total boundary length), we will generate prioritizations using different parameters related to these criteria. We will examine two approaches for generating candidate prioritizations based on multi-objective optimization procedures. Although we’ll be examining both approaches in this tutorial, you would normally only use one of these approaches when conducting your own analysis

Blended approach

The blended approach for multi-objective optimization involves combining separate criteria (e.g., total cost and total boundary length) into a single joint criterion. To achieve this, a trade-off (or scaling) parameter is used to specify the relative importance of each criterion. This approach is the default approach provided by the prioritizr R package. Specifically, each of the functions for adding a penalty to a problem formulation (e.g., add_boundary_penalties()) contains a parameter to control the relative importance of the penalties (i.e., the penalty parameter). For example, when using the add_boundary_penalties() function, setting a high penalty value will indicate that it is important to reduce the overall exposed boundary (perimeter) of the prioritization.

The main challenge with the blended approach is identifying a range of suitable penalty values to generate candidate prioritizations. If we set a penalty value that is too low, then the penalties will have no effect (e.g., boundary length penalties would have no effect on the prioritization). If we set a penalty value too high, then the prioritization will effectively ignore the primary objective. In such cases, the prioritization will be overly spatially clustered – because the planning unit cost values have no effect — and contain a single reserve. Thus we need to find a suitable range of penalty values before we can generate a set of candidate prioritizations.

We can find a suitable range of penalty values by generating a set of preliminary prioritizations. These preliminary prioritizations will be based on different penalty values – similar to the process for generating the candidate prioritizations – but solved using customized settings that sacrifice optimality for fast run times (see below for details). This is especially important because specifying a penalty value that is too high will cause the optimization process to take a very long time to generate a solutions (due to the numerical scaling issues mentioned previously). To find a suitable range of penalty values, we need to identify an upper limit for the penalty value (i.e., the highest penalty value that result in a prioritization containing a single reserve). Let’s create some preliminary penalty to identify this upper limit. Please note that you might need to adjust the prelim_upper value to find the upper limit when analyzing different datasets.

## [1]   0.00001   0.00009   0.00089   0.00841   0.07943   0.74989   7.07946
## [8]  66.83439 630.95734

Next, let’s use the preliminary penalty values to generate preliminary prioritizations. As mentioned earlier, we will generate these preliminary prioritizations using customized settings to reduce runtime. Specifically, we will set a time limit of 10 minutes per run, and relax the optimality gap to 20%. Although we would not normally use such settings – because the resulting prioritizations are not guaranteed to be near-optimal (the default gap is 10%) – this is fine because our goal here is to tune the preliminary penalty values. Indeed, none of these preliminary prioritizations will be considered as candidate prioritizations. Please note that you might need to set a higher time limit, or relax the optimality gap even further (e.g., 40%) when analyzing larger datasets.

After generating the preliminary prioritizations, let’s create some maps to visualize them. In particular, we want to understand how different penalty values influence the spatial fragmentation of the prioritizations.

We can see that as the penalty value used to generate the prioritizations increases, the spatial fragmentation of the prioritizations decreases. In particular, we can see that a penalty value of 66.83439 results in a single reserve – meaning this is our best guess of the upper limit. Using this penalty value as an upper limit, we will now generate a second series of prioritizations that will be the candidate prioritizations. Critically, these candidate prioritizations will not be generated using with time limit and be generated using a more suitable gap (i.e., default gap of 10%).

We now have a set of candidate prioritizations generated using the blended approach. The main advantages of this approach is that it is similar calibration analyses used by other decision support tools for conservation (i.e., Marxan) and it is relatively straightforward to implement. However, this approach also has a key disadvantage. Because the penalty parameter is a unitless trade-off parameter – meaning that we can’t leverage existing knowledge to specify a suitable range of penalty values – we first have to conduct a preliminary analysis to identify an upper limit. Although finding an upper limit was fairly simple for the example dataset, it can be difficult to find for more realistic data. In the next section, we will show how to generate a set of candidate prioritizations using the hierarchical approach – which does not have this disadvantage.

Hierarchical approach

The hierarchical approach for multi-objective optimization involves generating a series of incremental prioritizations – using a different objective at each increment to refine the previous solution – until the final solution achieves all of the objectives. The advantage with this approach is that we can specify trade-off parameters for each objective based on a percentage from optimality. This means that we can leverage our own knowledge – or that of decision maker – when to generate a range of suitable trade-off parameters. As such, this approach does not require us to generate a series of preliminary prioritizations.

This approach is slightly more complicated to implement within the prioritizr R package then the blended approach. To start off, we generate an initial prioritization based on a problem formulation that does not consider any penalties. Critically, we will generate this prioritization by solving the problem to optimality (using the gap parameter of the add_default_solver() function).

Next, we will calculate the total cost of the initial prioritization.

## [1] 904.5156

Now we will calculate a series of cost thresholds. These cost thresholds will be calculated by inflating the cost of the initial prioritization by a range of percentage values. Since these values are percentages – and not unitless values unlike those used in the blended approach – we can use domain knowledge to specify a suitable range of cost thresholds. For this tutorial, let’s assume that it would be impractical – per our domain knowledge – to expend more than four times the total cost of the initial prioritization to reduce spatial fragmentation.

## [1]  905 1357 1810 2262 2714 3166 3619 4071 4523

After generating the cost thresholds, we can use them to generate prioritizations. Specifically, we will generate prioritizations that aim to minimize total boundary length as much as possible – ignoring the total cost of the prioritizations – whilst ensuring that the total cost of the prioritization does not exceed a given cost threshold and the other considerations (e.g., locked in constraints). To achieve this, we create a new column in the tas_pu object that contains only zero values (called zeros) and use this new column to specify the cost data for the prioritizations. Although we normally recommend against cost data that contain zero values – because planning units with zero costs are often selected in prioritizations even if they are not needed – here we use zero cost values so that the prioritization will focus exclusively on spatial fragmentation. Additionally, when it comes to generating the prioritization, we will add linear constraints to ensure that the total cost of the prioritization does not exceed a given cost threshold (using the add_linear_constraints() function).

# add a column with zeros
tas_pu$zeros <- 0

# define a problem with zero cost values and boundary penalties
## note that because all the costs are all zero, it doesn't actually
## matter what penalty value is used (as long as the value is > 0)
## and so we just use a value of 1
p2 <- problem(tas_pu, tas_features, cost_column = "zeros") %>%
      add_min_set_objective() %>%
      add_boundary_penalties(penalty = 1, data = tas_bd) %>%
      add_relative_targets(0.17) %>%
      add_locked_in_constraints("locked_in") %>%

# generate prioritizations based on each cost threshold
## note that the prioritizations are solved to within 10% of optimality
## (the default gap) because the gap is not specified
hierarchical_results <- lapply(threshold, function(x) {
  ## generate solution by adding a constraint based on the threshold and
  ## using the "real" cost values (i.e., not zeros)
  s <-
    p2 %>%
    add_linear_constraints(threshold = x, sense = "<=", data = "cost") %>%
  ## return data frame with solution
  s <- data.frame(s = s$solution_1)
  names(s) <- paste0("threshold_", x)

# format results as a single spatial object
hierarchical_results <- cbind(tas_pu,, hierarchical_results))

# plot maps of prioritizations
  x =
    hierarchical_results %>%
    dplyr::select(starts_with("threshold_")) %>%
    mutate_if(is.numeric, function(x) {
        hierarchical_results$locked_in > 0.5 ~ "locked in",
        x > 0.5 ~ "priority",
        TRUE ~ "other"
  pal = c("purple", "grey90", "darkgreen")

We now have a set of candidate prioritizations generated using the hierarchical approach. This approach can be much faster than the blended approach because it does not require generating a set of prioritizations to identify an upper limit for the penalty trade-off parameter. After generating a set of candidate prioritizations, we can then calculate performance metrics to compare the prioritizations.

Calculating performance metrics

Here we will calculate performance metrics to compare the prioritizations. Since we aim to navigate trade-offs between the total cost of a prioritization and the overall level of spatial fragmentation associated with a prioritization (as measured by total boundary length), we will calculate metrics to assess these criteria. Although we generated two sets of candidate prioritizations in the previous section; for brevity, here we will consider the candidate prioritizations generated using the hierarchical approach. Please note that you could also apply the following procedures to candidate prioritizations generated using the blended approach.

# calculate metrics for prioritizations
## note that we use p0 and not p1 so that cost calculations are based
## on the cost values and not zeros
hierarchical_metrics <- lapply(
  grep("threshold_", names(hierarchical_results)), function(x) {
    x <- hierarchical_results[, x]
      total_cost = eval_cost_summary(p0, x)$cost,
      total_boundary_length = eval_boundary_summary(p0, x)$boundary
hierarchical_metrics <-, hierarchical_metrics)
hierarchical_metrics$threshold <- threshold
hierarchical_metrics <- as_tibble(hierarchical_metrics)

# preview metrics
## # A tibble: 9 × 3
##   total_cost total_boundary_length threshold
##        <dbl>                 <dbl>     <dbl>
## 1       905.              2920849.       905
## 2      1356.              2423631.      1357
## 3      1809.              2205306.      1810
## 4      2262.              2061301.      2262
## 5      2713.              1953381.      2714
## 6      3161.              1864954.      3166
## 7      3617.              1805587.      3619
## 8      4052.              1756225.      4071
## 9      4500.              1719318.      4523

After calculating the metrics, let’s we can use them to help select a prioritization.

Selecting a prioritization

Now we need to decide on which candidate prioritization achieves the best trade-off. There are a range of qualitative and quantitative methods that are available to select a candidate prioritization (Ardron et al. 2010). Here we will consider three different methods. Since some of these methods a set of candidate prioritizations, we will use the candidate prioritizations using the hierarchical approach for these methods. To keep track of the prioritizations selected by different methods, let’s create a results_data table.

# create data for plotting
result_data <-
  hierarchical_metrics %>%
  ## rename threshold column to value column
  rename(value = "threshold") %>%
  ## add column with column names that contain candidate prioritizations
  mutate(name = grep(
    "threshold_", names(hierarchical_results), value = TRUE, fixed = TRUE
  )) %>%
  ## add column with labels for plotting
  mutate(label = paste("Threshold =", value)) %>%
  ## add column to keep track prioritizations selected by different methods
  mutate(method = "none")

# print table
## # A tibble: 9 × 6
##   total_cost total_boundary_length value name           label            method
##        <dbl>                 <dbl> <dbl> <chr>          <chr>            <chr> 
## 1       905.              2920849.   905 threshold_905  Threshold = 905  none  
## 2      1356.              2423631.  1357 threshold_1357 Threshold = 1357 none  
## 3      1809.              2205306.  1810 threshold_1810 Threshold = 1810 none  
## 4      2262.              2061301.  2262 threshold_2262 Threshold = 2262 none  
## 5      2713.              1953381.  2714 threshold_2714 Threshold = 2714 none  
## 6      3161.              1864954.  3166 threshold_3166 Threshold = 3166 none  
## 7      3617.              1805587.  3619 threshold_3619 Threshold = 3619 none  
## 8      4052.              1756225.  4071 threshold_4071 Threshold = 4071 none  
## 9      4500.              1719318.  4523 threshold_4523 Threshold = 4523 none

Next, let’s examine some different methods for selecting prioritizations.

Visual method

One qualitative method involves plotting the relationship between the different criteria, and using the plot to visually select a candidate prioritization. This visual method is often used to help calibrate trade-offs among prioritizations generated using the Marxan decision support tool (e.g., Hermoso et al. 2011; Stewart & Possingham 2005). So, let’s create a plot to select a prioritization.

We can see that there is a clear relationship between total cost and total boundary length. It would seem that in order to achieve a lower total boundary length – and thus lower spatial fragmentation – the prioritization must have a greater cost. Although we might expect the results to show a smoother curve – in other words, only Pareto dominant solutions – this result is expected because we generated candidate prioritizations using the default optimality gap of 10%. Typically, the visual method involves selecting a prioritization near the elbow of the plot. So, let’s select the prioritization generated using a threshold value of 1810. To keep of the prioritizations selected based on different methods, let’s create a method column in the result_data table.

Next, let’s consider a quantitative approach.

TOPSIS method

Multiple-criteria decision analysis is a disciple that uses analytical methods to evaluate trade-offs between multiple criteria (MCDA; reviewed in Greene et al. 2011). Although this discipline contains many different methods, here we will use the the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method (Hwang & Yoon 1981). This method requires (i) data describing the performance of each prioritization according the different criteria, (ii) weights to encode the relative importance of each criteria, and (iii) details on whether each criteria should ideally be minimized or maximized. Let’s run the analysis, assuming that we equal weighting for total cost and total boundary length.

##   alt.row     score rank
## 1       1 0.6819118    3
## 2       2 0.7485099    1
## 3       3 0.7175139    2
## 4       4 0.6383925    4
## 5       5 0.5493251    5
## 6       6 0.4677887    6
## 7       7 0.3972848    7
## 8       8 0.3488804    8
## 9       9 0.3180882    9

The candidate prioritization with the greatest TOPSIS score is considered to represent the best trade-off between total cost and total boundary length. So, based on this method, we would select the prioritization generated using a threshold value of 1357. Let’s update the result_data with this information.

Next, let’s consider another quantitative method.

Cohon et al. (1979) method

This method is based on an algorithm that was originally developed by Cohon et al. (Cohon et al. 1979), and was later adapted for use in systematic conservation planning (Fischer & Church 2005). Specifically, it involves generating two ideal prioritizations – with each prioritization representing the ideal prioritization for each criteria – and then using the performance metrics calculated for these prioritizations to automatically derive a trade-off penalty value (Ardron et al. 2010; Cohon et al. 1979). Thus, unlike the two other methods, this method does not require a set of candidate prioritizations. As such, this method can be used to find a prioritization that meets multiple criteria in a much shorter period of time than the other methods. To implement this method, we first need to generate the ideal prioritizations (note that we specify an gap of 0% to ensure optimality).

Next, let’s calculate the performance metrics for these prioritizations.

After calculating these performance metrics, we can use them to automatically calculate a penalty value.

## [1] 0.00945

Now that we have calculated a penalty value using this method, we can use it to generate a prioritization.

Let’s update the results_data table with results about the prioritization.

Next, let’s compare the prioritizations selected by different methods.

Method comparison

Let’s create a plot to visualize the results from the different methods.

We can see that the different method selected different prioritizations. To further compare the results from the different methods, let’s create some maps showing the selected prioritizations.