Evaluating methods for estimating rare events with zero-heavy data: a simulation model estimating sea turtle bycatch in the pelagic longline fishery.
Fishery scientists and ecologists often must make inferences from data with many zero values and high variance. For example, studies of the detection or capture of protected species or infrequently encountered commercial species produce data sets that contain many zeros and few positive values with a skewed distribution (Martin et al., 2005; Sileshi, 2006). Analyzing such zero-heavy data sets (data sets with many zero values) poses unique challenges that are not always met, perhaps because method suitability has not been fully explored or because of deference to familiar methods (Walters, 2003; Martin et al., 2005; Sileshi, 2006). It is not uncommon for scientists to use familiar statistical methods even when it may be impossible to meet model assumptions (Walters, 2003; Sileshi, 2006). Additionally, transformations are often employed to overcome violations of the assumed variance-mean relationship of the errors, but transformations do not solve the problems associated with zero-heavy data (Martin et al., 2005). Not accounting for excess zeros and using models with inappropriate assumptions can produce biased estimates and incorrect conclusions (Martin et al., 2005).
However, interest is growing in analyzing data with excess zeros and in estimating rare events because more appropriate analyses can provide more accurate results (Martin et al., 2005). Scientists who use the most appropriate analysis method for their study system are more likely to obtain the best available estimates for management decisions. In this article, we evaluate several methods for making inferences from zero-heavy data sets in the context of estimating fleetwide bycatch of sea turtles. By evaluating method performance, we identify the most suitable estimation method in a variety of fishery scenarios.
The U.S. Atlantic pelagic longline fishery targets swordfish (Xiphias gladius) and tuna (Thunnus spp.) in the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico. From 2005 to 2007, longlines accounted for approximately 73% of swordfish, 84% of yellowfin tuna (Thunnus albacares), and 90% of bigeye tuna (Thunnus obesus) domestic landings by weight nationwide, among landings for which fishing gear was specified (NMFS (1)). However, swordfish and tuna constituted less than half of the observed catch from the U.S. Atlantic pelagic longline fishery between 1992 and 2002 (Beerkircher et al., 2004); the rest of the catch was incidental bycatch. Sharks, rays, and finfishes composed the majority of bycatch during this period, and the incidental capture of sea turtles and marine mammals made up about 1% of the observed catch (Beerkircher et al., 2004). For example, of the 944 observed sets in 2007, 114 caught a sea turtle (Fairfield and Garrison, 2008). A fishing set is a single deployment of fishing gear; a vessel fishes on average 6 sets per 9-day trip (NMFS, 2006).
Although the incidental capture of the loggerhead sea turtle (Caretta caretta) and leatherback sea turtle (Dermochelys coriacea) is rare, it is notable because these species are protected under the Endangered Species Act (ESA) of 1973: the leatherback sea turtle is listed as endangered and the loggerhead sea turtle is listed as both endangered and threatened. The endangered distinct populations of the loggerhead sea turtle include one in the northeast Atlantic, and the distinct populations listed as threatened include a population in the south Atlantic and another in the northwest Atlantic.
Because the sea turtles that are caught by the U.S. Atlantic pelagic longline fishery are protected under the ESA, scientists at the Southeast Fisheries Science Center (SEFSC) of the National Marine Fisheries Service estimate the number caught annually. These annual bycatch estimates are compared with the fishery's incidental take statement (ITS), which stipulates the maximum number of sea turtles the fishery may catch incidentally before formal consultation under section 7 of the ESA must be undertaken. If the maximum number stipulated in the ITS is exceeded for a turtle species, the SEFSC must assess whether the fishery is jeopardizing the survival of that turtle species and, consequently, how the fishery is allowed to proceed (McCracken, 2004). Therefore, accurate and precise estimates are necessary for both sea turtle conservation and appropriate fishery management.
The SEFSC bases its estimates of sea turtle bycatch on 2 sources of data: logbooks kept by vessel captains and records made by independent observers deployed on approximately 8% of vessels (Beerkircher et al., 2004). Vessel captains are required to keep logbooks and record information about fishing gear, location, effort, target, and catch. Observers are charged with collecting unbiased data that are representative of the total catch composition (Crowder and Murawski, 1998; Fairfield and Garrison, 2008). To estimate fleetwide sea turtle bycatch, bycatch rates are calculated from observer data and then, on the basis of logbook data, applied to unobserved fishing sets (Fairfield and Garrison, 2008). Generally, bycatch is estimated by identifying a relationship between fishing effort or environmental characteristics and the number of turtles caught on observed fishing sets and then assuming that this relationship holds for unobserved sets.
The estimation methods essentially can be categorized as sample-based estimators or model-based predictors. For sample-based estimators, sampling probabilities are assumed but, for the most part, assumptions are avoided regarding the structure of the target population and features being estimated. These estimators allow the observed bycatch rate to be raised to fleetwide estimates on the basis of total reported fishing effort. Sample-based estimators are usually less efficient (i.e., they require more samples to achieve a specified level of performance) than model-based predictors, where a statistical model of bycatch is assumed. The statistical model used in model-based predictors represents the process that is generating the response variable as a function of explanatory variables (McCracken, 2004). In our example, parameters can be estimated with the data from observed sets and used to relate the explanatory variable values recorded in the logbooks to the number of sea turtles caught. These relationships can be used to estimate the number of turtles caught on unobserved sets.
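As a minimal illustration of the sample-based idea, the sketch below assumes the simplest ratio estimator: the observed bycatch rate (turtles per observed set) multiplied by total reported sets. The function name and the numbers are hypothetical and are not SEFSC values.

```python
def ratio_estimate(observed_turtles, observed_sets, total_reported_sets):
    """Raise the observed bycatch rate to a fleetwide total (hypothetical helper)."""
    if observed_sets <= 0:
        raise ValueError("at least one observed set is required")
    rate = observed_turtles / observed_sets   # turtles per observed set
    return rate * total_reported_sets         # fleetwide estimate


# Hypothetical year: 12 turtles on 640 observed sets (8% of 8000 reported sets)
estimate = ratio_estimate(12, 640, 8000)
print(estimate)
```

No model of how bycatch depends on covariates is involved here; only the sampling design and total effort matter, which is what separates this class of estimators from model-based predictors.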
Current SEFSC estimates of sea turtle bycatch by the U.S. Atlantic pelagic longline fishery have wide confidence intervals, and their accuracy is unknown. Consequently, it is difficult to determine the level of bycatch in a single year and the trend over time, and insufficient bycatch information impedes management. The ability of the SEFSC to estimate bycatch (and, thus, the ability of the NMFS to manage the fishery and conserve protected species) may be improved if alternative estimation methods are systematically compared and the most suitable estimation method is identified. Evaluation of estimation methods with regard to frequently encountered data complexities, such as small sample size (8% observer coverage), overdispersion (greater variance than expected), excess zeros (many observed sets without bycatch), and hierarchical observations (fishing sets sampled within trips), is particularly warranted.
In this study, we evaluated 2 of the most prevalent methods for estimating rare events with zero-heavy data: the delta-lognormal method, a sample-based estimator (Pennington, 1983); and the generalized linear model (GLM), a model-based predictor (Lindsey, 1997), in the context of sea turtle bycatch in the U.S. Atlantic pelagic longline fishery. The SEFSC has used the delta-lognormal method to estimate sea turtle bycatch in the U.S. Atlantic pelagic longline fishery since 1997, but, in recent years, the SEFSC has considered switching to a GLM approach (Fairfield and Garrison, 2008; Garrison (2)).
In comparison, the Southwest Fisheries Science Center (SWFSC) and Pacific Islands Fisheries Science Center (PIFSC) have estimated sea turtle bycatch in the U.S. Pacific pelagic longline fishery. The SWFSC used survey sampling methods in 1994 and 1995 and a regression tree model in 1996 (Skillman and Kleiber, 1998). In 2000, McCracken (2004) of the PIFSC completed the first official report that systematically examined different methods for estimating sea turtle bycatch in the U.S. pelagic longline fishery, although sea turtle bycatch had been estimated since 1992. McCracken (2004) determined that the GLM with a Poisson error distribution and its generalized additive model (GAM) counterpart were the most appropriate methods for estimating sea turtle bycatch in the U.S. Pacific pelagic longline fishery from 1994 to 1999. However, McCracken did not consider the delta-lognormal method, and data from the Atlantic fishery were not analyzed. The Pacific fishery was closed in 2000 and reopened in 2004. Since then, observer coverage has been at least 20%, and bycatch has declined to the point that it is not necessary to model bycatch; instead, the PIFSC has used the Horvitz-Thompson estimator (McCracken, 2004).
The goal of this study was to evaluate delta-lognormal and GLM performance under a variety of spatial fishery scenarios to identify the more suitable estimation method. We built a simulation model representing a range of spatial interactions of sea turtles with the U.S. Atlantic pelagic longline fishery and used the delta-lognormal method, a generalized linear model with a Poisson error distribution (GLM-P), and a GLM with a negative binomial error distribution (GLM-NB), each at 2 spatiotemporal scales, to estimate the number of turtles caught. By comparing these estimates to the total number of turtles caught in the simulation, we were able to systematically evaluate the performance of each method.
Materials and methods
To represent sea turtle bycatch by the U.S. Atlantic pelagic longline fishery, we constructed a simulation model that included 5 spatial scenarios with various distributions of sea turtles and fishing sets (Fig. 1). The simulation model included both SEFSC data and model assumptions based on the current understanding of fishery and sea turtle behavior (Table 1). Observers were simulated on 8% of the fishing sets, and each estimation method was applied to every spatial scenario. The estimation methods were evaluated by comparing the estimated amount of bycatch to the total simulated amount of bycatch. The simulation model was run 1000 times for each of the 5 spatial scenarios, enabling a comprehensive evaluation of the performance of the estimation methods.
The empirical and theoretical foundation of model assumptions
Fishery-independent data on sea turtle spatial distributions are limited to a few satellite-tracked individuals, at most 60 turtles in a study but typically fewer than 20 (Godley et al., 2007), and aerial surveys (Epperly et al., 1995; McClellan, 1996; McDaniel et al., 2000; Goodman et al., 2007). Small sample size, short study durations (typically less than one year), and nonrepresentative sampling of ages and sexes make satellite tracking data unsuitable for our study (Godley et al., 2007). Moreover, inference from aerial surveys can be difficult because of the high percentage of time that turtles spend submerged and variability in turtle surfacing behavior related to season and location (Byles, 1988; Nelson, 1996; Mansfield, 2006; Goodman et al., 2007).
Because fishery-independent data were not suitable for our objectives, we considered fishery-dependent data. These data indicate that sea turtles clump (i.e., tend to concentrate in certain areas rather than occur equally spaced or spaced with uniform probability), especially in productive areas of the ocean. Currents, frontal regions, and some bathymetric features often are associated with enhanced productivity and prey aggregation, and turtles exhibit a clumping pattern in response to these features when they forage (Williams et al., 1996; Witzell, 1999; Gilman et al., 2006). Environmental features, such as major current systems and gradients in temperature, chlorophyll, and salinity, also seem to influence the clumping of turtles, as well as swordfish (Bigelow et al., 1999; Polovina et al., 2000; Lewison et al., 2004). However, turtle distributions appear to vary seasonally and between species. Gardner et al. (2008) found that, for most of the year, loggerhead and leatherback bycatch locations were not completely random and that there seemed to be increased clumping from July to October. Also, clumping was more pronounced with loggerheads than with leatherbacks (Gardner et al., 2008).
Therefore, we modeled clumped and uniformly random sea turtle distributions. Although existing data indicate that turtles clump, very little information about the spatial extent or density of clumps is available. A density estimate of 0.5 turtles/km² was assumed for modeling because it is an intermediate value based on the estimates available in the scientific literature, bearing in mind that an individual turtle may surface and be available to aerial surveys 5.3% to 30% of the time (Byles, 1988; Nelson, 1996; Mansfield, 2006; Goodman et al., 2007).
As for fishing sets, SEFSC maps of longline set locations suggest that sets do not have a uniformly random distribution (Fairfield and Garrison, 2008). For analysis, the SEFSC has divided the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico into 10 geographic regions or statistical areas, and the agency estimates bycatch in each area for each calendar quarter and then sums these estimates to generate a total annual estimate. Sets appeared clumped whether their distribution was considered across all fishing areas or within a single fishing area. However, the mechanism behind this clumping is not well understood. We modeled 2 possible scenarios: 1) fishing sets clump in the same areas in which sea turtles clump and 2) sets clump independently of turtles. The first scenario could occur if both fishermen and turtles target productive areas of the ocean. The latter could result from either fishermen or turtles imperfectly targeting productive areas or clumping based on another cue. For example, fishermen might aggregate because of peer influence.
The spatial scenarios with clumped sets were expected to be most realistic, but considering the amount of uncertainty in the nature of the interactions of sea turtles with the pelagic longline fishery, we thought it useful to analyze other distributions as well. For example, a scenario with uniformly random turtles and sets served as a null model. Further, the results from spatial scenarios considered less realistic for the interactions of sea turtles with the U.S. Atlantic pelagic longline fishery could illuminate general properties of the estimation methods that are relevant to other problems with the management of natural resources.
General structure of the simulation model
Much remains unknown regarding the spatial distributions of sea turtles, how fishermen decide where to fish, and the nature of interactions of sea turtles with fishing sets in time and space. Therefore, we designed several spatially explicit scenarios to address the uncertainty and variation in interactions of sea turtles and the fishery. Five spatial scenarios were modeled (Fig. 1): 1) co-occurrence clumping ($\text{Turtles}_{\text{clump}}$, $\text{Sets}_{\text{clump-turtles}}$); 2) independent clumping ($\text{Turtles}_{\text{clump}}$, $\text{Sets}_{\text{clump-sets}}$); 3) sets-only clumping ($\text{Turtles}_{\text{uniform}}$, $\text{Sets}_{\text{clump-sets}}$); 4) turtles-only clumping ($\text{Turtles}_{\text{clump}}$, $\text{Sets}_{\text{uniform}}$); and 5) fully uniform distribution ($\text{Turtles}_{\text{uniform}}$, $\text{Sets}_{\text{uniform}}$).
Details of model construction
In each simulation, the number of fishing sets that we modeled was 8000, which was approximately the average number of sets reported annually to the SEFSC from 2005 to 2007 (Walsh and Garrison, 2006; Fairfield-Walsh and Garrison, 2007; Fairfield and Garrison, 2008), the first 3 years after NMFS regulations mandated a change from J-hooks to circle hooks for the longline fishery (Watson et al., 2005). Circle hooks were required to reduce the number of sea turtles caught and the severity of their injuries. For computational convenience, rather than simulating all 8000 sets at once, we divided them into computational groups of 25 sets (Fig. 2). The computational groups of 25 sets were used to distribute turtles and sets, place observers, and simulate bycatch, but bycatch estimates were not made at this scale.
Because the SEFSC divides the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico into 10 statistical areas and estimates bycatch in each area for each calendar quarter, its bycatch estimates are made in 40 quarter-area strata. Bycatch rates were expected to vary across these strata; therefore, we also modeled strata (Table 2). For each stratum, we calculated the average number of sets reported to the SEFSC from 2005 to 2007 and rounded it to a multiple of 25 to determine the number of computational groups of 25 sets to be modeled in that stratum.
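The conversion from stratum effort to computational groups can be sketched as follows, assuming conventional round-half-up rounding to the nearest multiple of 25 (the text does not say how ties were handled). The stratum labels and effort values are hypothetical.

```python
import math

GROUP_SIZE = 25  # sets per computational group


def groups_for_stratum(mean_reported_sets):
    """Number of 25-set computational groups for one quarter-area stratum."""
    # floor(x + 0.5) rounds halves up; Python's round() would round half to even
    return math.floor(mean_reported_sets / GROUP_SIZE + 0.5)


# Hypothetical mean annual effort (sets) for three quarter-area strata
mean_effort = {"area1_Q1": 237.3, "area1_Q2": 412.0, "area2_Q3": 9.0}
groups = {stratum: groups_for_stratum(e) for stratum, e in mean_effort.items()}
print(groups)  # {'area1_Q1': 9, 'area1_Q2': 16, 'area2_Q3': 0}
```

Under this rounding rule a stratum averaging fewer than 12.5 sets would receive no computational groups; the full 8000 sets correspond to 320 groups.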
Each computational group of 25 sets was modeled as a grid of 100x100 cells. Sea turtles and fishing sets were assigned coordinates (x, y) depending on the spatial scenario; the procedures are described in the following sections. Each modeled set covered 5 cells (an initial cell and 4 cells either up, right, down, or left) because the average longline set covers about 50 km (mean 47 km, minimum 32 km, maximum 64 km) (Witzell, 1999; Beerkircher et al., 2004; Gilman et al., 2006). Hence, each modeled cell was conceptualized as 10x10 km.
Co-occurrence clumping scenario
In spatial scenarios with clumped fishing sets, we modeled computational groups with 5 clumps of 5 sets each. Sea turtles also were aggregated in 5 clumps in the clumping scenarios. Each clump was based on a block of 9x9 cells. Clumps of 90x90 km are consistent with the results of Gardner et al. (2008), who found that turtle bycatch distributions spanned 30-200 km.
We modeled the density of sea turtles as declining with distance from the center of a clump. We selected x and y coordinates for the seed of the first turtle clump with uniform probability. To accentuate clumping, we placed turtles within a clump so that coordinates closer to the seed had a greater probability: $P(X = X_{\text{seed}}) = 0.20$, $P(X = X_{\text{seed}} \pm 1) = 0.16$, $P(X = X_{\text{seed}} \pm 2) = 0.12$, $P(X = X_{\text{seed}} \pm 3) = 0.08$, and $P(X = X_{\text{seed}} \pm 4) = 0.04$, where each ± probability applies to each of the 2 coordinates at that distance, so the 9 probabilities sum to 1. Assuming a density of 0.5 turtles/km², we placed an average of 50 turtles/cell, or 4050 turtles/clump and 20,250 turtles in the entire grid of 100x100 cells. Subsequent clump seed coordinates were selected so that a set could not fish in multiple turtle clumps.
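The turtle placement rule can be sketched with NumPy. The sketch assumes that x and y offsets from the seed are drawn independently from the same 9-point distribution (the text gives the probabilities for X only) and leaves grid-edge handling to the caller, since neither point is specified; names such as `place_clump` are hypothetical.

```python
import numpy as np

# Offsets -4..+4 from the clump seed and their probabilities (they sum to 1)
OFFSETS = np.arange(-4, 5)
PROBS = np.array([0.04, 0.08, 0.12, 0.16, 0.20, 0.16, 0.12, 0.08, 0.04])

CELL_AREA_KM2 = 10 * 10  # each cell is 10 x 10 km
DENSITY = 0.5            # turtles per km^2 (assumed in the text)
TURTLES_PER_CLUMP = int(DENSITY * CELL_AREA_KM2 * 9 * 9)  # 50/cell x 81 cells


def place_clump(seed_x, seed_y, rng, n_turtles=TURTLES_PER_CLUMP):
    """Return an (n_turtles, 2) array of cell coordinates clustered at the seed.

    Assumption: x and y offsets are drawn independently from PROBS."""
    dx = rng.choice(OFFSETS, size=n_turtles, p=PROBS)
    dy = rng.choice(OFFSETS, size=n_turtles, p=PROBS)
    return np.column_stack((seed_x + dx, seed_y + dy))


rng = np.random.default_rng(1)
turtles = place_clump(50, 50, rng)
print(turtles.shape)  # (4050, 2)
```

With 5 clumps per group this places 5 x 4050 = 20,250 turtles, matching the totals given above.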
In the spatial scenario with fishing sets and sea turtles clumped in the same areas, the co-occurrence clumping scenario, the clumps (9x9 cells) for the sets and clumps (9x9 cells) for the turtles were identical. Each fishing set began within the 9x9 cells of its clump and then moved 4 cells up, right, down, or left. A set could leave the 9x9 cells of its clump during fishing. However, clumps were designed with 9x9 cells so that a fishing set that began in a clump's center could move in any direction and remain inside its clump. For each of the 5 fishing sets in a clump, the direction of fishing (up, right, down, or left) was determined by the number of turtles that would be encountered in each direction.
To determine the initial coordinates of fishing sets, we tallied the number of sea turtles in each x coordinate of the clump. This tally was used to construct a probability for set placement by dividing the number of turtles with a particular x coordinate by the total number of turtles. The same was done for the y coordinates. To determine the direction of fishing, we tallied the number of turtles that would be encountered by a set moving right, left, up, or down. These 4 counts were summed, and the number encountered in each direction was divided by the total to obtain a probability of moving in each direction. The more turtles that would be encountered, the greater the probability a set would fish in that direction. This algorithm mimicked a situation where more turtles are in the productive areas that fishermen are targeting than in other areas.
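A sketch of the co-occurrence placement rule, under stated assumptions: `counts` is a hypothetical 100x100 array of turtles per cell; the start x and y are drawn from the clump's marginal turtle tallies, as the text describes; turtles "ahead" are counted in the 4 cells beyond the start cell; and, as our own assumptions, off-grid cells are ignored and a direction is chosen uniformly if no turtles lie ahead. Function names are hypothetical.

```python
import numpy as np

# Unit steps for the 4 fishing directions ("up" is +y in this sketch)
DIRECTIONS = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def cells_ahead(x0, y0, direction, length=4):
    """The `length` cells a set fishes beyond its start cell."""
    dx, dy = DIRECTIONS[direction]
    return [(x0 + k * dx, y0 + k * dy) for k in range(1, length + 1)]


def place_set(counts, x_lo, y_lo, rng, size=9):
    """Place one set in the size x size clump whose lowest corner is (x_lo, y_lo)."""
    clump = counts[x_lo:x_lo + size, y_lo:y_lo + size]
    px = clump.sum(axis=1) / clump.sum()  # share of clump turtles at each x
    py = clump.sum(axis=0) / clump.sum()  # share of clump turtles at each y
    x0 = x_lo + int(rng.choice(size, p=px))
    y0 = y_lo + int(rng.choice(size, p=py))
    nx, ny = counts.shape
    ahead = np.array([
        sum(counts[x, y] for x, y in cells_ahead(x0, y0, d)
            if 0 <= x < nx and 0 <= y < ny)
        for d in DIRECTIONS
    ], dtype=float)
    # More turtles ahead -> higher probability of fishing in that direction
    p_dir = ahead / ahead.sum() if ahead.sum() > 0 else np.full(4, 0.25)
    direction = list(DIRECTIONS)[int(rng.choice(4, p=p_dir))]
    return (x0, y0), direction


rng = np.random.default_rng(2)
counts = np.zeros((100, 100), dtype=int)
counts[20:29, 30] = 10  # hypothetical: turtles only along the row y = 30
start, direction = place_set(counts, x_lo=20, y_lo=30, rng=rng)
print(start, direction)  # start has y = 30; direction is "right" or "left"
```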
Independent clumping scenario
The 2 features that distinguish the independent clumping scenario from the co-occurrence clumping scenario are the following: 1) the clumps (9x9 cells) for fishing sets and turtles were placed independently and 2) the direction of fishing was influenced by the number of sets in each of the 4 directions. That is, there was a positive relationship between the probability a set would fish in a particular direction and the proximity to other sets in that direction. The smaller the distance to other sets, the greater the probability the set would fish in that direction. This algorithm is consistent with fishermen aggregating because of peer influence.
Initial x and y coordinates were selected for the seed of the first fishing set clump with uniform probability. We also selected x and y coordinates for the starting positions of each of the 5 sets in a clump with uniform probability. Each set had a greater probability of moving in the direction where there were more sets. We first considered $\text{Set}_0\text{Cell}_0$, the first cell in the first set. We calculated the distances from $\text{Set}_0\text{Cell}_0$ to $\text{Set}_i\text{Cell}_0$, where $i = 1, \ldots, 4$, and summed these distances. We then calculated the distances from $\text{Set}_0\text{Cell}_{1R}$, the cell to the right of the initial fishing cell, to each $\text{Set}_i\text{Cell}_0$ and added them to this sum. We continued in this way for each remaining cell that $\text{Set}_0$ would fish if it moved right, which gave the summed distance from $\text{Set}_0$ to the $\text{Set}_i\text{Cell}_0$ if $\text{Set}_0$ moved right. We also calculated distances for $\text{Set}_0$ fishing up, left, and down. These calculations gave us 4 summed distances for $\text{Set}_0$, one each for moving right, left, up, and down. Because the direction with the smallest distance between sets should have the greatest probability, we divided the smallest distance by each of the 4 distances and then normalized these ratios to obtain the probability of $\text{Set}_0$ moving in each direction. We computed these probabilities for each set to determine its direction of fishing. Subsequent set clumps were placed to prevent the overlapping of sets from different clumps: relative to each earlier seed, a new seed satisfied $|x - \text{Seed}_x| \ge 17$ or $|y - \text{Seed}_y| \ge 17$. No constraints were placed on the overlap of set and turtle clumps, and turtles were distributed as they were in the co-occurrence clumping scenario.
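The distance-weighted direction rule and the seed-separation rule can be sketched as follows. Assumptions: distances are Euclidean between cell coordinates; each direction's weight is the smallest summed distance divided by that direction's summed distance (our reading of the stated intent that the closest direction be most likely); and a new seed is accepted when it lies at least 17 cells from every earlier seed in x or in y. All function names are hypothetical.

```python
import math

DIRECTIONS = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def summed_distance(start, direction, other_starts, length=4):
    """Sum of distances from every cell of the set's path (start cell plus
    `length` cells in `direction`) to the other sets' start cells."""
    dx, dy = DIRECTIONS[direction]
    path = [(start[0] + k * dx, start[1] + k * dy) for k in range(length + 1)]
    return sum(math.dist(cell, other) for cell in path for other in other_starts)


def direction_probs(start, other_starts):
    """Smaller summed distance -> larger probability of fishing that way."""
    if not other_starts:
        return {d: 0.25 for d in DIRECTIONS}
    dist = {d: summed_distance(start, d, other_starts) for d in DIRECTIONS}
    d_min = min(dist.values())
    weights = {d: d_min / v for d, v in dist.items()}  # closest direction -> 1
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def seed_ok(new_seed, earlier_seeds, gap=17):
    """Accept a set-clump seed only if it is >= gap cells from each earlier
    seed in x or in y, so sets from different clumps cannot overlap."""
    return all(abs(new_seed[0] - s[0]) >= gap or abs(new_seed[1] - s[1]) >= gap
               for s in earlier_seeds)


probs = direction_probs((10, 10), [(20, 10), (22, 11), (19, 9), (21, 10)])
print(max(probs, key=probs.get))  # right
```

A gap of 17 follows from the geometry: a set starts within 4 cells of its seed and extends up to 4 more cells, so it can reach 8 cells from the seed, and two such reaches plus one cell give 17.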
Sets-only clumping scenario
In the sets-only clumping scenario, in which fishing sets were clumped but sea turtles were uniformly random, the direction of fishing was determined as it was in the independent clumping scenario. When turtles had a uniformly random distribution, they could occur in any cell in the grid of 100x100 cells. To maintain consistency, we placed the same number of turtles across the entire grid in the uniformly random scenarios as we did in the 5 clumps in the clumping scenarios. Distributing 20,250 turtles across the grid with uniform probability resulted in an average of 2.025 turtles/cell and 0.0203 turtles/km². Because turtle densities differed between the uniformly random and clumping scenarios, different probabilities of capture were applied in the 2 types of scenarios. The probabilities of capture are discussed below.
Scenarios with uniformly random sets
Fishing sets were uniformly random in 2 spatial scenarios: turtles-only clumping and fully uniform. In these spatial scenarios, set placement and direction of fishing were determined with a uniform probability distribution. The distributions of sea turtles were constructed as described above.
Simulating bycatch
After sea turtles and fishing sets were distributed, we modeled bycatch. To quantify the number of takes, we first tallied the number of turtles that occurred in fished cells. Then, we applied a probability of capture, given co-occurrence of a turtle and set in a cell, to each encountered turtle to determine whether the set caught the turtle (Table 2). The capture probabilities varied across quarter-area strata and were based on observed bycatch rates from the SEFSC for the period from 2005 to 2007 (Walsh and Garrison, 2006; Fairfield-Walsh and Garrison, 2007; Fairfield and Garrison, 2008).
We calculated observed bycatch rates by stratum for leatherback and loggerhead sea turtles for each year from 2005 to 2007 and averaged the stratum bycatch rates across years. We used the rates for leatherback sea turtles in the simulation model for 2 reasons: leatherbacks are of greater conservation concern than loggerheads, and the average number of observed sets per year without take was smaller for leatherbacks (100 sets) than for loggerheads (179 sets), so the data for leatherback sea turtles gave us a larger sample size for calculating capture probabilities.
Some of the 40 SEFSC quarter-area strata had no fishing effort, observer coverage, or observed bycatch from 2005 to 2007. The strata without effort or observer coverage were eliminated from the simulation model. Eight strata had either no fishing effort or no observer coverage from 2005 to 2007, and therefore we simulated 32 strata. For observed strata with a bycatch rate of 0 turtles/set, we calculated probabilities of capture from bycatch rates in the same area in other quarters, when possible, or used a median bycatch rate across all quarter-area strata. This approach was consistent with the SEFSC's pooling method, in which data are pooled across quarters before they are pooled across fishing areas (Garrison, 2003).
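The fallback for zero-rate strata can be sketched as below. The text does not say how rates from other quarters were combined or whether zero strata entered the median, so this sketch assumes the mean of the nonzero same-area rates and the median of all nonzero stratum rates; the rate table and function name are hypothetical.

```python
from statistics import median


def fill_zero_rates(rates):
    """rates maps (area, quarter) -> observed turtles/set for observed strata.

    Zero-rate strata borrow first from the same area in other quarters
    (assumed: mean of nonzero rates), then from all strata
    (assumed: median of nonzero rates)."""
    overall = median(r for r in rates.values() if r > 0)
    filled = {}
    for (area, quarter), r in rates.items():
        if r > 0:
            filled[(area, quarter)] = r
            continue
        same_area = [v for (a, q), v in rates.items()
                     if a == area and q != quarter and v > 0]
        filled[(area, quarter)] = (sum(same_area) / len(same_area)
                                   if same_area else overall)
    return filled


rates = {("A", 1): 0.02, ("A", 2): 0.0, ("A", 3): 0.04,
         ("B", 1): 0.0, ("B", 2): 0.0, ("C", 1): 0.10}
print(fill_zero_rates(rates))
```

Pooling across quarters within an area before falling back to all strata mirrors the order described in the text.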
The simulated bycatch probabilities also varied with the spatial scenario because of the different turtle densities. The bycatch rates for strata ranged from 0.011 to 0.263 turtles/set. We divided these rates by the average number of sea turtles a set would encounter in its 5 cells. Then, this probability was applied to each turtle that occurred in fished cells to determine whether it was caught. A set fishing among uniformly random turtles encountered on average 10.125 turtles (2.025 turtles/cell x 5 cells), and a set fishing among clumped turtles encountered on average 250 turtles (50 turtles/cell x 5 cells).
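The capture step can be sketched as follows, assuming each encountered turtle is caught independently; summing independent per-turtle trials is equivalent to one binomial draw per set, which is why the sketch uses NumPy's `binomial`. The helper names are hypothetical; the rate 0.263 and the densities are the values given in the text.

```python
import numpy as np

CELLS_PER_SET = 5  # a set fishes its start cell plus 4 cells in one direction


def capture_prob(stratum_rate, turtles_per_cell):
    """Per-turtle capture probability that makes a set's expected catch equal
    the stratum bycatch rate (turtles/set) at the average turtle density."""
    expected_encounters = turtles_per_cell * CELLS_PER_SET
    return stratum_rate / expected_encounters


def simulate_catch(n_encountered, p, rng):
    """Each encountered turtle is caught independently with probability p."""
    return int(rng.binomial(n_encountered, p))


p_uniform = capture_prob(0.263, 2.025)  # 10.125 turtles met on average
p_clumped = capture_prob(0.263, 50.0)   # 250 turtles met on average

rng = np.random.default_rng(3)
caught = simulate_catch(250, p_clumped, rng)
print(p_uniform, p_clumped, caught)
```

Sets that happen to fish denser patches meet more turtles and so catch more on average, which is how clumping feeds into the zero-heavy bycatch data.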
Observer distribution
We attempted to simulate observed fishing sets in a design consistent with the SEFSC's procedure, in terms of both the number of observers and their spatial distribution. The SEFSC's goal for observer coverage has been 8% since 2002 (Beerkircher et al., 2004). In our model, 8% observer coverage equated to 2 observed sets per computational group of 25 fishing sets.
The SEFSC distributes observers according to a simple random sampling design based on reported effort (Witzell and Cramer, 1995). Vessels are selected for observation in proportion to the amount of fishing reported in a quarter-area stratum in the previous year, and vessels are sampled without replacement within a quarter. Our simulation model was for one year; therefore, we had no effort from a previous year upon which to base the observer distribution. Rather, for each computational group of 25 sets, a single cell from the grid of 100x100 cells was chosen at random to represent an area of high fishing effort. Observers were placed on the 2 sets closest to this cell. Although this method included assumptions differentiating it from the SEFSC's procedure, the most important feature in both practices was the same: observers were distributed independent of bycatch rates.
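The observer placement rule can be sketched in a few lines. The text does not define "closest", so the sketch assumes Euclidean distance from each set's initial cell; the function name and the random set starts are hypothetical.

```python
import numpy as np


def choose_observed(set_starts, rng, n_observed=2, grid=100):
    """Return indices of the n_observed sets nearest a randomly chosen cell.

    set_starts: (25, 2) array of each set's initial (x, y) cell.
    Assumption: distance is Euclidean from the initial cell."""
    focal = rng.integers(0, grid, size=2)  # stand-in "high-effort" cell
    offsets = np.asarray(set_starts) - focal
    dist = np.hypot(offsets[:, 0], offsets[:, 1])
    return np.argsort(dist, kind="stable")[:n_observed]


rng = np.random.default_rng(4)
set_starts = rng.integers(0, 100, size=(25, 2))  # hypothetical set starts
observed = choose_observed(set_starts, rng)
print(observed)  # 2 of 25 sets, i.e., 8% coverage
```

Because the focal cell is drawn without reference to turtle locations, observer placement stays independent of bycatch rates, the property the text identifies as shared with the SEFSC procedure.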
In summary, the simulation model included the following main assumptions:
* Computational groups of 25 fishing sets distributed across a grid of 100x100 cells where each cell represented 10x10 km
* Clumps of sea turtles and fishing sets represented in grids of 9x9 cells
* Scenarios with clumped turtles in 5 clumps per computational group
* Scenarios with clumped sets in 5 clumps of 5 sets each per computational group
* Overlap of set and turtle clumps
* Methods for placing turtles in the grid
* Methods for placing sets in the grid
* Methods for determining the direction of fishing
* 2 sets selected for observation in each computational group
Properties of estimation methods
We applied 3 estimation methods to the simulated data: 1) the delta-lognormal method, 2) GLM-P, and 3) GLM-NB. Each method was used to estimate bycatch at 2 spatiotemporal scales: 1) for each of the quarter-area strata individually and 2) for all quarter-area strata combined. At the stratum scale, we estimated bycatch for individual quarter-area strata and summed stratum-specific estimates to obtain a total annual bycatch estimate. For the second spatiotemporal scale, we pooled sets across all quarter-area strata and estimated total annual bycatch. Hence, 6 estimates of total annual bycatch (all combinations of the 3 methods and 2 scales) were made for each of the 5 spatial scenarios.
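The mechanics of the 2 scales can be illustrated with a simple ratio estimator used purely as a stand-in; it is not one of the 3 methods evaluated here. The strata and their values are hypothetical.

```python
def stratified_total(strata):
    """Stratum scale: raise each stratum's own rate by its own effort, then sum."""
    return sum(s["obs_turtles"] / s["obs_sets"] * s["total_sets"] for s in strata)


def pooled_total(strata):
    """Pooled scale: a single rate from all observed sets, raised by all effort."""
    obs_turtles = sum(s["obs_turtles"] for s in strata)
    obs_sets = sum(s["obs_sets"] for s in strata)
    total_sets = sum(s["total_sets"] for s in strata)
    return obs_turtles / obs_sets * total_sets


# Hypothetical strata with uneven observer coverage (8% and 4%)
strata = [
    {"obs_turtles": 4, "obs_sets": 40, "total_sets": 500},
    {"obs_turtles": 1, "obs_sets": 50, "total_sets": 1250},
]
print(stratified_total(strata), pooled_total(strata))
```

Here the stratified sum is 75 turtles and the pooled estimate is about 97.2. For this ratio stand-in the 2 scales agree whenever observer coverage is the same fraction in every stratum and diverge otherwise; model-based methods can differ between scales for other reasons as well.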
We focused on evaluating the delta-lognormal method because it has long-standing use by the SEFSC. We also chose the GLM-P and GLM-NB because they are simple model-based predictors for count data and, thus, a logical place to begin evaluation of this class of models for estimating bycatch with zero-heavy data. Although the GAM performed well in work reported in McCracken (2004), we did not include it in this analysis because nonlinear effects of predictor variables have not been extensively studied in the Atlantic and estimation of these effects typically requires data sets larger than the ones available to the SEFSC for annual estimation of bycatch. For example, Kobayashi and Polovina (2005) fit GAMs with 55,785 unobserved sets and 2812 observed sets fished over 5 years in the Pacific fishery. Therefore, in our study, we focused on evaluating the delta-lognormal method and GLMs.
Delta-lognormal estimates are essentially the product of the proportion of fishing sets with bycatch and the average rate of bycatch for those sets (Yeung, 2001). The delta-lognormal method accommodates a predominant group of observations with a value of zero by including a probability of zero catch, and observations with nonzero values are assumed to be lognormally distributed (Pennington, 1983; Ortiz et al., 2000; NMFS, 2001; Fairfield and Garrison, 2008). A lognormal distribution is a continuous probability distribution where the logarithm of the random variable has a normal distribution. Minimum-variance unbiased estimators of means and variances are provided under the delta-lognormal method when data contain many zeros and the non-zero values are lognormally distributed (Pennington, 1983; NMFS, 2001; Garrison, 2003).
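A minimal sketch of the delta-lognormal mean per set, following the estimator of Pennington (1983): with n sets, m of them positive, and ȳ and s² the mean and variance of the log positive catches, the mean is (m/n) exp(ȳ) G_m(s²/2), where G_m is a power series; with 1 positive set the mean is that catch divided by n, and with none it is 0. The series is evaluated term by term from the ratio of successive terms. A fleetwide estimate would multiply this mean by total reported sets. The variance estimator is omitted, and the catches are hypothetical.

```python
import math


def pennington_g(m, t, tol=1e-12, max_terms=500):
    """G_m(t) = 1 + (m-1)t/m + sum_{j>=2} (m-1)^(2j-1) t^j /
    (m^j (m+1)(m+3)...(m+2j-3) j!), summed until terms are negligible."""
    if m < 2:
        return 1.0
    term = (m - 1) / m * t  # j = 1 term
    total = 1.0 + term
    j = 1
    while abs(term) > tol and j < max_terms:
        # term_{j+1} / term_j = (m-1)^2 / (m (m+2j-1)) * t / (j+1)
        term *= (m - 1) ** 2 / (m * (m + 2 * j - 1)) * t / (j + 1)
        j += 1
        total += term
    return total


def delta_lognormal_mean(catches):
    """Delta-lognormal estimate of mean catch per set (Pennington, 1983)."""
    n = len(catches)
    positives = [c for c in catches if c > 0]
    m = len(positives)
    if m == 0:
        return 0.0
    if m == 1:
        return positives[0] / n
    logs = [math.log(c) for c in positives]
    y_bar = sum(logs) / m
    s2 = sum((y - y_bar) ** 2 for y in logs) / (m - 1)
    return m / n * math.exp(y_bar) * pennington_g(m, s2 / 2)


# Hypothetical observed catches: 100 sets, 10 with turtles
catches = [0] * 90 + [1, 1, 2, 1, 3, 1, 1, 2, 1, 1]
mean_per_set = delta_lognormal_mean(catches)
print(mean_per_set, mean_per_set * 8000)  # per-set mean, raised to 8000 sets
```

For large m, G_m(t) approaches exp(t), recovering the familiar lognormal mean exp(ȳ + s²/2) scaled by the proportion of positive sets.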
The GLM extends the classical linear model by supporting the use of distributions other than the normal distribution. The GLM most commonly applied to count data, the log-linear model, uses a Poisson error distribution (McCullagh and Nelder, 1989). In a Poisson model, counts are assumed to be independent and randomly distributed in space, and the mean and variance of the random variable are assumed to be equal (McCracken, 2004; Sileshi, 2006). However, bycatch data do not always show this relationship. The variance is often larger than the mean, a case known as overdispersion (McCracken, 2004; Potts and Elith, 2006). Patchy distributions, hierarchical data, the observation of a rare event, or lack of independence can lead to excess zeros, variance heterogeneity, and, in turn, overdispersion (McCracken, 2004; Lindsey, 2004; Fahrmeir and Echavarria, 2006). Poisson models are the most commonly used and most straightforward models for count data, but the Poisson distribution accounts for neither zero inflation nor overdispersion. If overdispersion is not addressed, standard errors can be seriously underestimated and the form of the linear predictor can be misinterpreted (Rideout et al., 2001; Potts and Elith, 2006).
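One simple way to see the overdispersion and excess zeros described above is to compare a sample's variance-to-mean ratio (about 1 for Poisson counts) and its share of zeros with what a Poisson distribution of the same mean implies. This is an unconditional screen on hypothetical counts, not the Pearson dispersion statistic of a fitted GLM, which would account for covariates.

```python
import math
from statistics import mean, variance


def dispersion_check(counts):
    """Compare counts with a Poisson distribution of the same mean."""
    mu = mean(counts)
    return {
        "variance_to_mean": variance(counts) / mu,    # about 1 under Poisson
        "observed_zero_share": counts.count(0) / len(counts),
        "poisson_zero_share": math.exp(-mu),          # P(Y = 0) for Poisson(mu)
    }


# Hypothetical zero-heavy sample: 100 sets, most with no turtles
counts = [0] * 90 + [1] * 5 + [3] * 3 + [10] * 2
print(dispersion_check(counts))
```

For these counts the variance is roughly 6.5 times the mean, and 90% of sets are zeros where a Poisson distribution with the same mean would predict about 71%.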
Modeling responses as a negative binomial random variable may be more appropriate if data are overdispersed (Welsh et al., 1996; Thurston et al., 2000; Lindsey, 2004; Venables and Dichmont, 2004). Unlike the Poisson distribution, which has 1 parameter, the negative binomial distribution has 2 parameters: a mean and a dispersion parameter (White and Bennetts, 1996). The dispersion parameter can be understood as a measure of the degree of clumping in a population. As the dispersion parameter approaches infinity, the negative binomial distribution converges to the Poisson distribution, under which spatial independence is assumed. The negative binomial distribution relaxes this spatial independence assumption (White and Bennetts, 1996).
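A short Python sketch can illustrate the contrast between the 2 distributions. It assumes the common mean-dispersion parameterization, in which a negative binomial count with mean mu and dispersion k has variance mu + mu^2/k and P(Y = 0) = (k/(k + mu))^k. Smaller values of k mean more clumping, and some software reports alpha = 1/k instead. The function names are illustrative.

```python
import math
import statistics


def poisson_zero_prob(mu: float) -> float:
    """P(Y = 0) for a Poisson count with mean mu."""
    return math.exp(-mu)


def nb_zero_prob(mu: float, k: float) -> float:
    """P(Y = 0) for a negative binomial count with mean mu and dispersion k."""
    return (k / (k + mu)) ** k


def nb_variance(mu: float, k: float) -> float:
    """Negative binomial variance; approaches the Poisson variance (mu) as k grows."""
    return mu + mu ** 2 / k


def dispersion_index(counts) -> float:
    """Sample variance divided by sample mean; values well above 1 suggest overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)
```

At a mean of 0.1 turtles per set, the Poisson probability of a zero-catch set is about 0.905. A strongly clumped negative binomial with k = 0.1 gives about 0.933 and twice the variance (0.2 rather than 0.1). A sample of nine zeros and one 5 has a dispersion index of 5.0, the kind of excess variance that motivates the negative binomial.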
Estimation methods applied
Generalized linear model
In the past, the SEFSC and PIFSC have used fishing area, data source (observer or logbook), light stick use, gear depth, month, latitude, sea-surface temperature, day of the year, and number of hooks as explanatory variables in GLMs (Witzell and Cramer, 1995; McCracken, 2004). Our set of potential explanatory variables consisted of all variables recorded both by SEFSC observers and in SEFSC logbooks. Variables had to be recorded in both sources because the model must be fitted with data from observed sets, and data from logbooks must then be used to predict bycatch on unobserved sets. The common variables are set number (the sequence of sets within the trip), mainline length, target species, presence of light stick, number of hooks, date, latitude, longitude, sea-surface temperature, and fishing area.
We included mainline length and number of hooks as potential covariates because they are measures of fishing effort, and we suspected a positive relationship between the amount of effort and the number of sea turtles caught. SEFSC data indicate that bycatch rates vary seasonally and spatially, so we included date as a seasonal covariate and latitude, longitude, and fishing area as spatial covariates. Sea-surface temperature was expected to influence the distribution of sea turtles because they are ectotherms. Research has also shown a relationship between temperature gradients and the aggregation of sea turtles and swordfish (Bigelow et al., 1999; Polovina et al., 2000; Lewison et al., 2004). Set number, target species, and light stick presence were included in the GLMs as covariates describing fishing methods that may differ in their levels of interaction with turtles. Gear configuration and fishing method vary with the target species and location of fishing (Beerkircher et al., 2004). When swordfish are targeted, longlines are set overnight at shallow depths (10-100 m), and a light stick is often attached several meters above the hook on every second or third branchline. In contrast, when tuna are targeted, longlines are set at dawn and hauled in during the late afternoon or evening. This difference matters because sea turtles are attracted to light sticks (Wang et al., 2007).
In the simulation model, variable values were selected from real fishing sets observed by the SEFSC from 2005 to 2007 and assigned to simulated sets. When a simulated set had bycatch, we assigned variable values from an SEFSC-observed set with bycatch. Likewise, when a simulated set did not have bycatch, variable values from an SEFSC-observed set without bycatch were assigned.
Variable assignment was also designed to reflect the spatial distribution of simulated fishing sets. In scenarios with uniformly random sets, if the first simulated set in a stratum did not have bycatch, one SEFSC-observed set without bycatch was selected at random to represent it. An analogous procedure was used if the first simulated set in a stratum had bycatch. In scenarios with clumped sets, the 5 simulated sets in a clump were assigned attributes from SEFSC-observed sets from a single trip. Therefore, SEFSC-observed trips with fewer than 5 sets were eliminated from consideration. The remaining trips were sorted into 6 groups:
- 0 sets with take;
- at least 1 set with take and 4 sets without take;
- at least 2 sets with take and 3 sets without take;
- at least 3 sets with take and 2 sets without take;
- at least 4 sets with take and 1 set without take;
- at least 5 sets with take.

For the first clump in a stratum, an SEFSC-observed trip was selected at random from trips with at least as many sets with take, and at least as many sets without take, as the clump's simulated sets. Sets from that trip were then selected at random to match the simulated numbers of sets with and without take.
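The Python sketch below shows the clump-assignment rule for the first clump in a stratum. It follows our reading of the eligibility rule. A trip qualifies for a clump with n_take simulated takes if it has at least 5 sets, at least n_take sets with take, and at least 5 - n_take sets without take. ObservedSet, eligible_trips, and assign_clump are hypothetical names. Later clumps in a stratum are chosen with distance-based probabilities rather than uniformly at random.

```python
import random
from dataclasses import dataclass


@dataclass
class ObservedSet:
    trip_id: str
    turtles: int  # sea turtles caught on the set (0 = no take)


def eligible_trips(trips, n_take, clump_size=5):
    """Trips with >= clump_size sets, >= n_take sets with take,
    and >= clump_size - n_take sets without take."""
    out = []
    for trip_id, sets in trips.items():
        with_take = sum(1 for s in sets if s.turtles > 0)
        without_take = len(sets) - with_take
        if (len(sets) >= clump_size and with_take >= n_take
                and without_take >= clump_size - n_take):
            out.append(trip_id)
    return out


def assign_clump(trips, simulated_take, rng):
    """Return one observed set per simulated set, matching take / no-take status.

    simulated_take: list of booleans, True if that simulated set caught a turtle.
    """
    n_take = sum(simulated_take)
    candidates = eligible_trips(trips, n_take, len(simulated_take))
    if not candidates:
        raise ValueError("no observed trip can supply this clump")
    sets = trips[rng.choice(candidates)]  # trip chosen uniformly at random
    with_take = rng.sample([s for s in sets if s.turtles > 0], n_take)
    without_take = rng.sample([s for s in sets if s.turtles == 0],
                              len(simulated_take) - n_take)
    pos, neg = iter(with_take), iter(without_take)
    return [next(pos) if took else next(neg) for took in simulated_take]
```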
Variable values for the remaining simulated sets in a stratum were preferentially selected from SEFSC-observed sets that occurred close in time and space to the set chosen for the first simulated set. We calculated the distance from the SEFSC-observed set chosen to represent the first simulated set in a stratum to every other SEFSC-observed set, indexed by s in the following equation. SEFSC-observed sets closer to the first-selected set should have a greater probability of selection, so the reciprocal of the distance was used.
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]
In scenarios with uniformly random sets, the distance values from this equation were used to calculate a probability of selection for each SEFSC-observed set. These probabilities were then used to select SEFSC-observed sets to represent the remaining simulated sets within a stratum. The same set could not be selected more than once within a stratum. In each subsequent stratum, the probabilities of selection were recalculated relative to that stratum's first selected set.
In scenarios with clumped sets, the average distance value for SEFSC-observed sets within a trip was calculated, and probabilities of selection were calculated for each SEFSC-observed trip. We assigned variable values to the second clump of simulated sets in a computational group by tallying the number of simulated sets with take in the clump and using the calculated probabilities to select an SEFSC-observed trip with the corresponding number of sets with take. Once an SEFSC-observed trip was selected, the required numbers of sets with take and sets without take were selected randomly from the trip. This algorithm continued through all set clumps until the next stratum was reached. The same trip could not be selected multiple times in the same stratum, and the probabilities were recalculated for the next stratum.
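The article's distance equation is not reproduced, so the Python sketch below assumes a hypothetical straight-line distance between (latitude, longitude) points, computed with math.dist. A small epsilon avoids division by zero. The sketch covers 3 steps:
- reciprocal-distance probabilities for individual sets (uniform scenarios);
- averaged reciprocal distances per trip (clumped scenarios);
- a draw without replacement that renormalizes the remaining weights after each pick.

The renormalizing draw is our assumption about how repeat selections were prevented.

```python
import math
import random


def reciprocal_distance(a, b, eps=1e-9):
    """Hypothetical selection weight: 1 / straight-line distance between two (lat, lon) points."""
    return 1.0 / (math.dist(a, b) + eps)


def set_probabilities(first, candidates):
    """Selection probability of each candidate set, proportional to its reciprocal distance."""
    weights = [reciprocal_distance(first, c) for c in candidates]
    total = sum(weights)
    return [w / total for w in weights]


def trip_probabilities(first, trips):
    """Clumped scenarios: average the reciprocal distances of each trip's sets, then normalize."""
    avg = {trip: sum(reciprocal_distance(first, s) for s in sets) / len(sets)
           for trip, sets in trips.items()}
    total = sum(avg.values())
    return {trip: a / total for trip, a in avg.items()}


def draw_without_replacement(items, weights, k, rng):
    """Weighted draws; a chosen item is removed so it cannot be selected again."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen
```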
To select the best-fitting GLM, we first fit a full model that included all potential explanatory variables and then performed a stepwise selection procedure based on Akaike information criterion (AIC) values. The resulting model was used to predict the number of sea turtles caught on unobserved sets. The GLMs were fitted in R software, vers. 2.14.1 (R Development Core Team, 2011), with the glm2 package and the glm.nb function of the MASS package. The glm2 package was used because some models that fail to converge with the standard glm function may be more stable with glm2. The glm.nb function extends glm by estimating the additional dispersion parameter required for a negative binomial GLM. We built the simulation model in Microsoft Visual Basic 2010 Express (Microsoft Corp., Redmond, Washington).
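The Python sketch below implements the stepwise AIC search as a backward elimination from the full model. The article does not state the search direction or the exact R call, so this form is an assumption. The aic_of argument is a hypothetical callable that would fit a GLM-P or GLM-NB with the given covariates and return AIC = 2p - 2 log L, where p is the number of parameters and L is the maximized likelihood.

```python
def backward_stepwise_aic(covariates, aic_of):
    """Drop one covariate at a time while doing so lowers AIC.

    aic_of: hypothetical callable that fits the GLM with a given frozenset of
    covariates and returns its AIC.
    """
    current = frozenset(covariates)
    best_aic = aic_of(current)
    while current:
        trials = [(aic_of(current - {v}), v) for v in current]
        trial_aic, drop = min(trials, key=lambda pair: pair[0])
        if trial_aic >= best_aic:
            break  # no single deletion improves AIC
        current, best_aic = current - {drop}, trial_aic
    return current, best_aic
```

Consider a toy AIC function that charges 2 per covariate and adds a large penalty for omitting number of hooks or sea-surface temperature. Under that function, the search drops every other covariate and keeps only those 2.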
Delta-lognormal method
In contrast to the GLMs, the delta-lognormal estimation method required only one piece of information besides observed bycatch: the number of hooks per set. The number of hooks per set was simulated with the same procedure we used to assign explanatory variable values to sets for the GLMs. To estimate bycatch on unobserved sets, we multiplied the mean observed bycatch rate by the total simulated number of hooks.
Estimating bycatch at 2 spatiotemporal scales
The SEFSC estimates bycatch in each quarter-area stratum and sums the estimates across strata to obtain a total annual estimate. In 1999, the SEFSC investigated how pooling data across strata before estimation affects the bycatch estimate (Yeung, 1999). Bycatch point estimates were relatively insensitive to pooling, but estimate precision improved considerably. Currently, the SEFSC pools data only when a stratum has no observed sets, in which case the mean bycatch rate of that stratum from previous years is used. Pooling data obscures variation among strata, but it increases the sample size on which bycatch estimates are based. Pooling therefore addresses the problems of little or no observer coverage and wide confidence intervals. To evaluate the efficacy of pooling, we pooled simulated data across all quarter-area strata before estimating total annual bycatch. We compared this procedure with estimating bycatch in each stratum and summing the estimates across strata to obtain a total annual estimate.
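The 2 spatiotemporal scales differ only in where the rate is estimated. The Python sketch below multiplies each rate by the hooks it applies to. It uses hypothetical stratum labels, and a simple mean of per-set turtles per hook stands in for the delta-lognormal rate.

```python
def mean_rate(observed):
    """Placeholder rate estimator: mean turtles per hook across observed sets.

    observed: list of (turtles, hooks) pairs. The article uses a delta-lognormal
    mean at this step; a simple mean keeps the sketch short.
    """
    return sum(t / h for t, h in observed) / len(observed)


def stratum_level_total(observed_by_stratum, hooks_by_stratum, rate_fn=mean_rate):
    """Estimate bycatch stratum by stratum (rate x hooks), then sum across strata."""
    total = 0.0
    for stratum, hooks in hooks_by_stratum.items():
        observed = observed_by_stratum.get(stratum)
        if not observed:
            # The SEFSC substitutes a previous-year rate here; not modeled in this sketch.
            raise ValueError(f"no observed sets in stratum {stratum!r}")
        total += rate_fn(observed) * hooks
    return total


def pooled_total(observed_by_stratum, hooks_by_stratum, rate_fn=mean_rate):
    """Pool observed sets from all strata, estimate one rate, apply it to all hooks."""
    pooled = [s for sets in observed_by_stratum.values() for s in sets]
    return rate_fn(pooled) * sum(hooks_by_stratum.values())
```

In a hypothetical example, stratum Q1-A has 2 observed sets of 1000 hooks with one take and 10,000 total hooks. Stratum Q2-A has 2 observed sets of 500 hooks with no take and 20,000 total hooks. The stratum-level total is 5.0 turtles. Pooling spreads the Q1-A take across both strata and gives 7.5, which illustrates how pooling obscures variation among strata.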
Evaluating estimation method performance
An estimation method performs well if its point estimates are unbiased and precise. The performance of each estimation method was evaluated under each spatial scenario. Combining the potential sea turtle distributions, set distributions, estimation methods, and spatiotemporal scales of estimation produced 30 potential models (Fig. 3). These were 5 spatial scenarios, each with 6 estimates: the delta-lognormal method, GLM-P, and GLM-NB, each applied at the stratum level and to all sets pooled. Each of the 30 potential models was simulated 1000 times.
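The count of 30 follows from crossing the factors. The labels below are taken from the text, and the Python list is only an illustration of the design.

```python
from itertools import product

SCENARIOS = [
    "fully uniform", "sets-only clumping", "turtles-only clumping",
    "independent clumping", "co-occurrence clumping",
]
METHODS = ["delta-lognormal", "GLM-P", "GLM-NB"]
SCALES = ["stratum level", "all sets pooled"]
N_SIMULATIONS = 1000  # simulations run for each combination

# 5 scenarios x 3 methods x 2 scales = 30 scenario-estimate combinations
DESIGN = list(product(SCENARIOS, METHODS, SCALES))
```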
We assessed the accuracy of an estimation method in 3 steps. We estimated bycatch in 1000 simulations of each spatial scenario, calculated the relative error for each simulation, and identified the median relative error for the method in that scenario. The relative error is the estimate minus the simulated total bycatch, divided by the simulated total. If an estimation method is unbiased, the median relative error should be close to zero. The precision of an estimation method can be measured by the interquartile range (IQR) of its relative errors, and a small IQR indicates a precise estimation method.
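The Python sketch below computes these 2 summaries. It assumes relative error is defined as (estimate - true total)/true total, so negative values indicate underestimation. It uses statistics.quantiles with its default ("exclusive") method. Other quartile definitions, such as those used by boxplot software, can give slightly different IQRs.

```python
import statistics


def relative_errors(estimates, truths):
    """(estimate - true total) / true total for each simulation."""
    return [(e - t) / t for e, t in zip(estimates, truths)]


def accuracy_and_precision(estimates, truths):
    """Median relative error (bias) and its interquartile range (precision)."""
    errors = relative_errors(estimates, truths)
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return statistics.median(errors), q3 - q1
```

For estimates of 8, 9, 11, and 12 turtles against a true total of 10 in each simulation, the median relative error is 0 and the IQR is 0.35.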
In addition to evaluating estimation methods using point estimates, we also examined confidence intervals (CIs). After we determined whether the delta-lognormal method or GLMs generated better point estimates, we eliminated the less suitable method from further consideration. Then we analyzed the effects of data pooling on the CIs of the more suitable estimation method. We calculated 95% CIs for each of the 1000 simulations under every spatial scenario and data pooling method. We examined the number of times the simulated total bycatch fell outside the 95% CI for that simulation. We also considered the median CI width for pooling methods under each spatial scenario.
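The CI checks reduce to 2 counts and a median. With 1000 simulations and nominal 95% intervals, about 0.025 x 1000 = 25 true totals are expected to fall below the lower limit and 25 above the upper limit. A true total above the upper limit means the interval underestimated bycatch. The Python sketch below uses hypothetical names.

```python
import statistics


def ci_summary(intervals, truths, point_estimates):
    """Summarize 95% CIs across simulations.

    intervals: list of (lower, upper) pairs, one per simulation.
    Returns (# true totals below the lower limit, # above the upper limit,
    median CI width as a fraction of the point estimate).
    """
    below = sum(1 for (lo, _), t in zip(intervals, truths) if t < lo)
    above = sum(1 for (_, hi), t in zip(intervals, truths) if t > hi)
    rel_widths = [(hi - lo) / est for (lo, hi), est in zip(intervals, point_estimates)]
    return below, above, statistics.median(rel_widths)
```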
Essentially, we generated 1000 estimates, as the SEFSC would, for each estimation method in each spatial scenario. We made point estimates with each estimation method for each of the 1000 simulations under every spatial scenario. We calculated 95% CIs for the more suitable estimation method in each of the 1000 simulations under every spatial scenario. Because we knew the total amount of bycatch in each of the 1000 simulations, we were able to compare the bycatch estimates to the total amount of bycatch simulated and thus evaluate the performance of the estimation method.
Method suitability based on point estimates
We found the delta-lognormal method with stratum-level estimation to be the most accurate of the methods evaluated (Fig. 4). In 2 scenarios, stratum-level estimates were slightly more accurate than pooled estimates: the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles) and the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets). In the remaining 3 spatial scenarios, accuracy did not differ substantially between stratum-level estimates and estimates from all sets pooled. In all 5 spatial scenarios, precision also did not differ substantially between delta-lognormal estimates at the stratum level and those from all sets pooled.
The GLMs never outperformed the delta-lognormal methods. In the fully uniform scenario (Turtles_uniform, Sets_uniform), the GLMs were about as accurate as the delta-lognormal methods, and GLM-P and GLM-NB performance did not differ substantially (Fig. 4). However, the GLMs produced more outliers than the delta-lognormal methods in this scenario. In the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles) and the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets), GLM estimates were biased lower than delta-lognormal estimates. The GLM-P was less biased and more precise than the GLM-NB in the co-occurrence clumping scenario.
The delta-lognormal method with stratum-level estimation and the delta-lognormal method for all sets pooled performed equally well in 2 scenarios: the independent clumping scenario (Turtles_clump, Sets_clump-sets) and the turtles-only clumping scenario (Turtles_clump, Sets_uniform). However, in these spatial scenarios, the simulated bycatch rates on sets with observers were much lower than the rates reported to the SEFSC by observers. The mean bycatch rate from SEFSC observer data was 0.062 turtles/set (minimum 0.031 turtles/set, maximum 0.081 turtles/set). The mean bycatch rates from simulated observers were much lower:
- 0.006 turtles/set in the independent clumping scenario (Turtles_clump, Sets_clump-sets);
- 0.004 turtles/set in the turtles-only clumping scenario (Turtles_clump, Sets_uniform).

By comparison, the mean bycatch rates from simulated observers in the other 3 scenarios were:
- 0.122 turtles/set in the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles);
- 0.098 turtles/set in the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets);
- 0.095 turtles/set in the fully uniform scenario (Turtles_uniform, Sets_uniform).

The independent clumping and turtles-only clumping scenarios also had more outliers. Their IQRs and whiskers (data within 1.5 times the IQR) were larger than those of the other 3 scenarios.
Convergence problems in GLMs
The GLM-P and GLM-NB did not converge for stratum-level estimation in any spatial scenario. For example, consider the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles), the spatial scenario with the greatest mean observed bycatch rate. There, the median number of strata with observed take was 19 out of 32, and in strata with observed take, the median number of sets with take was 2. The stratum-level GLMs could not converge with such small sample sizes.
Therefore, the GLM-P and GLM-NB methods were considered for estimation only with all sets pooled. For the same reason (too few observed sets with take), the pooled GLM-P and GLM-NB did not converge in 2 scenarios: the independent clumping scenario (Turtles_clump, Sets_clump-sets) and the turtles-only clumping scenario (Turtles_clump, Sets_uniform). Across all observed sets pooled, the average number of observed sets with take was 2.64 in the independent clumping scenario and 1.86 in the turtles-only clumping scenario. GLM results are therefore not presented for these 2 spatial scenarios.
In addition to generating an accurate point estimate, a bycatch estimation method should be able to produce a suitable measure of uncertainty, such as a CI. For every spatial scenario, the median 95% CI calculated with the delta-lognormal method was narrower with estimation from all sets pooled than with estimation by stratum (Table 3). The 2 spatial scenarios thought to be most realistic were the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles) and the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets). In these scenarios, the median widths of the CIs based on all sets pooled were ~54% and ~59% of the point estimates, respectively. The median widths of the CIs from stratum estimates were ~84% and ~93% of the point estimates, respectively (Table 3). Although the median CIs from all sets pooled were narrower, the total simulated bycatch fell outside the CI more often with all sets pooled than at the stratum level (Table 4). With 95% CIs from each of 1000 simulations, the total simulated bycatch was expected to fall below the CI in 25 simulations and above the CI in 25 simulations. For the more realistic spatial scenarios, far fewer than 25 true totals fell above the stratum-level CIs, and far fewer than 25 fell below them, so the stratum-level CIs were too conservative (Table 4). In contrast, the CIs from all sets pooled performed well in the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets). In the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles), however, they fell entirely below the true amount of bycatch more often than expected (Table 4).
Performance of the estimation methods
The delta-lognormal method with stratum-level estimates was the most suitable method in the 2 most realistic spatial scenarios: the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles) and the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets). This result arose because observed sets were representative of unobserved sets, sample sizes of observed bycatch were sufficient for estimating bycatch within strata, and model assumptions were not violated.
Observed fishing sets were representative in the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles) because all sets fished where sea turtles were present. Likewise, observed sets were representative in the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets) because, with turtles distributed uniformly at random, each set had the same probability of encountering a turtle. Further, these 2 spatial scenarios had enough observed bycatch within strata to make stratum-level estimates, so strata did not have to be pooled to achieve larger sample sizes. Therefore, differences between strata could be captured and potential biases associated with pooling were avoided.
On the other hand, the GLMs could be used only to estimate bycatch for all sets pooled because of convergence problems related to the small amount of observed bycatch in strata. Moreover, the relationship between environmental and fishing conditions and the amount of bycatch was probably not well established in these models because bycatch was rare and observer coverage was low. The use of poorly fitted models could explain why the GLM estimates had lower precision than the delta-lognormal estimates.
The GLMs were as accurate as the delta-lognormal methods in the fully uniform scenario (Turtles_uniform, Sets_uniform) because it was the only spatial scenario that did not violate the GLM-P assumption that counts are independent and randomly distributed in space (McCracken, 2004; Sileshi, 2006). Violations of GLM-P assumptions introduced biases in the other spatial scenarios. Additionally, the GLM-NB likely did not outperform the GLM-P because overdispersion was not a problem in the simulated data (White and Bennetts, 1996; Sileshi, 2006).
Little bycatch was observed in the 2 scenarios where sea turtles were clumped but sets did not mimic their clumping pattern. These were the independent clumping scenario (Turtles_clump, Sets_clump-sets) and the turtles-only clumping scenario (Turtles_clump, Sets_uniform). In both, some sets were not expected to encounter any turtles, whereas other sets were expected to encounter many, but the overall frequency of encountering turtles was low. The lowest mean observed bycatch rate, 0.004 turtles/set, occurred in the turtles-only clumping scenario. This low rate is likely related to the delta-lognormal method also having its greatest bias in this scenario. For stratum-level bycatch, the delta-lognormal method had the following median relative errors:
- -0.17 in the turtles-only clumping scenario;
- only -0.05 in the co-occurrence clumping scenario (Turtles_clump, Sets_clump-turtles);
- only -0.02 in the sets-only clumping scenario (Turtles_uniform, Sets_clump-sets).

The last 2 are the most realistic spatial scenarios.
CIs were narrower for estimates from all sets pooled than for stratum-level estimates because the variance in bycatch rates was larger when calculated for strata than when calculated for all sets pooled. Expressing CI width as a percentage of the bycatch point estimate highlighted how wide, and therefore uninformative, the standard stratum-based CI was. Calculating the CI from all sets pooled narrowed it, but at the cost of more underestimation than desired. For protected species conservation, underestimation is more problematic than overestimation. A lower bound estimate is important for protected resource conservation because protected species have an incidental take limit that, if exceeded, triggers formal consultation under section 7 of the ESA.
Simulation model assumptions and their implications
Because a simulation model never captures reality perfectly, it is important to consider the effects of model assumptions on results. We attempted to make reasonable assumptions both when incorporating well-understood aspects of interactions between sea turtles and the U.S. Atlantic pelagic longline fishery and when modeling unknown features. However, each component of the simulation model could be designed in many ways. We consider the most influential assumptions to be 1) spatial constraints, 2) the algorithm for selecting explanatory variable values for the GLM, and 3) the simplified effort-based distribution of observers.
First, the density of sea turtles, the spatial configuration of sea turtles, the spatial characteristics of fishing, and their interactions had to be defined explicitly in our model. We made assumptions regarding the number of turtles, size of clumps, number of clumps, how turtles or fishing sets should be placed in clumps, how clumps could overlap, and the extent of the study area. Clump placement, the number of turtles per cell, the initial cell of a set, and the direction of fishing had stochastic elements. Even so, model results could be influenced by constraints on the dimensions and spatial distribution of turtles and sets. We also acknowledge that our 5 spatial scenarios did not fully replicate reality. We attempted to represent a range of possible distributions, both to model longline interactions with sea turtles and to highlight properties of the estimation methods that could be relevant to other systems. A possible next step would be to combine multiple spatial scenarios in one model of fishery interactions, so that variation in spatial distributions is captured more realistically.
Second, the GLM is based on the premise that environmental or fishing conditions can be used to predict the number of sea turtles caught. Therefore, the manner in which explanatory variable values were assigned to fishing sets in the simulation could have affected GLM performance. We selected variable values from sets observed by the SEFSC from 2005 to 2007 while attempting to account for the spatial distribution and stratum characteristics of the sets. However, variable values could be assigned in many ways, and different procedures could influence how well the GLMs estimated bycatch. Violation of GLM assumptions could nevertheless remain a problem even if a more realistic algorithm for selecting explanatory variable values were identified. Some degree of set and turtle clumping seems to occur in nature, and counts are dependent at least within a trip. Violations of GLM-P assumptions are therefore likely even with an improved algorithm. GLM-NB performance might improve under a more suitable algorithm for selecting explanatory variable values. However, the GLM-NB is typically used to address overdispersion (Welsh et al., 1996; Thurston et al., 2000; Lindsey, 2004; Venables and Dichmont, 2004), and little overdispersion was detected in our simulation model.
Overall, we do not expect the performance of the GLM-P relative to the delta-lognormal method to change. The relative performance of the stratum-level delta-lognormal method and the delta-lognormal method with all sets pooled is also likely robust. However, the GLM-NB could perform better relative to the other estimation methods if a clearer functional relationship between the explanatory variables and the level of bycatch were captured.
Third, we modeled a simplified effort-based distribution of observers to simulate observer data for estimating bycatch. If there are different patterns in SEFSC observer data and simulated observer data, the performance of the estimation method in the simulation may not accurately reflect the performance of the estimation method in the actual fishery. The SEFSC currently selects vessels for each calendar quarter and fishing area based on how many sets a vessel fished in that stratum in the previous year (Beerkircher et al., 2004; Fairfield and Garrison, 2008). Vessels that fished more sets in the previous year have a greater chance of being observed by the SEFSC in the current year, and a vessel may be observed up to 4 times a year (Beerkircher et al., 2004). Our simulation model, however, did not cover multiple years, and therefore the quarter-area effort data from the previous year were not available for the distribution of observers. Instead, we selected a cell at random to serve as an area of high effort and placed observers on the 2 sets (of the 25 simulated sets in a computational group) that were closest to that cell. The patterns that we were able to simulate, and that we believe are most relevant, are 8% observer coverage in each stratum and an observer distribution that is independent of the presence of sea turtles.
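The Python sketch below shows the observer-placement rule. It assumes that each simulated set is represented by a single (row, column) grid cell and that distance is measured between cell coordinates. Both are assumptions, and ties are broken by set order. Observing 2 of the 25 sets in a computational group gives 2/25 = 8% coverage.

```python
import math
import random


def place_observers(set_cells, grid_size, n_observed=2, rng=None, hot_cell=None):
    """Pick a random high-effort cell and observe the sets nearest to it.

    set_cells: (row, col) cell for each simulated set in a computational group.
    Returns the high-effort cell and the sorted indices of observed sets.
    """
    rng = rng or random.Random()
    if hot_cell is None:
        hot_cell = (rng.randrange(grid_size), rng.randrange(grid_size))
    # sorted() is stable, so equally distant sets keep their original order
    nearest = sorted(range(len(set_cells)),
                     key=lambda i: math.dist(set_cells[i], hot_cell))
    return hot_cell, sorted(nearest[:n_observed])
```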
Recommendations for management
Bycatch in commercial fisheries is believed to be the main anthropogenic threat to sea turtles, and the pelagic longline fishery is considered one of the 3 fisheries most affecting sea turtles (Witherington et al., 2009). Therefore, improving bycatch estimates is important for sea turtle conservation and effective fishery management. Results from this study indicate that estimating bycatch with the stratum-level delta-lognormal method is appropriate and support the current procedure used by the SEFSC.
General application to zero-heavy data analysis
Not accounting for excess zeros and using models with inappropriate assumptions can result in biased estimates and incorrect conclusions (Martin et al., 2005), as was seen in the performance of the GLMs in our simulation. This study further supports the notion that no one model is clearly most appropriate for analyzing zero-heavy data (Sileshi, 2006). Rather, models must be compared to select a model that is most suitable for the data and the required output (Sileshi, 2006). We cannot recommend one method for addressing all zero-heavy data, but our study shows the importance of recognizing variance across time and space, demonstrates the necessity of representative samples and sample size, and indicates that the delta-lognormal method generates estimates that are less biased and more precise than the GLMs in the case of sea turtle bycatch by the U.S. Atlantic pelagic longline fishery. Many other fields with zero-heavy data also would benefit from an increased understanding of the delta-lognormal method and GLM.
The views and opinions expressed or implied in this article are those of the author (or authors) and do not necessarily reflect the position of the National Marine Fisheries Service, NOAA.
We wish to thank M. Kelly, P. Richards, and E. Smith for helpful suggestions regarding this project. We also would like to thank C. Beasley, J. Hart, J. Hepinstall-Cymerman, T. Prebyl, and C. Ricketts for their comments on this manuscript. Funding was provided by the National Marine Fisheries Service Southeast Fisheries Science Center.
Manuscript submitted 22 September 2011.
Manuscript accepted 29 May 2012.
Beerkircher, L. R., C. J. Brown, D. L. Abercrombie, and D. W. Lee.
2004. SEFSC Pelagic Observer Program data summary for 1992-2002. NOAA Tech. Memo. NMFS-SEFSC-522, 25 p.
Bigelow, K. A., C. H. Boggs, and X. He.
1999. Environmental effects on swordfish and blue shark catch rates in the U.S. North Pacific longline fishery. Fish. Oceanogr. 8:178-198.
Byles, R. A.
1988. The behavior and ecology of sea turtles in Virginia. Ph.D. diss., 112 p. Virginia Inst. Mar. Sci., College of William and Mary, Gloucester Point, VA.
Crowder, L. B., and S. A. Murawski.
1998. Fisheries bycatch: implications for management. Fisheries 23(6):8-17.
Epperly, S. P., J. Braun, and A. J. Chester.
1995. Aerial surveys for sea turtles in North Carolina inshore waters. Fish. Bull. 93:254-261.
Fahrmeir, L., and L. O. Echavarria.
2006. Structured additive regression for overdispersed and zero-inflated count data. Appl. Stoch. Model Bus. Ind. 22:351-369.
Fairfield, C. P., and L. P. Garrison.
2008. Estimated bycatch of marine mammals and sea turtles in the U.S. Atlantic pelagic longline fleet during 2007. NOAA Tech. Memo. NMFS-SEFSC-572, 62 p.
Fairfield-Walsh, C., and L. Garrison.
2007. Estimated bycatch of marine mammals and turtles in the U.S. Atlantic pelagic longline fleet during 2006. NOAA Tech. Memo. NMFS-SEFSC-560, 54 p.
Gardner, B., P. J. Sullivan, S. J. Morreale, and S. P. Epperly.
2008. Spatial and temporal statistical analysis of bycatch data: patterns of sea turtle bycatch in the North Atlantic. Can. J. Fish. Aquat. Sci. 65:2461-2470.
Garrison, L. P.
2003. Estimated bycatch of marine mammals and turtles in the U.S. Atlantic pelagic longline fleet during 2001-2002. NOAA Tech. Memo. NMFS-SEFSC-515, 52 p.
Gilman, E., E. Zollett, S. Beverly, H. Nakano, K. Davis, D. Shiode, P. Dalzell, and I. Kinan.
2006. Reducing sea turtle by-catch in pelagic longline fisheries. Fish Fish. 7:2-23.
Godley, B. J., J. M. Blumenthal, A. C. Broderick, M. S. Coyne, M. H. Godfrey, L. A. Hawkes, and M. J. Witt.
2007. Satellite tracking of sea turtles: Where have we been and where do we go next? Endang. Species Res. 4:3-22.
Goodman, M. A., J. B. McNeill, E. Davenport, and A. A. Hohn.
2007. Protected species aerial survey data collection and analysis in waters underlying the R-5306A Airspace: final report submitted to U.S. Marine Corps, MCAS Cherry Point. NOAA Tech. Memo. NMFS-SEFSC-551, 25 p.
Kobayashi, D. R., and J. J. Polovina.
2005. Evaluation of time-area closures to reduce incidental sea turtle take in the Hawaii-based longline fishery: generalized additive model (GAM) development and retrospective examination. NOAA Tech. Memo. NMFS-PIFSC-4, 46 p.
Lewison, R. L., S. A. Freeman, and L. B. Crowder.
2004. Quantifying the effects of fisheries on threatened species: the impact of pelagic longlines on loggerhead and leatherback sea turtles. Ecol. Lett. 7:221-231.
Lindsey, J. K.
1997. Applying generalized linear models, 282 p. Springer, New York.
2004. Introduction to applied statistics: a modelling approach, 2nd ed., 336 p. Oxford Univ. Press, New York.
Mansfield, K. L.
2006. Sources of mortality, movements and behavior of sea turtles in Virginia. Ph.D. diss., 367 p. Virginia Inst. Mar. Sci., College of William and Mary, Gloucester Point, VA.
Martin, T. G., B. A. Wintle, J. R. Rhodes, P. M. Kuhnert, S. A. Field, S. J. Low-Choy, A. J. Tyre, and H. P. Possingham.
2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecol. Lett. 8:1235-1246.
McClellan, D. B.
1996. Aerial surveys for sea turtles, marine mammals, and vessel activity along the southeast Florida coast, 1992-1996. NOAA Tech. Memo. NMFS-SEFSC-390, 42 p.
McCracken, M. L.
2004. Modeling a very rare event to estimate sea turtle bycatch: lessons learned. NOAA Tech. Memo. NMFS-PIFSC-3, 30 p.
McCullagh, P., and J. A. Nelder.
1989. Generalized linear models, 2nd ed., 532 p. Chapman and Hall, New York.
McDaniel, C. J., L. B. Crowder, and J. A. Priddy.
2000. Spatial dynamics of sea turtle abundance and shrimping intensity in the U.S. Gulf of Mexico. Conserv. Ecol. 4(1):Article 15.
NMFS (National Marine Fisheries Service).
2001. Stock assessments of loggerhead and leatherback sea turtles and an assessment of the impact of the pelagic longline fishery on the loggerhead and leatherback sea turtles of the Western North Atlantic. NOAA Tech. Memo. NMFS-SEFSC-455, 343 p.
2006. Final consolidated Atlantic Highly Migratory Species Fishery Management Plan. Public Document, 1600 p. NOAA Office of Sustainable Fisheries, Highly Migratory Species Manage. Div., Silver Spring, MD.
Nelson, D. A.
1996. Subadult loggerhead sea turtle (Caretta caretta) behavior in St. Mary's entrance channel, Georgia, USA. Ph.D. diss., 132 p. Virginia Inst. Mar. Sci., College of William and Mary, Gloucester Point, VA.
Ortiz, M., C. M. Legault, and N. M. Ehrhardt.
2000. An alternative method for estimating bycatch from the U.S. shrimp trawl fishery in the Gulf of Mexico, 1972-1995. Fish. Bull. 98:583-599.
Pennington, M.
1983. Efficient estimators of abundance, for fish and plankton surveys. Biometrics 39:281-286.
Polovina, J. J., D. R. Kobayashi, D. M. Parker, M. P. Seki, and G. H. Balazs.
2000. Turtles on the edge: movement of loggerhead turtles (Caretta caretta) along oceanic fronts, spanning longline fishing grounds in the central North Pacific, 1997-1998. Fish. Oceanogr. 9:71-82.
Potts, J. M., and J. Elith.
2006. Comparing species abundance models. Ecol. Model. 199:153-163.
R Development Core Team.
2011. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rideout, M., J. Hinde, and C. G. B. Demetrio.
2001. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics 57:219-223.
Sileshi, G.
2006. Selecting the right statistical model for analysis of insect count data by using information theoretic measures. Bull. Entomol. Res. 96:479-488.
Skillman, R. A., and P. Kleiber.
1998. Estimation of sea turtle take and mortality in the Hawai'i-based longline fishery, 1994-96. NOAA Tech. Memo. NMFS-SWFSC-257, 56 p.
Thurston, S. W., M. P. Wand, and J. K. Wiencke.
2000. Negative binomial additive models. Biometrics 56:139-144.
Venables, W. N., and C. M. Dichmont.
2004. GLMs, GAMs, and GLMMs: an overview of theory for applications in fisheries research. Fish. Res. 70:319-337.
Walsh, C. F., and L. P. Garrison.
2006. Estimated bycatch of marine mammals and turtles in the U.S. Atlantic pelagic longline fleet during 2005. NOAA Tech. Memo. NMFS-SEFSC-539, 52 p.
Walters, C.
2003. Folly and fantasy in the analysis of spatial catch rate data. Can. J. Fish. Aquat. Sci. 60:1433-1436.
Wang, J. H., L. C. Boles, B. Higgins, and K. J. Lohmann.
2007. Behavioral responses of sea turtles to lightsticks used in longline fisheries. Anim. Conserv. 10:176-182.
Watson, J. W., S. P. Epperly, A. K. Shah, and D. G. Foster.
2005. Fishing methods to reduce sea turtle mortality associated with pelagic longlines. Can. J. Fish. Aquat. Sci. 62:965-981.
Welsh, A. H., R. B. Cunningham, C. F. Donnelly, and D. B. Lindenmayer.
1996. Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecol. Model. 88:297-308.
White, G. C., and R. E. Bennetts.
1996. Analysis of frequency count data using the negative binomial distribution. Ecology 77:2549-2557.
Williams, P., P. J. Anninos, P. T. Plotkin, and K. L. Salvini.
1996. Pelagic longline fishery-sea turtle interactions: proceedings of an industry, academic and government experts, and stakeholders workshop held in Silver Springs, Maryland, 24-25 May 1994. NOAA Tech. Memo. NMFS-OPR-7, 77 p.
Witherington, B., P. Kubilis, B. Brost, and A. Meylan.
2009. Decreasing annual nest counts in a globally important loggerhead sea turtle population. Ecol. Appl. 19(1):30-54.
Witzell, W. N.
1999. Distribution and relative abundance of sea turtles caught incidentally by the U.S. pelagic longline fleet in the western North Atlantic Ocean, 1992-1995. Fish. Bull. 97:200-211.
Witzell, W. N., and J. Cramer.
1995. Estimates of sea turtle by-catch by the U.S. pelagic longline fleet in the western north Atlantic Ocean. NOAA Tech. Memo. NMFS-SEFSC-359, 14 p.
Yeung, C.
1999. Estimates of marine mammal and marine turtle bycatch by the U.S. Atlantic pelagic longline fleet in 1998. NOAA Tech. Memo. NMFS-SEFSC-430, 26 p.
2001. Estimates of marine mammal and marine turtle bycatch by the U.S. Atlantic pelagic longline fleet in 1999-2000. NOAA Tech. Memo. NMFS-SEFSC-467, 43 p.
(1) NMFS (National Marine Fisheries Service). 2009. Annual commercial landings by gear type, http://www.st.nmfs.noaa.gov/st1/commercial/landings/gear_landings.html, accessed 12 May.
(2) Garrison, L. P. 2009. Personal commun. National Marine Fisheries Service Southeast Fisheries Science Center, Miami, FL.
Paige F. Barlow (contact author) 
Jim Berkson 
Email address for contact author: firstname.lastname@example.org
Department of Fish and Wildlife Conservation, Virginia Polytechnic Institute and State University, 100 Cheatham Hall, Blacksburg, Virginia 24061
Present address: Warnell School of Forestry and Natural Resources, University of Georgia, 180 E Green Street, Athens, Georgia 30602
National Marine Fisheries Service, Southeast Fisheries Science Center, NMFS-RTR Program at Virginia Tech, 100 Cheatham Hall, Blacksburg, Virginia 24061
Table 1
Data from the National Marine Fisheries Service (NMFS) Southeast Fisheries Science Center (SEFSC) and the scientific literature that were used in the simulation model built to represent interactions of sea turtles with the U.S. Atlantic pelagic longline fishery.

| Model feature | Values based on existing data | Source |
|---|---|---|
| Mean number of annual fishing sets | 8000 | SEFSC |
| Mean mainline length of fishing sets | 50 km | SEFSC |
| Attributes of fishing sets | Set number within a trip, mainline length, target species, presence of light stick, number of hooks, sea-surface temperature, fishing area, date, latitude, longitude | SEFSC |
| Spatial scenarios of fishing sets | Clumping apparent in location records | SEFSC |
| Observer coverage | 8% | SEFSC |
| Spatial and temporal variation in fishing effort and bycatch | Variation across 4 calendar quarters and 10 fishing areas in the Atlantic Ocean, Caribbean Sea, and Gulf of Mexico | SEFSC |
| Probability of sea turtle capture | Variation across calendar quarters and fishing areas | SEFSC |
| Density of sea turtles | 0.5 turtles/km^2 | Byles (1988); Nelson (1996); Mansfield (2006); Goodman et al. (2007) |
| Spatial scenarios of sea turtles | Clumping related to currents, frontal regions, bathymetric features, and prey | Williams et al. (1996); Bigelow et al. (1999); Witzell (1999); Polovina et al. (2000); Lewison et al. (2004); Gilman et al. (2006); Gardner et al. (2008) |
| Clumping area | 90 × 90 km | Gardner et al. (2008) |

Table 2
The number of computational groups of 25 fishing sets and bycatch probabilities per stratum that were included in the simulation model built to represent interactions of sea turtles with the U.S. Atlantic pelagic longline fishery. Bycatch probabilities differed between scenarios with clumping turtles and scenarios with turtles placed with a uniform probability because of the different turtle densities. Clumping means that turtles tend to concentrate in certain areas rather than occur equally spaced or spaced with uniform probability. The SEFSC estimates bycatch of sea turtles for each of 4 calendar quarters (Q1 through Q4) and for each of 10 geographic regions or fishing areas. The SEFSC uses the following names for these fishing areas: CAR=Caribbean, FEC=Florida East Coast, GOM=Gulf of Mexico, MAB=Mid-Atlantic Bight, NCA=North Central Atlantic, NEC=Northeast Coastal, NED=Northeast Distant, SAB=South Atlantic Bight, SAR=Sargasso Sea, and TUN=Tuna North. We simulated 32 quarter-area strata because 8 strata were without fishing or observer coverage from 2005 to 2007.

| Simulated quarter-area stratum | SEFSC quarter-area stratum basis | Number of computational groups simulated | Bycatch probability: turtles uniformly random | Bycatch probability: turtles clumping |
|---|---|---|---|---|
| 1 | Q1-CAR | 3 | 6.17 × 10^-3 | 2.50 × 10^-4 |
| 2 | Q1-FEC | 8 | 1.15 × 10^-2 | 4.65 × 10^-4 |
| 3 | Q1-GOM | 39 | 2.36 × 10^-3 | 9.55 × 10^-5 |
| 4 | Q1-MAB | 6 | 5.20 × 10^-3 | 2.11 × 10^-4 |
| 5 | Q1-NCA | 1 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 6 | Q1-SAB | 5 | 8.98 × 10^-3 | 3.64 × 10^-4 |
| 7 | Q1-SAR | 4 | 3.09 × 10^-3 | 1.25 × 10^-4 |
| 8 | Q1-TUN | 1 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 9 | Q2-CAR | 1 | 6.17 × 10^-3 | 2.50 × 10^-4 |
| 10 | Q2-FEC | 7 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 11 | Q2-GOM | 40 | 5.57 × 10^-3 | 2.25 × 10^-4 |
| 12 | Q2-MAB | 10 | 5.20 × 10^-3 | 2.11 × 10^-4 |
| 13 | Q2-NCA | 1 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 14 | Q2-NEC | 2 | 1.32 × 10^-2 | 5.33 × 10^-4 |
| 15 | Q2-NED | 1 | 2.59 × 10^-2 | 1.05 × 10^-3 |
| 16 | Q2-SAB | 19 | 8.98 × 10^-3 | 3.64 × 10^-4 |
| 17 | Q2-TUN | 2 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 18 | Q3-FEC | 6 | 7.11 × 10^-3 | 2.88 × 10^-4 |
| 19 | Q3-GOM | 38 | 1.07 × 10^-3 | 4.35 × 10^-5 |
| 20 | Q3-MAB | 24 | 3.14 × 10^-3 | 1.27 × 10^-4 |
| 21 | Q3-NEC | 12 | 1.99 × 10^-2 | 8.08 × 10^-4 |
| 22 | Q3-NED | 12 | 2.23 × 10^-2 | 9.02 × 10^-4 |
| 23 | Q3-SAB | 5 | 8.98 × 10^-3 | 3.64 × 10^-4 |
| 24 | Q3-TUN | 2 | 2.74 × 10^-3 | 1.11 × 10^-4 |
| 25 | Q4-FEC | 3 | 7.11 × 10^-3 | 2.88 × 10^-4 |
| 26 | Q4-GOM | 31 | 1.02 × 10^-2 | 4.14 × 10^-4 |
| 27 | Q4-MAB | 23 | 7.26 × 10^-3 | 2.94 × 10^-4 |
| 28 | Q4-NCA | 2 | 6.17 × 10^-3 | 2.50 × 10^-4 |
| 29 | Q4-NEC | 3 | 2.96 × 10^-2 | 1.20 × 10^-3 |
| 30 | Q4-NED | 5 | 8.98 × 10^-3 | 3.64 × 10^-4 |
| 31 | Q4-SAB | 2 | 2.35 × 10^-2 | 9.52 × 10^-4 |
| 32 | Q4-SAR | 2 | 2.74 × 10^-3 | 1.11 × 10^-4 |

Table 3
Median widths of confidence intervals (CIs) from the 5 spatial scenarios and 2 spatiotemporal scales of delta-lognormal estimation in our simulation model of interactions of sea turtles with the U.S. Atlantic pelagic longline fishery. The numbers in parentheses represent the median widths of the CIs as percentages of the bycatch point estimates. The co-occurrence clumping scenario and sets-only clumping scenario were considered the most realistic spatial scenarios.

| Spatial scenario | Stratum level | All sets pooled |
|---|---|---|
| Co-occurrence clumping (Turtles_clump, Sets_clump-turtles) | 649.8 (84.1%) | 402.8 (53.4%) |
| Independent clumping (Turtles_clump, Sets_clump-sets) | 100.4 (315.2%) | 88.5 (268.7%) |
| Sets-only clumping (Turtles_uniform, Sets_clump-sets) | 570.6 (92.5%) | 355.7 (59.3%) |
| Turtles-only clumping (Turtles_clump, Sets_uniform) | 84.4 (402.6%) | 74.0 (322.7%) |
| Fully uniform distribution (Turtles_uniform, Sets_uniform) | 523.3 (89.8%) | 335.2 (58.0%) |

Table 4
Number of simulations representing interactions of sea turtles with the U.S. Atlantic pelagic longline fishery in which the simulated amount of bycatch fell outside the 95% confidence interval (CI). We ran 1000 simulations for each of the 5 spatial scenarios and 2 spatiotemporal scales of delta-lognormal estimation. Underestimation occurs when the total simulated amount of bycatch falls above the CI, and overestimation occurs when the total simulated amount of bycatch falls below the CI. The co-occurrence clumping scenario and sets-only clumping scenario were considered the most realistic spatial scenarios.

| Spatial scenario | Stratum level: underestimate | Stratum level: overestimate | All sets pooled: underestimate | All sets pooled: overestimate |
|---|---|---|---|---|
| Co-occurrence clumping (Turtles_clump, Sets_clump-turtles) | 14 | 0 | 61 | 8 |
| Independent clumping (Turtles_clump, Sets_clump-sets) | 2 | 55 | 2 | 77 |
| Sets-only clumping (Turtles_uniform, Sets_clump-sets) | 1 | 1 | 15 | 10 |
| Turtles-only clumping (Turtles_clump, Sets_uniform) | 0 | 1 | 0 | 15 |
| Fully uniform distribution (Turtles_uniform, Sets_uniform) | 3 | 0 | 60 | 2 |
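The coverage and width conventions used in Tables 3 and 4 can be made concrete with a short calculation. The following is a minimal Python sketch, not the authors' simulation code: the names `SimResult`, `coverage_counts`, and `median_ci_width` are hypothetical, the estimator that produces each point estimate and 95% CI (delta-lognormal in the study) is not implemented here, and we assume the percentage width is computed run by run (skipping runs with a zero point estimate) before taking the median.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SimResult:
    """One simulation run (hypothetical structure): the true simulated
    bycatch total plus a point estimate and 95% CI from some estimator."""
    simulated_total: float
    estimate: float
    lower: float
    upper: float


def coverage_counts(results):
    """Count runs whose simulated total falls outside the CI.

    Table 4 convention: a simulated total ABOVE the upper limit is an
    underestimate; a simulated total BELOW the lower limit is an
    overestimate. Returns (underestimates, overestimates).
    """
    under = sum(1 for r in results if r.simulated_total > r.upper)
    over = sum(1 for r in results if r.simulated_total < r.lower)
    return under, over


def median_ci_width(results):
    """Return (median CI width, median CI width as % of point estimate).

    Assumption: the percentage is computed per run, then the median is
    taken; runs with a zero point estimate are skipped for the percentage
    to avoid division by zero.
    """
    widths = [r.upper - r.lower for r in results]
    pct = [100.0 * (r.upper - r.lower) / r.estimate
           for r in results if r.estimate > 0]
    return median(widths), (median(pct) if pct else float("nan"))
```

With three made-up runs, one covered, one whose simulated total lies above its CI, and one whose total lies below it, `coverage_counts` returns `(1, 1)`. Keeping the under- and overestimate tallies separate, as Table 4 does, shows the direction of any bias rather than only the overall coverage rate.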
Author: Barlow, Paige F.; Berkson, Jim
Date: Jul 1, 2012