 Research article
 Open Access
The systematic sampling for inferring the survey indices of Korean groundfish stocks
Fisheries and Aquatic Sciences volume 21, Article number: 24 (2018)
Abstract
The Korean bottom trawl survey has been conducted on a regular basis for about the last decade as part of groundfish stock assessments. Here, regularity means that groundfish are sampled once per grid cell, where each grid cell measures one-half degree of latitude by one-half degree of longitude and is further divided into nine nested grid cells. Unless there is a special reason (e.g., running into a rocky bottom), the sample location is the center cell of the nine nested grid cells. Given the data collected by the survey, we aimed to show how to appropriately estimate not only the survey index of a fish stock but also its uncertainty. Because of this regularity, we applied systematic sampling theory for these purposes and compared its results with a reference based on simple random sampling. Using the survey data for 11 fish stocks collected by the spring and fall surveys in 2014, the survey indices estimated under systematic sampling were overall more precise than those estimated under simple random sampling. For the survey indices in number, the standard errors under systematic sampling were 0.23 to 27.44% lower than those under simple random sampling, while for the survey indices in weight, they were 0.04 to 31.97% lower. In terms of bias, the estimates under systematic sampling were the same as those under simple random sampling. Our paper is the first to formally show how to apply systematic sampling theory to the actual data collected by the Korean bottom trawl surveys.
Background
Based on statistical sampling theory, most countries with well-developed fisheries management have conducted a bottom trawl survey on a periodic basis (e.g., seasonally) for several decades as part of groundfish stock assessments (NEFSC 2012; NWFSC 2011). They use the survey data to calculate the relative size of a fish population, called “the survey index” of a fish stock (Hyun et al. 2015; NEFSC 2012). They commonly use stratified random sampling, which is considered appropriate for a survey over a large area such as an ocean area or a lake (Hyun et al. 2015; Smith and Gavaris 1993; Smith and Hubley 2014). For example, the bottom trawl surveys of US west and east coast groundfish and a count survey in Columbia River salmon spawning areas in the northwestern USA have been based on a stratified random sampling design (Hyun et al. 2015; NEFSC 2012; NWFSC 2011).
On the other hand, South Korea’s (hereafter, Korea) stock assessments have a relatively short history. The Korean National Institute of Fisheries Science (KNIFS) started a bottom trawl survey about one decade ago (Lee 2018; Lee and Hyun 2017). Unlike the common practice of stratified random sampling, KNIFS has sampled groundfish on a regular basis every spring and fall. Rectangular grid cells, each one-half degree of latitude by one-half degree of longitude, were set over the ocean around the Korean peninsula (Fig. 1), and KNIFS has sampled once per grid cell in each survey season (see the “Methods” section for details). Such survey data have been accumulated but rarely used for inferring the survey indices of groundfish stocks. One of our objectives in this paper is to illustrate how to use those data for estimating the survey indices of fish stocks and quantifying the uncertainty of the estimated survey indices. We would like to underscore that our analysis of the survey data is conditional on how the data were collected. In other words, we were not involved in the initial stage of the sampling, which is to “design” a survey. Thus, this paper does not address which sampling design is more or less appropriate.
Because of this regularity, we suggest that systematic sampling theory should be applied for estimating the survey index and its uncertainty. In this paper, we intend to illustrate how to apply systematic sampling theory to the Korean bottom trawl survey data, and especially how to quantify the uncertainty of the survey index of a fish stock under the systematic sampling method.
Methods
Sampling unit
The bottom trawl survey has been conducted every spring and fall since 2000, and it has trawled once per grid cell (Fig. 1) within each survey season. Each grid cell is a rectangle whose sides are one-half degree of latitude by one-half degree of longitude and which is divided into nine nested grid cells (Fig. 1). Generally, the survey has trawled at the center nested grid cell (“5” in Figs. 1 and 2) unless the survey had a problem (e.g., running into a rocky bottom). The practice of trawling once per grid cell (i.e., at the center nested grid cell) indicates an implicit assumption that the fish density is equal across all nine nested grid cells within a grid cell. However, not all grid cells have the same area, so we needed to treat grid cells separately, e.g., we used the area of each individual grid cell when calculating the survey index (i.e., the relative size of a fish population). We calculated the area of each grid cell from its latitude and longitude and show the areas in Appendix Table 4.
In systematic sampling, every kth item is inspected, and such a sample is called a 1-in-k systematic sample (Scheaffer et al. 2012). From this perspective, the nested grid cells selected for the bottom trawl survey can be treated as a 1-in-9 systematic sample, because every ninth item is inspected (Fig. 2b). Under this scheme, the sampling unit, y_{si}, is the survey catch of stock s in number or weight caught by a standard tow in the center nested grid cell of grid cell i.
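The 1-in-9 selection described above can be illustrated with a short Python sketch. It is a minimal illustration rather than the survey's actual procedure: the grid cell labels ("A", "B", "C") and the function name `one_in_k_sample` are hypothetical, and the nested cells are assumed to be listed in order 1 to 9 within each grid cell.

```python
def one_in_k_sample(units, k, start):
    """Return every k-th unit, beginning at 0-based position `start`."""
    if not 0 <= start < k:
        raise ValueError("start must satisfy 0 <= start < k")
    return units[start::k]


# Hypothetical layout: three grid cells, each holding nested cells 1..9.
nested = [(cell, sub) for cell in ("A", "B", "C") for sub in range(1, 10)]

# The center nested cell "5" sits at 0-based offset 4 in each block of nine.
sample = one_in_k_sample(nested, k=9, start=4)
print(sample)  # [('A', 5), ('B', 5), ('C', 5)]
```

Each grid cell contributes exactly one sampling unit, which is why the number of sampling units equals the number of grid cells trawled.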
Survey index
If the average of the sampling units of stock s collected by all survey tows (i.e., \( {\overline{y}}_s \)) is multiplied by the total number of possible tows in the population area, then the resultant quantity indicates the relative size of the stock’s population, also called the survey index of stock s (NEFSC 2012): \( {Y}_s=N\cdot {\overline{y}}_s=N\cdot \frac{\sum \limits_i{y}_{si}}{n} \), where N is the total number of possible tows in the population area, which is calculated by dividing the area of grid cell i by the area covered by a standard tow (Smith and Lundy 2006). In our case, it should be the weighted average of the area of grid cell i divided by the area covered by a standard tow:
See Table 1 for notations. The survey index indicates only the relative size of a fish population because catchability (or vulnerability) is unknown and differs by species, area, and time (e.g., season) (NEFSC 2012). For this reason, the survey index means the “minimum swept area abundance (or biomass)” (NEFSC 2012).
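The swept-area logic behind the survey index can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's exact area-weighted formula: N is assumed here to be the summed area of the surveyed grid cells divided by the area covered by one standard tow, and all numbers (cell areas, tow area, catches) are hypothetical.

```python
def total_possible_tows(cell_areas_km2, tow_area_km2):
    """Assumed form of N: total surveyed area divided by the area of one tow."""
    if tow_area_km2 <= 0:
        raise ValueError("tow area must be positive")
    return sum(cell_areas_km2) / tow_area_km2


def survey_index(catches, n_possible_tows):
    """Y_s = N * mean(y_si): mean catch per tow scaled up to the whole area."""
    if not catches:
        raise ValueError("at least one tow is required")
    return n_possible_tows * sum(catches) / len(catches)


# Hypothetical inputs: four grid cells (km^2), one tow each, tow area in km^2.
cell_areas = [2500.0, 2400.0, 2300.0, 2200.0]
tow_area = 0.0625
catches = [12, 0, 30, 6]  # catch in number per standard tow

N = total_possible_tows(cell_areas, tow_area)  # 9400 / 0.0625 = 150400 tows
Y = survey_index(catches, N)                   # 150400 * 12 = 1804800 fish
print(N, Y)
```

Because catchability is unknown, the resulting value should be read as a relative (minimum swept-area) abundance, not as an absolute population size.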
Expected value of the survey index and its uncertainty
Although the bottom trawl survey did not use simple random sampling (SRS) (see the “Background” section), we applied SRS to the survey data as a reference against which to compare the results from systematic sampling (SYS). Under SRS and SYS, we show the expected value of the survey index and its uncertainty. When denoting the expected value of a random variable with a hat (^) in this paper (e.g., \( E\left\{{Y}_s\right\}\equiv \widehat{Y_s} \)),
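Then, it is easy to derive the variance of \( {Y}_s \) by applying the central limit theorem (CLT) (Casella and Berger 1990) to \( Var\left\{{\overline{y}}_s\right\} \), i.e.,
\[ Var\left\{{\overline{y}}_s\right\}=\frac{Var\left\{{y}_s\right\}}{n}\left(1-\frac{n}{N}\right) \tag{3} \]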
where Var{y_{s}} is the sample variance of the y_{si}s (i.e., Var{y_{s}} = \( \frac{\sum \limits_{i=1}^n{\left({y}_{si}-{\overline{y}}_s\right)}^2}{n-1} \)) and the term \( \left(1-\frac{n}{N}\right) \) is the finite population correction (FPC) (Cochran 1977; Scheaffer et al. 2012). Finally, the variance of the survey index is:
\[ Var\left\{{Y}_s\right\}={N}^2\cdot Var\left\{{\overline{y}}_s\right\} \tag{4} \]
Equations 2 and 4 are the expected value of the survey index and its uncertainty under SRS (Cochran 1977; Scheaffer et al. 2012). The expected value of the survey index under SYS is the same as that under SRS, but its variance under SYS is different from that under SRS. Following Scheaffer et al. (2012) for the variance calculation under SYS, we arranged the sampling units in temporal order (e.g., y_{s1}, y_{s2}, ⋯, y_{sn}), took each pair of successive sampling units y_{si} and y_{s, i + 1}, and then constructed d_{si} = y_{s, i + 1} − y_{si}. Under this scheme, it is straightforward that E{d_{si}} = 0 and Var{d_{si}} = Var{y_{s, i + 1} − y_{si}} = 2 Var{y_{s}} with the assumption that the variances of the sampling units are constant across grid cells and the sampling units are independent (i.e., Cov{y_{s, i + 1}, y_{si}} = 0). On the other hand, Var{d_{si}} = \( \frac{\sum \limits_{i=1}^{n-1}{\left({d}_{si}-0\right)}^2}{\left(n-2\right)} \). In Scheaffer et al. (2012), Var{d_{si}} is expressed as the maximum likelihood estimator, i.e., Var{d_{si}} = \( \frac{\sum \limits_{i=1}^{n-1}{\left({d}_{si}-0\right)}^2}{\left(n-1\right)} \); note that the denominator of (n − 1) is the number of d_{si}s (see page 238 in Scheaffer et al. (2012)). However, we used Var{d_{si}} as the unbiased estimator in this paper, where the denominator is (n − 2), obtained by subtracting 1 from the number of d_{si}s (i.e., (n − 1 − 1)). Therefore,
\[ Var\left\{{y}_s\right\}=\frac{1}{2} Var\left\{{d}_{si}\right\}=\frac{\sum \limits_{i=1}^{n-1}{d}_{si}^2}{2\left(n-2\right)} \tag{5} \]
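We use the mean of the sampling units, \( {\overline{y}}_s \), to calculate the variance of \( {Y}_s \). By CLT, Var{\( {\overline{y}}_s \)} is:
\[ Var{\left\{{\overline{y}}_s\right\}}_{SYS}=\frac{\sum \limits_{i=1}^{n-1}{d}_{si}^2}{2n\left(n-2\right)}\left(1-\frac{n}{N}\right) \tag{6} \]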
where the term \( \left(1-\frac{n}{N}\right) \) is FPC (Cochran 1977; Scheaffer et al. 2012). To contrast the variance of \( {\overline{y}}_s \) under SYS (Eq. 6) with that under SRS (Eq. 3), we put subscript “SYS” in Eq. 6. Therefore, we can calculate Var{\( {Y}_s \)} under SYS by replacing Var{\( {\overline{y}}_s \)} in Eq. 4 by \( Var{\left\{{\overline{y}}_s\right\}}_{SYS} \) in Eq. 6.
In summary, the expected value of the survey index, \( {Y}_s \), is given by Eq. 2 regardless of whether SRS or SYS is assumed as the sampling design. However, Var{\( {\overline{y}}_s \)} used for calculating Var{\( {Y}_s \)} (Eq. 4) is given by Eq. 3 if we assume SRS and by Eq. 6 if we assume SYS.
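The two variance calculations summarized above can be compared side by side in a short Python sketch. It follows the verbal definitions in this section (sample variance with the finite population correction for SRS; successive differences with denominator n − 2 for SYS, or n − 1 as in Scheaffer et al. (2012)). The function names, the catch values, and the value of N are hypothetical, and the sketch is not the authors' own code.

```python
import math


def var_mean_srs(y, N):
    """Variance of the sample mean under SRS: (s^2 / n) * (1 - n / N)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return s2 / n * (1 - n / N)


def var_mean_sys(y, N, unbiased=True):
    """Variance of the sample mean under SYS from successive differences.

    unbiased=True divides by (n - 2) as in this paper;
    unbiased=False divides by (n - 1) as in Scheaffer et al. (2012).
    """
    n = len(y)
    d = [b - a for a, b in zip(y, y[1:])]  # d_i = y_(i+1) - y_i
    denom = (n - 2) if unbiased else (n - 1)
    return sum(x * x for x in d) / (2 * n * denom) * (1 - n / N)


def se_index(var_mean, N):
    """Standard error of the survey index: N * sqrt(Var{mean})."""
    return N * math.sqrt(var_mean)


# Hypothetical catches in temporal order and hypothetical number of tows N.
y = [3, 5, 4, 8, 6, 7]
N = 600

v_srs = var_mean_srs(y, N)  # 3.5 / 6 * 0.99
v_sys = var_mean_sys(y, N)  # 26 / 48 * 0.99
print(se_index(v_srs, N), se_index(v_sys, N))
```

In this toy series the catches drift upward, so the successive differences are small relative to the overall spread and the SYS variance is smaller than the SRS variance. The point estimate N · mean(y) is identical under both assumptions.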
Results and discussion
The point estimate of the survey index of a fish stock remains the same regardless of whether SRS or SYS is applied, and thus, the major issue lies in the precision of the survey index under SRS versus SYS. Overall, the survey index under SYS was more precise than that under SRS. In estimates of the survey index in number (\( {Y}_s \)), those estimates for 10 of the 11 stocks were more precise under SYS than those under SRS during both the spring and fall surveys (Table 2). In estimates of the survey index in biomass (\( {B}_s \)), a similar pattern was found, where those estimates for 9 of the 11 stocks were more precise under SYS than those under SRS during the spring survey while those for 10 stocks were more precise under SYS than those under SRS during the fall survey (Table 3). For example, the standard error of \( {Y}_s \) for stock 10 (black scraper) under SYS was reduced by 28.76% from that under SRS in the spring survey while that for stock 1 (Pacific cod) under SYS decreased by 26.70% from that under SRS in the fall survey (Table 2). In the case of the standard error of \( {B}_s \), that for stock 2 (White croaker/Silver jewfish) was 30.65% lower in the spring survey while that for stock 1 (Pacific cod) was 31.13% lower in the fall survey (Table 3).
In the opposite cases where the standard error of the survey index under SYS was larger than that under SRS, those differences in the standard error were negligible and such cases were few: see Table 2 for change (%) of 1.50 and 1.81% in \( SE\left({\widehat{Y}}_s\right) \) for only stock 3 (Pacific herring) during both the spring and fall surveys, and Table 3 for change (%) of 1.50~1.75% in \( SE\left({\widehat{B}}_s\right) \) for stock 3 (Pacific herring) and 4 (Redtile fish) during the spring survey, and change (%) of 1.66% for only stock 3 during the fall survey. In other words, in the few cases that they were observed, the increase was at most 1.75%.
It is not difficult to see why such a negligible increase could occur in the standard errors of the survey index of a few stocks when changing the assumption from SRS to SYS. Such a case can occur when one or a few of the sampling units (i.e., y_{si}s) are extremely different from the majority, e.g., most of the sampling units of stock 3 collected during the fall survey were less than 1000 but the fourth sampling unit (serial number 32) was 19,292 (Fig. 3a). The resultant differences between successive sampling units (i.e., d_{si}s) included this substantial variability, e.g., note that the third and fourth successive differences were ±19,292 (see d_{s3} and d_{s4} in Fig. 3b). For contrast, the usual case is shown in Fig. 3c, d. The sampling units of stock 1 caught during the fall survey ranged from 0 to 210 (Fig. 3c), a much narrower range than that in Fig. 3a. The range of the resultant d_{si}s was narrower still, at most 206 in absolute value (i.e., the absolute value of − 206 in Fig. 3d).
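The effect of a single extreme tow can be reproduced with a stylized Python example. It is a sketch under simplifying assumptions, not the actual survey data: only the outlier value (19,292, at the fourth position) and the fall sample size (n = 64) are taken from the text, and every other catch is set to zero for simplicity. The finite population correction and N cancel in the ratio of standard errors, so neither is needed here.

```python
import math


def se_ratio_sys_over_srs(y):
    """Ratio of the SYS standard error to the SRS standard error."""
    n = len(y)
    ybar = sum(y) / n
    s2_srs = sum((v - ybar) ** 2 for v in y) / (n - 1)
    d = [b - a for a, b in zip(y, y[1:])]        # successive differences
    s2_sys = sum(x * x for x in d) / (2 * (n - 2))
    return math.sqrt(s2_sys / s2_srs)


# Stylized series: 64 tows, all zero except one extreme catch in position 4.
y = [0] * 64
y[3] = 19292

ratio = se_ratio_sys_over_srs(y)
print(f"{100 * (ratio - 1):.2f}%")  # 1.60%
```

With one isolated outlier of size M, the SRS sample variance is M²/n while the successive-difference estimate is M²/(n − 2), so the standard error under SYS is larger by a factor of sqrt(n/(n − 2)). For n = 64 this is an increase of about 1.6%, the same order as the small increases reported for stock 3.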
Differences in the coefficient of variation (CV) of the survey index of a fish stock between the SRS and SYS methods add little new information because they simply mirror the differences in the standard errors. However, it is worth noting the wide range of CVs of the survey indices among stocks (Fig. 4). CVs of the survey indices in number ranged from 24.5 to 96.2% under SRS and from 20.1 to 97.6% under SYS (Fig. 4a, b). Those in biomass ranged from 25.1 to 95.5% under SRS and from 19.4 to 96.9% under SYS (Fig. 4c, d). Under SRS, the survey index of stock 3 (Pacific herring) was most uncertain while that of stock 9 (Chub mackerel) was least uncertain (shaded bars in Fig. 4). Under SYS, the survey indices of stock 3 and stock 9 were also most uncertain and least uncertain, respectively (blank bars in Fig. 4). Such a wide range in CVs of the survey indices implies that a much larger sample size than those used in 2014 (n = 67 in the spring survey, and n = 64 in the fall survey) would be needed to reduce such a large uncertainty. If a CV of 40% were hypothetically considered satisfactory, then the CVs of the survey indices of only three stocks (stocks 1, 6, and 9) were consistently below 40% even under SYS (blank bars in Fig. 4).
One of the common issues in fish stock assessments lies in whether to express the population size as abundance or biomass. Although our paper focuses on the methodology of systematic sampling, a practical management issue could be whether to use the survey index in number or in weight. Especially when fish ages in survey catches have not been determined, the survey index in weight should be preferred to that in number because the latter could include juvenile fish, whose body sizes are too small for them to be considered part of a spawning stock or to be of value to a commercial fishery.
To our knowledge, our paper is the first to formally show how to apply SYS theory to the actual data collected by the Korean bottom trawl surveys. For estimation of the survey index of a fish stock and its uncertainty, Lee and Hyun (2017) applied the method of stratified random sampling to those data, arbitrarily assuming three strata (east, south, and west), and also applied the SRS method. The arbitrary assumption was intended to convince Korean managers that stratified random sampling outperformed SRS, especially in the precision of the survey indices. As we described in the “Background” section, the current practice of the bottom trawl survey started about one decade ago, with groundfish sampled once per grid cell (Fig. 1). Although we think KNIFS should consider changing the status quo to stratified random sampling, this paper confirms that the current SYS outperforms SRS and is worth using until KNIFS eventually determines appropriate strata into which the population area should be divided.
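A rough sense of how many tows a target CV would require can be obtained from the fact that the standard error of a mean scales with 1/sqrt(n). The Python sketch below rests on assumptions that go beyond the text: the finite population correction is ignored, the tow-to-tow variability is assumed to stay the same as the sample grows, and the current CV of 80% is a hypothetical value within the reported range. The function names are illustrative only.

```python
import math


def cv_percent(standard_error, estimate):
    """Coefficient of variation of an estimate, in percent."""
    return 100.0 * standard_error / estimate


def n_for_target_cv(n_current, cv_current, cv_target):
    """Approximate number of tows needed to reach cv_target.

    Assumes CV is proportional to 1 / sqrt(n) and ignores the FPC.
    """
    return math.ceil(n_current * (cv_current / cv_target) ** 2)


# Hypothetical stock: CV of 80% with the 64 tows of the fall survey.
n_needed = n_for_target_cv(n_current=64, cv_current=80.0, cv_target=40.0)
print(n_needed)  # 256
```

Halving the CV requires roughly four times as many tows under these assumptions, which illustrates why stocks with CVs near the upper end of the range would need a far larger survey effort.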
Conclusions
We applied the SYS and SRS methods to data collected by the bottom trawl surveys during spring and fall in 2014 to infer the survey indices of 11 Korean groundfish stocks. Overall, the survey indices estimated under SYS were more precise than those estimated under SRS. Such results are consistent with statistical sampling theory with respect to SYS and SRS. The inference of the survey indices of fish stocks is a necessary part of stock assessments, in which the survey indices must be integrated with the commercial fishery catch data for inferring the fish population sizes and other associated parameters (e.g., fishing mortality and overfishing limits). We suggest that KNIFS apply the SYS methods illustrated in this paper to infer the survey indices of Korean groundfish stocks. However, we recommend that KNIFS eventually change the sampling design to stratified random sampling.
Abbreviations
CV: Coefficient of variation
KNIFS: Korean National Institute of Fisheries Science
SRS: Simple random sampling
SYS: Systematic sampling
References
Casella G, Berger RL. Statistical inference. Belmont: Duxbury Press; 1990.
Cochran WG. Sampling techniques. 3rd ed. New York: John Wiley & Sons; 1977.
Hyun SY, Maunder MN, Rothschild BJ. Importance of modelling heteroscedasticity of survey index data in fishery stock assessments. ICES Journal of Marine Science: Journal du Conseil. 2015;72(1):130–6.
Lee H. Analysis of bottom-water trawl survey data from Korean coast areas for inference of the relative sizes of fish populations. Busan: Pukyong National University; 2018.
Lee H, Hyun SY. Application of sampling theories to data from bottom trawl surveys along the Korean coastal areas for inferring the relative size of a fish population. Korean J Fish Aquat Sci. 2017;50(5):594–604.
NEFSC. Assessment or data updates of 13 Northeast groundfish stocks through 2010. Woods Hole, MA: NOAA National Marine Fisheries Service; 2012. p. 789. Ref Doc. 12–06.
NWFSC. The 2003 to 2008 U.S. West Coast bottom trawl surveys of groundfish resources off Washington, Oregon, and California: estimates of distribution, abundance, length, and age composition. Seattle: U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Northwest Fisheries Science Center; 2011.
Scheaffer RL, Mendenhall III W, Ott RL, Gerow KG. Elementary survey sampling. 7th ed. Boston: Brooks/Cole; 2012.
Smith SJ, Gavaris S. Improving the precision of abundance estimates of eastern Scotian Shelf Atlantic Cod from bottom trawl surveys. N Am J Fish Manag. 1993;13(1):35–47.
Smith SJ, Hubley B. Impact of survey design changes on stock assessment advice: sea scallops. ICES Journal of Marine Science: Journal du Conseil. 2014;71(2):320–7.
Smith SJ, Lundy MJ. Improving the precision of design-based scallop drag surveys using adaptive allocation methods. Can J Fish Aquat Sci. 2006;63(7):1639–46.
Acknowledgements
This work was supported with a grant from the Korean National Institute of Fisheries Science (R2018024). We also thank KNIFS for its provision of data used in this paper. Hyotae Lee at KNIFS organized the raw data, and Jin Woo Gim at Pukyong National University (PKNU) drew a map of the Korean coastal areas. Dale Marsden in the World Fisheries University Pilot Program at PKNU reviewed this manuscript.
Funding
This work was supported with a grant from the Korean National Institute of Fisheries Science (R2018024).
Availability of data and materials
Data used in this paper were collected by a bottom trawl survey in the Korean coastal areas, operated by KNIFS. Availability of the data would be determined by the institution.
Author information
Contributions
SYH implemented the statistical sampling theories and wrote the manuscript. YIS provided his idea for methods and reviewed the manuscript. Both authors read and approved the final manuscript.
Corresponding author
Correspondence to Saang-Yoon Hyun.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Groundfish
 Precision
 Systematic sampling
 Survey index
 Uncertainty