Basic Data Manipulations

Backcalculation of Previous Lengths

Francis (1990) defined back-calculation as “… a technique that uses a set of measurements made on a fish at one time to infer its length at an earlier time or times. Specifically, the dimensions of one or more marks in some hard part of the fish, together with its current body length, are used to estimate its length at the time of formation of each of the marks. … The marks are generally annual rings associated with growth checks, … .” Thus, backcalculation is the reconstruction of the length of a fish at previous ages from measurements made on calcified structures.

Required Packages for this Supplement

Functions used in this supplement require the packages shown below.

> library(FSA)
> library(magrittr)
> library(dplyr)
> library(tidyr)
> library(stringr)

Data Used in this Supplement

All analyses in this supplement use the West Bearskin Lake (MN) Smallmouth Bass (Micropterus dolomieu) data from Weisberg (1993) found in SMBassWB.csv (viewdownloadmeta-data). The analysis in this supplement will focus on those fish captured in 1990. Three variables that were constant and not used in this analysis and three measurement variables that corresponded to ages that did not exist after reducing to only fish capted in 1990 were removed to save space.

> SMBassWB <- read.csv("SMBassWB.csv")
> wb90 <- filterD(SMBassWB,yearcap==1990) %>%



Two types of measurements can be made on calcified structures. A radial measurement is the total distance from the center of the structure (e.g., focus of scale or nucleus of otolith) to the edge of an annulus. An incremental measurement is the distance between two successive annuli. Radial measurements are required for back-calculation of fish length.

Back-calculation models estimate length at previous age ii (i.e., LiLi) from known values of length at time of capture (LCLC), scale radius to the iith annulus (SiSi), and scale radius at time of capture (SCSC). Several back-calculation models rely on the relationship between SCSC and LCLC. Depending on the model, a function of mean SCSC for a given LCLC (i.e., E(SC|LC)E(SC|LC) ) or mean LCLC for a given SCSC (i.e., E(LC|SC)E(LC|SC)) is used. These functions are not required to be linear, but often are, and in their linear form are represented as


Fitting these regressions is demonstrated below.


Common Back-Calculation Models

The first back-calculation model was jointly developed by Knut Dahl and Einar Lea and appeared in Lea (1910). The underlying concept of the Dahl-Lea model is that growth of the calcified structure is in exact proportion to growth in length of the fish. With this, the ratio of SiSi to SCSC is the same as the ratio of LiLi to LCLC. Rearrangement of this equality yields the Dahl-Lea back-calculation model


The Dahl-Lea model describes a family of straight lines that pass through the origin and each observed (SC,LCSC,LC) point. Visually (Figure 1), the estimated LiLi for a particular fish is found by locating SiSi along the x-axis, moving vertically until the straight line for that fish is met, and then moving horizontally to the point on the y-axis.

Figure 1: Plot of length-at-capture versus scale radius for West Bearskin Lake Smallmouth Bass in 1990. All four methods of backcalculation are shown for fish 704 (S2=3.49804S2=3.49804LC=218LC=218, and SC=7.44389SC=7.44389; black point and line) with calculational steps shown with the arrows. Fish 701 is shown as the gray point and line for comparative purposes.

Fraser (1916) was the first to describe, but Lee (1920) was the first to formally derive, the back-calculation model from the concept that “the growth increment of the scale is, on the average …, a constant proportion of the growth increment of the fish” (Francis 1990). In practice, the Fraser-Lee model modified the Dahl-Lea model by adjusting for the length of the fish when the calcified structure forms (i.e., L=cL=c when S=0S=0), that is,


where cc comes from the length of the fish at the time of scale formation, the intercept of the length-scale relationship regression (e.g., from Equation 2), or from published “standards” for a species (Carlander 1982). The Fraser-Lee model describes a family of lines with an intercept of cc that pass through the (SC,LCSC,LC) point (Francis (1990); Figure 1).

The scale proportional hypothesis (SPH) model was named by Francis (1990), but was first recognized by Whitney and Carlander (1956) when they said “{i}f the scale was 10 per cent larger when the fish was caught than the average scale for that size of fish, [then] the scale would be 10 per cent larger than normal throughout the life.” If “average” and “normal” are considered to be expected values, then this hypothesis can be written as


If it is assumed that the scale-length relationship is linear, then the two expected values in these ratios are computed by plugging LiLi and LCLC, respectively, into the scale-length relationship (i.e., Equation 1) to produce


which can be solved for LiLi to yield the general SPH back-calculation model


The linear SPH model produces a family of lines that all have an intercept of ab−ab and pass through each observed (SC,LCSC,LC) point (Figure 1). The SPH model is the same as the Fraser-Lee model except that the intercept from Equation 2 is replaced with ab−ab. Further note that the SPH model is the same as the Dahl-Lea model if a=0a=0.

The body proportional hypothesis (BPH) model was also named by Francis (1990) and was also first recognized by Whitney and Carlander (1956) when they said “{i}f a fish at time of capture were 10 per cent smaller than the average fish with that size of scale, [then] the fish would be 10 per cent smaller than the expected length for the size of that scale throughout life.” This hypothesis can be written as


If the length-scale relationship is linear then the expected values can be found by plugging SiSi and ScSc into Equation 2 to get


which can be solved for LiLi to yield the general BPH back-calculation model


The linear BPH model produces a family of lines that have an intercept of cLCc+dSCcLCc+dSC and pass through each observed (SC,LCSC,LC) point (Figure 1). In contrast to the other back-calculation models, the BPH model uses lines with a different intercept for each fish. The linear BPH model is the same as the Dahl-Lea model if c=0c=0.

Vigliola and Meekan (2009) described 18 other models for the back-calculation of fish length. Functions for each of these models can be created with bcFuns() from FSA.


Data Organization & Manipulation

In wide or one-fish-per-line format, all information about a single fish, including all of the measurements from the calcified structure, is in one row of the data.frame. The wb90 data.frame contains radial measurments in the wide format (note that the portion shown below has four rows with columns that wrapped).

> headtail(wb90,n=2)
    yearcap fish agecap lencap    anu1    anu2    anu3    anu4    anu5    anu6
1      1990  482      1     75 1.51076      NA      NA      NA      NA      NA
2      1990  768      1     75 1.57989      NA      NA      NA      NA      NA
180    1990  388      9    300 1.08462 2.03527 3.22724 4.63407 5.53355 6.53174
181    1990  389      9    329 1.05913 2.18769 3.55137 4.40766 5.78634 7.58178
       anu7    anu8     anu9   radcap
1        NA      NA       NA  1.51076
2        NA      NA       NA  1.57989
180 7.27807 8.08080  9.38096  9.38096
181 8.32094 9.46362 10.43491 10.43491

For the back-calculation of fish length, these data must be radial, and not incremental, measurements. If the wb90 data.frame had contained incremental measurements, then it could be converted to radial measurements with gConvert() from FSA. The gConvert() function requires the data frame with the incremental measurements as the first argument, the prefix (in quotes) for the columns that contain the incremental measurements in in.pre=, and the type of measurement to convert to in out.type= (the options are "rad" (the default) or "inc"). For example, the code below would create a new data.frame from wb90 with radial measurements (IF wb90 had incremental measurements).

> wb90A <- gConvert(wb90,in.pre="anu",out.type="rad")

For efficient back-calculation, the data must also be converted to long or one-measurement-per-line format. As demonstrated in Introductory Fisheries Analyses with R, wide data may be converted to long data with gather() from tidyr. As a reminder, the arguments to gather() are the wide data.frame, a name for the new variable in the long format that will identify the individual (i.e., which radial measurement), a name for the new variable in the long format that will be the value for the individual (i.e., the radial measurement), and the variables in the wide format that contain the measurements.

> wb90r <- gather(wb90,agei,radi,anu1:anu9) %>%
> headtail(wb90r)
     yearcap fish agecap lencap  radcap agei    radi
1       1990    0      7    278 9.06803 anu1 1.50631
2       1990    0      7    278 9.06803 anu2 3.11450
3       1990    0      7    278 9.06803 anu3 4.51154
1627    1990  998      7    298 8.54805 anu7 8.54805
1628    1990  998      7    298 8.54805 anu8      NA
1629    1990  998      7    298 8.54805 anu9      NA

As noted in the IFAR book, there are three potential problems with this result. First, the new agei variable contains the names of the variables from the original wide format data.frame (e.g., anu1anu2) rather than numbers that correspond to the age that the annulus was formed. Converting these labels to numbers begins by replacing the “anu” prefix with blanks (or an empty string) using str_sub() with the vector of names as the first argument, start=1 (because “anu” is a prefix) and end=3 (because “anu” is three characters long). The result from str_sub(), however, is a character that must then be converted to a numeric with as.numeric().

The second problem is that several of the radial measurements contain NA values. The non-NA values are found and retained by using ! within filterD().

The third problem, while not an issue with these particular data, is that “plus growth” may have been recorded. “Plus growth” is growth on the margin of the calcified structure that does not represent a complete year of growth. If “plus growth” is present, then the new agei variable will have a value greater than the age-at-capture value. These instances should be removed from the new long format data.frame.

The following code adjusts for these three issues.

> str_sub(wb90r$agei,start=1,end=3) <- ""
> wb90r %<>% mutate(agei=as.numeric(agei)) %>%
    filterD(! %>%
> headtail(wb90r)
    yearcap fish agecap lencap  radcap agei    radi
1      1990    0      7    278 9.06803    1 1.50631
2      1990    0      7    278 9.06803    2 3.11450
3      1990    0      7    278 9.06803    3 4.51154
765    1990  998      7    298 8.54805    5 5.17646
766    1990  998      7    298 8.54805    6 6.62240
767    1990  998      7    298 8.54805    7 8.54805


Computing Back-Calculated Lengths

Scale-Length Relationships

The scale-length (Equation 1) and length-scale (Equation 2) relationships required for all but the Dahl-Lea method are computed with the wide format data. Thus, the wide format data.frame must contain the length of the fish (e.g., lencap) and the total radius of the calcified structure (e.g., radcap) at the time of capture. Both linear relationships are fit with lm() and the coefficients should be extracted with coef() and saved into objects.

> <- lm(radcap~lencap,data=wb90)
> ( a <- coef([[1]] )
[1] -1.304391
> ( b <- coef([[2]] )
[1] 0.03537477
> <- lm(lencap~radcap,data=wb90)
> ( c <- coef([[1]] )
[1] 41.65166
> ( d <- coef([[2]] )
[1] 27.35597

Applying the Back-Calculation Models

The LiLi estimated with a back-calculation model are most easily added to the long format data.frame. This is largely an exercise in adding a variable to the data.frame with mutate() from dplyr. For example, the LiLi computed with all four back-calculation models are added to wb90r below.

> wb90r %<>% mutate(DL.len=(radi/radcap)*lencap,
> headtail(wb90r,n=2)
    yearcap fish agecap lencap  radcap agei    radi    DL.len    FL.len
1      1990    0      7    278 9.06803    1 1.50631  46.17918  80.91199
2      1990    0      7    278 9.06803    2 3.11450  95.48171 122.82772
766    1990  998      7    298 8.54805    6 6.62240 230.86847 240.25149
767    1990  998      7    298 8.54805    7 8.54805 298.00000 298.00000
      SPH.len   BPH.len
1    76.92752  79.50736
2   119.69064 121.72181
766 239.17509 241.01809
767 298.00000 298.00000

For example, the mean length-at-age may be computed from the back-calculated lengths (shown below for the Fraser-Lee results).

> tmp <- wb90r %>%
    group_by(agei) %>%
    summarize(n=validn(FL.len),mn=mean(FL.len),sd=sd(FL.len)) %>%
> tmp
  agei   n       mn        sd
1    1 181  78.5663  6.472692
2    2 178 114.1527 10.453632
3    3 155 146.7669 13.898434
4    4  71 172.6512 15.339848
5    5  64 201.0405 17.479717
6    6  64 235.3834 23.350527
7    7  50 268.5969 25.286958
8    8   2 283.2237 26.912010
9    9   2 314.5000 20.506097

Additionally, the mean length at each back-calculated age computed separately for each age-at-capture may be computed with sumTable() from FSA, where the left side of the formula is the quantitative variable to be summarized and the right side has grouping variables presented in row*column format.

> sumTable(FL.len~agecap*agei,data=wb90r,digits=1)
     1     2     3     4     5     6     7     8     9
1 73.7    NA    NA    NA    NA    NA    NA    NA    NA
2 79.7 113.3    NA    NA    NA    NA    NA    NA    NA
3 77.3 112.8 148.9    NA    NA    NA    NA    NA    NA
4 71.3 121.8 160.9 194.0    NA    NA    NA    NA    NA
6 79.8 107.9 136.0 169.6 198.8 229.7    NA    NA    NA
7 81.6 118.2 144.7 170.7 201.8 237.0 269.1    NA    NA
9 71.2  99.8 135.0 166.1 197.5 236.0 256.4 283.2 314.5


Reproducibility Information

  • Compiled Date: Thu Nov 05 2015
  • Compiled Time: 6:48:50 PM
  • R Version: R version 3.2.2 (2015-08-14)
  • System: Windows, i386-w64-mingw32/i386 (32-bit)
  • Base Packages: base, datasets, graphics, grDevices, methods, stats, utils
  • Required Packages: FSA, magrittr, dplyr, tidyr, stringr, captioner, knitr and their dependencies (assertthat, car, DBI, digest, evaluate, formatR, gdata, gplots, graphics, grDevices, highr, Hmisc, lazyeval, markdown, methods, plotrix, plyr, R6, Rcpp, sciplot, stats, stringi, tools, utils, yaml)
  • Other Packages: captioner_2.2.3, dplyr_0.4.3, extrafont_0.17, FSA_0.8.4, FSAdata_0.3.2, knitr_1.11, magrittr_1.5, rmarkdown_0.8.1, stringr_1.0.0, tidyr_0.3.1
  • Loaded-Only Packages: assertthat_0.1, DBI_0.3.1, digest_0.6.8, evaluate_0.8, extrafontdb_1.0, formatR_1.2.1, gdata_2.17.0, gtools_3.5.0, highr_0.5.1, htmltools_0.2.6, lazyeval_0.1.10, parallel_3.2.2, plyr_1.8.3, R6_2.1.1, Rcpp_0.12.1, Rttf2pt1_1.3.3, stringi_1.0-1, tools_3.2.2, yaml_2.1.13
  • Links: Script / RMarkdown



Carlander, K. D. 1982. Standard intercepts for calculating lengths from scale measurements for some centrarchid and percid fishes. Transactions of the American Fisheries Society 111:332–336.

Francis, R. I. C. C. 1990. Back-calculation of fish length: A critical review. Journal of Fish Biology 36:883–902.

Fraser, C. M. 1916. Growth of the spring salmon. Transactions of the Pacific Fisheries Society 1915:29–39.

Lea, E. 1910. On the methods used in the Herring-investigations. Publ. Circonst. Cons. perm. int. Explor. Mer 108(1):14–22.

Lee, R. M. 1920. A review of the methods of age and growth determination in fishes by means of scales. Fisheries Investigations, London Series 2 4((2)):1–32.

Vigliola, L., and M. G. Meekan. 2009. The back-calculation of fish growth from otoliths. Pages 174–211 in B. S. Green, B. D. Mapstone, G. Carlos, and G. A. Begg, editors. Tropical Fish Otoliths: Information for Assessment, Management, and Ecology. Springer.

Weisberg, S. 1993. Using hard-part increment data to estimate age and environmental effects. Canadian Journal of Fisheries and Aquatic Sciences 50(6):1229–1237.

Whitney, R. R., and K. D. Carlander. 1956. Interpretation of body-scale regression for computing body length of fish. Journal of Wildlife Management 20:21–27.