Compare condition data

This vignette compares fish condition data produced with the NEesp2::species_condition function to the condition data in the ecodata package. The NEesp2::species_condition function was built based on Laurel’s condition code. This analysis validates whether the function can be used to automate the workflow for the 2026 SOE.

Create data with `NEesp2::species_condition`

I’m creating condition data with two different survdat pulls, one from 2024 and one from mid-2025. I think the 2024 pull is the same one used to create the current ecodata condition data; it is on github here. The 2025 pull is a more recent pull that includes data through 2024.

# using a recent survdat pull:
all_condition <- readRDS(here::here("data-raw/condition.rds"))$survdat |>
  NEesp2::species_condition(
    LWparams = NEesp2::LWparams,
    species.codes = NEesp2::species.codes,
    output = "soe"
  )

## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 16811 outliers from the data set.

# using an older survdat pull:
load(here::here("data-raw/Survdat_2024.Rdata"))
all_condition_2024 <- survdat |>
  NEesp2::species_condition(
    LWparams = NEesp2::LWparams,
    species.codes = NEesp2::species.codes,
    output = "soe"
  )

## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 16239 outliers from the data set.

Different numbers of outliers were removed from each dataset. That could be because the more recent survdat pull has some additional data points that are being treated as outliers, or something else is going on.

Compare data

A series of plots to compare both condition calculations to what’s in ecodata.

Acadian redfish condition comparison

American plaice condition comparison

Atlantic cod condition comparison

Atlantic croaker condition comparison

Atlantic halibut condition comparison

Atlantic herring condition comparison

Atlantic mackerel condition comparison

Atlantic menhaden condition comparison

Atlantic wolffish condition comparison

Barndoor skate condition comparison

Black sea bass condition comparison

Blackbelly rosefish condition comparison

Bluefish condition comparison

Bluntnose stingray condition comparison

Bullnose ray condition comparison

Butterfish condition comparison

Clearnose skate condition comparison

Cownose ray condition comparison

Cusk condition comparison

Fourspot condition comparison

Goosefish condition comparison

Haddock condition comparison

Little skate condition comparison

Longhorn sculpin condition comparison

Ocean pout condition comparison

Offshore hake condition comparison

Pollock condition comparison

Red hake condition comparison

Rosette skate condition comparison

Roughtail stingray condition comparison

Scup condition comparison

Sea raven condition comparison

Silver hake condition comparison

Smooth dogfish condition comparison

Smooth skate condition comparison

Spiny butterfly ray condition comparison

Spiny dogfish condition comparison

Spot condition comparison

Spotted hake condition comparison

Summer flounder condition comparison

Thorny skate condition comparison

Weakfish condition comparison

White hake condition comparison

Windowpane condition comparison

Winter flounder condition comparison

Winter skate condition comparison

Witch flounder condition comparison

Yellowtail flounder condition comparison

Assess deviations

There are some small differences between the newly-calculated data and ecodata. These next analyses look at squared deviations from the ecodata values, to see if there is a systematic bias in any of the data.

full_data <- ecodata::condition |>
  dplyr::rename(ecodata_value = Value) |>
  dplyr::full_join(
    all_condition |>
      dplyr::rename(new_value = Value)
  ) |>
  dplyr::full_join(
    all_condition_2024 |>
      dplyr::rename(new_value_2024 = Value))

## Joining with `by = join_by(Time, Var, EPU, Units)`
## Joining with `by = join_by(Time, Var, EPU, Units)`

dev_data <- full_data |>
  dplyr::mutate(dev_squared = (ecodata_value - new_value)^2,
                dev_squared_2024 = (ecodata_value - new_value_2024)^2)

By EPU

EPUs across all years

EPUs summed by year

By species

Species across all years

Species summed by year

By year

What’s going on with mackerel?

Mackerel has a notable difference in the Scotian Shelf value in 2015. First, recalculate the data:

survdat_2024 <- survdat |>
  dplyr::filter(SVSPP == 121)

survdat_2025 <- readRDS(here::here("data-raw/condition.rds"))$survdat  |>
  dplyr::filter(SVSPP == 121)

cond_comparison <- survdat_2024 |>
  NEesp2::species_condition(
    LWparams = NEesp2::LWparams,
    species.codes = NEesp2::species.codes,
    output = "soe"
  ) |>
  dplyr::mutate(source = "survdat_2024") |>
  dplyr::bind_rows(
    survdat_2025 |>
      NEesp2::species_condition(
        LWparams = NEesp2::LWparams,
        species.codes = NEesp2::species.codes,
        output = "soe"
      ) |>
      dplyr::mutate(source = "survdat_2025")
  ) |>
  dplyr::bind_rows(ecodata::condition |>
                     dplyr::filter(Var == "Atlantic mackerel") |>
                     dplyr::mutate(source = "ecodata"))

## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 359 outliers from the data set.
## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 400 outliers from the data set.

Then plot:

cond_comparison |> 
  ggplot2::ggplot(ggplot2::aes(x = Time,
                               y = Value,
                               color = source)) +
  ggplot2::geom_line()+
  ggplot2::geom_point() +
  ggplot2::theme_bw() +
  ggplot2::facet_wrap(~EPU)

Scotian Shelf in 2015

2024 survdat pull

Take a look at what’s in the 2024 dataset.

mackerel_2024 <- survdat_2024 |>
      dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "SS",
                YEAR == 2015,
                SEASON == "FALL") |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX)

knitr::kable(mackerel_2024)

CRUISE6	STRATUM	LENGTH	LAT	LON	EST_TOWDATE	INDID	INDWT	SEX
201504	1340	28	43.33608	-66.67322	2015-10-17 15:35:26	152374	0.195	2
201504	1340	30	43.33608	-66.67322	2015-10-17 15:35:26	152338	0.270	1
201504	1340	31	43.33608	-66.67322	2015-10-17 15:35:26	152357	0.315	2
201504	1340	32	43.33608	-66.67322	2015-10-17 15:35:26	152247	0.293	1
201504	1340	32	43.33608	-66.67322	2015-10-17 15:35:26	152344	0.320	1
201504	1340	33	43.83454	-66.73564	2015-10-18 01:38:50	153828	0.336	2
201504	1351	17	44.27302	-66.70087	2015-10-18 08:31:51	154973	0.051	2
201504	1351	18	44.27302	-66.70087	2015-10-18 08:31:51	154967	0.060	2
201504	1340	34	44.03344	-66.94544	2015-10-18 14:47:12	156565	0.455	2
201504	1340	31	43.88735	-67.20750	2015-10-18 17:30:51	157004	0.335	2
201504	1340	33	43.88735	-67.20750	2015-10-18 17:30:51	157001	0.410	1
201504	1340	33	43.88735	-67.20750	2015-10-18 17:30:51	156998	0.365	1

2025 survdat pull

Take a look at what’s in the 2025 dataset.

mackerel_2025 <- survdat_2025 |>
      dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "SS",
                YEAR == 2015,
                SEASON == "FALL") |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX)

knitr::kable(mackerel_2025)

CRUISE6	STRATUM	LENGTH	LAT	LON	EST_TOWDATE	INDID	INDWT	SEX
201504	1340	33	43.83454	-66.73564	2015-10-18 02:38:50	153828	0.336	2
201504	1351	17	44.27302	-66.70087	2015-10-18 09:31:51	154973	0.051	2
201504	1351	18	44.27302	-66.70087	2015-10-18 09:31:51	154967	0.060	2
201504	1340	34	44.03344	-66.94544	2015-10-18 15:47:12	156565	0.455	2
201504	1340	31	43.88735	-67.20750	2015-10-18 18:30:51	157004	0.335	2
201504	1340	33	43.88735	-67.20750	2015-10-18 18:30:51	157001	0.410	1
201504	1340	33	43.88735	-67.20750	2015-10-18 18:30:51	156998	0.365	1

Looks like the October 17, 2015 tow is missing from the 2025 pull.

Double check the input data:

test_survdat <- readRDS(here::here("data-raw/condition.rds"))$survdat  |>
  dplyr::filter(stringr::str_detect(EST_TOWDATE, "2015-10-17"))

test_survdat$SVSPP |> unique() |> sort()

##  [1]  15  22  27  28  32  35  72  73  74  75  76  77  84 101 102 106 107 131 155
## [20] 156 163 193 197 301 312 502

Mackerel isn’t in there (SVSPP 121).

What’s going on with witch flounder?

Witch flounder has a notable difference in the Mid-Atlantic Bight value in 2015. First, recalculate the data:

survdat_2024 <- survdat |>
  dplyr::filter(SVSPP == 107)

survdat_2025 <- readRDS(here::here("data-raw/condition.rds"))$survdat  |>
  dplyr::filter(SVSPP == 107)

cond_2024 <- survdat_2024 |>
  NEesp2::species_condition(
    LWparams = NEesp2::LWparams,
    species.codes = NEesp2::species.codes,
    record_outliers = TRUE,
    output = "soe"
  )

## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 390 outliers from the data set.

cond_2025 <- survdat_2025 |>
      NEesp2::species_condition(
        LWparams = NEesp2::LWparams,
        species.codes = NEesp2::species.codes,
    record_outliers = TRUE,
        output = "soe"
      )

## Joining with `by = join_by(STRATUM)`
## Joining with `by = join_by(count)`
## Removing 424 outliers from the data set.

cond_comparison <- cond_2024$condition |>
  dplyr::mutate(source = "survdat_2024") |> 
  dplyr::bind_rows(cond_2025$condition|>
      dplyr::mutate(source = "survdat_2025")) |>
  dplyr::bind_rows(ecodata::condition |>
                     dplyr::filter(Var == "Witch flounder") |>
                     dplyr::mutate(source = "ecodata"))

Plot the data:

cond_comparison |>
  ggplot2::ggplot(ggplot2::aes(x = Time,
                               y = Value,
                               color = source)) +
  ggplot2::geom_line()+
  ggplot2::geom_point() +
  ggplot2::theme_bw() +
  ggplot2::facet_wrap(~EPU)

Mid-Atlantic Bight in 2015

2024 survdat pull

witch_2024 <- survdat_2024 |>
      dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "MAB",
                YEAR == 2015,
                SEASON == "FALL") |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX)

knitr::kable(witch_2024)

CRUISE6	STRATUM	LENGTH	LAT	LON	EST_TOWDATE	INDID	INDWT	SEX
201504	1720	22	38.47240	-73.50522	2015-09-02 18:17:17	2537	0.046	1
201504	1020	30	39.91528	-73.08380	2015-09-13 01:45:02	59094	0.152	1
201504	1070	29	40.21977	-71.00322	2015-09-16 22:04:58	76958	0.162	1
201504	1060	34	40.49986	-70.79864	2015-09-17 02:44:47	78183	0.237	1

2025 survdat pull

witch_2025 <- survdat_2025 |>
      dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "MAB",
                YEAR == 2015,
                SEASON == "FALL") |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX)

knitr::kable(witch_2025)

CRUISE6	STRATUM	LENGTH	LAT	LON	EST_TOWDATE	INDID	INDWT	SEX
201504	1720	22	38.47240	-73.50522	2015-09-02 19:17:17	2537	0.046	1
201504	1020	30	39.91528	-73.08380	2015-09-13 02:45:02	59094	0.152	1
201504	1070	29	40.21977	-71.00322	2015-09-16 23:04:58	76958	0.162	1
201504	1060	34	40.49986	-70.79864	2015-09-17 03:44:47	78183	0.237	1

The two datasets are the same. Is the discrepancy caused by outliers being removed?

Outlier analysis

Outliers in the 2024 data:

# cond_comparison |>
#   dplyr::filter(Time == 2015) |>
#   dplyr::arrange(EPU)

cond_2024$outliers |>
  dplyr::filter(YEAR == 2015)  |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX) |>
      dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "MAB") |>
  dplyr::arrange(INDWT) |>
  nrow()

## [1] 0

There are no outliers in the 2024 data set.

Outliers in the 2025 data:

cond_2025$outliers |>
  dplyr::filter(YEAR == 2015)  |>
  dplyr::select(CRUISE6, STRATUM, LENGTH, LAT, LON, EST_TOWDATE, INDID, INDWT, SEX) |>
        dplyr::left_join(NEesp2::strata_epu_key) |>
  dplyr::filter(EPU == "MAB") |>
   dplyr::arrange(INDWT) |>
  knitr::kable()

CRUISE6	STRATUM	LENGTH	LAT	LON	EST_TOWDATE	INDID	INDWT	SEX	EPU
201504	1720	22	38.4724	-73.50522	2015-09-02 19:17:17	2537	0.046	1	MAB

The 2025 data set has one outlier. This data point is considered an outlier in the 2025 data, but not in the 2024 data, which causes the difference in the 2015 condition value in the MAB.

Take a look at the outlier:

plt <- survdat_2024 |>
  dplyr::mutate(source = "2024") |>
  dplyr::bind_rows(
    survdat_2025 |>
      dplyr::mutate(source = "2025")
  ) |>
  dplyr::filter(SEASON == "FALL",
                !is.na(LENGTH),
                !is.na(INDWT)) |>
  ggplot2::ggplot(ggplot2::aes(x = LENGTH,
                               y = INDWT,
                               color = source,
                               shape = source)) +
  ggplot2::geom_point(cex = 2,
                      alpha = 0.5) +
  ggplot2::theme_bw()

plt +
  ggplot2::geom_point(ggplot2::aes(x = LENGTH,
                               y = INDWT),
                      data = witch_2025[1,],
                      inherit.aes = FALSE,
                      color = "black")

The black point is the outlier, as defined by having a relative condition < 2SD below the species’ mean relative condition over all years. I don’t think it should be treated as an outlier.

Summary

Some discrepancies are caused by differences in the underlying survdat data. These should be validated against the current data in the NEFSC database. Other discrepancies are caused by differences in outlier removal. Since outliers are defined based on standard deviations, the addition of more data changes the outlier cutoff. We would need to update the outlier removal method to be more consistent over time if we want to avoid this issue in the future.

Create data with NEesp2::species_condition

Compare data

Acadian redfish condition comparison

American plaice condition comparison

Atlantic cod condition comparison

Atlantic croaker condition comparison

Atlantic halibut condition comparison

Atlantic herring condition comparison

Atlantic mackerel condition comparison

Atlantic menhaden condition comparison

Atlantic wolffish condition comparison

Barndoor skate condition comparison

Black sea bass condition comparison

Blackbelly rosefish condition comparison

Bluefish condition comparison

Bluntnose stingray condition comparison

Bullnose ray condition comparison

Butterfish condition comparison

Clearnose skate condition comparison

Cownose ray condition comparison

Cusk condition comparison

Fourspot condition comparison

Goosefish condition comparison

Haddock condition comparison

Little skate condition comparison

Longhorn sculpin condition comparison

Ocean pout condition comparison

Offshore hake condition comparison

Pollock condition comparison

Red hake condition comparison

Rosette skate condition comparison

Roughtail stingray condition comparison

Scup condition comparison

Sea raven condition comparison

Silver hake condition comparison

Smooth dogfish condition comparison

Smooth skate condition comparison

Spiny butterfly ray condition comparison

Spiny dogfish condition comparison

Spot condition comparison

Spotted hake condition comparison

Summer flounder condition comparison

Thorny skate condition comparison

Weakfish condition comparison

White hake condition comparison

Windowpane condition comparison

Winter flounder condition comparison

Winter skate condition comparison

Witch flounder condition comparison

Yellowtail flounder condition comparison

Assess deviations

By EPU

EPUs across all years

EPUs summed by year

By species

Species across all years

Species summed by year

By year

What’s going on with mackerel?

Scotian Shelf in 2015

2024 survdat pull

2025 survdat pull

What’s going on with witch flounder?

Mid-Atlantic Bight in 2015

2024 survdat pull

2025 survdat pull

Outlier analysis

Summary

Create data with `NEesp2::species_condition`