Chapter 9 Trip Cost Model Estimates
9.1 Product Overview
A Heckman selection model is used to estimate a composite trip cost variable for all federal commercial fishing trips from 2000 to 2024.
- Unit: Trip (all federal commercial fishing trips).
- Summary Group(s): N/A.
- Frequency: Annual.
- Time Series: 2000-2024 (Updated for the past full year in Fall-Winter of the current year).
9.1.1 Point of Contact
Samantha Werner (Samantha.Werner@noaa.gov).
9.2 List of Metrics
- CAMSID
- YEAR
- TRIP_COST_2024_DOL
- TRIP_COST_WINSOR_2024_DOL
- OBSERVED_COST_DUMMY
- TRIP_COST_NOMINALDOLS
- TRIP_COST_NOMINALDOLS_WINSOR
9.3 Metric Descriptions
| Variable Name | Description |
|---|---|
| CAMSID | CAMS trip identification number. |
| YEAR | Year associated with the CAMSID record in the CAMS_LAND table. |
| TRIP_COST_2024_DOL | Estimated and reported trip costs in 2024 constant U.S. dollars. Composite variable includes: fuel, ice, bait, food/groceries, water, oil, and supplies. |
| TRIP_COST_WINSOR_2024_DOL | Estimated and reported trip costs in 2024 constant U.S. dollars, winsorized at the 1st and 99th percentiles. |
| OBSERVED_COST_DUMMY | Indicator variable of whether or not the trip had costs recorded by an onboard observer (0 = unobserved, 1 = observed). |
| TRIP_COST_NOMINALDOLS | Estimated and reported trip costs in nominal U.S. dollars. |
| TRIP_COST_NOMINALDOLS_WINSOR | Estimated and reported trip costs in nominal U.S. dollars, winsorized at the 1st and 99th percentiles. |
9.4 Additional Methods and Decision Rules
9.4.1 General Information
There are two datasets: 2000-2009 and 2010-2024. The 2000-2009 dataset is static, while the 2010-2024 dataset is re-run annually to include new data. The datasets are kept separate because the SBRM stratification changed between the two time periods.
9.4.2 Trip Costs
- Composition: Trip costs are presented as a composite variable including: fuel, ice, bait, food/groceries, water, oil, and supplies.
- Observed Values: If a trip cost was observed, that actual value is contained in the dataset rather than a prediction.
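The substitution rule for observed trips can be sketched as a small function. This is a minimal illustration that assumes each trip carries an optional observer-recorded cost and a model prediction. The function name `final_trip_cost` is hypothetical and is not taken from the project code.

```python
from typing import Optional, Tuple


def final_trip_cost(observed_cost: Optional[float],
                    predicted_cost: float) -> Tuple[float, int]:
    """Return (trip cost, OBSERVED_COST_DUMMY) for a single trip.

    Hypothetical helper: an observer-recorded cost, when available, is kept
    as-is; otherwise the model prediction is used.
    """
    if observed_cost is not None:
        return observed_cost, 1   # observed trip: keep the actual value
    return predicted_cost, 0      # unobserved trip: use the model prediction


print(final_trip_cost(1250.0, 980.0))  # (1250.0, 1)
print(final_trip_cost(None, 980.0))    # (980.0, 0)
```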
9.4.3 Dollar Values
Trip costs are presented in both nominal and deflated values adjusted to the most recent year in the time series (e.g., 2024 dollars).
9.4.4 Heckman Modeling Notes
9.4.4.1 Background and Justification
Commercial fishing variable costs are collected on a subset of trips via the Northeast Fisheries Observer Program (NEFOP) and At-Sea Monitoring (ASM). These programs prioritize the collection of biological data, so trips with observed costs are not a random sample of all trips. Cost estimates based on them may therefore suffer from selection bias. Heckman selection models are used to test and correct for this bias simultaneously. Research shows that ordinary least squares (OLS) and Heckman model predictions can differ significantly, particularly at the sub-fleet level.
9.4.4.2 Database Generation
The project database was compiled from two main sources: (i) vessel trip reports derived from CAMS data and (ii) observer data containing realized trip cost information. Estimates were calculated for all commercial (REC=0), federal (PERMIT_STATE_FED="FEDERAL") CAMS trips with valid sail and land dates (RECORD_SAIL and RECORD_LAND) during 2010-2024.
- Data Merging: Observer records (NEFOP and ASM) containing trip cost information were merged into the CAMS dataset. The merge used a hierarchical matching algorithm based on vessel hull number, gear type, fishing area, and sail and land dates.
- Fleet Classification: Trips were coded into SBRM fleet types according to a cumulative SBRM definition of fishing fleet classifications. Each trip's SBRM fleet type was identified from the gear, region of departure, and mesh size on its CAMS record.
- Sector Affiliation: Groundfish sector affiliations (Sector IDs) were obtained from the Moratorium Qualification Review System (MQRS) database and merged using vessel permit numbers and year variables.
- Secondary Sources: Average weekly wages of marina workers were retrieved from the Bureau of Labor Statistics. Daily New York Harbor Ultra-Low-Sulfur No. 2 diesel prices were obtained from Federal Reserve Economic Data (FRED, series DDFUELNYH).
- Vessel Characteristics: Vessel characteristics, including age, horsepower, and gross tons, were retrieved from the Permit database.
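The trip filter and a hierarchical (fall-back) match can be sketched as follows. REC, PERMIT_STATE_FED, RECORD_SAIL and RECORD_LAND come from the text above. The field names `hull_id`, `gear`, `area` and `trip_cost` are hypothetical, and so are the tier order, the "unique match only" rule, and the reading of "valid" dates as present and ordered. This sketch is not the project's matching algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical matching tiers, most specific first. Only the use of hull number,
# gear, area, and sail/land dates is documented; the tier order is assumed.
MATCH_TIERS = [
    ("hull_id", "gear", "area", "RECORD_SAIL", "RECORD_LAND"),
    ("hull_id", "gear", "RECORD_SAIL", "RECORD_LAND"),
    ("hull_id", "RECORD_SAIL", "RECORD_LAND"),
]


def is_federal_commercial(trip: Dict) -> bool:
    """Commercial (REC == 0), federal trip with valid sail and land dates.

    Assumption: "valid" means both dates are present and sail <= land
    (ISO date strings compare correctly as text).
    """
    sail, land = trip.get("RECORD_SAIL"), trip.get("RECORD_LAND")
    return (trip.get("REC") == 0
            and trip.get("PERMIT_STATE_FED") == "FEDERAL"
            and sail is not None and land is not None and sail <= land)


def match_observer_record(trip: Dict, observer_records: List[Dict]) -> Optional[Dict]:
    """Return the unique observer record matching at the most specific tier, else None."""
    for keys in MATCH_TIERS:
        hits = [obs for obs in observer_records
                if all(obs.get(k) == trip.get(k) for k in keys)]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            return None  # ambiguous; relaxing further would only add candidates
    return None


trip = {"REC": 0, "PERMIT_STATE_FED": "FEDERAL", "hull_id": "ME1234",
        "gear": "trawl", "area": 514,
        "RECORD_SAIL": "2023-05-01", "RECORD_LAND": "2023-05-03"}
observer_records = [{"hull_id": "ME1234", "gear": "trawl", "area": 515,
                     "RECORD_SAIL": "2023-05-01", "RECORD_LAND": "2023-05-03",
                     "trip_cost": 4200.0}]
if is_federal_commercial(trip):
    match = match_observer_record(trip, observer_records)
    print(match["trip_cost"] if match else "no observer match")
```

In this example the reported areas disagree, so the first tier finds nothing and the second tier (hull, gear, dates) supplies the match.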
9.4.4.3 Model Types: Gears and Trip Durations
After generating the project database, the primary gear for each trip (NEGEAR) was identified using the gear associated with the largest quantity (lbs.) of species kept on the trip. Trips were then grouped into general gear categories believed to have similar cost functions: Trawl, Gillnet, Dredge, Longline, Seine, Pots and Traps, and Handlines/Handgear/Other Gear.
- Trip Durations: Where possible, gear groups were further split into day trips (≤24 hours) and multiday trips (>24 hours).
- Statistical Validation: Chow tests were employed to assess the existence of structural breaks; results suggest significant differences (\(\alpha=0.01\)) in model coefficients when comparing the two trip durations and various gear-type models.
- Model Consolidation: In cases where gear types had limited observed trips, the day and multiday trips were estimated as a single model with a dummy variable to control for duration.
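The Chow test compares a pooled regression with separate regressions for two groups, such as day and multiday trips. The sketch below computes the standard F-statistic and assumes the residual sums of squares (RSS) from the three fits are already available. The function name and the numbers in the example are illustrative only.

```python
def chow_f_statistic(rss_pooled: float, rss_1: float, rss_2: float,
                     n_1: int, n_2: int, k: int) -> float:
    """Chow F-statistic for a structural break between two groups.

    rss_pooled: residual sum of squares of the pooled model
    rss_1, rss_2: residual sums of squares of the separate group models
    n_1, n_2: observations per group; k: parameters per model (incl. intercept)
    Under H0 (identical coefficients) the statistic follows F(k, n_1 + n_2 - 2k).
    """
    rss_split = rss_1 + rss_2
    numerator = (rss_pooled - rss_split) / k
    denominator = rss_split / (n_1 + n_2 - 2 * k)
    return numerator / denominator


f_stat = chow_f_statistic(rss_pooled=120.0, rss_1=40.0, rss_2=50.0,
                          n_1=50, n_2=60, k=3)
print(round(f_stat, 3))  # 11.556
```

The statistic would then be compared with the critical value of \(F(k, n_1 + n_2 - 2k)\) at \(\alpha = 0.01\).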
9.4.4.3.1 Models and Variables
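The Heckman selection model consists of an underlying linear regression for trip costs and a selection equation. The selection equation is composed of the variables that drive selection bias, that is, whether a trip's costs are observed.
- Model Choice: Where the Wald test of independent equations fails to reject the null hypothesis that the error terms of the two equations are uncorrelated (\(\rho=0\)), an OLS model was used to estimate costs.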
- Dependent Variable: The dependent variable is the natural log of a composite trip cost variable (fuel, ice, bait, food/groceries, water, oil, and supplies) presented in 2024 constant dollars.
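For reference, a standard textbook form of the Heckman selection model is given below. The notation is ours and is not taken from the project code:
- \(c_i\) is the composite trip cost.
- \(x_i\) holds the trip cost regression variables (rows marked H in Table 1).
- \(z_i\) holds the selection-equation variables (rows marked HS).
- \(s_i = 1\) when the trip's costs were observed.
- \(\phi\) and \(\Phi\) are the standard normal density and distribution functions.

\[
\begin{aligned}
s_i^{*} &= z_i'\gamma + u_i, \qquad s_i = \mathbf{1}\left[s_i^{*} > 0\right]\\
\ln c_i &= x_i'\beta + \varepsilon_i \qquad \text{(observed only when } s_i = 1\text{)}\\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} &\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix}\right)\\
E\left[\ln c_i \mid x_i, z_i, s_i = 1\right] &= x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma), \qquad \lambda(a) = \frac{\phi(a)}{\Phi(a)}
\end{aligned}
\]

When \(\rho = 0\), the selection term vanishes and OLS on the observed trips estimates \(\beta\) consistently. This is why failing to reject \(\rho = 0\) leads to the OLS specification. In this setup, variables that enter \(z_i\) but not \(x_i\) (the HS-only rows) act as exclusion restrictions.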
| Independent Variable | Description | Units | Model Type* |
|---|---|---|---|
| Ln Vessel Horsepower | Natural log of vessel engine horsepower. | kW | OLS, H, HS |
| Ln Vessel Gross Tons | Natural log of vessel weight in gross metric tons. | Tons | OLS, H, HS |
| Ln Vessel Age | Natural log of vessel age (built date to 2024). | Years | OLS, H, HS |
| Ln Hours Absent | Natural log of hours absent from port during a trip. | Hours | OLS, H, HS |
| Ln Diesel Price | Natural log of NY Harbor diesel price in 2024 constant dollars. | $/Gallon | OLS, H, HS |
| Ln Average Weekly Wage | Natural log of marina worker wages by state in 2024 constant dollars. | $/Week | OLS, H, HS |
| Seasonal Quarter | Categorical variable for the season of trip commencement (Q1-Q4). | Categorical | OLS, H, HS |
| Calendar Year | Calendar year in which the fishing trip was completed. | Categorical | OLS, H, HS |
| SBRM Fleet Type | Categorizes trips by gear, region, access, and mesh size for stratification. | Categorical | HS |
| Groundfish Sector ID | Categorical variable for sector affiliation, common pool, or neither. | Categorical | HS** |
| Ln Number of Observers | Natural log of onboard observers employed during the trip month. | Count | HS |
Table 1 Notes: * OLS = ordinary least squares; H = Heckman selection model (trip cost regression); HS = Heckman selection model (selection equation). ** Groundfish Sector IDs are only used in gear models subject to groundfish monitoring.
9.4.4.3.2 Model Dollar Values
All monetary variables were adjusted to 2024 constant U.S. dollars using the GDP implicit price deflator to control for inflation within the model.
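Converting a nominal value to constant 2024 dollars with a price index means rescaling it by the ratio of the base-year index to the index for the year in which the cost was incurred. The inverse conversion maps constant-dollar values back to nominal ones. In the minimal sketch below, the index values are placeholders, not actual GDP implicit price deflator figures, and the function names are illustrative.

```python
from typing import Dict

# Placeholder index values for illustration only (NOT actual GDP deflator figures).
PRICE_INDEX: Dict[int, float] = {2010: 80.0, 2024: 125.0}


def to_constant_dollars(nominal: float, year: int, base_year: int = 2024,
                        index: Dict[int, float] = PRICE_INDEX) -> float:
    """Convert a nominal amount incurred in `year` to constant `base_year` dollars."""
    return nominal * index[base_year] / index[year]


def to_nominal_dollars(real: float, year: int, base_year: int = 2024,
                       index: Dict[int, float] = PRICE_INDEX) -> float:
    """Inverse conversion: constant `base_year` dollars back to `year` dollars."""
    return real * index[year] / index[base_year]


print(to_constant_dollars(1000.0, 2010))  # 1562.5
```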
9.4.4.3.3 Clustered Standard Errors
Heteroscedasticity, identified using the Breusch-Pagan/Cook-Weisberg test, was addressed by using robust standard errors in all Heckman and OLS models.
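The sketch below makes both steps concrete for a single-regressor OLS model. First it runs the studentized (Koenker, n·R²) form of the Breusch-Pagan test. Then it computes a White (HC0) heteroscedasticity-robust slope standard error.

This is an illustration only. The project models have many regressors, and the exact test variant and standard error options used are not documented here. The original Breusch-Pagan/Cook-Weisberg statistic assumes normal errors, whereas the Koenker form shown here does not. The data are constructed for the example.

```python
import math
from typing import List, Tuple


def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float, List[float]]:
    """Fit y = a + b * x by OLS and return (a, b, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]


def koenker_bp_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Studentized Breusch-Pagan test: LM = n * R^2 from regressing e^2 on x.

    Returns (LM, p-value); with one regressor LM is chi-square with 1 df under H0.
    """
    _, _, resid = simple_ols(x, y)
    e2 = [r * r for r in resid]
    _, _, aux_resid = simple_ols(x, e2)
    e2_bar = sum(e2) / len(e2)
    tss = sum((v - e2_bar) ** 2 for v in e2)
    r_squared = 1.0 - sum(u * u for u in aux_resid) / tss
    lm = max(len(x) * r_squared, 0.0)  # guard against tiny negative rounding
    # P(chi2_1 > lm) = P(|Z| > sqrt(lm)) = erfc(sqrt(lm / 2))
    return lm, math.erfc(math.sqrt(lm / 2.0))


def hc0_slope_se(x: List[float], y: List[float]) -> float:
    """White (HC0) heteroscedasticity-robust standard error of the OLS slope."""
    _, _, resid = simple_ols(x, y)
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    meat = sum(((xi - x_bar) * r) ** 2 for xi, r in zip(x, resid))
    return math.sqrt(meat) / sxx


# Illustrative data whose error spread grows with x (heteroscedastic by construction).
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + (0.3 if i % 2 == 0 else -0.3) * xi
     for i, xi in enumerate(x, start=1)]
lm, p_value = koenker_bp_test(x, y)
print(f"LM = {lm:.2f}, p = {p_value:.4f}, robust slope SE = {hc0_slope_se(x, y):.4f}")
```

A clustered variant would sum the score contributions within each cluster before squaring. The HC0 form shown here treats every observation as its own cluster.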
9.4.4.3.4 Selection Bias and Trip Cost Predictions
The following table summarizes the model used for 2010-2024 predictions in each gear and trip duration group, based on the Wald test of independent equations (\(H_0: \rho = 0\)).
| Gear Type | Trip Duration | No. Observed Trips | Wald Test (\(P>\chi^2\)) | Model Used |
|---|---|---|---|---|
| Trawl | Day | 15,959 | 0.2991 | OLS |
| Trawl | Multiday | 13,689 | 0.000 | Heckman Selection |
| Gillnet | Day | 15,864 | 0.000 | Heckman Selection |
| Gillnet | Multiday | 2,203 | 0.000 | Heckman Selection |
| Longline | All Trips | 942 | 0.000 | Heckman Selection |
| Dredge | Day | 1,739 | 0.6341 | OLS |
| Dredge | Multiday | 5,040 | 0.000 | Heckman Selection |
| Pots and Traps | All Trips | 487 | 0.00 | Heckman Selection |
| Seine | All Trips | 474 | 0.000 | Heckman Selection |
| Handline/Other | All Trips | 474 | 0.3147 | OLS |
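The decision rule behind the table can be written as a small function. This section does not state the significance level, so the 0.05 threshold below is an assumption. Every p-value in the table is either well above 0.05 or reported as 0.000 (0.00 for Pots and Traps), so the choices are the same at \(\alpha = 0.01\).

```python
from typing import Dict, Tuple

ALPHA = 0.05  # assumed significance level; not stated in this documentation


def choose_model(wald_p_value: float, alpha: float = ALPHA) -> str:
    """Use Heckman selection when H0: rho = 0 is rejected, otherwise OLS."""
    return "Heckman Selection" if wald_p_value < alpha else "OLS"


# A few (gear, duration) -> Wald test p-values taken from the table above.
wald_p: Dict[Tuple[str, str], float] = {
    ("Trawl", "Day"): 0.2991,
    ("Trawl", "Multiday"): 0.000,
    ("Dredge", "Day"): 0.6341,
    ("Handline/Other", "All Trips"): 0.3147,
}
for (gear, duration), p in wald_p.items():
    print(f"{gear} / {duration}: {choose_model(p)}")
```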
9.4.4.4 Retransformation Process
Because the dependent variable (trip cost) was log-transformed for modeling, Duan's (1983) smearing estimate was used as a non-parametric retransformation to convert predicted log costs back to dollars.
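Duan's smearing estimator rescales the naive back-transform \(\exp(\widehat{\ln c})\) by the sample mean of the exponentiated log-scale residuals, which avoids assuming normally distributed errors. The minimal sketch below assumes residuals from the fitted log-cost regression are available. Which residuals the project uses for its Heckman models is not documented here, and the numbers are made up.

```python
import math
from typing import List


def smearing_factor(log_residuals: List[float]) -> float:
    """Duan (1983) smearing factor: mean of exp(residual) on the log scale."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def retransform(log_prediction: float, log_residuals: List[float]) -> float:
    """Predicted cost in dollars: exp(predicted ln cost) times the smearing factor."""
    return math.exp(log_prediction) * smearing_factor(log_residuals)


resid = [0.0, math.log(2.0)]  # illustrative residuals: exp() gives 1 and 2
print(smearing_factor(resid))
print(retransform(math.log(1000.0), resid))
```

In this example the factor is 1.5. Without it, the naive back-transform of a predicted log cost of \(\ln 1000\) would understate the expected cost (1,000 instead of about 1,500).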
9.4.4.5 Data Cleaning
Specific rules were applied to handle missing or zero values during the creation of the composite trip cost variable:
- Fuel Price: The average annual fuel price was used for any trip reporting a fuel price \(\le 0\) (~10% of records).
- Fuel Gallons: Records reporting fuel gallons \(\le 0\) were replaced with a predicted value based on hourly trip duration.
- Ice: The mean annual ice price was used if the price was missing but a positive value for ice tons was reported.
- Missing Values: All other missing values were interpreted as a zero cost.
- Duration: If reported CAMS hours were \(\le 0\), the median hourly trip duration value was used for prediction.
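The cleaning rules can be sketched as a single function over one trip record. Several details are hypothetical:
- the field names and the `None`-for-missing convention;
- treating a missing fuel price, gallons, or duration like a non-positive one;
- the linear gallons-per-hour rule for imputing fuel gallons (the text states only that imputed gallons are based on hourly trip duration);
- the quantity × price form of the composite.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OTHER_ITEMS = ("bait", "food", "water", "oil", "supplies")


@dataclass
class AnnualRefs:
    """Hypothetical reference values for one year, used for imputation."""
    mean_fuel_price: float    # average annual fuel price, $/gallon
    mean_ice_price: float     # mean annual ice price, $/ton
    median_hours: float       # median trip duration, hours
    gallons_per_hour: float   # hypothetical fuel-use rate for imputing gallons


def positive_or(value: Optional[float], fallback: float) -> float:
    """Keep a positive value; replace zero, negative, or missing with the fallback."""
    return value if value is not None and value > 0 else fallback


def clean_trip(rec: Dict[str, Optional[float]], refs: AnnualRefs) -> Dict[str, float]:
    """Apply the documented cleaning rules to one raw trip record."""
    out: Dict[str, float] = {}
    # Duration: hours <= 0 -> median duration (missing treated the same: assumption).
    out["hours"] = positive_or(rec.get("hours"), refs.median_hours)
    # Fuel price <= 0 -> average annual fuel price.
    out["fuel_price"] = positive_or(rec.get("fuel_price"), refs.mean_fuel_price)
    # Fuel gallons <= 0 -> value predicted from trip hours (linear rule is assumed).
    out["fuel_gallons"] = positive_or(rec.get("fuel_gallons"),
                                      refs.gallons_per_hour * out["hours"])
    # Ice: missing price with positive tons -> mean annual ice price.
    out["ice_tons"] = rec.get("ice_tons") or 0.0
    ice_price = rec.get("ice_price")
    if ice_price is None and out["ice_tons"] > 0:
        ice_price = refs.mean_ice_price
    out["ice_price"] = ice_price or 0.0
    # All other missing values -> zero cost.
    for item in OTHER_ITEMS:
        out[item] = rec.get(item) or 0.0
    return out


def composite_cost(t: Dict[str, float]) -> float:
    """Assumed composite: fuel and ice as quantity x price, other items in dollars."""
    return (t["fuel_gallons"] * t["fuel_price"] + t["ice_tons"] * t["ice_price"]
            + sum(t[item] for item in OTHER_ITEMS))


refs = AnnualRefs(mean_fuel_price=3.5, mean_ice_price=100.0,
                  median_hours=14.0, gallons_per_hour=10.0)
raw = {"hours": 0.0, "fuel_price": -1.0, "fuel_gallons": None,
       "ice_tons": 2.0, "ice_price": None, "bait": 150.0, "food": None}
clean = clean_trip(raw, refs)
print(composite_cost(clean))  # 840.0
```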
9.4.4.6 Outliers
Outliers were addressed during multiple stages of the modeling and prediction process:
- Sub-category Winsorization: Each of the seven trip cost sub-categories was winsorized individually prior to generating the composite trip cost variable.
- Composite Winsorization: The composite trip cost value was winsorized by gear and trip duration prior to modeling.
- Duration Limits: Trips reporting durations of less than 12 minutes or more than 20 hours were removed from the modeling stage.
- Prediction Forms: Costs are presented in both winsorized and raw forms. Winsorization replaces values less than the 1st or greater than the 99th percentile with the 1st and 99th percentile values, respectively.
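The sketch below shows winsorization at the 1st and 99th percentiles, applied either to one series or separately within groups such as gear type × trip duration. Percentile definitions differ across software. The linear-interpolation rule used here (the same convention as NumPy's default) is an assumption and may not match the project's implementation. The function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values: List[float], lower: float = 1.0, upper: float = 99.0) -> List[float]:
    """Replace values below/above the lower/upper percentile with those percentiles."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


def winsorize_by_group(values: List[float], groups: List[Hashable]) -> List[float]:
    """Winsorize separately within each group (e.g., gear x trip duration)."""
    idx_by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        idx_by_group[g].append(i)
    out = list(values)
    for idx in idx_by_group.values():
        for i, v in zip(idx, winsorize([values[i] for i in idx])):
            out[i] = v
    return out


costs = [120.0, 95.0, 3000.0, 110.0, 105.0, 98.0]
print(winsorize(costs))
```

The same `winsorize` call could be applied to each of the seven cost sub-categories before they are summed, and `winsorize_by_group` to the composite.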
9.5 Data Sources and Code
9.5.1 Data Sources
Internal
- Observer / ASM
- Permit
- CAMS
External
- Daily average diesel prices from FRED (DDFUELNYH)
- National marina wage data from https://www.bls.gov/cew/downloadable-data-files.htm
9.5.2 Code
Code is located on GitHub at https://github.com/SWerner2/Trip_costs (permission required).
README: https://github.com/SWerner2/Trip_costs/blob/main/README.md