10:00
2023 Data Analyst Training
The cornerstone of Field to Market’s program lies in the sustainability metrics embedded within the Fieldprint Platform.
These metrics have been developed or adopted by Field to Market through the multi-stakeholder governance process over the past decade.
\(BTU/unit\)
All energy used in the production of one crop in one year, from pre-planting activities to the first point of sale.
Includes energy embodied in the seed, fertilizer, and chemicals applied to the fields. Factors from GREET and other sources.
\(lbs\ CO_2e/unit\)
Total GHG emissions from four main sources: energy use, nitrous oxide emissions from soils, methane emissions (from flooded fields), and residue burning.
Uses Energy Use, DayCent/DNDC metamodel
\(\frac{ton}{ac}/year\)
Soil lost to erosion from water and wind as modeled through NRCS WEPP and WEPS using the crop rotation information.
\(ac\ in/unit\ increase\)
Account for the amount of water used to achieve an increase in crop yield.
\(ac/unit\)
Looks at productivity by accounting for how much land is used to produce a crop.
\((0,100)\)
The Habitat Potential Index measures the capacity of a farm to support habitat for plants and animals. This could look at things like flooded rice fields that support migrating waterfowl, or edge of field areas that allow for wildlife to form habitat.
\((-1,1)\)
Utilizes the Soil Conditioning Index (SCI), a USDA NRCS tool, to analyze whether a field is gaining or losing carbon.
\((0,4)\)
Uses the Stewardship Tool for Environmental Performance from NRCS to provide a detailed assessment of the risk of nutrient loss into local waterways through leaching and runoff, and shows how well practices are at mitigating that risk.
10:00
Qualified Data Management Partner
The Comprehensive Data (CD) output file is a MS Excel file
really-long-name-with-metadata.xlsx
Row = one grower, one field, one crop, one year
Columns = dynamic
Hey, a data dictionary would be nice.
In obtaining your CD file, do you know:
What things to watch out for?
How to identify potential outliers?
How to check for data errors?
fieldprintrStill very early in development, the fieldprintr R package includes a dummy CD data frame called candyland and a wrapper function for importing CD *.xlsx files into R.

# load libraries ----------------------------------------------------------
library(fieldprintr)
library(tidyverse)
library(skimr)
#library(scales)
theme_set(theme_bw())
orange <- "#f15d22"
blue <- "#3f8d95"
gray <- "#F1F1F1"
# read in comprehensive data file =============================================
# alternatively, use the readxl::read_xlsx() and janitor::clean_names() functions
raw_data <- fieldprintr::read_ftm_cd(
"data/12-09-2022-1234_Comprehensive_Data_Candyland_Farms.xlsx")
# alternatively, use the built-in dataset
fieldprintr::candyland# A tibble: 300 × 422
grower_id field_name field_size_ac farm_serial_number tract_number
<chr> <chr> <dbl> <dbl> <dbl>
1 432 10 22.8 NA 14634
2 441 5a & 5b 59.9 NA 12254
3 427 Bubblegum 1 & 2 41.6 NA 5582
4 427 Bubblegum 11 43.4 NA 16327
5 427 Bubblegum 13 20.7 NA 16327
6 427 Bubblegum 17N 14.5 NA 16327
7 427 Bubblegum 17S 11.6 NA 16327
8 427 Bubblegum 19E 19.5 NA 16327
9 427 Bubblegum 19W 41.1 NA 16327
10 427 Bubblegum 23 23.1 NA 16327
# ℹ 290 more rows
# ℹ 417 more variables: field_number <chr>, location <chr>, state <chr>,
# field_geo_json <chr>, crop_year <chr>, crop <chr>, last_modified_on <dttm>,
# metric_version <chr>, adjusted_yield <dbl>, adjusted_yield_units <chr>,
# land_use_score_acre_yield_units <dbl>, total_soil_loss_year <dbl>,
# soil_conservation_score_ton_acre_year <dbl>,
# water_erosion_ton_acre_year <dbl>, wind_erosion_ton_acre_year <dbl>, …
Removed 18 irrigated fields to be analyzed separately.
# helper table for column filters
cols_scores <- read_csv(
"data/CD-column-groupings.csv") |>
# filter to just the fieldprint scores
filter(group == "fieldprint_scores") |>
pull(variable_clean)
c_metrics <-
candyland |>
filter(is_irrigated == "No") |>
select(crop, crop_year,
yield = adjusted_yield,
all_of(cols_scores),
# remove the character columns
-energy_use_score_units, -ghg_score_units,
# remove metrics that are not used (ex irrigation)
-contains("irrigation")) Without visualizations, summary values and statistical print outs can fall short. Our eyes are really useful for making sense of data.
Behold the importance of data visualization with [Datasaurus](https://cran.r-project.org/web/packages/datasauRus/vignettes/Datasaurus.html).
But remember, too, certain data visualization choices can obscure the data.
Weissgerber et al 2015. [Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm](https://doi.org/10.1371/journal.pbio.1002128)
Boxplots are one way to quickly highlight potential “outliers”. In the geom_boxplot() function it reads:
The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.

Try something a little more advanced such as the 5%, 50% (median), and 95% quantiles.
c_metrics |>
pivot_longer(land_use_score_acre_yield_units:last_col()) |>
ggplot(aes(y = crop, x = value,
fill = factor(after_stat(quantile)))) +
ggridges::stat_density_ridges(
geom = "density_ridges_gradient",
quantile_lines = TRUE,
quantiles = c(0.025, 0.5, 0.975),
scale = 1.2,
rel_min_height = .01,
jittered_points = FALSE, # make TRUE if you want
point_shape = "|") +
scale_color_manual(values = c("#13274FCC", "#A0A0A088")) +
scale_fill_manual(
name = "Probability",
values = c("#13274FCC", "#A0A0A088", "#A0A0A088", "#CE1141CC"),
labels = c("(0, 0.025]", "(0.025, 0.50)", "(0.025, 0.50)", "(0.975, 1]")) +
facet_wrap(vars(name), scales = "free_x", ncol = 2)
Yield is a key variable in many of the metrics. Therefore, let’s step back and see if we can spot any outliers in yield on an annual basis.

The Fieldprint Platform has boundaries implemented for all nutrients for an individual operation. It is possible to enter excessive amounts of fertilizer when operations are aggregated.
Some agronomic knowledge is needed to judge if fertilizer and manure amounts are appropriate for the regions included in a project.
candyland |>
rename(N = total_n_lb_acre_for_all_trips,
P = total_p2o5_lb_acre_for_all_trips,
K = total_k2o_lb_acre_for_all_trips) |>
pivot_longer(cols = c(N, P, K),
names_to = "Nutrient",
values_to = "Rate") |>
ggplot(aes(x = fct_relevel(Nutrient, "N", "P", "K"), y = Rate)) +
geom_boxplot() +
geom_jitter(width = 0.1, alpha = 0.3) +
scale_fill_manual(values = c(gray, blue, orange, "white")) +
facet_wrap(vars(crop), scales = "free_y", ncol = 4) +
labs(x = "Nutrient",
y = "lbs/ac") +
guides(fill = "none")
candyland |>
rename(Herbicide = total_herbicides_for_all_trips,
Insecticide = total_insecticides_for_all_trips,
Fungicide = total_fungicides_for_all_trips,
GrowthRegulator = total_growth_regulators_for_all_trips,
Fumigant = total_fumigant_for_all_trips) |>
pivot_longer(cols = c(Herbicide,
Insecticide,
Fungicide,
GrowthRegulator,
Fumigant),
names_to = "Protectants",
values_to = "applications") |>
ggplot(aes(x = Protectants, y = applications, color = crop)) +
geom_jitter(height = 0.01, width = 0.2, alpha = 0.1) +
scale_color_manual(values = c("black", blue, orange, "#002255")) +
facet_wrap(vars(crop), scales = "free_y", ncol = 2) +
labs(x = "",
y = "Number of Applications") +
guides(fill = "none")
Does herbicide usage seem too high?
Is there high incidence of herbicide-resistant weeds?
Do we need to collaborate with an agronomist/weed specialist?
May be opportunities for improvement if we understand why some producers are applying more or less

Session One