Provides age-adjusted incidence rates with respect to a reference population, by default seer_std_ages.

Warning! All variables related to your population cohort must be included in the input data to age_adjust(); otherwise, the default assumption will result in the Florida state population being used for the age-adjustment. For example, if you are calculating incidence rates for a subset of counties, you must include county_name in the input data to age_adjust(), but county_name should not be a grouping variable. See group_drop() for a helper function to remove a grouping variable.

age_adjust(data, count = n, population = NULL,
  population_standard = fcds::seer_std_ages, by_year = "year",
  age = age_group, keep_age = FALSE)

Arguments

data

A data frame containing counts.

count

The unquoted column name containing raw event or outcome counts.

population

Population data specific to the population described by data. By default, uses the county-specific Florida population data provided by SEER (see seer_pop_fl_1990). If data describes years prior to 1990, then seer_pop_fl is used instead.

Note that origin (Spanish/Hispanic Origin) is not available for years prior to 1990, and an error will occur if origin is included in the input data but the data appear to cover years before 1990. You can override this error by explicitly setting population = fcds::seer_pop_fl_1990.

population_standard

The standard age-specific population used to calculate the age-adjusted rates. By default, uses the 2000 U.S. standard population provided by SEER (see seer_std_ages).

by_year

The column or columns by which data and population should be joined, by default "year". The syntax follows from the by argument of dplyr::left_join(), where the name of each entry is the column name in data and the value is the column name in population. If both are the same, a single value is sufficient.
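The join syntax described above can be sketched with dplyr::left_join() directly. This is a minimal illustration, not the internals of age_adjust(): the data frames counts and pop, the column name dx_year, and all values are hypothetical.

```r
library(dplyr)

# Hypothetical toy inputs: the count data call the year column `dx_year`,
# while the population data call it `year`. Values are made up.
counts <- tibble(dx_year = c(2012, 2013), n = c(110, 116))
pop    <- tibble(year = c(2012, 2013), population = c(690000, 693068))

# Name of each entry = column in the left table (data),
# value of each entry = column in the right table (population).
joined <- left_join(counts, pop, by = c("dx_year" = "year"))
print(joined)

# Assumption: following the description of by_year, the equivalent
# argument would be by_year = c("dx_year" = "year").
```

When the year column has the same name in both tables, a single unnamed value such as "year" is enough.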

age

The unquoted column name containing the age or age group. The default expects that the column age_group exists in data, population, and population_standard. If the age column used in data does not exist in the population data sets, age_adjust() falls back to the age_group column in the population data while still using the custom column in data.

keep_age

Age-adjustment by definition summarizes event or outcome counts (incidence) over all included ages. Set keep_age = TRUE to join the source data with all age-specific population without completing the age adjustment calculation. This option is primarily provided for debugging purposes.

year

The unquoted column name containing the year that will be matched to the population data.

Value

A data frame with age-adjusted incidence rates in the column rate. Note that the age column will no longer be included in the output because the age-adjusted rate summarizes the observed incidence across all ages.

If keep_age is TRUE, the age column is retained and the final summarized rate is not calculated. Instead, the output gains the columns population, std_pop, and w, which hold the specific population, the standard population, and the standardizing population weight, respectively.

NOTE: The output rate and count are relative to the time span of the input data. If the supplied event count data summarizes multiple years (for example, the FCDS summarized data spans 5 years), the resulting rate and count are for the entire period (e.g. 5 years). The convention is to divide rate by the number of years in the period but leave n as the raw event count for the period.
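The convention in the note can be sketched in base R. The data frame period_result is a hypothetical stand-in for age_adjust() output, reusing the rounded figures from the example on this page and assuming, purely for illustration, that they summarize a 5-year period.

```r
# Hypothetical age_adjust() result, assumed to cover a 5-year period.
period_result <- data.frame(n = 27069, population = 8796948, rate = 400)
n_years <- 5

# Divide the rate by the number of years; leave n as the raw period count.
annualized <- transform(period_result, rate = rate / n_years)
print(annualized)
```

Only rate changes; n still reports the events observed over the whole period.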

Age-Adjusted Rates

Calculating age-adjusted rates requires three primary inputs:

  1. Raw age-specific count of event or outcome, possibly observed or summarized at repeated, consistent time intervals.

  2. Population data with the same demographic resolution as the age-specific counts: for example, the population for the same geographic region, sex, race, year, and age.

  3. The standard reference population that is used to weight incidence among the observed age-specific count.

Each input is required to contain matching age information. The default data supplied with the package for population (seer_pop_fl) and population_standard (seer_std_ages) use the column name age_group. You can specify the name of the column containing age information with the age argument. If the column name in data is not present in the population data, age_adjust() will fall back to age_group for those data sets.

As described in the SEER*Stat tutorial "Calculating Age-adjusted Rates", the age-adjusted rate for an age group comprising the ages x through y is calculated using the following formula:
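Written in LaTeX, the SEER definition can be stated as follows. Here count_i is the event count for age i, pop_i is the matching population, and stdmil_i is the standard population for age i. The symbol names are an assumption based on the tutorial's notation, so confirm them against the reference linked below. The per-100,000 scaling agrees with the rate column in the example output on this page.

```latex
\[
\mathrm{aarate}_{x-y} =
  \sum_{i=x}^{y} \left[
    \left( \frac{\mathrm{count}_i}{\mathrm{pop}_i} \right)
    \times 100{,}000 \times
    \left( \frac{\mathrm{stdmil}_i}{\sum_{j=x}^{y} \mathrm{stdmil}_j} \right)
  \right]
\]
```

The last factor is the standardizing weight, which corresponds to the column w returned when keep_age = TRUE.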

References

https://seer.cancer.gov/seerstat/tutorials/aarates/definition.html

Examples

# This example is drawn from the SEER*Stat Age-adjusted Rate Tutorial:
# https://seer.cancer.gov/seerstat/tutorials/aarates/definition.html

d_incidence <- dplyr::tribble(
  ~age_group,   ~n,
  "0 - 4",     116,
  "5 - 9",      67,
  "10 - 14",    71,
  "15 - 19",    87,
  "20 - 24",   177,
  "25 - 29",   290,
  "30 - 34",   657,
  "35 - 39",  1072,
  "40 - 44",  1691,
  "45 - 49",  2428,
  "50 - 54",  2931,
  "55 - 59",  2881,
  "60 - 64",  2817,
  "65 - 69",  2817,
  "70 - 74",  2744,
  "75 - 79",  2634,
  "80 - 84",  1884,
  "85+",      1705
) %>%
  dplyr::mutate(year = 2013) %>%
  standardize_age_groups()

d_population <- dplyr::tribble(
  ~age_group, ~population,
  "0 - 4",         693068,
  "5 - 9",         736212,
  "10 - 14",       770999,
  "15 - 19",       651390,
  "20 - 24",       639159,
  "25 - 29",       676354,
  "30 - 34",       736557,
  "35 - 39",       724826,
  "40 - 44",       700200,
  "45 - 49",       617437,
  "50 - 54",       516541,
  "55 - 59",       361170,
  "60 - 64",       259440,
  "65 - 69",       206204,
  "70 - 74",       172087,
  "75 - 79",       142958,
  "80 - 84",        99654,
  "85+",            92692,
) %>%
  dplyr::mutate(year = 2013) %>%
  standardize_age_groups()

# Because the example data do not include the year of observation, we set
# by_year = NULL so that age_adjust() does not attempt to join
# d_incidence with d_population by a year column.
age_adjust(d_incidence, population = d_population, by_year = NULL)
#> # A tibble: 1 x 3
#>       n population  rate
#>   <dbl>      <dbl> <dbl>
#> 1 27069    8796948  400.

age_adjust(d_incidence, population = d_population, by_year = NULL, keep_age = TRUE)
#> # A tibble: 18 x 6
#>    age_group     n population  std_pop      w   rate
#>    <fct>     <dbl>      <dbl>    <dbl>  <dbl>  <dbl>
#>  1 0 - 4       116     693068 18986520 0.0691  1.16
#>  2 5 - 9        67     736212 19919840 0.0725  0.660
#>  3 10 - 14      71     770999 20056779 0.0730  0.673
#>  4 15 - 19      87     651390 19819518 0.0722  0.964
#>  5 20 - 24     177     639159 18257225 0.0665  1.84
#>  6 25 - 29     290     676354 17722067 0.0645  2.77
#>  7 30 - 34     657     736557 19511370 0.0710  6.34
#>  8 35 - 39    1072     724826 22179956 0.0808 11.9
#>  9 40 - 44    1691     700200 22479229 0.0819 19.8
#> 10 45 - 49    2428     617437 19805793 0.0721 28.4
#> 11 50 - 54    2931     516541 17224359 0.0627 35.6
#> 12 55 - 59    2881     361170 13307234 0.0485 38.7
#> 13 60 - 64    2817     259440 10654272 0.0388 42.1
#> 14 65 - 69    2817     206204  9409940 0.0343 46.8
#> 15 70 - 74    2744     172087  8725574 0.0318 50.7
#> 16 75 - 79    2634     142958  7414559 0.0270 49.7
#> 17 80 - 84    1884      99654  4900234 0.0178 33.7
#> 18 85+        1705      92692  4259173 0.0155 28.5
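The keep_age = TRUE output above can be reproduced by hand in base R, which is a useful sanity check on the weighting. This is a sketch of the arithmetic only, not the implementation of age_adjust(). The vectors are copied from the n, population, and std_pop columns above, and the per-100,000 scaling is an assumption inferred from the printed rate column.

```r
# Columns copied from the keep_age = TRUE output above.
n <- c(116, 67, 71, 87, 177, 290, 657, 1072, 1691, 2428,
       2931, 2881, 2817, 2817, 2744, 2634, 1884, 1705)
population <- c(693068, 736212, 770999, 651390, 639159, 676354,
                736557, 724826, 700200, 617437, 516541, 361170,
                259440, 206204, 172087, 142958, 99654, 92692)
std_pop <- c(18986520, 19919840, 20056779, 19819518, 18257225, 17722067,
             19511370, 22179956, 22479229, 19805793, 17224359, 13307234,
             10654272, 9409940, 8725574, 7414559, 4900234, 4259173)

# Standardizing weight: each age group's share of the standard population.
w <- std_pop / sum(std_pop)

# Weighted age-specific rate per 100,000 (assumed scaling, see lead-in).
rate_by_age <- n / population * 1e5 * w

# The age-adjusted rate is the sum of the weighted age-specific rates.
aa_rate <- sum(rate_by_age)
print(aa_rate)
```

The sum should land close to the rate of roughly 400 reported by age_adjust() in the first call.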