A small package for reading SEER fixed-width files.
The main workhorse of this package is `seer_read_fwf()`. This function wraps `readr::read_fwf()` to import the SEER fixed-width ASCII data files, using the column names and field-width definitions in the SEER SAS script.
The data files are available from the SEER Data & Software page, where users must request access before downloading. The SAS script is included in the file download and is also available online. `seer_read_fwf()` uses the online version, but a local version can be specified with the helper function `seer_read_col_positions("local_file.sas")`.
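The general technique can be sketched without the real SEER files. The example below is a minimal illustration, not the package's implementation: the SAS-style lines, the column names (`PATIENT_ID`, `SEX`, `YEAR_DX`), and the two records are all made up, and the exact layout of the real SAS script is an assumption. It extracts start positions and widths from the SAS-style lines, then passes them to `readr::read_fwf()`.

```r
library(readr)

# Hypothetical lines in the style of a SAS input statement
# (assumed layout; these are not real SEER field definitions).
sas_lines <- c(
  "@ 1   PATIENT_ID   $char8.",
  "@ 9   SEX          $char1.",
  "@ 10  YEAR_DX      $char4."
)

# Capture the start position, column name, and field width from each line.
pattern <- "@[[:space:]]*([0-9]+)[[:space:]]+([A-Za-z0-9_]+)[[:space:]]+\\$char([0-9]+)\\."
parts <- regmatches(sas_lines, regexec(pattern, sas_lines))
start <- as.integer(vapply(parts, function(p) p[2], character(1)))
name  <- vapply(parts, function(p) p[3], character(1))
width <- as.integer(vapply(parts, function(p) p[4], character(1)))

# Write two made-up fixed-width records to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0000000112004", "0000000221998"), tmp)

# Read every field as character, using the parsed positions.
records <- read_fwf(
  tmp,
  col_positions = fwf_positions(start, start + width - 1L, col_names = name),
  col_types = cols(.default = col_character())
)
```

Reading every column as character keeps leading zeros and coded values intact, which matters when the codes are later matched against a data dictionary.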
Two additional functions are provided to help recode the SEER data. `seer_recode()` uses the `seer_data_dictionary` data provided in this package to automatically recode all variables with a one-to-one correspondence, for example:
```r
seer_data_dictionary$SEX
#> # A tibble: 2 x 2
#>   Code  Description
#> * <chr> <chr>
#> 1 1     Male
#> 2 2     Female
```
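As a rough illustration of what a one-to-one recode involves, the base R sketch below rebuilds the `SEX` lookup table shown above and applies it to a made-up vector of raw codes. It is an assumption about the general approach, not the internals of `seer_recode()`; the names `sex_dictionary`, `raw_sex`, and `recoded_sex` are hypothetical.

```r
# Lookup table mirroring the SEX entry of the data dictionary.
sex_dictionary <- data.frame(
  Code = c("1", "2"),
  Description = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# Made-up raw codes; "9" has no entry in the lookup table.
raw_sex <- c("2", "1", "1", "9")

# match() returns the dictionary row for each code; unmatched codes give NA.
recoded_sex <- sex_dictionary$Description[match(raw_sex, sex_dictionary$Code)]
```

In this sketch, codes that are missing from the lookup table come back as `NA` rather than raising an error, so unmatched values are easy to spot after recoding.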
The package also includes the function `seer_rename_site_specific()`, which can be used to replace the site-specific variables with their corresponding labels, formatted appropriately to serve as variable names. As an example, CSSSF variables for breast cancer would be renamed according to the following table.
Original Variable | New Variable Name |
---|---|
CS1SITE | estrogen_receptor_er_assay_2004 |
CS2SITE | progesterone_receptor_pr_assay_2004 |
CS3SITE | number_of_positive_ipsilateral_level_i_ii_axillary_lymph_nodes_2004 |
CS4SITE | immunohistochemistry_ihc_of_regional_lymph_nodes_2004 |
CS5SITE | molecular_mol_studies_of_regional_lymph_nodes_2004 |
CS6SITE | size_of_tumor_invasive_component_2004 |
CS7SITE | nottingham_or_bloom_richardson_br_score_grade_2010 |
CS15SITE | her_2_summary_result_of_testing_2010 |
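The new names in the table look like descriptive labels reduced to lowercase words joined by underscores. The sketch below shows one way such a conversion could work in base R. The helper `label_to_name()` is hypothetical, and the label wording is an assumption chosen to reproduce two of the names in the table; it is not taken from the SEER documentation or from the package source.

```r
# Hypothetical helper: turn a descriptive label into a variable name.
label_to_name <- function(label) {
  name <- tolower(label)
  # Collapse every run of non-alphanumeric characters into one underscore.
  name <- gsub("[^a-z0-9]+", "_", name)
  # Drop any leading or trailing underscores left over.
  gsub("^_+|_+$", "", name)
}

# Assumed label wording for two breast cancer site-specific factors.
labels <- c(
  CS1SITE = "Estrogen Receptor (ER) Assay (2004+)",
  CS6SITE = "Size of Tumor--Invasive Component (2004+)"
)
new_names <- label_to_name(labels)

# Rename the matching columns of a made-up data frame.
breast <- data.frame(CS1SITE = "010", CS6SITE = "020", stringsAsFactors = FALSE)
names(breast) <- unname(new_names[names(breast)])
```

A simple rule like this would not cover every name in the table (for example, it would not split `HER2` into `her_2`), so the package presumably applies additional handling.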
Thank you to Vincent Major for making available the scripts in SEER_read_fwf, which provided a foundation for this package.