The following table is from How to Use R With Excel by Alyssa Columbus, a start-to-finish guide on replacing Excel with R. You can also download a printable version.
Filter two sheets data sets to include only patients that appear in both.
Create a grouping variable by extracting part of a string from a text field.
Extract the first or last N characters from a string.
Count the number of unique entries in a column and compare to the overall total number of rows.
Format dates: extract the date portion of a date-time and remove outlier or missing dates.
Add several columns together into a Total
column.
I’ve generated fake example data to be used for the above tasks. There are two data sets stored as sheets in the file data-export-example.xlsx.
MRN | Patient ID | Birth Date | Admission Date | ICD10 Code |
---|---|---|---|---|
101914 | PATIENT-00035984 | 11/10/92 12:00:00 AM | 2001-03-27 | C83.50 Lymphoblastic (diffuse) lymphoma, unspecified site |
101914 | PATIENT-00035984 | 11/10/92 12:00:00 AM | 2004-04-22 | C16.0 Malignant neoplasm of cardia |
105705 | PATIENT-00012113 | 9/1/24 12:00:00 AM | 2011-10-05 | C19 Malignant neoplasm of rectosigmoid junction |
105705 | PATIENT-00012113 | 9/1/24 12:00:00 AM | 2012-11-19 | C34.11 Malignant neoplasm of upper lobe, right bronchus or lung |
105705 | PATIENT-00012113 | 9/1/24 12:00:00 AM | 2016-11-22 | C43.8 Malignant melanoma of overlapping sites of skin |
106355 | PATIENT-00057620 | 8/9/10 12:00:00 AM | 2000-04-30 | C06.1 Malignant neoplasm of vestibule of mouth |
MRN | Patient ID | Admission Date | Therapy Date | PX2 | CODE_TYPE | DESCRIPTION | CHEMOCAT |
---|---|---|---|---|---|---|---|
467156 | PATIENT-00037449 | 2000-01-04 | 2000-02-06 | C9423 | HCPC | DACARBAZINE, BRAND NAME, 100 MG | DACARBAZINE |
467156 | PATIENT-00037449 | 2000-01-04 | 2000-02-20 | C9414 | HCPC | ETOPOSIDE, ORAL, BRAND NAME, 50 MG | ETOPOSIDE |
242023 | PATIENT-00070776 | 2000-04-16 | 2000-06-11 | J2675 | HCPC | PROGESTERONE | PROGESTERONE |
242023 | PATIENT-00070776 | 2000-04-16 | 2000-06-25 | C9420 | HCPC | CYCLOPHOSPHAMIDE, BRAND NAME, 100 MG | CYCLOPHOSPHAMIDE |
106355 | PATIENT-00057620 | 2000-04-30 | 2000-05-18 | S0179 | HCPC | MEGESTROL ACETATE, ORAL, 20MG | MEGESTROL |
769022 | PATIENT-00037390 | 2000-07-14 | 2000-08-15 | S0187 | HCPC | TAMOXIFEN CITRATE, ORAL, 10MG | TAMOXIFEN |
data_file <- "data-export_example.xlsx"
download.file(
paste0("https://gerkelab.github.io/core-r-course/",
"materials/session-extra-03/",
"data-export-example.xlsx"),
data_file
)
diagnoses <- readxl::read_xlsx(data_file, sheet = "Diagnoses")
therapies <- readxl::read_xlsx(data_file, sheet = "Therapies")
writexl
writexl is a relatively new package that can be used to write data frames from R into Excel .xlsx
files.
You can write an Excel file from a data frame in the same way you use write_csv()
.
You can also save multiple data frames into a single Excel file as different sheets by using a named list.
esquisse
esquisse is an RStudio addin the lets you build ggplot2 plots using a graphical interface. Learn more at https://github.com/dreamRs/esquisse/.