Often data comes to us in a “messy” format. One of the “less glamorous” but very important jobs of anyone working with data is to know how to “clean up” “messy” data. In other words to put the data into a common format.
For example, suppose that employees in a company submit expense reports every month for travel and other business expenses. It would be best if the data entry system (e.g. a website) enforced a standard format for the data. However, this doesn’t always happen. Sometimes the system tries to enforce a specific format but people “misuse” the system leading to inconsistencies between different entries. Sometimes data from different data entry systems are combined into a single data set.
The data below represents expenses from various employees. There are many inconsistencies between different entries. For example, some names are in “first last” format and some names are in “last, first” format. There are many other formatting inconsistencies that are listed below.
employee date amount category comments
1 Sue Smith 1/19/2023 59.99 food lunch with client
2 Schwartz, Joe 01/19/2023 $27.00 office supplies paper for printer
3 mike david harris 2023-01-19 25 Office Stuff <NA>
4 Dr. A. Davis 19/1/2023 five hundred FOOD NA
5 Dr Jones 1/19/23 1,234.56 office suppl. chairs
6 S. Jones Jr 19/1/23 1000 Office supplies desk
7 Conway, Ella Sr. Jan 19, 2023 $35.23 LUNCH ---
8 Pat Williams 2/20/2023 150 travel train ticket
9 Chen, Wei 20/02/2023 $89.50 Travel hotel
10 jennifer ann martinez 2023-02-21 42 FOOD <NA>
11 Ms. B. Kim 21/2/2023 two hundred office supplies stapler
12 J. Thompson 2/21/23 2,500.00 TRAVEL flight
13 R. Patel Ph.D 21/2/23 75 office suppl. notebooks
14 Garcia, Maria Jr. Feb 22, 2023 $120.00 food team dinner
15 Dr. Evans 3/01/2023 95.99 Office Stuff supplies
16 Thompson III 3/05/2023 200 travel conference
17 Brown, Tom 2/15/2023 42.50 food dinner meeting
18 Williams, Kate 02/15/2023 $89.99 office supplies toner cartridge
19 robert james lee 2023/02/15 75 Office Stuff <NA>
20 mike green jr. 2023/02/15 500 OFFICE SUPPLIES <NA>
21 Dr. B. Wilson 15/2/2023 two hundred FOOD NA
22 Dr Smith 2/15/23 2,345.67 office suppl. monitor
23 R. Brown Jr March 2, 2023 750 Office supplies filing cabinet
24 Martinez, Ana Sr. Feb 15, 2023 $48.75 LUNCH ---
25 Johnson,Mary 3/22/2023 63.45 food team lunch
26 Cooper, Sam 03/22/2023 $156.00 office supplies office decoration
27 peter michael wu 2023-03-22 50 Office Stuff <NA>
28 Dr. C. Taylor 2/3/2023 one thousand FOOD NA
29 Drake Wilson 3/14/24 500 Office supplies printer
30 Dr Chen 3/2/23 3,456.78 office suppl. laptop
31 T. Wilson Jr Jun 27 2024 500 Office supplies printer
32 Rodriguez, Eva Sr. Mar 22, 2023 $92.45 LUNCH ---
The following are many of the inconsistencies
Inconsistencies with the name entries
some names are in “first last” format and some names are in “last, first” format.
some names are capitalized, some aren’t
some names have middle include middle names, some don’t
some names include titles, some don’t
some names include a title, e.g. Dr. , while some don’t
some of the titles contain a period after the title, some don’t
some names include a suffix (e.g. Jr), some don’t
Inconsistencies with dates
some dates are in mm/dd/yyyy format (USA style)
some dates are in dd/mm/yyyy format (European style)
some dates are in yyyy-mm-dd format (standard for many computer systems)
some dates use 4 digit years, some use 2 digit years
some dates have a leading zero for single digit days or months
There are additional inconsistencies with the amount, category and comments columns
26 What you need to do
Write a function named cleanUp that takes the dataframe as an argument and returns a cleaned dataframe. Each column should have a consistent format for all columns. Do not lose any data in the process.
Split the employee column into the following fields (and remove the original single employee column):
employeeFullName — a paste of all the name components below (title, first, middle, last, suffix)
employeeTitle — e.g. Dr, Ms (no period)
employeeFirst — first name or initial
employeeMiddle — middle name or initial (NA if none)
employeeLast — last name or initial
employeeSuffix — e.g. Sr, Jr (no period; NA if none)
Leave out periods from all name parts (e.g. J. becomes J, Dr. becomes Dr).
All other columns (date, amount, category, comments) should be cleaned to a consistent structure as before.
click here to see one solution
Given the data above, your output should show the following:
cleanup(expenses)
employeeFullName employeeTitle employeeFirst employeeMiddle employeeLast employeeSuffix date amount category comments
1 Sue Smith <NA> Sue <NA> Smith <NA> 2023-01-19 59.99 food lunch with client
2 Joe Schwartz <NA> Joe <NA> Schwartz <NA> 2023-01-19 27.00 office supplies paper for printer
3 Mike David Harris <NA> Mike David Harris <NA> 2023-01-19 25.00 office supplies <NA>
4 Dr A Davis Dr A <NA> Davis <NA> 2023-01-19 500.00 food <NA>
5 Dr Jones Dr <NA> <NA> Jones <NA> 2023-01-19 1234.56 office supplies chairs
6 S Jones Jr <NA> S <NA> Jones Jr 2023-01-19 1000.00 office supplies desk
7 Ella Conway Sr <NA> Ella <NA> Conway Sr 2023-01-19 35.23 food <NA>
8 Pat Williams <NA> Pat <NA> Williams <NA> 2023-02-20 150.00 travel train ticket
9 Wei Chen <NA> Wei <NA> Chen <NA> 2023-02-20 89.50 travel hotel
10 Jennifer Ann Martinez <NA> Jennifer Ann Martinez <NA> 2023-02-21 42.00 food <NA>
11 Ms B Kim Ms B <NA> Kim <NA> 2023-02-21 200.00 office supplies stapler
12 J Thompson <NA> J <NA> Thompson <NA> 2023-02-21 2500.00 travel flight
13 R Patel <NA> R <NA> Patel <NA> 2023-02-21 75.00 office supplies notebooks
14 Maria Garcia Jr <NA> Maria <NA> Garcia Jr 2023-02-22 120.00 food team dinner
15 Dr Evans Dr <NA> <NA> Evans <NA> 2023-03-01 95.99 office supplies supplies
16 Thompson III <NA> <NA> <NA> Thompson III 2023-03-05 200.00 travel conference
17 Tom Brown <NA> Tom <NA> Brown <NA> 2023-02-15 42.50 food dinner meeting
18 Kate Williams <NA> Kate <NA> Williams <NA> 2023-02-15 89.99 office supplies toner cartridge
19 Robert James Lee <NA> Robert James Lee <NA> 2023-02-15 75.00 office supplies <NA>
20 Mike Green Jr <NA> Mike <NA> Green Jr 2023-02-15 500.00 office supplies <NA>
21 Dr B Wilson Dr B <NA> Wilson <NA> 2023-02-15 200.00 food <NA>
22 Dr Smith Dr <NA> <NA> Smith <NA> 2023-02-15 2345.67 office supplies monitor
23 R Brown Jr <NA> R <NA> Brown Jr 2023-03-02 750.00 office supplies filing cabinet
24 Ana Martinez Sr <NA> Ana <NA> Martinez Sr 2023-02-15 48.75 food <NA>
25 Mary Johnson <NA> Mary <NA> Johnson <NA> 2023-03-22 63.45 food team lunch
26 Sam Cooper <NA> Sam <NA> Cooper <NA> 2023-03-22 156.00 office supplies office decoration
27 Peter Michael Wu <NA> Peter Michael Wu <NA> 2023-03-22 50.00 office supplies <NA>
28 Dr C Taylor Dr C <NA> Taylor <NA> 2023-02-03 1000.00 food <NA>
29 Drake Wilson <NA> Drake <NA> Wilson <NA> 2024-03-14 500.00 office supplies printer
30 Dr Chen Dr <NA> <NA> Chen <NA> 2023-03-02 3456.78 office supplies laptop
31 T Wilson Jr <NA> T <NA> Wilson Jr <NA> 500.00 office supplies printer
32 Eva Rodriguez Sr <NA> Eva <NA> Rodriguez Sr 2023-03-22 92.45 food <NA>
Click here for answer
# =============================================================================# STRING HELPERS# =============================================================================# strip_periods(text)# Arguments: text = character vector (e.g. "Dr.", "J. Smith")# Returns: character vector of same length with periods removed and# leading/trailing spaces trimmed.# Example: strip_periods(" J. ") -> "J"strip_periods <-function(text) {gsub("\\.", "", trimws(text))}# to_sentence_case(text)# Arguments: text = single string (e.g. "mike david", "O'BRIEN")# Returns: string with first letter of each word uppercase, rest lowercase (Sentence case).# Returns NA unchanged if text is NA; empty string returned as-is.# Example: to_sentence_case("mike david harris") -> "Mike David Harris"to_sentence_case <-function(text) {if (is.na(text) ||nchar(trimws(text)) ==0) return(text) words <-strsplit(trimws(text), "\\s+")[[1]] capped <-paste0(toupper(substring(words, 1, 1)), tolower(substring(words, 2)))paste(capped, collapse =" ")}# =============================================================================# NAME PARSING — split into small functions by format# =============================================================================# has_suffix(namePart)# Arguments: namePart = single string, no leading/trailing title (e.g. "Jones Jr")# Returns: TRUE if name ends with a suffix like Jr, Sr, III, II, IV.# Example: has_suffix("Jones Jr") -> TRUE# Suffix pattern: one of Jr/Sr/III/II/IV at end. Require at least one space before suffix# so that "Thompson III" correctly captures "III" (R's regex can otherwise match "II" inside "III").suffix_pattern <-"(Jr\\.?|Sr\\.?|III|II|IV)"suffix_at_end_regex <-paste0("\\s+", suffix_pattern, "\\s*$")has_suffix <-function(namePart) {grepl(suffix_at_end_regex, namePart, ignore.case =TRUE)}# extract_suffix(namePart)# Arguments: namePart = string that may end with a suffix# Returns: list of two: suffix (string or NA), and nameWithSuffixRemoved (string).# Example: extract_suffix("Conway, Ella Sr.") -> list(suffix = "Sr", nameWithSuffixRemoved = "Conway, Ella")# Uses two groups (name)(suffix) so the suffix is captured correctly; space before suffix required.extract_suffix <-function(namePart) {if (!has_suffix(namePart)) {return(list(suffix =NA, nameWithSuffixRemoved =trimws(namePart))) }# Two groups: everything before the suffix, and the suffix. Requiring \s+ before suffix fixes III vs II. nameWithoutSuffix <-sub(paste0("^(.*)\\s+", suffix_pattern, "\\s*$"), "\\1", namePart, ignore.case =TRUE) suffix <-sub(paste0("^.*\\s+", suffix_pattern, "\\s*$"), "\\1", namePart, ignore.case =TRUE)list(suffix =strip_periods(suffix), nameWithSuffixRemoved =trimws(nameWithoutSuffix))}# normalize_suffix(suffix)# Arguments: suffix = string (e.g. "jr", "SR", "iii") or NA# Returns: correctly cased suffix: "Jr", "Sr", "II", "III", "IV", or NA if not recognized.# Example: normalize_suffix("jr") -> "Jr"normalize_suffix <-function(suffix) {if (is.na(suffix) ||trimws(suffix) =="") return(NA) s <-tolower(trimws(suffix))if (s %in%c("jr", "jr.")) return("Jr")if (s %in%c("sr", "sr.")) return("Sr")if (s =="ii") return("II")if (s =="iii") return("III")if (s =="iv") return("IV") suffix}# has_trailing_title(namePart)# Arguments: namePart = string that might end with "Ph.D" or "PhD"# Returns: TRUE if it ends with that.trailing_phd_regex <-paste0("\\s+", # space before Ph.D"Ph\\.?D\\.?", # Ph.D or PhD (periods optional)"\\s*$"# optional spaces at end)has_trailing_title <-function(namePart) {grepl(trailing_phd_regex, namePart, ignore.case =TRUE)}# extract_trailing_phd(namePart)# Returns: list(titlePart = "PhD" or NA, nameWithTitleRemoved = string).extract_trailing_phd <-function(namePart) {if (!has_trailing_title(namePart)) {return(list(titlePart =NA, nameWithTitleRemoved =trimws(namePart))) } tit <-sub(paste0(".*", trailing_phd_regex), "\\1", namePart, ignore.case =TRUE) nameWithout <-sub(trailing_phd_regex, "", namePart, ignore.case =TRUE)list(titlePart =strip_periods(tit), nameWithTitleRemoved =trimws(nameWithout))}# has_leading_title(namePart)# TRUE if name starts with Dr, Mr, Ms, Mrs (with optional period).leading_title_regex <-paste0("^(Dr\\.?|Mr\\.?|Ms\\.?|Mrs\\.?)", # title at start"\\s+"# at least one space after)has_leading_title <-function(namePart) {grepl(leading_title_regex, namePart, ignore.case =TRUE)}# extract_leading_title(namePart)# Returns: list(titlePart = "Dr" or NA, nameWithTitleRemoved = string).extract_leading_title <-function(namePart) {if (!has_leading_title(namePart)) {return(list(titlePart =NA, nameWithTitleRemoved =trimws(namePart))) } tit <-sub(paste0(leading_title_regex, ".*"), "\\1", namePart, ignore.case =TRUE) nameWithout <-sub(leading_title_regex, "", namePart, ignore.case =TRUE)list(titlePart =strip_periods(tit), nameWithTitleRemoved =trimws(nameWithout))}# parse_name_last_comma_first(namePart)# For names like "Last, First" or "Last, First Middle".# Arguments: namePart = string containing exactly one comma# Returns: list(first =, middle =, last =). Periods stripped from each.parse_name_last_comma_first <-function(namePart) { parts <-strsplit(namePart, ",\\s*")[[1]] last <-strip_periods(trimws(parts[1])) firstAndMiddle <-strsplit(trimws(parts[2]), "\\s+")[[1]] first <-strip_periods(firstAndMiddle[1]) middle <-if (length(firstAndMiddle) >=2) strip_periods(firstAndMiddle[2]) elseNAlist(first = first, middle = middle, last = last)}# parse_name_first_last(namePart)# For names like "First Last", "First Middle Last", or just "Last".# Arguments: namePart = string with no comma# Returns: list(first =, middle =, last =).parse_name_first_last <-function(namePart) { words <-strsplit(trimws(namePart), "\\s+")[[1]]if (length(words) ==1) {list(first =NA, middle =NA, last =strip_periods(words[1])) } elseif (length(words) ==2) {list(first =strip_periods(words[1]), middle =NA, last =strip_periods(words[2])) } else { first <-strip_periods(words[1]) middle <-strip_periods(words[2]) last <-paste(strip_periods(words[3:length(words)]), collapse =" ")list(first = first, middle = middle, last = last) }}# parse_one_employee(employeeName)# Arguments: employeeName = one raw employee string (e.g. "Dr. A. Davis", "Schwartz, Joe")# Returns: list with employeeTitle, employeeFirst, employeeMiddle, employeeLast,# employeeSuffix, employeeFullName. All name parts without periods.# Example: parse_one_employee("Conway, Ella Sr.") -> first=Ella, last=Conway, suffix=Sr, title=NAparse_one_employee <-function(employeeName) { raw <-trimws(employeeName) outSuffix <-extract_suffix(raw) raw <- outSuffix$nameWithSuffixRemoved outPhd <-extract_trailing_phd(raw) title <- outPhd$titlePart raw <- outPhd$nameWithTitleRemoved outLead <-extract_leading_title(raw)if (!is.na(outLead$titlePart)) { title <-if (is.na(title) || title =="") outLead$titlePart elsepaste0(outLead$titlePart, " ", title) } raw <- outLead$nameWithTitleRemovedif (is.na(title) || title =="") title <-NAif (length(strsplit(raw, ",\\s*")[[1]]) >=2) { parsed <-parse_name_last_comma_first(raw) } else { parsed <-parse_name_first_last(raw) } first <-to_sentence_case(parsed$first) middle <-to_sentence_case(parsed$middle) last <-to_sentence_case(parsed$last) suffix <-normalize_suffix(outSuffix$suffix) fullParts <-c(title, first, middle, last, suffix) fullParts <- fullParts[!is.na(fullParts) & fullParts !=""] employeeFullName <-paste(fullParts, collapse =" ")list(employeeTitle = title, employeeFirst = first, employeeMiddle = middle,employeeLast = last, employeeSuffix = suffix, employeeFullName = employeeFullName)}# parse_employee_column(employeeNames)# Arguments: employeeNames = character vector of raw employee strings# Returns: data frame with columns employeeFullName, employeeTitle, employeeFirst,# employeeMiddle, employeeLast, employeeSuffix.parse_employee_column <-function(employeeNames) { rows <-lapply(employeeNames, parse_one_employee) out <-as.data.frame(do.call(rbind, lapply(rows, unlist)), stringsAsFactors =FALSE) out <- out[c("employeeFullName", "employeeTitle", "employeeFirst", "employeeMiddle", "employeeLast", "employeeSuffix")] out}# =============================================================================# DATE PARSING# =============================================================================# is_iso_date(dateStr)# Returns TRUE if string is already yyyy-mm-dd (e.g. "2023-01-19").iso_date_regex <-paste0("^[0-9]{4}", # 4 digits at start (year)"-","[0-9]{1,2}", # 1 or 2 digits (month)"-","[0-9]{1,2}$"# 1 or 2 digits at end (day))is_iso_date <-function(dateStr) {grepl(iso_date_regex, trimws(dateStr))}# is_month_name_date(dateStr)# Returns TRUE for strings like "Jan 19, 2023" or "January 19, 2023".month_name_date_regex <-paste0("^[A-Za-z]+", # month name (letters)"\\s+","[0-9]{1,2}", # day",\\s*","[0-9]{2,4}$"# 2- or 4-digit year)is_month_name_date <-function(dateStr) {grepl(month_name_date_regex, trimws(dateStr))}# parse_one_date(dateStr)# Arguments: dateStr = one date string in any of the supported formats# Returns: string "yyyy-mm-dd" or NA if unparseable.# Example: parse_one_date("1/19/2023") -> "2023-01-19"parse_one_date <-function(dateStr) { s <-trimws(dateStr)if (is_iso_date(s)) return(s)if (is_month_name_date(s)) { d <-as.Date(s, format ="%b %d, %Y")if (is.na(d)) d <-as.Date(s, format ="%B %d, %Y")return(format(d, "%Y-%m-%d")) } date_parts <-strsplit(s, "[/-]")[[1]]if (length(date_parts) !=3) return(NA) n1 <-as.integer(date_parts[1]) n2 <-as.integer(date_parts[2]) n3 <-as.integer(date_parts[3]) len1 <-nchar(date_parts[1]) len3 <-nchar(date_parts[3])if (len1 ==4) { y <- n1; m <- n2; d <- n3 } elseif (len3 ==4) { y <- n3if (n1 >12) { d <- n1; m <- n2 } elseif (n2 >12) { m <- n1; d <- n2 } else { m <- n1; d <- n2 } } else { y <-2000+ (n3 -2000) %%100if (n1 >12) { d <- n1; m <- n2 } elseif (n2 >12) { m <- n1; d <- n2 } else { m <- n1; d <- n2 } }sprintf("%04d-%02d-%02d", y, m, d)}# parse_dates(dateStrings)# Arguments: dateStrings = character vector of date strings# Returns: character vector of same length, each element "yyyy-mm-dd" or NA.parse_dates <-function(dateStrings) {sapply(trimws(dateStrings), parse_one_date)}# =============================================================================# AMOUNT, CATEGORY, COMMENTS# =============================================================================# word_to_number(word)# Arguments: word = string with a number in words (e.g. "five hundred", "twenty-one",# "one thousand two hundred thirty-four"). Hyphens and "and" are allowed.# Returns: numeric value, or NA if the string is not a recognized number phrase.# Examples: word_to_number("five hundred") -> 500; word_to_number("twenty-one") -> 21;# word_to_number("one thousand two hundred thirty four") -> 1234.# Supports: zero–nine, ten–nineteen, twenty–ninety, hundred, thousand, million.word_to_number <-function(word) { w <-tolower(trimws(word)) w <-gsub("-", " ", w) w <-gsub("\\band\\b", " ", w) words <-strsplit(w, "\\s+")[[1]] words <- words[words !=""]if (length(words) ==0) return(NA) ones <-c(zero =0, one =1, two =2, three =3, four =4, five =5, six =6, seven =7, eight =8, nine =9) teens <-c(ten =10, eleven =11, twelve =12, thirteen =13, fourteen =14, fifteen =15,sixteen =16, seventeen =17, eighteen =18, nineteen =19) tens <-c(twenty =20, thirty =30, forty =40, fifty =50, sixty =60, seventy =70, eighty =80, ninety =90) total <-0 current <-0 i <-1while (i <=length(words)) { x <- words[i]if (x %in%names(ones)) { current <- current + ones[x] i <- i +1 } elseif (x %in%names(teens)) { current <- current + teens[x] i <- i +1 } elseif (x %in%names(tens)) { current <- current + tens[x] i <- i +1 } elseif (x =="hundred") {if (current ==0) current <-1 current <- current *100 i <- i +1 } elseif (x =="thousand") {if (current ==0) current <-1 total <- total + current *1000 current <-0 i <- i +1 } elseif (x =="million") {if (current ==0) current <-1 total <- total + current *1000000 current <-0 i <- i +1 } else {return(NA) } } total + current}# clean_amounts(amountStrings)# Arguments: amountStrings = character vector (e.g. "$27.00", "1,234.56", "five hundred")# Returns: numeric vector. Removes $ and commas; converts word amounts when possible.clean_amounts <-function(amountStrings) { x <-trimws(amountStrings) result <-numeric(length(x))for (i inseq_along(x)) { s <-gsub("[$,]", "", x[i]) s <-trimws(s)if (grepl("^[0-9.]+$", s)) { result[i] <-as.numeric(s) } else { result[i] <-word_to_number(x[i]) } } result}# clean_categories(categoryStrings)# Arguments: categoryStrings = character vector (e.g. "FOOD", "office suppl.")# Returns: character vector, lowercase, with abbreviations normalized# (e.g. "office suppl." and "Office Stuff" -> "office supplies", "LUNCH" -> "food").office_suppl_regex <-paste0("\\boffice\\s+suppl\\.?\\s*$"# "office suppl" or "office suppl." at end of string)office_stuff_regex <-"\\boffice\\s+stuff\\s*$"lunch_regex <-"\\blunch\\s*$"clean_categories <-function(categoryStrings) { x <-tolower(trimws(categoryStrings)) x <-sub(office_suppl_regex, "office supplies", x) x <-sub(office_stuff_regex, "office supplies", x) x <-sub(lunch_regex, "food", x) x}# clean_comments(commentStrings)# Arguments: commentStrings = character vector; can contain placeholders like "---", "NA"# Returns: character vector; placeholders and empty strings replaced with NA.placeholder_regex <-"^(---|NA|n/a)$"clean_comments <-function(commentStrings) { x <-trimws(as.character(commentStrings)) x[grepl(placeholder_regex, x, ignore.case =TRUE)] <-NA x[x ==""] <-NA x}# =============================================================================# MAIN CLEANUP# =============================================================================# cleanUp(expensesDF)# Arguments: expensesDF = data frame with columns employee, date, amount, category, comments# Returns: data frame with employee replaced by employeeFullName, employeeTitle,# employeeFirst, employeeMiddle, employeeLast, employeeSuffix; and with# date, amount, category, comments cleaned to consistent formats.# Example: cleanUp(expenses) -> wide table with cleaned columnscleanup <-function(expensesDF) { df <-as.data.frame(expensesDF, stringsAsFactors =FALSE) emp <-parse_employee_column(df$employee) df$employee <-NULL df <-cbind(emp, df) df$date <-parse_dates(df$date) df$amount <-clean_amounts(df$amount) df$category <-clean_categories(df$category) df$comments <-clean_comments(df$comments) df}cleanup(expenses)
employeeFullName employeeTitle employeeFirst employeeMiddle employeeLast employeeSuffix date amount category comments
1 Sue Smith <NA> Sue <NA> Smith <NA> 2023-01-19 59.99 food lunch with client
2 Joe Schwartz <NA> Joe <NA> Schwartz <NA> 2023-01-19 27.00 office supplies paper for printer
3 Mike David Harris <NA> Mike David Harris <NA> 2023-01-19 25.00 office supplies <NA>
4 Dr A Davis Dr A <NA> Davis <NA> 2023-01-19 500.00 food <NA>
5 Dr Jones Dr <NA> <NA> Jones <NA> 2023-01-19 1234.56 office supplies chairs
6 S Jones Jr <NA> S <NA> Jones Jr 2023-01-19 1000.00 office supplies desk
7 Ella Conway Sr <NA> Ella <NA> Conway Sr 2023-01-19 35.23 food <NA>
8 Pat Williams <NA> Pat <NA> Williams <NA> 2023-02-20 150.00 travel train ticket
9 Wei Chen <NA> Wei <NA> Chen <NA> 2023-02-20 89.50 travel hotel
10 Jennifer Ann Martinez <NA> Jennifer Ann Martinez <NA> 2023-02-21 42.00 food <NA>
11 Ms B Kim Ms B <NA> Kim <NA> 2023-02-21 200.00 office supplies stapler
12 J Thompson <NA> J <NA> Thompson <NA> 2023-02-21 2500.00 travel flight
13 R Patel <NA> R <NA> Patel <NA> 2023-02-21 75.00 office supplies notebooks
14 Maria Garcia Jr <NA> Maria <NA> Garcia Jr 2023-02-22 120.00 food team dinner
15 Dr Evans Dr <NA> <NA> Evans <NA> 2023-03-01 95.99 office supplies supplies
16 Thompson III <NA> <NA> <NA> Thompson III 2023-03-05 200.00 travel conference
17 Tom Brown <NA> Tom <NA> Brown <NA> 2023-02-15 42.50 food dinner meeting
18 Kate Williams <NA> Kate <NA> Williams <NA> 2023-02-15 89.99 office supplies toner cartridge
19 Robert James Lee <NA> Robert James Lee <NA> 2023-02-15 75.00 office supplies <NA>
20 Mike Green Jr <NA> Mike <NA> Green Jr 2023-02-15 500.00 office supplies <NA>
21 Dr B Wilson Dr B <NA> Wilson <NA> 2023-02-15 200.00 food <NA>
22 Dr Smith Dr <NA> <NA> Smith <NA> 2023-02-15 2345.67 office supplies monitor
23 R Brown Jr <NA> R <NA> Brown Jr 2023-03-02 750.00 office supplies filing cabinet
24 Ana Martinez Sr <NA> Ana <NA> Martinez Sr 2023-02-15 48.75 food <NA>
25 Mary Johnson <NA> Mary <NA> Johnson <NA> 2023-03-22 63.45 food team lunch
26 Sam Cooper <NA> Sam <NA> Cooper <NA> 2023-03-22 156.00 office supplies office decoration
27 Peter Michael Wu <NA> Peter Michael Wu <NA> 2023-03-22 50.00 office supplies <NA>
28 Dr C Taylor Dr C <NA> Taylor <NA> 2023-02-03 1000.00 food <NA>
29 Drake Wilson <NA> Drake <NA> Wilson <NA> 2024-03-14 500.00 office supplies printer
30 Dr Chen Dr <NA> <NA> Chen <NA> 2023-03-02 3456.78 office supplies laptop
31 T Wilson Jr <NA> T <NA> Wilson Jr <NA> 500.00 office supplies printer
32 Eva Rodriguez Sr <NA> Eva <NA> Rodriguez Sr 2023-03-22 92.45 food <NA>