12 Working with files

12.1 Download files to work with

Use this link to download a set of files in a ZIP file for use with these lessons.

12.2 A “directory” means the same thing as a “folder”

The terms “directory” and “folder” mean exactly the same thing - they’re just two different words for places where we store files on a computer. The word “directory” came first, used in early computer systems in the 1960s. It comes from the word’s original meaning of an organized listing or index, like a phone directory. In the 1980s, when computers started using visual interfaces with icons, Apple popularized the word “folder” because it was more intuitive for regular users - it referenced something familiar that people used to organize paper documents. Today, both terms are used interchangeably, with “folder” being more common in everyday use and “directory” often preferred in technical contexts.

12.3 Obvious stuff … folders and files

As I’m sure you know:

  • Files on a computer are stored in folders.

  • Those folders can be inside other (higher level) folders, and so on.

12.4 The “root” folder, i.e. /

Most folders are contained in other “higher level” folders. The “top” level folder on a computer is known as the “root” folder and is referred to as just “/”. The root folder (i.e. /) is not contained in any other folder.

12.4.1 full path

A full path of a file (or folder) specifies the EXACT location of the file (or folder) on a computer, i.e. exactly which folder contains the file (or folder).

(see below for an example)

A full path always starts with a slash, i.e. /

It consists of a list of folder names, starting from the “root” folder, i.e. / and continuing with folder names separated by slashes. The final entry in the path is the folder or filename being identified.

For example:

  • The “full path” /Users/yrosenth/myFiles/personal/work/endOfMonthReport.txt identifies the following file:

    • the file is named “endOfMonthReport.txt” (the last part of the path)

    • the file is in a folder named “work” (the 2nd-to-last part of the path).

  • continuing on backwards through the path …

    • the work folder is in a folder named “personal”

    • which is in a folder named “myFiles”

    • which is in a folder named “yrosenth”

    • which is in a folder named “Users”.

    • The folder “Users” is in the “root” folder (i.e. the “highest” level folder - see above) on the computer.

12.4.2 two different files with the same names

Just like two people named Sarah Johnson can live at different addresses (one in Chicago, one in New York), you can have two files with the same name in different directories (i.e. folders).

For example, the following are the “full paths” for two different files with the same name, “budget.xlsx”. The full path helps you tell the two files apart by showing exactly where each file is located.

  • Full path 1: /Users/Joe/work/budget.xlsx

    • This file is named “budget.xlsx”.
    • It is in the “work” folder,
    • which is in the folder named “Joe”
    • which is in the folder named “Users”
    • which is in the “root” (i.e. /) folder (see above)
  • Full path 2: /Users/Sue/personal/budget.xlsx

    • This file is also named “budget.xlsx”.
    • It is in the “personal” folder
    • which is in the folder named “Sue”
    • which is in the “Users” folder
    • which is in the “root” (i.e. /) folder (see above)

12.5 “relative paths” vs “full paths”

Another way to describe which folder a file is in is to use what’s known as a “relative path”. We will describe “relative paths” in detail later. For now, we will stick to using “full paths” as described above. However, one thing should be made absolutely clear now:

Difference between a “full path” and a “relative path”

A “full path” always starts with a “/”.

IMPORTANT: MAKE SURE THAT YOU KNOW THIS …

Later, we will study another type of file path called a “relative path”. A relative path is similar to a “full path”, but a relative path does NOT start with a “/”. We’ll learn more about how relative paths differ from full paths later.
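The rule above is simple enough to check in code. Here is a minimal sketch, assuming Mac/Linux-style paths. The function name is_full_path() is hypothetical and not part of base R. It deliberately ignores Windows paths that begin with a drive letter such as C:.

```r
# Hypothetical helper (not part of base R): a Mac/Linux-style path is a
# "full path" if it starts with "/"; otherwise it is a relative path.
# This sketch ignores Windows drive letters such as "C:".
is_full_path <- function(path) {
  startsWith(path, "/")
}

is_full_path("/Users/Joe/work/budget.xlsx")  # TRUE  - starts with "/"
is_full_path("work/budget.xlsx")             # FALSE - a relative path
is_full_path(c("/Users", "Users"))           # TRUE FALSE (works on vectors)
```

The fs package, covered later in this chapter, also provides an is_absolute_path() function for this kind of check.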

12.5.1 Mac and Linux style paths VS Windows style paths

Mac and Linux systems use forward slashes (/) as path separators. See the paths above, e.g. /Users/Joe/work/budget.xlsx

Windows traditionally uses backslashes (\) as separators, like this:

\Users\Joe\work\budget.xlsx

Note that R understands the Linux/Mac style paths (with forward slashes) even on Windows. It is recommended therefore to always use Linux/Mac style paths in R code so that the code will work on both Linux/Mac and on Windows.

However, a backslash has a special meaning inside R strings (it starts an “escape sequence”). If you need to use Windows-style paths in R, you must therefore use double backslashes (\\), e.g.

fs::dir_tree("\\Users\\Joe\\work")   # on Windows, the same folder as "/Users/Joe/work"
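To see what the doubling does, the short sketch below checks that "\\" inside an R string is really a single backslash character. It also shows two ways to avoid typing doubled backslashes. The raw-string form r"(...)" assumes R 4.0.0 or later. The example path is illustrative only.

```r
# Inside an R string, "\\" stands for ONE backslash character
nchar("\\")                      # 1

win_path <- "C:\\Users\\Joe\\work\\budget.xlsx"
cat(win_path, "\n")              # prints: C:\Users\Joe\work\budget.xlsx

# R 4.0.0 and later also accept "raw strings" r"(...)", where a
# backslash is just a backslash and no doubling is needed
raw_path <- r"(C:\Users\Joe\work\budget.xlsx)"
identical(raw_path, win_path)    # TRUE

# chartr() replaces every backslash with a forward slash, giving the
# portable Mac/Linux-style form that R accepts on every system
chartr("\\", "/", win_path)      # "C:/Users/Joe/work/budget.xlsx"
```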

12.5.2 Windows drive letters

Windows systems use letters (like C:, D:) at the start of paths to represent different drives or storage devices. The C: drive is typically your main hard drive where Windows is installed. Other letters might be additional hard drives, USB drives, or network drives.

For example:

C:\Users\username\Documents\file.txt (your main hard drive)
D:\Backups\file.txt (maybe an external drive)
E:\Photos\vacation.jpg (could be a USB drive)

Mac and Linux systems don’t use drive letters. Instead, all drives are “mounted” as folders within the single root directory (/). For example, on Linux a USB drive might appear at /media/username/USB_DRIVE/ (on a Mac, drives appear under /Volumes).

When working in R on Windows, you may need to include the drive letter at the start of full paths, like: "C:/Users/username/Documents/file.txt" (using forward slashes) or "C:\\Users\\username\\Documents\\file.txt" (using double backslashes).
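You don’t have to type the separators yourself. Base R’s file.path() can build a path from its parts and joins them with forward slashes on every operating system. Here is a small sketch; the folder and file names are only illustrations.

```r
# file.path() joins its arguments with "/" on every operating system
file.path("C:", "Users", "username", "Documents", "file.txt")
# "C:/Users/username/Documents/file.txt"

# An empty first piece produces a leading "/", i.e. a Mac/Linux full path
file.path("", "Users", "Joe", "work", "budget.xlsx")
# "/Users/Joe/work/budget.xlsx"

# Store part of a path in a variable and reuse it
project <- file.path("", "Users", "Joe", "work")
file.path(project, "budget.xlsx")
# "/Users/Joe/work/budget.xlsx"
```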

12.6 dir( SOME_FOLDER ) # same as list.files( SOME_FOLDER )

The dir() function displays the names of all files and folders in a particular directory (remember - “directory” and “folder” mean the same thing).

dir() does exactly the same thing as list.files() - they’re just two names for the same function. Some people prefer dir() because it’s shorter to type. See ?dir

For example, the following shows the contents of the folder located at the specified full path.

dir("/Users/yrosenth/Downloads/myFiles/personal")
[1] "school"           "shakepereBio.txt" "work"            

The following shows the contents of the school and work folders that are located inside the personal folder shown above.

dir("/Users/yrosenth/Downloads/myFiles/personal/school")
[1] "datamgmt" "english"  "finance" 
dir("/Users/yrosenth/Downloads/myFiles/personal/work")
[1] "bigProject"       "endOfMonthReport"
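If you haven’t downloaded the practice files, you can still try dir(). This sketch builds a small folder tree in R’s temporary directory (tempdir()) whose names loosely mirror the example above, then lists it. It also shows two useful dir() arguments: recursive = TRUE looks inside sub-folders, and full.names = TRUE returns full paths instead of bare names.

```r
# Build a small practice folder inside R's temporary directory
root <- file.path(tempdir(), "myFilesDemo")
dir.create(file.path(root, "personal", "work"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "personal", "shakespeareBio.txt")))
invisible(file.create(file.path(root, "personal", "work", "endOfMonthReport.txt")))

dir(root)                        # "personal"
dir(file.path(root, "personal")) # "shakespeareBio.txt" "work"

# recursive = TRUE also looks inside sub-folders (it lists files only)
dir(root, recursive = TRUE)
# "personal/shakespeareBio.txt" "personal/work/endOfMonthReport.txt"

# full.names = TRUE returns full paths instead of bare names
dir(file.path(root, "personal"), full.names = TRUE)
```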

12.7 The “working directory”, getwd(), setwd(), dir()

R maintains a “working directory” - i.e. a default directory - that is used when you don’t otherwise specify a path. For example, if you call dir() without a path, the listing shows the files in the “working directory” (see the example below).

The getwd() function returns the full path of the “working directory”.

getwd()
[1] "C:/Users/yrosenth/Dropbox/_yrQuarto-master/DataManagementUsingRSrc"

The setwd( SOME_PATH ) function lets you set the “working directory” to a path of your choice.

# set the working directory
setwd("/Users/yrosenth/Downloads/myFiles")

# show that it worked
getwd()
[1] "C:/Users/yrosenth/Downloads/myFiles"
# Using dir() without a specified directory returns the contents
# of the working directory. 
dir()
[1] "bashShellScripts" "mixOfFiles"       "personal"        
# Since the working directory is currently 
#
#    /Users/yrosenth/Downloads/myFiles
#
# the above call to 
#
#    dir() 
# 
# without a path is the same as the following call to
#
#    dir("/Users/yrosenth/Downloads/myFiles")

dir("/Users/yrosenth/Downloads/myFiles")
[1] "bashShellScripts" "mixOfFiles"       "personal"        
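Changing the working directory affects everything that runs afterwards, so it’s often wise to switch back when you’re done. Because setwd() invisibly returns the previous working directory, you can save that value and restore it later. The sketch below uses a practice folder in tempdir(). The helper list_folder() is hypothetical and uses on.exit() so the original directory is restored even if an error occurs.

```r
# Make a practice folder containing one file
demo_dir <- file.path(tempdir(), "wdDemo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "notes.txt")))

# setwd() invisibly returns the PREVIOUS working directory
old_wd <- setwd(demo_dir)
files_here <- dir()    # lists the new working directory: "notes.txt"
setwd(old_wd)          # restore the original working directory

# Hypothetical helper: on.exit() restores the working directory
# when the function finishes, even if an error happens part-way through
list_folder <- function(path) {
  old <- setwd(path)
  on.exit(setwd(old), add = TRUE)
  dir()
}
list_folder(demo_dir)  # "notes.txt"
```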

12.7.1 Specifying a pattern to dir

The pattern argument of dir() limits the listing to names that match a “regular expression”. Later we will learn more about “regular expressions”.
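Here is a small preview of why the exact pattern matters. The sketch uses made-up file names in a folder under tempdir(). In a regular expression a plain "." matches any character, so a pattern such as ".xlsx" can match more than you intended.

```r
demo <- file.path(tempdir(), "patternDemo")
dir.create(demo, showWarnings = FALSE)
invisible(file.create(file.path(demo, c("budget.xlsx", "notes.txt", "notes-xlsx.txt"))))

# "." matches ANY character, so ".xlsx" also matches the "-xlsx"
# inside "notes-xlsx.txt"
dir(demo, pattern = ".xlsx")      # "budget.xlsx" "notes-xlsx.txt"

# "\\." matches a literal dot and "$" means "end of the name",
# so this lists only names that END in .xlsx
dir(demo, pattern = "\\.xlsx$")   # "budget.xlsx"
```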

12.7.2 Summary - different ways of using the dir() function:

# list everything in the "working directory"
dir()  
  [1] "_book"
  [2] "_book-save2"
  [3] "_book.save"
  [4] "_book.zip"
  [5] "_quarto-save.yml"
# ... output truncated: this working directory contains 211 entries
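# list everything in the specified directory
dir("/Users/yrosenth/Downloads/myFiles/work")  
character(0)
# character(0) means no names were found. Here that is because myFiles
# has no "work" folder (work is inside myFiles/personal). Note that dir()
# returns character(0), not an error, for a folder that doesn't exist.
# list only files and folders whose names end in .xlsx
# ("\\." matches a literal dot and "$" means "end of the name")
dir("/Users/yrosenth/Downloads/myFiles/work", pattern = "\\.xlsx$")  
character(0)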

12.7.3 Other R functions also use the “working directory”

There are many R functions that work with files or folders; dir() is just one example. Others include read.csv(), write.csv(), readLines(), writeLines() and read.fwf(), as well as most of the functions in the “fs” package.

Most (if not all) of these functions allow you to leave out the path information when you call the function. If you don’t specify the path then R uses the “working directory” as the path.
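For example, this sketch temporarily makes R’s temporary folder the working directory. It then writes and reads a file by name alone. The file name greeting.txt is only an illustration.

```r
old_wd <- setwd(tempdir())   # use R's temporary folder as the working directory

# No folder is given, so the file is created in the working directory
writeLines(c("line one", "line two"), "greeting.txt")

file.exists(file.path(getwd(), "greeting.txt"))  # TRUE
readLines("greeting.txt")                        # "line one" "line two"

setwd(old_wd)                # put the working directory back
```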

12.8 Finding a full path to a specific file

You can do this with the built-in Windows or Mac tools, or by using R functions.

12.8.1 Using “File Explorer” (win) or “Finder” (mac)

To find the full path for a file, navigate to it using the “Finder” app on Mac or the “File Explorer” app on Windows. Each of these tools has a way to show you the full path to the file. Because these features change between releases of each operating system, you should search online for how to find the full path of a file in your specific operating system version. The following are some videos that might work for your operating system. If they don’t, please search online for instructions specific to your operating system.

12.8.2 Using the file.choose() or choose.files() R functions

See the R documentation for ?file.choose and ?choose.files.

These functions open a window that lets you choose a file (or files). When you select the file (or files), the full path is returned to R. Note that choose.files() is available only on Windows.

IMPORTANT - The window that opens when you call file.choose() or choose.files() can sometimes appear behind RStudio or other programs. This seems to be a bug. If you don’t see the window for choosing files, try minimizing the RStudio window (and possibly other windows) to see whether the file chooser window is hidden behind them.

12.8.3 Finding a full path to a “Special folder”

Special folders, such as your home folder, can be found the same way: navigate to them in “Finder” (Mac) or “File Explorer” (Windows) and use that tool’s option for showing the full path.
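Your home folder is one special folder that R can locate by itself. In R, a path that begins with "~" refers to your home folder, and base R’s path.expand() replaces the "~" with the full path. Here is a minimal sketch; "Documents" is only an illustrative folder name. Note also that on Windows, R usually treats your Documents folder as the home folder, so "~" there often expands to something like C:/Users/username/Documents.

```r
# "~" is shorthand for your home folder; path.expand() swaps it for the full path
home <- path.expand("~")
home

# Build a path to a folder inside the home folder
# ("Documents" is only an illustration)
docs <- file.path(home, "Documents")
docs

# A path that does not start with "~" comes back unchanged
path.expand("/Users/Joe/work/budget.xlsx")
```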

12.9 functions to read and write files in “Base R”

R has numerous functions that allow reading files and writing to files (i.e. creating or modifying files). The following are built into R “out of the box” (i.e. the “base R”).

For help with these functions, see the R documentation (i.e. ?functionName).

12.9.1 readLines, writeLines

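readLines() reads the lines of a text file into a character vector (one element per line). writeLines() does the reverse: it writes each element of a character vector to a file as a separate line.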
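Here is a quick round trip, written to a file in R’s temporary folder. The file name shoppingList.txt is hypothetical.

```r
path <- file.path(tempdir(), "shoppingList.txt")   # hypothetical file name

# writeLines(): each element of the character vector becomes one line
writeLines(c("apples", "bread", "milk"), path)

# readLines(): each line of the file becomes one element of the vector
items <- readLines(path)
items           # "apples" "bread" "milk"
length(items)   # 3
```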

12.9.2 read.csv, write.csv

read.csv: reads a .csv file into R as a data.frame

write.csv: writes a data.frame to a .csv file (by default it also writes the row names as an extra first column)

12.9.3 read.fwf

Reads a “fixed width file” into a data.frame. A “fixed width file” is different from a CSV file. A CSV file separates the data with commas, while a “fixed width file” places each column of data in a “fixed width”, i.e. each column takes up a specific number of characters (see the example below). If a data value is shorter than the width of its column, the rest of the column is filled in with spaces. Note that you need to use a monospace font in your editor to view this type of information the way it was intended.

Example: notice how each piece of data is padded with spaces if it doesn’t take up the entire space for the column.

student        year       test1  test2   
joe            freshman   99     100
samantha       senior     100    97
bartholemeu    sophomore  77     65
ike            senior     82     81
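To try read.fwf() on the example above, here is a minimal sketch. It first writes those rows to a temporary file (the name students.txt is hypothetical), using sprintf() to pad each value to its column width, and then reads them back. The widths 15, 11, 7 and 3 are counted from the example layout. It assumes R 4.0 or later, where text columns are read as character rather than factors.

```r
# Recreate the example rows: "%-15s" left-justifies a value and pads it
# with spaces to 15 characters, and so on for each column
rows <- sprintf("%-15s%-11s%-7s%s",
                c("student", "joe", "samantha", "bartholemeu", "ike"),
                c("year", "freshman", "senior", "sophomore", "senior"),
                c("test1", "99", "100", "77", "82"),
                c("test2", "100", "97", "65", "81"))
path <- file.path(tempdir(), "students.txt")   # hypothetical file name
writeLines(rows, path)

students <- read.fwf(path,
                     widths = c(15, 11, 7, 3),   # characters per column
                     skip = 1,                   # skip the header line ...
                     col.names = c("student", "year", "test1", "test2"),  # ... and name the columns here
                     strip.white = TRUE)         # drop the padding spaces
students
```

read.fwf() expects the names on a header line to be separated by its sep argument (a tab by default), not laid out in fixed-width columns. That is why the sketch skips the first line and supplies col.names instead.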

12.10 fs package (a non-core part of “the tidyverse”)

The fs package provides modern file system operations in R. It makes working with files and folders more consistent across operating systems (Windows, Mac, Linux) and safer than base R functions. Think of it as a toolkit for handling files: copying, moving, creating directories, checking paths, and viewing directory structures. Most of its functions start with clear prefixes like file_, dir_ or path_, which makes them easy to remember and use.

Install the “fs” package with: install.packages("fs")

12.10.1 dir_ls() function

This works similarly to the dir() function in Base R: it lists the files and folders inside a directory. If you need much more information about each file, use the related fs function dir_info(), which returns a table with one row per file, including, e.g.

  • size of the file
  • when the file was created
  • when the file was last modified
  • other info

See the R documentation (i.e. ?fs::dir_ls and ?fs::dir_info) after you’ve installed the fs package with install.packages("fs").
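Here is a minimal sketch, assuming the fs package is installed. dir_ls() returns the paths of the folder’s contents, and the companion function dir_info() returns per-file details such as size and modification time. The folder and file names are illustrative and are created in tempdir().

```r
library(fs)

# Practice folder with two small files (names are illustrative)
demo <- path(tempdir(), "fsDemo")
dir_create(demo)
writeLines("hello", path(demo, "a.txt"))
writeLines(c("one", "two", "three"), path(demo, "b.csv"))

# dir_ls(): the paths of the folder's contents
dir_ls(demo)

# glob = "*.csv" keeps only paths ending in .csv
dir_ls(demo, glob = "*.csv")

# dir_info(): one row per file, with details such as size and times
info <- dir_info(demo)
info[, c("path", "size", "modification_time")]
```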

12.10.2 dir_tree

dir_tree() from the fs package displays a directory structure in a visual tree format, similar to how you might see folders indented in a file explorer. It shows the hierarchy of files and folders using lines and indentation. For example, if you run:

fs::dir_tree("my_project")

It might show something like:

my_project
├── data
│   ├── raw.csv
│   └── clean.csv
├── scripts
│   ├── analysis.R
│   └── cleaning.R
└── README.md

This makes it easy to visualize how your files and folders are organized. The vertical and horizontal lines show how everything connects, and each level of folders is indented further to the right.

12.10.3 Other functions in the fs package

The fs package provides modern, user-friendly functions for working with files. Here are some commonly used ones:

fs::dir_ls() - Lists directory contents (similar to dir() and list.files())

fs::dir_create() - Creates directories, can create multiple levels at once

fs::file_copy() - Copies files from one location to another

fs::file_move() - Moves files from one location to another

fs::file_delete() - Deletes files

fs::file_exists() - Checks if files exist
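Here is a sketch of these functions working together, again assuming fs is installed. All folder and file names (my_project, input.csv and so on) are illustrative and are created under tempdir(). The sketch also calls dir_tree() to draw the resulting structure.

```r
library(fs)

proj <- path(tempdir(), "my_project")            # illustrative folder name

# dir_create() can make several levels of folders in one call
dir_create(path(proj, "data", "raw"))
dir_create(path(proj, "scripts"))

# Put a small file in place to work with
writeLines("x,y\n1,2", path(proj, "data", "raw", "input.csv"))

# Copy it, then move the copy into another folder (giving it a new name)
file_copy(path(proj, "data", "raw", "input.csv"), path(proj, "data", "copy.csv"))
file_move(path(proj, "data", "copy.csv"), path(proj, "scripts", "moved.csv"))

file_exists(path(proj, "scripts", "moved.csv"))  # TRUE (a named logical)
file_exists(path(proj, "data", "copy.csv"))      # FALSE - it was moved

dir_tree(proj)   # draws the folder structure as a tree

# Delete a file when it is no longer needed
file_delete(path(proj, "scripts", "moved.csv"))
```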

12.11 readLines() and writeLines() to read/write lines of text files

These are built into base R. See the R documentation (?readLines and ?writeLines).

12.12 read_csv and write_csv

read_csv() and write_csv() from the readr package are widely used for working with CSV files in modern R. read_csv() loads a CSV file into R as a tibble (a kind of data frame):

data <- read_csv("myfile.csv")

write_csv() saves a data frame to a CSV file:

write_csv(data, "output.csv")

These functions are faster and have better defaults than the older read.csv() and write.csv() functions from base R. For example, they use UTF-8 text encoding by default, and write_csv() doesn’t add row numbers to your output files.

You’ll need the readr package installed and loaded to use them: library(readr) (though readr is automatically loaded if you use the tidyverse).
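The row-number difference is easy to see by writing the same small data frame both ways and looking at the first line of each file. Here is a minimal sketch with made-up data, assuming readr 2.0 or later (for the show_col_types argument).

```r
library(readr)

scores <- data.frame(name = c("Ann", "Raj"), score = c(90, 85))

base_file  <- tempfile(fileext = ".csv")
readr_file <- tempfile(fileext = ".csv")

write.csv(scores, base_file)    # base R: adds a row-name column by default
write_csv(scores, readr_file)   # readr: no row-name column

readLines(base_file)[1]    # "\"\",\"name\",\"score\""  (extra first column)
readLines(readr_file)[1]   # "name,score"

# read_csv() returns a tibble; show_col_types = FALSE hides the column summary
scores_back <- read_csv(readr_file, show_col_types = FALSE)
scores_back
```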

12.13 Read/Write Excel files

To work with Excel files in R, you’ll need the readxl package (for reading) and the writexl package (for writing).

Reading Excel files:

library(readxl)
data <- read_excel("myfile.xlsx")

# To read a specific sheet (by name):
data <- read_excel("myfile.xlsx", sheet = "Sheet2")

Writing Excel files:

library(writexl)
write_xlsx(data, "output.xlsx")

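Note that readxl is installed along with the tidyverse but is not loaded by library(tidyverse), so you still need library(readxl). writexl needs to be installed separately. read_excel() reads both newer .xlsx and older .xls files; write_xlsx() writes .xlsx files.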
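You can practice without an Excel file of your own, because readxl ships with small example workbooks. The sketch below finds one with readxl_example(), lists its sheets with excel_sheets(), and then reads a sheet by name and by position.

```r
library(readxl)

# readxl ships with example workbooks; readxl_example() returns a full path
xlsx_path <- readxl_example("datasets.xlsx")

excel_sheets(xlsx_path)   # names of the sheets in the workbook

# Pick a sheet by name ...
cars <- read_excel(xlsx_path, sheet = "mtcars")
# ... or by position (1 = the first sheet)
first_sheet <- read_excel(xlsx_path, sheet = 1)

head(cars)
```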

12.14 Reading fixed width field files

Use read_fwf() from the readr package (part of tidyverse) to read fixed-width files where each column takes up a specific number of characters. You specify the width of each field:

library(readr)

# Define the widths of each column
data <- read_fwf("myfile.txt", 
                 fwf_widths(c(5, 10, 15),  # each number is the width of a column
                           col_names = c("id", "name", "address")))

# Or specify exact positions
data <- read_fwf("myfile.txt",
                 fwf_positions(start = c(1, 6, 16),  # where each column starts
                             end = c(5, 15, 30),     # where each column ends
                             col_names = c("id", "name", "address")))

Here’s a simple example. Imagine you have a file named employees.txt that looks like this:

12345John Smith          123 Main St
98765Mary Johnson        456 Oak Ave
54321Bob Wilson          789 Pine Rd

Each line represents an employee record where:

  • Employee ID takes exactly 5 characters
  • Name takes exactly 20 characters (including spaces)
  • Address takes exactly 20 characters

Here’s how to read it:

library(readr)

employees <- read_fwf("employees.txt",
                     fwf_widths(c(5, 20, 20),  # width of each column
                               col_names = c("id", "name", "address")))

The fwf_widths(c(5, 20, 20)) tells R to:

  • read the first 5 characters as ‘id’
  • read the next 20 characters as ‘name’
  • read the next 20 characters as ‘address’

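The resulting tibble (data frame) would look roughly like this. Note that readr guesses that the id column contains numbers (<dbl>); add col_types = "ccc" to read_fwf() if you want to keep the IDs as text:

# A tibble: 3 × 3
#      id name         address    
#   <dbl> <chr>        <chr>      
# 1 12345 John Smith   123 Main St
# 2 98765 Mary Johnson 456 Oak Ave
# 3 54321 Bob Wilson   789 Pine Rd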

12.15 Capture output of R commands

12.15.1 capture.output(someCommand, file = "someFile.txt")

To save or “capture” the output from an R command, you can use the capture.output() function. It lets you store text output in a file or in a character vector:

# mydata can be any data frame; here we use R's built-in mtcars data set
mydata <- mtcars

# Save to a file:
capture.output(summary(mydata), file = "output.txt")

# Save to a variable:
my_output <- capture.output(summary(mydata))

12.16 sink("someFile"), sink()

The sink() function is another option - it redirects all output to a file until you turn it off:

sink("output.txt")
print(summary(mydata))
sink()  # turns off the redirection
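Here is a self-contained version that sends output to a temporary file and then reads the file back to show what was captured. Remember the final sink() call; until you make it, printed output keeps going to the file instead of the console.

```r
out_file <- tempfile(fileext = ".txt")

sink(out_file)          # start sending printed output to the file
print(1:3)
cat("done\n")
sink()                  # stop; output goes back to the console

readLines(out_file)     # "[1] 1 2 3" "done"
```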