Strings in R 4.x vs 3.x (and earlier)

Among the several user-facing changes listed in R 4.0.0’s release notes was this point:

There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.

To get a better sense of this (wonderful) feature addition, I thought it’d be useful to see some before/after examples.

Backslashes

One of the biggest frustrations when working with strings in R has been backslashes. In R 3.6 and earlier, every backslash inside a string is treated as the start of an escape sequence, and there is no way to turn that off:

# R 3.3.3
(x <- c("test\string", "test string"))
Error: '\s' is an unrecognized escape in character string starting ""test\s"

If you wanted to have a string like “test\string”, you would need to escape the backslash:

# R 3.3.3
(x <- c("test\\string", "test string"))
[1] "test\\string" "test string"
# R 3.3.3
cat(x, sep="\n")
test\string
test string
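To make the difference between what you type and what is stored a bit more concrete, here is a small sketch (the variable name is just for illustration). It should run on both R 3.x and 4.x, since it only uses ordinary escaped strings:

```r
# "\\" in source code is ONE backslash in the stored string.
x <- "test\\string"

nchar(x)        # 11 characters: "test", one backslash, "string"
print(x)        # print() re-escapes for display: [1] "test\\string"
writeLines(x)   # writeLines() shows the actual contents: test\string
```

So the doubled backslash only exists in the source code and in print()'s output, not in the string itself.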

Now, if we wanted to do any pattern matching, it’d get…rather tricky:

# R 3.3.3
grepl("\s", x)
Error: '\s' is an unrecognized escape in character string starting ""\s"

When we escape it, it becomes \s (whitespace character class in regular expressions):

# R 3.3.3
grepl("\\s", x)
[1] FALSE  TRUE

So the second string (the one with a space in it) is a match for it. If we wanted to look for the actual, literal “\s”, we would need to double-escape, once for R’s string parser and once for the regular expression engine:

# R 3.3.3
grepl("\\\\s", x)
[1]  TRUE FALSE
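If the four backslashes feel like magic, it may help to peel the layers apart. The sketch below (the `pattern` name is mine) shows what the regex engine actually receives, plus one way to sidestep the regex layer in R 3.x using grepl()'s `fixed = TRUE` argument:

```r
x <- c("test\\string", "test string")

pattern <- "\\\\s"      # four backslashes in source code
nchar(pattern)          # 3: two backslashes plus "s"
writeLines(pattern)     # what the regex engine receives: \\s

grepl(pattern, x)               # TRUE FALSE
grepl("\\s", x, fixed = TRUE)   # TRUE FALSE: literal match, no regex layer
```

With `fixed = TRUE` the pattern is matched as plain text, so only the string-parser escaping is left.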

As you can see, the regex “\\s” (typed as "\\\\s" in R code) matches the literal “\s”. Not great, right?!? Hence the new system in 4.0:

# R 4.0.0
(x <- c(r"(test\string)", "test string"))
[1] "test\\string" "test string"
# R 4.0.0
cat(x, sep = "\n")
test\string
test string

And matching is easier too:

# R 4.0.0
grepl(r"(\s)", x)
[1] FALSE  TRUE
# R 4.0.0
grepl(r"(\\s)", x)
[1]  TRUE FALSE

Doesn’t that look nicer?
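One thing worth checking for yourself: raw strings are only a different way of typing, and they produce exactly the same values as the escaped versions. A quick sketch, assuming R 4.0.0 or later (the Windows-style path is a made-up example):

```r
# Requires R >= 4.0.0 (raw string syntax)
identical(r"(test\string)", "test\\string")   # TRUE
identical(r"(\\s)", "\\\\s")                  # TRUE

# Hypothetical path, just for illustration
path <- r"(C:\Users\example\data.csv)"
writeLines(path)   # C:\Users\example\data.csv
```

Since the values are identical, old escaped strings and new raw strings can be mixed freely in the same script.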

Quotes

Another benefit is mixing double and single quotes. In R 3.6 and earlier, you would need to escape whichever quotes you used to enclose the string. For example:

> ""test" and 'test'"
Error: unexpected symbol in """test"
> "\"test\" and 'test'"
[1] "\"test\" and 'test'"

But the new r"(...)" syntax in R 4.0 lets you do the following:

> r"("test" and 'test')"
[1] "\"test\" and 'test'"

SO. NICE.
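The release note mentions that the text cannot contain the sequence )". According to ?Quotes there are ways around that: [] and {} can be used as delimiters instead of (), and matching dashes can be added around the delimiters. A small sketch, assuming R 4.0.0 or later (the example strings are made up):

```r
# Requires R >= 4.0.0 (raw string syntax)
identical(r"("test" and 'test')", "\"test\" and 'test'")   # TRUE

# [] and {} work as delimiters too
identical(r"[a\b]", "a\\b")   # TRUE
identical(r"{a\b}", "a\\b")   # TRUE

# When the text itself contains )", add the same number of dashes on each side
s <- r"-(say(")"))-"
writeLines(s)   # say(")")
```

The dashes are not part of the string; they only make the closing sequence longer so the text inside cannot end the string by accident.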

Posted on:
May 22, 2020
Tags:
R