Strings in R 4.x vs 3.x (and earlier)
Among the several user-facing changes listed in R 4.0.0’s release notes was this point:
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
To get a better sense of this (wonderful) feature addition, I thought it’d be useful to see some before/after examples.
Backslashes
One of the biggest frustrations when working with strings in R has been backslashes. In R 3.6 and earlier, the parser always treats a backslash inside a string literal as the start of an escape sequence:
# R 3.3.3
(x <- c("test\string", "test string"))
Error: '\s' is an unrecognized escape in character string starting ""test\s"
If you wanted to have a string like “test\string”, you would need to escape the backslash:
# R 3.3.3
(x <- c("test\\string", "test string"))
[1] "test\\string" "test string"
# R 3.3.3
cat(x, sep="\n")
test\string
test string
Now, if we wanted to do any pattern matching, it’d get…rather tricky:
# R 3.3.3
grepl("\s", x)
Error: '\s' is an unrecognized escape in character string starting ""\s"
When we escape it, it becomes \s (the whitespace character class in regular expressions):
# R 3.3.3
grepl("\\s", x)
[1] FALSE TRUE
So the second string (the one with a space in it) is a match for it. If we wanted to look for the actual, literal “\s” we would need to, like, double-escape:
# R 3.3.3
grepl("\\\\s", x)
[1] TRUE FALSE
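A quick way to see why four backslashes are needed is to look at what each layer actually receives. The sketch below is my own illustration (the `pattern` variable is just a name picked for the example) and runs on any R version. It also shows `fixed = TRUE`, which skips the regex layer entirely.

```r
x <- c("test\\string", "test string")

# The R parser collapses each "\\" into one backslash,
# so this pattern holds three characters: \ \ s
pattern <- "\\\\s"
nchar(pattern)
cat(pattern, "\n")

# The regex engine then reads \\ as "one literal backslash"
grepl(pattern, x)

# fixed = TRUE turns off regex interpretation,
# so only the parser's escaping is left to deal with
grepl("\\s", x, fixed = TRUE)
```

Both `grepl()` calls should flag only the first element, the one that contains a literal backslash followed by "s".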
As you can see, it takes “\\\\s” in R code (which the regex engine sees as \\s) to match a literal “\s”. Not great, right?!? Hence the new system in 4.0:
# R 4.0.0
(x <- c(r"(test\string)", "test string"))
[1] "test\\string" "test string"
# R 4.0.0
cat(x, sep = "\n")
test\string
test string
And matching is easier too:
# R 4.0.0
grepl(r"(\s)", x)
[1] FALSE TRUE
# R 4.0.0
grepl(r"(\\s)", x)
[1] TRUE FALSE
Doesn’t that look nicer?
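To make sure the raw form is only a different spelling and not a different kind of string, here is a small check that needs R 4.0.0 or later. The Windows-style path is invented for the example and is not from any real system.

```r
# Requires R >= 4.0.0

# Raw and escaped spellings produce the same string
identical(r"(test\string)", "test\\string")

# The same holds for regex patterns
identical(r"(\\s)", "\\\\s")

# A hypothetical Windows path, typed exactly as it would appear on screen
path <- r"(C:\Users\me\data.csv)"
cat(path, "\n")

# The raw pattern \\ is the regex for one literal backslash
gsub(r"(\\)", "/", path)
```

Both `identical()` calls should return TRUE, since a raw constant is an ordinary character vector once it has been parsed.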
Quotes
Another benefit is mixing double and single quotes. In R 3.6 and earlier, you would need to escape whichever quotes you used to enclose the string. For example:
> ""test" and 'test'"
Error: unexpected symbol in """test"
> "\"test\" and 'test'"
[1] "\"test\" and 'test'"
But the new r"(...)" syntax in R 4.0 lets you do the following:
> r"("test" and 'test')"
[1] "\"test\" and 'test'"
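The one thing the default form cannot hold is the sequence )" itself, as the release note says. ?Quotes describes alternative delimiters for that case: square brackets, braces, and matching runs of dashes. A minimal sketch with made-up example strings, again assuming R 4.0.0 or later:

```r
# Requires R >= 4.0.0

# Square brackets and braces work as delimiters too
a <- r"[test\string]"
b <- r"{test\string}"
identical(a, b)

# Dashes between the quote and the delimiter let the body contain )"
s <- r"-(He typed r"(\d+)" at the prompt)-"
cat(s, "\n")
```

The number of dashes after the opening quote has to match the number before the closing quote.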
SO. NICE.