Is anyone good with #Rstats and #regex? I'm having issues.
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)
I want to replace each string with the first run of digits found in it, if any. I tried this:
sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)
I get this as a result:
[1] "50"
[2] "70"
[3] NA
[4] "00"
[5] "15"
[6] "10"
[7] "44"
[8] "It is a hysterical idling. More vibraton than sound."
[9] NA
I expect to get the whole first run of digits in each string, as long as it is 2 to 5 digits long. Instead, I only get 2 digits, and they come from the end of the last run: string 4 gives "00" (from "100") rather than "87".
Using a similar regex in #geany while preparing this little example, I got the expected behavior. I've updated and restarted R, and I've tried both sub and gsub, with the same result. If I specify \d{3,5} I get three digits. If I specify \d{1,3} I get one digit. I always get the minimum number of digits, i.e. the first value in the curly brackets.
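One likely explanation, offered as a sketch rather than a confirmed diagnosis: the leading (^.*) is greedy, so it swallows as many characters as it can, digits included, and leaves \d{2,5} only the minimum it needs to match. The snippet below checks that by printing group 1 for the first string, then tries a lazy quantifier (.*?) instead. It assumes perl = TRUE so the lazy quantifier is handled by PCRE; the name first_digits is only illustrative.

```r
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# What did the greedy first group capture? For "150 hertz" it is "1",
# i.e. the leading .* has eaten the first digit of the number.
greedy_group1 <- sub("(^.*)(\\d{2,5})(.*$)", "\\1", strings[1])

# Lazy version: .*? consumes as little as possible, so \d{2,5} starts at
# the first run of 2+ digits and takes up to 5 of them.
first_digits <- sub("^.*?(\\d{2,5}).*$", "\\1", strings, perl = TRUE)

# Elements 1-7 become "150", "70", NA, "87", "15", "110", "144".
# Strings with no match (element 8) are returned unchanged, NA stays NA.
print(greedy_group1)
print(first_digits)
```

Note that sub() leaves non-matching strings as they are, so the sentence in element 8 would still need separate handling (for example, checking grepl("\\d{2,5}", strings) first) if it should become NA.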
Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.