Is anyone good with #Rstats and #regex ? I'm having issues.

strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations", "15hz", "triangle 110 hertz", "144Hz, Sine waveform", "It is a hysterical idling. More vibraton than sound.", NA)

I want to replace each string with the digits (well, the first set) found in it, if any. I try this:

sub("(^.*)(\\d{2,5})(.*$)", "\\2", strings)

I get this as a result:

[1] "50"
[2] "70"
[3] NA
[4] "00"
[5] "15"
[6] "10"
[7] "44"
[8] "It is a hysterical idling. More vibraton than sound."
[9] NA

I expect to get all digits (the first set in each string) if they are from 2 to 5 digits long. Instead, I only get 2 digits.

Using similar regex in #geany just to prepare this little example I got the expected behavior. I've updated and restarted R. I've used sub and gsub. Same result. If I specify \d{3,5} I get three digits. If I say \d{1,3} I get one digit. I always get the number of digits specified in the first value in the curly brackets.

Maybe R is just vomiting or something. But if you know of an issue with R and regex that results in this, please let me know.

@guyjantic is it getting confused by .*<number stuff>.*?

The . also matches digits.

Maybe something like ^[\\s ]*(\\d{2,5}).*$

@guyjantic
You need to make that non-greedy. Might be as easy as

sub("(^.*?)(\\d{2,5})(.*?$)", "\\2", strings)

This makes the matches before and after "lazy", meaning they match as few as possible.

Edit: I haven't tested it, as I'm on my phone right now.
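
For anyone following along, here is a minimal sketch of the greedy versus lazy difference described above. It assumes perl = TRUE (not used anywhere in the thread) so that the lazy quantifier follows PCRE rules; the variable names greedy and lazy are made up for illustration.

```r
# Sample data from the original post (typos in the strings are part of the data).
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# Greedy: the leading .* swallows as much as it can, digits included,
# and gives back only the minimum that \\d{2,5} needs (two digits).
# perl = TRUE is an assumption made here, not something from the thread.
greedy <- sub("^(.*)(\\d{2,5})(.*)$", "\\2", strings, perl = TRUE)

# Lazy: the leading .*? stops at the earliest position where \\d{2,5}
# can match, so the digit group captures the whole first run.
lazy <- sub("^(.*?)(\\d{2,5})(.*)$", "\\2", strings, perl = TRUE)

print(greedy[c(1, 4, 6)])  # values: "50", "00", "10"
print(lazy[c(1, 4, 6)])    # values: "150", "87", "110"
```

In both versions the string without any digits comes back unchanged, because sub() leaves non-matching input alone.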

@guyjantic if you are happy with just the first numerical thing, parse_number() works really well.
@nxskok Ooh, this is new! Thanks.
@guyjantic it's actually in readr, which seems like an odd place for it, but it seems to be very good at finding a number hidden in text. (It will only find the first one; if you want more than that, you need something more sophisticated.)
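
A small sketch of the parse_number() suggestion, assuming the readr package is installed. The suppressWarnings() wrapper and the freq_hz name are additions for illustration: strings with no number in them are expected to come back as NA along with a parsing warning. Note that parse_number() does not enforce the 2 to 5 digit limit from the original regex.

```r
library(readr)  # assumes readr is installed

# Sample data from the original post.
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# parse_number() pulls the first number out of each string and returns
# a numeric vector; warnings about unparseable strings are silenced here.
freq_hz <- suppressWarnings(parse_number(strings))

print(freq_hz)
```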

@guyjantic Is "87" what you want from the fourth string?

If so, making it non-greedy seems to work:
sub(".*?(\\d{2,5}).*", "\\1", strings)

@JMkinen I went with stringr::str_extract() but this is exactly what I was looking for :)
@guyjantic I'm no regexpert (sry, pun intended) by any means, but why not just str_extract(x, "[:digit:]+")? str_extract() always returns the first match only - for all matches there's str_extract_all() - and [:digit:]+ matches, well, a whole run of digits.
@MarauderPixie That's what I went with and it worked like a charm. I'm not super familiar with stringr so I'd tried str_sub but needed a hint to try str_extract().
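
A rough sketch of the str_extract() route, assuming the stringr package is installed. The pattern "\\d{2,5}" is borrowed from the original question rather than from the posts above, and first_run and freq_hz are illustrative names.

```r
library(stringr)  # assumes stringr is installed

# Sample data from the original post.
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# str_extract() returns the first match per string, or NA when there is
# no match (or the input itself is NA), so nothing has to be replaced.
first_run <- str_extract(strings, "\\d{2,5}")
print(first_run)
# values: "150", "70", NA, "87", "15", "110", "144", NA, NA

# Convert to numbers once the digits are isolated.
freq_hz <- as.numeric(first_run)
```

Unlike sub(), this gives NA for the string with no digits instead of handing the original text back.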
@guyjantic maybe
gsub("^\\D*(\\d{2,5}).*$", "\\1", strings)
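
A base R sketch building on the \\D* idea, using the default regex engine and no extra packages. The grepl() step is an addition here, not part of the post above: sub() and gsub() return non-matching strings unchanged, so they are set to NA by hand. The names has_number, digits and freq_hz are illustrative.

```r
# Sample data from the original post.
strings <- c("150 hertz", "70 hz", NA, "between 87 and 100 hz ocillations",
             "15hz", "triangle 110 hertz", "144Hz, Sine waveform",
             "It is a hysterical idling. More vibraton than sound.", NA)

# TRUE where the first run of digits is at least two digits long;
# grepl() returns FALSE for NA input.
has_number <- grepl("^\\D*\\d{2,5}", strings)

# \\D* can only consume non-digits, so it cannot eat into the number
# the way a greedy .* does.
digits <- sub("^\\D*(\\d{2,5}).*$", "\\1", strings)

# Non-matching strings were returned unchanged; blank them out.
digits[!has_number] <- NA

freq_hz <- as.numeric(digits)
print(freq_hz)
# values: 150, 70, NA, 87, 15, 110, 144, NA, NA
```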