On Nov 19, 8:31 am, Robert <Rob...@discussions.microsoft.com> wrote:
> I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
> where the alpha chars are entirely random.
>
> When I use the pattern [0-9]..[0-9] I get a match count of 1. if i use
> [0-9]... I get 2 as a match count.
>
> I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?
>
> Can someone tell me the correct pattern to use? (without counting on the
> surrounding characters).
The best I could do was to find a run of four or more contiguous digits and then post-process it ...

  sMatch = RegExpFind("\d{4,}", "sfsdf1234567sdfsdf")
  If sMatch <> "" Then
    ' A run of n digits holds n - 3 four-digit windows (indexes 0 to n - 4).
    ReDim aMatches(Len(sMatch) - 4)
    aMatches(0) = Left(sMatch, 4)
    For i = 1 To Len(sMatch) - 4
      aMatches(i) = Mid(sMatch, i + 1, 4)
    Next
    WScript.Echo Join(aMatches, vbNewLine)
  Else
    WScript.Echo "No match found"
  End If

  ' Finds the first match only; returns "" when there is none.
  Function RegExpFind(patrn, strng)
    Dim regEx, Matches                   ' Define variables.
    Set regEx = New RegExp               ' Create a regular expression.
    regEx.Pattern = patrn                ' Set pattern.
    regEx.IgnoreCase = False             ' Match case-sensitively.
    regEx.Global = False                 ' Stop after the first match.
    Set Matches = regEx.Execute(strng)   ' Execute search.
    RegExpFind = ""
    If Matches.Count > 0 Then RegExpFind = Matches(0).Value   ' Return match.
  End Function

It gets the answer you were looking for in this case. It doesn't try to find disjointed groups of four or more digits. In that case, the function would need to return an array and each element would need to be processed, but that's certainly possible.
_____________________
Tom Lavedas
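For the disjointed case mentioned above, a minimal sketch in JavaScript (the other language used in this thread) might collect every run of four or more digits and then slide a four-character window over each run. The function name fourDigitWindows and the second sample string are made up for illustration; this is not a translation of the VBScript above.

```javascript
// Hypothetical helper: find each run of 4+ digits, then take every
// 4-character window inside that run.
function fourDigitWindows(text) {
  var runs = text.match(/\d{4,}/g) || [];   // match() returns null when nothing matches
  var out = [];
  for (var r = 0; r < runs.length; r++) {
    for (var i = 0; i + 4 <= runs[r].length; i++) {
      out.push(runs[r].slice(i, i + 4));
    }
  }
  return out;
}

console.log(fourDigitWindows("sfsdf1234567sdfsdf").join("\n"));
console.log(fourDigitWindows("ab1234cd56789").join("\n"));   // two separate runs
```

A run of n digits contributes n - 3 windows, so the first call yields four results.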
On Nov 19, 2:31 pm, Robert <Rob...@discussions.microsoft.com> wrote:
> I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
> where the alpha chars are entirely random.
> ...
> I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?
This was a highly interesting problem because I think it exposes a bug in the VBScript regular expression (RegExp) object. In particular, the regular expression which finds the number of such matches is:
(?=\d{4})
That is

  subject = "sfsdf1234567sdfsdf"
  Set regExM = New RegExp
  regExM.Pattern = "(?=\d{4})"
  regExM.Global = True
  Set Matches = regExM.Execute(subject)
  MsgBox Matches.count

This, however, does not return the matches. To do that, we should actually capture the 4 lookahead digits. We can do so as follows:

  regExM.Pattern = "(?=(\d{4}))"

HOWEVER, if at this point you try to iterate through all the matches:

  For Each Match in Matches
    MsgBox Match.SubMatches(0)
  Next
your VBScript will get really whacked out (the bug). The supposedly captured subpatterns are actually not available. There is some major nastiness in there. One's first reaction might be to think, well you shouldn't be trying to make captures within a lookahead. I mean it doesn't say anywhere explicitly that that should work, right?
But actually, capturing within forward lookahead does work in both JScript and VBScript (as the examples below demonstrate) if you approach it in an alternate fashion. Here is the VBScript code which will show the number of matches along with the actual matches:

  subject = "sfsdf1234567sdfsdf"
  Set regExR = New RegExp
  regExR.Pattern = "(\d)(?=(\d{3}))"
  regExR.Global = True

  Set regExM = New RegExp
  regExM.Pattern = "\d{4}"
  regExM.Global = True

  Set Matches = regExM.Execute(regExR.replace(subject, "$1$2 "))
  res = Matches.count & " matches:"
  For Each Match in Matches
    res = res & vbCrLf & Match.Value
  Next
  MsgBox res
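For comparison, the capture-inside-a-lookahead pattern discussed above can be tried directly in a current JavaScript engine. This is only a sketch under that assumption; it says nothing about how the VBScript or .NET engines behave, and the variable names are made up for illustration.

```javascript
var subject = "sfsdf1234567sdfsdf";
var re = /(?=(\d{4}))/g;   // zero-width match; group 1 holds the four digits
var found = [];
var m;

while ((m = re.exec(subject)) !== null) {
  found.push(m[1]);
  // A zero-width match does not move lastIndex past the match position,
  // so step forward by hand to avoid matching the same spot forever.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
}

console.log(found.length + " matches: " + found.join(", "));
```

Because the lookahead consumes nothing, each digit position is tested in turn, which is what makes the overlapping results possible.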
Finally, here is a slightly more compact JavaScript way to do the same thing (assuming it's in a script element in a web page):

Csaba Gabor from Vienna

Replace alert with WScript.Echo if you want to place the above three JScript lines into a .js file to run from the command line.
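The three JScript lines referred to here are not present in this copy of the thread. As a rough sketch only, and not necessarily what was originally posted, the same replace-then-match idea as the VBScript version could look like this in JavaScript. It uses console.log where the post mentions alert, and the variable names are made up for illustration.

```javascript
var subject = "sfsdf1234567sdfsdf";

// Expand each digit that is followed by three more digits into all four
// digits plus a space, so overlapping groups become separate tokens.
var spaced = subject.replace(/(\d)(?=(\d{3}))/g, "$1$2 ");

// Now an ordinary, non-overlapping global match finds every group.
var matches = spaced.match(/\d{4}/g) || [];

console.log(matches.length + " matches:\n" + matches.join("\n"));
```

The intermediate string is "sfsdf1234 2345 3456 4567 567sdfsdf"; the leftover "567" is too short to match \d{4}.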
PS. I would be curious to know whether the .NET version of the RegExp engine exhibits the same flaw.
-------------------------------------------
Hi, Csaba

Download and install Regular Expression Workbench:
http://code.msdn.microsoft.com/RegexWorkbench/Release/ProjectReleases...

It uses the .NET engine. I find it especially handy for parsing a regular expression into chunks that I can understand, and for testing a regular expression I'm trying to build for a VBScript application. I have to be especially careful to remember that just because something works in RE Workbench (.NET) does not mean that it works in VBScript. I don't know of any valid VBScript regular expressions that it does not correctly parse.
In microsoft.public.scripting.vbscript message <A3A5975E-7677-4B93-948B- C61DC30FE...@microsoft.com>, Thu, 19 Nov 2009 05:31:01, Robert <Rob...@discussions.microsoft.com> posted:
>I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
>where the alpha chars are entirely random.
>
>When I use the pattern [0-9]..[0-9] I get a match count of 1. if i use
>[0-9]... I get 2 as a match count.
>
>I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?
>
>Can someone tell me the correct pattern to use? (without counting on the
>surrounding characters).
This works in JavaScript (Firefox 3.0.15); perhaps it can be translated:
  St = "aaa1234567b45454bb" ; T = []
  RE = /(\d\d\d\d)/gi
  RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
  while (true) {
    A = RE.exec(St) ; RE.lastIndex -= 3
    if (!A) break
    T.push(A[1])
  }
Result is in T.
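The same rewind trick can be wrapped in a function. A minimal sketch, assuming a hypothetical name overlappingDigits and a width parameter that the post does not have: instead of subtracting 3 from lastIndex, it restarts the search one character after the place where the previous match began, which amounts to the same thing for width 4.

```javascript
// Hypothetical wrapper around the lastIndex-rewind idea.
function overlappingDigits(text, width) {
  var re = new RegExp("\\d{" + width + "}", "g");   // fresh regex, so lastIndex starts at 0
  var out = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    out.push(m[0]);
    re.lastIndex = m.index + 1;   // resume just after the start of this match
  }
  return out;
}

console.log(overlappingDigits("aaa1234567b45454bb", 4).join(", "));
```

Building the regular expression inside the function avoids carrying a stale lastIndex from one call to the next.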
-- 
(c) John Stockton, nr London, UK. ?...@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links;
Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.
Dr J R Stockton wrote on 20 nov 2009 in microsoft.public.scripting.vbscript:

> This works in JavaScript (Firefox 3.0.15); perhaps it can be
> translated:
>
> St = "aaa1234567b45454bb" ; T = []
> RE = /(\d\d\d\d)/gi
why the i ?
> RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
> while (true) {
>   A = RE.exec(St) ; RE.lastIndex -= 3
>   if (!A) break
>   T.push(A[1])
> }
>
> Result is in T.
This can be done with match():
  <script type='text/javascript'>
    var s = "aaa1234567b45454bb";
    var r = s.match(/\d{4}/g);

    document.write(r); // 1234,4545
  </script>

translated into vbs with regEx.Execute():

  <script type='text/vbscript'>
    s = "aaa1234567b45454bb"
    Set regEx = New RegExp
    regEx.Pattern = "\d{4}"
    regEx.Global = True
    Set Matches = regEx.Execute(s)

    r = ""
    For Each Match in Matches
      if r <> "" Then r = r & ","
      r = r & Match.Value
    Next
    document.write r
  </script>
[making quite a point of the ease of Javascript]
-- 
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
On Nov 21, 5:04 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:
> This can be done with match():
> ...
Go all the way back to Robert's OP. The formulation you post does not meet the original requirements - and that's why the rest of the discussion ensued.
_____________________
Tom Lavedas
Tom Lavedas wrote on 21 nov 2009 in microsoft.public.scripting.vbscript:
[please do not quote signatures on usenet]
> Go all the way back to Robert's OP. The formulation you post does not
> meet the original requirements - and that's why the rest of the
> discussion ensued.
I do not think so, Tom, the OP is included.
And even if it were, that is usenet, discussion drifting is part of the fun.
-- 
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
On Nov 21, 7:56 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:
> I do not think so, Tom, the OP is included.
>
> And even if it were, that is usenet,
> discussion drifting is part of the fun.
Sure, drift is what happens. I agree, it's part of the experience. However, my point was that the pattern you posted does not return every run of four consecutive digits, overlapping ones included, as the OP requested. Your own commented code says as much:

  document.write(r); // 1234,4545

The request, as I (and others) read it, was for output like this ...

  1234, 2345, 3456, 4567, 4545, 5454
_____________________
Tom Lavedas
And again: [please do not quote signatures on usenet]
> Sure, drift is what happens. I agree, its part of the experience.
> However, my point was that the pattern you posted does not return all
> four digit permutation as the OP requested. Your own commented code
> says as much:
>
> document.write(r); // 1234,4545
>
> The request, as I (and others) read it was for output like this ...
>
> 1234, 2345, 3456, 4567, 4545, 5454
I see, well I did not read it that way.
In Javascript this is simple too:
  <script type='text/javascript'>
  var a = 'aaa1234567b45454bb'.split(/\D+/);
  for (var i=0;i<a.length;i++)
    for (var j=0;j<a[i].length-3;j++)
      document.write(a[i].substr(j,4)+',');
  </script>
writes: 1234,2345,3456,4567,4545,5454,
-- 
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
As does:

  <script type='text/javascript'>
  document.write("aaa1234567b45454bb".
    replace(/.*?(\d)(?=(\d{3}))|.+$/g, "$1$2,").replace(/,+$/,""))
  </script>
The second replace merely gets rid of trailing commas.
The first replace, doing the heavy lifting, says: eat chars until there's a digit (that's the (\d), which we'll call $1) followed by three other digits (which we'll call $2). When that happens, replace $1 (and the prior characters) with: $1 followed by $2 followed by ","
Characters now continue to be eaten starting from the char following the $1 (in the original string). The eating does not start from after the $2 because $2 was embedded within a forward lookahead, namely the (?=...), so that the $2 chars did not get eaten in the prior part. Finally, that |.+$ is there because once we get to the last set of 4 contiguous digits, we want to replace the remainder of the string with the empty string (or more precisely, a comma), so those final characters have to get eaten somehow (and the plus ensures there is at least one character to eat).
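The "eating" described above can be made visible by swapping the replacement string for a callback that records what each match consumed. This is only an illustrative sketch: the callback and the names eaten and result are made up here, and the callback is written to imitate the "$1$2," replacement string, where an unmatched group counts as empty.

```javascript
var input = "aaa1234567b45454bb";
var eaten = [];   // the text consumed by each individual match

var result = input.replace(/.*?(\d)(?=(\d{3}))|.+$/g, function (whole, d1, d3) {
  eaten.push(whole);
  // When the second alternative (.+$) matched, both groups are undefined,
  // so only the comma is emitted, as with the "$1$2," string.
  return (d1 || "") + (d3 || "") + ",";
}).replace(/,+$/, "");

console.log(eaten.join(" | "));   // aaa1 | 2 | 3 | 4 | 567b4 | 5 | 454bb
console.log(result);              // 1234,2345,3456,4567,4545,5454
```

The last entry, 454bb, is the remainder swallowed by the .+$ alternative; it contributes only a comma, which the second replace then strips.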
>Dr J R Stockton wrote on 20 nov 2009 in
>microsoft.public.scripting.vbscript:
>
>> This works in JavaScript (Firefox 3.0.15); perhaps it can be
>> translated:
>
>> St = "aaa1234567b45454bb" ; T = []
>> RE = /(\d\d\d\d)/gi
>why the i ?
Superfluously inherited from the code - now function DayCheck3 in <linxchek.htm> - from which the code for this was derived.
>This can be done with match():
><script type='text/javascript'>
> var s = "aaa1234567b45454bb";
> var r = s.match(/\d{4}/g);
>
> document.write(r); // 1234,4545
></script>
Only if one does not mind getting a different result.
><script type='text/vbscript'>
> s = "aaa1234567b45454bb"
> Set regEx = New RegExp
> regEx.Pattern = "\d{4}"
> regEx.Global = True
> Set Matches = regEx.Execute(s)
>
> r = ""
> For Each Match in Matches
>  if r <> "" Then r = r & ","
>  r = r & Match.Value
> Next
> document.write r
></script>
>[making quite a point of the ease of Javascript]
The OP did not want a string, AFAICS.
Rather than using two nested loops, one could use the local equivalent of (after proper testing)
  for (J=0, T = [] ; J<St.length-3 ; J++)
    if ( !/\D/.test(S = St.substr(J, 4) ) ) T.push(S)
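In the spirit of "after proper testing", here is a small sketch of the single-loop version as a function, followed by a few edge cases. The name slidingWindows and the width parameter are assumptions added for illustration; the loop itself is the one shown above.

```javascript
// Hypothetical function form of the single-loop approach.
function slidingWindows(text, width) {
  var out = [];
  for (var j = 0; j + width <= text.length; j++) {
    var chunk = text.substr(j, width);
    if (!/\D/.test(chunk)) out.push(chunk);   // keep chunks containing no non-digit
  }
  return out;
}

console.log(slidingWindows("aaa1234567b45454bb", 4));   // the thread's test string
console.log(slidingWindows("", 4));                     // empty input
console.log(slidingWindows("123", 4));                  // shorter than one window
console.log(slidingWindows("1234", 4));                 // exactly one window
```

The result is an array rather than a string, and inputs shorter than the window simply give an empty array because the loop body never runs.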
-- 
(c) John Stockton, Surrey, UK. ?...@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)