Regular Expression Question
Robert  
Newsgroups: microsoft.public.scripting.vbscript
From: Robert <Rob...@discussions.microsoft.com>
Date: Thu, 19 Nov 2009 05:31:01 -0800
Local: Thurs, Nov 19 2009 1:31 pm
Subject: Regular Expression Question
I want to get all the 4-digit numbers from the following: sfsdf1234567sdfsdf
where the alpha chars are entirely random.

When I use the pattern [0-9]..[0-9] I get a match count of 1. If I use
[0-9]... I get 2 as a match count.

I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4.

Can someone tell me the correct pattern to use (without counting on the
surrounding characters)?
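
The two counts reported above follow from how a global search works: each match consumes its characters and the next search resumes after them, so matches never overlap. A minimal JavaScript sketch of that behaviour (JavaScript rather than VBScript only so it can be run anywhere; the helper name countMatches is made up for illustration):

```javascript
// Count the non-overlapping matches of a global regex (hypothetical helper).
function countMatches(re, s) {
  var m = s.match(re);          // null when nothing matches
  return m ? m.length : 0;
}

var subject = "sfsdf1234567sdfsdf";

// Only "1234" fits; in the remaining "567sdfsdf" no digit has a digit three places later.
console.log(countMatches(/[0-9]..[0-9]/g, subject)); // 1

// "1234" and then "567s": the second search starts where the first match ended.
console.log(countMatches(/[0-9].../g, subject));     // 2
```

Getting 1234, 2345, 3456 and 4567 therefore needs either a pattern that consumes fewer than four characters per match or a search that is restarted one character further along each time.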


Tom Lavedas  
Newsgroups: microsoft.public.scripting.vbscript
From: Tom Lavedas <tglba...@cox.net>
Date: Thu, 19 Nov 2009 06:54:19 -0800 (PST)
Local: Thurs, Nov 19 2009 2:54 pm
Subject: Re: Regular Expression Question
On Nov 19, 8:31 am, Robert <Rob...@discussions.microsoft.com> wrote:

> I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
> where the alpha chars are entirely random.

> When I use the pattern [0-9]..[0-9] I get a match count of 1. if i use
> [0-9]... I get 2 as a match count.

> I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?

> Can someone tell me the correct pattern to use? (without counting on the
> surrounding characters).

The best I could do was to find the first run of four or more contiguous
digits and then post-process it ...

sMatch = RegExpFind("\d{4,}", "sfsdf1234567sdfsdf")
If sMatch <> "" Then
  ' A run of n digits holds n - 3 windows of four: indices 0 .. n - 4
  ReDim aMatches(Len(sMatch) - 4)

  For i = 0 To Len(sMatch) - 4
    aMatches(i) = Mid(sMatch, i + 1, 4)
  Next
  WScript.Echo Join(aMatches, vbNewLine)
Else
  WScript.Echo "No match found"
End If

' Finds the first match only; returns "" when there is none
Function RegExpFind(patrn, strng)
   Dim regEx, Matches                      ' Define variables.
   Set regEx        = New RegExp           ' Create a regular expression.
   regEx.Pattern    = patrn                ' Set pattern.
   regEx.IgnoreCase = False                ' Match case-sensitively.
   regEx.Global     = False                ' First match only.
   Set Matches      = regEx.Execute(strng) ' Execute search.
   RegExpFind       = ""                   ' Default when nothing matched.
   If Matches.Count > 0 Then RegExpFind = Matches(0).Value
End Function

It gets the answer you were looking for in this case.  It doesn't try to
find disjoint groups of four or more digits.  For that, the function
would need to return an array and each element would need to be
processed, but that's certainly possible.
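
For the disjoint-groups case just described, the idea can be sketched as follows. This is an illustration in JavaScript, not a translation of the VBScript above; the function name digitWindows and its width parameter are assumptions made for the example.

```javascript
// Return every overlapping window of `width` digits found in s (hypothetical helper).
function digitWindows(s, width) {
  // Step 1: collect every run of at least `width` contiguous digits.
  var runs = s.match(new RegExp("\\d{" + width + ",}", "g")) || [];
  var out = [];
  // Step 2: slide a window of `width` characters across each run.
  for (var r = 0; r < runs.length; r++) {
    for (var i = 0; i + width <= runs[r].length; i++) {
      out.push(runs[r].slice(i, i + width));
    }
  }
  return out;
}

console.log(digitWindows("sfsdf1234567sdfsdf", 4).join(",")); // 1234,2345,3456,4567
console.log(digitWindows("aaa1234567b45454bb", 4).join(",")); // 1234,2345,3456,4567,4545,5454
```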
_____________________
Tom Lavedas


Csaba Gabor  
Newsgroups: microsoft.public.scripting.vbscript
From: Csaba Gabor <dans...@gmail.com>
Date: Thu, 19 Nov 2009 11:02:11 -0800 (PST)
Local: Thurs, Nov 19 2009 7:02 pm
Subject: Re: Regular Expression Question
On Nov 19, 2:31 pm, Robert <Rob...@discussions.microsoft.com> wrote:

> I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
> where the alpha chars are entirely random.
...
> I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?

This was a highly interesting problem because I think it
exposes a bug in the VBScript Regular Expression object.  In
particular, the regular expression which finds the number
of such matches is:

(?=\d{4})

That is
subject = "sfsdf1234567sdfsdf"
Set regExM = New RegExp
regExM.Pattern = "(?=\d{4})"
regExM.Global = True
Set Matches = regExM.Execute(subject)
MsgBox Matches.count

This, however, does not return the matches.  To do
that, we should actually capture the 4 lookahead digits.
We can do so as follows: regExM.Pattern = "(?=(\d{4}))"

HOWEVER, if at this point you try to iterate through all
the matches:

For Each Match in Matches
  MsgBox Match.SubMatches(0)
Next

your VBScript will get really whacked out (the bug).
The supposedly captured subpatterns are actually
not available.  There is some major nastiness in there.
One's first reaction might be to think, well you shouldn't
be trying to make captures within a lookahead.  I mean
it doesn't say anywhere explicitly that that should work,
right?

But actually, capturing within a forward lookahead does
work in both JScript and VBScript (as the examples below
demonstrate) if you approach it in an alternate fashion.
Here is the VBScript code which will show the number of
matches along with the actual matches:

subject = "sfsdf1234567sdfsdf"
Set regExR = New RegExp
regExR.Pattern = "(\d)(?=(\d{3}))"
regExR.Global = True

Set regExM = New RegExp
regExM.Pattern = "\d{4}"
regExM.Global = True

Set Matches = regExM.Execute(regExR.replace(subject, "$1$2 "))
res = Matches.count & " matches:"
For Each Match in Matches
  res = res & vbCrLf & Match.Value
Next
MsgBox res

Finally, here is a slightly more compact JavaScript
way to do the same thing (assuming it's in a script
element in a web page):

var subject = "abcd1234567wqe";
sub2 = subject.replace(/(\d)(?=(\d{3}))/g, "$1$2 ").match(/\d{4}/g);
alert (sub2.length + " matches:\n" + sub2.join("\n"));

Csaba Gabor from Vienna
Replace alert with WScript.Echo if you want to place the above
three JScript lines into a .js file to run from the command line.

PS.  I would be curious to know whether the .NET version of the
RegExp engine exhibits the same flaw.
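
For comparison only, here is a sketch of reading the capture directly out of the lookahead in a current JavaScript engine. It assumes String.prototype.matchAll is available (ES2020, so not the JScript of 2009), the function name lookaheadWindows is made up for the example, and it says nothing about how the VBScript or .NET engines behave.

```javascript
// Collect capture group 1 from every zero-width lookahead match (hypothetical helper).
function lookaheadWindows(s) {
  // The lookahead consumes no characters, so a match is attempted at every
  // position; the capture inside it holds the four digits found there.
  return Array.from(s.matchAll(/(?=(\d{4}))/g), function (m) { return m[1]; });
}

console.log(lookaheadWindows("sfsdf1234567sdfsdf").join(",")); // 1234,2345,3456,4567
```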


Paul Randall  
Newsgroups: microsoft.public.scripting.vbscript
From: "Paul Randall" <paulr...@cableone.net>
Date: Thu, 19 Nov 2009 20:11:34 -0700
Local: Fri, Nov 20 2009 3:11 am
Subject: Re: Regular Expression Question

"Csaba Gabor" <dans...@gmail.com> wrote in message

news:d1dd3386-f6c9-454d-993a-68a464b49ada@v30g2000yqm.googlegroups.com...
PS.  I would be curious to know whether the .NET version of the
RegExp engine exhibits the same flaw.
-------------------------------------------

Hi, Csaba
Download and install Regular Expression Workbench:
http://code.msdn.microsoft.com/RegexWorkbench/Release/ProjectReleases...
It uses the .NET engine.
I find it especially handy for parsing a regular expression into chunks that
I can understand, and for testing a regular expression I'm trying to build
for a VBScript application.  I have to be especially careful to remember
that just because something works in RE Workbench (.NET), that does not mean
it works in VBScript.  I don't know of any valid VBScript regular expressions
that it does not correctly parse.

-Paul Randall


Dr J R Stockton  
Newsgroups: microsoft.public.scripting.vbscript
From: Dr J R Stockton <reply0...@merlyn.demon.co.uk>
Date: Fri, 20 Nov 2009 21:43:50 +0000
Local: Fri, Nov 20 2009 9:43 pm
Subject: Re: Regular Expression Question
In microsoft.public.scripting.vbscript message <A3A5975E-7677-4B93-948B-
C61DC30FE...@microsoft.com>, Thu, 19 Nov 2009 05:31:01, Robert
<Rob...@discussions.microsoft.com> posted:

>I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
>where the alpha chars are entirely random.

>When I use the pattern [0-9]..[0-9] I get a match count of 1. if i use
>[0-9]... I get 2 as a match count.

>I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?

>Can someone tell me the correct pattern to use? (without counting on the
>surrounding characters).

This works in JavaScript (Firefox 3.0.15); perhaps it can be translated:

St = "aaa1234567b45454bb" ; T = []
  RE = /(\d\d\d\d)/gi
  RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
  while (true) {
    A = RE.exec(St) ; RE.lastIndex -= 3
    if (!A) break
    T.push(A[1])
    }

Result is in T.
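
The same rewind idea can be wrapped in a function. This is a sketch rather than a tested library routine: the name slidingMatches and the width parameter are assumptions, and it sets lastIndex to one past the start of each match, which for a width of 4 is equivalent to the "-= 3" above.

```javascript
// Overlapping matches of `width` digits, found by rewinding lastIndex (hypothetical helper).
function slidingMatches(s, width) {
  var re = new RegExp("\\d{" + width + "}", "g");
  var out = [];
  var m;
  while ((m = re.exec(s)) !== null) {
    out.push(m[0]);
    // Resume one character after the START of this match instead of after
    // its end, so that the next match may overlap it.
    re.lastIndex = m.index + 1;
  }
  return out;
}

console.log(slidingMatches("aaa1234567b45454bb", 4).join(",")); // 1234,2345,3456,4567,4545,5454
```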

--
 (c) John Stockton, nr London, UK. ?...@merlyn.demon.co.uk  Turnpike v6.05  MIME.
 Web  <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links;
  Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
 No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.


Evertjan.  
Newsgroups: microsoft.public.scripting.vbscript
From: "Evertjan." <exjxw.hannivo...@interxnl.net>
Date: 21 Nov 2009 10:04:47 GMT
Local: Sat, Nov 21 2009 10:04 am
Subject: Re: Regular Expression Question
Dr J R Stockton wrote on 20 nov 2009 in
microsoft.public.scripting.vbscript:

>   RE = /(\d\d\d\d)/gi

Why the i?

>   RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
>   while (true) {
>     A = RE.exec(St) ; RE.lastIndex -= 3
>     if (!A) break
>     T.push(A[1])
>     }

> Result is in T.

This can be done with match():

<script type='text/javascript'>
        var s = "aaa1234567b45454bb";
        var r = s.match(/\d{4}/g);

        document.write(r); // 1234,4545
</script>

translated into vbs with regEx.Execute():

<script type='text/vbscript'>
   s = "aaa1234567b45454bb"
   Set regEx = New RegExp
   regEx.Pattern = "\d{4}"
   regEx.Global = True
   Set Matches = regEx.Execute(s)

   r = ""
   For Each Match in Matches
      if r <> "" Then r = r & ","
      r = r & Match.Value
   Next
   document.write r
</script>

[making quite a point of the ease of Javascript]

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)


Tom Lavedas  
Newsgroups: microsoft.public.scripting.vbscript
From: Tom Lavedas <tglba...@cox.net>
Date: Sat, 21 Nov 2009 03:57:22 -0800 (PST)
Local: Sat, Nov 21 2009 11:57 am
Subject: Re: Regular Expression Question
On Nov 21, 5:04 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:

Go all the way back to Robert's OP.  The formulation you posted does not
meet the original requirements - and that's why the rest of the
discussion ensued.
_____________________
Tom Lavedas

Evertjan.  
Newsgroups: microsoft.public.scripting.vbscript
From: "Evertjan." <exjxw.hannivo...@interxnl.net>
Date: 21 Nov 2009 12:56:45 GMT
Local: Sat, Nov 21 2009 12:56 pm
Subject: Re: Regular Expression Question
Tom Lavedas wrote on 21 nov 2009 in microsoft.public.scripting.vbscript:

[please do not quote signatures on usenet]

> Go all the way back to Robert's OP.  The formulation you post does not
> meet the original requirements - and that's why the rest of the
> discussion ensued.

I do not think so, Tom; the OP is included.

And even if that were so, this is Usenet:
discussion drift is part of the fun.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)


Tom Lavedas  
Newsgroups: microsoft.public.scripting.vbscript
From: Tom Lavedas <tglba...@cox.net>
Date: Sat, 21 Nov 2009 06:03:59 -0800 (PST)
Local: Sat, Nov 21 2009 2:03 pm
Subject: Re: Regular Expression Question
On Nov 21, 7:56 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:

Sure, drift is what happens.  I agree, it's part of the experience.
However, my point was that the pattern you posted does not return every
overlapping four-digit sequence, as the OP requested.  Your own commented
code says as much:

         document.write(r); // 1234,4545

The request, as I (and others) read it was for output like this ...

  1234, 2345, 3456, 4567, 4545, 5454
_____________________
Tom Lavedas


Evertjan.  
Newsgroups: microsoft.public.scripting.vbscript
From: "Evertjan." <exjxw.hannivo...@interxnl.net>
Date: 21 Nov 2009 16:10:03 GMT
Local: Sat, Nov 21 2009 4:10 pm
Subject: Re: Regular Expression Question
Tom Lavedas wrote on 21 nov 2009 in microsoft.public.scripting.vbscript:

And again:
[please do not quote signatures on usenet]

> Sure, drift is what happens.  I agree, its part of the experience.
> However, my point was that the pattern you posted does not return all
> four digit permutation as the OP requested.  Your own commented code
> says as much:

>          document.write(r); // 1234,4545

> The request, as I (and others) read it was for output like this ...

>   1234, 2345, 3456, 4567, 4545, 5454

I see, well I did not read it that way.

In Javascript this is simple too:

<script type='text/javascript'>
var a = 'aaa1234567b45454bb'.split(/\D+/);
for (var i=0;i<a.length;i++)
  for (var j=0;j<a[i].length-3;j++)
        document.write(a[i].substr(j,4)+',');
</script>

writes: 1234,2345,3456,4567,4545,5454,

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)


Csaba Gabor  
Newsgroups: microsoft.public.scripting.vbscript
From: Csaba Gabor <dans...@gmail.com>
Date: Sat, 21 Nov 2009 12:29:08 -0800 (PST)
Local: Sat, Nov 21 2009 8:29 pm
Subject: Re: Regular Expression Question
As does:
<script type='text/javascript'>
document.write("aaa1234567b45454bb".
  replace(/.*?(\d)(?=(\d{3}))|.+$/g, "$1$2,").replace(/,+$/,""))
</script>

The second replace merely gets rid of trailing commas.

The first replace, doing the heavy lifting, says: eat chars
until there's a digit (that's the (\d), which we'll call $1)
followed by three other digits (which we'll call $2).  When
that happens, replace $1 (and the prior characters) with:
$1 followed by $2 followed by ","

Characters now continue to be eaten starting from the char
following the $1 (in the original string).  The eating does
not start from after the $2 because $2 was embedded within
a forward lookahead, namely the (?=...), so that the $2
chars did not get eaten in the prior part.  Finally,
that |.+$ is there because once we get to the last set of 4
contiguous digits, we want to replace the remainder of the
string with the empty string (or more precisely, a comma),
so those final characters have to get eaten somehow (and
the plus ensures there is at least one character to eat).
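
To make the two stages described above visible, here is a small sketch that keeps the intermediate string. The regular expressions are the ones from the snippet above; the wrapper name replaceSteps and the returned field names are only illustrative.

```javascript
// Run the two replaces separately and return both results (hypothetical helper).
function replaceSteps(s) {
  // Stage 1: each match becomes "<4 digits>,"; the final leftover characters
  // match the second alternative, where $1 and $2 are empty, leaving just ",".
  var raw = s.replace(/.*?(\d)(?=(\d{3}))|.+$/g, "$1$2,");
  // Stage 2: strip the trailing commas.
  return { raw: raw, cleaned: raw.replace(/,+$/, "") };
}

var steps = replaceSteps("aaa1234567b45454bb");
console.log(steps.raw);     // 1234,2345,3456,4567,4545,5454,,
console.log(steps.cleaned); // 1234,2345,3456,4567,4545,5454
```

The two trailing commas in the intermediate string come from the last real match (5454) and from the leftover "454bb" being replaced by a lone comma.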

This is a slight adaptation from the last part
of my Nov. 19 post in this same thread:
http://groups.google.com/group/microsoft.public.scripting.vbscript/br...

Csaba Gabor from Vienna


Dr J R Stockton  
Newsgroups: microsoft.public.scripting.vbscript
From: Dr J R Stockton <reply0...@merlyn.demon.co.uk>
Date: Sun, 22 Nov 2009 18:42:19 +0000
Local: Sun, Nov 22 2009 6:42 pm
Subject: Re: Regular Expression Question
In microsoft.public.scripting.vbscript message <Xns9CCA70B5C810Feejj99@1
94.109.133.242>, Sat, 21 Nov 2009 10:04:47, Evertjan. <exjxw.hannivoort@
interxnl.net> posted:

>why the i ?

It was superfluously inherited from the code (now function DayCheck3 in
<linxchek.htm>) from which the code for this was derived.

>This can be done with match():

><script type='text/javascript'>
>       var s = "aaa1234567b45454bb";
>       var r = s.match(/\d{4}/g);

>       document.write(r); // 1234,4545
></script>

Only if one does not mind getting a different result.

The OP did not want a string, AFAICS.

Rather than using two nested loops, one could use the local equivalent
of (after proper testing)

        for (J=0, T = [] ; J<St.length-3 ; J++)
          if ( !/\D/.test(S = St.substr(J, 4) ) ) T.push(S)

--
 (c) John Stockton, Surrey, UK.  ?...@merlyn.demon.co.uk   Turnpike v6.05   MIME.
 Web  <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
 Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
 Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)

