Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Remove extra Double codes from CSV Files (vb Script Language)
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  12 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Learn4Develop  
View profile  
 More options Nov 16 2009, 11:20 am
Newsgroups: microsoft.public.scripting.vbscript
From: Learn4Develop <Learn4Deve...@discussions.microsoft.com>
Date: Mon, 16 Nov 2009 03:20:01 -0800
Local: Mon, Nov 16 2009 11:20 am
Subject: Remove extra Double codes from CSV Files (vb Script Language)
On daily bases I have received files with double quotes comma separated, but
in some records I found 2 times double quotes and some time different squence
of double quotes placements.

I just want to remove all unnecessary (extra) double quotes (see the
following values where i want to remove extra double quotes.

"0123x",""Company D-Val"","Class D, sector N","DD5894"
"4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"

I am looking the output like that.

"0123x","Company D-Val","Class D, sector N","DD5894"
"4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
"AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"

Following code i used is work fine with ""Company D-Val"" but didn't work
with ""Recycle" Rubbish, C class", The code is

code:-
Do While inFile.AtEndOfStream = False
record = InFile.ReadLine()
record = Replace(record, """", """") 'replace "" with single space
outFile.WriteLine record & vbclrf
Loop


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tom Lavedas  
View profile  
 More options Nov 16 2009, 3:28 pm
Newsgroups: microsoft.public.scripting.vbscript
From: Tom Lavedas <tglba...@cox.net>
Date: Mon, 16 Nov 2009 07:28:03 -0800 (PST)
Local: Mon, Nov 16 2009 3:28 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)
On Nov 16, 6:20 am, Learn4Develop

{snip}

It was tricky getting the indexing right, but this seemed to do what
you want on the test sample you provided ...

Do Until inFile.AtEndOfStream
  record = InFile.ReadLine
  nPos = Instr(record, Chr(34) & Chr(34))
  do while nPos > 0
    record = Left(record, nPos - 1) & Mid(record, nPos + 1)
    nPos = Instr(nPos + 1, record, Chr(34))
    record = Left(record, nPos - 1) & Mid(record, nPos + 1)
    nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))
  loop
  outFile.WriteLine record
Loop

The approach removes the first of a pair of double quotes and the very
next double quote that follows; thereby removing the starting and
ending quotes that enclose text, when there are too many.
_____________________
Tom Lavedas


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pegasus [MVP]  
View profile  
 More options Nov 16 2009, 8:54 pm
Newsgroups: microsoft.public.scripting.vbscript
From: "Pegasus [MVP]" <n...@microsoft.com>
Date: Mon, 16 Nov 2009 21:54:18 +0100
Local: Mon, Nov 16 2009 8:54 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

"Learn4Develop" <Learn4Deve...@discussions.microsoft.com> wrote in message

news:C072F055-9B82-4D2D-A4D2-216403F47814@microsoft.com...

You could do it like this for every line of text:
1. Replace every "," with your own delimiter. I use chr(0).
2. Remove all remaining double quotes.
3. Replace your own delimiters with ",".
4. Surround the line with double quotes.
This code appears to work for your data sample:

Q = """"
Z = Chr(0)
sDelimiter = Q & "," & Q
Set  oFSO = CreateObject("Scripting.FileSystemObject")
Set oFile = oFSO.OpenTextFile("d:\test.txt")
While Not oFile.AtEndOfStream
   sLine = Replace(oFile.ReadLine, sDelimiter, Z)
   sLine = Replace(sLine, Q, "")
   sLine = Q & Replace(sLine, Z, sDelimiter) & Q
   WScript.Echo sLine
Wend


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pegasus [MVP]  
View profile  
 More options Nov 16 2009, 9:15 pm
Newsgroups: microsoft.public.scripting.vbscript
From: "Pegasus [MVP]" <n...@microsoft.com>
Date: Mon, 16 Nov 2009 22:15:54 +0100
Local: Mon, Nov 16 2009 9:15 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)
Strange - my initial reply appears to have disappeared into the never-never.
I'll try to post it once more:

You could do it like this for every line of text:
1. Replace every "," with your own delimiter. I use chr(0).
2. Remove all remaining double quotes.
3. Replace your own delimiters with ",".
4. Surround the line with double quotes.
This code appears to work for your data sample:

Q = """"
Z = Chr(0)
sDelimiter = Q & "," & Q
Set  oFSO = CreateObject("Scripting.FileSystemObject")
Set oFile = oFSO.OpenTextFile("d:\test.txt")
While Not oFile.AtEndOfStream
   sLine = Replace(oFile.ReadLine, sDelimiter, Z)
   sLine = Replace(sLine, Q, "")
   sLine = Q & Replace(sLine, Z, sDelimiter) & Q
   WScript.Echo sLine
Wend


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Al Dunbar  
View profile  
 More options Nov 17 2009, 3:48 am
Newsgroups: microsoft.public.scripting.vbscript
From: "Al Dunbar" <aland...@hotmail.com>
Date: Mon, 16 Nov 2009 20:48:10 -0700
Local: Tues, Nov 17 2009 3:48 am
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

"Learn4Develop" <Learn4Deve...@discussions.microsoft.com> wrote in message

news:C072F055-9B82-4D2D-A4D2-216403F47814@microsoft.com...

> On daily bases I have received files with double quotes comma separated,
> but
> in some records I found 2 times double quotes and some time different
> squence
> of double quotes placements.

> I just want to remove all unnecessary (extra) double quotes (see the
> following values where i want to remove extra double quotes.

> "0123x",""Company D-Val"","Class D, sector N","DD5894"
> "4894D",""Recycle" Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679",""AB Ltd"",""Need" Clean Drive Way, stores","GF0347"

The rubbish line looks like exactly that. the first paired double quotes are
not matched by another such pair. In the context of your data, what does it
mean to have a doubled double-quote?

> I am looking the output like that.

> "0123x","Company D-Val","Class D, sector N","DD5894"
> "4894D","Recycle Rubbish, C class","Class D, Sector F, Block N","D870GH"
> "AB8679","AB Ltd","Need Clean Drive Way, stores","GF0347"

> Following code i used is work fine with ""Company D-Val"" but didn't work
> with ""Recycle" Rubbish, C class", The code is

> code:-
> Do While inFile.AtEndOfStream = False
> record = InFile.ReadLine()
> record = Replace(record, """", """") 'replace "" with single space

That replace call seems to change each double quote with exactly one double
quote.

Perhaps you just need to change each pair of double quotes with a single
double quote.

/Al


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Todd Vargo  
View profile  
 More options Nov 17 2009, 4:16 am
Newsgroups: microsoft.public.scripting.vbscript
From: "Todd Vargo" <tlva...@sbcglobal.netz>
Date: Mon, 16 Nov 2009 23:16:34 -0500
Local: Tues, Nov 17 2009 4:16 am
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

record = Replace(record, Chr(34)&Chr(34), Chr(34)) 'replace "" with single "

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Learn4Develop  
View profile  
 More options Nov 17 2009, 4:46 pm
Newsgroups: microsoft.public.scripting.vbscript
From: Learn4Develop <Learn4Deve...@discussions.microsoft.com>
Date: Tue, 17 Nov 2009 08:46:11 -0800
Local: Tues, Nov 17 2009 4:46 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

Hi Tom
  Thanks for the code, its working fine with the above criteria, but gives
me error if there is Null value between double quotes ("") i.e.

("AB8679","""",""Need" Clean Drive Way, stores","")

As i have some columns in CSV file where the value is Null "". How i can
resolve it.

Thanks again for your great help.


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Pegasus [MVP]  
View profile  
 More options Nov 17 2009, 4:49 pm
Newsgroups: microsoft.public.scripting.vbscript
From: "Pegasus [MVP]" <n...@microsoft.com>
Date: Tue, 17 Nov 2009 17:49:34 +0100
Local: Tues, Nov 17 2009 4:49 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

"Learn4Develop" <Learn4Deve...@discussions.microsoft.com> wrote in message

news:70A7DE23-C1A4-417B-9B34-E0BA09CDF49C@microsoft.com...

Try my suggestion. It should beable to cope with empty fields.

    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Tom Lavedas  
View profile  
 More options Nov 17 2009, 6:52 pm
Newsgroups: microsoft.public.scripting.vbscript
From: Tom Lavedas <tglba...@cox.net>
Date: Tue, 17 Nov 2009 10:52:59 -0800 (PST)
Local: Tues, Nov 17 2009 6:52 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)
On Nov 17, 11:46 am, Learn4Develop

Try this (though Pegusus' approach is also workable) ...

Do Until inFile.AtEndOfStream
  record = InFile.ReadLine
  nPos = Instr(record, Chr(34) & Chr(34))
  do while nPos > 0
    record = Left(record, nPos) & Mid(record, nPos + 2)
    nPos = Instr(nPos + 1, record, Chr(34))
    if nPos > 0 then
      record = Left(record, nPos - 1) & Mid(record, nPos + 1)
      nPos = Instr(nPos + 1, record, Chr(34) & Chr(34))
    else
      record = record & Chr(34)
    end if
  loop
  outFile.WriteLine record
Loop
_____________________
Tom Lavedas


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Christoph Basedau  
View profile  
 More options Nov 17 2009, 11:31 pm
Newsgroups: microsoft.public.scripting.vbscript
From: Christoph Basedau <cbase...@hotmail.de>
Date: Wed, 18 Nov 2009 00:31:32 +0100
Local: Tues, Nov 17 2009 11:31 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)
Learn4Develop schrieb:

If you want to implement the appropriate algorithm, you cannot
use a replace-mechanism based on literals, you always have to be
aware of the quoting and delimiting context.
The right way, j2mc, is to parse the string char by char, count the
DQs and check whether a DQ is followed by a delimiter.
If the delimiter appears after an even number of DQs it truly terminates
the token, if the number is odd it's an "inline-delimiter".
A DQ on the other hand is kept if its the first one (obvious,
as opener) or one with an even ordinal number followed by the delimiter
(terminator).

Christoph

In VBS a function that follows these rules would like this:

'---------------------
Function unquote(record)

        Const DQ = """"
        Const CM = ","

        Dim newRecord
        Dim dqCount
        Dim char
        Dim nextChar
        Dim i
        Dim keep

        newRecord = ""
        dqCount = 0

        For i = 1 To Len(Record)

                char = Mid(record, i, 1)
                nextChar = Mid(record, i+1, 1)
                keep = 0

                If char = DQ Then
                        If dqCount = 0 Then
                                'beginning of token
                                keep = 1
                                dqCount = 1

                        ElseIf (dqCount Mod 2 = 1) And _
                                (nextChar = CM  OR nextChar = "") Then

                                'end of token marked by ", OR "\r\n
                                keep = 1
                                dqCount = 0                            
                        Else
                                'inline "
                                keep = 0
                                dqCount = dqCount + 1
                        End If
                Else
                        'char other then "
                        keep = 1
                End If
                If keep = 1 Then
                        newRecord = newRecord & char
                End If
                'WSH.Echo dqCount, i, keep, char, nextchar, newRecord
        Next
        unquote = newRecord
End Function
'---------------------

To test the results, run:

'---------------------
Option Explicit
Dim records
records =  Array ( _
"""0123x"",""""Company D-Val"""",""Class D, sector N"",""DD5894""" _
, """4894D"",""""Recycle"" Rubbish, C class"",""Class D, Sector F, Block
N"",""D870GH""" _
, """AB8679"",""""AB Ltd"""",""""Need"" Clean Drive Way,
stores"",""GF0347""" _
, """""AB8679"", Test New"",""""""AB Ltd"""""",""""Need"" Clean Drive
Way, stores"",""GF0347""" _
)

Dim Record

For Each Record in records
        WSH.Echo record
        WSH.Echo unquote(record)
        WSH.Echo
Next

Function unquote(record)
...
End Function
'---------------------


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Todd Vargo  
View profile  
 More options Nov 17 2009, 11:50 pm
Newsgroups: microsoft.public.scripting.vbscript
From: "Todd Vargo" <tlva...@sbcglobal.netz>
Date: Tue, 17 Nov 2009 18:50:30 -0500
Local: Tues, Nov 17 2009 11:50 pm
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

Learn4Develop wrote:
> Hi Tom
>   Thanks for the code, its working fine with the above criteria, but gives
> me error if there is Null value between double quotes ("") i.e.

> ("AB8679","""",""Need" Clean Drive Way, stores","")

> As i have some columns in CSV file where the value is Null "". How i can
> resolve it.

> Thanks again for your great help.

ISTM, only strings containing a comma  should need quoted. But since you say
it gives you an error, it begs the questions. What is giving the error? And
what is creating these incorrect csv files?

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Paul Randall  
View profile  
 More options Nov 18 2009, 4:08 am
Newsgroups: microsoft.public.scripting.vbscript
From: "Paul Randall" <paulr...@cableone.net>
Date: Tue, 17 Nov 2009 21:08:00 -0700
Subject: Re: Remove extra Double codes from CSV Files (vb Script Language)

"Learn4Develop" <Learn4Deve...@discussions.microsoft.com> wrote in message

news:C072F055-9B82-4D2D-A4D2-216403F47814@microsoft.com...

I agree with Christoph that either the mechanism that creates this
non-standard quoting of a CSV file should be made more robust, or a more
robust mechanism should be used to read the poorly implimented CSV file.
Perhaps a simple VBScript using ADO as the robust file reading mechanism
could be used.

-Paul Randall


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2010 Google