- Interrogating and printing line numbers and records
- Posted by Walter Cohen on June 23rd, 2008
Hello.
I have a text file that contains records with certain known leading
identifiers. The records appear in groups or sets.
For instance the records in a group are as follows:
//X12j988..........
01k763o93......
02......
04.....
73......
74......
73.....
74.....
96.....
//X12k76988
01l023939
So, each group always starts with a '//X12' record.
What I'm trying to do is identify the groups of records that have repeating
records within the group. So in the example I want to be able to flag the
group 'X12j988' and also print the repeating records which are 73...,
74...., 73...., 74....
I am able to identify the repeating records I'm looking for in the group
(i.e. 73, 74)
Any help is appreciated.
Thanks,
Walter
- Posted by foxidrive on June 23rd, 2008
The structure of the records and the makeup of characters is important - as
the code will often utilise the structure.
Can you provide a couple of records to work with?
You wanted to print them - to a printer, to the screen or to a file? Do you
want only the duplicated records? And are the first 2 characters the only
significant characters?
- Posted by Walter Cohen on June 23rd, 2008
I'd like to be able to capture the records to a file.
Also, I'd like to be able to capture the group '//STX12' record as well as
the duplicate records including the records they were duplicated from.
So something like this:
//STX12.....
73...
74...
73...
74...
Here is some test data:
//STX12//888WINDOW DISTRIBUTORS 999999 L57383-99
VS1
0120050110999999 20050109L57383-99
02H73913006
04BM 008498003M
09PE9
100048053750000
30 +00000000020CA+000000000004.73000 UA
31007300090008 UI
3207300090008
44F
453.25Z STURDY LOCK 4/10
49+000004
73AF510 +0000000000028640 +0000000.50
74 02 -00000.7700 RATE/CA
73AB570 +0000000000020870 +0000000.45
74 02 -00005.6500 RATE/CA
30 +00000000460CA+000000000001.17000 UA
3100430009000W UI
320430009000W
44F
4566.75Z PUSH PINS 8/10
49+000004
73AF510 +0000000000012720 +0000002.77
74 02 -00002.7700 RATE/CA
73AB570 +0000000000012400 +0000001.65
74 02 -00001.6500 RATE/CA
30 +00000000240CA+000000000006.56000 UA
31004300095387 UI
3204300095387
44F
4555.25Z WINDOW DECALS 10/10
49+000004
88+0000000001589360 +0000000001589360 M
89 PINCPINC CC
91AT525 +0000000000055090
92 02 C/A 2% CUSTOMER ALLOWANCE
94+00000001421CA+000003673.2LB +0000000000000013+00000043750
//STX12//888STICKY APPLICATIONS 999999 S00203-16
VS1
- Posted by billious on June 24th, 2008
A few little things first.
Top-posting may be fine for emails, but it's bad netiquette and some
respondents will ignore top-postings. Always append your further comments to
the END of messages on usenet.
It's not clear what your structure and requirements are. You did say you
wanted a file output, but ignored the question about whether the first 2
characters in the line were the significant "duplicate" indicator.
You've still only provided one record, and it's not clear whether the "VS1"
is wrapped from the "//STX12" line or whether it's a line on its own.
FIND /N /V "" <filein.txt >fileout.txt
will conveniently produce a bracketed line-number which should clear up any
confusion about which lines are new and which wrapped.
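For readers without FIND to hand, the same bracketed numbering can be sketched in Python. This is only an illustration of the idea, not part of the batch solution; the function name number_lines and the sample text are made up for the example.

```python
def number_lines(text):
    """Prefix each line with a bracketed 1-based number, e.g. "[1]first line"."""
    return [f"[{n}]{line}" for n, line in enumerate(text.splitlines(), start=1)]


# Hypothetical sample: if "VS1" gets its own number, it is a line on its own
# and not a wrapped tail of the "//STX12" line.
sample = "//STX12//888WINDOW DISTRIBUTORS\nVS1\n0120050110999999"
for row in number_lines(sample):
    print(row)
```

A line that only wrapped in the newsreader would not get a number of its own, which is what settles the "VS1" question.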
It's not clear whether you want any output at all from record-groups that
contain no "duplicates".
No guarantees on this solution, but it appeared to work for me:
This solution was developed using XP.
It may work for NT4/2K.
----- batch begins -------
[1]@echo off
[2]del outfile.txt 2>nul
[3]del "%temp%\xyzwq.txt" 2>nul
[4]set ybs=//
[5]set yds=
[6]for /f "tokens=*" %%i in (data.txt) do set ydl=%%~i&call :nextf&>>"%temp%\xyzwq.txt" echo\%%i
[7]call :new
[8]goto :eof
[9]
[10]:nextf
[11]:: grab first 2 chars from Your Data Line
[12]set ydl=%ydl:~0,2%
[13]:: If they are "//" then this is a new item
[14]if "//"=="%ydl%" goto new
[15]:: See whether this two-character sequence occurs in Your Detected-Strings
[16]echo " %yds%"|findstr /c:"%ydl%" >nul
[17]:: If not, add to the detected strings (first occurrence)
[18]if errorlevel 1 set yds=%yds% %ydl%&goto :eof
[19]:: If so, is a duplicate so it becomes one of Your Begin-Strings
[20]set ybs=%ybs% %ydl%
[21]goto :eof
[22]
[23]:new
[24]:: File will not be built for first "//" detected
[25]if not exist "%temp%\xyzwq.txt" goto :eof
[26]:: Don't know whether you want the "//" record for
[27]:: "no duplicated begin-strings" or not.
[28]if "%ybs%"=="//" goto notreq
[29]find "//" <"%temp%\xyzwq.txt" >>outfile.txt
[30]findstr /b /l "%ybs:~3%" <"%temp%\xyzwq.txt" >>outfile.txt
[31]:notreq
[32]set ybs=//
[33]set yds=
[34]del "%temp%\xyzwq.txt" 2>nul
[35]goto :eof
[36]
------ batch ends --------
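The bookkeeping done by :nextf (the yds list of prefixes already seen and the ybs list of prefixes seen more than once) may be easier to follow in a small Python sketch. It assumes, as the batch does, that only the first two characters of a record matter. The function name repeated_prefixes and the width parameter are made up for the illustration, and it uses exact list membership where the batch uses a FINDSTR substring search.

```python
def repeated_prefixes(records, width=2):
    """Return the leading identifiers that occur more than once, in first-repeat order."""
    seen = []      # plays the role of yds: identifiers met so far
    repeated = []  # plays the role of ybs: identifiers met at least twice
    for rec in records:
        key = rec[:width]
        if key not in seen:
            seen.append(key)          # first occurrence
        elif key not in repeated:
            repeated.append(key)      # second occurrence: now a "duplicate"
    return repeated


# The first group from the original question, without its "//" header line.
group = ["01k763o93", "02", "04", "73", "74", "73", "74", "96"]
print(repeated_prefixes(group))
```

An empty result corresponds to the case where ybs still holds only "//" and the batch jumps to :notreq.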
Lines start with [number]; any line not starting with [number] has been
wrapped and should be rejoined to the previous line. The [number] that starts
each line should be removed before running the batch.
The label :eof is predefined in NT+ to mean end-of-file, but it MUST be
written with the colon, as :eof
If you DO want each "//..." line, even if there are no "duplicate" lines
following, then you would need to delete or comment-out [28] AND change [30]
to
[30]if not "%ybs%"=="//" findstr......
Actually, I developed this batch on the assumption that FINDSTR could accept
"//" as a search-string. FINDSTR appears to have problems with this - hence
the "%ybs:~3%" caper in [30]
It's actually not necessary to have "//" in ybs - had findstr been able to
handle it (it appears to interpret "//" as a switch-indicator) then it would
have been convenient and [29] would not be required.
This is a simplification since this FINDSTR quirk appears to be an obstacle:
----- batch begins -------
[1]@echo off
[2]del outfile.txt 2>nul
[3]del "%temp%\xyzwq.txt" 2>nul
[4]set ybs=
[5]set yds=
[6]for /f "tokens=*" %%i in (data.txt) do set ydl=%%~i&call :nextf&>>"%temp%\xyzwq.txt" echo\%%i
[7]call :new
[8]goto :eof
[9]
[10]:nextf
[11]:: grab first 2 chars from Your Data Line
[12]set ydl=%ydl:~0,2%
[13]:: If they are "//" then this is a new item
[14]if "//"=="%ydl%" goto new
[15]:: See whether this two-character sequence occurs in Your Detected-Strings
[16]echo " %yds%"|findstr /c:"%ydl%" >nul
[17]:: If not, add to the detected strings (first occurrence)
[18]if errorlevel 1 set yds=%yds% %ydl%&goto :eof
[19]:: If so, is a duplicate so it becomes one of Your Begin-Strings
[20]set ybs=%ybs% %ydl%
[21]goto :eof
[22]
[23]:new
[24]:: File will not be built for first "//" detected
[25]if not exist "%temp%\xyzwq.txt" goto :eof
[26]:: Don't know whether you want the "//" record for
[27]:: "no duplicated begin-strings" or not.
[28]if not defined ybs goto notreq
[29]find "//" <"%temp%\xyzwq.txt" >>outfile.txt
[30]findstr /b /l "%ybs%" <"%temp%\xyzwq.txt" >>outfile.txt
[31]:notreq
[32]set ybs=
[33]set yds=
[34]del "%temp%\xyzwq.txt" 2>nul
[35]goto :eof
[36]
------ batch ends --------
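As a cross-check on what the batch is meant to produce, here is a rough Python sketch of the whole job: split the input into groups at each "//" line, then emit the group header plus every record whose first two characters occur more than once in that group. It is an illustration under assumptions, not a port of the batch. The names flag_groups and keep_all_headers are made up, records before the first "//" line are simply ignored, and the flag stands in for the optional "always output the // line" behaviour described earlier.

```python
from collections import Counter


def flag_groups(lines, keep_all_headers=False):
    """Return header lines and repeated records, group by group."""
    out = []
    header, body = None, []

    def flush():
        # Emit the finished group, if there is one.
        if header is None:
            return
        counts = Counter(rec[:2] for rec in body)
        dups = {key for key, n in counts.items() if n > 1}
        if dups or keep_all_headers:
            out.append(header)
        out.extend(rec for rec in body if rec[:2] in dups)

    for line in lines:
        if line.startswith("//"):
            flush()                    # a new "//" line closes the previous group
            header, body = line, []
        elif header is not None:
            body.append(line)
    flush()                            # the last group has no "//" after it
    return out


# Hypothetical data shaped like the example in the original question.
data = ["//X12j988", "01k", "02", "73a", "74a", "73b", "74b", "96",
        "//X12k76988", "01l023939"]
for row in flag_groups(data):
    print(row)
```

To try it on a real file, the lines could be read with open("data.txt").read().splitlines() and the result written to outfile.txt, using the same file names as the batch.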