
Changing single ASCII characters in ASCII strings within files

Martin G0HDB
 

Hello, and a belated Happy New Year to all.

I've recently returned to fiddling with BB4W programs for manipulating my amateur radio logbook files, and I'm unsure how to go about implementing one particular function I want...

My 'input file' comprises a set of 'records', each of which is a single string of 290 ASCII characters.  I already have a program that opens the input file (using OPENIN), reads each record and then writes new records, derived from sub-strings of the input records, to a separate output file. Thanks to the help given via this forum, this all works beautifully.

I now want to be able to modify a single byte (always the same one) within some of the 290-byte records in the input file, depending on whether or not certain strings in another file match a sub-string in the input record.

I assume I'll need to open the input file using the 'OPENUP' command instead of 'OPENIN' as I do at present, but beyond that I'm stuck!  If I read the first 290-byte record in the input file and do the checks for a matching string in the related file, what command(s) should I then use to change byte 205 in the 290-byte string from an ASCII "N" to an ASCII "E" (or whatever), before moving on to read the next record and doing the same things all over again, all the way down to EOF of the input file?

Should I use the BPUT# command to change the byte from "N" to "E" (or possibly something else), after calculating the position of that byte within the entire file from the number of the record that contains it plus the 205-byte offset from the start of the record? Or is there another (possibly better!) way of changing a byte within some of the 290-byte strings and then writing the modified input file back to disk?

My inexperience as a programmer is clearly evident(!), so all help and guidance anyone can give will be much appreciated.

Thanks in advance,

--
Martin

Richard Russell
 

On Mon, Jan 21, 2019 at 02:02 PM, Martin G0HDB wrote:
I now want to be able to modify a single byte (always the same one) within some of the 290-byte records in the input file
So you literally want to modify the existing file, not transfer its contents to another file with changes?  In that case you are exactly right: open the file with OPENUP, move the pointer (PTR#) to the location in the file you want to change, and BPUT# the new byte there (overwriting the old byte).  If the records are fixed-length and it's always the same byte in the record that you want to change, the calculation of where to position the pointer will be trivial.
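A minimal sketch of that in-place update, assuming fixed-length records of reclen% bytes (including any CR/LF terminator), records numbered from 0, and byte 205 counted from 1, so that it sits at offset 204 within its record. The filename and record number are hypothetical.

```basic
      REM Overwrite one byte in place (sketch; filename and record number are hypothetical)
      reclen% = 292                  : REM assumed record length, CR/LF included
      n% = 17                        : REM hypothetical record number, counting from 0
      file% = OPENUP("logbook.txt")
      IF file% = 0 THEN ERROR 100, "Couldn't open logbook.txt"
      PTR#file% = n% * reclen% + 204 : REM PTR# is zero-based, so byte 205 is offset 204
      BPUT#file%, ASC"E"             : REM overwrites the existing byte
      CLOSE #file%
```

Because PTR# is set explicitly before the BPUT#, nothing else in the file is read or rewritten.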

J.G.Harston
 

Richard Russell wrote:
So you literally want to modify the existing file, not transfer its
contents to another file with changes?
And even then, if it's a one-off, I'd tend to output to a separate file,
and then after checking replace the original. Eg:

in%=OPENIN(oldfile$)
out%=OPENOUT(newfile$)
REPEAT
A%=BGET#in%:IF A%=ASC"E" THEN A%=ASC"N"
BPUT#out%,A%
UNTIL EOF#in%
CLOSE#out%
CLOSE#in%

Manually inspect the resultant file. Delete old file, rename new file.
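The final "delete old file, rename new file" step can also be done from BASIC using the *DELETE and *RENAME star commands via OSCLI. A sketch, with hypothetical filenames; run it only after the new file has been checked.

```basic
      REM Replace the original with the checked copy (filenames are hypothetical)
      oldfile$ = "logbook.txt"
      newfile$ = "logbook.new"
      REM Quotes around the names allow for spaces in the paths
      OSCLI "DELETE """ + oldfile$ + """"
      OSCLI "RENAME """ + newfile$ + """ """ + oldfile$ + """"
```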

--
J.G.Harston - jgh@... - mdfs.net/jgh

Richard Russell
 

On Mon, Jan 21, 2019 at 03:03 PM, J.G.Harston wrote:
if it's a one-off, I'd tend to output to a separate file,
The code you listed doesn't do what the OP asked for.

Anyway, if your concern is that the process be non-destructive of the original data, it would be far faster and easier to make a copy of the original file first (using *COPY) and then change the required bytes using the OPENUP, PTR# and BPUT# method I described.  It's wasteful to transfer the entire file through the interpreter when only a few bytes need to be changed.
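A sketch of the copy-then-patch approach, assuming 292-byte records (the figure Martin gives later) and hypothetical filenames and record number. Only the backup is made with *COPY; the original is then changed in place.

```basic
      REM Keep an untouched backup, then patch the original in place
      orig$ = "logbook.txt"          : REM hypothetical filenames
      backup$ = "logbook.bak"
      OSCLI "COPY """ + orig$ + """ """ + backup$ + """"
      file% = OPENUP(orig$)
      IF file% = 0 THEN ERROR 100, "Couldn't open " + orig$
      PTR#file% = 17 * 292 + 204     : REM byte 205 of (hypothetical) record 17, counting from 0
      BPUT#file%, ASC"E"
      CLOSE #file%
```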

Martin G0HDB
 

On Mon, Jan 21, 2019 at 02:18 PM, Richard Russell wrote:
So you literally want to modify the existing file, not transfer its contents to another file with changes?
Hi Richard (and also J.G.), thanks for your inputs.  Yes, I think it will be 'preferable' to modify the original file rather than create a new one with the changes made but I'll keep thinking about the latter approach.

The original file currently holds approximately 30,000 records, each of which is a fixed length of 292 bytes (290 data bytes, all ASCII characters, plus the CR/LF terminators), and the byte to be tested in each record is no. 205, so, as explained by Richard, calculating the pointer position is straightforward.  If byte 205 contains an ASCII "N" (or possibly is null) then I need to search another, related file for strings that match some of the sub-strings in the record in the original file; if a match is found, byte 205 in the original file needs to be changed to an "E" before moving on to the next record.  The number of changes to be made in the original file will usually be of the order of 200, but it might be fewer or occasionally as many as 300-400; these numbers would make manual inspection of the changes a bit tedious!
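The record-by-record process described above might be sketched as follows. It assumes every record, including the last, is exactly 292 bytes with its CR/LF, and it uses a hypothetical FNmatch() as a placeholder for the search of the second file. Only the "N" case is tested, because it isn't clear whether "null" means a space or a CHR$0 byte.

```basic
      REM Scan every record and flag the matching ones (sketch)
      reclen% = 292                  : REM 290 data bytes + CR/LF
      flag% = 205                    : REM byte to test, counting from 1
      file% = OPENUP("logbook.txt")  : REM hypothetical filename
      IF file% = 0 THEN ERROR 100, "Couldn't open logbook.txt"
      nrec% = EXT#file% DIV reclen%
      changed% = 0
      n% = 0
      WHILE n% < nrec%               : REM WHILE, so an empty file does nothing
        PTR#file% = n% * reclen%
        rec$ = GET$#file% BY reclen%
        IF MID$(rec$, flag%, 1) = "N" THEN
          IF FNmatch(rec$) THEN
            PTR#file% = n% * reclen% + flag% - 1
            BPUT#file%, ASC"E"
            changed% += 1
          ENDIF
        ENDIF
        n% += 1
      ENDWHILE
      CLOSE #file%
      PRINT ; changed% ; " record(s) changed"
      END

      DEF FNmatch(rec$)
      REM Hypothetical placeholder: search the second file for sub-strings of rec$
      = FALSE
```

Setting PTR# explicitly before each read and each write keeps the reads and the occasional BPUT# from interfering with each other.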

Now that I believe I understand how to use the BPUT# command I just need to fathom out how to scan the second, related file for strings that match the sub-strings in the original file - the records in the second file (not created by me) aren't so tidily structured and the strings aren't of fixed lengths so there are some challenges to explore...!

Thanks again,

--
Martin

Richard Russell
 

On Tue, Jan 22, 2019 at 04:16 PM, Martin G0HDB wrote:
the records in the second file (not created by me) aren't so tidily structured and the strings aren't of fixed lengths so there are some challenges to explore...!
If you don't care about the context in which the strings appear, but only whether they appear somewhere in the file, then read the entire file into a string variable and search it with INSTR().  I expect you know how to read an entire file into a string variable, but for the record:

      file% = OPENIN(path$)
      entire$ = GET$#file% BY EXT#file%
      CLOSE #file%

Obviously this technique will only work if the entire file can be loaded into memory at once, which in practice means the file would need to be smaller than about 200 Mbytes (BB4W) or 100 Mbytes (BBCSDL).  A larger file than that would have to be searched sequentially, which could be very slow.
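Once the file is in memory, each test is a single INSTR() call, which returns 0 if the sub-string isn't present and its position otherwise. A sketch, where the filename and the example sub-string are hypothetical:

```basic
      REM Load the second file once, then test sub-strings with INSTR
      file% = OPENIN("uploads.txt")  : REM hypothetical filename
      IF file% = 0 THEN ERROR 100, "Couldn't open uploads.txt"
      entire$ = GET$#file% BY EXT#file%
      CLOSE #file%

      key$ = "G0HDB"                 : REM hypothetical sub-string taken from a logbook record
      IF INSTR(entire$, key$) THEN PRINT key$ + " found" ELSE PRINT key$ + " not found"
```

Loading the file once and calling INSTR() per record avoids re-reading the second file 30,000 times.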

Martin G0HDB
 

On Tue, Jan 22, 2019 at 05:06 PM, Richard Russell wrote:
If you don't care about the context in which the strings appear, but only whether they appear somewhere in the file, then read the entire file into a string variable and search it with INSTR().
Hi again Richard, many thanks for the above.  Your statement in the first para about a sub-string extracted from the first, larger file appearing somewhere in the second file has made me realise that this won't necessarily be the case so I'm going to have to rethink the solution - I might have to extract the relevant sub-strings from each record in the second file and then search for the corresponding record in the first, larger file and then change the selected byte in the first file from "N" to "E" (or "Y").  I know that the sub-strings from the second file will definitely appear somewhere in the first file because the second file, which is created by a web-based app, will only ever contain records that correspond to entries in the first file.
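The reverse approach (walking the second file and locating each matching record in the first) could be sketched as below, loading the logbook into memory as Richard suggests and using INSTR() to find the record. FNkey$() and both filenames are hypothetical placeholders, 292-byte records are assumed, and INSTR() only finds the first occurrence, so a key that also appears in an earlier record, or in the wrong field, would need a stricter test.

```basic
      REM For each line of the second file, find and flag the matching logbook record
      reclen% = 292
      book% = OPENUP("logbook.txt")  : REM hypothetical filenames
      IF book% = 0 THEN ERROR 100, "Couldn't open logbook.txt"
      book$ = GET$#book% BY EXT#book%
      web% = OPENIN("webapp.txt")
      IF web% = 0 THEN ERROR 100, "Couldn't open webapp.txt"
      WHILE NOT EOF#web%
        key$ = FNkey$(GET$#web%)     : REM GET$# reads one line
        IF key$ <> "" THEN
          at% = INSTR(book$, key$)
          IF at% THEN
            rec% = (at% - 1) DIV reclen%
            IF MID$(book$, rec% * reclen% + 205, 1) = "N" THEN
              PTR#book% = rec% * reclen% + 204
              BPUT#book%, ASC"E"
            ENDIF
          ENDIF
        ENDIF
      ENDWHILE
      CLOSE #web%
      CLOSE #book%
      END

      DEF FNkey$(text$)
      REM Hypothetical placeholder: extract the relevant sub-string from text$
      = text$
```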

Re: the file sizes, the first, original file that I want to update and the second, related file are both (currently) <10 Mbytes and are unlikely ever to grow beyond, say, 20-30 Mbytes, so presumably both would fit into memory.  However, I'm not unduly concerned about achieving ultimate speed, because even if 'automating' the updating of the specific bytes in the first file takes a couple of minutes, it's going to be one or two orders of magnitude faster than manually updating the records in the original file using the program that created it, which is what I've had to do to date!

--
Martin