
Stripping LFs from end-of-record

Martin G0HDB
 

Hello again all, I hope someone can help me solve a problem that has me baffled and frustrated...

I'm still trying to do things with my amateur radio logbook files, in plain ol' ASCII, that comprise multiple records each of which terminates in a CR/LF pair.  Each record in a file is exactly 292 bytes long; this total includes the CR (0D) and LF (0A) terminator characters.

I have a very simple routine that I'm trying to use to count the number of records in a file; the routine includes the code to strip the LF character from each line as per the example given on the 'Reading and writing plain text files' Wikispaces web page and is as follows:

eqflog%=OPENIN "*.LOG" : REM This allows the user to select the log file to use
linecount = 0
LF = 10
REPEAT
  INPUT#eqflog%,qsoline$
  REM Now strip off the LF character on the end of each line
  IF ASC(qsoline$) = LF THEN qsoline$ = MID$(qsoline$,2)
  linecount = linecount + 1
  REM Loop round until end-of-file for the input file
UNTIL EOF#eqflog%
PRINT "Number of records in the log file is ",linecount
CLOSE#eqflog%
END

The problem I'm having is that this seemingly very rudimentary bit of code always gives a line count that is one more than the actual number of lines in the input file - it doesn't matter if the input file has 3 or 25,713 lines in it!  In these cases, the routine above always gives line-count results of 4 and 25,714 respectively.

I've checked the sizes of the input files I'm trying to use and they're all *exact* multiples of the 292 byte record length - for example, the file that contains 25,713 records is 7,508,196 bytes long.  When I examine this file, or any other of the input files, using a hex editor I can see that every file terminates correctly with the CR/LF (0D 0A) pair, so there are no spurious or additional LFs to confuse things.

I've also found that if I use the hex editor to remove the final LF from a file then the little line-counting routine above gives the correct answer for the number of lines in the input file.

I'm sure there's a very straightforward explanation for why the little routine above is giving the incorrect line-count value (of actual + 1) every time but I'm blowed if I can see where the problem lies, so I'll be grateful if someone can point out what I'm doing wrong...  :-)

Thanks in advance,

--
Martin

Richard Russell
 

On Thu, Oct 5, 2017 at 01:43 pm, Martin G0HDB wrote:
Each record in a file is exactly 292 bytes long; this total includes the CR (0D) and LF (0A) terminator characters.
I have a very simple routine that I'm trying to use to count the number of records in a file;
You are over-complicating things!  You know that the records are all the same length (in this case 292 bytes) so the calculation of the number of records is simply this:

      record_length = 292
      number_of_records = EXT#file DIV record_length

There is no need to read the file sequentially simply to determine the number of records!
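For comparison, here is the same size-based count sketched in Python. It illustrates the idea and is not BBC BASIC. The names RECORD_LENGTH, records_from_size and count_records are made up for this sketch. Unlike the one-liner above, it also rejects a size with a non-zero remainder, which could indicate a truncated file or a missing final terminator:

```python
import os

RECORD_LENGTH = 292  # bytes per record, including the CR/LF pair


def records_from_size(size: int, record_length: int = RECORD_LENGTH) -> int:
    """Equivalent of size DIV record_length, but refuses a ragged size."""
    count, leftover = divmod(size, record_length)
    if leftover:
        raise ValueError(f"{size} bytes is not a multiple of {record_length}")
    return count


def count_records(path: str) -> int:
    """Count records from the file size alone, without reading the file."""
    return records_from_size(os.path.getsize(path))


# The 25,713-record log mentioned in the thread is 7,508,196 bytes long.
print(records_from_size(7_508_196))  # 25713
```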

Richard.

 

On Thu, 5 Oct 2017, at 21:43, Martin G0HDB wrote:
I'm sure there's a very straightforward explanation for why the little routine above is giving the incorrect line-count value (of actual + 1) every time but I'm blowed if I can see where the problem lies, so I'll be grateful if someone can point out what I'm doing wrong...  :-)
Of course Richard is right; division is the quick way to do this, for files with all their records having the same length.

Otherwise, the way to solve a problem like this is to test your code with an empty file, a one-line file, a two-line file, etc., and print out what you get each time around the loop.

Consider a file that contains "3 lines" which you think of as this:

line one CR LF
line two CR LF
line three CR LF

The first read reads up until the CR, ie reads: line one CR
and stores in the qsoline$ variable "line one"

The next read reads up until the next CR ie: LF line two CR
and stores: "LF line two"

The next read reads up until the next CR ie: LF line three CR
and stores "LF line three"

The next read, I expect, just reads the final LF. There's no CR for it to see but (I assume) it won't read past the end-of-file pointer. Since you remove the LF from the string, I expect you see an empty last line in the file.

If you've removed the file's final LF then the fourth read wouldn't happen at all.

The problem is that INPUT# is not reading right up to what you regard as the end of a line. Really, the layout of the "three line" file (as far as INPUT# goes) is

line one CR
LF line two CR
LF line three CR
LF

- so 4 lines, not 3.
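Jeremy's reasoning can be checked with a small Python model of the byte stream. It assumes, as described above, that each INPUT# read returns everything up to the next CR, consumes that CR, and treats LF as ordinary data. The function name input_reads is invented for the sketch. The model also tests for end of file before each read, whereas Martin's REPEAT...UNTIL loop always attempts at least one read, so the empty-file case may behave differently in BBC BASIC:

```python
def input_reads(data: bytes) -> list[bytes]:
    """Model INPUT#: read up to the next CR, consume it, keep any LF as data."""
    reads = []
    ptr = 0
    while ptr < len(data):          # EOF is tested before each read here
        end = data.find(b"\r", ptr)
        if end == -1:               # no CR left: take the rest of the file
            end = len(data)
        reads.append(data[ptr:end])
        ptr = end + 1               # step past the CR
    return reads


three = b"line one\r\nline two\r\nline three\r\n"
print(input_reads(three))
# [b'line one', b'\nline two', b'\nline three', b'\n']

# The suggested test: 0, 1, 2 and 3 line files
for n in range(4):
    data = b"".join(b"line %d\r\n" % (i + 1) for i in range(n))
    print(n, "lines ->", len(input_reads(data)), "reads")

# Removing the final LF, as Martin did with the hex editor, gives 3 reads.
print(len(input_reads(three[:-1])))  # 3
```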

--
Jeremy Nicoll - my opinions are my own.

Richard Russell
 

On Thu, Oct 5, 2017 at 03:02 pm, Jeremy Nicoll wrote:
The problem is that INPUT# is not reading right up to what you regard
as the end of a line.
Indeed so.  It's arguable that INPUT isn't the best statement to use here, because it expects a CR termination whereas the last character in each record is LF.  In 'traditional' BBC BASIC there isn't a statement or function that reads until a LF character (you'd need to read the file one byte at a time using BGET#, which could be slow), but of course BB4W and BBCSDL have no such limitation:

      record$ = GET$#file TO &A
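The difference is easy to see in a Python model that splits the same bytes on LF instead of CR, which is what GET$#file TO &A is described as doing: read up to, but not including, the LF, then consume it. The function read_to_lf is a made-up name, and the model assumes the reading loop stops once the pointer reaches end of file, so nothing is read after the final LF:

```python
def read_to_lf(data: bytes) -> list[bytes]:
    """Model GET$#file TO &A: split on LF; any CR stays on the end of a record."""
    records = data.split(b"\n")
    if records and records[-1] == b"":  # nothing follows the final LF
        records.pop()
    return records


three = b"line one\r\nline two\r\nline three\r\n"
records = read_to_lf(three)
print(len(records))  # 3
print(records)       # [b'line one\r', b'line two\r', b'line three\r']
```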

Richard.

Martin G0HDB
 

Very many thanks Richard and Jeremy, that's extremely helpful in several ways.

Firstly, the explanation of why using the INPUT# function is giving the wrong result for the linecount value is pretty obvious now you've explained it - I'd surmised the problem was something like that but hadn't realised or fully appreciated that the final LF terminator was being treated as, and counted as, an extra line in the file.

As Richard has pointed out, calculating the number of lines in the input file using the DIV method is less complicated, and faster, than doing the sequential line count - I'll adopt this method in my program, although the speed isn't a particular issue.

Finally, and most importantly, changing from using the INPUT# function to the GET$# TO &A function will almost certainly overcome another problem I was having in the main part of my program, which reads the contents of the 30 separate fields of differing but fixed lengths in each line in the input file and then uses the field contents to create 'ADIF' records in an output file - the procedure always created a blank final ADIF record in the output file because of the existence of the final LF character.  I'm confident that changing to the GET$# function will get rid of this anomaly.
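As a rough illustration of the kind of conversion Martin describes, here is a Python sketch that slices fixed-width fields from each record and writes ADIF-style <NAME:length>value specifiers followed by <EOR>. The field names and widths in LAYOUT are hypothetical, since the real log has 30 fields whose widths aren't given in the thread. Skipping empty pieces is what keeps a blank record from being produced after the final LF:

```python
# Hypothetical layout: (ADIF field name, width in bytes). Not Martin's real format.
LAYOUT = (("CALL", 12), ("QSO_DATE", 8), ("TIME_ON", 4), ("BAND", 6))


def record_to_adif(record: bytes, layout=LAYOUT) -> str:
    """Slice one fixed-width record and emit <NAME:length>value ... <EOR>."""
    text = record.decode("ascii")
    parts = []
    pos = 0
    for name, width in layout:
        value = text[pos:pos + width].strip()   # drop padding spaces
        pos += width
        if value:                               # empty fields are simply left out here
            parts.append(f"<{name}:{len(value)}>{value}")
    return "".join(parts) + "<EOR>"


def log_to_adif(data: bytes, layout=LAYOUT) -> list[str]:
    """Convert each LF-terminated record, skipping empty pieces such as the one after the final LF."""
    out = []
    for record in data.split(b"\n"):
        record = record.rstrip(b"\r")
        if not record:
            continue
        out.append(record_to_adif(record, layout))
    return out


sample = b"G0HDB".ljust(12) + b"20171005" + b"1343" + b"20m".ljust(6) + b"\r\n"
print(log_to_adif(sample * 2))  # two ADIF records, no blank third one
```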

Many thanks again for the help and guidance!

--
Martin

Richard Russell
 

On Fri, Oct 6, 2017 at 02:52 am, Martin G0HDB wrote:
I'm confident that changing to the GET$# function will get rid of this anomaly
It would perhaps be appropriate to remind people of the FNreadline() function listed at the Wiki which will correctly read a record from a 'plain text' file irrespective of whether it uses Windows-style (CRLF), Linux-style (LF), ancient-Mac-style (CR) or even the rarely-encountered LFCR line terminations.  This can be useful when wanting to read data from a text file from an 'unknown' source.
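The Wiki's FNreadline() isn't reproduced here. As a separate, hypothetical Python sketch of the same idea: treat a CR or LF as the end of a line, and if it is immediately followed by the other character, swallow that as well, so CRLF, LF, CR and LFCR files all split the same way. read_lines_any_eol is an invented name, and files that mix terminator styles could still be misread by this simple rule:

```python
def read_lines_any_eol(data: bytes) -> list[bytes]:
    """Split text whose lines end in CRLF, LF, CR or LFCR."""
    CR, LF = 0x0D, 0x0A
    lines = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte in (CR, LF):
            lines.append(data[start:i])
            # A different terminator byte straight after is the other half of a pair.
            if i + 1 < n and data[i + 1] in (CR, LF) and data[i + 1] != byte:
                i += 1
            start = i + 1
        i += 1
    if start < n:                   # final line with no terminator at all
        lines.append(data[start:])
    return lines


for sample in (b"one\r\ntwo\r\n", b"one\ntwo\n", b"one\rtwo\r", b"one\n\rtwo\n\r"):
    print(read_lines_any_eol(sample))  # [b'one', b'two'] each time
```

Because a repeated identical byte (LF LF or CR CR) is never treated as a pair, genuine blank lines survive in every style.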

Richard.

Neil Murray
 

Another thing to be aware of: if you create or edit your input file with, for example, Notepad, remember to press the Enter key at the end of the last line before saving; otherwise your last record will not have a termination character at all (no CR/LF).
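The effect on the size-based count follows from simple arithmetic. Here is a Python sketch that assumes three 292-byte records, the last of which has lost its CR/LF. The file is then 874 bytes, so dividing by 292 reports only 2 whole records with 290 bytes left over, while splitting on LF still finds all three records:

```python
RECORD_LENGTH = 292

# Two complete records plus a final record with no CR/LF at all.
body = (b"x" * 290 + b"\r\n") * 2 + b"x" * 290

whole, leftover = divmod(len(body), RECORD_LENGTH)
print(len(body), whole, leftover)   # 874 2 290

pieces = body.split(b"\n")
print([len(p) for p in pieces])     # [291, 291, 290]
```

A non-zero remainder is therefore a useful sign that the last record has no terminator.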

Martin G0HDB
 

Hi folks, just a quick update and to say thank you again for the advice and guidance provided, especially by Richard.

I'm pleased to report that using the GET$# function works beautifully in my program, which no longer creates a blank ADIF record for the final LF in the input file.

I haven't tried the FNreadline() function; I might have a play with it sometime for future reference!

Also, thanks to Neil for the reminder that a file edited with e.g. Notepad must include a final CR/LF.  I don't envisage ever having a need to edit any of the input files that my program uses, but if ever I do I'll bear in mind the need to hit 'Enter' at the end of the final line in the file.

--
Martin

Richard Russell
 

On Sun, Oct 8, 2017 at 03:17 am, Martin G0HDB wrote:
I'm pleased to report that using the GET$# function works beautifully in my program
Hopefully this is obvious, but if you use GET$#file TO &A with a file having CRLF terminations, the resulting string will contain a CR as its last character (since you have told it to read everything up to, but excluding, the LF).  This can easily be remedied using, for example, LEFT$(GET$#file TO &A) which strips off the last character - a useful special case of LEFT$ - but is something to be borne in mind.
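In byte terms, for the 292-byte records in this thread: reading up to the LF leaves 291 bytes ending in CR, and dropping the last character leaves the 290 data bytes. The Python sketch below uses made-up names. It only removes the last byte when that byte really is a CR, whereas one-argument LEFT$ removes the last character unconditionally, which would cost a data byte on a final record that has no terminator:

```python
raw = b"x" * 290 + b"\r\n"              # one 292-byte record as stored on disk

# GET$#file TO &A: everything up to, but not including, the LF
after_get = raw[:raw.index(b"\n")]
print(len(after_get), after_get[-1:])   # 291 b'\r'


def drop_trailing_cr(record: bytes) -> bytes:
    """Remove a single trailing CR, leaving records without one untouched."""
    return record[:-1] if record.endswith(b"\r") else record


print(len(drop_trailing_cr(after_get)))   # 290
print(len(drop_trailing_cr(b"x" * 290)))  # 290: no CR, nothing removed
```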

Richard.

J.G.Harston
 

Richard Russell wrote:
It would perhaps be appropriate to remind people of the FNreadline() function listed at the Wiki [1] which will correctly read a record
Catching up on this, some time ago I used the agnostic readline function in code to translate the end-of-line characters in a text file into whatever I specified. The core code is:


REM in$=input file
REM out$=output file
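REM eol%: 1=CR, 2=LFCR, 3=CRLF, 4=LF (as produced by the BPUT# lines below)
REM PROCexit() is part of the full program and is not shown here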
in%=OPENIN(in$):IF in%=0:PRINT"File '"in$"' not found":PROCexit(214)
out%=OPENOUT(out$):IF out%=0:PRINT"Can't save '"out$"'":PROCexit(192)
REPEAT
BPUT#out%,FNrd(in%);
IF eol% AND 1:BPUT#out%,13
IF eol% AND 6:BPUT#out%,10
IF eol%=2:BPUT#out%,13
UNTIL EOF#in%
CLOSE#out%:out%=0
CLOSE#in%:in%=0
PROCexit(0)


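DEFFNrd(I%):LOCAL A$
REM GET$# stops at a CR or an LF. An empty result may just be the second
REM byte of a CRLF or LFCR pair: step back two bytes and read them (which
REM returns PTR to where it was). If they differ, the pair was one
REM terminator, so read again to get the real line. If they are equal
REM (e.g. LF LF) the empty string is a genuine blank line.
A$=GET$#I%:IFA$="":IFPTR#I%>1:PTR#I%=PTR#I%-2:IFBGET#I%<>BGET#I%:A$=GET$#I%
=A$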
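Working through the three IF eol% AND ... lines bit by bit shows which terminator each eol% value produces. The Python transcription below covers just those lines, and terminator_for is a made-up name. Value 1 gives a lone CR and value 3 gives CRLF, so it's worth checking the REM comment against this before use:

```python
def terminator_for(eol: int) -> bytes:
    """Bytes the BPUT# lines write after each line, for a given eol% value."""
    out = b""
    if eol & 1:        # IF eol% AND 1:BPUT#out%,13
        out += b"\r"
    if eol & 6:        # IF eol% AND 6:BPUT#out%,10
        out += b"\n"
    if eol == 2:       # IF eol%=2:BPUT#out%,13
        out += b"\r"
    return out


for eol in (1, 2, 3, 4):
    print(eol, terminator_for(eol))
# 1 b'\r'
# 2 b'\n\r'
# 3 b'\r\n'
# 4 b'\n'
```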

--
J.G.Harston - jgh@... - mdfs.net/jgh

Martin G0HDB
 

Hi again Richard, thanks for the reminder that when using GET$#file TO &A the resulting string will still contain a CR as its last character.  I had realised this and wondered what the effect might be, but it doesn't seem to have any because the rest of my program only reads up to byte 290 in the string that's now 291 bytes long, including the final CR.

Nevertheless, to be on the safe side I might adopt the remedy of using LEFT$(GET$#file TO &A) to strip off the final CR so that the input string will only ever be 290 bytes long and won't contain any control characters; I'll have a play with it to see what happens and if it makes any difference to the rest of my program.

Thanks again,

--
Martin
