Hong-Phuc Bui on Wed, 29 Oct 2025 14:27:23 +0100



Re: [EXTERN] Re: Re: Re: Re: Definition of tokens in GP language


Hi,

I took a look at the file lang.l. I think there is a discrepancy between the lexical rules and what the gp prompt does:
The rule

HEXINTEGER 0x[0-9A-Za-z]([0-9A-Za-z ]*[0-9A-Za-z])?

accepts 0x1 2 3 zzz as a single hex integer, but the gp prompt does not:

? 0x1 2 3 zzz
  ***   syntax error, unexpected variable name, expecting end of file: 0x123zzz
  ***

I would change the rule to:

HEXINTEGER 0x[0-9A-Fa-f]([0-9A-Fa-f ]*[0-9A-Fa-f])?
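To check the difference, the two patterns can be compared in isolation with Python's re module, which uses the same character-class, grouping and repetition syntax as these rules. This is only a sketch: it tests each regular expression on its own, not the whole generated lexer with its other rules.

```python
import re

# The HEXINTEGER rule currently in lang.l, and the proposed replacement.
OLD_HEX = re.compile(r"0x[0-9A-Za-z]([0-9A-Za-z ]*[0-9A-Za-z])?")
NEW_HEX = re.compile(r"0x[0-9A-Fa-f]([0-9A-Fa-f ]*[0-9A-Fa-f])?")

sample = "0x1 2 3 zzz"

# Old rule: the whole input is swallowed as one hexadecimal integer.
print(OLD_HEX.match(sample).group(0))  # → 0x1 2 3 zzz

# New rule: the match stops before "zzz", which is left for the identifier
# rule, consistent with gp reporting "unexpected variable name".
print(NEW_HEX.match(sample).group(0))  # → 0x1 2 3
```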


Best wishes,
Hong-Phuc

On 27/10/2025 19:15, Bill Allombert wrote:
On Mon, Oct 27, 2025 at 04:52:18PM +0100, Hong-Phuc Bui wrote:
Hi, I'm back again :)

Thanks for sharing the information. It motivates me even more to write a lexer for Pygments.
Now I'm reading both files: lang.l in the gp2c repository and the function pari_lex() in anal.c.
If I understand correctly, I have two choices:

1) The function pari_lex() works fully correctly and handles all corner cases of the GP language,
but it is written by hand.
=> Porting it to Python is not as easy as I would like (well, writing a lexer was never easy :)).

2) The lexer generated from lang.l can now also handle all corner cases, but is not yet as rock-solid as pari_lex().
=> Porting it to Python, for example with PLY[1] or RegexLexer[2] with
state management, may be easier, but will the resulting Python lexer handle all
corner cases?
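One point to keep in mind for option 2: flex picks the longest match among all rules and breaks ties by rule order, while an alternation in Python's re simply takes the first alternative that matches. Below is a minimal sketch of a longest-match tokenizer. Apart from HEXINTEGER (using the corrected pattern above), the rule names and patterns are hypothetical placeholders, not taken from lang.l.

```python
import re

# Rules in the order they would appear in a lex file. Only HEXINTEGER follows
# the rule discussed in this thread; the others are illustrative placeholders.
RULES = [
    ("HEXINTEGER", r"0x[0-9A-Fa-f]([0-9A-Fa-f ]*[0-9A-Fa-f])?"),
    ("INTEGER", r"[0-9]+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/<>|,()\[\]=]"),
    ("SPACE", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using lex-style longest-match semantics."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept, as in lex.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "SPACE":  # whitespace between tokens is dropped
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("0x1 2 3 zzz"))  # → [('HEXINTEGER', '0x1 2 3'), ('NAME', 'zzz')]
```

At the start of "0x1 2 3 zzz" both HEXINTEGER ("0x1 2 3") and INTEGER ("0") match; longest match picks the hex integer, which is the behaviour a port of the lex rules has to reproduce.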

Well, hopefully I will be able to fix the lex parser if we find other bugs.

But how you define a token depends on how you want to use it. Will you feed
the tokens to a parser, or will you use them directly as a basis for syntax highlighting?

There are special constructs that are not handled as tokens for parsing purposes
but are semantically tokens:

Sometimes <- is 'less minus', sometimes it is 'left_arrow', depending on whether there is
a preceding |:

[a|b<-c] tokens are             [ a | b < - c ]
but this is to be understood as [ a | b <- c ]
while
[a,b<-c] tokens are [ a , b < - c ]
but this is just    [ a , b < - c ]
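The <- case can be handled after lexing, with a pass over the token stream. A minimal sketch, assuming tokens are plain strings and using a simplified reading of the rule above: inside [ ], the first '<' '-' pair after a '|' at the same bracket level becomes 'left_arrow', and any later pair is left alone, so a comparison like y<-1 further on stays '<' '-'. The function name and the 'left_arrow' label are illustrative only; pari_lex() and lang.l may draw the line differently.

```python
OPEN = {"[": "]", "(": ")", "{": "}"}
CLOSE = set(OPEN.values())


def merge_left_arrows(tokens):
    """Rewrite '<' '-' as 'left_arrow' when it follows a '|' inside the same [ ]."""
    out = []
    # Stack of [bracket, bar_pending]: bar_pending is True once a '|' has been
    # seen directly inside this '[' and no arrow has been merged there yet.
    stack = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in OPEN:
            stack.append([tok, False])
        elif tok in CLOSE:
            if stack:
                stack.pop()
        elif tok == "|" and stack and stack[-1][0] == "[":
            stack[-1][1] = True
        elif (tok == "<" and i + 1 < len(tokens) and tokens[i + 1] == "-"
              and stack and stack[-1][1]):
            out.append("left_arrow")
            stack[-1][1] = False  # only the first pair after '|' is treated as the arrow
            i += 2
            continue
        out.append(tok)
        i += 1
    return out


print(merge_left_arrows(list("[a|b<-c]")))  # → ['[', 'a', '|', 'b', 'left_arrow', 'c', ']']
print(merge_left_arrows(list("[a,b<-c]")))  # → ['[', 'a', ',', 'b', '<', '-', 'c', ']']
```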

In the other direction, )-> is a token, but for most purposes it should be read as ) 'right_arrow'
so that the ) matches the preceding (.
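Similarly, a pass can split the )-> token so that bracket matching sees the closing parenthesis. A minimal sketch with hypothetical function names, assuming the lexer delivers )-> as a single string token:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def split_right_arrows(tokens):
    """Replace each ')->' token by ')' followed by 'right_arrow'."""
    out = []
    for tok in tokens:
        if tok == ")->":
            out.extend([")", "right_arrow"])
        else:
            out.append(tok)
    return out


def brackets_balanced(tokens):
    """Return True if every opening bracket is closed by its matching bracket."""
    stack = []
    for tok in tokens:
        if tok in ("(", "[", "{"):
            stack.append(tok)
        elif tok in PAIRS:
            if not stack or stack.pop() != PAIRS[tok]:
                return False
    return not stack


# Tokens for the closure (x,y)->x+y as the lexer would deliver them.
raw = ["(", "x", ",", "y", ")->", "x", "+", "y"]
print(brackets_balanced(raw))                      # → False
print(brackets_balanced(split_right_arrows(raw)))  # → True
```

Without the split, the ( of the parameter list has no visible partner, which matters for a highlighter that matches brackets.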

Cheers,
Bill.

--
Hochschule für Technik und Wirtschaft des Saarlandes
University of Applied Sciences

Fakultät für Ingenieurwissenschaften
School of Engineering

Hong-Phuc Bui, M.Sc.
Informatik

Campus Alt-Saarbrücken
Goebenstraße 40
66117 Saarbrücken

+49 (0) 681 58 67 - 804
hong-phuc.bui@htwsaar.de
www.htwsaar.de