Hong-Phuc Bui on Wed, 29 Oct 2025 14:27:23 +0100
Re: [EXTERN] Re: Re: Re: Re: Definition of tokens in GP language
Hi,

I took a look at the file lang.l. I think there is a discrepancy between the lexical rules and what the gp prompt does. The rule

  HEXINTEGER 0x[0-9A-Za-z]([0-9A-Za-z ]*[0-9A-Za-z])?

accepts 0x1 2 3 zzz, but the gp prompt does not:

  ? 0x1 2 3 zzz
   *** syntax error, unexpected variable name, expecting end of file: 0x123zzz
   ***

As the error message shows, gp sees the input without the spaces, as 0x123zzz: it reads the number 0x123 and then stumbles on the variable name zzz. I would change the rule to

  HEXINTEGER 0x[0-9A-Fa-f]([0-9A-Fa-f ]*[0-9A-Fa-f])?

Best wishes,
Hong-Phuc

On 27/10/2025 19:15, Bill Allombert wrote:
> On Mon, Oct 27, 2025 at 04:52:18PM +0100, Hong-Phuc Bui wrote:
>> Hi, I'm back again :) Thanks for sharing the information. It motivates me even more to write a lexer for Pygments. Now I'm reading both files: the file lang.l in the gp2c repository and the function pari_lex() in the file anal.c. If I understand correctly, I have two choices:
>>
>> 1) The function pari_lex() works fully correctly and can handle all corner cases of the GP language, but it is written by hand. => Porting it to Python is not as easy as I would wish (well, writing a lexer was never easy :)).
>> 2) The lexer generated from lang.l can now also handle all corner cases, but it is not yet as rock-solid as the function pari_lex(). => Porting it to Python, for example with PLY [1] or RegexLexer [2] with state management, may be easier, but the Python lexer may not handle all corner cases?
>
> Well, hopefully I should be able to fix the lex parser if we find other bugs.
>
> But how you define a token depends on how you want to use it. Will you feed the tokens to a parser, or will you use them directly as a base for syntax highlighting?
>
> There are special constructs that are not handled as tokens for parsing purposes but are semantically tokens.
>
> Sometimes <- is 'less minus', sometimes it is 'left_arrow', depending on whether there is a preceding |:
>
>   [a|b<-c]  the tokens are [ a | b < - c ], but this is to be understood as [ a | b <- c ]
>   [a,b<-c]  the tokens are [ a , b < - c ], and this is just [ a , b < - c ]
>
> In the other direction, )-> is a single token, but for most purposes it should be read as ) 'right_arrow', so that the ) matches the previous (.
>
> Cheers,
> Bill.
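The two HEXINTEGER rules discussed in this thread can be checked without building the flex scanner by translating them to Python's re syntax; the character classes carry over unchanged. This is only a sketch: it assumes, as the 0x123zzz in gp's error message suggests, that gp removes spaces before lexing, and the helper gp_strip_spaces is a hypothetical, deliberately simplified stand-in for gp's input filter (it ignores string literals, comments, and other whitespace).

```python
import re

# Rule currently in lang.l, transcribed to Python syntax.
HEX_CURRENT = re.compile(r"0x[0-9A-Za-z]([0-9A-Za-z ]*[0-9A-Za-z])?")
# Proposed rule: only hexadecimal digits (plus inner spaces) are allowed.
HEX_PROPOSED = re.compile(r"0x[0-9A-Fa-f]([0-9A-Fa-f ]*[0-9A-Fa-f])?")


def longest_prefix(pattern: re.Pattern, text: str) -> str:
    """Return the prefix of `text` matched by `pattern`, or '' if none.

    flex always takes the longest match; for these two patterns Python's
    greedy matching with backtracking also yields the longest match.
    """
    m = pattern.match(text)
    return m.group(0) if m else ""


def gp_strip_spaces(text: str) -> str:
    """Hypothetical, simplified model of gp dropping spaces before lexing."""
    return text.replace(" ", "")


if __name__ == "__main__":
    src = "0x1 2 3 zzz"
    # Current rule: the whole input, including 'zzz', is one HEXINTEGER.
    print(repr(longest_prefix(HEX_CURRENT, src)))   # '0x1 2 3 zzz'
    # Proposed rule: ' zzz' is left over for the next rule.
    print(repr(longest_prefix(HEX_PROPOSED, src)))  # '0x1 2 3'
    # What gp's error message shows it actually lexes.
    print(gp_strip_spaces(src))                     # 0x123zzz
```

With the proposed rule, the space-free input 0x123zzz splits into the number 0x123 followed by zzz, which is consistent with gp's "unexpected variable name" error.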
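For syntax highlighting, Bill's two special cases could be handled by a small post-processing pass over the token stream rather than inside the regexes. The sketch below assumes tokens are plain strings and that the lexer behaves as Bill describes: it emits < and - separately and )-> as one token. The function names split_close_arrow and merge_left_arrows are hypothetical and not part of gp2c, PARI, or Pygments.

```python
from typing import List


def split_close_arrow(tokens: List[str]) -> List[str]:
    """Split the single ')->' token into ')' and '->' so the ')' can be
    paired with the preceding '(' (e.g. for bracket matching)."""
    out: List[str] = []
    for tok in tokens:
        if tok == ")->":
            out.extend([")", "->"])
        else:
            out.append(tok)
    return out


def merge_left_arrows(tokens: List[str]) -> List[str]:
    """Rewrite '<' '-' as a single '<-' when it follows a '|' inside '[...]'.

    Assumption: only the first '<' '-' after each '|' at the same bracket
    level is the generator arrow; later pairs (e.g. a filter 'x < -1')
    remain 'less' followed by 'minus'.
    """
    out: List[str] = []
    # One frame per open '[' or '(': [opening_token, pending], where pending
    # is True after a '|' inside '[' until the arrow has been found.
    frames: List[list] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("[", "("):
            frames.append([tok, False])
        elif tok in ("]", ")"):
            if frames:
                frames.pop()
        elif tok == "|" and frames and frames[-1][0] == "[":
            frames[-1][1] = True
        elif (tok == "<" and frames and frames[-1][1]
              and i + 1 < len(tokens) and tokens[i + 1] == "-"):
            out.append("<-")
            frames[-1][1] = False
            i += 2  # consume both '<' and '-'
            continue
        out.append(tok)
        i += 1
    return out


if __name__ == "__main__":
    # Single-character tokens, so list() is enough to tokenize these examples.
    print(merge_left_arrows(list("[a|b<-c]")))  # ['[', 'a', '|', 'b', '<-', 'c', ']']
    print(merge_left_arrows(list("[a,b<-c]")))  # ['[', 'a', ',', 'b', '<', '-', 'c', ']']
    print(split_close_arrow(["(", "x", ")->", "x"]))  # ['(', 'x', ')', '->', 'x']
```

Running split_close_arrow before merge_left_arrows means the ) produced from )-> pops its ( frame, keeping the bracket stack consistent. Treating only the first <- after each | as an arrow is a guess at what a highlighter needs, not something stated in this thread.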
-- 
Hochschule für Technik und Wirtschaft des Saarlandes
University of Applied Sciences
Fakultät für Ingenieurwissenschaften
School of Engineering
Hong-Phuc Bui, M.Sc. Informatik
Campus Alt-Saarbrücken
Goebenstraße 40
66117 Saarbrücken
+49 (0) 681 58 67 - 804
hong-phuc.bui@htwsaar.de
www.htwsaar.de