A Lexical Analyzer for High-level Languages Based on Python

1. Requirements Analysis

Requirement: Describe the functions the lexical analysis system must provide.

Design and implement a lexical analyzer for high-level languages. The basic functions are as follows:

The following types of words are recognized:
- Identifier (consisting of uppercase and lowercase letters, numbers, and underscores, but must start with a letter or underscore)
- Keywords (① type keywords: integer, floating point, boolean, record; ② if and else in branch structures; ③ do and while in loop structures; ④ procedure declaration and call keywords)
- Operators (① arithmetic operators; ② relational operators; ③ logical operators)
- Delimiters (① the delimiter used in assignment statements, such as "="; ② the delimiter used at the end of a statement, such as ";"; ③ the delimiters used in array notation, such as "[" and "]"; ④ the delimiter "." used in floating point numbers)
- Constants (unsigned integers (including octal and hexadecimal numbers), floating point numbers (including scientific notation), string constants, etc.)
- Comments (/* … */ form)
The analyzer must also perform simple error handling, i.e. identify illegal characters in test cases. When the program outputs an error message, it must include the error type (i.e. lexical error), the location of the error (source program line number), and a description, in the following format:

Lexical error at Line [line number]: [description text].

There are no specific requirements for the content of the description text (for example: illegal characters), but the error type and line number of the error must be correct, because this is the only criterion for judging whether the output error message is correct.
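
The required message format can be illustrated with a minimal sketch. The function names and the set of legal punctuation below are assumptions made for illustration, not part of the original system; only the message layout comes from the requirement above.

```python
# Assumed set of legal non-alphanumeric characters; adjust it to the real language.
LEGAL_PUNCTUATION = set("_+-*/%=<>!&|;,.:[](){}\"'# \t")


def format_lexical_error(line_number, description):
    """Build a message in the 'Lexical error at Line N: text.' form."""
    return f"Lexical error at Line {line_number}: {description}."


def find_illegal_characters(source):
    """Return one error message per illegal character, with its line number."""
    errors = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            is_ascii_alnum = ch.isascii() and ch.isalnum()
            if not is_ascii_alnum and ch not in LEGAL_PUNCTUATION:
                errors.append(
                    format_lexical_error(line_number, f"illegal character '{ch}'")
                )
    return errors


if __name__ == "__main__":
    for message in find_illegal_characters("int a = 3;\nb = a @ 2;\n"):
        print(message)
```

Counting lines with enumerate(..., start=1) keeps the reported line number aligned with what an editor would show, which matters because the line number is part of the grading criterion.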

System input: test cases must be importable from files. Test cases should cover the types of words listed in "Experimental Content".
System output: the token sequence corresponding to the test case is printed.

2. Grammar Design

Requirements: Describe the following in detail.

Give a description of the lexical rules (regular grammar or regular expressions) for each type of word

Identifier:

[A-Za-z_]\w*
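
As a hedged illustration of the identifier rule (letters, digits, and underscores, starting with a letter or underscore), the sketch below uses an explicit ASCII character class rather than \w, because in Python \w also matches non-ASCII letters. The names IDENTIFIER and is_identifier are assumptions introduced here.

```python
import re

# First character: letter or underscore; remaining characters: letters, digits, underscores.
IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def is_identifier(text):
    """Return True if the whole string is a valid identifier."""
    return IDENTIFIER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["_count1", "total", "9lives", "a-b"]:
        print(candidate, is_identifier(candidate))
```

Using fullmatch rather than search ensures that a string such as "a-b" is rejected instead of being accepted because of its leading "a".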

Keywords:

r'((auto){1}|(double){1}|(int){1}|(if){1}|' \
r'(#include){1}|(return){1}|(char){1}|(stdio\.h){1}|(const){1})'
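
One way the keyword pattern might be applied is sketched below. The pattern is the one given above; the names KEYWORD and is_keyword_text are assumptions for illustration. Adjacent string literals inside the parentheses are concatenated by Python, so no backslash continuation is needed.

```python
import re

# The keyword pattern from the text, split over two adjacent raw strings.
KEYWORD = re.compile(
    r'((auto){1}|(double){1}|(int){1}|(if){1}|'
    r'(#include){1}|(return){1}|(char){1}|(stdio\.h){1}|(const){1})'
)


def is_keyword_text(word):
    """Return True only if the whole word is one of the keywords."""
    return KEYWORD.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ["int", "integer", "stdio.h", "stdioxh"]:
        print(word, is_keyword_text(word))
```

Matching the whole word matters here: with a prefix match, "integer" would be reported as the keyword "int". The {1} quantifiers are redundant and could be dropped without changing what the pattern matches.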

Operator:

r'(\+\+|\+=|\+|--|-=|-|\*=|/=|/|%=|%)'
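
The order of the alternatives in this pattern matters, because Python's regex engine takes the first alternative that matches rather than the longest one. The sketch below, with the assumed helper name find_operators, shows why the two-character operators are listed before the one-character ones.

```python
import re

# Operator pattern from the text: longer operators come before their prefixes.
OPERATOR = re.compile(r'(\+\+|\+=|\+|--|-=|-|\*=|/=|/|%=|%)')

# Deliberately mis-ordered pattern, used only to show the failure mode.
NAIVE_PLUS = re.compile(r'(\+|\+\+)')


def find_operators(text):
    """Return every operator found in the text, scanning left to right."""
    return [match.group(1) for match in OPERATOR.finditer(text)]


if __name__ == "__main__":
    print(find_operators("a += b++ - c"))
    # With '+' listed first, '++' is split into two separate '+' tokens.
    print([match.group(1) for match in NAIVE_PLUS.finditer("i++")])
```

Note that the pattern as given has no alternative for a bare "*", and it covers only arithmetic operators; relational and logical operators would need their own alternatives.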

Delimiter:

r'([,:\{}:)(<>])'

Constant:

r'(\d+[.]?\d+)'
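
The pattern above requires at least two digits, so a single-digit constant such as 5 is not matched, and it does not cover the octal, hexadecimal, and scientific-notation forms listed in the requirements. The following is a sketch of one possible broader pattern for numeric constants; the group names and the classify_number helper are assumptions, and string constants are not handled.

```python
import re

# Alternatives are ordered so that more specific forms are tried first.
NUMBER = re.compile(r'''
    (?P<hex>0[xX][0-9a-fA-F]+)                              # hexadecimal, e.g. 0x1F
  | (?P<float>\d+\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)    # 3.14, 6.02e23, 1e10
  | (?P<octal>0[0-7]+)                                      # octal, e.g. 017
  | (?P<decimal>\d+)                                        # plain unsigned integer
''', re.VERBOSE)


def classify_number(text):
    """Return 'hex', 'float', 'octal', or 'decimal', or None if not a number."""
    match = NUMBER.fullmatch(text)
    return match.lastgroup if match else None


if __name__ == "__main__":
    for text in ["0x1F", "3.14", "6.02e23", "017", "42", "5", "1."]:
        print(text, classify_number(text))
```

Named groups let the lexer report which kind of constant was found without a second pass over the text.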

State transition diagrams for each type of word

The transition diagrams for the other types of words are simpler, so only the one for constants is given.

Constant:

3. System Design

Requirements: The design is divided into an outline design and a detailed design.

System outline design: Provide the necessary high-level design diagrams of the system, such as system architecture diagrams, data flow diagrams, and function module diagrams, together with corresponding text descriptions.

Function modules:

Detailed system design: describe the following items

Design of the core data structure

The core data structure is the Python list (list[]).
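
The text does not say what the list holds. As a hedged sketch, one common arrangement is a list of small token records; the Token type, its fields, and format_tokens below are assumptions for illustration only.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record: category, matched text, source line."""
    kind: str
    value: str
    line: int


def format_tokens(tokens):
    """Render each token as '<kind, value>' for printing the token sequence."""
    return [f"<{token.kind}, {token.value}>" for token in tokens]


# The token sequence is an ordinary Python list that grows as words are recognized.
tokens = []
tokens.append(Token("KEYWORD", "int", 1))
tokens.append(Token("IDENTIFIER", "a", 1))
tokens.append(Token("DELIMITER", ";", 1))

if __name__ == "__main__":
    print("\n".join(format_tokens(tokens)))
```

A list preserves the order in which tokens were recognized, which is the order the output must be printed in.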

Description of the main functions

def is_blank(self, index): determine whether the character is whitespace

def skip_blank(self, index): skip whitespace

def is_keyword(self, value): determine whether the value is a keyword

def main(self): the main routine of lexical analysis
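
The four methods might fit together roughly as sketched below. Only the method names and parameters come from the text; the class name Lexer, the constructor, the token patterns, the keyword set, and the tuple layout of tokens are assumptions, so this is an illustration of the structure rather than the original implementation.

```python
import re


class Lexer:
    # Assumed keyword set and token patterns, simplified for illustration.
    KEYWORDS = {"auto", "double", "int", "if", "return", "char", "const"}
    TOKEN_PATTERNS = [
        ("CONSTANT", re.compile(r'\d+(\.\d+)?')),
        ("IDENTIFIER", re.compile(r'[A-Za-z_][A-Za-z0-9_]*')),
        ("OPERATOR", re.compile(r'\+\+|\+=|\+|--|-=|-|\*=|/=|/|%=|%')),
        ("DELIMITER", re.compile(r'[,:;={}()\[\]<>]')),
    ]

    def __init__(self, source):
        self.source = source
        self.tokens = []   # list of (kind, value, line) tuples
        self.errors = []   # list of formatted error messages
        self.line = 1

    def is_blank(self, index):
        """Determine whether the character at index is whitespace."""
        return self.source[index] in " \t\r\n"

    def skip_blank(self, index):
        """Advance past whitespace, counting newlines, and return the new index."""
        while index < len(self.source) and self.is_blank(index):
            if self.source[index] == "\n":
                self.line += 1
            index += 1
        return index

    def is_keyword(self, value):
        """Determine whether an identifier-shaped word is a keyword."""
        return value in self.KEYWORDS

    def main(self):
        """Scan the whole source and return the token list."""
        index = self.skip_blank(0)
        while index < len(self.source):
            for kind, pattern in self.TOKEN_PATTERNS:
                match = pattern.match(self.source, index)
                if match:
                    value = match.group(0)
                    if kind == "IDENTIFIER" and self.is_keyword(value):
                        kind = "KEYWORD"
                    self.tokens.append((kind, value, self.line))
                    index = match.end()
                    break
            else:
                # No pattern matched: report the character and move on.
                ch = self.source[index]
                self.errors.append(
                    f"Lexical error at Line {self.line}: illegal character '{ch}'."
                )
                index += 1
            index = self.skip_blank(index)
        return self.tokens


if __name__ == "__main__":
    lexer = Lexer("int a = 3;\na += @ 2.5;")
    for token in lexer.main():
        print(token)
    print(lexer.errors)
```

Keywords are recognized by first matching an identifier and then looking the word up, which avoids treating a word like "integer" as the keyword "int".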

Flow chart of the core part of the program

4. System implementation and result analysis

Requirements: Describe the following.

Problems encountered during system implementation:

Recognition of hexadecimal numbers was not considered in the initial design.

The solution is to check, when recognizing a constant, whether its first digit is 0 and whether the following letter is X. If so, the rest of the string is checked to see whether it is a sequence of characters in 0-9 or A-F; if it is, the word is treated as a constant.
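
The check described above can be sketched as a small standalone function. The name is_hex_constant is an assumption, as is the decision to accept a lowercase x and lowercase digits a-f in addition to the uppercase forms named in the text.

```python
# Assumption: lowercase hexadecimal digits are accepted as well as 0-9 and A-F.
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def is_hex_constant(word):
    """Follow the described steps: leading 0, then X, then only hex digits."""
    # There must be at least one digit after the "0X" prefix.
    if len(word) < 3 or word[0] != "0":
        return False
    if word[1] not in "xX":
        return False
    return all(ch in HEX_DIGITS for ch in word[2:])


if __name__ == "__main__":
    for word in ["0x1F", "0XFF", "0x", "0xG1", "1x10", "123"]:
        print(word, is_hex_constant(word))
```

Requiring at least one digit after the prefix rejects the bare string "0x", which would otherwise pass because an empty sequence trivially satisfies the digit check.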