
- "lex & yacc", 2nd edition: for a download link, see http://blog.csdn.net/a_flying_bird/article/details/52486815





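Lex source files usually carry one of the extensions .l, .ll, or .lex, although any file name works.


A lex file has three parts, separated by lines containing only %%:

%{
/* part 1: Definition section, e.g. global C declarations. */
%}
%%
 /* part 2: Rules section. Rule = Pattern + Action. */
%%
/* part 3: C code. */

Note: do not mistype %} as }%, or lex fails with a "premature EOF" error.

Everything between %{ and %} is copied verbatim into the generated C file, so any legal C code may go there. It usually holds whatever the C code later in the lex file needs.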



lex simplest.l
gcc lex.yy.c -ll -o test



This corresponds to page 2 of the English edition.


.|\n ECHO;


$ ls
$ lex simplest.l
$ ls
lex.yy.c    simplest.l
$ gcc lex.yy.c -ll -o test
$ ./test
The simplest lex program. ------ typed at the keyboard
The simplest lex program. ------ echoed by the program

Recognizing words

This example recognizes a fixed set of words and simply echoes any other input it does not know. It corresponds to ch1-02.l in the book.


%{
/*
 * this sample demonstrates (very) simple recognition:
 * a verb, or not a verb.
 */
%}
%%
[\t ]+   /* ignore whitespace */ ;
is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go     {printf("%s: is a verb\n", yytext);}
[a-zA-Z]+ {printf("%s: is not verb\n", yytext);}
.|\n  {ECHO; /* normal default anyway */ }
%%
int main()
{
    yylex();
    return 0;
}


$ lex recoginzing_word.l
$ gcc lex.yy.c -ll -o test
$ ./test
I am a student. You are a teacher. ------ typed at the keyboard
I: is not verb
am: is a verb
a: is not verb
student: is not verb
.You: is not verb
are: is a verb
a: is not verb
teacher: is not verb


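A lex file has three sections: the definition section, the rules section, and the user subroutines section.

Definition section: may contain a block delimited by "%{" and "%}" that holds C code such as #include directives, function prototypes, and global variables. When lex generates lex.yy.c, this block is copied into the C file verbatim.

Rules section: each rule has two parts, a pattern and an action, separated by whitespace. The pattern is written in regular-expression syntax. When the lexer recognizes a pattern, it executes the corresponding action, which is C code: { C code }

User subroutines section: copied to the end of the generated .c file.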


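  • ";": the same as the empty statement in C. It does nothing, so the matched input is simply ignored.
  • "ECHO;": the default action. It prints the matched string to the output file (stdout, so the text is echoed).
  • "|": use the action of the next pattern. Note the syntax: the | that stands for an action follows the pattern after a space, whereas the | of a regular expression has no space around it.

Note 1: the difference between ; and ECHO; is that the former discards the input while the latter prints it. Change ECHO; to ; in the example and observe how the output changes.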
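To make the difference between the two actions concrete, here is a small plain-C sketch (it is not lex input). The function name filter and its echo_rest flag are made up for illustration; the code only imitates what a scanner with a whitespace rule using ";" plus a catch-all rule using either "ECHO;" or ";" would do.

```c
#include <assert.h>
#include <stddef.h>

/* Copies in to out the way a two-rule scanner would:
   blanks and tabs get the empty action ';' and are dropped;
   every other character is copied (like ECHO) when echo_rest is nonzero,
   or dropped as well (like ';') when echo_rest is zero.
   out must have room for strlen(in) + 1 bytes. Returns the bytes written. */
size_t filter(const char *in, char *out, int echo_rest)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == ' ' || *in == '\t')
            continue;              /* ';' : matched, nothing printed */
        if (echo_rest)
            out[n++] = *in;        /* ECHO : matched text is passed through */
    }
    out[n] = '\0';
    return n;
}
```

Calling filter with echo_rest set to 0 corresponds to replacing ECHO; with ; in the lex example: the input is still consumed, but nothing reaches the output.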

Note 2: the | action cannot be written on the same line as the next pattern, as in:

had | go     {printf("%s: is a verb\n", yytext);}


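  • yytext: holds the matched string. Its type can be seen in the generated .c file: extern char *yytext;

Disambiguation rules: each piece of input is matched only once, and the longest match wins. In the book's words:

  1. Lex patterns only match a given input character or string once.
  2. Lex executes the action for the longest possible match for the current input.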
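The two rules can be imitated in plain C. The sketch below is not how lex is implemented (lex builds a state machine); it only tries two hand-written matchers and picks the longer match. It also shows the tie-break lex applies when two patterns match the same length: the rule listed first wins. The names pick_rule, match_literal, match_letters and the RULE_* ids are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule ids, numbered in the order the rules would be listed. */
enum { RULE_NONE = -1, RULE_VERB = 0, RULE_WORD = 1 };

/* Length of the literal keyword kw at the start of s, or 0 if absent. */
static size_t match_literal(const char *s, const char *kw)
{
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Length of the leading run of letters, like the pattern [a-zA-Z]+. */
static size_t match_letters(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

/* Picks the rule with the longest match at the start of s and stores the
   match length in *len. On a tie the earlier rule wins. */
int pick_rule(const char *s, size_t *len)
{
    size_t lens[2];
    size_t best_len = 0;
    int best = RULE_NONE;

    assert(s != NULL && len != NULL);
    lens[RULE_VERB] = match_literal(s, "is");
    lens[RULE_WORD] = match_letters(s);

    for (int i = 0; i < 2; i++) {
        if (lens[i] > best_len) {   /* strict '>' keeps the earlier rule on ties */
            best = i;
            best_len = lens[i];
        }
    }
    *len = best_len;
    return best;
}
```

With the input "island", the verb rule matches only 2 characters while the word rule matches 6, so the whole word goes to the word rule rather than being split into "is" and "land". With the input "is a", both rules match 2 characters and the verb rule wins because it comes first.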





quit {printf("Program will exit normally.\n"); return 0;}

Note: this rule must come before the [a-zA-Z]+ rule; otherwise lex warns that the rule cannot be matched.



[\t ]+  {printf("%s: white space\n", yytext);}
.|\n  {printf("%s: Invalid word\n", yytext);}


I am a student. You are a teacher. !@#$%^&*
I: is not  verb
 : white space
am: is a verb
 : white space
a: is not  verb
 : white space
student: is not  verb
.: Invalid word
 : white space
You: is not  verb
 : white space
are: is a verb
 : white space
a: is not  verb
 : white space
teacher: is not  verb
.: Invalid word
 : white space
!: Invalid word
@: Invalid word
#: Invalid word
$: Invalid word
%: Invalid word
^: Invalid word
&: Invalid word
*: Invalid word

: Invalid word


This corresponds to ch1-03.l in the book.



%{
/*
 * this sample demonstrates (very) simple recognition:
 * a verb, or not a verb.
 */
%}
%%
[\t ]+   {printf("%s: white space\n", yytext);}
is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go     {printf("%s: is a verb\n", yytext);}
very |
simple |
gently |
quietly |
calmly |
angrily  {printf("%s: is an adverb\n", yytext);}
to |
from |
behind |
above |
below |
between {printf("%s: is a preposition\n", yytext);}
if |
then |
and |
but |
or {printf("%s: is a conjunction\n", yytext);}
their |
my |
your |
his |
her |
its {printf("%s: is an adjective\n", yytext);}
I |
you |
he |
she |
we |
they {printf("%s: is a pronoun\n", yytext);}
QUIT {printf("Program will exit normally.\n"); return 0;}
[a-zA-Z]+ {printf("%s: don't recognize\n", yytext);}
.|\n  {printf("%s: Invalid word\n", yytext);}
%%
int main()
{
    yylex();
    return 0;
}


he is a student. and he is a teacher. QUIT (ENTER)
he: is a pronoun
 : white space
is: is a verb
 : white space
a: don't recognize
 : white space
student: don't recognize
.: Invalid word
 : white space
and: is a conjunction
 : white space
he: is a pronoun
 : white space
is: is a verb
 : white space
a: don't recognize
 : white space
teacher: don't recognize
.: Invalid word
 : white space
Program will exit normally.

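Defining the word list dynamically: a lexer with a symbol table

This corresponds to ch1-04.l in the book. The example shows how to write more complex C code inside a lex file.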



noun dog cat horse cow
verb chew eat lick


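  • Definition: the first word on a line names a part of speech, and the words that follow it on that line belong to that part of speech.
  • Recognition: as in the previous example, report the part of speech of each word.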


%{
#include <stdbool.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Word recognizer with a symbol table.
 */
enum {
    LOOKUP = 0, /* default - looking rather than defining. */
    VERB,
    ADJ,
    ADV,
    NOUN,
    PREP,
    PRON,
    CONJ
};

int state; // global variable, default to 0(LOOKUP).

bool add_word(int type, char *word);
int lookup_word(char *word);
%}
%%
[\t ]+ ; /* ignore whitespace */
\n {state = LOOKUP;} // end of line, return to default state.
  /* Whenever a line starts with a reserved part of speech name */
  /* start defining words of that type */
^verb {state = VERB;}
^adj {state = ADJ;}
^adv {state = ADV;}
^noun {state = NOUN;}
^prep {state = PREP;}
^pron {state = PRON;}
^conj {state = CONJ;}
  /* a normal word, define it or look it up */
[a-zA-Z]+ {
    if (state != LOOKUP) {
        /* define the current word */
        add_word(state, yytext);
    } else {
        switch(lookup_word(yytext)) {
        case VERB: printf("%s: verb\n", yytext); break;
        case ADJ:  printf("%s: adjective\n", yytext); break;
        case ADV:  printf("%s: adverb\n", yytext); break;
        case NOUN: printf("%s: noun\n", yytext); break;
        case PREP: printf("%s: preposition\n", yytext); break;
        case PRON: printf("%s: pronoun\n", yytext); break;
        case CONJ: printf("%s: conjunction\n", yytext); break;
        default:
            printf("%s: don't recognize\n", yytext);
            break;
        }
    }
}
[,:.] {printf("%s: punctuation, ignored.\n", yytext);}
. {printf("%s: invalid char\n", yytext);}
%%
int main()
{
    yylex();
    return 0;
}

/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

bool add_word(int type, char *word)
{
    struct word *wp; // wp: word pointer

    if (lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined.\n", word);
        return false;
    }

    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word*)malloc(sizeof(struct word));
    wp->next = word_list;
    wp->word_name = (char*)malloc(strlen(word) + 1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return true;
}

int lookup_word(char *word)
{
    struct word *wp = word_list;

    for (; wp; wp = wp->next) {
        if (strcmp(wp->word_name, word) == 0) {
            return wp->word_type;
        }
    }
    return LOOKUP;
}


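  1. State. The default state is LOOKUP: each word on the current input line is looked up in the word list (a linked list) with lookup_word, and its part of speech is printed. If the first word on a line is a reserved word such as noun or verb, the lexer enters a defining state (one of VERB, NOUN, and so on), and the remaining words on that line are added to the word list with add_word. Before adding a word, add_word checks whether it is already in the list.
  2. Type. In the word list, the part of speech of each word is recorded as one of VERB, NOUN, and so on.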
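The state handling can also be followed without lex. The plain-C sketch below processes one line at a time in the same spirit; it is only an illustration under simplifying assumptions: a fixed-size table replaces the linked list, only the noun and verb keywords are handled, and process_line and MAX_WORDS are made-up names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { LOOKUP = 0, VERB, NOUN };     /* a subset of the lexer's states */

#define MAX_WORDS 32                 /* illustrative fixed capacity */
static struct { char name[32]; int type; } table[MAX_WORDS];
static int table_len;

/* Returns the recorded type of w, or LOOKUP if w is not in the table. */
int lookup_word(const char *w)
{
    for (int i = 0; i < table_len; i++)
        if (strcmp(table[i].name, w) == 0)
            return table[i].type;
    return LOOKUP;
}

/* Records w with the given type unless it is already known or the table is full. */
void add_word(int type, const char *w)
{
    if (lookup_word(w) != LOOKUP || table_len == MAX_WORDS)
        return;
    snprintf(table[table_len].name, sizeof table[table_len].name, "%s", w);
    table[table_len++].type = type;
}

/* Processes one input line and returns how many of its words were found
   in the table while in LOOKUP state (definition lines return 0). */
int process_line(const char *line)
{
    char buf[256];
    int state = LOOKUP;   /* every line starts in LOOKUP, like the \n rule */
    int first = 1, known = 0;

    assert(line != NULL);
    snprintf(buf, sizeof buf, "%s", line);   /* strtok modifies its input */
    for (char *w = strtok(buf, " \t\n"); w; w = strtok(NULL, " \t\n")) {
        if (first && strcmp(w, "verb") == 0)      state = VERB;
        else if (first && strcmp(w, "noun") == 0) state = NOUN;
        else if (state != LOOKUP)                 add_word(state, w);
        else if (lookup_word(w) != LOOKUP)        known++;
        first = 0;
    }
    return known;
}
```

Feeding it "noun dog cat" and then "verb chew eat" fills the table; a later line such as "dog eat bone" is processed in LOOKUP state, where dog and eat are found and bone is not.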


noun pet dog cat cats [ENTER]
verb is are [ENTER]
adj my his their [ENTER]
my pet is dog. their pets are cats. that's ok. [ENTER]
my: adjective
pet: noun
is: verb
dog: noun
.: punctuation, ignored.
their: adjective
pets: don't recognize
are: verb
cats: noun
.: punctuation, ignored.
that: don't recognize
': invalid char
s: don't recognize
ok: don't recognize
.: punctuation, ignored.



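  • Lexical analysis: recognizing individual words in the input character stream; the output is a sequence of tokens. The key step is defining the lexical rules (regular expressions).
  • Syntax analysis: once the words and their parts of speech are known, a higher-level analysis checks, for example, whether certain words together form a valid sentence, that is, how tokens may be combined. A different action is executed for each combination of tokens.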



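  • Subject: (assumed to be only) a noun or a pronoun, i.e. subject -> noun | pronoun
  • Object: (assumed to be only) a noun, i.e. object -> noun
  • Sentence (subject, verb, object): the predicate may only be a verb, i.e. sentence -> subject verb object.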
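Before handing these three productions to yacc, it may help to see them checked by hand. The following plain-C sketch tests whether a token sequence has the shape subject verb object. The enum token values and the function names are hypothetical and unrelated to the codes yacc generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the values are arbitrary for this sketch. */
enum token { T_END = 0, T_NOUN, T_PRONOUN, T_VERB };

/* subject -> noun | pronoun */
static int is_subject(enum token t) { return t == T_NOUN || t == T_PRONOUN; }

/* object -> noun */
static int is_object(enum token t)  { return t == T_NOUN; }

/* sentence -> subject verb object
   Returns 1 if toks (terminated by T_END) is exactly one such sentence.
   The && chain stops at the first mismatch, so it never reads past T_END. */
int is_sentence(const enum token *toks)
{
    assert(toks != NULL);
    return is_subject(toks[0])
        && toks[1] == T_VERB
        && is_object(toks[2])
        && toks[3] == T_END;
}
```

A hand-written check like this stops scaling once productions become recursive or share prefixes, which is the job a yacc-generated parser takes over.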





int yylex (void);



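Because yacc and lex communicate through tokens, they must agree on a convention: the token codes, one code per kind of token. In a yacc & lex system, yacc defines the token codes and the lex code includes them. Specifically:

  1. In the yacc file, define the token codes with the %token syntax, e.g. %token NOUN VERB
  2. yacc -d test.y generates two files, y.tab.c and y.tab.h; the latter contains the macro definitions of the token codes
  3. In the lex file, include y.tab.h.

Note: a token code of 0 marks a logical end of input.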
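The calling convention can be illustrated with a stand-in lexer written by hand. In the sketch below, next_token plays the role of yylex: each call returns one token code, and 0 signals the logical end of input. The TOK_* constants, set_input, next_token and count_tokens are hypothetical names; in a real build the codes would come from y.tab.h and the consumer would be yyparse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; in a real build they come from y.tab.h. */
enum { TOK_NOUN = 258, TOK_PRONOUN = 259, TOK_VERB = 260 };

static const int *cursor;   /* next code to hand out */

/* Points the stand-in lexer at a 0-terminated array of token codes. */
void set_input(const int *codes)
{
    assert(codes != NULL);
    cursor = codes;
}

/* Stand-in for yylex(): one token code per call, then 0 at the end
   (and 0 again on every later call). */
int next_token(void)
{
    if (*cursor == 0)
        return 0;
    return *cursor++;
}

/* What a parser's driver loop does: pull tokens until the lexer returns 0.
   Returns the number of tokens consumed. */
int count_tokens(void)
{
    int n = 0;
    while (next_token() != 0)
        n++;
    return n;
}
```

Because 0 is reserved for the end of input, real token codes must be nonzero, which is one reason generated headers start numbering named tokens above the range of single characters.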



%{
/*
 * We now build a lexical analyzer to be used by a higher-level parser.
 */
#include <stdbool.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#include "y.tab.h"

#define LOOKUP 0 /* default - looking rather than defining. */

int state; // global variable, default to 0(LOOKUP).

bool add_word(int type, char *word);
int lookup_word(char *word);
const char* get_word_type(int type);
%}
%%
[\t ]+ ; /* ignore whitespace */
\n {state = LOOKUP;} // end of line, return to default state.
\.\n {
    state = LOOKUP;
    return 0; // end of sentence.
}
  /* Whenever a line starts with a reserved part of speech name */
  /* start defining words of that type */
^verb {state = VERB;}
^adj {state = ADJECTIVE;}
^adv {state = ADVERB;}
^noun {state = NOUN;}
^prep {state = PREPOSITION;}
^pron {state = PRONOUN;}
^conj {state = CONJUNCTION;}
  /* a normal word, define it or look it up */
[a-zA-Z]+ {
    if (state != LOOKUP) {
        /* define the current word */
        add_word(state, yytext);
    } else {
        int type = lookup_word(yytext);
        printf("%s: %s\n", yytext, get_word_type(type));
        switch(type) {
        case VERB:
        case ADJECTIVE:
        case ADVERB:
        case NOUN:
        case PRONOUN:
        case PREPOSITION:
        case CONJUNCTION:
            return type;
        default:
            //printf("%s: don't recognize\n", yytext);
            break; // don't return, just ignore it.
        }
    }
}
. {printf("%s: ----\n", yytext);} // ignore it
%%
/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

bool add_word(int type, char *word)
{
    struct word *wp; // wp: word pointer

    if (lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined.\n", word);
        return false;
    }

    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word*)malloc(sizeof(struct word));
    wp->next = word_list;
    wp->word_name = (char*)malloc(strlen(word) + 1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return true;
}

int lookup_word(char *word)
{
    struct word *wp = word_list;

    for (; wp; wp = wp->next) {
        if (strcmp(wp->word_name, word) == 0) {
            return wp->word_type;
        }
    }
    return LOOKUP;
}

const char* get_word_type(int type)
{
    switch(type) {
    case VERB: return "verb";
    case ADJECTIVE: return "adjective";
    case ADVERB: return "adverb";
    case NOUN: return "noun";
    case PREPOSITION: return "preposition";
    case PRONOUN: return "pronoun";
    case CONJUNCTION: return "conjunction";
    default: return "unknown";
    }
}


%{
/*
 * A lexer for the basic grammar to use for recognizing English sentence.
 */
#include <stdio.h>

extern int yylex (void);
void yyerror(const char *s, ...);
%}

%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

%%
sentence: subject VERB object {printf("Sentence is valid.\n");}
    ;

subject: NOUN
    |  PRONOUN
    ;

object:  NOUN
    ;
%%
extern FILE *yyin;

int main()
{
    //while(!feof(yyin)) {
        yyparse();
    //}
    return 0;
}

void yyerror(const char *s, ...)
{
    fprintf(stderr, "%s\n", s);
}



/* A Bison parser, made by GNU Bison 2.3.  */

/* Skeleton interface for Bison's Yacc-like parsers in C
   ...
   This special exception was added by the Free Software Foundation in
   version 2.2 of Bison.  */

/* Tokens.  */
# define YYTOKENTYPE
   /* Put the tokens into the symbol table, so that GDB and other debuggers
      know about them.  */
   enum yytokentype {
     NOUN = 258,
     PRONOUN = 259,
     VERB = 260,
     ADVERB = 261,
     ADJECTIVE = 262,
     PREPOSITION = 263,
     CONJUNCTION = 264
   };
/* Tokens.  */
#define NOUN 258
#define PRONOUN 259
#define VERB 260
#define ADVERB 261
#define ADJECTIVE 262
#define PREPOSITION 263
#define CONJUNCTION 264

#if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
typedef int YYSTYPE;
# define yystype YYSTYPE /* obsolescent; will be withdrawn */
#endif

extern YYSTYPE yylval;


noun dogs
noun dog
verb is are
pron they it
it is dog.
it: pronoun
is: verb
dog: noun
Sentence is valid.
it is dog.
it: pronoun
syntax error



sentence: subject verb object {printf("Sentence is valid.\n");}
    ;

subject: NOUN {printf("subject of a noun.\n");}
    |  PRONOUN {printf("subject of a pronoun.\n");}
    ;

verb: VERB {printf("verb.\n");}
    ;

object:  NOUN {printf("object of a noun.\n");}
    ;


noun dog
verb is
pron it
it is dog
it: pronoun
subject of a pronoun.
is: verb
dog: noun
object of a noun.
Sentence is valid.


noun dog dogs
verb is are
pron it they
it is dog they are dogs.
it: pronoun
subject of a pronoun.
is: verb
dog: noun
object of a noun.
Sentence is valid.
they: pronoun
syntax error


extern FILE *yyin;

int main()
{
    while(!feof(stdin/*yyin*/)) {
        yyparse();
    }
}


$ ./test
noun dog
verb is
pron it
it is dog.
it: pronoun
is: verb
dog: noun
Sentence is valid.
it is dog.
it: pronoun
is: verb
dog: noun
Sentence is valid.
noun dogs
verb are
pron they
they are dogs.
they: pronoun
are: verb
dogs: noun
Sentence is valid.


int main()
{
    //while(!feof(stdin/*yyin*/)) {
    for (;;) {
        yyparse();
    }
}


$ ./test
noun dog
verb is
pron it
it is dog
it: pronoun
is: verb
dog: noun
Sentence is valid.
it is dog
it: pronoun
syntax error
is: verb
syntax error
dog: noun



//extern FILE *yyin;

int main()
{
    FILE* f = NULL;

    f = fopen("test.txt", "rb");
    if (NULL == f) {
        printf("Open file failed.\n");
        return 1;
    }
    printf("Open file successfully.\n");

    while(!feof(f)) {
        yyparse();
    }
}



extern FILE *yyin;

int main()
{
    yyin = fopen("test.txt", "rb");
    if (NULL == yyin) {
        printf("Open file failed.\n");
        return 1;
    }
    printf("Open file successfully.\n");

    while(!feof(yyin)) {
        yyparse();
    }
}


noun dog dogs
verb is are
pron it they
it is dog.
they are dogs.


$ ./test
Open file successfully.
it: pronoun
is: verb
dog: noun
Sentence is valid.
they: pronoun
syntax error


This corresponds to ch1-06.y in the book.


%{
/*
 * A lexer for the basic grammar to use for recognizing English sentence.
 */
#include <stdio.h>

extern int yylex (void);
void yyerror(const char *s, ...);
%}

%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

%%
sentence: simple_sentence { printf("Parsed a simple sentence.\n"); }
    | compound_sentence { printf("Parsed a compound sentence.\n"); }
    ;

simple_sentence: subject verb object {printf("simple sentence of type 1.\n");}
    | subject verb object prep_phrase {printf("simple sentence of type 2.\n");}
    ;

compound_sentence: simple_sentence CONJUNCTION simple_sentence {printf("compound sentence of type 1.\n");}
    | compound_sentence CONJUNCTION simple_sentence {printf("compound sentence of type 2.\n");}
    ;

subject: NOUN
    |  PRONOUN
    |  ADJECTIVE subject
    ;

verb:    VERB
    |  ADVERB VERB
    |  verb VERB
    ;

object:  NOUN
    |  ADJECTIVE object
    ;

prep_phrase:  PREPOSITION NOUN
    ;
%%
extern FILE *yyin;

int main()
{
    yyin = fopen("test.txt", "rb");
    if (NULL == yyin) {
        printf("Open file failed.\n");
        return 1;
    }
    printf("Open file successfully.\n");

    while(!feof(yyin)) {
        yyparse();
    }

    fclose(yyin);
    yyin = NULL;
    return 0;
}

void yyerror(const char *s, ...)
{
    fprintf(stderr, "%s\n", s);
}


noun dog dogs China
verb is are
pron it they
adj pretty
conj and
prep in
it is a pretty dog and they are dogs in China and it is dog.


Open file successfully.
it: pronoun
is: verb
a: unknown
pretty: adjective
dog: noun
and: conjunction
simple sentence of type 1.
they: pronoun
are: verb
dogs: noun
in: preposition
China: noun
simple sentence of type 2.
compound sentence of type 1.
and: conjunction
it: pronoun
is: verb
dog: noun
.: ----
simple sentence of type 1.
compound sentence of type 2.
Parsed a compound sentence.


