CHAPTER 11 Syntactic Parsing

Reading notes on Speech and Language Processing, 3rd edition draft

Syntactic parsing is the task of recognizing a sentence and assigning a syntactic structure to it. Parse trees are directly useful in applications such as grammar checking in word-processing systems: a sentence that cannot be parsed may have grammatical errors (or at least be hard to read). More typically, however, parse trees serve as an important intermediate stage of representation for semantic analysis (as we show in Chapter 15) and thus play an important role in applications like question answering and information extraction.

11.1 Ambiguity

Structural ambiguity occurs when the grammar can assign more than one parse to a sentence. Two common kinds of ambiguity are attachment ambiguity and coordination ambiguity.

A sentence has an attachment ambiguity if a particular constituent can be attached to the parse tree at more than one place.

In coordination ambiguity different sets of phrases can be conjoined by a conjunction like and.

Effective syntactic disambiguation algorithms require statistical, semantic, and contextual knowledge sources that vary in how well they can be integrated into parsing algorithms.

11.2 CKY Parsing: A Dynamic Programming Approach

The dynamic programming advantage arises from the context-free nature of our grammar rules — once a constituent has been discovered in a segment of the input we can record its presence and make it available for use in any subsequent derivation that might require it. This provides both time and storage efficiencies since subtrees can be looked up in a table, not reanalyzed.

11.2.1 Conversion to Chomsky Normal Form

Let’s start with the process of converting a generic CFG into one represented in CNF. Assuming we’re dealing with an ε-free grammar, there are three situations we need to address in any generic grammar: rules that mix terminals with non-terminals on the right-hand side, rules that have a single non-terminal on the right-hand side, and rules in which the length of the right-hand side is greater than 2.

The remedy for rules that mix terminals and non-terminals is to simply introduce a new dummy non-terminal that covers only the original terminal. For example, a rule for an infinitive verb phrase such as INF-VP → to VP would be replaced by the two rules INF-VP → TO VP and TO → to.

Rules with a single non-terminal on the right are called unit productions. We can eliminate unit productions by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. More formally, if A ⇒* B by a chain of one or more unit productions and B → γ is a non-unit production in our grammar, then we add A → γ for each such rule in the grammar and discard all the intervening unit productions. This can lead to a substantial flattening of the grammar and a consequent promotion of terminals to fairly high levels in the resulting trees.

Rules with right-hand sides longer than 2 are normalized through the introduction of new non-terminals that spread the longer sequences over several new rules. In our current grammar, the rule S → Aux NP VP would be replaced by the two rules S → X1 VP and X1 → Aux NP.

The entire conversion process can be summarized as follows:

  1. Copy all conforming rules to the new grammar unchanged.
  2. Convert terminals within rules to dummy non-terminals.
  3. Convert unit-productions.
  4. Make all rules binary and add them to new grammar.
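A minimal Python sketch of steps 2 and 4, assuming an ε-free grammar given as (lhs, rhs) pairs in which terminals are written as lowercase strings (unit-production elimination, step 3, is left out for brevity):

# Replace terminals inside long rules with dummy non-terminals and binarize
# right-hand sides longer than 2.  Rules are (lhs, rhs) pairs; rhs is a
# tuple of symbols, with terminals in lowercase.

def to_cnf(rules):
    new_rules = []
    counter = [0]

    def fresh_nonterminal():
        counter[0] += 1
        return 'X%d' % counter[0]

    for lhs, rhs in rules:
        rhs = list(rhs)
        # Step 2: terminals in rules of length > 1 get a dummy non-terminal.
        if len(rhs) > 1:
            for i, symbol in enumerate(rhs):
                if symbol.islower():
                    new_rules.append((symbol.upper(), (symbol,)))
                    rhs[i] = symbol.upper()
        # Step 4: binarize by introducing new non-terminals X1, X2, ...
        while len(rhs) > 2:
            new_nt = fresh_nonterminal()
            new_rules.append((new_nt, (rhs[0], rhs[1])))
            rhs = [new_nt] + rhs[2:]
        new_rules.append((lhs, tuple(rhs)))
    return new_rules

# S -> Aux NP VP becomes X1 -> Aux NP and S -> X1 VP;
# INF-VP -> to VP becomes TO -> to and INF-VP -> TO VP.
print(to_cnf([('S', ('Aux', 'NP', 'VP')), ('INF-VP', ('to', 'VP'))]))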

Figure 11.3 shows the results of applying this entire conversion procedure to the ℒ₁ grammar introduced earlier on page 224. Note that this figure doesn’t show the original lexical rules; since these original lexical rules are already in CNF, they all carry over unchanged to the new grammar. Figure 11.3 does, however, show the various places where the process of eliminating unit productions has, in effect, created new lexical rules. For example, all the original verbs have been promoted to both VPs and to Ss in the converted grammar.

11.2.2 CKY Recognition

With our grammar now in CNF, each non-terminal node above the part-of-speech level in a parse tree will have exactly two daughters. A two-dimensional matrix can be used to encode the structure of an entire tree. For a sentence of length n, we will work with the upper-triangular portion of an (n+1) × (n+1) matrix. Each cell [i, j] in this matrix contains the set of non-terminals that represent all the constituents that span positions i through j of the input. Since our indexing scheme begins with 0, it’s natural to think of the indexes as pointing at the gaps between the input words (as in ₀ Book ₁ that ₂ flight ₃). It follows then that the cell that represents the entire input resides in position [0, n] in the matrix.

Since each non-terminal entry in our table has two daughters in the parse, it follows that for each constituent represented by an entry [i, j], there must be a position in the input, k, where it can be split into two parts such that i < k < j. Given such a position k, the first constituent [i, k] must lie to the left of entry [i, j] somewhere along row i, and the second entry [k, j] must lie beneath it, along column j.

To make this more concrete, consider the following example with its completed parse matrix, shown in Fig. 11.4.

(11.3) Book the flight through Houston.

The superdiagonal row in the matrix contains the parts of speech for each word in the input. The subsequent diagonals above that superdiagonal contain constituents that cover all the spans of increasing length in the input.

Given this setup, CKY recognition consists of filling the parse table in the right way. To do this, we’ll proceed in a bottom-up fashion so that at the point where we are filling any cell [i, j], the cells containing the parts that could contribute to this entry (i.e., the cells to the left and the cells below) have already been filled. The algorithm given in Fig. 11.5 fills the upper-triangular matrix a column at a time working from left to right, with each column filled from bottom to top, as the right side of Fig. 11.4 illustrates. This scheme guarantees that at each point in time we have all the information we need (to the left, since all the columns to the left have already been filled, and below since we’re filling bottom to top). It also mirrors online parsing since filling the columns from left to right corresponds to processing each word one at a time.

words="Book the flight through Houston".split()
grammars=['S :- NP VP','S :- X1 VP','X1 :- Aux NP','S :- book | include | prefer','S :- Verb NP','S :- X2 PP','S :- Verb PP','S :- VP PP','NP :- I | she | me','NP :- NWA | Houston','NP :- Det Nominal','Nominal :- book | flight | meal | money','Nominal :- Nominal Noun','Nominal :- Nominal PP','VP :- book | include | prefer','VP :- Verb NP','VP :- X2 PP','X2 :- Verb NP','VP :- Verb PP','VP :- VP PP','PP :- Preposition NP','Det :- that | this | the | a','Noun :- book | flight | meal | money','Verb :- book | include | prefer','Pronoun :- I | she | me','Proper-Noun :- Houston | NWA','Aux :- does','Preposition :- from | to | on | near | through']def find_in_set(set_find,element):for e in set_find:if element.lower()==e.lower():return Truereturn Falsetable=[[set([])]*(n+1) for i in range(n+1)]
for j in range(1,6):for grammar in grammars:ind=grammar.find(' :-')if(grammar[ind+2:].lower().find(words[j-1].lower())!=-1):table[j-1][j]=table[j-1][j]|set([grammar[:ind].strip()])for i in range(j-2,-1,-1):for k in range(i+1,j):for grammar in grammars:ind=grammar.find(':-')BC=grammar[ind+2:].lower().split()if len(BC)==2 and find_in_set(table[i][k],BC[0]) and find_in_set(table[k][j],BC[1]):table[i][j]=table[i][j]|set([grammar[:ind].strip()])
>>>table
[[set(),{'Nominal', 'Noun', 'S', 'VP', 'Verb'},set(),{'S', 'VP', 'X2'},set(),{'S', 'VP', 'X2'}],[set(), set(), {'Det'}, {'NP'}, set(), {'NP'}],[set(), set(), set(), {'Nominal', 'Noun'}, set(), {'Nominal'}],[set(), set(), set(), set(), {'Preposition'}, {'PP'}],[set(), set(), set(), set(), set(), {'NP', 'Proper-Noun'}],[set(), set(), set(), set(), set(), set()]]

The outermost loop of the algorithm given in Fig. 11.5 iterates over the columns, and the second loop iterates over the rows, from the bottom up. The purpose of the innermost loop is to range over all the places where a substring spanning i to j in the input might be split in two. As k ranges over the places where the string can be split, the pairs of cells we consider move, in lockstep, to the right along row i and down along column j. Figure 11.6 illustrates the general case of filling cell [i, j]. At each such split, the algorithm considers whether the contents of the two cells can be combined in a way that is sanctioned by a rule in the grammar. If such a rule exists, the non-terminal on its left-hand side is entered into the table.

Figure 11.7 shows how the five cells of column 5 of the table are filled after the word Houston is read. The arrows point out the two spans that are being used to add an entry to the table. Note that the action in cell [0, 5] indicates the presence of three alternative parses for this input, one where the PP modifies the flight, one where it modifies the booking, and one that captures the second argument in the original VP → Verb NP PP rule, now captured indirectly with the VP → X2 PP rule.

11.2.3 CKY Parsing

The algorithm given in Fig. 11.5 is a recognizer, not a parser; for it to succeed, it simply has to find an S in cell [0, n]. To turn it into a parser capable of returning all possible parses for a given input, we can make two simple changes to the algorithm: the first change is to augment the entries in the table so that each non-terminal is paired with pointers to the table entries from which it was derived (more or less as shown in Fig. 11.7); the second change is to permit multiple versions of the same non-terminal to be entered into the table (again as shown in Fig. 11.7).

With these changes, the completed table contains all the possible parses for a given input. Returning an arbitrary single parse consists of choosing an S from cell [0, n] and then recursively retrieving its component constituents from the table. Of course, returning all the parses for a given input may incur considerable cost since an exponential number of parses may be associated with a given input. In such cases, returning all the parses will have an unavoidable exponential cost. Looking forward to Chapter 12, we can also think about retrieving the best parse for a given input by further augmenting the table to contain the probabilities of each entry. Retrieving the most probable parse consists of running a suitably modified version of the Viterbi algorithm from Chapter 8 over the completed parse table.
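As a concrete illustration of the first change, the following is a minimal sketch (the cell representation and the cky_parse / build_tree names are choices made for this example, not the book’s pseudocode) in which each cell maps a non-terminal to a single backpointer, so exactly one parse is recovered; storing lists of backpointers instead would keep all parses.

# 'lexical' maps a word to the set of pre-terminals that derive it;
# 'binary' maps a pair (B, C) to the set of non-terminals A with A -> B C
# in the CNF grammar.

def cky_parse(words, lexical, binary):
    n = len(words)
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for A in lexical.get(words[j - 1], set()):
            table[j - 1][j][A] = words[j - 1]            # points at the word itself
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for B in table[i][k]:
                    for C in table[k][j]:
                        for A in binary.get((B, C), set()):
                            table[i][j][A] = (k, B, C)   # points at the split used
    return table

def build_tree(table, i, j, A):
    back = table[i][j][A]
    if isinstance(back, str):                            # a lexical entry
        return (A, back)
    k, B, C = back
    return (A, build_tree(table, i, k, B), build_tree(table, k, j, C))

# A parse of the whole input exists if 'S' is in table[0][n]; one tree is
# then build_tree(table, 0, n, 'S').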

11.2.4 CKY in Practice

Finally, we should note that while the restriction to CNF does not pose a problem theoretically, it does pose some non-trivial problems in practice. Obviously, as things stand now, our parser isn’t returning trees that are consistent with the grammar given to us by our friendly syntacticians. In addition to making our grammar developers unhappy, the conversion to CNF will complicate any syntax-driven approach to semantic analysis.

One approach to getting around these problems is to keep enough information around to transform our trees back to the original grammar as a post-processing step of the parse. This is trivial in the case of the transformation used for rules with length greater than 2: simply deleting the new dummy non-terminals and promoting their daughters restores the original tree. In the case of unit productions, it turns out to be more convenient to alter the basic CKY algorithm to handle them directly than it is to store the information needed to recover the correct trees. Exercise 11.3 asks you to make this change. Many of the probabilistic parsers presented in Chapter 12 use the CKY algorithm altered in just this manner. Another solution is to adopt a more complex dynamic programming algorithm that simply accepts arbitrary CFGs.
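A sketch of this de-binarization step might look like the following, assuming trees are represented as nested (label, child, ...) tuples and that the dummy non-terminals introduced during binarization are the only labels of the form X1, X2, ...:

# Restore a tree over the original grammar by deleting the dummy
# non-terminals introduced during binarization and promoting their daughters.

def debinarize(tree):
    if isinstance(tree, str):                # a leaf (word)
        return tree
    label, children = tree[0], tree[1:]
    new_children = []
    for child in children:
        child = debinarize(child)
        if not isinstance(child, str) and child[0].startswith('X'):
            new_children.extend(child[1:])   # promote the dummy's daughters
        else:
            new_children.append(child)
    return (label,) + tuple(new_children)

# ('S', ('X1', ('Aux', 'does'), ('NP', ...)), ('VP', ...)) becomes
# ('S', ('Aux', 'does'), ('NP', ...), ('VP', ...)).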

11.3 Partial Parsing

Many language processing tasks do not require complex, complete parse trees for all inputs. For these tasks, a partial parse, or shallow parse, of input sentences may be sufficient.

There are many different approaches to partial parsing. Some make use of cascades of finite state transducers to produce tree-like representations.

An alternative style of partial parsing is known as chunking. Chunking is the process of identifying and classifying the flat, non-overlapping segments of a sentence that constitute the basic non-recursive phrases corresponding to the major content-word parts-of-speech: noun phrases, verb phrases, adjective phrases, and prepositional phrases. Since chunked texts lack a hierarchical structure, a simple bracketing notation is sufficient to denote the location and the type of the chunks in a given example:

(11.4) [NP The morning flight] [PP from] [NP Denver] [VP has arrived.]

This bracketing notation makes clear the two fundamental tasks that are involved in chunking: segmenting (finding the non-overlapping extents of the chunks) and labeling (assigning the correct tag to the discovered chunks).

Some standard guidelines are followed in most systems. First and foremost, base phrases of a given type do not recursively contain any constituents of the same type. Eliminating this kind of recursion leaves us with the problem of determining the boundaries of the non-recursive phrases. In most approaches, base phrases include the headword of the phrase, along with any pre-head material within the constituent, while crucially excluding any post-head material. Eliminating post-head modifiers obviates the need to resolve attachment ambiguities. This exclusion does lead to certain oddities, such as PPs and VPs often consisting solely of their heads.

11.3.1 Machine Learning-Based Approaches to Chunking

State-of-the-art approaches to chunking use supervised machine learning, using annotated data as a training set for a sequence labeler. It’s common to model chunking as IOB tagging. In IOB tagging we introduce a tag for the beginning (B) and inside (I) of each chunk type, and one for tokens outside (O) any chunk. This gives 2n + 1 tags, where n is the number of chunk types. IOB tagging can represent exactly the same information as the bracketed notation. The following example shows the bracketing notation of (11.4) on page 232 reframed as a tagging task:

(11.7) The morning flight from Denver has arrived
B_NP I_NP I_NP B_PP B_NP B_VP I_VP

The same sentence with only the base-NPs tagged illustrates the role of the O tags.

(11.8) The morning flight from Denver has arrived
B_NP I_NP I_NP O B_NP O O
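The mapping from chunks to IOB tags is mechanical; here is a small sketch, assuming the chunked sentence is given as a list of (label, tokens) pairs with label None for material outside any chunk:

# Convert a chunked sentence into (token, IOB tag) pairs.

def chunks_to_iob(chunks):
    tagged = []
    for label, tokens in chunks:
        for position, token in enumerate(tokens):
            if label is None:
                tagged.append((token, 'O'))
            elif position == 0:
                tagged.append((token, 'B_' + label))
            else:
                tagged.append((token, 'I_' + label))
    return tagged

# Reproduces (11.8):
print(chunks_to_iob([('NP', ['The', 'morning', 'flight']),
                     (None, ['from']),
                     ('NP', ['Denver']),
                     (None, ['has', 'arrived'])]))
# [('The', 'B_NP'), ('morning', 'I_NP'), ('flight', 'I_NP'), ('from', 'O'),
#  ('Denver', 'B_NP'), ('has', 'O'), ('arrived', 'O')]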

Since annotation efforts are expensive and time consuming, chunkers usually rely on existing treebanks like the Penn Treebank (Chapter 10), extracting syntactic phrases from the full parse constituents of a sentence, finding the appropriate heads and then including the material to the left of the head, ignoring the text to the right. This is somewhat error-prone since it relies on the accuracy of the head-finding rules described in Chapter 10.
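A minimal sketch of that extraction rule, assuming a constituent's leaves are available in order along with the index of its head word (the head itself being found with head-finding rules such as those in Chapter 10):

# Derive a base chunk from a treebank constituent: keep the head word and
# all pre-head material, ignore everything to the right of the head.

def base_chunk(label, leaves, head_index):
    return (label, leaves[:head_index + 1])

# e.g. an NP over "the flight through Houston" whose head is "flight"
print(base_chunk('NP', ['the', 'flight', 'through', 'Houston'], 1))
# ('NP', ['the', 'flight'])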

Given a training set, any sequence model can be used. Figure 11.8 shows an illustration of a simple feature-based model, using features like the words and parts-of-speech within a 2-word window, and the chunk tags of the preceding inputs in the window. In training, each training vector consists of the values of 13 features: the two words to the left of the decision point, their parts-of-speech and chunk tags, the word to be tagged along with its part-of-speech, the two words that follow along with their parts-of-speech, and the correct chunk tag, in this case I_NP. During classification, the classifier is given the same vector without the answer and assigns the most appropriate tag from its tagset. Viterbi decoding is commonly used.
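A sketch of such a feature extractor, assuming parallel lists of words, POS tags, and the chunk tags assigned so far (the padding value and the function name are choices made for this example):

# Build the 12-element feature vector for position i: the two preceding
# words with their POS and chunk tags, the current word and its POS, and
# the two following words with their POS.  In training, the 13th element
# (the correct chunk tag for position i) is attached as the label.

def chunk_features(words, pos, chunk_tags, i, pad='<s>'):
    def word(j):  return words[j] if 0 <= j < len(words) else pad
    def tag(j):   return pos[j] if 0 <= j < len(pos) else pad
    def chunk(j): return chunk_tags[j] if 0 <= j < i else pad   # only already-assigned tags
    return [word(i - 2), tag(i - 2), chunk(i - 2),
            word(i - 1), tag(i - 1), chunk(i - 1),
            word(i),     tag(i),
            word(i + 1), tag(i + 1),
            word(i + 2), tag(i + 2)]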

11.3.2 Chunking-System Evaluations

Chunkers are evaluated according to the notions of precision, recall, and the F-measure borrowed from the field of information retrieval.

Precision measures the percentage of system-provided chunks that were correct. Correct here means that both the boundaries of the chunk and the chunk’s label are correct. Precision is therefore defined as
$$\text{Precision} = \frac{\text{Number of correct chunks given by system}}{\text{Total number of chunks given by system}}$$
Recall measures the percentage of chunks actually present in the input that were correctly identified by the system. Recall is defined as
$$\text{Recall} = \frac{\text{Number of correct chunks given by system}}{\text{Total number of actual chunks in the text}}$$
The F-measure (van Rijsbergen, 1975) provides a way to combine these two measures into a single metric.
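With P standing for precision, R for recall, and β weighting the importance of recall relative to precision (β = 1 gives the balanced F1 that chunking evaluations usually report), it is defined as

$$F_\beta = \frac{(\beta^2 + 1)\,P\,R}{\beta^2 P + R}$$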

11.4 Summary

The two major ideas introduced in this chapter are those of parsing and partial parsing. Here’s a summary of the main points we covered about these ideas:

  • Structural ambiguity is a significant problem for parsers. Common sources of structural ambiguity include PP-attachment, coordination ambiguity, and noun-phrase bracketing ambiguity.
  • Dynamic programming parsing algorithms, such as CKY, use a table of partial parses to efficiently parse ambiguous sentences.
  • CKY restricts the form of the grammar to Chomsky normal form (CNF).
  • Many practical problems, including information extraction problems, can be solved without full parsing.
  • Partial parsing and chunking are methods for identifying shallow syntactic constituents in a text.
  • State-of-the-art methods for partial parsing use supervised machine learning techniques.
