ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipelin

Code that triggers the error:

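from spacy.lang.en import English
from tqdm import tqdm

nlp = English()
nlp.add_pipe(nlp.create_pipe('sentencizer'))  # spaCy 2.x pipeline API

def normalize(text):
    text = text.lower().strip()
    doc = nlp(text)
    filtered_sentences = []
    for sentence in tqdm(doc.sents):  # the error is raised here
        ...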

Error:

ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.

Cause:

This is currently a limitation of the sentencizer, because the is_sentenced property is based on whether the Token.is_sent_start properties were changed. However, for the first token in a sentence, this will always default to True. So if the sentence only contains one token, there's no way for spaCy to tell whether the sentence boundaries have been set or not.

As a workaround, you could trick spaCy into ignoring this by setting doc.is_parsed = True, i.e. by making it believe that the dependency parse was assigned and sentence boundaries were applied this way.
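
The workaround can be sketched as a small helper. This is only a sketch under assumptions: it targets spaCy 2.x (as in this post), where Doc.is_parsed can be assigned, and the name sentences_with_fallback is made up for illustration. It accepts any object that exposes .sents and .is_parsed, and it treats any ValueError raised while iterating as the E030 case.

```python
def sentences_with_fallback(doc):
    """Return the sentences of a spaCy 2.x Doc as a list.

    If iterating doc.sents raises a ValueError (E030), apply the workaround
    quoted above (mark the Doc as parsed) and try once more.
    """
    try:
        # doc.sents is a generator, so the error only surfaces on iteration.
        return list(doc.sents)
    except ValueError:
        # Assumption: spaCy 2.x, where Doc.is_parsed can be assigned.
        doc.is_parsed = True
        return list(doc.sents)


# Hypothetical usage inside normalize() from the code above:
#     doc = nlp(text)
#     for sentence in sentences_with_fallback(doc):
#         ...
```

Materializing the sentences into a list also gives tqdm a length to display, which a bare generator does not.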

Solution: this is a spaCy version problem. Downgrade from 2.1.3 to 2.1.0:

pip uninstall spacy

pip install spacy==2.1.0

This problem drove me absolutely crazy.
