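Standard Library Regular Expressions, Part 3: Look-Ahead and Look-Behind

More Readable Regular Expressions

How do you write a detailed, readable regular expression in Python?
Use the re.VERBOSE flag.
Compare the versions below.
Version 1 is dense, hard to read, and hard to modify.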

import re

# Match an email address such as first.last@example.com
re.compile(r'[\w\d.+-]+@([\w\d.]+\.)+(com|org|edu)')

Version 2 is easy to read and behaves the same as version 1:

re.compile(r'''
    [\w\d.+-]+       # email prefix (the part before the @)
    @
    ([\w\d.]+\.)+    # domain name
    (com|org|edu)    # top-level domain
    ''', re.VERBOSE)

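Building a regular expression with re.VERBOSE:

Put each part of the pattern on its own line.
Add comments to make the pattern clearer.
Combine it with more advanced features such as look-around assertions, back-references, and conditional patterns.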
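As a rough check of the points above, the sketch below compiles the compact and the verbose form of the email pattern and compares them on a few sample strings. The sample strings and the variable names (compact, verbose, spaced, samples) are made up for illustration. It also shows one consequence of re.VERBOSE: whitespace in the pattern is ignored, so a literal space has to be escaped or placed in a character class.

```python
import re

# Compact form: the whole pattern on one line.
compact = re.compile(r'[\w\d.+-]+@([\w\d.]+\.)+(com|org|edu)')

# Verbose form: whitespace and "#" comments inside the pattern are ignored.
verbose = re.compile(r'''
    [\w\d.+-]+       # email prefix
    @
    ([\w\d.]+\.)+    # domain name
    (com|org|edu)    # top-level domain
    ''', re.VERBOSE)

# Hypothetical sample inputs, not taken from the article.
samples = ['first.last@example.com', 'user@mail.example.org', 'not-an-address']
for s in samples:
    # Print whether each form finds a match; the two columns should agree.
    print(s, bool(compact.search(s)), bool(verbose.search(s)))

# Under re.VERBOSE the bare spaces are dropped; "[ ]" keeps one literal space.
spaced = re.compile(r'''hello [ ] world''', re.VERBOSE)
print(bool(spaced.search('hello world')))
```

The character class is only one way to keep a literal space; escaping it with a backslash should work as well.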

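Advanced Features

Look-ahead and look-behind assertions.

An assertion places a condition on the text immediately after (look-ahead) or immediately before (look-behind) the current position; matching continues only if that condition holds. Assertions are zero-width: they do not consume any characters.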

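Pattern	Meaning
(?=pattern)	Positive look-ahead assertion
(?!pattern)	Negative look-ahead assertion
(?<=pattern)	Positive look-behind assertion
(?<!pattern)	Negative look-behind assertion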
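To make the four assertion forms concrete, here is a minimal sketch that applies each one to the same string. The sample text and variable names are invented for illustration. Note that look-behind patterns in Python's re module must have a fixed width, which is why a single character is used there.

```python
import re

# Hypothetical sample text.
text = 'price: $100, cost: 200 USD, id: #300'

# Positive look-ahead: digits followed by " USD" (the " USD" is not consumed).
followed_by_usd = re.findall(r'\d+(?= USD)', text)

# Negative look-ahead: whole numbers NOT followed by " USD".
# The \b anchors stop the engine from backtracking to a shorter run like "20".
not_followed_by_usd = re.findall(r'\b\d+\b(?! USD)', text)

# Positive look-behind: digits directly preceded by "$".
after_dollar = re.findall(r'(?<=\$)\d+', text)

# Negative look-behind: whole numbers NOT directly preceded by "$".
not_after_dollar = re.findall(r'(?<!\$)\b\d+', text)

print(followed_by_usd)       # ['200']
print(not_followed_by_usd)   # ['100', '300']
print(after_dollar)          # ['100']
print(not_after_dollar)      # ['200', '300']
```

In every case the returned text contains only the digits; the asserted context (" USD" or "$") is checked but never becomes part of the match.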

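Back-references

A back-reference matches, later in the pattern, the same text that an earlier group has already matched.
A group can be referenced by number or by name.

Pattern	Meaning
\num	Back-reference to group number num; at most two digits, so only groups 1 to 99 can be referenced
(?P=name)	Back-reference to the text matched by the named group name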
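The two reference styles can be sketched on small inputs as follows. The patterns, sample strings, and names (doubled, quoted) are illustrative assumptions, not part of the original examples.

```python
import re

# Numbered back-reference: \1 must repeat exactly what group 1 matched,
# so this finds a word that is immediately repeated.
doubled = re.compile(r'\b(\w+)\s+\1\b')
m1 = doubled.search('this is is a test')
print(m1.group(1))           # is

# Named back-reference: (?P=quote) must repeat the opening quote character,
# so a string opened with " can only be closed with ".
quoted = re.compile(r'''(?P<quote>['"])(?P<body>.*?)(?P=quote)''')
m2 = quoted.search('say "it\'s fine" now')
print(m2.group('body'))      # it's fine
```

A back-reference compares against the text the group matched, not against the group's pattern, which is why the single quote inside the double-quoted string does not end the match.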

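Conditional patterns:

(?(id)yes-expression|no-expression) checks whether the group with the given id (a group number or a group name) has matched. If it has, yes-expression is used to match what follows; otherwise no-expression is used. It works like if/else, where the condition is whether an earlier group matched.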
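A small sketch of the if/else behaviour, under the assumption that we want a number that is either wrapped in parentheses on both sides or not wrapped at all. The pattern and the name balanced are made up for this illustration.

```python
import re

# Group "open" captures an optional "(".
# If it matched, the yes-branch demands a closing ")".
# If it did not, the no-branch allows only trailing whitespace.
balanced = re.compile(r'^(?P<open>\()?\d+(?(open)\)|\s*)$')

results = {}
for s in ['42', '(42)', '(42', '42)', '42  ']:
    results[s] = bool(balanced.match(s))
    print(repr(s), results[s])
```

Only the balanced inputs ('42', '(42)', and '42  ') are accepted; a lone opening or closing parenthesis is rejected because the branch chosen by the conditional does not allow it.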

Advanced Feature Demos

Positive look-ahead assertion:

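import re

address = re.compile(
    r'''
    # The display name, followed by whitespace
    ((?P<name>[\w.,]+)\s+)

    # Positive look-ahead: the rest of the string takes one of two forms
    (?= (<.*>$)       # wrapped in angle brackets
        |             # or
        ([^<].*[^>]$) # not wrapped in angle brackets
    )

    <? # the assertion consumed nothing, so optionally match "<" here

    # The address itself, e.g. jie.zhang@example.com
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name
      (com|org|edu)
    )

    >? # optionally match the closing ">"
    ''',
    re.VERBOSE)   # re.VERBOSE allows this commented, multi-line layout

candidates = [
    '张杰 <jie.zhang@example.com>',      # should match
    '周杰伦 jielun.zhou@example.com',     # should match
    '张杰 <jie.zhang@example.com',       # should not match
    '周杰伦 jielun.zhou@example.com>',    # should not match
]

for candidate in candidates:
    print('Example:', candidate)
    match = address.search(candidate)
    if match:
        print('  Name :', match.groupdict()['name'])
        print('  Email:', match.groupdict()['email'])
    else:
        print('  No match')

Output:

Example: 张杰 <jie.zhang@example.com>
  Name : 张杰
  Email: jie.zhang@example.com
Example: 周杰伦 jielun.zhou@example.com
  Name : 周杰伦
  Email: jielun.zhou@example.com
Example: 张杰 <jie.zhang@example.com
  No match
Example: 周杰伦 jielun.zhou@example.com>
  No match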

Negative look-ahead assertion:

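import re

address = re.compile(
    r'''
    ^
    # Negative look-ahead: the text from here on must NOT match this pattern
    (?!unknown@.*$)
    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name
    (com|org|edu)
    $
    ''',
    re.VERBOSE)

candidates = [
    'jie.zhang@example.com',
    'unknown@example.com',
]

for candidate in candidates:
    print('Example:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match:', candidate[match.start():match.end()])
    else:
        print('  No match')

Output:

Example: jie.zhang@example.com
  Match: jie.zhang@example.com
Example: unknown@example.com
  No match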

Positive look-behind assertion:

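import re

# Positive look-behind assertion
twitter = re.compile(
    r'''
    # Match only text that comes directly after an "@"
    (?<=@)
    ([\w\d_]+)       # letters, digits, and underscores
    ''',
    re.VERBOSE)

text = '''This text includes two Twitter handles.
One for @ThePSF, and one for the author, @doughellmann. thank@you
'''

print(text)
for match in twitter.findall(text):
    print('Handle:', match)

Output:

This text includes two Twitter handles.
One for @ThePSF, and one for the author, @doughellmann. thank@you

Handle: ThePSF
Handle: doughellmann
Handle: you

Note that thank@you also yields a handle: the look-behind only checks that the preceding character is "@", not what comes before it.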

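Numeric back-references of the form \num take at most two digits, so only groups 1 to 99 can be referenced, which is a limitation of this form.
Code: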

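import re

address = re.compile(
    r'''
    # Input looks like: First Last <first.last@example.com>
    (\w+)               # First
    \s+
    (([\w.]+)\s+)?      # optional middle name or initial
    (\w+)               # Last
    \s+
    <
    # The address: first_name.last_name@domain.tld
    (?P<email>
      \1               # First
      \.
      \4               # Last
      @
      ([\w\d.]+\.)+
      (com|org|edu)
    )
    >
    ''',
    re.VERBOSE | re.IGNORECASE)   # verbose layout, case-insensitive matching

candidates = [
    'First Last <first.last@example.com>',
    'Different Name <first.last@example.com>',
    'First Middle Last <first.last@example.com>',
    'First M. Last <first.last@example.com>',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match name :', match.group(1), match.group(4))
        print('  Match email:', match.group(5))
    else:
        print('  No match')

Output:

Candidate: First Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Candidate: Different Name <first.last@example.com>
  No match
Candidate: First Middle Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Candidate: First M. Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com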

Back-reference to the value of a named group (key-value form):

import re

address = re.compile(
    r'''
    # Input looks like: First Last <first.last@example.com>
    (?P<first_name>\w+) # First
    \s+
    (([\w.]+)\s+)?
    (?P<last_name>\w+) # Last
    \s+
    <
    # The address: first_name.last_name@domain.tld
    (?P<email>
      (?P=first_name) # easier to read than \1
      \.
      (?P=last_name)
      @
      ([\w\d.]+\.)+
      (com|org|edu)
    )
    >
    ''',
    re.VERBOSE | re.IGNORECASE)

candidates = [
    'First Last <first.last@example.com>',      # should match
    'Different Name <first.last@example.com>',  # should not match
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match name :', match.groupdict()['first_name'], end=' ')
        print(match.groupdict()['last_name'])
        print('  Match email:', match.groupdict()['email'])
    else:
        print('  No match')

Output:

Candidate: First Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Candidate: Different Name <first.last@example.com>
  No match

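Named groups make back-references simpler and more convenient to use, and they are not subject to the 99-group limit.
(?(id)yes-expression|no-expression) selects which pattern to apply next, depending on whether an earlier group matched.

Logic of the example:
An address given without a name must not be wrapped in angle brackets; an address given with a name must be wrapped in angle brackets.

Example code: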

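import re

address = re.compile(
    r'''
    ^
    # Named group "name" (optional)
    (?P<name>
       ([\w.]+\s+)*[\w.]+
    )?
    \s*

    # Test whether the group "name" matched
    (?(name)
      # If it did: the rest must be wrapped in angle brackets;
      # the group "brackets" records that this branch was taken
      (?P<brackets>(?=(<.*>$)))   # positive look-ahead
      |
      # If it did not: the rest must not be wrapped in angle brackets
      (?=([^<].*[^>]$))           # positive look-ahead
    )

    # If "brackets" matched, match the opening "<"; otherwise any whitespace
    (?(brackets)<?|\s*)

    # The address itself: username@domain.tld
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name prefix
      (com|org|edu)    # limit the allowed top-level domains
    )

    # If "brackets" matched, match the closing ">"; otherwise any whitespace
    (?(brackets)>?|\s*)
    $
    ''',
    re.VERBOSE)

candidates = [
    'First Last <first.last@example.com>',
    'No Brackets first.last@example.com',
    'Open Bracket <first.last@example.com',
    'Close Bracket first.last@example.com>',
    'no.brackets@example.com',
]

for candidate in candidates:
    print('Example:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match name :', match.groupdict()['name'])
        print('  Match email:', match.groupdict()['email'])
    else:
        print('  No match')

Output:

Example: First Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Example: No Brackets first.last@example.com
  No match
Example: Open Bracket <first.last@example.com
  No match
Example: Close Bracket first.last@example.com>
  No match
Example: no.brackets@example.com
  Match name : None
  Match email: no.brackets@example.com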

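Some of the examples are adapted from: https://pymotw.com/3/re/index.html

I hope this gives readers something direct to think about and absorb. Often we are not missing the knowledge itself; we only lack the courage to approach it and an abstract model to organize it. Let's keep pushing forward together.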
