I. Introduction to the re module

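The re module is part of the Python standard library. Reading the re.py source in detail is recommended. Every public name the module exports is listed in its __all__: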

__all__ = [
    "match", "fullmatch", "search", "sub", "subn", "split",
    "findall", "finditer", "compile", "purge", "template", "escape",
    "error", "A", "I", "L", "M", "S", "X", "U",
    "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE",
]

__version__ = "2.2.1"

The module docstring in re.py describes the regular expression syntax and the module API in detail:

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode digits.
    \D       Matches any non-digit character; equivalent to [^\d].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode whitespace characters.
    \S       Matches any non-whitespace character; equivalent to [^\s].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
             in bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the
             range of Unicode alphanumeric characters (letters plus digits
             plus underscore).
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a match object for each match.
    compile   Compile a pattern into a RegexObject.
    purge     Clear the regular expression cache.
    escape    Backslash all non-alphanumerics in a string.

Some of the functions in this module takes flags as optional parameters:
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                   match the corresponding ASCII character categories
                   (rather than the whole Unicode categories, which is the
                   default).
                   For bytes patterns, this flag is the only available
                   behaviour and needn't be specified.
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     For compatibility only. Ignored for string patterns (it
                   is the default), and forbidden for bytes patterns.

This module also defines an exception 'error'.

"""

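II. The re module in detail

1. Introduction to regular expressions

I find it convenient to split regular expression syntax into four categories: character symbols, position symbols, quantifiers and grouping symbols.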

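Character symbols:

Symbol	Meaning
.	Matches any single character except a newline (a newline too when DOTALL is set)
\d	Matches one digit; equivalent to [0-9] for ASCII text (Python 3 str patterns also match other Unicode decimal digits unless ASCII is set)
\D	Matches any non-digit; equivalent to [^0-9] for ASCII text
\s	Matches any whitespace character; equivalent to [ \t\n\r\f\v] for ASCII text (Unicode whitespace is also matched unless ASCII is set)
\S	Matches any non-whitespace character; equivalent to [^ \t\n\r\f\v] for ASCII text
\w	Matches one letter, digit or underscore; equivalent to [a-zA-Z0-9_] for ASCII text (Unicode letters and digits are also matched unless ASCII is set)
\W	Matches any character that is not a letter, digit or underscore; equivalent to [^a-zA-Z0-9_] for ASCII text
\	Escape character: the character after it loses its special meaning, e.g. \. matches only a literal ".", not any character
[]	Character set: matches any one of the characters inside the brackets ([^...] matches any character not in the set)
|	Alternation (logical OR): a|b matches either a or b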
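The character symbols can be tried out directly with re.findall. This is a small sketch with made-up sample strings. The last two calls assume Python 3 str patterns, where \d also matches non-ASCII decimal digits (here U+0663, ARABIC-INDIC DIGIT THREE) unless re.ASCII is passed:

```python
import re

# \d: digits, \w: letters/digits/underscore, \s: whitespace
print(re.findall(r"\d", "a1b22"))            # ['1', '2', '2']
print(re.findall(r"\w+", "apples_x, 7.50"))  # ['apples_x', '7', '50']
print(re.findall(r"\s", "a b\tc\nd"))        # [' ', '\t', '\n']

# "." matches any character except a newline
print(re.findall(r"a.c", "abc a-c a\nc"))    # ['abc', 'a-c']

# "\." matches only a literal dot
print(re.findall(r"7\.50", "7.50 7x50"))     # ['7.50']

# [...] matches one character from the set; [^...] negates the set
print(re.findall(r"[aeiou]", "regex"))       # ['e', 'e']
print(re.findall(r"[^0-9]", "a1b2"))         # ['a', 'b']

# | matches either alternative
print(re.findall(r"cat|dog", "cat, dog, cow"))  # ['cat', 'dog']

# In str patterns \d is Unicode-aware unless re.ASCII is given
print(re.findall(r"\d", "3\u0663"))                   # ['3', '٣']
print(re.findall(r"\d", "3\u0663", flags=re.ASCII))   # ['3']
```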

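Position symbols (anchors):

Symbol	Meaning
^	Matches at the start of the string; in MULTILINE mode, also at the start of every line
$	Matches at the end of the string (or just before a trailing newline); in MULTILINE mode, also at the end of every line
\A	Matches only at the start of the string, regardless of MULTILINE mode
\Z	Matches only at the end of the string, regardless of MULTILINE mode
\b	Matches the empty string at the start or end of a word
\B	Matches the empty string only where it is not at the start or end of a word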
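The sketch below uses an invented three-line string. It shows how MULTILINE changes the behavior of ^ and $, while \A and \Z remain tied to the whole string:

```python
import re

text = "first line\nsecond line\nthird"

# Without MULTILINE, ^ matches only at the start of the whole string
print(re.findall(r"^\w+", text))                 # ['first']

# With MULTILINE, ^ and $ also match at each line start/end
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['first', 'second', 'third']
print(re.findall(r"\w+$", text, re.MULTILINE))   # ['line', 'line', 'third']

# \A and \Z ignore MULTILINE: only the very start/end of the string
print(re.findall(r"\A\w+", text, re.MULTILINE))  # ['first']
print(re.findall(r"\w+\Z", text, re.MULTILINE))  # ['third']

# \b is a word boundary; \B is any position that is not one
print(re.findall(r"\bcat\b", "cat concat cats"))  # ['cat'] (the standalone word)
print(re.findall(r"\Bcat\b", "cat concat cats"))  # ['cat'] (the end of "concat")
```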

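Quantifiers:

Symbol	Meaning
*	Matches the preceding element 0 or more times (greedy)
+	Matches the preceding element 1 or more times (greedy)
?	Matches the preceding element 0 or 1 time (greedy)
*?, +?, ??	Non-greedy (lazy) versions of the three above: match as few repetitions as possible
{m,n}	Matches the preceding element m to n times (greedy)
{m,n}?	Matches the preceding element m to n times, non-greedy (as few as possible)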
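Greedy and non-greedy quantifiers are easiest to compare side by side. This sketch uses a made-up HTML fragment and a few short strings:

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy .* runs to the last ">" it can reach; .*? stops at the first one
print(re.findall(r"<.*>", html))    # ['<b>bold</b><i>italic</i>']
print(re.findall(r"<.*?>", html))   # ['<b>', '</b>', '<i>', '</i>']

# * = 0 or more, + = 1 or more, ? = 0 or 1
print(re.findall(r"ab*", "a ab abbb"))          # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))          # ['ab', 'abbb']
print(re.findall(r"colou?r", "color colour"))   # ['color', 'colour']

# {m,n} takes as many as it can; {m,n}? takes as few as it can
print(re.findall(r"\d{2,3}", "1 12 1234"))      # ['12', '123']
print(re.findall(r"\d{2,3}?", "1 12 1234"))     # ['12', '12', '34']
```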

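Grouping symbols:

Symbol	Meaning
(...)	Group. Groups capture by default, so the matched content can be retrieved separately. Each group gets an index starting from 1, assigned in the order of its opening "("
(?aiLmsux)	Sets flags for the whole expression. Each of the letters a, i, L, m, s, u, x stands for one flag (e.g. (?i) is the same as re.I; see the flags table below). It should be placed at the very start of the pattern
(?:...)	Non-capturing group; it is skipped when group indexes are assigned
(?P<name>...)	Named group; its content can be retrieved either by index or by name
(?P=name)	Named backreference: matches the same text that the earlier group called name captured, within the same expression
(?#...)	Comment; its content is ignored and does not affect the rest of the expression
(?=...)	Positive lookahead: succeeds if the text to the right of the current position matches the pattern inside
(?!...)	Negative lookahead: succeeds if the text to the right of the current position does not match the pattern inside
(?<=...)	Positive lookbehind: succeeds if the text to the left of the current position matches the pattern inside (must be fixed width)
(?<!...)	Negative lookbehind: succeeds if the text to the left of the current position does not match the pattern inside (must be fixed width)
(?(id/name)yes|no)	Conditional: if the group with the given id or name has matched, match the yes pattern; otherwise match the (optional) no pattern
\number	Numbered backreference: matches the same text that group number number captured earlier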
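This sketch exercises the grouping symbols above on invented sample strings. It covers numbered, named and non-capturing groups, both kinds of backreference, the four lookarounds and a conditional group:

```python
import re

date = "2021.10.15"

# Capturing groups are numbered from 1 in the order of their "("
m = re.match(r"(\d{4})\.(\d{2})\.(\d{2})", date)
print(m.group(0), m.group(1), m.groups())  # 2021.10.15 2021 ('2021', '10', '15')

# Named groups can be read by name or by index
m = re.match(r"(?P<year>\d{4})\.(?P<month>\d{2})", date)
print(m.group("year"), m.group(2), m.groupdict())  # 2021 10 {'year': '2021', 'month': '10'}

# (?:...) groups without capturing, so it takes no index
m = re.match(r"(?:\d{4})\.(\d{2})", date)
print(m.groups())  # ('10',)

# \1 and (?P=name) match the same text an earlier group captured
print(re.findall(r"\b(\w+) \1\b", "it is is a test test"))  # ['is', 'test']
print(re.search(r"(?P<w>\w+) (?P=w)", "go go now").group())  # go go

# Lookarounds check the surrounding text without consuming it
print(re.findall(r"\w+(?=@)", "mail: bob@example.com"))       # ['bob']
print(re.findall(r"\b\d+\b(?!%)", "50% of 200 items"))        # ['200']
print(re.findall(r"(?<=USD)\d+", "USD100 EUR200 USD300"))     # ['100', '300']
print(re.findall(r"(?<!\$)\b\d+\b", "$5 and 7 and $9"))       # ['7']

# Conditional: require ">" only if the optional "<" group matched
tag = re.compile(r"(<)?\w+(?(1)>)")
print([bool(tag.fullmatch(t)) for t in ("<user>", "user", "<user")])  # [True, True, False]
```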

2. The re module API

The module-level functions are:

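Function	Purpose
compile	Compiles a pattern string into a reusable Pattern object
template	Undocumented, experimental function for compiling template patterns; deprecated in Python 3.11 and removed in 3.13, so it is not worth using

match	Tries to match only at the beginning of the string; if that fails, it gives up and returns None
fullmatch	Succeeds only if the entire string, from start to end, matches the pattern
search	Scans every position in the string until it finds the first match; returns None if the whole string has been searched without success
findall	Returns all non-overlapping matches as a list (if the pattern has groups, the list holds the group contents instead)
finditer	Returns all non-overlapping matches as an iterator of Match objects

sub	Replaces every match in the string with the given replacement (count limits the number of replacements)
subn	Same as sub, but returns a tuple (new_string, number_of_substitutions)
split	Splits the string at every match; maxsplit limits the number of splits

purge	Clears the regular expression cache. Patterns that are used repeatedly are better compiled once with compile and reused
escape	Escapes special characters so that a string can be matched literally (Python 3.7+ escapes only regex metacharacters; earlier versions escaped every character other than ASCII letters, digits and '_')
error	Exception raised when a pattern is invalid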
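The following sketch makes the differences between the functions concrete. It reuses the sentence s from the code example in section III together with a few invented strings. The date pattern and the doubling function passed to sub are illustrative choices, not part of the re API. The escape output assumes Python 3.7 or later:

```python
import re

s = "The launch of shenzhou 13 manned spacecraft was a complete success! 12 by 2021.10.15"

# match only tries position 0; search scans until it finds a match
print(re.match(r"\d+", s))           # None
print(re.search(r"\d+", s).group())  # 13

# fullmatch needs the pattern to cover the whole string
print(re.fullmatch(r"\d+", "2021") is not None)     # True
print(re.fullmatch(r"\d+", "2021.10") is not None)  # False

# With groups, findall returns the group contents instead of whole matches
print(re.findall(r"(\d+)\.(\d+)", "2021.10.15"))  # [('2021', '10')]

# A compiled pattern can be reused; \1, \2, \3 in the replacement refer to its groups
date_re = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})")
print(date_re.search(s).groups())   # ('2021', '10', '15')
print(date_re.sub(r"\1-\2-\3", s))  # ...complete success! 12 by 2021-10-15

# sub also accepts a function that receives each Match object
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears"))  # 6 apples, 20 pears

# escape makes a string safe to use as a literal pattern
print(re.escape("1+1=2?"))  # 1\+1=2\?

# Invalid patterns raise re.error
try:
    re.compile(r"(unclosed")
except re.error as exc:
    print("invalid pattern:", exc)
```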

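Most of the functions above accept an optional flags argument that changes the matching mode. The seven flags are:

Short name	Full name	Effect
I	IGNORECASE	Case-insensitive matching
L	LOCALE	Makes \w, \W, \b, \B and case-insensitive matching depend on the current locale (in Python 3 only allowed with bytes patterns)
M	MULTILINE	Multiline mode: ^ and $ also match at the start and end of every line
S	DOTALL	Makes . match any character, including a newline (by default . does not match a newline)
X	VERBOSE	Verbose mode: whitespace in the pattern is ignored and # starts a comment
U	UNICODE	Unicode matching; kept only for compatibility, since Python 3 str patterns already use Unicode rules
A	ASCII	Makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only (Python 3 str patterns use Unicode rules by default, whereas Python 2 used ASCII)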
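The sketch below runs the commonly used flags on invented sample text. It passes each flag as the flags argument, combines two of them with |, and shows the equivalent inline (?i) form. The ASCII example assumes the source file is saved as UTF-8:

```python
import re

text = "Python\npython\nPYTHON"

# IGNORECASE, passed as an argument or inline as (?i)
print(re.findall(r"python", text, re.IGNORECASE))  # ['Python', 'python', 'PYTHON']
print(re.findall(r"(?i)python", text))             # ['Python', 'python', 'PYTHON']

# Flags combine with |: case-insensitive plus multiline ^
print(re.findall(r"^p\w+", text, re.I | re.M))     # ['Python', 'python', 'PYTHON']

# DOTALL lets "." cross the newline
print(re.findall(r"Python.python", text))          # []
print(re.findall(r"Python.python", text, re.S))    # ['Python\npython']

# VERBOSE ignores whitespace and allows # comments inside the pattern
verbose = re.compile(r"""
    (\d{4})   # year
    \.        # literal dot
    (\d{2})   # month
""", re.VERBOSE)
print(verbose.search("by 2021.10.15").groups())    # ('2021', '10')

# ASCII restricts \w to [a-zA-Z0-9_]
print(re.findall(r"\w+", "café", re.ASCII))        # ['caf']
print(re.findall(r"\w+", "café"))                  # ['café']
```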

III. re code examples

# -*- coding:utf-8 -*-
import re

# Walkthrough of the re module API
s = 'The launch of shenzhou 13 manned spacecraft was a complete success! 12 by 2021.10.15'
s1 = '''
<div class="contson" id="contson16e9c75bf6d1"><span style="color:#B00815;">男儿何不带吴钩，收取关山五十州。</span><br>请君暂上凌烟阁，若个书生万户侯？
</div>
'''
s2 = '''2021.10.16, Apple Apach
phone: 0041-123-158-7710(intel), 158-7777-2455(china)
email: caontcat-123@cnte.com,caontcat-123@163.com
'''

# Raw strings (r'...') keep backslashes such as \w and \d intact
regex = re.compile(r'\w')  # compile: build a reusable Pattern object
re_match = re.match(regex, s)  # try once at the start of the string; returns None on failure
print(type(re_match))  # <class 're.Match'> (<class '_sre.SRE_Match'> before Python 3.7, a C type)
print(re_match)  # <re.Match object; span=(0, 1), match='T'>: match position and matched text
print(re_match.group(0))  # T: group() returns the matched text
print(re_match.span())  # (0, 1): start and end position of the match

regex = re.compile('.*')
re_fullmatch = re.fullmatch(regex, s)  # succeeds only if the whole string matches, start to end
print(re_fullmatch.span())  # (0, 84)
print(re_fullmatch.group(0))  # the whole string

regex = re.compile(r'\d')
re_search = re.search(regex, s)  # scan the whole string and return the first match
print(re_search)  # <re.Match object; span=(23, 24), match='1'>
print(re_search.group(0))  # 1

regex = re.compile(r'\d+')
re_findall = re.findall(regex, s)  # return all matches as a list
print(re_findall)  # ['13', '12', '2021', '10', '15'] (\d+ is greedy)

re_finditer = re.finditer(regex, s)  # return an iterator of Match objects
print(type(re_finditer))  # <class 'callable_iterator'>
for x in re_finditer:
    print(x)  # first: <re.Match object; span=(23, 25), match='13'>
    print(x.group())  # 13, 12, 2021, 10, 15

re_sub = re.sub(regex, 'xxxx', s)  # replace every match with the given string
print(re_sub)  # The launch of shenzhou xxxx manned spacecraft was a complete success! xxxx by xxxx.xxxx.xxxx

re_subn = re.subn(regex, 'XXX', s)  # like sub, but returns (new_string, number_of_substitutions)
print(re_subn)  # ('The launch of shenzhou XXX manned spacecraft was a complete success! XXX by XXX.XXX.XXX', 5)
re_subn = re.subn(regex, 'XXX', s, count=2)  # replace at most 2 matches
print(re_subn)  # ('The launch of shenzhou XXX manned spacecraft was a complete success! XXX by 2021.10.15', 2)

re_split = re.split(regex, s)  # split the string at every match
print(re_split)  # ['The launch of shenzhou ', ' manned spacecraft was a complete success! ', ' by ', '.', '.', '']
re_split = re.split(regex, s, maxsplit=2)  # split at most 2 times
print(re_split)  # ['The launch of shenzhou ', ' manned spacecraft was a complete success! ', ' by 2021.10.15']

re.purge()  # clear the regex cache; prefer compiling patterns you reuse
# re.template() was an undocumented, experimental API (removed in Python 3.13), so it is skipped here
print(re.escape('2021.10.15'))  # 2021\.10\.15: escapes characters that are special in a regex
e = re.error  # exception class raised for invalid patterns
print(e)
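The strings s1 and s2 are defined in the example above but never used there. The following sketch shows one way they could be used: it strips the HTML tags from s1 and extracts the date, phone numbers and e-mail addresses from s2. The phone and e-mail patterns are simplified assumptions tailored to this sample text. In particular, the e-mail pattern is far from a complete RFC 5322 validator:

```python
import re

s1 = '''
<div class="contson" id="contson16e9c75bf6d1"><span style="color:#B00815;">男儿何不带吴钩，收取关山五十州。</span><br>请君暂上凌烟阁，若个书生万户侯？
</div>
'''
s2 = '''2021.10.16, Apple Apach
phone: 0041-123-158-7710(intel), 158-7777-2455(china)
email: caontcat-123@cnte.com,caontcat-123@163.com
'''

# Remove HTML tags with a non-greedy pattern, then trim surrounding whitespace
poem = re.sub(r"<.*?>", "", s1).strip()
print(poem)  # 男儿何不带吴钩，收取关山五十州。请君暂上凌烟阁，若个书生万户侯？

# Date at the very start of s2 (match only looks at position 0)
m = re.match(r"(\d{4})\.(\d{1,2})\.(\d{1,2})", s2)
print(m.groups())  # ('2021', '10', '16')

# Phone numbers: digits and "-" followed by a label in parentheses
phones = re.findall(r"([\d-]+)\((\w+)\)", s2)
print(phones)  # [('0041-123-158-7710', 'intel'), ('158-7777-2455', 'china')]

# Simplified e-mail pattern; (?:...) keeps findall returning whole matches
emails = re.findall(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+", s2)
print(emails)  # ['caontcat-123@cnte.com', 'caontcat-123@163.com']
```

Because the pattern in findall uses two capturing groups, each phone entry comes back as a (number, label) tuple. The e-mail pattern deliberately uses a non-capturing group, so the full addresses are returned.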

