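Cracking Slider CAPTCHAs with Python (Geetest / No Background Image)

When using Python to get past human-verification checks, the CAPTCHA is the first major hurdle. This article analyzes how to solve slider CAPTCHAs. When the slider piece and the background image can be fetched directly, the matchTemplate function from the cv2 module can compute the gap position accurately. On some sites, however, the slider piece and the background image are encrypted and hidden, so they cannot be fetched directly. This article focuses on that kind of slider CAPTCHA. For security reasons, the complete code is not shown; only excerpts are included to share the approach.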

About matchTemplate: it searches a whole image for the small region that best matches a given sub-image (the template).
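To make the idea concrete, here is a minimal pure-Python sketch of template matching by zero-mean normalized cross-correlation, the score that OpenCV documents for the cv2.TM_CCOEFF_NORMED mode. It is an illustrative reimplementation on a tiny grid, not OpenCV's code; the names `zero_mean` and `match_scores` and the sample data are assumptions made up for this example.

```python
from math import sqrt


def zero_mean(patch):
    """Flatten a 2D patch and subtract its mean from every value."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]


def match_scores(image, template):
    """Return {(x, y): score} for every placement of template inside image."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t = zero_mean(template)
    t_norm = sqrt(sum(v * v for v in t))
    scores = {}
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = [row[x:x + tw] for row in image[y:y + th]]
            w = zero_mean(window)
            w_norm = sqrt(sum(v * v for v in w))
            denom = t_norm * w_norm
            # A completely flat window has nothing to correlate with: score 0.
            scores[(x, y)] = sum(a * b for a, b in zip(t, w)) / denom if denom else 0.0
    return scores


# Hypothetical 4x6 "image" containing a distinctive 2x2 patch at x=3, y=1.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 9, 2, 1],
    [1, 1, 1, 3, 8, 1],
    [1, 1, 1, 1, 1, 1],
]
template = [[9, 2], [3, 8]]

scores = match_scores(image, template)
best_xy = max(scores, key=scores.get)
print(best_xy)  # (3, 1): the top-left corner of the best match
```

The score ranges from -1 to 1. A value near 1 means the window has the same light/dark pattern as the template, and a value near -1 means the pattern is inverted.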

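Before we start, here is the result.

In the figure below, the detected gap is outlined with a red box. Automatic detection accuracy reached 99.5%.

So how does Python filter out the distractor elements and detect the gap?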

1. Crop out the full slider CAPTCHA image

def save_element_image(self):
    """
    Screenshot the page, crop out the target element, and return the saved path.

    self.page_image_file: path where the full-page screenshot is saved
    :return: self.element_image_file (path where the element screenshot is saved)
    """
    # Screenshot the whole page and save it
    self.browser.save_screenshot(self.page_image_file)
    # Element geometry (x / y / width / height)
    left = self.element.location['x']
    top = self.element.location['y']
    right = left + self.element.size['width']
    bottom = top + self.element.size['height']
    picture = Image.open(self.page_image_file)
    # Crop the element out of the page screenshot
    picture = picture.crop((left, top, right, bottom))
    # Save the element image
    picture.save(self.element_image_file)
    # Return the path of the cropped element image
    return self.element_image_file
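The crop box is just the element's top-left corner plus its size. The sketch below isolates that arithmetic in a hypothetical helper, `element_crop_box`, which is not part of the original code. Its `scale` parameter is also an assumption: on some high-DPI displays a screenshot may contain more than one pixel per CSS pixel, in which case the coordinates would need scaling. With `scale=1.0` it reproduces the box used above.

```python
def element_crop_box(location, size, scale=1.0):
    """Convert an element's location/size dicts into (left, top, right, bottom).

    `scale` is a hypothetical screenshot-pixels-per-CSS-pixel factor;
    1.0 matches the original code, which applies no scaling.
    """
    left = location['x'] * scale
    top = location['y'] * scale
    right = (location['x'] + size['width']) * scale
    bottom = (location['y'] + size['height']) * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Made-up element position; 260x160 is the CAPTCHA size measured in this article.
box = element_crop_box({'x': 100, 'y': 40}, {'width': 260, 'height': 160})
print(box)  # (100, 40, 360, 200)
```

The resulting tuple has the same (left, top, right, bottom) order that `Image.crop` expects.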

2. The saved slider CAPTCHA image looks like this

3. Determine the width and height of the slider piece and the full image

The dimensions of the slider piece and the full image can be measured from the screenshot saved above.
No matter how the CAPTCHA refreshes, the slider's starting x coordinate, width, and height stay fixed.

class SlideVerificationCode:
    def __init__(self, browser, element):
        # browser: the selenium browser (WebDriver)
        self.browser = browser
        # element = browser.find_element(By.XPATH, '/html/body/div[4]/div[2]')
        self.element = element
        # Measured: full image width
        self.big_image_width = 260
        # Measured: full image height
        self.big_image_height = 160
        # Measured: slider/gap width
        self.small_image_width = 50
        # Measured: slider/gap height
        self.small_image_height = 48
        # Measured: the slider's starting x coordinate is fixed (6 px in this capture)
        self.small_image_x = 6
        # The slider/gap y coordinate is unknown for now
        self.small_image_y = 0
        # Where images are saved
        self.image_save_path = "image"
        self.failed_path = os.path.join(self.image_save_path, "detection_failed")
        # Whether the image display helpers are enabled (off by default)
        self.switch_show_image = False
        if not os.path.exists(self.image_save_path):
            os.makedirs(self.image_save_path)
        if not os.path.exists(self.failed_path):
            os.makedirs(self.failed_path)
        # Use the current time in file names
        time_stamp = datetime.datetime.now()
        time_for_filename = time_stamp.strftime('%y-%m-%d_%H%M%S')
        # File names for the cropped blue region (big) and pink region (small)
        self.big_image_file = os.path.join(self.image_save_path, '1_big_image_cut.png')
        self.small_image_file = os.path.join(self.image_save_path, '1_small_image_cut.png')
        # File names for the page screenshot and the element screenshot
        self.page_image_basename = 'page_image' + time_for_filename + '.png'
        self.page_image_file = os.path.join(self.image_save_path, self.page_image_basename)
        self.element_image_basename = 'element_image' + time_for_filename + '.png'
        self.element_image_file = os.path.join(self.image_save_path, self.element_image_basename)

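4. Getting the slider's x and y coordinates

If the slider can be located precisely, template matching (matchTemplate) can then compute the best-matching gap position.

So, with distractor elements present, how do we determine where the slider piece is?

5. The idea

Of the slider's four values (x coordinate, y coordinate, width, and height), only the y coordinate is unknown. Once y is computed, the slider's position is fully determined (the pink region in the figure below).
Once the slider's y coordinate is known, the gap's y coordinate is known too, because the two share the same y. The gap's x coordinate is still unknown, so we mark out a horizontal strip covering everywhere the gap could appear (the blue region in the figure below).
With these two regions in hand, we run template matching (matchTemplate) to find the best match position and read off the gap's x coordinate, which gives the distance the slider has to travel to reach the gap.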
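The two regions can be sketched as plain rectangle arithmetic. The helper below, `crop_regions`, is a hypothetical simplification and not the article's function: it takes y as given and uses the measurements from this article (slider x = 6, 50x48 slider, 260 px wide image) plus a 15 px right margin as default arguments.

```python
def crop_regions(y, small_x=6, small_w=50, small_h=48, big_w=260, margin=15):
    """Return (slider_box, search_box), each as (x0, y0, x1, y1).

    slider_box: where the slider piece sits (the "pink" region).
    search_box: the horizontal strip where the gap may appear (the "blue" region).
    """
    slider_box = (small_x, y, small_x + small_w, y + small_h)
    # The strip starts right after the slider and stops short of the right edge.
    search_box = (small_x + small_w, y, big_w - margin, y + small_h)
    return slider_box, search_box


slider_box, search_box = crop_regions(y=60)
print(slider_box)  # (6, 60, 56, 108)
print(search_box)  # (56, 60, 245, 108)
```

Both boxes share the same y range, which is why a single y value is enough to define them.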

6. So how do we get the y coordinate?

Before that, add helper functions for displaying images (for inspection only; they can be removed in production use).

# Image display helper 1
def cv_show_image1(self, img, show_title):
    if self.switch_show_image:
        cv2.imshow(show_title, img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

# Image display helper 2
def cv_show_image2(self, img1, show_title1, img2, show_title2):
    if self.switch_show_image:
        cv2.imshow(show_title1, img1)
        cv2.imshow(show_title2, img2)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

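7. Edge detection + contour detection function

This function outlines every contour found in the image. Because distractor elements are present, the result contains distractor contours alongside the slider and gap contours.

def canny_rect(self, image, show_image, file_string, time_now, scope_num=10, count=0):
    """
    Edge detection + contour detection.

    :param image: image to run detection on
    :param show_image: image used for drawing the results
    :param file_string: string used in the file name
    :param time_now: timestamp used in the file name
    :param scope_num: size tolerance for accepting a contour
    :param count: recursion counter
    :return: y_list_tmp (y values are relative to the image passed in)
    """
    y_list_tmp = []
    # Edge detection (20 and 80 are the two thresholds)
    canny_img = cv2.Canny(image, 20, 80)
    # Contour detection (OpenCV 4.x returns (contours, hierarchy))
    contours, _ = cv2.findContours(canny_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        # x/y is the top-left corner of the bounding rectangle, w/h its width and height
        x, y, w, h = cv2.boundingRect(c)
        # Discard rectangles whose width or height is too far from the known slider size
        if w >= self.small_image_width + scope_num or w <= self.small_image_width - scope_num:
            continue
        if h >= self.small_image_height + scope_num or h <= self.small_image_height - scope_num:
            continue
        # Record every matching y value
        y_list_tmp.append(y)
        # For display: outline the accepted rectangle in black
        cv2.rectangle(show_image, (x, y), (x + w, y + h), (0, 0, 0), 1)
    # count > 0 means an earlier pass found nothing; save this pass for later review
    if count != 0:
        f_basename = self.element_image_basename.split('.')[0] + '_' + file_string
        check_ele_image = os.path.join(self.failed_path, f_basename + '.png')
        self.cv_show_image1(show_image, f'canny_{file_string}_{count}')
        cv2.imwrite(check_ele_image, show_image)
    if y_list_tmp:
        return y_list_tmp
    # Stop condition to prevent infinite recursion: the lower size bound has reached 0.
    # (The original tested self.small_image_width <= 0, which never becomes true.)
    if self.small_image_width - scope_num <= 0:
        print('    *No contour found; returning the middle of the full image')
        # The value is used as a y coordinate, so use the height (the original used the width)
        return [self.big_image_height // 2]
    # Nothing matched on this pass: widen the tolerance and try again
    scope_num += 2
    count += 1
    print(f"    *{count}. No contour detected! Widening tolerance to {scope_num}")
    return self.canny_rect(image, show_image, file_string, time_now, scope_num, count)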
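The size filter at the heart of this step can be tried without OpenCV. The sketch below is an iterative variant of the same idea (keep rectangles whose width and height are within a tolerance of the known slider size, widening the tolerance if nothing matches). The function name `filter_by_size` and the sample rectangles are made up for illustration.

```python
def filter_by_size(rects, target_w, target_h, tolerance=10, step=2):
    """Return (ys, tolerance_used) for rects close to the target size.

    rects is a list of (x, y, w, h) bounding boxes. The tolerance is widened
    by `step` until something matches or it reaches the smaller target
    dimension, which bounds the loop.
    """
    while tolerance < min(target_w, target_h):
        ys = [y for (_x, y, w, h) in rects
              if abs(w - target_w) < tolerance and abs(h - target_h) < tolerance]
        if ys:
            return ys, tolerance
        tolerance += step
    return [], tolerance


# Hypothetical bounding boxes: two slider-sized, two distractors.
rects = [(5, 62, 47, 46), (120, 61, 49, 47), (200, 10, 20, 90), (30, 130, 64, 50)]
print(filter_by_size(rects, 50, 48))             # ([62, 61], 10)
print(filter_by_size([(0, 70, 63, 49)], 50, 48))  # ([70], 14)
```

A loop with an explicit upper bound avoids relying on recursion for the retry logic, at the cost of not saving a debug image on each failed pass.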

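8. Contour drawing result

P1 in the figure below is the strip where the slider may appear (obtained by cropping). The detected contours inside it are outlined (black boxes), and all their y values are recorded.
P2 in the figure below is the full image, with the detected contours outlined (black boxes) and all their y values recorded. These include the slider box, the gap box, and the distractor boxes.
The y values from all boxes are merged into the list y_list, and the median of that list is taken as the y value we need.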
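A small worked example of the median step, using the standard library's `statistics.median`. The numbers are invented: one y value from the slider strip, and four from the full image, of which two are distractors above and below the slider row.

```python
from statistics import median

# Hypothetical detections (y coordinates of accepted bounding boxes).
y_list_slide = [57]              # from the cropped slider strip
y_list_big = [12, 62, 62, 118]   # from the full image: distractor, slider, gap, distractor

y_list = sorted(y_list_slide + y_list_big)
median_y = int(median(y_list))

print(y_list)    # [12, 57, 62, 62, 118]
print(median_y)  # 62
```

The median only lands on the slider row when the distractor values fall on both sides of it, or are outnumbered by the slider and gap values. If every distractor sat on the same side, the result could drift, so it is worth checking this assumption against real captures.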

9. Drawing the pink region and the blue region

Once the y coordinate is known, the pink region and the blue region can be cropped out.

def cropped_image(self, img_path):
    """Analyze the image: crop out the slider piece and the strip to search for the gap."""
    # Read the screenshot of the whole slider CAPTCHA
    img_imread = cv2.imread(img_path)
    # Full image: used for the final display
    img_draw_last = img_imread.copy()
    # Full image: used for cropping and saving the search strip
    img_copy_for_save1 = img_imread.copy()
    # Full image: used for cropping and saving the slider piece
    img_copy_for_save2 = img_imread.copy()
    # Full image: grayscale
    img_gray = cv2.cvtColor(img_imread, cv2.COLOR_BGR2GRAY)
    # Full image: Gaussian blur
    img_blur = cv2.GaussianBlur(img_gray, (5, 5), 0)
    # Crop the strip where the slider may appear: 5 px trimmed from top and bottom,
    # columns from small_image_x - 5 to small_image_x + small_image_width + 8
    img_small_blur = img_blur[5:self.big_image_height - 5,
                              self.small_image_x - 5:self.small_image_width + self.small_image_x + 8]
    # Detect contours in the slider strip and collect their y values
    y_list_slide = self.canny_rect(img_small_blur, img_draw_last, 'small', time.time())
    # Detect contours in the full image and collect their y values
    y_list_big = self.canny_rect(img_blur, img_draw_last, 'big', time.time())
    y_list = y_list_slide + y_list_big
    # Even with distractor rectangles in y_list, it contains at least one y from the
    # slider rectangle and at least one from the gap rectangle. The slider and the gap
    # share the same y, and distractors always appear above or below the gap, so the
    # median of all y values is the y of both the gap and the slider.
    y_list.sort()
    median_y = int(np.median(y_list))
    self.small_image_y = median_y
    print(f'Distractor y values sit at both ends: {y_list}, so the median is the correct value: {median_y}')
    # Search strip (blue box):
    #   x0 = self.small_image_x + self.small_image_width (slider start + known width)
    #   x1 = self.big_image_width - 15 (known full width minus a 15 px margin)
    #   y0 = median_y
    #   y1 = median_y + self.small_image_height (median + known slider height)
    big_x0, big_x1 = self.small_image_x + self.small_image_width, self.big_image_width - 15
    big_y0, big_y1 = median_y, median_y + self.small_image_height
    # NumPy slicing order is [y0:y1, x0:x1]
    gap_possible_areas = img_copy_for_save1[big_y0:big_y1, big_x0:big_x1]
    # Save the search strip (where the gap may appear)
    cv2.imwrite(self.big_image_file, gap_possible_areas)
    # Slider piece (pink box):
    #   x0 = self.small_image_x (known fixed start position)
    #   x1 = self.small_image_x + self.small_image_width
    #   y0 = median_y
    #   y1 = median_y + self.small_image_height - 2 (2 px trimmed from the bottom)
    small_x0, small_x1 = self.small_image_x, self.small_image_x + self.small_image_width
    small_y0, small_y1 = median_y, median_y + self.small_image_height - 2
    slider_possible_areas = img_copy_for_save2[small_y0:small_y1, small_x0:small_x1]
    # Save the slider piece
    cv2.imwrite(self.small_image_file, slider_possible_areas)
    # Final display. cv2.rectangle takes the top-left and bottom-right corners; colors are BGR.
    # Pink region
    cv2.rectangle(img_draw_last, (small_x0, small_y0), (small_x1, small_y1), (255, 155, 255), 2)
    # Blue region
    cv2.rectangle(img_draw_last, (big_x0, big_y0), (big_x1, big_y1), (220, 20, 60), 2)
    self.cv_show_image1(img_draw_last, 'img_draw_last')

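10. Region drawing result

Pink region: the exact position of the slider in the figure below, cropped and saved as self.small_image_file.
Blue region: the area where the gap may appear in the figure below, cropped and saved as self.big_image_file.
With both regions saved, they are passed to template matching (matchTemplate).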

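11. Using matchTemplate to get the x coordinate

Template matching is run on the saved pink and blue regions to find the best match position, which is the gap position. That gives the gap's x coordinate.

def match_template(self):
    """
    Template matching (cv2.matchTemplate): search a whole image for the region
    that matches a given sub-image.

    self.big_image_file: the search strip
    self.small_image_file: the slider piece, used as the template
    :return: the distance the slider has to move
    """
    # Load the search strip (cv2.imread returns BGR, not RGB)
    big_image_bgr = cv2.imread(self.big_image_file)
    # Search strip: grayscale
    big_image_gray = cv2.cvtColor(big_image_bgr, cv2.COLOR_BGR2GRAY)
    # Load the slider piece
    small_image_bgr = cv2.imread(self.small_image_file)
    # Slider piece: load directly as grayscale
    small_image_gray = cv2.imread(self.small_image_file, 0)
    # Slide the template across the strip and score every position
    res = cv2.matchTemplate(big_image_gray, small_image_gray, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    # The original reads index 2 of the minMaxLoc result, which is min_loc.
    # For TM_CCOEFF_NORMED the strongest positive match is max_loc, so check
    # on your own captures which of the two actually lands on the gap.
    match_template_x = min_loc[0]
    print("Matched x coordinate:", match_template_x)
    # Outline the match for display: top-left (x, 0), bottom-right (x + width, height)
    cv2.rectangle(big_image_bgr,
                  (match_template_x, 0),
                  (match_template_x + self.small_image_width, self.small_image_height),
                  (0, 0, 205), 3)
    self.cv_show_image2(small_image_bgr, "small_image", big_image_bgr, "big_image")
    # The strip was cropped starting self.small_image_width to the right of the slider,
    # so add that width back to get the distance the slider has to move
    return match_template_x + self.small_image_width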
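The offset bookkeeping in the return value is easy to get wrong, so here is the arithmetic on its own. The search strip starts at small_image_x + small_image_width in the full image, so a match at column match_x inside the strip corresponds to a gap at small_image_x + small_image_width + match_x in the full image. The function names and the sample match_x below are hypothetical; the defaults are the measurements used in this article.

```python
def gap_x_in_full_image(match_x, small_x=6, small_w=50):
    """Column of the gap's left edge in the full CAPTCHA image."""
    return small_x + small_w + match_x


def drag_distance(match_x, small_w=50):
    """Distance from the slider's left edge to the gap's left edge."""
    return match_x + small_w


match_x = 97  # made-up result of template matching inside the strip
print(gap_x_in_full_image(match_x))  # 153
print(drag_distance(match_x))        # 147
```

The two values differ by exactly the slider's starting x coordinate, which is why the verification function in the supplement adds self.small_image_x to the return value of match_template before drawing.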

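12. Finally

The return value of match_template above is the distance the slider needs to move to reach the gap. With that value, Python can drive the slider and release it precisely at the gap (a human-like movement trajectory has to be simulated), which solves the slider CAPTCHA and completes the first step in getting past human verification.

13. Supplement

Below are the imported modules and the function that draws the final verification box. This article stops here; the follow-up steps are covered in other articles. Thanks for reading.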

# Imported modules
import datetime
import os
import random
import time
from time import sleep
from PIL import Image
import cv2
from selenium.webdriver import ActionChains
import numpy as np

# Verification drawing function
def draw_the_gap(self, image_path):
    """
    For verification only: load the original full image, get the gap's x value,
    and draw a red box around the gap.

    :param image_path: path of the full image
    """
    # x to draw at: matched distance + the slider's starting x coordinate
    draw_x = self.match_template() + self.small_image_x
    ele_image = cv2.imread(image_path)
    # Draw a red box for verification (colors are BGR)
    cv2.rectangle(ele_image,
                  (draw_x, self.small_image_y),
                  (draw_x + self.small_image_width, self.small_image_y + self.small_image_height),
                  (0, 0, 205), 3)
    # Display
    self.cv_show_image1(ele_image, 'check')
    # Build the file name and save
    image_path_base = os.path.basename(image_path)
    check_ele_image = os.path.join(self.image_save_path, 'check_' + image_path_base)
    cv2.imwrite(check_ele_image, ele_image)