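When building a text detection algorithm for natural scenes, the first step is to annotate the labels that mark where the text is. Annotating by hand from scratch is too tiring, so our group worked out a method, tried it, found it fairly efficient, and is sharing it here.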

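Start with a baseline algorithm and run the real samples through the baseline model. Each sample image produces a txt file containing the bounding box information. Next, use a script to convert these txt files into an XML (or JSON) format that the annotation tool can read, and import the XML files into the tool. When the tool opens, the bounding boxes appear on the image at their predicted positions, ready for editing. Finally, edit the boxes: leave the correct ones alone and adjust the wrong ones.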
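
The hand-off from the baseline model to the converter can be sketched as follows. This is a minimal sketch, assuming each detection is four corner points plus a text label, and that each line of the txt file has the form `x1,y1,x2,y2,x3,y3,x4,y4,label`, which is the layout the conversion script in this post parses. The helper name `write_boxes_txt`, the file name and the sample detections are hypothetical and are not part of our original code.

```python
import os
import tempfile


def write_boxes_txt(txt_path, detections):
    """Write one 'x1,y1,x2,y2,x3,y3,x4,y4,label' line per detection.

    `detections` is a list of (corners, label) pairs, where `corners`
    holds four (x, y) points. Hypothetical helper for illustration.
    """
    with open(txt_path, 'w', encoding='utf-8') as f:
        for corners, label in detections:
            # Round to whole pixels so every coordinate is written as an integer.
            coords = [str(int(round(v))) for point in corners for v in point]
            f.write(','.join(coords) + ',' + label + '\n')


# Made-up detections standing in for the baseline model's output.
detections = [
    ([(10, 20), (110, 20), (110, 50), (10, 50)], 'text'),
    ([(30.4, 80.6), (90, 95), (85, 115), (25, 100)], '#'),
]

txt_path = os.path.join(tempfile.mkdtemp(), 'img_001.txt')
write_boxes_txt(txt_path, detections)

with open(txt_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```

Giving the txt file the same base name as its image (here `img_001.txt` for `img_001.jpg`) lets a later step pair the two by name.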

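For the annotation tool, we tried both 标注精灵助手 and IphotoDraw. We found IphotoDraw more powerful and more convenient, so we chose it. The script below converts the txt files of bounding box corner points produced by the baseline model into XML files that IphotoDraw can read: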

The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Fri Feb  2 09:57:25 2018

@author: new
"""
import math
import os
from xml.sax.saxutils import escape

import cv2
import numpy as np


# Generate an IphotoDraw XML file from all the quadrilateral corner points
# in a txt file, so that the boxes can be edited in the annotation tool.

# Outline colours, selected by num % 3 in writeXml.
LINE_COLORS = [
    '<Color Alpha="255" R="26" G="170" B="66" />\n',
    '<Color Alpha="255" R="34" G="72" B="234" />\n',
    '<Color Alpha="255" R="255" G="0" B="0" />\n',
]

# Static part of a shape that precedes the outline colour.
SHAPE_HEAD = (
    '<Shape Type="Rectangle">\n'
    '<Settings>\n'
    '<MiscSettings GroupRendering="Unknown" />\n'
    '<Font Name="Arial" Size="4" Style="Regular">\n'
    '<Color Alpha="255" R="0" G="0" B="0" />\n'
    '</Font>\n'
    '<Line Width="1" Dash="Solid" Join="Round" OutlineType="Color" '
    'DashOffset="False" StartRoundCap="False" EndRoundCap="False">\n'
)

# Static part of a shape between the outline colour and the label text.
SHAPE_TAIL = (
    '</Line>\n'
    '<Fill FillType="None">\n'
    '<Color Alpha="255" R="255" G="255" B="255" />\n'
    '<GradientSettings Type="Linear" Angle="0" HorizontalOffset="0" '
    'VerticalOffset="0" StartExtension="0" EndExtension="0" '
    'BoundaryResize="100">\n'
    '<StartingColor Alpha="255" R="0" G="0" B="0" />\n'
    '<EndingColor Alpha="255" R="255" G="255" B="255" />\n'
    '<Blend />\n'
    '</GradientSettings>\n'
    '<EmbeddedImage Align="Center" ImageFillType="Stretch" Alpha="255" FileName="">\n'
    '<StretchSettings Type="KeepOriginalSize" Align="Center" ZoomFactor="100">\n'
    '<Offset X="0" Y="0" />\n'
    '</StretchSettings>\n'
    '<TileSettings WrapMode="Tile">\n'
    '<Offset X="0" Y="0" />\n'
    '</TileSettings>\n'
    '<ImageOptions Rotation="0">\n'
    '<Flip HorizontalFlip="False" VerticalFlip="False" />\n'
    '</ImageOptions>\n'
    '<ImageData><![CDATA[]]></ImageData>\n'
    '</EmbeddedImage>\n'
    '</Fill>\n'
    '<TextEffect UseTextEffect="False" />\n'
    '<EffectSettings>\n'
    '<Shadow UseShadow="False" Angle="45" Offset="5" Size="100" BlurLevel="0">\n'
    '<Color Alpha="255" R="0" G="0" B="0" />\n'
    '</Shadow>\n'
    '<Glow UseGlow="False" BlurLevel="20" Thickness="8">\n'
    '<Color Alpha="255" R="29" G="199" B="244" />\n'
    '</Glow>\n'
    '<WavyLine UseWavyLine="False" WavePattern="CosineSmooth" Ridges="5" '
    'Height="20" VerticalFlip="False" OffsetAtStartPoint="0" OffsetAtEndPoint="0" />\n'
    '</EffectSettings>\n'
    '</Settings>\n'
    '<BlockText Align="Center" VerticalAlign="Middle" RightToLeft="Unknown">\n'
)


def endWith(s, *endstring):
    """Return True if s ends with any of the given suffixes."""
    return s.endswith(endstring)


def writeXml(xmlName, txtName, imageName):
    """Write every box listed in txtName to an IphotoDraw XML file."""
    with open(xmlName, 'w', encoding='utf-8') as xml_file, \
            open(txtName, encoding='utf-8') as fopen:
        xml_file.write('<!--Document-->\n')
        xml_file.write('<Document FileVersion="1.0">\n')
        xml_file.write('    <ExportImageSettings FileName="' + imageName + '"/>\n')
        xml_file.write('    <Layers>\n')
        xml_file.write('    <Layer Name="Layer1" Visible="True" LockedShapesIndex="">\n')
        xml_file.write('        <Shapes>\n')
        for line in fopen:
            p, w, h, angle, label = get_coords(line)
            # Lines that could not be parsed come back with w == h == 0.
            if w > 0 and h > 0:
                num = 3  # colour index; it is fixed, so every box is green
                xml_file.write(SHAPE_HEAD)
                xml_file.write(LINE_COLORS[num % 3])
                xml_file.write(SHAPE_TAIL)
                # Escape &, < and > so that a label cannot break the XML.
                xml_file.write('<Text>' + escape(label) + '</Text>\n')
                xml_file.write('<Margin Left="0" Top="0" Right="0" Bottom="0" />\n')
                xml_file.write('</BlockText>\n')
                xml_file.write('<Data IsRoundCorner="False" RoundCornerRadius="0" '
                               'Rotation="' + str(angle) + '">\n')
                xml_file.write('<Extent X="' + str(p[0]) + '" Y="' + str(p[1]) +
                               '" Width="' + str(w) + '" Height="' + str(h) + '"/>\n')
                xml_file.write('</Data>\n')
                xml_file.write('</Shape>\n')
        xml_file.write('</Shapes>\n')
        xml_file.write('</Layer>\n')
        xml_file.write(' </Layers>\n')
        xml_file.write('<Snapshots />\n')
        xml_file.write('</Document>')


def get_new_coord(center_coord, ori_coord, rotate_angle):
    """Rotate ori_coord about center_coord by -rotate_angle degrees."""
    theta = (rotate_angle / 180.) * math.pi
    dx = ori_coord[0] - center_coord[0]
    dy = ori_coord[1] - center_coord[1]
    x_new = dx * math.cos(theta) + dy * math.sin(theta) + center_coord[0]
    y_new = dy * math.cos(theta) - dx * math.sin(theta) + center_coord[1]
    return x_new, y_new


def get_rotation_coord(iphotodraw_result):
    """Convert [x, y, width, height, ..., angle] back to four rotated corners."""
    x, y = iphotodraw_result[0], iphotodraw_result[1]
    w, h = iphotodraw_result[2], iphotodraw_result[3]
    angle = iphotodraw_result[-1]
    center = (x + w / 2, y + h / 2)
    result = []
    for corner in ((x, y), (x + w, y), (x + w, y + h), (x, y + h)):
        result.extend(cal_coord(center, corner, angle))
    return result


def cal_coord(center_coord, ori_coord, angle):
    """Rotate ori_coord about center_coord by angle degrees."""
    angle = angle * math.pi / 180
    dx = ori_coord[0] - center_coord[0]
    dy = ori_coord[1] - center_coord[1]
    out_x = math.cos(angle) * dx - math.sin(angle) * dy + center_coord[0]
    out_y = math.sin(angle) * dx + math.cos(angle) * dy + center_coord[1]
    return [out_x, out_y]


def coord_to_iphotodrawFormat(bbox):
    """Convert 8 corner values (bbox shape 1*8) to (origin, width, height, angle)."""
    # atan2 replaces atan(dy / dx): it does not divide by zero on a vertical
    # edge and keeps the correct quadrant when the second point is to the left.
    angle = math.atan2(bbox[3] - bbox[1], bbox[2] - bbox[0]) * (180 / math.pi)
    width = math.sqrt((bbox[3] - bbox[1]) ** 2 + (bbox[2] - bbox[0]) ** 2)
    height = math.sqrt((bbox[5] - bbox[3]) ** 2 + (bbox[4] - bbox[2]) ** 2)
    center_coord = [1 / 2 * (bbox[0] + bbox[4]), 1 / 2 * (bbox[1] + bbox[5])]
    # Undo the rotation to get the first corner of the unrotated rectangle.
    ori_coord = cal_coord(center_coord, [bbox[0], bbox[1]], -angle)
    return (ori_coord[0], ori_coord[1]), width, height, angle


def get_coords(line):
    """
    Parse one line of the text detection result.
    :param line: one line of the CTPN result file, 'x1,y1,x2,y2,x3,y3,x4,y4,label'
    :return: origin point, width, height, angle, label
    """
    try:
        fields = line.strip().split(',')
        label = fields[-1]  # the last field is treated as the label
        # The original comment describes the first point as the bottom-right
        # corner, with the points listed in clockwise order.
        # float() also accepts coordinates written as e.g. '12.0'.
        bbox = [float(value) for value in fields[:8]]
        if len(bbox) < 8:
            raise ValueError('expected 8 coordinates')
        p, w, h, angle = coord_to_iphotodrawFormat(bbox)
        return p, w, h, angle, label
    except (ValueError, IndexError):
        return (0, 0), 0, 0, 0, '#'


def sortPoint(points, center):
    '''
    Sort the four corners of a quadrilateral.
    :param points: list of four (x, y) corner points
    :param center: (x, y) centre of the quadrilateral
    :return: top-left, bottom-left, bottom-right, top-right
    '''
    idx_list = np.where(np.array(points)[:, 0] > center[0])[0]
    right_point = sorted([points[idx] for idx in idx_list], key=lambda x: x[1])
    right_up = right_point[0]
    right_bottom = right_point[1]
    idx_list = np.where(np.array(points)[:, 0] < center[0])[0]
    left_point = sorted([points[idx] for idx in idx_list], key=lambda x: x[1])
    left_up = left_point[0]
    left_bottom = left_point[1]
    return left_up, left_bottom, right_bottom, right_up


if __name__ == '__main__':
    # The txt files and images are read from basepath; the generated XML
    # files and a copy of each image are written to out_path.
    basepath = r'xxx'
    out_path = r'xxx'
    if not os.path.exists(out_path):
        os.makedirs(out_path)

    for name in os.listdir(basepath):
        if endWith(name, '.txt'):
            print(name)
            xmlName = name.replace('.txt', '_data.xml')
            imageName = name.replace('.txt', '.jpg')
            if os.path.exists(os.path.join(basepath, imageName)):
                xmlName = os.path.join(out_path, xmlName)
                txtName = os.path.join(basepath, name)
                image = cv2.imread(os.path.join(basepath, imageName))
                if image is None:
                    # cv2.imread returns None for files it cannot decode.
                    continue
                cv2.imwrite(os.path.join(out_path, imageName), image)
                writeXml(xmlName, txtName, imageName)
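
As a sanity check on the geometry, converting four corner points to an unrotated rectangle plus a rotation angle should be reversible. The following is a minimal, self-contained sketch of that round trip. The names `rotate`, `quad_to_rect` and `rect_to_quad` are illustrative; they mirror `cal_coord`, `coord_to_iphotodrawFormat` and `get_rotation_coord` from the script above, and the sample quadrilateral is made up. The sketch computes the angle with `math.atan2`, which avoids a division by zero when the first edge is vertical.

```python
import math


def rotate(center, point, angle_deg):
    """Rotate `point` about `center` by `angle_deg` degrees."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (math.cos(a) * dx - math.sin(a) * dy + center[0],
            math.sin(a) * dx + math.cos(a) * dy + center[1])


def quad_to_rect(bbox):
    """8 corner values -> ((x, y), width, height, angle in degrees)."""
    angle = math.degrees(math.atan2(bbox[3] - bbox[1], bbox[2] - bbox[0]))
    width = math.hypot(bbox[2] - bbox[0], bbox[3] - bbox[1])
    height = math.hypot(bbox[4] - bbox[2], bbox[5] - bbox[3])
    center = ((bbox[0] + bbox[4]) / 2, (bbox[1] + bbox[5]) / 2)
    # Undo the rotation to find the first corner of the unrotated rectangle.
    origin = rotate(center, (bbox[0], bbox[1]), -angle)
    return origin, width, height, angle


def rect_to_quad(origin, width, height, angle):
    """Unrotated rectangle plus angle -> 8 rotated corner values."""
    x, y = origin
    center = (x + width / 2, y + height / 2)
    corners = [(x, y), (x + width, y), (x + width, y + height), (x, y + height)]
    quad = []
    for corner in corners:
        quad.extend(rotate(center, corner, angle))
    return quad


# Made-up box: a 100 x 50 rectangle centred at (100, 100) and rotated.
quad = [75, 50, 155, 110, 125, 150, 45, 90]
origin, width, height, angle = quad_to_rect(quad)
restored = rect_to_quad(origin, width, height, angle)

print(origin, width, height, angle)
print([round(v, 6) for v in restored])
```

If the restored corners do not match the input within floating-point error, the corner order in the txt file probably differs from what the conversion expects.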
