
Computer Vision

Introduction

This article shows how object detection can help us predict the various regions of a document. It can be used to crop out headlines, paragraphs, tables, images, etc. from a document image, which can later be processed to extract whatever information is needed. We compare the performance of 3 different object detection architectures on this task, i.e., YOLOv3, Faster-RCNN, and SSD512, and use the Monk library to load these models.

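As an illustration of that downstream step, here is a minimal sketch (our own, not code from the original tutorial) of cropping each predicted region out of the page image for further processing; the boxes list and file names are hypothetical:

import cv2

# Hypothetical detections for one page: (x_min, y_min, x_max, y_max, label).
boxes = [(40, 30, 600, 90, "heading"), (40, 120, 600, 700, "paragraph")]

image = cv2.imread("document.jpg")
for i, (x1, y1, x2, y2, label) in enumerate(boxes):
    # Crop the region and save it, e.g. to feed 'table' crops to an OCR step later.
    region = image[int(y1):int(y2), int(x1):int(x2)]
    cv2.imwrite("{}_{}.jpg".format(label, i), region)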

Detailed tutorial on GitHub.


About the Dataset

The training dataset used for this task is the PRImA Layout Analysis Dataset. It includes a wide variety of document types, reflecting various challenges in layout analysis. Particular emphasis is placed on:


  • Magazine scans from a variety of mainstream news, business, and technology publications, which contain a mixture of simple and complex layouts (e.g. non-Manhattan, with varying font sizes, etc.)
  • Technical articles on a variety of disciplines, including papers in journals and conference proceedings, with both simple and complex layouts present.

The dataset contains 18 labels, namely ‘caption’, ‘chart’, ‘credit’, ‘drop-capital’, ‘floating’, ‘footer’, ‘frame’, ‘graphics’, ‘header’, ‘heading’, ‘image’, ‘linedrawing’, ‘maths’, ‘noise’, ‘page-number’, ‘paragraph’, ‘separator’, and ‘table’.


It can be downloaded from here.


Monk AI:

Monk Object Detection is a collection of object detection pipelines. The benefit is two-fold for each pipeline: the installation is made compatible with multiple OS, CUDA, and Python versions, and the code stays low-code with a standardized flow. Monk Object Detection enables a user to solve a computer vision problem in very few lines of code. For this task, we'll be using 3 different pipelines from this library for the 3 architectures: yolov3, gluoncv_finetune, and mxrcnn.


Table of Contents

  1. Installing Monk Object Detection Toolkit
  2. Using the Pre-trained Model for the Document Layout Analysis Task
  3. Training Your Own Model
     • Downloading and Pre-Processing Data (Format Conversion, Selective Data Augmentation)
     • Training the Model from Scratch
  4. Inference and Comparison

1. Installing Monk Object Detection Toolkit

First of all, clone the library to your system using the following command:


! git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

Then, choose the pipeline that you want to install and the correct requirements file for that pipeline, depending on your system's CUDA version or Colab version. These are the commands for the pipelines that I've used for this task:


# For yolov3 (used for the YOLOv3 architecture)
! cd Monk_Object_Detection/7_yolov3/installation && cat requirements.txt | xargs -n 1 -L 1 pip install

# For gluoncv_finetune (used for the SSD512 architecture)
! cd Monk_Object_Detection/1_gluoncv_finetune/installation && cat requirements_cuda10.1.txt | xargs -n 1 -L 1 pip install

# For mxrcnn (used for the Faster-RCNN architecture)
! cd Monk_Object_Detection/3_mxrcnn/installation && cat requirements_cuda10.1.txt | xargs -n 1 -L 1 pip install

For more pipelines or ways to install, visit the Monk Object Detection Library.


2. Using the Pre-trained Model for the Document Layout Analysis Task

If you don’t want to train the model on your own and just want to use the model that we’ve trained for this task, you can use the following code to load it directly:


For YOLOv3:

import os
import sys
from IPython.display import Image

sys.path.append("Monk_Object_Detection/7_yolov3/lib")
from infer_detector import Infer

gtf = Infer()

Download and initialize the pre-trained model:


! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1Si1puABMiijtvLvH-XMnr2pVj4K2lUkO' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1Si1puABMiijtvLvH-XMnr2pVj4K2lUkO" -O obj_dla_yolov3_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_yolov3_trained.zip
! mv dla_yolov3/yolov3.cfg .

f = open("dla_yolov3/classes.txt")
class_list = f.readlines()
f.close()

model_name = "yolov3"
weights = "dla_yolov3/dla_yolov3.pt"
gtf.Model(model_name, class_list, weights, use_gpu=True, input_size=416)

And you can test it:


# Change test1 to whatever image you want to test on.
img_path = "test1.jpg"
gtf.Predict(img_path, conf_thres=0.3, iou_thres=0.5)
Image(filename='output/test1.jpg')

For SSD512:

import os
import sys

sys.path.append("Monk_Object_Detection/1_gluoncv_finetune/lib/")
from inference_prototype import Infer

Download and initialize the pre-trained model:


! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1E6T7RKGwy-v1MUxVJm-rxt5XcRyr2SQ7' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1E6T7RKGwy-v1MUxVJm-rxt5XcRyr2SQ7" -O obj_dla_ssd512_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_ssd512_trained.zip

model_name = "ssd_512_vgg16_atrous_coco"
params_file = "dla_ssd512/dla_ssd512-vgg16.params"
class_list = ["paragraph", "heading", "credit", "footer", "drop-capital", "floating",
              "noise", "maths", "header", "caption", "image", "linedrawing",
              "graphics", "fname", "page-number", "chart", "separator", "table"]

gtf = Infer(model_name, params_file, class_list, use_gpu=True)

And you can test it:


# Change test1 to whatever image you want to test on.
img_name = "test1.jpg"
visualize = True
thresh = 0.3
output = gtf.run(img_name, visualize=visualize, thresh=thresh)

For Faster-RCNN:

import os
import sys

sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")
from infer_base import *

Download and initialize the pre-trained model:


! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1TZQSBiMDBrGhcT75AknTbofirSFXprt8' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1TZQSBiMDBrGhcT75AknTbofirSFXprt8" -O obj_dla_faster_rcnn_trained.zip && rm -rf /tmp/cookies.txt
! unzip -qq obj_dla_faster_rcnn_trained.zip

class_file = set_class_list("dla_fasterRCNN/classes.txt")
set_model_params(model_name="vgg16", model_path="dla_fasterRCNN/dla_fasterRCNN-vgg16.params")
set_hyper_params(gpus="0", batch_size=1)
set_img_preproc_params(img_short_side=300, img_long_side=500,
                       mean=(196.45086004329943, 199.09071480252155, 197.07683846968297),
                       std=(0.25779948968052024, 0.2550292865960972, 0.2553027154941914))
initialize_rpn_params()
initialize_rcnn_params()
sym = set_network()
mod = load_model(sym)

And you can test it:


# Change test1 to whatever image you want to test on.
set_output_params(vis_thresh=0.9, vis=True)
Infer("test1.jpg", mod)

3. Train Your Own Model

Data Preparation

The dataset can be downloaded using the following command:


! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1iBfafT1WHAtKAW0a1ifLzvW5f0ytm2i_' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1iBfafT1WHAtKAW0a1ifLzvW5f0ytm2i_" -O PRImA_Layout_Analysis_Dataset.zip && rm -rf /tmp/cookies.txt
! unzip -qq PRImA_Layout_Analysis_Dataset.zip

All the images in the dataset are in TIFF format. Because of their huge size, training on TIFF images was over 5x slower than on JPEG images, so the TIFF images were converted to JPEG.


import os
import glob
from PIL import Image

for name in glob.glob(root_dir + img_dir + '*.tif'):
    im = Image.open(name)
    # os.path functions are used here instead of the original rstrip/lstrip calls,
    # which strip character sets rather than prefixes/suffixes and can mangle names.
    base = os.path.splitext(os.path.basename(name))[0]
    im.save(final_root_dir + img_dir + base + '.jpg', 'JPEG')

The data comes in the VOC format. To use it with the various pipelines, we first convert it to the Monk format, which is directly compatible with a lot of Monk pipelines; later on, we can easily convert it to some other format if required. If you want to skip the Monk format and convert the data directly to some other required format, you can check out that pipeline's example notebooks here.


Monk Format


./Document_Layout_Analysis/ (final_root_dir)
      |
      |-----------Images (img_dir)
      |              |
      |              |------------------img1.jpg
      |              |------------------img2.jpg
      |              |------------------.........(and so on)
      |
      |-----------train_labels.csv (anno_file)

Annotation file format


| Id       | Labels                                |
| img1.jpg | x1 y1 x2 y2 label1 x1 y1 x2 y2 label2 |

  • Labels: xmin ymin xmax ymax label
  • xmin, ymin: top-left corner of the bounding box
  • xmax, ymax: bottom-right corner of the bounding box

The code for the data conversion is straightforward but very long. You can check out the code in one of the notebooks here.

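To give an idea of what that conversion does, below is a minimal sketch of reading VOC-style XML annotations into the Monk CSV layout shown above. This is our own simplification, not the notebook's code; the anno_dir variable is an assumption:

import glob
import xml.etree.ElementTree as ET

import pandas as pd

combined = []
for xml_file in glob.glob(final_root_dir + anno_dir + '*.xml'):  # anno_dir is assumed
    root = ET.parse(xml_file).getroot()
    fname = root.find('filename').text
    label = ""
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        coords = [box.find(k).text for k in ('xmin', 'ymin', 'xmax', 'ymax')]
        label += ' '.join(coords) + ' ' + obj.find('name').text + ' '
    combined.append([fname, label.strip()])

# One row per image: "Id, Labels", with space-separated "x1 y1 x2 y2 label" groups.
pd.DataFrame(combined, columns=['Id', 'Labels']).to_csv(final_root_dir + 'train_labels.csv', index=False)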

Following are the format requirements for various pipelines used for this task:


  1. The yolov3 pipeline used for the YOLOv3 architecture requires data in the YOLOv3 format. You can check out this conversion in this notebook; a rough sketch of the idea follows this list.

  2. The gluoncv-finetune pipeline used for the SSD512 architecture directly takes in the Monk format for training, so no further conversion was needed.

  3. The mxrcnn pipeline used for the Faster-RCNN architecture requires data in the COCO format. You can check out this conversion in this notebook.
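As a rough sketch of the first conversion (again our own simplification, not the notebook's code, reusing the final_root_dir and img_dir variables from the steps above), the Monk CSV rows can be rewritten as one YOLO-format .txt file per image, with class indices and box coordinates normalized by the image size:

import os

import cv2
import pandas as pd

df = pd.read_csv(final_root_dir + "train_labels.csv")
# Collect the class names (every 5th token in a row's label string).
classes = sorted({tok for row in df["Labels"] for tok in row.split()[4::5]})

os.makedirs(final_root_dir + "labels", exist_ok=True)
for _, row in df.iterrows():
    h, w = cv2.imread(final_root_dir + img_dir + row["Id"]).shape[:2]
    parts = row["Labels"].split()
    lines = []
    for i in range(0, len(parts), 5):
        x1, y1, x2, y2 = map(float, parts[i:i + 4])
        cls = classes.index(parts[i + 4])
        # YOLO format: class_id, then the box centre and size, all normalized to [0, 1].
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            cls, (x1 + x2) / (2 * w), (y1 + y2) / (2 * h), (x2 - x1) / w, (y2 - y1) / h))
    with open(final_root_dir + "labels/" + row["Id"].replace(".jpg", ".txt"), "w") as f:
        f.write("\n".join(lines))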

Selective Data Augmentation

There was an issue with the dataset. Since most of a document is text, there were far more paragraphs in the dataset than other labels such as tables or charts. To handle this huge bias in the dataset, we augmented only those document images that contained one of the minority labels. For example, if a document only had paragraphs and images, we didn't augment it. But if it had tables, charts, graphs, or any other minority label, we augmented that image many times over. This process helped reduce the bias in the dataset by around 25%. The selection and augmentation were done during the format conversion from VOC to Monk format. You can check out the code in one of the notebooks here.

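The selection rule itself is simple. Below is a hedged sketch of the idea; the minority_labels set is our guess at the under-represented classes, not the authors' exact list:

# Labels that appear far less often than 'paragraph' (an assumed list).
minority_labels = {"table", "chart", "maths", "linedrawing", "separator", "graphics"}

def needs_augmentation(box_labels):
    # Augment an image only if at least one of its boxes carries a minority label.
    return any(label in minority_labels for label in box_labels)

# e.g. needs_augmentation(["paragraph", "image"]) -> False
#      needs_augmentation(["paragraph", "table"]) -> True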

For data augmentation, we used the Albumentations library. It offers a lot of different ways to augment data, such as random cropping, translation, and hue, saturation, contrast, and brightness shifts. You can read more about this library here. It can be installed directly with pip:


! pip install albumentations

The following is the function that we wrote for data augmentation. There were a few cases where bounding boxes went outside the image and the Albumentations library wasn't able to handle them, so we've added a custom check to make sure that labels stay inside the image.


import cv2
import albumentations as A

def augmentData(fname, boxes):
    # final_root_dir, img_dir, and the `combined` list come from the conversion step above.
    image = cv2.imread(final_root_dir + img_dir + fname)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    transform = A.Compose([
        A.IAAPerspective(p=0.7),
        A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=5, p=0.5),
        A.IAAAdditiveGaussianNoise(),
        A.ChannelShuffle(),
        A.RandomBrightnessContrast(),
        A.RGBShift(p=0.8),
        A.HueSaturationValue(p=0.8)
    ], bbox_params=A.BboxParams(format='pascal_voc', min_visibility=0.2))

    # Generate 8 augmented copies of each selected image.
    for i in range(1, 9):
        label = ""
        transformed = transform(image=image, bboxes=boxes)
        transformed_image = transformed['image']
        transformed_bboxes = transformed['bboxes']

        # Discard this copy if any transformed box has collapsed to zero area.
        flag = False
        for box in transformed_bboxes:
            x_min, y_min, x_max, y_max, class_name = box
            if x_max <= x_min or y_max <= y_min:
                flag = True
                break
            label += str(int(x_min)) + ' ' + str(int(y_min)) + ' ' + str(int(x_max)) + ' ' + str(int(y_max)) + ' ' + class_name + ' '

        if flag:
            continue
        # Convert back to BGR before writing with OpenCV.
        cv2.imwrite(final_root_dir + img_dir + str(i) + fname,
                    cv2.cvtColor(transformed_image, cv2.COLOR_RGB2BGR))
        label = label[:-1]
        combined.append([str(i) + fname, label])

Calculating the Mean and Standard Deviation of the Dataset

The mxrcnn pipeline (used for Faster-RCNN) also requires the mean and standard deviation as parameters. They can be calculated using the following function:


import numpy as np
import cv2

def normalize():
    # `files` is the list of image file names in final_root_dir + img_dir.
    channel_sum = np.zeros(3)
    channel_sum_squared = np.zeros(3)
    num_pixels = 0
    for file in files:
        file_path = final_root_dir + img_dir + file
        img = cv2.imread(file_path)
        img = img / 255.
        num_pixels += (img.size / 3)
        channel_sum += np.sum(img, axis=(0, 1))
        channel_sum_squared += np.sum(np.square(img), axis=(0, 1))
    mean = channel_sum / num_pixels
    std = np.sqrt((channel_sum_squared / num_pixels) - mean ** 2)

    # BGR to RGB conversion
    rgb_mean = list(mean)[::-1]
    rgb_std = list(std)[::-1]
    return rgb_mean, rgb_std

mean, std = normalize()
# mxrcnn expects the mean on a 0-255 scale.
mean = [x * 255 for x in mean]

Train Your Own Model

This is where the real power of the Monk library kicks in. Writing code for object detection architectures can be a very tedious task, but it can be achieved in very few lines of code using the Monk Object Detection library.


For comparison purposes, all 3 architectures were trained for 30 epochs with a learning rate of 0.003.


For YOLOv3:


import os
import sys

sys.path.append("Monk_Object_Detection/7_yolov3/lib")
from train_detector import Detector

gtf = Detector()

# Dataset directories
img_dir = "Document_Layout_Analysis/Images/"
label_dir = "Document_Layout_Analysis/labels/"
class_list_file = "Document_Layout_Analysis/classes.txt"

gtf.set_train_dataset(img_dir, label_dir, class_list_file, batch_size=16)
gtf.set_val_dataset(img_dir, label_dir)
gtf.set_model(model_name="yolov3")

# SGD was found to perform better than the Adam optimizer on this task.
gtf.set_hyperparams(optimizer="sgd", lr=0.003, multi_scale=False, evolve=False)
gtf.Train(num_epochs=30)

For Faster-RCNN:


import os
import sys

sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/")
sys.path.append("Monk_Object_Detection/3_mxrcnn/lib/mx-rcnn")
from train_base import *

# Dataset params
root_dir = "./"
coco_dir = "Document_Layout_Analysis"
img_dir = "Images"
set_dataset_params(root_dir=root_dir, coco_dir=coco_dir, imageset=img_dir)

set_model_params(model_name="vgg16")
set_hyper_params(gpus="0", lr=0.003, lr_decay_epoch='20', epochs=30, batch_size=8)
set_output_params(log_interval=500, save_prefix="model_vgg16")

# Image pre-processing parameters (mean and std calculated during data pre-processing)
set_img_preproc_params(img_short_side=300, img_long_side=500,
                       mean=(196.45086004329943, 199.09071480252155, 197.07683846968297),
                       std=(0.25779948968052024, 0.2550292865960972, 0.2553027154941914))
initialize_rpn_params()
initialize_rcnn_params()

# Remove the cache, if any
if os.path.isdir("./cache/"):
    os.system("rm -r ./cache/")

roidb = set_dataset()
sym = set_network()
train(sym, roidb)

For SSD512:


import os
import sys

sys.path.append("Monk_Object_Detection/1_gluoncv_finetune/lib/")
from detector_prototype import Detector

gtf = Detector()

root = "Document_Layout_Analysis/"
img_dir = "Images/"
anno_file = "train_labels.csv"
batch_size = 8
gtf.Dataset(root, img_dir, anno_file, batch_size=batch_size)

# A VGG16 backbone with atrous convolutions, pre-trained on the COCO dataset, is used for this task.
pretrained = True
gpu = True
model_name = "ssd_512_vgg16_atrous_coco"
gtf.Model(model_name, use_pretrained=pretrained, use_gpu=gpu)

gtf.Set_Learning_Rate(0.003)

epochs = 30
params_file = "saved_model.params"
gtf.Train(epochs, params_file)

These models were trained on a 16 GB NVIDIA Tesla V100. YOLOv3 took the least time to train (6-7 hours), SSD512 took around 11 hours, and Faster-RCNN took the most (24+ hours).


4. Inference and Comparison

The inference code is almost the same as the code used above with the pre-trained models. You can check it out in the notebooks here.


The following results were obtained on test images after training the models from scratch:


Results Obtained from YOLOv3:


The outputs produced by YOLOv3 were very accurate. Of the 3 architectures, it's the only model that was able to identify drop capitals. Though its confidence in the predictions is low compared to the other models, its classifications are the most accurate of the three.


Inference on Test Images from YOLOv3 Architecture

Results Obtained from Faster-RCNN:


Faster-RCNN detected bounding boxes with very high confidence, but it missed some important regions, such as the footer in the 1st example, the heading in the 2nd, and the drop capital in the 3rd. If we lower the confidence threshold to recover the missing boxes, it produces a lot of random boxes with no clarity about what they represent.

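For instance, with the mxrcnn inference calls shown earlier, lowering vis_thresh recovers some of the missed boxes at the cost of many spurious ones (the 0.5 value here is just an illustrative choice):

# Lower the visualization threshold from 0.9 to 0.5: more boxes, many of them noise.
set_output_params(vis_thresh=0.5, vis=True)
Infer("test1.jpg", mod)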

Results Obtained from SSD512:


SSD512 produces outputs with very high confidence, a lot of them being 0.9+. It was also the only model that was able to identify footers and noise such as dividing lines in the document. But it also produced repetitive or incorrect labels, such as ‘floating’ in the 2nd example (an extra box with an incorrect label), and graphics and paragraph in the 3rd (2 boxes with different labels for the same region).


Inference on Test Images from SSD512 Architecture

The following inferences can be made from this tutorial on the basis of the models' outputs:


  1. The Monk library makes it very easy for students, researchers, and competitors to create deep learning models and try different hyper-parameter tunings to increase model accuracy, all in very few lines of code.
  2. Faster-RCNN gave the worst performance on this task, whereas SSD512 and YOLOv3 gave comparable results.
  3. If you want a model that doesn't take much time to train, and missing minute details like footers or separators won't affect your work, go for YOLOv3.
  4. If these small details are crucial for your work and the focus is more on bounding-box prediction than classification, go for SSD512. It is also worth noting that the gluoncv-finetune pipeline of Monk AI (used here for SSD512) provides architectures pre-trained on various other datasets, such as the COCO dataset.

Translated from: https://medium.com/@swapnil.ahlawat/object-detection-document-layout-analysis-using-monk-object-detection-toolkit-6c57200bde5
