How to extract the invoice number from a PDF VAT invoice

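The first approach reads the PDF file directly and extracts its text.

The second approach converts the PDF to an image and decodes the QR code in it, or alternatively sends the image to Baidu's image recognition API.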

Dependencies for QR code recognition

<dependency>
    <groupId>com.google.zxing</groupId>
    <artifactId>javase</artifactId>
    <version>3.4.0</version>
</dependency>

<dependency>
    <groupId>com.google.zxing</groupId>
    <artifactId>core</artifactId>
    <version>3.4.0</version>
</dependency>

Dependencies for reading the PDF

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.15</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.15</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jempbox</artifactId>
    <version>1.8.16</version>
</dependency>

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
 * Renders the first page of the PDF to a PNG image.
 */
public static void pdfFileToImage()
{
    // Source PDF file
    File pdfFile = new File("C:/Users/Luo-ping/Desktop/顺丰电子发票.pdf");
    // Full path and file name of the PNG to write
    File targetFile = new File("D:/test.png");
    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(pdfFile))
    {
        if (doc.getNumberOfPages() == 0)
        {
            // Nothing to render; the original code hit a NullPointerException here
            System.err.println("PDF has no pages: " + pdfFile.getAbsolutePath());
            return;
        }
        PDFRenderer renderer = new PDFRenderer(doc);
        // Scale 2.0 is about 144 DPI (1.0 is 72 DPI), which keeps the QR code legible
        BufferedImage image = renderer.renderImage(0, 2.0f);
        // ImageIO can write straight to the file, so no intermediate byte streams are needed
        ImageIO.write(image, "png", targetFile);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

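import java.awt.image.BufferedImage;
import java.io.File;
import java.util.EnumMap;
import java.util.Map;
import javax.imageio.ImageIO;
import com.google.zxing.BinaryBitmap;
import com.google.zxing.DecodeHintType;
import com.google.zxing.MultiFormatReader;
import com.google.zxing.Result;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.common.HybridBinarizer;

/**
 * Decodes the QR code in the image.
 *
 * @return the decoded text, or an empty string if decoding fails
 */
public static String extractImages()
{
    String filename = "D:/test.png";
    String returnResult = "";
    MultiFormatReader multiFormatReader = new MultiFormatReader();
    File file = new File(filename);
    try
    {
        BufferedImage image = ImageIO.read(file);
        if (image == null)
        {
            // ImageIO.read returns null when no registered reader can handle the file
            return returnResult;
        }
        // Decoding hints: decode() expects DecodeHintType keys, not EncodeHintType
        Map<DecodeHintType, Object> hints = new EnumMap<>(DecodeHintType.class);
        hints.put(DecodeHintType.CHARACTER_SET, "UTF-8");
        // Spend more time searching, since the QR code is a small part of the page
        hints.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);

        // Decode the QR code and read its text
        BinaryBitmap binaryBitmap = new BinaryBitmap(new HybridBinarizer(new BufferedImageLuminanceSource(image)));
        Result result = multiFormatReader.decode(binaryBitmap, hints);
        returnResult = result.getText();
        System.err.println(returnResult);
    }
    catch (Exception e)
    {
        // Includes NotFoundException, thrown when no code is found in the image
        e.printStackTrace();
    }
    return returnResult;
}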
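
The string returned by extractImages() still has to be split to get the invoice number. Below is a minimal sketch, assuming the QR payload uses the comma-separated layout commonly seen on Chinese VAT e-invoices (version, invoice type, invoice code, invoice number, amount, issue date, check code, CRC). The class name InvoiceQrPayload and the field positions are assumptions, not something the code above confirms, so check them against real invoices before relying on them.

```java
import java.util.Optional;

/** Hypothetical helper: pulls fields out of a decoded invoice QR payload. */
final class InvoiceQrPayload {
    // Assumed zero-based positions in the comma-separated payload.
    private static final int INVOICE_CODE_INDEX = 2;
    private static final int INVOICE_NUMBER_INDEX = 3;

    private InvoiceQrPayload() {}

    /** Returns the invoice number, or empty if the payload does not match the assumed layout. */
    static Optional<String> invoiceNumber(String payload) {
        return digitField(payload, INVOICE_NUMBER_INDEX);
    }

    /** Returns the invoice code, or empty if the payload does not match the assumed layout. */
    static Optional<String> invoiceCode(String payload) {
        return digitField(payload, INVOICE_CODE_INDEX);
    }

    private static Optional<String> digitField(String payload, int index) {
        if (payload == null) {
            return Optional.empty();
        }
        // Limit -1 keeps trailing empty fields so positions stay stable.
        String[] parts = payload.trim().split(",", -1);
        if (parts.length <= index) {
            return Optional.empty();
        }
        String value = parts[index].trim();
        // Both fields are expected to be purely numeric; reject anything else.
        return value.matches("\\d+") ? Optional.of(value) : Optional.empty();
    }
}
```

A call such as InvoiceQrPayload.invoiceNumber(extractImages()) returns an empty Optional when decoding failed or the payload has a different shape, so the caller can fall back to reading the text directly.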

Reading the PDF directly

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/**
 * Reads the text of a PDF file using the open-source PDFBox library.
 *
 * @param fileName path of the PDF file to read
 */
public static void readPDF(String fileName)
{
    File file = new File(fileName);
    // PDDocument.load parses the file read-only; try-with-resources closes
    // both the document and the writer, even on failure
    try (PDDocument pdfdocument = PDDocument.load(file);
         FileWriter fileWriter = new FileWriter(new File("pdf.txt")))
    {
        // Create a PDF text stripper
        PDFTextStripper stripper = new PDFTextStripper();
        // Extract the text from the PDF document
        String result = stripper.getText(pdfdocument);
        fileWriter.write(result);
        System.out.println("Text content of the PDF file:");
        System.out.println(result);
    }
    catch (IOException e)
    {
        System.out.println("Failed to read PDF file " + file.getAbsolutePath() + ": " + e);
        e.printStackTrace();
    }
}
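
The text returned by stripper.getText still contains the whole invoice, so the invoice number has to be picked out of it. Below is a rough sketch, assuming the extracted text contains the Chinese label for "invoice number" (written with Unicode escapes in the code) followed, possibly after a colon and whitespace, by 8 to 20 digits. The class name InvoiceTextParser and the digit range are assumptions. PDFBox may also emit the label and its value in a different order for some invoice layouts, in which case this pattern will not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the invoice number in text extracted from the PDF. */
final class InvoiceTextParser {
    // The Chinese label for "invoice number" (Unicode escapes keep the source
    // ASCII-only), then an optional ASCII or full-width colon, optional
    // whitespace including line breaks, then the digits.
    // The 8 to 20 digit range is an assumption about invoice number lengths.
    private static final Pattern INVOICE_NUMBER = Pattern.compile(
            "\u53D1\u7968\u53F7\u7801\\s*[:\uFF1A]?\\s*(\\d{8,20})");

    private InvoiceTextParser() {}

    /** Returns the first invoice number found, or empty if the label is absent. */
    static Optional<String> invoiceNumber(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher matcher = INVOICE_NUMBER.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

To use it with readPDF, pass the extracted result string to InvoiceTextParser.invoiceNumber instead of only printing it. An empty Optional is the signal to try the QR code route.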
