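Requirement:

Extract the text from a large number of image-based documents in a simple way, for example pulling the copy out of PPT slides that were saved as images.

Feasible approach

Import a batch of local images into Word, save the document as a PDF, then run the open-source tool Umi-OCR in batch document mode on it. The result is a double-layer, searchable PDF.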

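Steps

Prepare the images.

Batch-download the images with a tool, for example Node.js:

Run npm init; npm install download
Save the array of image addresses (for example ["1.jpg", "2.jpg"]) to data.json
Create index.js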

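 const download = require("download");
 const fs = require("fs");
 fs.readFile("./data.json", (err, data) => {
   // Stop early if data.json cannot be read
   if (err) throw err;
   // Image URLs; here they come from a JSON array, adjust this to your own data
   const urls = JSON.parse(data);
   // Index of the item currently being downloaded
   let index = 0;
   // URLs of the images that failed to download
   const errorImageUrlList = [];
   // Recursive function: downloads one image at a time so requests do not pile up
   const downloadImages = async () => {
     if (index >= urls.length) {
       // All items processed: report any failures that were collected
       if (errorImageUrlList.length) console.log("Failed downloads:", errorImageUrlList);
       return;
     }
     // Change this line if the data is not a plain JSON array of URLs
     const url = urls[index];
     // const imgName = url.split('/').pop();
     // Catch errors so that one failed image does not stop the remaining downloads
     try {
       await download(url, "./images2");
       console.log(`Image ${index + 1} downloaded successfully`);
     } catch {
       console.log(`Request failed at image ${index + 1}`);
       errorImageUrlList.push(url);
     }
     index++;
     await downloadImages();
   };
   downloadImages();
 });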

Run node index.js to batch-download the images.
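The recursion in index.js can also be written as a plain loop. The following is only a sketch of that variant, not a replacement for the script above: `downloadAll` and its `downloadFn` parameter are names made up for this example, and the download function is passed in as an argument so the loop can be tried without network access.

```javascript
// Sketch: download URLs one after another with a for-loop instead of recursion.
// `downloadFn` is a hypothetical injected parameter; for real use, pass the
// function exported by the `download` package.
async function downloadAll(urls, destDir, downloadFn) {
  // URLs that could not be downloaded
  const failed = [];
  for (const [i, url] of urls.entries()) {
    try {
      // Wait for each download to finish before starting the next one
      await downloadFn(url, destDir);
      console.log(`Image ${i + 1} downloaded`);
    } catch (err) {
      // Record the failure and continue with the remaining URLs
      console.log(`Image ${i + 1} failed: ${url}`);
      failed.push(url);
    }
  }
  return failed;
}
```

With the setup from the steps above, a call might look like `downloadAll(urls, "./images2", require("download"))`. The returned array holds the addresses that failed and can be passed in again for a retry.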

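In Word, create a new document, insert the downloaded image files, and save it as a PDF.
Download Umi-OCR, open the batch document tab, drag in the PDF, choose an output directory, and start the task.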
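For the Word import step, the order in which the images land in the document determines the page order of the PDF. If the files carry numbers in their names, it may help to check their numeric order first, because a plain alphabetical sort puts 10.jpg before 2.jpg. A minimal sketch, assuming numbered filenames; `sortNaturally` is a helper name invented for this example:

```javascript
// Sketch: sort filenames so that embedded numbers compare numerically.
// `sortNaturally` is a hypothetical helper, not part of any library.
function sortNaturally(names) {
  // Copy first so the caller's array is left unchanged
  return [...names].sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true })
  );
}

console.log(sortNaturally(["10.jpg", "2.jpg", "1.jpg"]));
```

To apply this to the download folder, the filenames could be read with `fs.readdirSync("./images2")` and passed to the helper.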
