How to Install the Tesseract OCR Tool

Date: 2024-11-04 18:09:10

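Tesseract OCR (Optical Character Recognition) is an open-source text recognition engine, commonly used to extract text from scanned documents. In Python, you call Tesseract through the `pytesseract` library, which is only a wrapper: the Tesseract program itself must be installed separately. The steps below install Tesseract and its Python wrapper.

### For Windows users:

1. Download the latest release. The project lives at https://github.com/tesseract-ocr/tesseract, and its documentation links to prebuilt Windows installers (`.exe`). Choose the build that matches your system.
2. Run the installer (or extract the download) to a location of your choice, for example `C:\Program Files\tesseract`.
3. Add the Tesseract directory to the system environment variables:
   - Right-click "This PC" -> Properties -> Advanced system settings -> Environment Variables.
   - Under System variables, select `Path`, click "Edit", and add `C:\Program Files\tesseract` as a new entry.
4. Install the `pytesseract` library with pip:
   ```
   pip install pytesseract
   ```
5. Verify the installation. Open a new Command Prompt or PowerShell window and run `tesseract --version`; if version information is printed, the program is on your `PATH`. Note that `pytesseract.pytesseract.tesseract_cmd` is a Python attribute, not a shell command. It defaults to `tesseract`, and you can set it to the full path of `tesseract.exe` if the directory is not on `PATH`.

### For Mac/Linux users:

1. On Ubuntu or Debian, install through the package manager:
   ```
   sudo apt-get update && sudo apt-get install tesseract-ocr
   ```
   Or on Arch Linux:
   ```
   sudo pacman -S tesseract
   ```
2. On macOS, install Homebrew first, then run:
   ```
   brew install tesseract
   ```
   Alternatively, build and install Tesseract manually from source.
3. Install `pytesseract`:
   ```
   pip install pytesseract
   ```
4. Verify the installation the same way: `tesseract --version` should print version information, and `which tesseract` should print `/usr/bin/tesseract` or the corresponding location for your package manager. Once this works, you can use the `pytesseract` library in your Python code.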
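The verification step can also be scripted. The following is a minimal sketch, not part of the original instructions: the helper names `find_tesseract`, `tesseract_version`, and `check_install` are hypothetical, and it assumes the Tesseract executable is named `tesseract` and is expected on `PATH`. It uses only the standard library to locate the program, and calls `pytesseract` only if that package can be imported.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


def tesseract_version(path: str) -> str:
    """Run `<path> --version` and return the first line of its output."""
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Fall back to stderr in case the version banner is written there.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


def check_install() -> bool:
    """Report whether both Tesseract and pytesseract appear to be usable."""
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; check the install directory.")
        return False
    print(f"Found Tesseract at: {path}")
    print(tesseract_version(path))

    try:
        import pytesseract  # third-party wrapper, installed via pip
    except ImportError:
        print("pytesseract is not installed; run: pip install pytesseract")
        return False

    # pytesseract runs the command stored in pytesseract.pytesseract.tesseract_cmd,
    # which is looked up on PATH unless it is set to a full path.
    print(f"pytesseract sees version: {pytesseract.get_tesseract_version()}")
    return True


if __name__ == "__main__":
    check_install()
```

If the program is installed but not on `PATH`, one option is to assign its full path to `pytesseract.pytesseract.tesseract_cmd` before calling any OCR function, instead of editing the environment variables.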
