OCR’ing Business Cards

宋群丽, Liaoning Normal University
Date: 2022-05-12 | Direction: English to Chinese | Category: Artificial Intelligence | Word count: 1977
  • In a previous tutorial, we learned how to automatically OCR and scan receipts by:
  • Detecting the receipt in the input image
  • Applying a perspective transform to obtain a top-down view of the receipt
  • Utilizing Tesseract to OCR the text on the receipt
  • Using regular expressions to extract the price data
  • To learn how to OCR a business card using Python, just keep reading.
  • OCR’ing Business Cards
  • In this tutorial, we will use a very similar workflow, but this time apply it to business card OCR. More specifically, we’ll learn how to extract the name, title, phone number, and email address from a business card.
  • You’ll then be able to extend this implementation to your own projects.
  • Learning Objectives
  • In this tutorial, you will:
  • Learn how to detect business cards in images
  • Apply OCR to a business card image
  • Utilize regular expressions to extract:
  • Name
  • Job title
  • Phone number
  • Email address
  • Business Card OCR
  • In the first part of this tutorial, we will review our project directory structure. We’ll then implement a simple yet effective Python script that allows us to OCR a business card.
  • We’ll wrap up this tutorial with a discussion of our results, along with the next steps.
  • Configuring your development environment
  • To follow this guide, you need to have the OpenCV library installed on your system.
  • Luckily, OpenCV is pip-installable (for example, through the opencv-contrib-python package on PyPI).
  • If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide. It will have you up and running in a matter of minutes.
  • Having Problems Configuring Your Development Environment?
  • All that said, are you:
  • Short on time?
  • Learning on your employer’s administratively locked system?
  • Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
  • Ready to run the code right now on your Windows, macOS, or Linux system?
  • Then join PyImageSearch University today!
  • Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.
  • And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
  • Project Structure
  • We first need to review our project directory structure.
  • Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.
  • From there, take a look at the directory structure:
  • We only have a single Python script to review, ocr_business_card.py. This script will load the example business card images (i.e., larry_page.png and tony_stark.png), OCR them, and then output the name, job title, phone number, and email address from each business card.
  • Best of all, we’ll be able to accomplish our goal in under 120 lines of code (including comments)!
  • Implementing Business Card OCR
  • We are now ready to implement our business card OCR script! First, open the ocr_business_card.py file in our project directory structure and insert the following code:
  • Our imports here are similar to the ones in a previous tutorial on OCR’ing receipts.
  • We need our four_point_transform function to obtain a top-down, bird’s-eye view of the business card. Obtaining this view typically yields higher OCR accuracy.
  • The pytesseract package is used to interface with the Tesseract OCR engine. We then have Python’s regular expression library, re, which will allow us to parse the names, job titles, email addresses, and phone numbers from business cards.
  • With the imports taken care of, we can move on to command line arguments:
  • Our first command line argument, --image, is the path to our input image on disk. We assume that this image contains a business card with sufficient contrast between the foreground and background, ensuring we can successfully apply edge detection and contour processing to extract the business card.
  • We then have two optional command line arguments, --debug and --min-conf. The --debug command line argument indicates whether we are debugging our image processing pipeline and should show more of the processed images on our screen (useful when you can’t determine why a business card was or was not detected).
  • We then have --min-conf, the minimum confidence (on a scale of 0-100) required for successful text detection. You can increase --min-conf to prune out weak text detections.
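  • The three arguments described above could be declared with argparse roughly as follows. This is a minimal sketch rather than the tutorial’s own listing: the flag names come from the text, while the short flags, defaults, and help strings are assumptions.

```python
import argparse


def build_parser():
    # Flag names follow the text; short flags, defaults, and help
    # strings are assumptions made for this sketch.
    ap = argparse.ArgumentParser(description="OCR a business card")
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image on disk")
    ap.add_argument("-d", "--debug", type=int, default=-1,
                    help="set to a value > 0 to show intermediate images")
    ap.add_argument("-c", "--min-conf", type=int, default=0,
                    help="minimum confidence (0-100) for text detection")
    return ap


# Parse an explicit list so the sketch runs without a real command line.
args = vars(build_parser().parse_args(
    ["--image", "tony_stark.png", "--min-conf", "50"]))
print(args)
```

  • Note that argparse stores --min-conf under the key min_conf, since dashes in option names are converted to underscores.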
  • Let’s now load our input image from disk:
  • Here, we load our input --image from disk and then clone it. We keep the clone so that, after contour processing, we can extract the business card from the original high-resolution image.
  • We then resize our image to have a width of 600px and compute the ratio of the original width to the new width (a requirement for when we want to obtain a top-down view of the original high-resolution business card).
  • We continue our image processing pipeline below.
  • First, we take our image, convert it to grayscale, blur it, and then apply edge detection, the result of which can be seen in Figure 2.
  • Note that the outline/border of the business card is visible on the edge map. However, if there are any gaps in the edge map, the business card will not be detectable via our contour processing technique, so you may need to tweak the parameters of the Canny edge detector or capture your image in an environment with better lighting conditions.
  • From there, we detect contours and sort them in descending order (largest to smallest) based on the area of the computed contour. We sort this way because we assume that the business card contour will be one of the largest detected contours.
  • We also initialize cardCnt (Line 40), which is the contour that corresponds to the business card.
  • Let’s now loop over the largest contours:
  • Lines 45 and 46 perform contour approximation.
  • If our approximated contour has four vertices, then we can assume that we found the business card. If that happens, we update our cardCnt and break from the loop.
  • If we reach the end of the for loop and still haven’t found a valid cardCnt, we gracefully exit the script. Remember, we cannot process the business card if one cannot be found in the image!
  • Our next code block handles showing some debugging images as well as obtaining our top-down view of the business card:
  • Lines 62-66 check whether we are in --debug mode, and if so, we draw the contour of the business card on the output image.
  • We then apply a four-point perspective transform to the original, high-resolution image, thus obtaining the top-down, bird’s-eye view of the business card (Line 70).
  • We multiply the cardCnt by our computed ratio here since cardCnt was computed for the reduced image dimensions. Multiplying by ratio scales the cardCnt back into the dimensions of the orig image.
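  • A small worked example may make the scaling step concrete. For the multiplication to map contour coordinates back onto the original image, the ratio has to be the original width divided by the resized width. The image width and contour coordinates below are hypothetical values chosen for illustration.

```python
import numpy as np

orig_width = 2400      # hypothetical width of the original photo, in pixels
resized_width = 600    # width used for edge detection, as in the text

# Scale factor that maps resized-image coordinates back to the original.
ratio = orig_width / float(resized_width)

# Hypothetical four-vertex contour in the (N, 1, 2) layout OpenCV uses.
card_cnt = np.array([[[100, 80]], [[500, 80]], [[500, 320]], [[100, 320]]])

# Flatten to four (x, y) points and scale them up to the original image.
points = card_cnt.reshape(4, 2) * ratio
print(ratio)
print(points)
```

  • The scaled points are what a four-point perspective transform would receive together with the original high-resolution image.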
  • We then display the transformed image to our screen (Lines 73 and 74).
  • With our top-down view of the business card obtained, we can move on to OCR’ing it:
  • Lines 78 and 79 OCR the business card, resulting in the text output.
  • But the question remains, how are we going to extract the information from the business card itself? The answer is to utilize regular expressions.
  • Lines 83 and 84 utilize regular expressions to extract phone numbers and email addresses (Walia, 2020) from the text, while Lines 88 and 89 do the same for names and job titles (Regular expression for first and last name, 2020).
  • A review of regular expressions is outside the scope of this tutorial, but the gist is that they can be used to match particular patterns in text.
  • For example, a phone number consists of a specific pattern of digits and sometimes includes dashes and parentheses. Email addresses also follow a pattern: a text string, followed by an “@” symbol, and then the domain name.
  • Any time you can reliably guarantee a pattern of text, regular expressions can work quite well. That said, they aren’t perfect, so you may want to look into more advanced natural language processing (NLP) algorithms if you find your business card OCR accuracy is suffering significantly.
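  • To illustrate the kind of pattern matching described above, here is a minimal sketch using Python’s re module. The patterns are simplified assumptions written for this example (they are not the expressions from the cited sources), and the sample text is made up rather than real OCR output.

```python
import re

# Made-up OCR output; real Tesseract text is usually noisier.
text = """Jane Doe
Chief Technology Officer
(555) 010-0199
555.010.0142
jane.doe@example.com
"""

# US-style numbers: optional parentheses, then 3-3-4 digits with
# optional space, dot, or dash separators.
phone_pattern = r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"
# A text string, an "@" symbol, then a dotted domain name.
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
# Whole lines made only of letters, spaces, and light punctuation.
name_pattern = r"^[A-Za-z][A-Za-z ,.'\-]{2,}$"

phone_nums = re.findall(phone_pattern, text)
emails = re.findall(email_pattern, text)
names = re.findall(name_pattern, text, flags=re.MULTILINE)

print(phone_nums)
print(emails)
print(names)
```

  • The name pattern is only a rough heuristic: it accepts any line without digits or symbols, so it cannot tell a name from a job title or a company slogan. That limitation is one reason NLP techniques may be a better fit for those fields.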
  • The final step here is to display our output to the terminal:
  • This final code block loops over the extracted phone numbers (Lines 96 and 97), email addresses (Lines 106 and 107), and names/job titles (Lines 116 and 117), displaying each to our terminal.
  • Of course, you could take this extracted information and write it to disk, save it to a database, etc. Still, for the sake of simplicity (and because we don’t know the specifications of your business card OCR project), we’ll leave it as an exercise for you to save the data as you see fit.
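  • As one possible starting point for that exercise, the sketch below prints each group of results under a heading and serializes everything to JSON. The helper name format_section, the dictionary keys, and the sample values are all hypothetical.

```python
import json


def format_section(title, items):
    # Build one titled block of output, one stripped item per line.
    lines = [title, "=" * len(title)]
    lines.extend(item.strip() for item in items)
    return "\n".join(lines)


# Hypothetical extraction results standing in for the regex matches.
results = {
    "phone_numbers": ["(555) 010-0199"],
    "emails": ["jane.doe@example.com"],
    "names_and_titles": ["Jane Doe", "Chief Technology Officer"],
}

for key, items in results.items():
    print(format_section(key.replace("_", " ").upper(), items))
    print()

# Serialize to JSON; where the string is stored is left open.
payload = json.dumps(results, indent=2)
restored = json.loads(payload)
```

  • JSON is used here only because it needs no extra dependencies; a CSV file or a database row would serve equally well.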
  • Business Card OCR Results
  • We are now ready to apply OCR to business cards. Open a terminal and execute the following command:
  • Figure 3 (top) shows the results of our business card localization. Notice how we have correctly detected the business card in the input image.
  • From there, Figure 3 (bottom) displays the results of applying a perspective transform to the business card, resulting in the top-down, bird’s-eye view of the image.
  • Once we have the top-down view of the image (typically required to obtain higher OCR accuracy), we can apply Tesseract to OCR it, the results of which can be seen in our terminal output above.
  • Note that our script has successfully extracted both phone numbers on Tony Stark’s business card.
  • No email addresses are reported as there is no email address on the business card.
  • We then have the name and job title displayed as well. It’s interesting that we can OCR all the text successfully, given that the text of the name is more distorted than the phone number text. Our perspective transform dealt with all the text effectively even though the amount of distortion changes as you move farther from the camera. That’s the point of the perspective transform and why it’s important to the accuracy of our OCR.
  • Let’s try another example image, this one of an old Larry Page (co-founder of Google) business card:
  • Figure 4 (top) displays the output of localizing Page’s business card. The bottom then shows the top-down transform of the image.
  • This top-down transform is passed through Tesseract OCR, yielding the OCR’d text as output. We take this OCR’d text, apply our regular expressions, and thus obtain the results above.
  • Examining the results, you can see that we have successfully extracted Larry Page’s two phone numbers, email address, and name/job title from the business card.
  • Summary
  • In this tutorial, you learned how to build a basic business card OCR system. Essentially, this system was an extension of our receipt scanner but with different regular expressions and text localization strategies.
  • If you ever need to build a business card OCR system, I recommend that you use this tutorial as a starting point, but keep in mind that you may want to utilize more advanced text post-processing techniques, such as true natural language processing (NLP) algorithms, rather than regular expressions.
  • Regular expressions can work very well for email addresses and phone numbers, but they may fail to obtain high accuracy for names and job titles. If and when that time comes, you should consider leveraging NLP as much as possible to improve your results.
  • Citation Information
  • Rosebrock, A. “OCR’ing Business Cards,” PyImageSearch, 2021, https://www.pyimagesearch.com/2021/11/03/ocring-business-cards/
  • @article{Rosebrock_2021_OCR_BCards,
      author = {Adrian Rosebrock},
      title = {{OCR}’ing Business Cards},
      journal = {PyImageSearch},
      year = {2021},
      note = {https://www.pyimagesearch.com/2021/11/03/ocring-business-cards/},
    }
