INTRODUCTION
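1. Three factors driving the development of deep-learning-based scene text recognition (STR):
(1) Advanced hardware: high-performance computing makes it feasible to train large-scale recognition networks
(2) Deep-learning-based STR algorithms learn features automatically
(3) Strong demand for STR applications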
BACKGROUND
Basic problems in STR:
(1) Text localization
- Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, and Xiaolin Li. 2017. Single shot text detector with regional attention. In Proceedings of ICCV. 3047–3055.
- Fangneng Zhan and Shijian Lu. 2019. ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of CVPR. 2059–2068.
- Fang Yin, Rui Wu, Xiaoyang Yu, and Guanglu Sun. 2019. Video text localization based on Adaboost. Multimedia Tools and Applications 78, 5 (2019), 5345–5354.
(2) Text verification
- Tao Wang, David J Wu, Adam Coates, and Andrew Y Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of ICPR. 3304–3308.
- Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep features for text spotting. In Proceedings of ECCV. 512–528.
(3) Text detection
Regression-based:
- Yuliang Liu and Lianwen Jin. 2017. Deep matching prior network: Toward tighter multi-oriented text detection. In Proceedings of CVPR. 1962–1969.
- Yuliang Liu, Lianwen Jin, Shuaitao Zhang, Canjie Luo, and Sheng Zhang. 2019. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition 90 (2019), 337–345.
Segmentation-based:
- Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, and Xiang Bai. 2019. TextField: learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing 28, 11 (2019), 5566–5579.
- Yuliang Liu, Lianwen Jin, and Chuanming Fang. 2020. Arbitrarily Shaped Scene Text Detection with a Mask Tightness Text Detector. IEEE Transactions on Image Processing 29 (2020), 2918–2930.
(4) Text segmentation
Text-line segmentation:
- Fangneng Zhan and Shijian Lu. 2019. ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of CVPR. 2059–2068.
Single-character segmentation (used by early text recognition methods):
- Palaiahnakote Shivakumara, Souvik Bhowmick, Bolan Su, Chew Lim Tan, and Umapada Pal. 2011. A new gradient based character segmentation method for video text recognition. In Proceedings of ICDAR. 126–130.
- Anand Mishra, Karteek Alahari, and CV Jawahar. 2012. Scene text recognition using higher order language priors. In Proceedings of BMVC. 1–11.
(5) Text recognition
- Zhanzhan Cheng, Fan Bai, Yunlu Xu, Gang Zheng, Shiliang Pu, and Shuigeng Zhou. 2017. Focusing attention: Towards accurate text recognition in natural images. In Proceedings of ICCV. 5086–5094.
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118.
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.
(6) End-to-end system
- Hui Li, Peng Wang, and Chunhua Shen. 2017. Towards end-to-end text spotting with convolutional recurrent neural networks. In Proceedings of ICCV. 5238–5246.
- Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, and Changming Sun. 2018. An end-to-end textspotter with explicit alignment and attention. In Proceedings of CVPR. 5020–5029.
- Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, and Liangwei Wang. 2020. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network. In Proceedings of CVPR.
Others: script identification, text enhancement, text tracking, NLP
METHODOLOGIES
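Common STR approaches fall into two groups: methods based on single-character segmentation and methods that recognize whole text lines
1. Single-character segmentation
Three steps: image preprocessing, character segmentation, single-character recognition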
- Zhaoyi Wan, Mingling He, Haoran Chen, Xiang Bai, and Cong Yao. 2020. TextScanner: Reading Characters in Order for Robust Scene Text Recognition. In Proceedings of AAAI.
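Performs character-level recognition via semantic segmentation, with two branches that handle character classification and character localization respectively
Known problems:
(1) Character localization is considered one of the most challenging tasks in STR, and recognition accuracy depends on how well characters are localized
(2) Single-character recognition ignores contextual semantic information, so word-level results may be poor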
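The character segmentation step can be illustrated with a classical vertical projection profile. This is only a minimal sketch under stated assumptions: the input is assumed to be an already binarized image (1 = ink, 0 = background), and the function name `segment_columns` is hypothetical. It is not the method used by TextScanner or any paper cited above.

```python
from typing import List, Tuple


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column spans (end exclusive) that contain ink."""
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: count foreground pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                    # a run of ink columns begins
        elif count == 0 and start is not None:
            spans.append((start, x))     # an empty column closes the run
            start = None
    if start is not None:
        spans.append((start, width))     # run reaches the right border
    return spans


# Toy 3x8 binary image with three blobs separated by empty columns.
image = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
spans = segment_columns(image)
print(spans)
```

Two touching characters leave no empty column between them and come out as a single span, which is one concrete way the localization problem in (1) carries over into recognition errors.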
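2. Text-line recognition
Four steps: image preprocessing, feature extraction, sequence modeling, and text-line prediction; the first and third steps are optional
(1) Image preprocessing
Background removal
Traditional binarization methods work for document images; for complex natural-scene images, the background can instead be removed with GAN-based methods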
- Canjie Luo, Qingxiang Lin, Yuliang Liu, Jin Lianwen, and Shen Chunhua. 2020. Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild. CoRR abs/2001.04189 (2020). (removes the background using GANs)
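As an example of a traditional binarization method, the sketch below implements Otsu's global threshold in plain Python. Otsu's method is chosen here purely for illustration; the notes do not name a specific binarization algorithm, and the function names are hypothetical. It assumes 8-bit grayscale values and bright text on a dark background.

```python
from typing import List


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Map pixels above the threshold to 1 (foreground) and the rest to 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy grayscale patch: dark background (~10) with bright strokes (~200).
gray = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 13, 198],
]
t = otsu_threshold([p for row in gray for p in row])
binary = binarize(gray, t)
print(t, binary)
```

A single global threshold like this assumes a roughly bimodal histogram, which holds for clean document scans but often fails on cluttered natural scenes; that gap is what motivates the learned background removal cited above.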
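Image super-resolution
Blurry, low-resolution images are handled with image super-resolution methods
- Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, and Ping Luo. 2019. TextSR: Content-Aware Text Super-Resolution Guided by Recognition. CoRR abs/1909.07113 (2019). (the first to combine image super-resolution with the recognition task)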
- https://github.com/JasonBoy1/TextZoom (super-resolution + recognition)
Image rectification
Hand-designed rectification networks handle irregular text images and normalize the input image
- Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. 2015. Spatial transformer networks. In Proceedings of NIPS. 2017–2025.(STN)
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.(TPS)
- Canjie Luo, Lianwen Jin, and Zenghui Sun. 2019. MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition. Pattern Recognition 90 (2019), 109–118. (proposes a multi-object rectification network that corrects irregular text by predicting an offset for each part of the image)
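The sampling half of an STN-style rectifier can be sketched as follows: a 2x3 affine matrix maps each output pixel (in normalized [-1, 1] coordinates) to a source location, which is then read with bilinear interpolation. This is a simplified illustration under stated assumptions. In a real spatial transformer the matrix is predicted by a localization network, whereas here `theta` is fixed by hand; ASTER uses a TPS transform rather than an affine one; and the names `bilinear` and `affine_rectify` are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def bilinear(src: Image, x: float, y: float) -> float:
    """Bilinearly interpolate src at pixel coordinates (x, y), zero outside."""
    h, w = len(src), len(src[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                value += (1 - abs(x - xx)) * (1 - abs(y - yy)) * src[yy][xx]
    return value


def normalized(index: int, size: int) -> float:
    """Map a pixel index in [0, size - 1] to a coordinate in [-1, 1]."""
    return 2.0 * index / (size - 1) - 1.0 if size > 1 else 0.0


def affine_rectify(src: Image, theta: List[List[float]],
                   out_h: int, out_w: int) -> Image:
    """Resample src on an output grid warped by the 2x3 affine matrix theta."""
    in_h, in_w = len(src), len(src[0])
    out: Image = []
    for i in range(out_h):
        yt = normalized(i, out_h)
        row = []
        for j in range(out_w):
            xt = normalized(j, out_w)
            # Source location (normalized) for this output pixel.
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Convert normalized coordinates back to pixel coordinates.
            px = (xs + 1.0) * (in_w - 1) / 2.0
            py = (ys + 1.0) * (in_h - 1) / 2.0
            row.append(bilinear(src, px, py))
        out.append(row)
    return out


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # mirror along the x axis
same = affine_rectify(image, identity, 2, 3)
mirrored = affine_rectify(image, flip, 2, 3)
print(same, mirrored)
```

Because the sampler is a weighted sum of source pixels, it is differentiable with respect to the transform parameters, which is what lets a rectification module be trained jointly with the recognizer.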
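(2) Feature extraction
The quality of image feature extraction directly affects final recognition performance. Deeper, more advanced feature extraction networks achieve better results but require more memory and more compute; combining background removal with a simple feature extraction network may be a direction for future work
CNN-based: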
- Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C Lee Giles. 2017. Learning to read irregular text with attention mechanisms. In Proceedings of IJCAI. 3280–3286.(VGG)
- Qingqing Wang, Wenjing Jia, Xiangjian He, Yue Lu, Michael Blumenstein, Ye Huang, and Shujing Lyu. 2019. ReELFA: A Scene Text Recognizer with Encoded Location and Focused Attention. In Proceedings of ICDAR: Workshops. 71–76.(VGG)
- Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2019. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. IEEE Trans. Pattern Anal. Mach. Intell 41, 9 (2019), 2035–2048.(ResNet)
- Xiaoxue Chen, Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, and Canjie Luo. 2020. Adaptive Embedding Gate for Attention-Based Scene Text Recognition. Neurocomputing 381 (2020), 261–271.(ResNet)
- Yunze Gao, Yingying Chen, Jinqiao Wang, Ming Tang, and Hanqing Lu. 2018. Dense Chained Attention Netw