This work compares machine learning algorithms for segmenting the characters of text presented as an image. The algorithms are designed to work on degraded documents whose text is not aligned in an organized fashion. The paper investigates the use of Support Vector Machines, the K-Nearest Neighbor algorithm, and an Encoder Network for character spotting. Character spotting involves extracting potential characters from a stream of text by selecting regions bounded by white space.
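The idea of selecting regions bounded by white space can be illustrated with a minimal sketch. It is not the method evaluated in the paper. It assumes a single text line that is already binarized (1 = ink, 0 = white) and horizontally aligned, which is a much easier setting than the degraded, unaligned documents the paper targets. The function name `spot_characters` and the toy input are hypothetical.

```python
from typing import List, Tuple


def spot_characters(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_col, end_col) spans, end exclusive, of candidate characters.

    `image` is a binarized text line given as rows of 0/1 values
    (assumption: 1 = ink, 0 = white space). A candidate is a maximal
    run of columns that contain ink, bounded by all-white columns.
    """
    if not image:
        return []
    width = len(image[0])
    # A column counts as white space when no row has ink in it.
    has_ink = [any(row[c] for row in image) for c in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                    # a candidate region begins
        elif not ink and start is not None:
            spans.append((start, c))     # white column closes the region
            start = None
    if start is not None:                # region touching the right edge
        spans.append((start, width))
    return spans


# Toy 3x10 line with three ink blobs separated by blank columns.
line = [
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
]
candidates = spot_characters(line)
print(candidates)  # [(1, 3), (5, 6), (8, 10)]
```

Each returned span is only a potential character. In a pipeline like the one described, such candidates would then be passed to a classifier (for example an SVM, a K-Nearest Neighbor model, or an encoder network) to decide what each region contains.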