# OCR Parsing Feature Update

## Description

The response structure of the `/marriage/ocr/parse` endpoint has changed. It now includes `probability` (recognition probability) and `location` (position information) for each field, following the response format of the Baidu marriage certificate recognition API.

## New DTO Classes

### 1. OcrProbability.java

Represents the probability information of an OCR recognition result. Serialized shape:

```jsonc
{
  "average": 20.68798065,  // average probability
  "min": 0.9106679559      // minimum probability
}
```

### 2. OcrLocation.java

Represents where an OCR recognition result sits in the image. Serialized shape:

```jsonc
{
  "width": 109,   // width
  "height": 47,   // height
  "top": 933,     // distance from the top edge
  "left": 253     // distance from the left edge
}
```

### 3. OcrFieldData.java

Represents the complete recognition data for a single field. Serialized shape:

```jsonc
{
  "word": "王连杰",        // recognized text
  "probability": {...},   // recognition probability
  "location": {...}       // position of the recognized text
}
```

## API Response Format

### Request

```
POST /marriage/ocr/parse
Content-Type: application/json

{
  "mobile": "18888888888",
  "smsCode": "123456",
  "uploadId": "xxxxx"
}
```

### Response

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "raw": "{raw JSON response from the Baidu API}",
    "words": ["王连杰", "320321199504011218", ...],
    "parsed": {
      "husbandName": "王连杰",
      "wifeName": "张丹",
      "husbandId": "320321199504011218",
      "wifeId": "320321199406197047",
      "husbandBirthDate": "1995-04-01",
      "wifeBirthDate": "1994-06-19",
      "husbandNationality": "中国",
      "wifeNationality": "中国",
      "husbandGender": "男",
      "wifeGender": "女",
      "marriageNo": "320321201700004108",
      "certificateHolder": "王连杰",
      "registerDate": "2017-04-01"
    },
    "parsed_detailed": {
      "husbandName": {
        "word": "王连杰",
        "probability": { "average": 20.68798065, "min": 0.9106679559 },
        "location": { "width": 109, "height": 47, "top": 933, "left": 253 }
      },
      "wifeName": {
        "word": "张丹",
        "probability": { "average": 19.14912224, "min": 0.8554975986 },
        "location": { "width": 83, "height": 43, "top": 1204, "left": 239 }
      },
      "husbandId": {
        "word": "320321199504011218",
        "probability": { "average": 13.2870388, "min": 0.5172381401 },
        "location": { "width": 341, "height": 68, "top": 1081, "left": 343 }
      },
      "wifeId": {
        "word": "320321199406197047",
        "probability": { "average": 15.98988342, "min": 0.6194867492 },
        "location": { "width": 336, "height": 56, "top": 1351, "left": 326 }
      },
      "husbandBirthDate": {
        "word": "1995-04-01",
        "probability": { "average": 20.66628647, "min": 0.7240950465 },
        "location": { "width": 250, "height": 55, "top": 1044, "left": 857 }
      },
      "wifeBirthDate": {
        "word": "1994-06-19",
        "probability": { "average": 25.62935066, "min": 0.9226108789 },
        "location": { "width": 255, "height": 56, "top": 1322, "left": 829 }
      },
      "husbandNationality": {
        "word": "中国",
        "probability": { "average": 17.19703484, "min": 0.7144192457 },
        "location": { "width": 79, "height": 43, "top": 1011, "left": 249 }
      },
      "wifeNationality": {
        "word": "中国",
        "probability": { "average": 23.41218376, "min": 0.9498358369 },
        "location": { "width": 79, "height": 46, "top": 1264, "left": 242 }
      },
      "husbandGender": {
        "word": "男",
        "probability": { "average": 24.97878838, "min": 0.9302400351 },
        "location": { "width": 39, "height": 40, "top": 973, "left": 792 }
      },
      "wifeGender": {
        "word": "女",
        "probability": { "average": 21.57674408, "min": 0.8877936602 },
        "location": { "width": 42, "height": 42, "top": 1243, "left": 765 }
      },
      "marriageNo": {
        "word": "320321201700004108",
        "probability": { "average": 16.35309982, "min": 0.6457977891 },
        "location": { "width": 363, "height": 44, "top": 650, "left": 272 }
      },
      "certificateHolder": {
        "word": "王连杰",
        "probability": { "average": 16.20750237, "min": 0.6932016015 },
        "location": { "width": 119, "height": 44, "top": 362, "left": 271 }
      },
      "registerDate": {
        "word": "2017-04-01",
        "probability": { "average": 19.06731987, "min": 0.7248777151 },
        "location": { "width": 354, "height": 42, "top": 511, "left": 272 }
      }
    }
  }
}
```

## Main Changes

### 1. Response Fields

| Field | Type | Description |
|------|------|------|
| raw | String | Raw JSON response returned by the Baidu API |
| words | Array | All recognized text strings (legacy format, kept for compatibility) |
| parsed | Object | Simplified parse result (text only, kept for backward compatibility) |
| parsed_detailed | Object | **New.** Detailed parse result (includes `probability` and `location`) |

### 2. Backward Compatibility

- The `parsed` field is unchanged and still returns the simplified `Map` of field name to text.
- The new `parsed_detailed` field returns the complete data for each field:
  - `word`: the recognized text
  - `probability`: the recognition probability (`average` and `min`)
  - `location`: the position of the recognized text in the image

### 3. Recognized Fields

The following marriage certificate fields are supported:

- `husbandName` - husband's name
- `husbandId` - husband's ID card number
- `husbandBirthDate` - husband's date of birth
- `husbandNationality` - husband's nationality
- `husbandGender` - husband's gender
- `wifeName` - wife's name
- `wifeId` - wife's ID card number
- `wifeBirthDate` - wife's date of birth
- `wifeNationality` - wife's nationality
- `wifeGender` - wife's gender
- `marriageNo` - marriage certificate number
- `certificateHolder` - certificate holder
- `registerDate` - registration date
- `remark` - remarks

## Usage Examples

The snippets below use field-path notation, where `response` stands for the deserialized response object.

### Reading the Recognition Probability

```java
// Recognition probability of the wife's name
double wifeNameAverage = response.data.parsed_detailed.wifeName.probability.average;
double wifeNameMin = response.data.parsed_detailed.wifeName.probability.min;
```

### Reading the Recognition Location

```java
// Location of the husband's name in the image
int width = response.data.parsed_detailed.husbandName.location.width;
int height = response.data.parsed_detailed.husbandName.location.height;
int top = response.data.parsed_detailed.husbandName.location.top;
int left = response.data.parsed_detailed.husbandName.location.left;
```

### Reading the Simplified Text (Backward Compatible)

```java
// The parsed field still provides the simplified result
String husbandName = response.data.parsed.husbandName;
```

## Technical Implementation

### New Methods

1. **parseMarriageFieldsFromRawDetailed(String ocrResp)**
   - Extracts the detailed field data from the raw Baidu API response
   - Returns a `Map` of field name to `OcrFieldData`

2. **extractFieldData(JsonNode arr)**
   - Extracts the complete data for a single field from a JSON array
   - Covers three parts: `word`, `probability`, and `location`
   - Returns an `OcrFieldData` object

3. **convertToSimpleParsed(Map parsedDetailed)**
   - Converts the detailed field data into a simplified map of strings
   - Used to maintain backward compatibility
   - Returns a `Map` of field name to text

## Notes

1. If a field returned by the Baidu API carries no `probability` or `location` information, those properties are `null`.
2. The `width`, `height`, `top`, and `left` values in `location` are pixel coordinates within the image.
3. Probability values are documented as floating-point numbers between 0 and 100, with higher values indicating higher recognition confidence. In the sample response above, however, `min` stays between 0 and 1 while `average` ranges from about 13 to 26, so confirm the scale of each field before applying a threshold.
4. Date fields are converted automatically from "YYYY年MM月DD日" to "YYYY-MM-DD".
5. For the marriage certificate number, only the digits are kept (non-digit characters are removed).
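
The null handling, date conversion, and digit extraction described in the notes, together with the flattening done by `convertToSimpleParsed`, can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the record definitions only mirror the JSON shapes shown earlier, the helper names `normalizeDate` and `digitsOnly` are hypothetical, the `marriageNo` input string is made up for the demo, and the code assumes Java 17 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OcrSketch {
    // Hypothetical DTO shapes mirroring the documented JSON; the real classes may differ.
    record OcrProbability(double average, double min) {}
    record OcrLocation(int width, int height, int top, int left) {}
    record OcrFieldData(String word, OcrProbability probability, OcrLocation location) {}

    private static final Pattern CN_DATE =
            Pattern.compile("(\\d{4})年(\\d{1,2})月(\\d{1,2})日");

    /** Converts "YYYY年MM月DD日" to "YYYY-MM-DD"; returns the input unchanged if it does not match. */
    static String normalizeDate(String raw) {
        if (raw == null) return null;
        Matcher m = CN_DATE.matcher(raw.trim());
        if (!m.matches()) return raw;
        return String.format("%s-%02d-%02d",
                m.group(1), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
    }

    /** Keeps only the digits of a certificate number. */
    static String digitsOnly(String raw) {
        return raw == null ? null : raw.replaceAll("\\D", "");
    }

    /** Flattens the detailed map to field name -> word, skipping null entries. */
    static Map<String, String> convertToSimpleParsed(Map<String, OcrFieldData> detailed) {
        Map<String, String> simple = new LinkedHashMap<>();
        for (Map.Entry<String, OcrFieldData> e : detailed.entrySet()) {
            if (e.getValue() != null) {
                simple.put(e.getKey(), e.getValue().word());
            }
        }
        return simple;
    }

    public static void main(String[] args) {
        Map<String, OcrFieldData> detailed = new LinkedHashMap<>();
        detailed.put("registerDate", new OcrFieldData(
                normalizeDate("2017年04月01日"),
                new OcrProbability(19.06731987, 0.7248777151),
                new OcrLocation(354, 42, 511, 272)));
        // probability and location stay null when the upstream response omits them;
        // the input string here is invented purely for illustration
        detailed.put("marriageNo",
                new OcrFieldData(digitsOnly("J320321-2017-004108"), null, null));
        System.out.println(convertToSimpleParsed(detailed));
    }
}
```

Because `probability` and `location` may be `null`, callers reading `parsed_detailed` should check both before dereferencing them, while `parsed` can still be consumed as plain strings.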