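# OCR Parsing Update

## Description

The response of the `/marriage/ocr/parse` endpoint has been restructured. Each recognized field now also carries `probability` (recognition confidence) and `location` (position in the image). The new format is modeled on the response format of Baidu's marriage certificate recognition API.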
## New DTO Classes

### 1. OcrProbability.java

Holds the probability (confidence) information of an OCR result.

```jsonc
{
  "average": 20.68798065, // average probability
  "min": 0.9106679559     // minimum probability
}
```
### 2. OcrLocation.java

Holds the position of an OCR result within the image.

```jsonc
{
  "width": 109, // width
  "height": 47, // height
  "top": 933,   // distance from the top edge of the image
  "left": 253   // distance from the left edge of the image
}
```
### 3. OcrFieldData.java

Holds the complete recognition data for a single field.

```jsonc
{
  "word": "王连杰",       // recognized text
  "probability": {...},  // recognition probability (OcrProbability)
  "location": {...}      // position of the recognized text (OcrLocation)
}
```
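
The three DTOs could be modeled as plain Java classes along the lines below. This is a minimal sketch, assuming field names that mirror the JSON keys and `double`/`int` types inferred from the sample values; the project's actual classes may use boxed types, Lombok, or Jackson annotations instead.

```java
// Minimal sketch of the three DTOs; field names mirror the JSON keys,
// types are inferred from the sample values (an assumption).
class OcrProbability {
    private final double average; // average probability
    private final double min;     // minimum probability

    OcrProbability(double average, double min) {
        this.average = average;
        this.min = min;
    }

    double getAverage() { return average; }
    double getMin() { return min; }
}

class OcrLocation {
    private final int width;  // box width in pixels
    private final int height; // box height in pixels
    private final int top;    // distance from the top edge of the image
    private final int left;   // distance from the left edge of the image

    OcrLocation(int width, int height, int top, int left) {
        this.width = width;
        this.height = height;
        this.top = top;
        this.left = left;
    }

    int getWidth() { return width; }
    int getHeight() { return height; }
    int getTop() { return top; }
    int getLeft() { return left; }
}

class OcrFieldData {
    private final String word;                // recognized text
    private final OcrProbability probability; // may be null if the API omits it
    private final OcrLocation location;       // may be null if the API omits it

    OcrFieldData(String word, OcrProbability probability, OcrLocation location) {
        this.word = word;
        this.probability = probability;
        this.location = location;
    }

    String getWord() { return word; }
    OcrProbability getProbability() { return probability; }
    OcrLocation getLocation() { return location; }
}
```

The classes are immutable for simplicity; deserializing them with Jackson would additionally need a no-argument constructor with setters, or a `@JsonCreator` constructor.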
## API Request and Response Format

### Request

```http
POST /marriage/ocr/parse
Content-Type: application/json

{
  "mobile": "18888888888",
  "smsCode": "123456",
  "uploadId": "xxxxx"
}
```
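
A Java client could send this request with the JDK's built-in `java.net.http` client (Java 11+). The sketch below only builds the request; the class name `OcrParseRequests` and the base URL are hypothetical, and the JSON body is assembled by hand to stay dependency-free, whereas a real client would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds the POST /marriage/ocr/parse request.
final class OcrParseRequests {
    private OcrParseRequests() {}

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String body(String mobile, String smsCode, String uploadId) {
        return "{\"mobile\":" + quote(mobile)
                + ",\"smsCode\":" + quote(smsCode)
                + ",\"uploadId\":" + quote(uploadId) + "}";
    }

    // baseUrl is an assumption, e.g. "http://localhost:8080"
    static HttpRequest build(String baseUrl, String mobile, String smsCode, String uploadId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/marriage/ocr/parse"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(mobile, smsCode, uploadId)))
                .build();
    }
}
```

The built request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.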
### Response

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "raw": "{raw JSON response returned by the Baidu API}",
    "words": ["王连杰", "320321199504011218", ...],
    "parsed": {
      "husbandName": "王连杰",
      "wifeName": "张丹",
      "husbandId": "320321199504011218",
      "wifeId": "320321199406197047",
      "husbandBirthDate": "1995-04-01",
      "wifeBirthDate": "1994-06-19",
      "husbandNationality": "中国",
      "wifeNationality": "中国",
      "husbandGender": "男",
      "wifeGender": "女",
      "marriageNo": "320321201700004108",
      "certificateHolder": "王连杰",
      "registerDate": "2017-04-01"
    },
    "parsed_detailed": {
      "husbandName": {
        "word": "王连杰",
        "probability": { "average": 20.68798065, "min": 0.9106679559 },
        "location": { "width": 109, "height": 47, "top": 933, "left": 253 }
      },
      "wifeName": {
        "word": "张丹",
        "probability": { "average": 19.14912224, "min": 0.8554975986 },
        "location": { "width": 83, "height": 43, "top": 1204, "left": 239 }
      },
      "husbandId": {
        "word": "320321199504011218",
        "probability": { "average": 13.2870388, "min": 0.5172381401 },
        "location": { "width": 341, "height": 68, "top": 1081, "left": 343 }
      },
      "wifeId": {
        "word": "320321199406197047",
        "probability": { "average": 15.98988342, "min": 0.6194867492 },
        "location": { "width": 336, "height": 56, "top": 1351, "left": 326 }
      },
      "husbandBirthDate": {
        "word": "1995-04-01",
        "probability": { "average": 20.66628647, "min": 0.7240950465 },
        "location": { "width": 250, "height": 55, "top": 1044, "left": 857 }
      },
      "wifeBirthDate": {
        "word": "1994-06-19",
        "probability": { "average": 25.62935066, "min": 0.9226108789 },
        "location": { "width": 255, "height": 56, "top": 1322, "left": 829 }
      },
      "husbandNationality": {
        "word": "中国",
        "probability": { "average": 17.19703484, "min": 0.7144192457 },
        "location": { "width": 79, "height": 43, "top": 1011, "left": 249 }
      },
      "wifeNationality": {
        "word": "中国",
        "probability": { "average": 23.41218376, "min": 0.9498358369 },
        "location": { "width": 79, "height": 46, "top": 1264, "left": 242 }
      },
      "husbandGender": {
        "word": "男",
        "probability": { "average": 24.97878838, "min": 0.9302400351 },
        "location": { "width": 39, "height": 40, "top": 973, "left": 792 }
      },
      "wifeGender": {
        "word": "女",
        "probability": { "average": 21.57674408, "min": 0.8877936602 },
        "location": { "width": 42, "height": 42, "top": 1243, "left": 765 }
      },
      "marriageNo": {
        "word": "320321201700004108",
        "probability": { "average": 16.35309982, "min": 0.6457977891 },
        "location": { "width": 363, "height": 44, "top": 650, "left": 272 }
      },
      "certificateHolder": {
        "word": "王连杰",
        "probability": { "average": 16.20750237, "min": 0.6932016015 },
        "location": { "width": 119, "height": 44, "top": 362, "left": 271 }
      },
      "registerDate": {
        "word": "2017-04-01",
        "probability": { "average": 19.06731987, "min": 0.7248777151 },
        "location": { "width": 354, "height": 42, "top": 511, "left": 272 }
      }
    }
  }
}
```
## Main Changes

### 1. Response fields

| Field | Type | Description |
|------|------|------|
| raw | `String` | Raw JSON response returned by the Baidu API |
| words | `Array<String>` | All recognized text lines (legacy format, kept for compatibility) |
| parsed | `Object<String, String>` | Simplified parse result, text only (kept for backward compatibility) |
| parsed_detailed | `Object<String, OcrFieldData>` | **New.** Detailed parse result, including `probability` and `location` |
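### 2. Backward compatibility

- The `parsed` field is unchanged and still returns the simplified `Map<String, String>` format.
- The new `parsed_detailed` field returns the complete data for each field:
  - `word`: the recognized text
  - `probability`: recognition probability (`average` and `min`)
  - `location`: position of the recognized text in the image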
### 3. Recognized fields

The supported marriage certificate fields are:

- `husbandName` - husband's name
- `husbandId` - husband's ID card number
- `husbandBirthDate` - husband's date of birth
- `husbandNationality` - husband's nationality
- `husbandGender` - husband's gender
- `wifeName` - wife's name
- `wifeId` - wife's ID card number
- `wifeBirthDate` - wife's date of birth
- `wifeNationality` - wife's nationality
- `wifeGender` - wife's gender
- `marriageNo` - marriage certificate number
- `certificateHolder` - certificate holder
- `registerDate` - registration date
- `remark` - remarks
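## Usage Examples

### Reading recognition probabilities

```java
// Read the recognition probability of the wife's name.
// Assumes parsed_detailed was deserialized into a Map<String, OcrFieldData> parsedDetailed
// whose DTOs expose standard getters (an assumption; the accessors are not documented here).
OcrProbability probability = parsedDetailed.get("wifeName").getProbability();
double wifeNameAverage = probability.getAverage();
double wifeNameMin = probability.getMin();
```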
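
A typical use of `probability` is routing uncertain fields to manual review. The sketch below assumes the client has collected each field's `min` value into a map and returns the fields whose minimum probability is missing or below a cutoff. The class name and the cutoff are hypothetical; since the sample `min` values range from about 0.52 to 0.95, any threshold should be calibrated on real data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flags fields whose minimum recognition probability is
// missing (the probability can be null) or below the given threshold.
final class LowConfidence {
    private LowConfidence() {}

    static List<String> fieldsBelow(Map<String, Double> minByField, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Double> e : minByField.entrySet()) {
            Double min = e.getValue();
            if (min == null || min < threshold) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged); // deterministic order for callers
        return flagged;
    }
}
```

Checking `min` rather than `average` is the stricter option: a field is flagged as soon as its weakest part falls below the cutoff.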
### Reading location information

```java
// Read the position of the husband's name.
// Assumes a deserialized Map<String, OcrFieldData> parsedDetailed with standard getters.
OcrLocation location = parsedDetailed.get("husbandName").getLocation();
int width = location.getWidth();
int height = location.getHeight();
int top = location.getTop();
int left = location.getLeft();
```
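
Because `left`, `top`, `width`, and `height` are pixel values, they map directly onto `BufferedImage.getSubimage(x, y, w, h)`, which can cut out the recognized region, for example to display it next to the text during review. A minimal sketch, assuming the client still holds the uploaded image as a `BufferedImage`; the class name is hypothetical. The box is clamped to the image because `getSubimage` throws `RasterFormatException` for areas outside the image.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: crops the region described by an OCR location.
final class OcrCrop {
    private OcrCrop() {}

    static BufferedImage crop(BufferedImage image, int left, int top, int width, int height) {
        // Clamp both corners of the box to the image bounds.
        int x0 = Math.max(0, left);
        int y0 = Math.max(0, top);
        int x1 = Math.min(image.getWidth(), left + width);
        int y1 = Math.min(image.getHeight(), top + height);
        if (x1 <= x0 || y1 <= y0) {
            throw new IllegalArgumentException("location lies outside the image");
        }
        // The returned image shares its pixel data with the original.
        return image.getSubimage(x0, y0, x1 - x0, y1 - y0);
    }
}
```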
### Reading the simplified text (backward compatible)

```java
// The parsed field still returns the simplified text result.
// Assumes parsed was deserialized into a Map<String, String>.
String husbandName = parsed.get("husbandName");
```
## Implementation

### New methods

1. **parseMarriageFieldsFromRawDetailed(String ocrResp)**
   - Extracts detailed field data from the raw Baidu API response.
   - Returns a `Map<String, OcrFieldData>`.

2. **extractFieldData(JsonNode arr)**
   - Extracts the complete data of a single field from a JSON array.
   - Covers the `word`, `probability`, and `location` parts.
   - Returns an `OcrFieldData` object.

3. **convertToSimpleParsed(Map<String, OcrFieldData> parsedDetailed)**
   - Converts the detailed field data into a simplified string map.
   - Keeps the response backward compatible.
   - Returns a `Map<String, String>`.
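
`convertToSimpleParsed` could look roughly like the sketch below: it keeps the key order of the detailed map and copies only each field's `word`. This is an assumption about the implementation, not the project's code; the stand-in `OcrFieldData` carries only `word`, and entries without text are skipped, whereas the real method may keep them as `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of convertToSimpleParsed; not the project's actual implementation.
final class SimpleParsedConverter {
    private SimpleParsedConverter() {}

    // Minimal stand-in for the real DTO: only the recognized text is needed here.
    static final class OcrFieldData {
        final String word;
        OcrFieldData(String word) { this.word = word; }
    }

    static Map<String, String> convertToSimpleParsed(Map<String, OcrFieldData> parsedDetailed) {
        Map<String, String> simple = new LinkedHashMap<>(); // preserves field order
        for (Map.Entry<String, OcrFieldData> e : parsedDetailed.entrySet()) {
            OcrFieldData data = e.getValue();
            if (data != null && data.word != null) { // skip fields without text (assumption)
                simple.put(e.getKey(), data.word);
            }
        }
        return simple;
    }
}
```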
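## Notes

1. If the Baidu API response does not include `probability` or `location` for a field, the corresponding value is `null`.
2. The `width`, `height`, `top`, and `left` values in the location information are pixel values in the image.
3. Probability values are floating-point numbers between 0 and 100; higher values indicate higher recognition accuracy.
4. Date fields are converted automatically from `YYYY年MM月DD日` (year, month, day in Chinese) to `YYYY-MM-DD`.
5. The marriage certificate number is reduced to its digits automatically (all non-digit characters are removed).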
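
The conversions in items 4 and 5 can be sketched with regular expressions. The class and method names are hypothetical; the date pattern also accepts one-digit months and days (an assumption beyond the documented `YYYY年MM月DD日` input) and returns the input unchanged when it does not match, which may differ from the actual behavior. The Chinese separators are written as Unicode escapes so the source compiles regardless of the compiler's default encoding.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers mirroring notes 4 and 5.
final class MarriageFieldNormalizer {
    private MarriageFieldNormalizer() {}

    // Year, month and day separated by U+5E74 (year), U+6708 (month), U+65E5 (day).
    // One-digit month/day values are also accepted (assumption).
    private static final Pattern CN_DATE =
            Pattern.compile("(\\d{4})\u5E74(\\d{1,2})\u6708(\\d{1,2})\u65E5");

    // Converts the Chinese date form to YYYY-MM-DD; returns the input unchanged if it does not match.
    static String normalizeDate(String raw) {
        Matcher m = CN_DATE.matcher(raw.trim());
        if (!m.matches()) {
            return raw;
        }
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-%02d-%02d", m.group(1), month, day);
    }

    // Keeps only the ASCII digits of the marriage certificate number.
    static String digitsOnly(String raw) {
        return raw.replaceAll("[^0-9]", "");
    }
}
```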