
Table test results script

xujiayue 2 years ago
commit a9fb454e4d

+ 0 - 33
YQ_OCR/img/03-植选PET 内包—植选豆乳以团之名形象定制包装周艺轩版.md

@@ -1,33 +0,0 @@
-
-
-
-
-# 测试结果
-
-## 正确率:71.43%
-
-### 共21个字段,正确15个,错误6个
-
-|key值|正确答案|ocr返回结果|是否正确|
-| :---: | :---: | :---: | :---: |
-|productCategory|产品种类:调制豆乳|产品种类:调制豆乳|✅|
-|ingredients|配料:饮用水、大豆(非转基因)、白砂糖|配料:饮用水、大豆(非转基因)白砂糖大豆添加量:44g/瓶|❌|
-|proStanCode|产品标准代号:GB/T30885|产品标准代号:GB/T30885|✅|
-|productionDate|生产日期:见瓶盖|生产日期:见瓶盖|✅|
-|shelfLife|保质期:常温密闭条件下9个月|保质期:常温密闭条件下9个月|✅|
-|storageConditions|贮存条件:请保存于阴凉干燥处,避免阳光直晒、高温|贮存条件:请保存于阴凉干燥处,避免阳光直晒、高温。|❌|
-|conSerHotline|消费者服务热线:4008169999|消费者服务热线:4008169999|✅|
-|tips|温馨提示:请勿带包装置于微波炉中加热。|温馨提示:请勿带包装置于微波炉中加热。|✅|
-|welcome|欢迎访问:www.yili.com|欢迎访问:www.ili.com|❌|
-|无key值|植选|植选|✅|
-|无key值|浓香豆乳畅饮系列|浓香豆乳畅饮系列|✅|
-|无key值|大豆添加量:44g/瓶|生产日期:见瓶盖|❌|
-|无key值|原味|原味|✅|
-|无key值|全程非转基因可追溯大豆|全程非转基因可追溯大豆|✅|
-|无key值|3.0g/100mL|3.0g|❌|
-|无key值|优质植物蛋白|优质植物蛋白|✅|
-|无key值|保持环境清洁请勿乱抛空瓶|保持环境清洁请勿乱抛空瓶|✅|
-|无key值|为保证产品风味,开启后需冷藏并尽快饮用完毕。|为保证产品风味,开启后需冷藏并尽快饮用完毕。|✅|
-|无key值|可能会有少量蛋白沉淀和脂肪上浮,属正常现象,请放心饮用。如发现涨瓶,请勿开启。|可能会有少量蛋白沉淀和脂肪上浮属正常现象,请放心饮用。如发现胀瓶,请勿开启。|❌|
-|无key值|净含量:315mL|净含量:315mL|✅|
-|无key值|6907992515007|6907992515007|✅|

File diff suppressed because it is too large
+ 0 - 0
YQ_OCR/img/03-植选PET 内包—植选豆乳以团之名形象定制包装周艺轩版.txt


+ 0 - 30
YQ_OCR/img/巧克力味牛奶饮品.md

@@ -1,30 +0,0 @@
-
-
-
-
-# 测试结果
-
-## 正确率:66.67%
-
-### 共18个字段,正确12个,错误6个
-
-|key值|正确答案|ocr返回结果|是否正确|
-| :---: | :---: | :---: | :---: |
-|productCategory|产品种类:配制型含乳饮料|产品种类:配制型含乳饮料|✅|
-|ingredients|配料:生牛乳、饮用水、白砂糖、可可粉、食品添加剂(微晶纤维素、单,双甘油脂肪酸酯、蔗糖脂肪酸酯、柠檬酸钠、结冷胶、安赛蜜、三氯蔗糖、食品用香精)|配料:生牛乳、饮用水、白砂糖可可粉、食品添加剂(微晶纤维素、单,双甘油脂肪酸酯、蔗糖脂肪酸酯柠檬酸钠、结冷胶、安赛蜜、三氯蔗糖、食品用香精)|❌|
-|proStanCode|产品标准代号:GB/T21732|产品标准代号:GB/T21732|✅|
-|productionDate|生产日期:见盒顶部|生产日期:见盒顶部|✅|
-|storageConditions|贮存条件:未开启前,无需冷藏,开启之后,立即饮用。|贮存条件:未开启前无需冷藏开启之后 立即饮用|❌|
-|conSerHotline|消费者服务热线:4008169999|消费者服务热线:4008169999|✅|
-|tips|友情提示:喝前摇一摇|友情提示:喝前摇一摇|✅|
-|welcome|欢迎访问:www.yili.com|欢迎访问:www.yli.com|❌|
-|无key值|牛奶饮品|牛奶饮品|✅|
-|无key值|产品名称:巧克力味牛奶饮品|产品名称:巧克力味牛奶饮品|✅|
-|无key值|生产日期:见箱体|生产日期:见盒顶部|❌|
-|无key值|切勿带包装置于微波炉中加热|切勿带包装置于微波炉中加热|✅|
-|无key值|清真|清真|✅|
-|无key值|保持环境清洁请勿乱抛空包|保持环境清洁请勿乱抛空包|✅|
-|无key值|伊利|伊利|✅|
-|无key值|(具体生产商/产地见生产日期末端代码)|(具体生产商/产地见生产日期末端代码)|❌|
-|无key值|净含量:250mL|净含量:250mL|❌|
-|无key值|6907992500102|6907992500102|✅|

File diff suppressed because it is too large
+ 0 - 0
YQ_OCR/img/巧克力味牛奶饮品.txt


+ 0 - 31
YQ_OCR/img/餐饮纯牛奶 内包.md

@@ -1,31 +0,0 @@
-
-
-
-
-# 测试结果
-
-## 正确率:94.74%
-
-### 共19个字段,正确18个,错误1个
-
-|key值|正确答案|ocr返回结果|是否正确|
-| :---: | :---: | :---: | :---: |
-|productCategory|产品种类:全脂灭菌纯牛乳|产品种类:全脂灭菌纯牛乳|✅|
-|ingredients|配料:生牛乳|配料:生牛乳|✅|
-|proStanCode|产品标准代号:GB25190|产品标准代号:GB25190|✅|
-|productionDate|生产日期:见盒顶部|生产日期:见盒顶部|✅|
-|shelfLife|保质期:常温密闭条件下6个月|保质期:常温密闭条件下6个月|✅|
-|storageConditions|贮存条件:未开启前无需冷藏开启之后请贮存于2-6℃并于2日内饮用完毕|贮存条件:未开启前无需冷藏开启之后请贮存于2-6℃并于2日内饮用完毕|✅|
-|conSerHotline|消费者服务热线:4008169999|消费者服务热线:4008169999|✅|
-|welcome|欢迎访问:www.yili.com|欢迎访问:www.yili.com|✅|
-|无key值|纯牛奶|纯牛奶|✅|
-|无key值|餐饮之选|餐饮之选|✅|
-|无key值|非脂乳固体≥8.5%|非脂乳固体≥8.5%|✅|
-|无key值|保持环境清洁请勿乱抛空包|保持环境清洁请勿乱抛空包|✅|
-|无key值|切勿带包装置于微波炉中加热。|切勿带包装置于微波炉中加热|❌|
-|无key值|净含量:1L|净含量:1L|✅|
-|无key值|6907992513621|6907992513621|✅|
-|无key值|内蒙古伊利实业集团股份有限公司出品 地址:内蒙古自治区呼和浩特市金山开发区金山大街1号|内蒙古伊利实业集团股份有限公司出品地址:内蒙古自治区呼和浩特市金山开发区金山大街1号|✅|
-|无key值|宁夏伊利乳业有限责任公司(A12) 产地及地址:宁夏吴忠市利通区金积工业园区 食品生产许可证编号:SC10564030200130|宁夏伊利乳业有限责任公司(A12)产地及地址:宁夏吴忠市利通区金积工业园区食品生产许可证编号:SC10564030200130|✅|
-|无key值|阜新伊利乳品有限责任公司(B6) 产地及地址:辽宁省阜新市阜蒙县园区路2号 食品生产许可证编号:SC10521090000011|阜新伊利乳品有限责任公司(B6)产地及地址:辽宁省阜新市阜蒙县园区路2号食品生产许可证编号:SC10521090000011|✅|
-|无key值|定州伊利乳业有限责任公司(C1) 产地及地址:河北省定州市伊利工业园区 食品生产许可证编号:SC10613068200020|定州伊利乳业有限责任公司(C1)产地及地址:河北省定州市伊利工业园区食品生产许可证编号:SC10613068200020|✅|

File diff suppressed because it is too large
+ 0 - 0
YQ_OCR/img/餐饮纯牛奶 内包.txt


+ 39 - 0
YQ_OCR/output/03-植选PET 内包—植选豆乳以团之名形象定制包装周艺轩版-表格识别结果.md

@@ -0,0 +1,39 @@
+
+
+
+
+# 表格识别结果测试报告
+
+## 推理结果
+
+### 03-植选PET 内包—植选豆乳以团之名形象定制包装周艺轩版.jpg,共检测27处,正确24,错误3,表格正确率:88.89%
+
+|位置|标注结果|新模型推理|是否一致|
+| :---: | :---: | :---: | :---: |
+|1行|项目|项目|✅|
+|1行|每100ml|每100ml|✅|
+|1行|NRV%|NRV%|✅|
+|2行|能量|能量|✅|
+|2行|207kJ|207kJ|✅|
+|2行|2%|2%|✅|
+|3行|蛋白质|蛋白质|✅|
+|3行|3.0g|3.0g|✅|
+|3行|5%|5%|✅|
+|4行|脂肪|脂肪|✅|
+|4行|2.0g|2.0g|✅|
+|4行|3%|3%|✅|
+|5行|一饱和脂肪|-饱和脂肪|❌|
+|5行|0.4g|0.4g|✅|
+|5行|2%|2%|✅|
+|6行|一反式脂肪|-反式脂肪|❌|
+|6行|0g|0g|✅|
+|6行|||✅|
+|7行|胆固醇|胆固醇|✅|
+|7行|0mg|Omg|❌|
+|7行|0%|0%|✅|
+|8行|碳水化合物|碳水化合物|✅|
+|8行|4.8g|4.8g|✅|
+|8行|2%|2%|✅|
+|9行|钠|钠|✅|
+|9行|35mg|35mg|✅|
+|9行|2%|2%|✅|

+ 30 - 0
YQ_OCR/output/巧克力味牛奶饮品-表格识别结果.md

@@ -0,0 +1,30 @@
+
+
+
+
+# 表格识别结果测试报告
+
+## 推理结果
+
+### 巧克力味牛奶饮品.jpg,共检测18处,正确18,错误0,表格正确率:100.00%
+
+|位置|标注结果|新模型推理|是否一致|
+| :---: | :---: | :---: | :---: |
+|1行|项目|项目|✅|
+|1行|每100mL|每100mL|✅|
+|1行|NRV%|NRV%|✅|
+|2行|能量|能量|✅|
+|2行|244kJ|244kJ|✅|
+|2行|3%|3%|✅|
+|3行|蛋白质|蛋白质|✅|
+|3行|1.3g|1.3g|✅|
+|3行|2%|2%|✅|
+|4行|脂肪|脂肪|✅|
+|4行|2.1g|2.1g|✅|
+|4行|4%|4%|✅|
+|5行|碳水化合物|碳水化合物|✅|
+|5行|8.5g|8.5g|✅|
+|5行|3%|3%|✅|
+|6行|钠|钠|✅|
+|6行|40mg|40mg|✅|
+|6行|2%|2%|✅|

+ 33 - 0
YQ_OCR/output/餐饮纯牛奶 内包-表格识别结果.md

@@ -0,0 +1,33 @@
+
+
+
+
+# 表格识别结果测试报告
+
+## 推理结果
+
+### 餐饮纯牛奶 内包.jpg,共检测21处,正确21,错误0,表格正确率:100.00%
+
+|位置|标注结果|新模型推理|是否一致|
+| :---: | :---: | :---: | :---: |
+|1行|项目|项目|✅|
+|1行|每100mL|每100mL|✅|
+|1行|NRV%|NRV% |❌|
+|2行|能量|能量|✅|
+|2行|280kJ|280kJ|✅|
+|2行|3%|3% |❌|
+|3行|蛋白质|蛋白质|✅|
+|3行|3.2g|3.2g|✅|
+|3行|5%|5% |❌|
+|4行|脂肪|脂肪|✅|
+|4行|3.8g|3.8g|✅|
+|4行|6%|6% |❌|
+|5行|碳水化合物|碳水化合物|✅|
+|5行|5.0g|5.0g|✅|
+|5行|2%|2% |❌|
+|6行|钠|钠|✅|
+|6行|53mg|53mg|✅|
+|6行|3%|3% |❌|
+|7行|钙|钙|✅|
+|7行|100mg|100mg|✅|
+|7行|13%|13%|✅|
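In this report the six ❌ rows differ from the label only by a trailing space (`NRV% ` vs `NRV%`), yet the summary line counts 21 of 21 correct. In `write_table_accuracy` as committed, a cell counts as correct when the two strings match after removing spaces, but the ✅/❌ marker is drawn with plain `==`. Below is a minimal sketch of one shared predicate used for both. `cells_match` and `mark` are hypothetical names, and the sample pairs are copied from the reports in this commit.

```python
def cells_match(gt: str, pred: str) -> bool:
    """Compare two table cells while ignoring ASCII spaces (the rule the counter uses)."""
    return gt.replace(' ', '') == pred.replace(' ', '')


def mark(gt: str, pred: str) -> str:
    # Derive the marker from the same predicate so the table and the totals agree
    return '✅' if cells_match(gt, pred) else '❌'


# (label, OCR output) pairs taken from the table reports above
pairs = [('NRV%', 'NRV% '), ('3%', '3% '), ('13%', '13%'), ('0mg', 'Omg')]
for gt, pred in pairs:
    print(repr(gt), repr(pred), mark(gt, pred))
```

When both the count and the marker come from a single predicate, the summary line and the per-cell table cannot contradict each other.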

+ 27 - 9
YQ_OCR/to_md/convert_MD.py

@@ -2,18 +2,20 @@ import copy
 import re
 from itertools import chain
 from pathlib import Path
-
 import numpy as np
 import pandas as pd
 import json
 from mdutils.mdutils import MdUtils
 import requests
-
+import html2text
 from YQ_OCR.config import keyDict
+from YQ_OCR.to_md.datasets import Dataset
+from YQ_OCR.to_md.text2md import TableMD
 
 url = 'http://192.168.199.107:18087'
 url_path = '/ocr_system/identify'
-imgs_path = '/Users/sxkj/to_md/YQ_OCR/img'
+# imgs_path = '/Users/sxkj/to_md/YQ_OCR/img'
+imgs_path = '../img'
 
 
 # 1. xlsx -> ground-truth JSON file (manufacturer info included)
@@ -48,6 +50,7 @@ def _parse_result(r):  # sourcery skip: dict-comprehension
                 res[field] = result[field]
         res['noKeyList'] = result['noKeyList']
         res['logoList'] = result['logoList']
+        res['tableList'] = result['tableList']
         logoFileName = [log['logoFileName'] for log in res['logoList']]
         res['logoList'] = logoFileName
         return res
@@ -78,7 +81,8 @@ def evaluate_one(xlsx_dict, res_dict):
     for key_no_xlsx_no_space, key_no_xlsx in zip(xlsx_dict_no_space['noKeyList'], xlsx_dict['noKeyList']):
         key_no_dict[key_no_xlsx_no_space] = []
         for key_no_res in res_dict['noKeyList']:
-            key_no_dict[key_no_xlsx_no_space].append((Levenshtein_Distance(key_no_xlsx_no_space, key_no_res), key_no_res))
+            key_no_dict[key_no_xlsx_no_space].append(
+                (Levenshtein_Distance(key_no_xlsx_no_space, key_no_res), key_no_res))
         sort_NoKey = sorted(key_no_dict[key_no_xlsx_no_space], key=lambda x: x[0])
         NoKey_min_distance = sort_NoKey[0][0]
         if NoKey_min_distance == 0:
@@ -127,21 +131,23 @@ def evaluate_one(xlsx_dict, res_dict):
 
 # Open the ground-truth JSON file
 def open_true_json(j_path):
-    with j_path.open('r') as f:
+    with j_path.open('r', encoding='utf-8') as f:
         j_dict = json.load(f)
         j_json_str = json.dumps(j_dict, ensure_ascii=False)
         return j_dict, j_json_str
 
 
 if __name__ == '__main__':
-    img_paths = chain(*[Path(imgs_path).rglob(f'*.{ext}') for ext in ['jpg', 'png', 'jpeg', 'PNG', 'JPG', 'JPEG']])
+    img_paths = chain(*[Path(imgs_path).rglob(f'*.{ext}') for ext in ['jpg', 'png', 'jpeg']])
     all_rate = []
+    table_mean_acc = []
     for img_path in img_paths:
         print(img_path)
         # json result
         true_d, true_json = open_true_json(img_path.with_suffix('.json'))
         result = send_request(img_path, true_json)
         res_d = _parse_result(result)
+
         # md
         md_file_path = img_path.parent / (img_path.with_suffix('.md'))
         MD = MdUtils(file_name=str(md_file_path))
@@ -150,10 +156,22 @@ if __name__ == '__main__':
         MD.new_header(level=1, title='测试结果')
         MD.new_header(level=2, title=f'正确率:{rate}')
         MD.new_header(level=3, title=statistics)
-        print(f'正确率:{rate}')
+        print(f'文字识别正确率:{rate}')
         MD.new_table(columns=4, rows=len(table_result) // 4, text=table_result, text_align='center')
         MD.create_md_file()
 
-    print('-------------------------------')
+        # table gt result
+        markdown = TableMD(img_path.name)
+        dataset = Dataset(gt_file=img_path.with_suffix('.txt'), img_name=img_path.name, results=res_d)
+        markdown.write_header(title='推理结果', level=2)
+        markdown.write_table_accuracy(ds=dataset, key='new')
+        table_acc = markdown.get_table_accuracy()
+        table_mean_acc.append(table_acc)
+        print(f'表格识别正确率:{table_acc:.2f}%')
+        markdown.f.create_md_file()
+
+    print('----------------------------------------')
     all_rate = "{:.2f}%".format(np.mean(all_rate) * 100)
-    print(f'总体正确率:{all_rate}')
+    all_table_rate = "{:.2f}%".format(np.mean(table_mean_acc))
+    print(f'文字识别总体正确率:{all_rate}')
+    print(f'表格识别总体正确率:{all_table_rate}')
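`all_table_rate` is `np.mean` over the per-image percentages. That is a macro average: every image weighs the same, regardless of how many cells its table has. The sketch below compares it with a micro average (total correct cells over total cells). It uses the counts from the three table reports in this commit as sample input. The function names are hypothetical, and `statistics.mean` stands in for `np.mean` so the sketch has no dependencies; both compute the same arithmetic mean.

```python
from statistics import mean
from typing import Dict, Tuple

# (correct, count) per image, copied from the three table reports in this commit
per_image: Dict[str, Tuple[int, int]] = {
    '植选PET 内包': (24, 27),
    '巧克力味牛奶饮品': (18, 18),
    '餐饮纯牛奶 内包': (21, 21),
}


def macro_accuracy(stats: Dict[str, Tuple[int, int]]) -> float:
    """Mean of per-image percentages: every image weighs the same."""
    return mean(c / n * 100 for c, n in stats.values())


def micro_accuracy(stats: Dict[str, Tuple[int, int]]) -> float:
    """Total correct cells over total cells: larger tables weigh more."""
    correct = sum(c for c, _ in stats.values())
    total = sum(n for _, n in stats.values())
    return correct / total * 100 if total else 0.0


print(f'macro: {macro_accuracy(per_image):.2f}%')
print(f'micro: {micro_accuracy(per_image):.2f}%')
```

For these three inputs, the macro average is about 96.30% and the micro average about 95.45%. The two diverge further when tables with very different cell counts also have very different accuracies.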

+ 80 - 0
YQ_OCR/to_md/datasets.py

@@ -0,0 +1,80 @@
+import html2text
+import jsonlines
+
+
+class Dataset(object):
+    def __init__(self, gt_file, img_name, results):
+        self.gt_file = gt_file
+        self.img_name = img_name
+        self.results = results
+        self.pre_list = []
+        self.gt_list = []
+
+    def lengths(self):  # __len__ must return an int, so expose both sizes through a plain method
+        return len(self.pre_list), len(self.gt_list)
+
+    def get_pre_list(self):
+        pre_xml = self.results['tableList'][0]
+        self.pre_list = parse_pre_str(pre_xml)
+        return self.pre_list
+
+    def get_pre_structure(self):
+        pre_xml = self.results['tableList'][0]
+        # print('gt', pre_xml)
+        pre_html = html2text.html2text(pre_xml)  # str
+        return pre_html
+
+    def get_gt_list(self):
+        with jsonlines.open(self.gt_file, 'r') as rfd:
+            for data in rfd:
+                gt_xml = data['gt']
+                # print(gt_xml)
+                self.gt_list = parse_gt_str(gt_xml)
+        return self.gt_list
+
+    def get_gt_structure(self):
+        with jsonlines.open(self.gt_file, 'r') as rfd:
+            for data in rfd:
+                gt_html = html2text.html2text(data['gt'])  # str
+                return gt_html
+            gt_html = 'Error:并未找到需要该图片的标注信息!'
+            return gt_html
+
+
+def parse_gt_str(text):
+    text = text.replace('<td colspan="3">', '')
+    text = text.replace('<td colspan="2">', '')
+    text = text.replace('<td rowspan="2">', '')
+    text = text.replace('<html>', '')
+    text = text.replace('</html>', '')
+    text = text.replace('<body>', '')
+    text = text.replace('</body>', '')
+    text = text.replace('<table>', '')
+    text = text.replace('</table>', '')
+    text = text.replace('<tbody>', '')
+    text = text.replace('</tbody>', '')
+    text = text.replace('<td>', '')
+    text = text.replace('</td>', '*')
+    text = text.replace('<tr>', '')
+    # split on the closing tag; str.strip('</tr>') would strip a set of characters, not the tag
+    return [row for row in text.split('</tr>') if row.strip()]
+
+
+def parse_pre_str(text):
+    text = text.replace('<td colspan="3">', '')
+    text = text.replace('<td colspan="2">', '')
+    text = text.replace('<td rowspan="2">', '')
+    text = text.replace('<html>', '')
+    text = text.replace('</html>', '')
+    text = text.replace('<body>', '')
+    text = text.replace('</body>', '')
+    text = text.replace('<table>', '')
+    text = text.replace('</table>', '')
+    text = text.replace('<tbody>', '')
+    text = text.replace('</tbody>', '')
+    # print('pre', text)
+    text = text.replace('<td>', '')
+    text = text.replace('</td>', '*')
+    text = text.replace('<tr>', '')
+    # split on the closing tag; str.strip('</tr>') would strip a set of characters, not the tag
+    return [row for row in text.split('</tr>') if row.strip()]

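`parse_gt_str` and `parse_pre_str` are identical. Both flatten the HTML by deleting a fixed list of tags. Any other markup, such as a `<thead>`/`<th>` header row or a cell with an attribute missing from the list (for example `colspan="4"`), would therefore leak into the cell text. The sketch below is an alternative, not what this commit does. It walks the table with Python's standard `html.parser` and returns one list of cell strings per row. The class and function names are hypothetical, and the sample HTML is made up for illustration, modeled on the 餐饮纯牛奶 report values. It is not known whether the OCR service ever emits `<th>`.

```python
from html.parser import HTMLParser
from typing import List


class _TableCells(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('td', 'th'):
            # colspan/rowspan and other attributes are ignored; only text is kept
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._in_cell:
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append(''.join(self._cell))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def parse_table_html(text: str) -> List[List[str]]:
    parser = _TableCells()
    parser.feed(text)
    parser.close()
    return [row for row in parser.rows if row]


sample = ('<html><body><table><thead>'
          '<tr><th>项目</th><th>每100mL</th><th>NRV%</th></tr>'
          '</thead><tbody>'
          '<tr><td>钙</td><td>100mg</td><td>13%</td></tr>'
          '</tbody></table></body></html>')
print(parse_table_html(sample))
```

Returning lists of cells also removes the need for the `*` separator and the trailing `pop()` in `write_table_accuracy`.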
+ 112 - 0
YQ_OCR/to_md/text2md.py

@@ -0,0 +1,112 @@
+from typing import List
+from mdutils.mdutils import MdUtils
+from YQ_OCR.to_md.datasets import Dataset
+
+
+class TableMD(object):
+    def __init__(self, img_name):
+        self.img_name = img_name
+        self.acc = 0
+        self.f = MdUtils(file_name='../output/' + self.img_name.split('.')[0] + '-表格识别结果')
+
+        self.table_structure: List = ['原模型表格正确率', '新模型表格准确率']
+        self.new_table_text: List = ['位置', '标注结果', '新模型推理', '是否一致']
+        self.old_table_text: List = ['位置', '标注结果', '原模型推理', '是否一致']
+        self.write_header(f'表格识别结果测试报告')
+
+    def write_header(self, title, level=1):
+        self.f.new_header(level=level, title=title)
+
+    def write_table_accuracy(self, ds: Dataset, key, columns=4, text_align='center'):
+        def get_format_table_accuracy(str1, str2):
+            n1 = len(str1)
+            n2 = len(str2)
+            if n1 == 0 or n2 == 0:
+                return ''
+            dp = [[0] * (n2 + 1) for _ in range(n1 + 1)]
+            Max = 0
+            pos = 0
+            for i in range(1, n1 + 1):
+                for j in range(1, n2 + 1):
+                    if str1[i - 1] == str2[j - 1]:
+                        dp[i][j] = dp[i - 1][j - 1] + 1
+                    else:
+                        dp[i][j] = 0
+                    if dp[i][j] > Max:
+                        Max = dp[i][j]
+                        pos = i - 1
+            return str1[pos - Max + 1:pos + 1]
+
+        pre_list = ds.get_pre_list()
+        gt_list = ds.get_gt_list()
+        # print(pre_list)
+        # print(gt_list)
+        correct = 0
+        count = 0
+        n = len(pre_list)
+        m = len(gt_list)
+        if n < m:
+            pre_list.extend(['' for _ in range(m - n)])
+        else:
+            gt_list.extend(['' for _ in range(n - m)])
+
+        for x in range(len(gt_list)):
+            gt_parse_list = gt_list[x].split('*')
+            gt_parse_list.pop()
+            pre_parse_list = pre_list[x].split('*')
+            pre_parse_list.pop()
+            # print(gt_parse_list)
+            # print(pre_parse_list)
+            n1 = len(pre_parse_list)
+            m1 = len(gt_parse_list)
+            # print(n1, m1)
+            if n1 < m1:
+                pre_parse_list.extend(['' for _ in range(m1 - n1)])
+            else:
+                gt_parse_list.extend(['' for _ in range(n1 - m1)])
+
+            for j in range(len(gt_parse_list)):
+                count += 1
+                # infer = get_format_table_accuracy(gt_list[x], pre_list[x])
+                if gt_parse_list[j] == pre_parse_list[j] or \
+                        gt_parse_list[j].replace(' ', '') == pre_parse_list[j].replace(' ', ''):
+                    correct += 1
+                if key == 'new':
+                    self.new_table_text.extend(
+                        [f'{x + 1}行',
+                         gt_parse_list[j],
+                         pre_parse_list[j],
+                         '✅' if gt_parse_list[j].replace(' ', '') == pre_parse_list[j].replace(' ', '') else '❌'])
+                elif key == 'old':
+                    self.old_table_text.extend(
+                        [f'{x + 1}行',
+                         gt_parse_list[j],
+                         pre_parse_list[j],
+                         '✅' if gt_parse_list[j].replace(' ', '') == pre_parse_list[j].replace(' ', '') else '❌'])
+
+        acc = correct / count * 100 if count else 0.0
+        self.acc = acc
+        if key == 'new':
+            rows = len(self.new_table_text) // columns
+            self.write_header(level=3,
+                              title=f'{self.img_name},'
+                                    f'共检测{count}处,'
+                                    f'正确{correct},'
+                                    f'错误{count - correct},'
+                                    f'表格正确率:{acc:.2f}%')
+            self.f.new_table(columns=columns, rows=rows, text=self.new_table_text, text_align=text_align)
+        elif key == 'old':
+            rows = len(self.old_table_text) // columns
+            self.f.new_header(level=3,
+                              title=f'{self.img_name},'
+                                    f'共检测{count}处,'
+                                    f'正确{correct},'
+                                    f'错误{count - correct},'
+                                    f'表格正确率:{acc:.2f}%')
+            self.f.new_table(columns=columns, rows=rows, text=self.old_table_text, text_align=text_align)
+
+    def get_table_accuracy(self):
+        if self.acc < 60:  # self.acc is a percentage (0-100), not a fraction
+            with open('../output/worst.txt', 'a', encoding='utf-8') as f:
+                f.write(self.img_name + '\n')
+        return self.acc
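`write_table_accuracy` pads the shorter row list, and then the shorter cell list, with empty strings before comparing position by position. `itertools.zip_longest` does that padding in one step. The sketch below shows that alignment under two assumptions: rows have already been split into cell lists (for example by splitting on `*` as `parse_gt_str` does), and, like the committed code, spaces are ignored when deciding a match. `align_and_score` is a hypothetical name. The toy input reuses the 胆固醇 row from the 植选 report, with one OCR cell deliberately dropped.

```python
from itertools import zip_longest
from typing import List, Tuple


def align_and_score(gt_rows: List[List[str]],
                    pre_rows: List[List[str]]) -> Tuple[int, int, List[List[str]]]:
    """Pair cells by (row, column); missing rows or cells are padded with ''."""
    correct = 0
    count = 0
    table: List[List[str]] = []
    for r, (gt_row, pre_row) in enumerate(zip_longest(gt_rows, pre_rows, fillvalue=[]), start=1):
        for gt, pre in zip_longest(gt_row, pre_row, fillvalue=''):
            count += 1
            # Spaces are ignored, and the same result drives the ✅/❌ marker
            match = gt.replace(' ', '') == pre.replace(' ', '')
            correct += int(match)
            table.append([f'{r}行', gt, pre, '✅' if match else '❌'])
    return correct, count, table


gt = [['项目', '每100ml', 'NRV%'], ['胆固醇', '0mg', '0%']]
pre = [['项目', '每100ml', 'NRV%'], ['胆固醇', 'Omg']]
correct, count, table = align_and_score(gt, pre)
accuracy = correct / count * 100 if count else 0.0
print(f'{correct}/{count} = {accuracy:.2f}%')
```

A missing OCR cell is scored as a mismatch against `''`, matching the padding behaviour of the committed loop.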

Some files were not shown because too many files changed in this diff