Page 126 - Proceedings of the International Scientific Conference on Applying New Technology in Green Buildings, 8th edition
THE 8TH INTERNATIONAL SCIENTIFIC CONFERENCE ATiGB - The 8th ATiGB 2023
Model accuracy will be benchmarked on test data across all languages. The optimization process will focus on maximizing accuracy given model size and latency constraints. Comparisons to single-language models will validate the multi-lingual approach. User feedback will be incorporated to refine the interface and experience. Figure 5 shows an example in which the proposed system recognizes the text on a student ID card with 100% accuracy.

IV. RESULTS AND ANALYSIS

The multi-lingual OCR model was evaluated on a held-out test set of student ID card images in the two target languages of our study. OCR accuracy was measured as the percentage of text characters correctly extracted from the ID card region. Table I summarizes the results compared to individual uni-lingual models.

Table I. OCR Accuracy By Language

Language      Multi-Lingual Model    Uni-Lingual Model
English       95.2%                  96.5%
Vietnamese    93.8%                  94.2%

As seen in Table I, the multi-lingual model achieves accuracy comparable to the specialized uni-lingual models, with a decrease of 1.3 percentage points for English and 0.4 percentage points for Vietnamese. Accuracy above 94% can be considered viable for practical use.

A. Error Analysis

a) Characters are mostly correct, but some are mistaken for spaces or missed.
b) Extra spaces are predicted between characters.
c) Some characters are missed entirely.

These errors may stem from the challenging segmentation of cursive English or Vietnamese script (Figs. 6 and 7). Language disambiguation is also difficult when characters are analyzed individually rather than in context.

B. Optimizations

The following measures can further improve multi-lingual accuracy:
• Increased training data for low-accuracy languages.
• Regularization and aggressive augmentation to reduce overfitting.
• CTC loss training to improve sequence prediction.
• Ensemble modelling to combine strengths of different architectures.
• On-device dictionaries and NLP for disambiguation and context.

By targeting model improvements and leveraging mobile-optimized NLP, the multi-lingual approach can reach accuracy closer to that of specialized individual models.

V. CONCLUSIONS

This research demonstrated the feasibility of a multi-lingual OCR system for extracting text from international student ID cards. The approach built on existing Vietnamese/English models to train and optimize a multi-lingual OCR model, with only a small decrease (0.4 to 1.3 percentage points in Table I) relative to the individual uni-lingual models per language. Averaged over English and Vietnamese, the multi-lingual model achieved 94.5% accuracy and the uni-lingual models 95.35%. This represents a promising proof of concept given the significant challenge of recognizing diverse scripts and designs. If accuracy can be further improved, the system would provide a convenient and inclusive student ID card reader usable by international populations.

The multi-lingual model was efficiently optimized for on-device deployment using TensorFlow Lite and mobile-specific architectures. This enables fast and reliable OCR performance on students' own devices without relying on network connectivity. The user interface and experience were designed to be intuitive across languages and scripts. These innovations successfully built upon the initial Vietnamese/English OCR system described in our study to address the needs of international universities.

Fig. 6. 85.71% of characters correct; the model fails to recognize 'ô' and outputs the letter 'o' instead.
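
The character-level accuracy used in Section IV (percentage of text characters correctly extracted) can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes accuracy is computed as one minus the Levenshtein edit distance divided by the reference length, an alignment choice the paper does not specify. The function names `edit_distance` and `char_accuracy` and the sample string are hypothetical. The sample is a seven-character string in which one 'ô' is read as 'o', giving 6/7, the same percentage as in the Fig. 6 caption; the actual text shown in that figure is not reproduced here.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between reference and predicted text.

    Each edit roughly corresponds to one error type from the error analysis:
    substitution (wrong character), insertion (e.g. an extra space),
    deletion (a missed character).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: 1 - edit_distance / len(ref), clipped at 0."""
    # Normalize so that 'ô' counts as one character however it was encoded.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output (one 'ô' read as 'o').
    acc = char_accuracy("môn học", "mon học")
    print(f"{acc * 100:.2f}%")  # prints 85.71%
```

Under this assumption, an extra predicted space or a dropped character each lowers the score by one edit, so all three error types listed in the error analysis are penalized equally.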
ISBN: 978-604-80-9122-4