Application of Optical Character Recognition Method: A Case Study for Educational Information System

Cong-Doan Truong, International School, Vietnam National University, Hanoi, Vietnam
Tran Viet Long, International School, Vietnam National University, Hanoi, Vietnam
Ngo Quang Truong, International School, Vietnam National University, Hanoi, Vietnam
Ha Manh Hung, International School, Vietnam National University, Hanoi, Vietnam
Hoang Huong Chi, International School, Vietnam National University, Hanoi, Vietnam
Ninh Thi Linh, International School, Vietnam National University, Hanoi, Vietnam
Pham Duy Duong, Faculty of Electrical and Electronic Engineering, The University of Danang - University of Technology and Education, Danang, Vietnam
Satyam Mishra, International School, Vietnam National University, Hanoi, Vietnam
Do Cong Tuan, International School, Vietnam National University, Hanoi, Vietnam

Abstract: This research presents a multi-lingual Optical Character Recognition (OCR) system designed to extract text from international student ID cards, catering to English and Vietnamese languages. The multi-lingual OCR model was compared to individual uni-lingual models, demonstrating comparable accuracy, with a modest 1-3% decrease. Key error analysis revealed challenges in character segmentation and language disambiguation, particularly in cursive scripts. To enhance accuracy, optimization measures including increased training data, regularization, CTC loss training, ensemble modelling, and on-device dictionaries were proposed. The study achieved an average accuracy of 94.5% for the multi-lingual model and 95.35% for the uni-lingual model, showcasing the feasibility of this approach despite the complexities of diverse scripts and designs. The system was efficiently optimized for on-device deployment, ensuring fast and reliable OCR performance independent of network connectivity. User interfaces were designed to be language-agnostic. Future work includes expanding language coverage, increasing training data, and exploring advanced neural network architectures. Integration with student information databases and administration systems for streamlined access and real-time language translation for enhanced usability are also suggested. In summary, this research lays the foundation for inclusive OCR systems in international universities, offering equitable experiences for diverse student populations and facilitating administrative processes.

Keywords: Optical Character Recognition (OCR), Multi-lingual Text Recognition, Deep Learning, Mobile Deployment, Language Translation, Machine Learning

I. INTRODUCTION

Existing systems for student ID cards in Vietnam have limitations in terms of content, format, and test administration, which negatively impact the communication goal of foreign language education in Vietnamese schools [1]. These limitations call for a radical renovation in both the format and administration of the national matriculation and general certificate of secondary education English test [2]. On the other hand, electronic ID systems in educational institutions have advanced technologically, but there is a need for enhancements to improve their capabilities [3]. These enhancements aim to achieve a more efficient process in generating and managing ID cards, leading to a more robust solution and a more efficient delivery of service [4]. Additionally, the invention of an intelligent electronic student ID card with a control module, charging module, positioning module, and communication module offers automatic and accurate attendance recording, real-time student position detection, and improved positioning accuracy [5].

The objective of this research is to develop and evaluate a multi-lingual optical character recognition (OCR) system capable of extracting text from international student identification cards. While existing systems such as the one described in our
