Proceedings of the International Scientific Conference - Application of New Technologies in Green Buildings, 8th edition

               THE EIGHTH ATiGB INTERNATIONAL SCIENTIFIC CONFERENCE - The 8th ATiGB 2023                         107

study are limited to Vietnamese and English, international universities require support for diverse languages that reflect their global student bodies. By training OCR models on datasets covering languages such as English and Vietnamese, this project aims to create an inclusive identification system accessible to all students. Both the accuracy of the multi-lingual OCR and its optimized deployment on mobile devices will be evaluated. If successful, the system will significantly improve the convenience and accessibility of student services, registration, access control, and other functions. Our study provides a strong basis for the required techniques, including deep learning for text recognition, mobile model deployment, and user-friendly interfaces. By extending these capabilities to new languages, this research can break down informational barriers and streamline administration for international students from all backgrounds. Developing and evaluating the multi-lingual OCR system will assess the feasibility of this approach and indicate directions for further improvement.

II. LITERATURE REVIEW

Existing research on English and Vietnamese has addressed closely related language-processing tasks, particularly machine translation. Chinh Ngo et al. introduce MTet, a large parallel corpus for English-Vietnamese translation, and release EnViT5, the first pretrained model for these languages; their model surpasses previous state-of-the-art BLEU scores for translation. H. V. T. Chi et al. propose a method based on MT-DNN that detects similarity between English and Vietnamese sentences for paraphrase identification, improving accuracy and F1 scores by modifying the shared layers of the original MT-DNN. Thi-Vinh Ngo et al. address the rare-word problem in multilingual MT systems for French-Vietnamese and English-Vietnamese pairs, proposing strategies that learn word similarity and improve the translation of rare words, which yields significant BLEU gains. Duc Toan Truong et al. explore context-aware models for English-Vietnamese translation that aim to improve translation quality and human readability by considering contextual information from consecutive sentences [6]–[9].

Machine translation techniques that can support additional languages include Statistical Machine Translation (SMT), Rule-based Machine Translation (RBMT), Example-based Machine Translation (EBMT), and Neural Machine Translation (NMT) [10]. Multilingual NMT models leverage information from multiple languages to improve translation performance [11]. Data augmentation techniques can further enhance the performance of multilingual NMT models, yielding higher BLEU scores [12]. In addition, techniques used in multilingual text translation, such as increasing the similarity of semantically similar sentences across languages, can be applied to speech translation to improve few-shot speech translation with limited data [13]. These approaches aim to overcome data scarcity and improve the efficiency and accuracy of machine translation for additional languages [14].

Multi-lingual OCR systems face several challenges. One of the main difficulties is the language barrier, which can lead to inconsistent and incomplete requirements during the elicitation process [15]. Another challenge is the growing diversity of internet users, whose different languages and cultural preferences require OCR systems to handle a wide range of languages [16]. Additionally, multi-lingual OCR systems need to consider competing objectives, such as recommendation quality at the individual and aggregate levels, stakeholder objectives, and long-term versus short-term objectives [17]. These competing goals make it necessary to develop multi-objective recommender systems that can optimize several objectives simultaneously. Overall, the challenges of multi-lingual OCR systems include language barriers, the diversity of languages and cultural preferences, and the need to balance competing objectives.

III. METHODOLOGY

A. Technologies Used

• Firebase ML;
• TensorFlow Lite model based on BERT;
• React Native framework;
• Android Virtual Device in Android Studio;
• Physical Android device;
• Visual Studio Code.

B. Data Collection and Augmentation

To create the multi-lingual training dataset, student ID card images (Figure 1) will be collected for languages including Vietnamese and English. Samples will be gathered in coordination with international student groups, and additional samples will be synthesized using graphical editing. Data augmentation techniques such as rotation, resizing, and noise injection will increase the diversity of the training data.
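The three augmentation operations named for the ID card images (rotation, resizing, and noise injection) can be illustrated with a minimal, dependency-free Python sketch. It assumes grayscale images stored as lists of pixel rows with values in [0, 255], nearest-neighbour sampling, and Gaussian noise. The function names and the default ranges (up to 5 degrees of rotation, a 0.9 to 1.1 scale factor, noise sigma of 8) are hypothetical illustrations, not settings taken from this study; a production pipeline would more likely rely on an image library.

```python
import math
import random

# Assumption: a grayscale image is a list of rows of ints in [0, 255].
Image = list[list[int]]


def rotate(img: Image, degrees: float, fill: int = 255) -> Image:
    """Rotate about the image centre using nearest-neighbour sampling.

    The output keeps the input size; output pixels whose source falls outside
    the image are set to `fill` (white, like a card background).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: rotate each output coordinate back by -theta
            # to find the source pixel it should copy.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def resize(img: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize to new_h rows by new_w columns."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel, clipped to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def augment(img: Image, rng: random.Random, max_angle: float = 5.0,
            scale_range: tuple[float, float] = (0.9, 1.1),
            sigma: float = 8.0) -> Image:
    """Apply a random small rotation, a random rescale, then noise.

    The default ranges are illustrative guesses, not values from the paper.
    """
    h, w = len(img), len(img[0])
    out = rotate(img, rng.uniform(-max_angle, max_angle))
    scale = rng.uniform(*scale_range)
    out = resize(out, max(1, round(h * scale)), max(1, round(w * scale)))
    return add_noise(out, sigma, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the variants are reproducible
    # Stand-in for a scanned card: white 25x40 image with one dark "text" band.
    card = [[255] * 40 for _ in range(25)]
    for r in range(10, 15):
        card[r][5:35] = [0] * 30
    variants = [augment(card, rng) for _ in range(5)]
    print([(len(v), len(v[0])) for v in variants])
```

Rotation uses inverse mapping (each output pixel looks up its source pixel), which avoids the holes that forward mapping can leave. Passing a seeded `random.Random` keeps the generated variants reproducible across training runs.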

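The introduction states that the accuracy of the multi-lingual OCR will be evaluated but does not name a metric. One widely used option, assumed here rather than taken from the paper, is the character error rate (CER): the Levenshtein edit distance between the recognized text and the ground truth, divided by the ground-truth length. The following Python sketch (function names hypothetical) also NFC-normalizes both strings. This matters for Vietnamese because a letter such as "ễ" can be stored either as one precomposed code point or as a base letter plus combining marks, and without normalization identical text could be scored as errors.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Both strings are NFC-normalised first, so precomposed and decomposed
    forms of the same Vietnamese letter compare as identical.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical name field: ground truth vs. an OCR output that lost diacritics.
    print(round(cer("Nguyễn Văn A", "Nguyen Van A"), 3))  # 2 substitutions / 12 chars → 0.167
```

In this example the two dropped diacritics count as two substitutions over a 12-character reference. CER can be computed per card field (name, student ID, date of birth) so that errors in short, critical fields are not hidden by longer ones.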
                                                                                   ISBN: 978-604-80-9122-4