A language assistant system for smart glasses

Shi Fan Zhang, Kin Hong Wong
Dept. of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, Hong Kong
Email: khwong@cse.cuhk.edu.hk
ICNGC2017b, slides v.7d

Overview
- Introduction and the idea

- Background
- Theory and the algorithm
  - Quadrangle detection
  - Affine transformation
  - Optical Character Recognition (OCR)
  - Finger location
  - Pose estimation
- Results
- Conclusions and discussions

Introduction
We developed a language assistant system.

It is intended for people using wearable smart glasses: the user points to a word that needs translation, and our system recognizes the word and reports the result to the user by sound or on-screen images. The system has been tested with satisfactory results.

Background
Smart cameras are becoming very popular. The Hong Kong Police Force is running trials of "Body Worn Video Cameras" for evidence capture and recording [2]. Some gadgets are even equipped with a Global Positioning System (GPS) receiver, a high-definition camera, a head-up display (HUD) and smartphone connectivity.

Personal communication capabilities are also included in some systems. For example, Golden-i [4] features a wearable computer for use at work that allows users to send and receive email and browse web pages or files. Google Glass is another example; it can also handle face recognition and other computer vision tasks, and some people are working in this direction and exploring new ways of using it [5].

Theory and Methodology
Our system helps the user with language translation. First, the rectangular image of the paper bearing the text must be detected. Because the paper may not be perpendicular to the user's view, the text may be geometrically distorted. To fix this, we rectify the image so that the text appears in its normal orientation.

The four corners of the rectangular image are detected, an affine transformation is applied to rectify the image, and OCR is then carried out.

The algorithm
1. Locate the quadrangle that forms the boundary of the paper.
2. Since the 3D geometry of the paper is known (A4 size is assumed), a pose estimation technique can recover the pose of the paper. Alternatively, an affine transform can rectify the image to the normal (frontal) pose.
3. Use a finger-pointing algorithm to find where on the paper the user is pointing, so that we know which part of the image contains the text to be translated.
4. Use the rectified image as the input to the OCR system.
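The steps above can be sketched as a pipeline. Every stage function here is a hypothetical stub standing in for the modules described on the following slides (quadrangle detection, rectification, finger location, OCR); the names are illustrative, not the project's actual API.

```python
# Pipeline sketch of the algorithm; each stage is a placeholder stub.

def detect_quadrangle(frame):
    """Find the four paper corners (Hough-transform voting, next slide)."""
    return "corners"

def rectify(frame, corners):
    """Warp the paper region to a frontal A4 pose (homography slide)."""
    return "rectified image"

def locate_finger(rectified):
    """Fingertip detection -> region of interest above the fingertip."""
    return "word region"

def run_ocr(region):
    """Tesseract on the selected region (OCR slide)."""
    return "recognized text"

def translate_frame(frame):
    corners = detect_quadrangle(frame)
    rectified = rectify(frame, corners)
    region = locate_finger(rectified)
    return run_ocr(region)

print(translate_frame("camera frame"))  # -> recognized text
```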

Quadrangle detection
K. K. Lee [6] combines the Randomized Hough Transform and Qcorners to facilitate multiple-quadrilateral detection, and a similar idea can be applied to detecting the corners of the paper. Our approach is based on the Hough transform. First, all lines in the image are detected; the target is to find the four corners that constitute the quadrilateral of the paper. In each processing cycle we pick two detected lines and calculate their intersection point. If the intersection point lies within the image boundary, one vote is added for that point. We repeat this for all line pairs, and the four points with the highest scores are taken as the four corners of the quadrangle.
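The voting cycle can be sketched as follows. The four (rho, theta) lines below are synthetic stand-ins for detector output; in the real system they would come from a Hough-transform line detector such as OpenCV's HoughLines.

```python
# Line-intersection voting for paper-corner detection.
# Each line is (rho, theta) in the Hough parameterisation:
#   x*cos(theta) + y*sin(theta) = rho
import itertools
import math

def intersect(l1, l2, eps=1e-9):
    """Intersection of two (rho, theta) lines, or None if near-parallel."""
    r1, t1 = l1
    r2, t2 = l2
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)
    if abs(det) < eps:
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return (x, y)

def corner_candidates(lines, width, height):
    """Vote for every in-bounds pairwise intersection; rank by score."""
    votes = {}
    for l1, l2 in itertools.combinations(lines, 2):
        p = intersect(l1, l2)
        if p is None:
            continue
        x, y = p
        if 0 <= x < width and 0 <= y < height:
            key = (round(x), round(y))  # bin nearby intersections together
            votes[key] = votes.get(key, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

# Four synthetic lines bounding a "paper" inside a 200x200 image:
# vertical lines x=20 and x=120, horizontal lines y=30 and y=90.
lines = [(20, 0.0), (120, 0.0), (30, math.pi / 2), (90, math.pi / 2)]
corners = corner_candidates(lines, 200, 200)[:4]
print(sorted(corners))  # -> [(20, 30), (20, 90), (120, 30), (120, 90)]
```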

Demo
https://youtu.be/aJd4BenNkTA

Homography transform
Optical Character Recognition (OCR) has excellent performance under three conditions:
- The text region is normal to the principal axis of the camera lens.
- The text is not skewed.
- The text is clear and distinct from the background.

Therefore, a homography transformation is applied to rectify the frames so that the text lines are not geometrically distorted. OpenCV [7] provides several functions to warp the input frames, and these are used in our project.

The homography transform maps a point (x, y) in the first reference system to (X, Y) in the second:

  X = (a·x + b·y + c) / (g·x + h·y + 1)
  Y = (d·x + e·y + f) / (g·x + h·y + 1)

where X, Y are the coordinates to be calculated in the second reference system, given coordinates x, y in the first reference system, as a function of the 8 transformation parameters a, b, c, d, e, f, g, h.
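A minimal sketch of recovering the 8 parameters from four corner correspondences, using a plain Gaussian-elimination solve for illustration; in practice OpenCV's getPerspectiveTransform and warpPerspective perform this step. The detected corner coordinates below are hypothetical.

```python
# Estimate the 8 homography parameters (a..h) from four point pairs,
# then map a point with X = (ax+by+c)/(gx+hy+1), Y = (dx+ey+f)/(gx+hy+1).

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_points(src, dst):
    """Each pair (x,y)->(X,Y) gives two linear equations in a..h."""
    A, rhs = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y]); rhs.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y]); rhs.append(Y)
    return solve(A, rhs)

def apply_homography(h, x, y):
    a, b, c, d, e, f, g, hh = h
    w = g * x + hh * y + 1
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)

# Hypothetical detected paper corners, mapped to an upright rectangle
# with A4 proportions (210 x 297, as in millimetres):
src = [(105, 82), (420, 95), (450, 520), (90, 500)]
dst = [(0, 0), (210, 0), (210, 297), (0, 297)]
h = homography_from_points(src, dst)
print([round(v) for v in apply_homography(h, *src[2])])  # -> [210, 297]
```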

(Reference: http://www.corrmap.com/features/homography_transformation.php)

Homography transform to rectify the text
Input: index of the nearest text region
Output: the recognized string of text

Optical Character Recognition (OCR)
We use Tesseract [8] for our OCR task. Tesseract achieves high accuracy on printed English characters. However, since our users may be Chinese, we tested Chinese text and found that Tesseract performs poorly when all parameters are left at their default values: one Chinese character is often split into two, so Tesseract has difficulty recognizing characters with radicals (the component parts of a Chinese character). To fix this, some parameters are adjusted (for instance, psm=6) to inform Tesseract that the targets are of the same shape. An example of successful Chinese text recognition is shown in Fig. 2.

OCR test
We performed experiments with (1) the default settings and (2) the adjusted parameter settings; the results are shown in Tables 1 and 2, respectively. The results show that with adjusted parameters Tesseract recognizes Chinese characters precisely, regardless of whether the characters are printed in a standard format or have strokes that are eroded to some extent.

It should be mentioned that Tesseract can be trained manually. Although the official dataset contains a huge amount of trained data, it may still fail to recognize every character without mistake. However, the related documents state that the official dataset cannot be revised, so training Tesseract manually means starting everything from scratch, which is a hard task. This is a target for future work.

Test results

Table 1: Tesseract recognition of Chinese characters with default parameters

                              High quality characters   Low quality characters
  Number of all characters            75                        264
  Number of recognized characters     73                        180
  Overall accuracy                    97.3%                     68.2%

Table 2: Tesseract recognition of Chinese characters with adjusted parameters

                              High quality characters   Low quality characters
  Number of all characters            75                        264
  Number of recognized characters     74                        262
  Overall accuracy                    98.7%                     99.2%
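The parameter adjustment can be sketched as the command-line invocation passed to Tesseract. This is a hedged illustration: the image path, the output base name and the chi_tra (traditional Chinese) language pack are assumed placeholders, while --psm 6 ("assume a single uniform block of text") is the page segmentation mode discussed above.

```python
# Build the Tesseract CLI invocation with an explicit page segmentation
# mode. psm=6 treats the region as one uniform text block, which keeps
# multi-radical Chinese characters from being split apart.
import shlex

def tesseract_cmd(image_path, out_base, lang="chi_tra", psm=6):
    """Argument list for a Tesseract run with an adjusted --psm."""
    return ["tesseract", image_path, out_base, "-l", lang, "--psm", str(psm)]

cmd = tesseract_cmd("region.png", "out")
print(shlex.join(cmd))  # -> tesseract region.png out -l chi_tra --psm 6
```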

Finger location
Features of the human hand:
- There are five fingers, and each finger is a convex polygon.
- The space between two adjacent fingers forms a hollow region.

First, the background is removed. Our system then performs image thresholding, contour extraction and convex hull finding (demo: https://youtu.be/qsNmY7OW5_s). The contour with the maximum area (the target contour) should be the image of the hand. If we calculate the distance between points on the contour and the convex hull, several local maxima can be found; they correspond to the hollow regions between adjacent fingers. The topmost point on the target contour is the fingertip, and a small image area above the fingertip, which contains the target word, is selected.

The remaining slides on pose estimation cover: our proposal, the Perspective-Four-Point method, the Kalman filter, and experiments, results and conclusions.
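The fingertip step can be sketched as follows. In the real system OpenCV supplies the thresholding, contour and hull (cv2.threshold, cv2.findContours, cv2.convexHull); here a small synthetic "one finger raised" contour and a pure-Python hull stand in for those calls, and all coordinates are hypothetical.

```python
# Fingertip location on an already-extracted hand contour.
# Image coordinates: y grows downward, so the fingertip is the contour
# point with the smallest y. Local maxima of the contour-to-hull
# distance (not computed here) mark the hollows between fingers.

def convex_hull(points):
    """Andrew's monotone chain convex hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Synthetic contour: a palm block with one raised finger around x = 45..55.
contour = [(20, 100), (80, 100), (80, 60), (55, 60), (55, 20),
           (45, 20), (45, 60), (20, 60)]
hull = convex_hull(contour)
fingertip = min(contour, key=lambda p: p[1])        # topmost contour point
roi_top_left = (fingertip[0] - 20, fingertip[1] - 30)  # word region above tip
print(fingertip)  # -> (55, 20)
```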

Perspective-Four-Point
Suppose we have four model points M_i (i = 1, ..., 4) defined at t = 0; their image projection points m_i can then be detected in each frame.

Pose estimation (for future development)
Pose estimation uses these 4 points. Video demonstrations of our implementation can be found at:

https://youtu.be/1vMNN4Gbty8
https://www.youtube.com/watch?v=diN565iTPdM

Results and Demonstration
We integrated all the modules. The OCR module can look up the meaning of the recognized word in the dictionary and show the result on screen. As shown in Fig. 4 (https://www.youtube.com/watch?v=cepRpxxQbDs), the user points to a word (right window) and OCR is used for recognition; the result is the text of the target image. The recognized text is then shown on screen (left window). A video demonstration of this process running in real time can be found at https://youtu.be/ZVW7IUBM0R8.

Conclusions and Discussions
Our tests show that the idea of building a language assistant for smart-glasses users is feasible:
- The camera can detect the quadrangle of the paper.
- The affine transform is applied successfully to rectify the text of the page.
- A finger-pointing detection system can be used to find the text to be translated.
- An Optical Character Recognition (OCR) system converts the word image into text.

- The meaning of the text can then be found with an electronic dictionary.
The whole process runs in real time and can be very useful for students or travelers who need instant language translation of foreign words.

Q&A
Thank you
