User Tests 3

•May 11, 2012 • 2 Comments

In the weeks of the 2nd and 9th of April, we conducted our third user test. While this blog post may be rather belated, it hasn’t been forgotten! This user test used an adaptation of the prototype application from the previous user test and was carried out with 14 users.

As in our previous user tests, we started with a pretest questionnaire, followed by the briefing. We then ran the test itself, for which we had prepared several scenarios, and ended with the posttest questionnaire and the SUS (System Usability Scale).

Poster

•April 27, 2012 • Leave a Comment

Our thesis poster can be found here: Thesis Poster.

Fourth Thesis Chapter

•April 1, 2012 • Leave a Comment

Below, you can find our fourth thesis chapter, written in Dutch:
Verslag2_Annelies_Matthias.pdf

Second Thesis Presentation

•March 30, 2012 • 1 Comment

The slides used during our second thesis presentation can be found here:

Note that the slides on their own may not be enough to grasp the full presentation. They were used as a visual aid and do not contain all the details of the presentation.

EDIT: Since we presented our progress in style, we figured we should add a complementary picture.

User Tests 2

•March 3, 2012 • 1 Comment

Last week, we conducted 14 user tests with our first digital prototype: an Android implementation running on an HTC Desire HD. The following two tables show the participants’ smartphone usage, split by gender and by age.

                          Male   Female
Android phone                2        1
Other smartphone/Tablet      3        1
None                         2        5

                          20-29   30-39   40-49   50-59   60-69
Android phone                 2       1       0       0       0
Other smartphone/Tablet       2       1       0       1       0
None                          0       2       1       3       1

As in our previous user tests, we started with a pretest questionnaire, followed by the briefing. We then ran the test itself, for which we had prepared several scenarios, and ended with the posttest questionnaire and the SUS.

Implementation: The Trilogy – III, The Buds

•February 26, 2012 • Leave a Comment

The implementation on our phone consists of three activities and three important helper classes. We’ll start by describing the activities and then move on to the helper classes, to give you a good overview of how the application generally works.

Our three activities are the main activity, an activity responsible for the “option menu” and an activity responsible for the “about screen”. Apart from defining the layouts, these activities are trivial.

Our three important helper classes are CustomCameraView, responsible for the video preview and for deciding when to send requests (with the corresponding frame) to our server; OverlayView, responsible for overlaying the translations, creating the augmented view; and ImageTracker, responsible for tracking a text box when the user moves the phone.
Next, we will explain how these classes work together. When the phone is held relatively stable, a request is made to the server, asking for text regions. When the answer arrives, an ImageTracker is started for every box that was returned. No new requests are sent to the server as long as the phone is still held stable, because this likely means that the user is still looking at the same sign. Instead, the trackers follow the boxes to compensate for minor changes in the video feed, since it is impossible to hold a phone perfectly still. For the tracking, we use the OpenCV library. Each tracker notifies the OverlayView of changes in its box position, so that the position can be updated on the screen. Meanwhile, the translations are requested from the server; upon arrival, the OverlayView is notified and fills the empty boxes with the translations.
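To make this interplay a bit more concrete, here is a minimal sketch of the flow in plain Java. All class, interface and method names are illustrative assumptions rather than our actual Android code, and the server calls are shown as blocking for brevity, whereas the real application issues them asynchronously.

```java
import java.util.List;

/** Minimal sketch of the stability-gated request flow described above.
 *  All names are illustrative; the real application works asynchronously. */
public class AugmentedFlowSketch {

    /** A detected text region, later filled with its translation. */
    static class Box { int x, y, width, height; String translation; }

    interface Server {
        List<Box> requestTextRegions(byte[] frame);   // shown blocking for brevity
        void requestTranslations(List<Box> boxes);    // fills Box.translation
    }

    interface Overlay {
        void showEmptyBoxes(List<Box> boxes);  // draw boxes while translations are pending
        void moveBox(Box box);                 // called by the per-box trackers (omitted here)
        void fillTranslations(List<Box> boxes);
        void clear();
    }

    private final Server server;
    private final Overlay overlay;
    private boolean trackingCurrentSign = false;

    public AugmentedFlowSketch(Server server, Overlay overlay) {
        this.server = server;
        this.overlay = overlay;
    }

    /** Called once per preview frame, together with a simple stability flag. */
    public void onPreviewFrame(byte[] frame, boolean phoneIsStable) {
        if (!phoneIsStable) {
            // The user is probably aiming at a different sign: forget the old boxes
            // and allow a new server request once the phone is stable again.
            trackingCurrentSign = false;
            overlay.clear();
            return;
        }
        if (trackingCurrentSign) {
            // Same sign as before: the per-box trackers compensate for small hand
            // movements, so no new request is sent.
            return;
        }
        trackingCurrentSign = true;
        List<Box> boxes = server.requestTextRegions(frame);  // ask for text regions
        overlay.showEmptyBoxes(boxes);                        // immediate visual feedback
        server.requestTranslations(boxes);                    // OCR results are translated
        overlay.fillTranslations(boxes);                      // fill the boxes on arrival
    }
}
```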

Errata Regarding “The Trilogy Vol. I & II”

•February 6, 2012 • 2 Comments

It has been a while since our previous blog post. The exam period has come and gone, but that doesn’t mean we’ve been completely idle in the meantime. We are still working on tracking text, so the release date of the third and final chapter of our trilogy is yet to be determined. For now, we present errata on the previously published first and second volumes.

During our own tests while working on the implementation, we noticed that the delay experienced by a user was much too long. Before a frame is sent to the server, it requires some processing on the phone (such as converting it to a correctly encoded image and Base64-encoding that image), which took roughly 5 seconds. It then took roughly 4.5 seconds to get a response from the server with the bounding box information. In total, that is a 9.5-second delay between the start of “let’s analyse this frame” and the first feedback a user receives. Requesting the translations took another 3.5 seconds. We had to determine whether we could improve these times in any way. Changes were made both to the server code that handles the received requests and to the algorithm used to locate the text in the images.

Implementation: The Trilogy – I, The Roots (the server code)
In the first part we mentioned that we sent the image data to the server as the Base64-encoded string of the byte array representing the image. Sten, our thesis mentor, pointed out that Base64 is rather cumbersome, and he was indeed right. Although the server itself had no trouble decoding the Base64 string (about 7 ms), encoding the data on the mobile phone was a time-critical step, taking up to around 2 seconds. To improve this, we made Tomcat automatically parse multipart/form-data POST requests (without storing the received parts as actual files, so as to avoid I/O delay). This removes the need to Base64-encode the data: we can simply take the in-memory byte array and include it in a multipart/form-data POST request on the phone. This change lowered the time for the bounding box request (including the processing on the phone) to 7 seconds, a speed gain of 2.5 seconds!
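As a rough illustration of the server side, the sketch below shows one standard way to let Tomcat 7 (Servlet 3.0) parse multipart/form-data itself while keeping small parts in memory instead of temporary files. The servlet path, the part name “frame” and the size threshold are our own illustrative assumptions, not necessarily what our actual server code looks like; on the phone, the raw image byte array is simply attached as one part of the multipart request.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

/** Illustrative sketch: container-parsed multipart upload (Servlet 3.0 / Tomcat 7).
 *  The path, part name and threshold are assumptions for this example. */
@WebServlet("/boundingBoxes")
@MultipartConfig(fileSizeThreshold = 5 * 1024 * 1024)  // parts below this size stay in memory
public class FrameUploadServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Part framePart = req.getPart("frame");          // already parsed by the container
        byte[] imageBytes = readAll(framePart.getInputStream());
        // ... run text detection on imageBytes and write the bounding boxes to resp ...
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
        return out.toByteArray();
    }
}
```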

A second, minor improvement concerns the translation request. If the OCR result for a certain region is empty, there is no need to request a translation for it either. So far we have not noticed a speed improvement from this, likely because we only rarely end up with no OCR results at all, but we believe it is a worthwhile optimisation once we switch to a paid translation service.
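The change itself is essentially a guard clause; a hypothetical sketch (the interface and method names are ours, not the actual server code):

```java
interface TranslationService {           // stand-in for the external translation service
    String translate(String text, String targetLanguage);
}

/** Only call the (possibly paid) translation service when OCR actually produced text. */
static String translateRegion(TranslationService service, String ocrResult, String targetLanguage) {
    String recognised = ocrResult == null ? "" : ocrResult.trim();
    if (recognised.isEmpty()) {
        return "";   // empty OCR result: skip the external request entirely
    }
    return service.translate(recognised, targetLanguage);
}
```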

Implementation: The Trilogy – II, The Core (core algorithms)
The C++ code we based our implementation on uses graphs to find the connected components (used to determine which parts of the image are text). With the graph library we used, this was very time-consuming (and memory-intensive). While looking for faster ways to find connected components, we came across “Fast connected component labeling algorithm using a divide and conquer technique” (J. Park et al., 2000). This paper describes a divide-and-conquer technique to find connected components without building an explicit graph. We implemented this algorithm with success: the time we spend waiting for the bounding box information is now roughly 5 seconds, which is 2 seconds better than before (i.e. after the POST request optimisation)! We did not record exact numbers for the memory usage on our server, but there too we saw a huge improvement.
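For the curious, the sketch below gives a rough idea of the divide-and-conquer approach, written here in Java rather than our C++ server code and simplified to 4-connectivity with a plain union-find merge step. It illustrates the general technique only; it is not a reproduction of the exact algorithm from the paper, nor of our implementation.

```java
/** Illustrative divide-and-conquer connected-component labeling (4-connectivity).
 *  A simplified sketch of the general technique, not the exact Park et al. algorithm. */
public class DivideAndConquerLabeling {

    private final boolean[][] img;   // foreground mask (true = candidate text pixel)
    private final int[][] labels;    // output labels, 0 = background
    private final int[] parent;      // union-find over provisional labels
    private int nextLabel = 1;

    public DivideAndConquerLabeling(boolean[][] img) {
        this.img = img;
        this.labels = new int[img.length][img[0].length];
        this.parent = new int[img.length * img[0].length + 1];  // generous upper bound
    }

    public int[][] label() {
        labelRows(0, img.length);
        // Replace every provisional label by its union-find representative.
        for (int[] row : labels)
            for (int x = 0; x < row.length; x++)
                if (row[x] != 0) row[x] = find(row[x]);
        return labels;
    }

    /** Label rows [r0, r1): label each half recursively, then merge along the cut. */
    private void labelRows(int r0, int r1) {
        if (r1 - r0 <= 0) return;
        if (r1 - r0 == 1) { labelSingleRow(r0); return; }
        int mid = (r0 + r1) / 2;
        labelRows(r0, mid);
        labelRows(mid, r1);
        // Merge step: vertically adjacent foreground pixels on either side of the
        // dividing line belong to the same component, so union their labels.
        for (int x = 0; x < img[0].length; x++)
            if (img[mid - 1][x] && img[mid][x])
                union(labels[mid - 1][x], labels[mid][x]);
    }

    /** Base case: give each horizontal run of foreground pixels its own label. */
    private void labelSingleRow(int r) {
        for (int x = 0; x < img[r].length; x++) {
            if (!img[r][x]) continue;
            if (x > 0 && img[r][x - 1]) {
                labels[r][x] = labels[r][x - 1];   // continue the current run
            } else {
                labels[r][x] = nextLabel;          // start a new run
                parent[nextLabel] = nextLabel;
                nextLabel++;
            }
        }
    }

    private int find(int a) {
        while (parent[a] != a) a = parent[a] = parent[parent[a]];  // path halving
        return a;
    }

    private void union(int a, int b) {
        parent[find(a)] = find(b);
    }
}
```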

The total improvement we managed to obtain was thus 4.5 seconds!