Image Search and Text Search


What is Image Search?
Text line segmentation and creating DSC files
Executing image search
Bucket Search
Managing search results
Text Search 

What is Image Search?

When deciphering handwritten historical materials, the reader may come across character strings that are difficult to decipher owing to smudged handwriting.
In such cases, inferring the content by comparing the string with similar strings elsewhere in the material is effective, but it is very labor-intensive.
Moreover, finding where and how often a given word occurs in historical materials requires going through the entire material, which is itself an enormous amount of work.

The image search function was created to reduce the amount of work required for deciphering historical materials.

Text line segmentation and creating DSC files

To execute image search, the "lines" in the images must first be segmented as pre-processing, and a DSC file must be created to store this information.

One DSC file is created for each image and holds the "line" images of that image.

For example, to execute image search for the sample1.jpg image, the sample1.dsc file must be created.
To create a DSC file, a Segfo file is normally created first. (The Segfo file is used not only for search but also to specify which part of the image is treated as one line during transcription.)

This DSC file can be created by using a group of external tools called Segfo-DSC tools.
At present, the Segfo-DSC tools include SegfoMaker, Segfo2dsc, etc., developed by Kengo Terasawa, as well as SegfoMaker revised version, a revision of SegfoMaker developed by Tsukushi Shimizu at the request of the Department of Humanistic Informatics of Kyoto University. For how to use these tools, refer to their respective manuals. (Note that SMART-GS as released in September 2008 can also perform these operations, although by a more primitive method. For details, refer to the Appendix.)

These tools can be obtained from the HCP site (http://www.shayashi.jp/xoops/html as of October 4, 2008).

There are two types of Segfo files, table-format files and XML-format files. SMART-GS uses the XML format defined by Hajime Inomura.
Note that SegfoMaker and SegfoMaker revised version output Segfo files in the table format. To convert a table-format Segfo file to the XML format, use the tbl2xml.jar tool distributed with SegfoMaker revised version. For SMART-GS, follow the procedure described below. The procedure is slightly complicated, but a worked example is included.

To avoid overwriting the included example, read through the whole workflow in steps 1-5 before starting.

1. On the HCP project site, register as a member, then download and install HDIMS (Segfo2Dsc) and SegfoMaker revised version.

2. Using SegfoMaker (bundled with HDIMS) or SegfoMaker revised version, create a table-format Segfo file. For example, from sample1.jpg in the smart_gs/img image folder of SMART-GS, a file such as sample1.segfo can be created.
Note: The distributed SMART-GS includes sample1.jpg, the image used throughout this manual, in c://smat-gs-ng/smart_gs/imges/sample/. The procedure below uses this image as its example. The sample1.xml and sample1.dsc files that the procedure produces are already included in the SMART-GS download, so you can use them as a model.

3. Start tbl2xml.jar, included in the SegfoMaker revised version download, and drag and drop the table-format Segfo file created in step 2 onto it to create an XML-format Segfo file. For example, sample1.xml can be created.

4. Start Segfo2dsc.exe, included in the HDIMS download, and drag and drop the Segfo file sample1.segfo onto it to create sample1.dsc.

5. If the original image is sample/sample1.jpg under the img folder of SMART-GS (as in the bundled example), place sample1.xml as dsc/sample/sample1.xml and sample1.dsc as dsc/sample/sample1.dsc, that is, in a sample folder directly under the dsc folder of SMART-GS.
Since the same sample is included in the SMART-GS download, you can examine it first instead of creating sample1.dsc yourself, and then create Segfo and DSC files for your own images.
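The placement rule in step 5 can be sketched as a path mapping. This is a minimal sketch, assuming (from the bundled sample only) that the dsc folder mirrors the sub-folders of the img folder and that the XML Segfo file and the DSC file keep the image's base name; the function name dsc_paths_for is hypothetical and not part of SMART-GS or the Segfo-DSC tools.

```python
from pathlib import PurePosixPath


def dsc_paths_for(image_rel: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Map an image path relative to the img folder to its Segfo XML and DSC paths.

    Hypothetical helper. Assumes the dsc folder mirrors the img folder's
    sub-directories and that both files keep the image's base name.
    """
    image = PurePosixPath(image_rel)
    folder = PurePosixPath("dsc") / image.parent
    # Build names from the stem so base names containing dots are kept whole.
    return folder / f"{image.stem}.xml", folder / f"{image.stem}.dsc"


xml_path, dsc_path = dsc_paths_for("sample/sample1.jpg")
print(xml_path)  # dsc/sample/sample1.xml
print(dsc_path)  # dsc/sample/sample1.dsc
```

Checking your own files against such a mapping before setting the [dsc] directory can help catch a misplaced DSC file, which would otherwise show up only as a missing "(SEARCH)" label in the image tree.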

Once the above steps are complete, image search can be executed after telling SMART-GS which directory contains the DSC files created with the Segfo-DSC tools.

To make SMART-GS aware of the folder containing the DSC files, first select [set directory path] from the [Preference] menu on the menu bar to open the [Directory Setting] dialog box.

 

Here, set the [dsc] item to the path of the directory containing the DSC files so that SMART-GS recognizes them.

For the directory path setting procedure, refer also to Initial Settings.



Executing image search

  1. First, specify whether the text of the historical material to be searched runs vertically or horizontally.

    To do this, open [Text Type] from [Preference] on the menu bar, and select either [Vertical] or [Horizontal].

    (Once made, this setting does not need to be repeated for every search.)

  2. Once the vertical/horizontal setting has been made, specify the part of the image to search for (the query image).

    To specify the query image, use the image markup tools described in "2. Workbench."
    Image markup can be done using one of three methods, namely Rectangle, Marker, or Lasso.
    In this example, we will select Rectangle and specify the "feet" character string in the image as the query image.

    Press the [Rectangle] button on the toolbar and drag the cursor so as to enclose the "feet" character string in the image.
  3. Select the marked-up query image (whose rectangle has turned red), and press the [ImageSearch] button at the center of the top row of the toolbar.

    The Search Dialog dialog box shown below is displayed as a result.
     

    In this dialog box, specify the range of images to be searched.

    If [All Spread] is selected, all images in the directories and sub-directories under the root are searched.

    If [Current Directory] is selected, all spreads in the folder containing the image currently being edited are searched.

    If [Select Spreads] is selected, the images to be searched can be specified explicitly by number. The image number is the number displayed to the left of each image in the image tree.

    For example, to specify images No. 2 and No. 4 as well as all images from No. 6 to No. 10, enter "2, 4, 6-10" in half-width characters in the blank field directly under [Select].

    Here, enter 0 to limit the search range to the image currently being edited.

    * Note that image search covers only spreads for which DSC files have been created.

    Images for which a DSC file has been created have "(SEARCH)" appended to the image name in the image tree on the left of the screen.

    Conversely, images without "(SEARCH)" after their names have no DSC file.

    Looking at the following figure, one can see that a DSC file has been created for "sample1" under the sample folder.



  4. The query image can be edited before the search by pressing the [Edit] button in the Search Dialog dialog box.

    Simple drawing tools such as free-line drawing, point drawing, an eraser, and cut and paste are available.

    If the character string to be searched is smudged or partially scraped off and that part needs to be accentuated, editing the query image with these functions can improve the search results.

  5. Once the range of images to be searched has been specified and the query image has been edited, press the [Search] button toward the left at the bottom of the [Search Dialog] to execute the search.

    After a little while, the search results are displayed as shown below.
     

    A found character string can be displayed magnified by placing the mouse cursor over its image in the panel.

    In this spread, the "feet" character string appears four times in addition to the query image, and all four occurrences are search hits.

    As shown in the above figure, two check boxes, [Yes] and [No], are placed next to the image of each search result.
    Their use is described in Bucket Search later in this document.

  6. Clicking an image in the search results opens the image that contains it in a new window.

    In that window, the character string enclosed in a thick red rectangle is the one found by the search.


  7. The display method for the search results can be changed by selecting [DisplayMode] from the [Option] menu on the menu bar at the top of the [Search Result] dialog box.



    [Context Mode] ... Displays the found character strings as a large image that also includes the surrounding area.

    In the following figure, the part highlighted in light gray is the character string found through the search.


    [Line Mode] ... Displays the found character strings as a medium-size image that also includes the surrounding area.

    Like in the Context Mode, the part highlighted in light gray is the character string found through the search.

    As shown in the following figure, the search results are displayed as a vertical list.


    [Segment Mode] ... Displays only the image of each found character string (default). The search results are arranged in a grid as shown below.
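The [Select Spreads] input described in step 3 is a comma-separated list of image numbers and inclusive ranges. The following is a minimal sketch of how such input expands into image numbers, assuming that whitespace around entries is ignored and that "6-10" includes both ends; parse_spread_selection is a hypothetical name, not SMART-GS's own parser.

```python
def parse_spread_selection(spec: str) -> list[int]:
    """Expand input such as "2, 4, 6-10" into sorted image numbers.

    Hypothetical helper. Assumes comma-separated entries, each a single
    number or an inclusive range "start-end", in half-width characters.
    """
    numbers: set[int] = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas such as "2,,4"
        if "-" in entry:
            start_text, end_text = entry.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"range start exceeds end: {entry!r}")
            numbers.update(range(start, end + 1))  # include both ends
        else:
            numbers.add(int(entry))
    return sorted(numbers)


print(parse_spread_selection("2, 4, 6-10"))  # [2, 4, 6, 7, 8, 9, 10]
print(parse_spread_selection("0"))           # [0]
```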

Bucket Search

In the search executed in the above example, all the "feet" character strings in the image were found.

However, there is no guarantee that every occurrence of the searched string will be found.

Moreover, occurrences missed with one query image may be found by using the same string at a different location as the query image.

To make up for such incomplete search results, a function is provided for reusing search results as query images by placing them in a "bucket."

  1. Two check boxes, [Yes] and [No], are located on the left of each panel containing a found image.
     

    Selecting the [Yes] check box displays the [Bucket] dialog box shown in the following figure and adds that image to the bucket.



    Any number of search results can be placed in this bucket. Moreover, each image can be edited here in the same way as in the Search Dialog dialog box.

    If the [No] check box is selected instead, that character string will not appear in subsequent searches, so the same image does not have to be checked again.

  2. Pressing the [Search] button at the bottom of the Bucket dialog box executes a search using all the query images in the bucket.

    The following figure shows the results of the search executed using the bucket.

    "NEW" in red at the top left of a panel indicates a character string that was not hit in the previous search.


    Panels with a mark are query images included in the bucket.
    Panels without any mark show character strings that were also hit in the previous search.

  3. The search can be executed again after adding the results of the bucket search to the bucket.

    Repeating this reduces missed occurrences and improves search accuracy.
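The labeling rules described in steps 1-3 can be sketched as set operations. This is a minimal sketch of those rules only, not SMART-GS's implementation: the image-similarity search is replaced by a hypothetical search_one callback, and hits are modeled as opaque identifiers; the names bucket_search, search_one, previous_hits and rejected are all assumptions.

```python
from collections.abc import Callable, Hashable, Iterable

Hit = Hashable  # hypothetical identifier for a found string (e.g. image, line, position)


def bucket_search(
    bucket: Iterable[Hit],
    previous_hits: set[Hit],
    rejected: set[Hit],
    search_one: Callable[[Hit], set[Hit]],
) -> dict[Hit, str]:
    """Run every query in the bucket and label each resulting panel."""
    queries = set(bucket)
    hits: set[Hit] = set()
    for query in queries:
        hits |= search_one(query)  # union of the hits of all bucket queries
    hits -= rejected  # strings marked [No] are left out of later searches
    labels: dict[Hit, str] = {}
    for hit in hits | queries:
        if hit in queries:
            labels[hit] = "query"  # panel shown with the bucket mark
        elif hit not in previous_hits:
            labels[hit] = "NEW"    # not hit in the previous search
        else:
            labels[hit] = ""       # also hit in the previous search
    return labels


# Toy index standing in for image similarity search (hypothetical data).
index = {"q1": {"a", "b"}, "b": {"a", "b", "c"}}
labels = bucket_search({"q1", "b"}, {"a", "b"}, set(), lambda q: index.get(q, set()))
print(sorted(labels.items()))  # [('a', ''), ('b', 'query'), ('c', 'NEW'), ('q1', 'query')]
```

Step 3 corresponds to adding some of the "NEW" hits to the bucket and calling the function again with the current hits as previous_hits.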

Managing search results

In SMART-GS, the history is saved automatically each time a search is performed. This history can be seen with Reasoning Web.

After a search has been executed, a queries folder appears in the Desktop View of Reasoning Web.

The search history, including the date and time, is saved in this folder.



Moreover, when a search is executed using a bucket, that bucket can be saved with a name and a memo.

After a bucket has been created from the search results, pressing the [Register] button at the bottom of the bucket displays the following dialog box.


After being given a suitable name and memo, the bucket is saved to Reasoning Web.

Similarly to the search history, the saved bucket can be checked from Desktop View of Reasoning Web.

The saved bucket is displayed on the same level as the queries folder to which the search history has been saved.

Here, the bucket has been named "Search Sample" as an example, and an icon named "Search Sample" can be seen displayed next to the queries folder.

Double-click this icon to open the saved bucket.


Text Search

In SMART-GS, various types of "text" are provided: Transcription for entering transcriptions of handwritten materials, Annotation for attaching comments to a historical material, Translation for storing a translation of a transcribed document, and Text for general text.

The Text Search function provides ordinary character string search over these texts.

First, press the [TextSearch] button on the tool bar to display the following dialog box.


Enter the text to be searched in the [query text:] field and select the range of images, the types of text to be searched, and so on.

Since Transcription, Translation, and Annotation are each related to specific images, when performing a search for these types of text, the target image range must be specified.

The range specification method is the same as for image search, so refer to the explanation of image search.

As an example, let us enter "sample" as the search string.

Selecting the [Case Sensitive] check box makes the search distinguish between uppercase and lowercase letters.
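The filtering described above (text types, an image range that applies only to image-linked texts, and the [Case Sensitive] option) can be sketched as follows. This is a minimal sketch under stated assumptions, not SMART-GS's implementation: the TextDoc record, its fields, and the function names are hypothetical, general Text is assumed to carry no image number, and matching is assumed to be a literal, non-overlapping substring search.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextDoc:
    kind: str             # "Transcription", "Annotation", "Translation" or "Text"
    image: Optional[int]  # image number; None for general Text not tied to an image
    body: str


def find_occurrences(body: str, query: str, case_sensitive: bool) -> list[tuple[int, int]]:
    """Return (start, end) offsets of non-overlapping literal matches of query."""
    if not query:
        return []  # an empty query would otherwise match at every position
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.start(), m.end()) for m in re.finditer(re.escape(query), body, flags)]


def text_search(docs, query, kinds, images, case_sensitive=False):
    """Search the selected text types; image-linked texts are limited to the selected images."""
    results = []
    for doc in docs:
        if doc.kind not in kinds:
            continue
        if doc.image is not None and doc.image not in images:
            continue
        for start, end in find_occurrences(doc.body, query, case_sensitive):
            results.append((doc.kind, doc.image, start, end))
    return results


docs = [
    TextDoc("Transcription", 0, "Sample text, sample"),
    TextDoc("Annotation", 3, "sample"),
    TextDoc("Text", None, "SAMPLE"),
]
print(text_search(docs, "sample", {"Transcription", "Text"}, {0}, case_sensitive=True))
# [('Transcription', 0, 13, 19)]
print(len(text_search(docs, "sample", {"Transcription", "Text"}, {0})))  # 3
```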

Press the [Search] button to display the list of search results.
 

As shown in the above figure, the character strings found during the search are displayed highlighted in yellow.

Double-clicking a label that displays a search result jumps to the editing screen for the text containing that result.


