When deciphering handwritten historical materials, the
reader may come across character strings that are difficult to read owing to
smudged handwriting.
In such cases, it is effective to infer the contents
by comparing the string with similar character strings at other locations, but
this is very labor-intensive.
Moreover, determining where and how often a given word occurs
in historical materials requires going
over the entire material, which by itself involves an enormous amount of
work.
The image search function was created to reduce the amount of work required for deciphering historical materials.
To execute image search, it is necessary to segment the "lines" in the images as pre-processing, and to create a DSC file for storing this information.
One such DSC file is allocated to each image and holds the "line" image of that image.
For example, to
execute image search for the sample1.jpg image, the sample1.dsc
file must be created.
Further, to create a DSC file, a Segfo file is normally created first. (This Segfo
file is used not only for search, but also to specify which part of the image to
actually treat as one line during transcription.)
The DSC file
can be created by using a group of external tools called Segfo-DSC tools.
At present, these Segfo-DSC tools
include SegfoMaker, Segfo2dsc, etc., developed by Kengo Terasawa, as well as
SegfoMaker revised version, a revision of SegfoMaker developed by Tsukushi
Shimizu at the request of the Department of Humanistic Informatics of
Kyoto University. For how to use these tools, refer to their
respective manuals. (Note, however, that these operations are also possible
within SMART-GS itself, although the method is primitive in the SMART-GS
released in September 2008. For details, refer to the Appendix.)
These tools can be
acquired from the HCP site (http://www.shayashi.jp/xoops/html as of October 4,
2008).
Segfo files are of two types, table format files and xml
format files, and SMART-GS uses the xml format defined by Hajime
Inomura.
SegfoMaker and SegfoMaker revised version output
segfo files in the table format, so caution is required. To convert a segfo file
from the table format to the xml format, use the tbl2xml.jar tool distributed
along with SegfoMaker revised version. For SMART-GS, use the
procedure described below. This procedure is slightly complicated, but an
example is attached.
To avoid deleting the example files, read through steps 1-5
to check the workflow before starting the operation.
1. On the HCP project site, register as a member, then download and install HDIMS (Segfo2Dsc) and SegfoMaker revised version.
2. Using SegfoMaker (attached to HDIMS) and
SegfoMaker revised version, create a segfo file in the table format. For
example, from sample.jpg in the smart_gs/img image folder of SMART-GS, a file
with an extension such as sample.segfo can be created.
Note: The distributed SMART-GS includes sample1.jpg, the same
image used in the manual, in c://smat-gs-ng/smart_gs/imges/sample/. The
following procedure is described using this image as an example. The sample1.xml
and sample1.dsc files created with the procedure described below are already
included in the downloaded SMART-GS, so they can be used as a reference.
3. Start tbl2xml.jar, included in the SegfoMaker revised version download file, and create a segfo file in the xml format by dragging and dropping onto it the table-format segfo file created in step 2. For example, sample1.xml can be created.
4. Start Segfo2dsc.exe, attached to the HDIMS download file, and create the sample1.dsc file by dragging and dropping the segfo file sample1.segfo onto it.
5. If the original image is sample/sample1.jpg under the
img folder of SMART-GS (as in the example of the download file), place
sample1.xml in the sample folder directly under the dsc folder of
SMART-GS (that is, as dsc/sample/sample1.xml), and place sample1.dsc in the
same folder (as dsc/sample/sample1.dsc).
Since the same sample already exists in the SMART-GS download, instead
of creating sample1.dsc, etc., yourself, you can first check this
example and then create segfo or dsc files for your own image files.
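The file placement in step 5 can be sketched as a small path calculation. This is only an illustration, assuming that the dsc folder mirrors the sub-folder layout of the img folder; the function name dsc_targets and the folder names passed to it are hypothetical and are not part of SMART-GS or the Segfo-DSC tools.

```python
from pathlib import PurePosixPath


def dsc_targets(image_path, img_root="img", dsc_root="dsc"):
    """Return the (xml, dsc) paths that mirror an image path under dsc_root.

    Assumption: the dsc folder repeats the sub-folder structure of the
    img folder, and only the file extension changes.
    """
    relative = PurePosixPath(image_path).relative_to(img_root)
    base = PurePosixPath(dsc_root) / relative
    return base.with_suffix(".xml"), base.with_suffix(".dsc")


xml_path, dsc_path = dsc_targets("img/sample/sample1.jpg")
print(xml_path)  # dsc/sample/sample1.xml
print(dsc_path)  # dsc/sample/sample1.dsc
```

Checking your own images against such a mapping before copying files may help avoid placing a DSC file in a folder that SMART-GS does not associate with the image.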
After performing the above, image search can be executed by loading into SMART-GS the directory in which the DSC files created with the Segfo-DSC tools are placed.
To make SMART-GS aware of the folder in which the DSC file is placed, first select [set directory path] from [Preference] on the menu bar to open the [Directory Setting] dialog box.
Here, by
setting for the [dsc] item the path of the directory in which the DSC file has
been placed, SMART-GS is made aware of the DSC file.
For the directory path
setting procedure, refer also to Initial Settings.
First, specify whether the sentences of historical material to be searched run vertically or horizontally.
To do this, open [Text Type] from [Preference] on the menu bar, and select either [Vertical] or [Horizontal].
(Once this
setting is made, it does not need to be done at every search.)
Once the vertical/horizontal setting has been made, specify the image (query image) to be searched for among the images.
To specify the query image, use the tool for image markup described in section 2. Workbench.
Image markup can be done using one of three methods, namely Rectangle, Marker, or Lasso.
In this example, we will select Rectangle and specify the "feet" character string
in the image as the query image.
Select the marked-up query image (whose rectangle has changed to red), and press the [ImageSearch] button at the center of the top level of the toolbar.
The Search Dialog dialog box shown below is displayed as a result.
In this dialog box, specify the range of the images to be searched.
If [All Spread] is selected, all the images in the directories and sub-directories under root are selected for search.
If [Current Directory] is selected, all the spreads included in the folder that includes the image currently being edited are selected for search.
If [Select Spreads] is specified, the images to be searched can be specified concretely by number. The image number is the number displayed to the left of each image in the image tree.
For example,
to specify images No. 2 and No. 4 in addition to all images from No. 6 to No.
10, input "2, 4, 6-10" in half-width characters in the blank field directly
under [Select].
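To make the notation concrete, the following sketch expands a selection string such as "2, 4, 6-10" into individual image numbers. It only illustrates the notation described above; it is not the parser used by SMART-GS, and the function name parse_spread_selection is hypothetical.

```python
def parse_spread_selection(spec):
    """Expand a selection such as "2, 4, 6-10" into a sorted list of numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


print(parse_spread_selection("2, 4, 6-10"))  # [2, 4, 6, 7, 8, 9, 10]
```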
Here, let's input No. 0 to limit the search range to the image currently being edited.
* Note that the spreads that can be used for image search are restricted to the spreads for which DSC files have been created.
Images for which a DSC file has been created have the "(SEARCH)" character string suffixed to the image name in the image tree in the left part of the screen.
Conversely, images that do not have "(SEARCH)" suffixed to the image name have no DSC file.
Looking at the following figure,
one can see that a DSC file has been created for "sample1" under the sample folder.
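Outside SMART-GS, the same check can be approximated by looking for a DSC file next to each image. The sketch below assumes that the dsc folder mirrors the img folder layout and that images use the extensions listed in IMAGE_SUFFIXES; both are assumptions for illustration, as are the function name label_images and the exact spacing of the "(SEARCH)" suffix.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def label_images(img_root, dsc_root):
    """Map each image (relative path) to its name, suffixed when a DSC file exists."""
    img_root, dsc_root = Path(img_root), Path(dsc_root)
    labels = {}
    for image in sorted(img_root.rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        relative = image.relative_to(img_root)
        dsc_file = (dsc_root / relative).with_suffix(".dsc")
        name = image.stem
        labels[relative.as_posix()] = f"{name} (SEARCH)" if dsc_file.is_file() else name
    return labels


# Demonstration on a throwaway folder tree, so no real project files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "img" / "sample").mkdir(parents=True)
    (root / "dsc" / "sample").mkdir(parents=True)
    (root / "img" / "sample" / "sample1.jpg").touch()
    (root / "img" / "sample" / "sample2.jpg").touch()
    (root / "dsc" / "sample" / "sample1.dsc").touch()
    for path, label in label_images(root / "img", root / "dsc").items():
        print(path, "->", label)
```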
In this dialog box, the query image can also be edited with simple drawing tools such as freeline, point drawing, an eraser, and cut and paste.
If the character string to be searched for is smudged or partially scraped off, and that part of the character string needs to be accentuated, the query image can be edited with these functions to help obtain good search results.
Once the image range to be searched has been specified and the query image has been edited, press the [Search] button located toward the left under [Search Dialog] to execute the search.
After a little while, the search results are displayed as shown below.
The character strings found through the search can be displayed magnified by placing the mouse cursor over the desired image in the panel.
The "feet" character string is displayed four times in this spread in addition to the query images. They are all search hits.
As shown in the above figure, two check boxes, [Yes] and [No], are placed next to the image of each search result.
When an image in the search results is clicked, the image that contains it can be opened in a new window.
At this time,
the item enclosed in a thick red rectangle is the character string found
through the search.
[Context Mode] ... Displays the character strings found through the search as a large image that also includes surrounding parts.
In the following figure, the part highlighted in light gray is the character string found through the search.
[Line Mode] ... Displays the character strings found through the search as a medium-sized image that also includes surrounding parts.
As in Context Mode, the part highlighted in light gray is the character string found through the search.
As shown in the following figure, the search results are displayed as a vertical list.
[Segment Mode] ... Displays only the image of the character string found through the
search (default). The search results are displayed lined up in a chessboard
configuration as shown below.
In the search executed in the above example, all the "feet" character strings included in the images were found.
However, there is no guarantee that every occurrence of the searched-for character string will be found.
Moreover, character strings not found with one query image may be found by using a character string in a different location as the query image.
To reduce such incomplete search results, a function is provided for reusing search results as query images by placing them in a "bucket".
When a [Yes] check box is selected, the [Bucket] dialog box is displayed as shown in the following figure, and the images whose [Yes] check box has been selected are added to the bucket.
Any number of search results can be placed in this bucket. Moreover, each image can be edited here in the same way as in the Search Dialog dialog box.
Further, if the [No] check box is selected here, that character string will not appear in subsequent searches. This avoids wasteful verification of images.
If the [Search] button
located in the bottom part of the Bucket dialog box is pressed, a search using
all the query images in the bucket is executed.
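The idea behind a bucket search can be sketched as follows: every query in the bucket is searched for, the hits are merged without duplicates, and anything marked [No] is left out. This is a conceptual illustration only. The search argument stands in for the image search itself, and the names bucket_search, toy_index, and toy_search are hypothetical; the sketch does not describe how SMART-GS is implemented.

```python
def bucket_search(bucket, rejected, search):
    """Merge the hits of all queries in the bucket, skipping rejected items.

    bucket   -- query items, in the order they were added ([Yes] items)
    rejected -- items marked [No]; they never appear in the results
    search   -- callable returning the list of hits for one query (assumed)
    """
    hits, seen = [], set()
    for query in bucket:
        for hit in search(query):
            if hit in rejected or hit in seen:
                continue
            seen.add(hit)
            hits.append(hit)
    return hits


# Toy stand-in for image search: each query finds a fixed list of locations.
toy_index = {"q1": ["a", "b"], "b": ["b", "c", "x"]}


def toy_search(query):
    return toy_index.get(query, [])


print(bucket_search(["q1", "b"], rejected={"x"}, search=toy_search))  # ['a', 'b', 'c']
```

In this toy run, the hit "c" is reachable only through the second query, which is the situation the bucket is intended to address.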
The following figure
shows the results of the search executed using the bucket.
"NEW" displayed in red
characters at the top left of each panel indicates a character string for
which there was no hit in the previous search.
Panels that have the mark are query images included in the bucket.
Panels
that do not have any mark indicate character strings that were hit in the
previous search as well.
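The three kinds of panel can be expressed as a simple classification. The sketch below is an assumption-labeled illustration: the label strings "QUERY" and "NEW" and the function name label_panels are hypothetical, and the rule is inferred only from the description above.

```python
def label_panels(results, bucket, previous_hits):
    """Classify each result as a bucket query, a new hit, or a repeated hit."""
    labels = {}
    for item in results:
        if item in bucket:
            labels[item] = "QUERY"  # query image included in the bucket
        elif item not in previous_hits:
            labels[item] = "NEW"    # no hit in the previous search
        else:
            labels[item] = ""       # also hit in the previous search
    return labels


print(label_panels(["a", "b", "c", "d"], bucket={"a"}, previous_hits={"a", "b"}))
# {'a': 'QUERY', 'b': '', 'c': 'NEW', 'd': 'NEW'}
```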
A search can be executed again by adding to the bucket the search results
obtained using the bucket.
In SMART-GS, the history
is saved automatically each time a search is performed. This history can be seen
with Reasoning Web.
Following execution of a
search, the queries folder can be
seen by looking at Desktop View of
Reasoning Web.
The
search history, including the date and time, is saved in this folder.
Moreover, when a search is
executed using a bucket, that bucket can be saved with a name and a memo.
After
a bucket is created from the search results, the following dialog box is
displayed when the [Register] button located in the lower part of the bucket is
pressed.
After being given a
suitable name and memo, the bucket is saved to Reasoning Web.
Similarly to the search
history, the saved bucket can be checked from Desktop View of Reasoning Web.
The saved bucket is
displayed on the same level as the queries folder to which the search
history has been saved.
Here,
the bucket has been named "Search Sample" as an example, and an icon named
"Search Sample" can be seen displayed next to the queries folder.
Double-click this icon to open the saved bucket.
In SMART-GS, various types
of "text" are provided, including Transcription for inputting
transcriptions of handwritten materials, Annotation for attaching comments to a
historical material, Translation for
storing a document that is the translation of a transcribed document, and Text for general texts.
The Text Search function is provided for
executing regular character string search for these texts.
First,
press the [TextSearch] button on the tool bar to display the following dialog
box.
Input the text to be
searched in the [query text:] field and select the image range to be searched,
the type of text to be searched, etc.
Since Transcription, Translation, and Annotation are each related to specific
images, when performing a search for these types of text, the target image range
must be specified.
The range specification
method is the same as for image search, so refer to the explanation of image
search.
As an example, let us
input the "sample" character string as the search target.
Here, by selecting the
[Case Sensitive] check box, a distinction is made between uppercase and
lowercase letters during the search.
Press
the [Search] button to display the list of search results.
As shown in the above
figure, the character strings found during the search are displayed highlighted
in yellow.
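The effect of the [Case Sensitive] check box can be illustrated with an ordinary substring search. The sketch below uses Python's standard re module and marks matches with square brackets in place of the yellow highlight; the function names find_matches and highlight are hypothetical and this is not the search code of SMART-GS.

```python
import re


def find_matches(text, query, case_sensitive=False):
    """Return (start, end) positions of every occurrence of query in text."""
    if not query:
        return []  # an empty query matches nothing
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.span() for m in re.finditer(re.escape(query), text, flags)]


def highlight(text, query, case_sensitive=False):
    """Wrap each match in square brackets, standing in for the highlight."""
    pieces, last = [], 0
    for start, end in find_matches(text, query, case_sensitive):
        pieces.append(text[last:start])
        pieces.append("[" + text[start:end] + "]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Sample page: this sample shows a SAMPLE."
print(highlight(text, "sample"))
# [Sample] page: this [sample] shows a [SAMPLE].
print(highlight(text, "sample", case_sensitive=True))
# Sample page: this [sample] shows a SAMPLE.
```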
Double-clicking a label
that displays a search result jumps to the editing screen for the text that
includes that search result.