When a SIETS storage is added, the SIETS storage configuration XML file is created automatically, with the default configuration parameter values set for each SIETS storage instance.
The SIETS storage is configured at the instance level, because each instance can be located on different hardware and contain a different amount of data.
The SIETS storage configuration file includes parameters for SIETS storage users, indexing options, and performance tuning.
You may want to keep the default values of the performance tuning parameters: by default they are adjusted for the most common sizes and types of data, and they can be more complicated to understand. Other parameters are specific to your system and you must configure them yourself. These include users, user passwords, optional dictionary settings, and so on.
If you feel that performance tuning is necessary but are not sure which parameters must be changed, contact the SIETS support team for assistance.
For information on contacting the SIETS support team, see Getting Help.
SIETS storage configuration is performed in textual mode. After you select the SIETS storage configuration control, the SIETS storage configuration XML file is loaded in SIETS Enterprise Manager. Configuring the SIETS storage means editing the configuration parameter values in the tags of the XML file.
If a configuration parameter tag is deleted from the SIETS storage configuration XML file, the default value of that parameter is used when the SIETS storage runs.
The changes made to the SIETS storage configuration file become effective after the instance is stopped, if running, and started again with the new configuration.
For more information on stopping and starting SIETS storage instances, see Running SIETS Storages.
Because opening, editing, saving, and closing the SIETS storage configuration file is the same for all configuration parameters, it is described in a separate section. Each parameter or parameter group is also described in its own section, which shows how it is represented in the SIETS storage configuration file and explains what it means.
To open, edit, save, and close the SIETS storage configuration file from SIETS Enterprise Manager, proceed as follows:

1. After you have logged in to SIETS Enterprise Manager, in Main Menu, select SIETS Storages.
2. In the SIETS Storages window, in the Name column, select the SIETS storage to be configured.
3. To configure an instance, select Configuration below the instance.
4. Select Instance Configuration.
5. To open the configuration file for editing, click Edit.
6. Edit the configuration parameter values in tags as described in the following sections.
7. To save the changes made, click Save.
8. To discard the changes and close the window, click Cancel.
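Because the file is edited by hand, a mistyped closing tag is easy to introduce. The following is a minimal sketch of a well-formedness check that could be run on the edited text before it is saved. It assumes that the storage reads the file as standard XML, which is not stated explicitly here, and the helper name `is_well_formed` is hypothetical, not part of SIETS.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, otherwise report the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        print(f"Not well-formed: {error}")
        return False
    return True


# A correctly closed tag and a mistyped closing tag (hypothetical fragments).
good = "<general><timeout>60</timeout></general>"
bad = "<general><timeout>60<timeout></general>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The check only confirms that tags are paired and nested correctly. It does not verify parameter names or values.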
The following sections describe the SIETS storage configuration file parameters. Each section describes a parameter and its child parameters and gives an example of how they appear in the SIETS storage configuration file.
The following table describes the general SIETS storage configuration parameters:

| Second level element | Third level element | Description | Default |
|---|---|---|---|
| `<general>` | | General information about the SIETS storage. | |
| | `<storage>` | SIETS storage name, entered when adding a new SIETS storage. | |
| | `<port>` | SIETS storage port, entered when adding a new SIETS storage. | |
| | `<max_resultset>` | Maximum number of documents returned to the result set. | 1000 |
| | `<timeout>` | Function timeout period in seconds. If a command is not executed within this period, it returns an error. | 60 |
| | `<log_path>` | Path, relative to the SIETS storage directory, where all log files are stored. | |
| | `<log_rotate>` | Number of days after which all log files are deleted, so that the disk does not fill up with log information. | 60 |
| | `<dump>` | Whether the dump is to be created. | no |

Example:
<general>
<storage>Newspapers</storage>
<port>90</port>
<max_resultset>1000</max_resultset>
<timeout>60</timeout>
<log_path>./logs</log_path>
<log_rotate>60</log_rotate>
<dump>no</dump>
</general>
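The rule that a deleted tag falls back to its default can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the way the SIETS storage itself reads the file. The default values are copied from the table of general parameters, and the function name `effective_general` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Defaults copied from the table of general parameters.
GENERAL_DEFAULTS = {
    "max_resultset": "1000",
    "timeout": "60",
    "log_rotate": "60",
    "dump": "no",
}


def effective_general(general_xml):
    """Merge the tags present in a <general> element over the defaults."""
    general = ET.fromstring(general_xml)
    settings = dict(GENERAL_DEFAULTS)
    for child in general:
        if child.text is not None:
            settings[child.tag] = child.text.strip()
    return settings


# <max_resultset>, <log_rotate>, and <dump> are deliberately left out.
fragment = """
<general>
<storage>Newspapers</storage>
<port>90</port>
<timeout>30</timeout>
</general>
"""

settings = effective_general(fragment)
print(settings["timeout"])        # value from the file
print(settings["max_resultset"])  # default, because the tag is absent
```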
The following table describes the user management SIETS storage configuration parameters:

| Second level element | Third level element | Fourth level element | Description |
|---|---|---|---|
| `<users>` | | | This element contains a list of SIETS storage users. |
| | `<user>` | | This element contains a user name and password. It is repeated for each user. |
| | | `<name>` | User name. |
| | | `<pass>` | User password. |

Example:
<users>
<user>
<name>John</name>
<pass>unbreakable_password</pass>
</user>
</users>
SIETS storage users must not be confused with SIETS Enterprise Manager user accounts. For information on SIETS Enterprise Manager users, see Administering SIETS Enterprise Manager User Accounts.
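Because the user element is repeated for each user, a list with several users can be tedious to type by hand. The following sketch generates the users element from a list of name and password pairs, and the output can then be pasted into the configuration file. The second user, Mary, and the function name `build_users` are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def build_users(credentials):
    """Build a <users> element with one <user> child per (name, password) pair."""
    users = ET.Element("users")
    for name, password in credentials:
        user = ET.SubElement(users, "user")
        ET.SubElement(user, "name").text = name
        ET.SubElement(user, "pass").text = password
    return users


users = build_users([
    ("John", "unbreakable_password"),
    ("Mary", "another_password"),  # hypothetical second user
])
xml_text = ET.tostring(users, encoding="unicode")
print(xml_text)
```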
The dictionary element, which is a second level element, contains several parameters and parameter groups for configuring the options that define search queries. Each of the following sections describes one of these options.
The following table describes the parameter for configuring special symbols:

| Third level element | Description | Default |
|---|---|---|
| `<specsymbols>` | Letters and numbers are regular symbols that form words. By default, all other symbols like … | _ |

The following table describes the parameters for configuring wildcard pattern support:

| Third level element | Fourth level element | Description | Default |
|---|---|---|---|
| `<wildcards>` | | This element contains parameters for configuring wildcard pattern support. | |
| | `<allow>` | Whether the wildcard pattern search is enabled. | yes |
| | `<cover_factor>` | When a wildcard pattern defines a class of words to be searched, only a limited number of statistically frequent words are searched for, to ensure higher performance. This element defines the limit as a percentage of the total number of times that all words matching the wildcard pattern appear in the SIETS storage. For example, a cover factor of 60% means that only the most frequent matching words, which together account for 60% of these appearances, are searched and returned. | 95 |
| | `<min_expand>` | The minimum limit, in absolute numbers, of the set of words from the SIETS storage vocabulary that match the wildcard pattern. This parameter overcomes the … | 4 |
| | `<max_expand>` | The maximum limit, in absolute numbers, of the set of words from the SIETS storage vocabulary that match the wildcard pattern. This parameter overcomes the … | 16 |

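How the cover factor and the two expansion limits interact for wildcard patterns can be sketched as follows. This is an interpretation of the descriptions above, not the actual SIETS algorithm. It assumes that matching words are ranked by frequency, that the word which reaches the cover factor is itself included, that the minimum and maximum limits take precedence over the cover factor, and that the cover factor is given as a fraction. The word counts are invented.

```python
def select_expansions(frequencies, cover_factor, min_expand, max_expand):
    """Pick the most frequent matching words until the cover factor is reached."""
    total = sum(frequencies.values())
    # Most frequent words first; ties are broken alphabetically for a stable order.
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    selected = []
    covered = 0
    for word, count in ranked:
        if len(selected) >= max_expand:
            break  # never expand to more than max_expand words
        if len(selected) >= min_expand and covered >= cover_factor * total:
            break  # cover factor reached and the minimum is satisfied
        selected.append(word)
        covered += count
    return selected


# Hypothetical words matching one pattern, with the number of times each appears.
matches = {"test": 50, "tests": 30, "tested": 10, "tester": 6, "testing": 4}

by_cover = select_expansions(matches, 0.60, min_expand=0, max_expand=16)
with_minimum = select_expansions(matches, 0.60, min_expand=4, max_expand=16)
print(by_cover)      # ['test', 'tests']
print(with_minimum)  # ['test', 'tests', 'tested', 'tester']
```

With a cover factor of 60%, the two most frequent words already account for 80 of the 100 appearances, so the expansion stops there unless the minimum limit forces more words to be included.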
The following table describes the parameters for configuring stemming:

| Third level element | Fourth level element | Description | Default |
|---|---|---|---|
| `<stemming>` | | This element contains parameters for configuring stemming. | |
| | `<allow>` | Whether the language declinations search is enabled. | yes |
| | `<cover_factor>` | When language declinations define a class of words to be searched, only a limited number of statistically frequent words are searched for, to ensure higher performance. This element defines the limit as a percentage of the total number of times that all words created from the language declinations appear in the SIETS storage. For example, a cover factor of 80% means that only the most frequent words, which together account for 80% of these appearances, are searched and returned. | 95 |
| | `<min_expand>` | The minimum limit, in absolute numbers, of the set of words from the SIETS storage vocabulary that match the language declinations. This parameter overcomes the … | 4 |
| | `<max_expand>` | The maximum limit, in absolute numbers, of the set of words from the SIETS storage vocabulary that match the language declinations. This parameter overcomes the … | 16 |

If the alternatives search is performed, the system returns a set of alternative words from the SIETS storage vocabulary that are similar in spelling or have a different language declination. For example, if you enter bote, then bite and byte are offered for searching. Note that only words from the SIETS storage are returned.
This feature can be used for fuzzy searches and for correcting spelling errors.
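The idea of offering similarly spelled words from a fixed vocabulary can be illustrated with Python's standard `difflib` module. This is a stand-in technique chosen for the sketch. The method SIETS actually uses to find alternatives is not described here, and the vocabulary and the function name `suggest_alternatives` are invented.

```python
import difflib

# Hypothetical storage vocabulary; only these words can be offered.
vocabulary = ["bite", "byte", "newspaper", "storage"]


def suggest_alternatives(term, vocabulary, limit=3, cutoff=0.6):
    """Return up to `limit` vocabulary words whose spelling is close to `term`."""
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=cutoff)


suggestions = suggest_alternatives("bote", vocabulary)
print(suggestions)
```

As in the description above, the misspelled term itself is never returned, because it does not occur in the vocabulary.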
The following table describes the parameters for configuring alternatives support:

| Third level element | Fourth level element | Description | Default |
|---|---|---|---|
| `<alternatives>` | | This element contains parameters for configuring alternatives support limits. When searching alternative words, the … | |
| | `<cr>` | Minimum ratio between the occurrence of the alternative and the occurrence of the search term that is required to include the alternative in the search query. If you increase this parameter, fewer results are returned to the result set, but performance is improved. | 2.0 |
| | `<idif>` | Maximum number that indicates how much the alternative differs from the search term; the greater the … If you increase this parameter, more results are returned to the result set, but performance is reduced. | 3.0 |
| | `<h>` | Minimum number that gives an overall estimation of the quality of the alternative; the greater the … If you increase this parameter, fewer results are returned to the result set, but performance is improved. | 2.5 |

The following table describes the parameters for configuring ignored words options:

| Third level element | Fourth level element | Description | Default |
|---|---|---|---|
| `<ignore>` | | This element contains parameters for detecting ignored words. | |
| | `<word_freq>` | Ratio between the number of all words in the SIETS storage and the number of occurrences of the word to be ignored. If this ratio for a word is less than this number, the word is added to the ignored word list. | 500 |
| | `<word_len>` | Maximum length of the word to be ignored. | 5 |

Note: It is possible to include ignored words in the search by using the + sign in front of the ignored word. The full text index contains all words, including ignored words. The ignored words feature is used only for filtering out common words such as and, but, is.
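The two ignored words parameters can be read as a single test on each word, sketched below. This is one reading of the parameter descriptions above, with the default values 500 and 5 from the table. The word counts and the function name `is_ignored` are invented for the example.

```python
def is_ignored(word, occurrences, total_words, word_freq=500, word_len=5):
    """Return True if a word is short and frequent enough to be ignored."""
    if occurrences == 0:
        return False  # a word that never occurs has no ratio to compare
    ratio = total_words / occurrences
    return len(word) <= word_len and ratio < word_freq


total_words = 1_000_000  # hypothetical number of words in the storage

print(is_ignored("and", 30_000, total_words))      # short and very frequent
print(is_ignored("storage", 30_000, total_words))  # frequent, but longer than 5
print(is_ignored("byte", 100, total_words))        # short, but rare
```

Under this reading, lowering `word_freq` or `word_len` shortens the ignored word list.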
The following is an example of the whole dictionary element:
<dictionary>
<specsymbols>_</specsymbols>
<wildcards>
<allow>yes</allow>
<cover_factor>0.95</cover_factor>
<min_expand>4</min_expand>
<max_expand>16</max_expand>
</wildcards>
<national>
<cover_factor>0.95</cover_factor>
<min_expand>4</min_expand>
<max_expand>16</max_expand>
</national>
<alternatives>
<cr>2.0</cr>
<idif>3.0</idif>
<h>2.5</h>
</alternatives>
<ignore>
<word_freq>500</word_freq>
<word_len>5</word_len>
</ignore>
</dictionary>
The following table describes the SIETS storage repository configuration parameters:

| Second level element | Third level element | Fourth level element | Fifth level element | Description | Default |
|---|---|---|---|---|---|
| `<repository>` | | | | This element contains the repository configuration parameters. | |
| | `<highlight>` | | | This element contains parameters for highlighting the matching search terms in the search result. | |
| | | `<open_mark>` | | Highlight open mark. | `<b>` |
| | | `<close_mark>` | | Highlight close mark. | `</b>` |
| | `<tag_compression>` | | | This parameter enables or disables (on/off values) tag compression. Tag compression can reduce the size of the storage on disk if documents are small or tag intensive. For large text documents with few tags it has no effect on storage size, but performance could be affected. | off |

Example:
<repository>
<highlight>
<open_mark><b></open_mark>
<close_mark></b></close_mark>
</highlight>
</repository>
The following table describes the SIETS storage indexing configuration parameters:

| Second level element | Third level element | Fourth level element | Description | Default |
|---|---|---|---|---|
| `<index>` | | | This element contains the indexing configuration parameters. | |
| | `<cache>` | | | |
| | | `<size>` | Indexing cache size in megabytes, from 50 to 150 MB. If you enter a number outside this interval, then: … Note that the indexing daemon uses more RAM than this number, because it also performs other operations. If you import large amounts of data, several GB in size, the whole cache is used. | 80 |
| | | `<usage_idle>` | Minimum amount of data in the cache, in percent, at which indexing starts. Indexing is started only if this minimum is exceeded. If the amount of data in the cache is less than the minimum, background indexing is not performed. Leave this parameter unchanged unless advised otherwise by the SIETS technical support team. | 10 |
| | | `<usage_critical>` | Maximum amount of data in the cache, in percent. If this maximum is exceeded, all CPU and I/O resources are used for indexing. If the amount of data in the cache is less than the maximum, CPU and I/O resources are used for indexing in proportion to the amount of data in the cache. Leave this parameter unchanged unless advised otherwise by the SIETS technical support team. | 90 |
| | `<background_indexing>` | | Whether background indexing is performed. If not, indexing is performed only when the … Leave this parameter unchanged unless advised otherwise by the SIETS technical support team. | yes |
| | `<optimize_to>` | | Number of search results to be optimized according to relevance. Search results after this number are sorted by rating. It is suggested to set this number to the maximum number of documents returned to the result set. The greater the number, the more relevant the search results; the smaller the number, the higher the performance. | 1000 |
| | `<weight_threshold>` | | Relevance weight threshold above which a document is considered very relevant. For example, if 100 is the maximum relevance weight, then 90 is very close to the maximum, yet documents with such relevance are likely to exist in practice. Such documents are therefore considered very relevant. | 90 |

Example:
<index>
<cache>
<size>80</size>
<usage_idle>10</usage_idle>
<usage_critical>90</usage_critical>
</cache>
<background_indexing>yes</background_indexing>
<optimize_to>1000</optimize_to>
<weight_threshold>90</weight_threshold>
</index>
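The effect of the two cache usage thresholds can be sketched as a function of how full the indexing cache is. This is only one reading of the descriptions above. In particular, it assumes that "proportionally" means a share equal to the fill level of the cache, which is not confirmed here, and the function name `indexing_share` is hypothetical.

```python
def indexing_share(cache_fill_pct, usage_idle=10, usage_critical=90):
    """Return the assumed share (0.0 to 1.0) of CPU and I/O used for indexing."""
    if cache_fill_pct <= usage_idle:
        return 0.0  # minimum not exceeded: no background indexing
    if cache_fill_pct > usage_critical:
        return 1.0  # maximum exceeded: all resources go to indexing
    # Between the thresholds: assumed proportional to the cache fill level.
    return cache_fill_pct / 100


for fill in (5, 50, 95):
    print(fill, indexing_share(fill))
```

With the default thresholds, a cache that is 5% full triggers no indexing, a cache that is 50% full uses about half of the resources, and a cache that is 95% full uses all of them.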