This guide describes the SIETS administration and configuration concepts and contains step-by-step instructions for administering and configuring SIETS servers using SIETS Enterprise Manager.
SIETS is a system for information storage and retrieval. The SIETS system consists of the SIETS server and application programming interface (API), and SIETS Enterprise Manager.
The SIETS server is an operational unit that performs information storing and retrieval tasks by executing a predefined set of commands.
SIETS API libraries are used for building information storage and retrieval applications tailored to your company's needs.
SIETS Enterprise Manager is a tool for administering a single SIETS server or a cluster of SIETS servers.
SIETS is designed with grid computing in mind. SIETS server instances can run on multiple logically connected computers that form a cluster. In theory, such a cluster can store an unlimited amount of data, and it merges the search results of its members.
The amount of unstructured data in companies is growing rapidly. Full text search (FTS) is the only effective way to retrieve such data from collections and thereby make it usable. Full text search is the main method that the SIETS server implements for indexing and searching information.
Full text search in SIETS is based on an optimized mathematical model that, compared to traditional SQL systems, delivers very high performance when searching large amounts of poorly structured information. For this purpose, SIETS stores data in an inverted index.
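As an illustration of what merging search results across a cluster can involve, the following minimal Python sketch combines hit lists that several nodes have already sorted by descending score. The function name `merge_results`, the `(score, document_id)` hit format, and the sample data are assumptions made for this example; they are not part of SIETS or its API.

```python
import heapq
from itertools import islice


def merge_results(per_node_results, limit=10):
    """Merge hit lists that are each already sorted by descending score."""
    # heapq.merge streams the inputs lazily, so only `limit` hits are materialized.
    merged = heapq.merge(*per_node_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical hits as (score, document_id) pairs returned by three nodes.
node_a = [(0.92, "doc-17"), (0.40, "doc-3")]
node_b = [(0.88, "doc-52"), (0.75, "doc-8")]
node_c = [(0.61, "doc-29")]

top_hits = merge_results([node_a, node_b, node_c], limit=3)
print(top_hits)
```

Because each node returns its hits in ranked order, the merge never needs to re-sort the full result set.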
Any unstructured data can be the subject of full text search, for example, text collections, separate phrases or words in text documents, Web pages, Web addresses, several special markups for textual and numerical data, bookmarks of HTML or XML pages, domain names, SQL database entry key IDs, file names, and so on.
The following figure illustrates the SIETS system from a high level.
In Figure 1, users access the SIETS server via FTS queries. However, SIETS also implements other operations for storing and manipulating data, for example, retrieval queries, update requests, status and control commands, XML queries using XPath notation, and so on.
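To give a feel for XPath notation applied to an XML structured document, here is a small sketch that uses Python's standard `xml.etree.ElementTree` module. It does not use the SIETS API, and the element names (`library`, `document`, `title`, `lang`) are hypothetical; the sketch only shows the kind of path expression such a query is built from.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML structured documents; the element names are illustrative only.
xml_text = """
<library>
  <document id="1"><title>Inverted index basics</title><lang>en</lang></document>
  <document id="2"><title>Clusterbetrieb</title><lang>de</lang></document>
  <document id="3"><title>Full text search</title><lang>en</lang></document>
</library>
"""

root = ET.fromstring(xml_text)

# Select every <document> that has a child <lang> whose text is exactly "en".
english_titles = [
    doc.findtext("title") for doc in root.findall("./document[lang='en']")
]

# Select one document by the value of its id attribute.
second = root.find("./document[@id='2']")

print(english_titles)
print(second.findtext("title"))
```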
SIETS server is a stand-alone server for storing and retrieving information such as plain texts or XML structured documents. It can be run in one or more instances per computer.
For more information, see Multiple Storages Architecture.
SIETS application programming interface (API) is a standardized set of commands for accessing the SIETS server.
SIETS Enterprise Manager is an administrative tool for administering and configuring all SIETS system parameters and options. It is accessed via the HTTP(S) protocol, which makes administering SIETS convenient.
SIETS document is a unit in the SIETS storage against which searching is performed. It can be unstructured or XML structured.
SIETS storage is a data collection that stores SIETS documents in a format optimized for fast searching. Each SIETS storage is serviced by one SIETS server instance and consists of a vocabulary, a document repository, and an inverted index. Multiple storages can run on a single computer.
Vocabulary is a list of all unique words in the SIETS storage. Unique words are found in documents and added to the vocabulary while these documents are being stored in the SIETS storage. Each SIETS storage has its own vocabulary. Each word in the vocabulary is assigned an integer ID. The vocabulary is kept in RAM for better performance.
Document repository is the place where all SIETS documents are kept in the format in which they were originally stored in the SIETS system, so that they can be returned in response to a search request. Each SIETS storage has its own document repository.
Inverted index is a list of words in which each word has a list of pointers to the SIETS documents where the word occurs. The inverted index provides fast FTS functionality and makes it possible to build various logical expressions when performing a search. Each SIETS storage has its own inverted index.
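The idea of a vocabulary can be sketched in a few lines of Python: every new word receives the next free integer ID, and repeated words reuse the ID they already have. The class name `Vocabulary`, its methods, and the simple tokenization rule are assumptions made for this illustration; the actual SIETS data structures are not described in this guide.

```python
import re


def tokenize(text):
    """Split text into lowercase words (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


class Vocabulary:
    """In-memory mapping from each unique word to an integer ID."""

    def __init__(self):
        self._ids = {}

    def add_document(self, text):
        """Register every word of a document and return its word IDs in order."""
        return [self._id_for(word) for word in tokenize(text)]

    def _id_for(self, word):
        # setdefault stores the next free ID only the first time a word is seen.
        return self._ids.setdefault(word, len(self._ids))

    def lookup(self, word):
        """Return the ID of a known word, or None if the word is unknown."""
        return self._ids.get(word.lower())

    def __len__(self):
        return len(self._ids)


vocab = Vocabulary()
word_ids = vocab.add_document("Full text search finds text fast")
print(word_ids, len(vocab))
```

Storing integer IDs instead of strings keeps later structures, such as an index, compact and cheap to compare.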
For more information on inverted index, see What Is Inverted Index?
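A minimal sketch of an inverted index in Python follows. It maps each word to the set of document IDs that contain it and shows how logical expressions (AND, OR, AND NOT) reduce to set operations. The function names and sample documents are hypothetical, and the sketch ignores ranking, word positions, and on-disk storage, which a production system would need.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of IDs of the documents in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all(index, *words):
    """Logical AND: documents that contain every given word."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()


def search_any(index, *words):
    """Logical OR: documents that contain at least one given word."""
    return set().union(*(index.get(word.lower(), set()) for word in words))


# Hypothetical sample documents keyed by document ID.
documents = {
    1: "SIETS stores XML documents",
    2: "Full text search over documents",
    3: "Search results are merged across a cluster",
}
index = build_inverted_index(documents)

both = search_all(index, "search", "documents")    # search AND documents
either = search_any(index, "xml", "cluster")       # xml OR cluster
search_not_cluster = search_all(index, "search") - search_any(index, "cluster")
print(both, either, search_not_cluster)
```

A lookup touches only the posting sets of the queried words, so its cost depends on how often those words occur rather than on the total size of the collection.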
The following SIETS administrator tasks can be performed:
Task | Steps described in |
---|---|
Administering the SIETS server or a cluster of SIETS servers. | |
Configuring each SIETS storage on the SIETS servers. | |
Running SIETS commands to test the system. | |