Siets Server software supports a rich set of linguistic search options using only natural language words or phrases in queries. Below are some examples of typical search query syntax for the Siets Server XML database when using the API command "SEARCH".
Free text natural language terms, case insensitive (default querying mode):
a little john in london
would return all documents that contain a combination of the relevant words "little", "John" and "London", while ignoring the words "a" and "in".
Siets Server search engine software can process data in 160 languages in the same XML document store. Multiple language content can be stored in one XML document formatted with UTF-8 character set encoding.
Search queries need to be in the same languages that are actually present in the customer data.
The intuitive rule of thumb for making a free text query case sensitive is to capitalize the first letter of a word:
windows
would find both words "windows" and "Windows". However,
Windows
would find only capitalized words "Windows".
Free text natural language terms, case sensitive:
a Little John in London
would return all documents that contain a combination of the capitalized words "Little", "John" and "London", ignoring "a" and "in", and would not match the lowercase words "little", "john" and "london".
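The capitalization rule above can be modeled locally as a small predicate. This is a minimal sketch of the described behavior, not Siets Server code: the function name `term_matches` is hypothetical, and it assumes that only a capitalized first letter switches a term to case-sensitive matching.

```python
def term_matches(query_term: str, word: str) -> bool:
    """Return True if `word` satisfies `query_term` under the rule of thumb."""
    if query_term[:1].isupper():
        # Capitalized query term: exact, case-sensitive comparison.
        return query_term == word
    # Lowercase query term: compare while ignoring case.
    return query_term.lower() == word.lower()


print(term_matches("windows", "Windows"))  # True
print(term_matches("Windows", "windows"))  # False
```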
Siets Server performs natural language text analytics during every search query and surfaces the contextually best matches first, prioritizing natural language analytics over algorithmic data sorting.
For the previous example free text query:
a little john in london
document with the following text content:
Little John went to the nearest underground station in London.
would be ranked ahead of (surfaced above) the following document with this text content:
John went to the nearest underground station. It was a little more distance away from his hotel. It was time to go to London.
The second document will also be found and will be present in Siets search results. It will, however, be listed after the first result because it is the less contextually meaningful document from the point of view of a natural language user.
One can say that the first document is more relevant than the second document for the particular query context.
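To make the ordering of the two example documents concrete, here is a toy ranking sketch. It is not the Siets Server ranking algorithm, which this page does not specify. It assumes one simple proxy for contextual relevance: the shorter the word window that contains all query terms, the better the match. The names `STOP_WORDS`, `tokens` and `smallest_span` are hypothetical.

```python
import re

STOP_WORDS = {"a", "in"}  # assumed ignored words, for this example only


def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def smallest_span(query, document):
    """Width in words of the shortest window holding every query term, or None."""
    terms = {t for t in tokens(query) if t not in STOP_WORDS}
    words = tokens(document)
    best = None
    for start, word in enumerate(words):
        if word not in terms:
            continue
        seen = set()
        for end in range(start, len(words)):
            if words[end] in terms:
                seen.add(words[end])
            if seen == terms:
                span = end - start
                if best is None or span < best:
                    best = span
                break
    return best


query = "a little john in london"
doc1 = "Little John went to the nearest underground station in London."
doc2 = ("John went to the nearest underground station. It was a little more "
        "distance away from his hotel. It was time to go to London.")

print(smallest_span(query, doc1))  # 9
print(smallest_span(query, doc2))  # 23
ranked = sorted([doc2, doc1], key=lambda d: smallest_span(query, d))
print(ranked[0] == doc1)           # True
```

Under this assumed measure the first document wins because its three relevant words sit within a ten-word window, while the second spreads them across three sentences.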
Standard rules for Siets Server language analytics used per each query are based on fast linguistic data pattern matching. Software is taking into account language based factors that determine human relevance of search results. Among pattern matching criteria used by Siets Server are:
The most relevant document context matching the search query terms by combined above text analytics criteria will be surfaced by Siets Server, sorting the results by decreasing contextual relevance and providing small fragments of text snippets from documents where the best context matches were found.
Additionally, Siets Server uses transactional pagination of results to avoid information overload in web browsers by limiting the number of results per page (e.g. 20, 50 or 100); all parameters are configurable for every Siets API query.
Siets Server enforces natural language content relevance sorting rules for each free text query, unless specified otherwise. This surfaces the most relevant results first, based on the best matching linguistic information in the textual content.
Contextual relevance ranking rules can also be flexibly further customized by the owner for different content surfacing rules, custom defined for each Siets Server database through its search index ranking policy. Please kindly read other sections in this website about Siets unique information ranking model and methods how to apply ranking weights to sort, group and order search results.
Finally, search results based on standard text analytics sorting by context relevance can be combined with other information ordering rules through Siets API, for example, enabling results sorting by numeric values or dates like in classic SQL databases.
There are potentially millions of sort orders possible in a single Siets Server database. When the software is responding to free text ad hoc queries coming from millions of users, it is prioritizing results by the best ranked context match first, and only then by other sort orders.
Free text natural language phrase, case insensitive:
"a little john in london"
would return all documents that contain the context phrase "a little John in London" with all words exactly in this order, including "a" and "in", matching words in either upper or lower case, so that "john" and "John", "london" and "London" etc. are equal matches.
Free text natural language phrase, case-sensitive:
"A Little John in London"
would return all documents that contain the context phrase "A Little John in London" with all words exactly in this order, including "A" and "in". Words that start with a capital letter are matched case-sensitively: "Little" matches but "little" does not, "John" matches but "john" does not, and so on.
Use the plus + symbol as a word prefix to require a strong match regardless of whether the word is present in the ignored word list.
Free text with enforced all search terms matches in a natural language query:
+A little John +in London
would return all documents that contain a combination of the relevant words "little", "John" and "London" together with the usually ignored words "A" and "in".
Enclose a word in dollar sign $ symbols to request substitution of the word with a list of word forms derived from stemming, combined with Boolean OR.
Free text with natural-language grammar rules stemming:
little $john$
would return all documents that contain a combination of "little" with either the word "John" or any of its stemming forms, such as "John's".
Enclose a word in percent % symbols to request substitution of the word with a list of synonyms, combined with Boolean OR.
Free text with synonyms, using a simple synonym list file pre-loaded for each XML database and applied with Boolean OR logic in a query:
little john %country%
would find terms "little" and "john" in combination with either word "country", or word "region", or word "area", provided that all 3 synonym words were listed in the synonym file line, comma-delimited, that started with word "country": "country, region, area".
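The synonym substitution described above can be illustrated with a small local sketch. Siets Server performs this expansion internally; the code below only mimics the described semantics by rewriting `%word%` into the `{ }` Boolean OR form used elsewhere on this page. The function names `load_synonyms` and `expand_synonyms` are hypothetical, and the exact synonym file format beyond "comma-delimited line starting with the head word" is an assumption.

```python
import re


def load_synonyms(lines):
    """Map the first word of each comma-delimited line to the full word list."""
    table = {}
    for line in lines:
        words = [w.strip() for w in line.split(",") if w.strip()]
        if words:
            table[words[0]] = words
    return table


def expand_synonyms(query, table):
    """Rewrite each %word% as an OR group; words without synonyms stay plain."""
    def replace(match):
        word = match.group(1)
        group = table.get(word, [word])
        return "{" + " ".join(group) + "}" if len(group) > 1 else word
    return re.sub(r"%(\w+)%", replace, query)


table = load_synonyms(["country, region, area"])
print(expand_synonyms("little john %country%", table))
# little john {country region area}
```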
Use the asterisk * symbol as a wildcard template for words.
Free text with a wildcard template for the word part in suffix:
little Jon*
would find all combination matches of word "little" with either "Jones", or "Jonson", or "Jonathan" etc.
Free text with a wildcard template for the word part in the prefix:
little *athan
would find all combination matches of word "little" with either "Jonathan", "Bathan", "Rathan" etc.
Free text with a wildcard template for the part in the middle of word:
little jo*n
would find all combination matches of word "little" with either "Jonathan", "John", "join" etc.
Use a question mark ? symbol in a free text query as a specific word letter wildcard for designated letter positions per word (or per any character string that is not a delimiter symbol):
little J?s?n
would find all combination matches of word "little" with either "Jason", "Josin", "Jasin", "Jeson" etc.
Use square brackets [ ] in a free text query to specify allowed letters in that letter position in the word.
Free text with a selector for only specified letters template in a particular position in the word:
Little Jonn[iy]
would find all combination matches of case-sensitive word "Little" with either "Jonny" or "Jonni" etc, but not "Jonna", "Jonne" etc.
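The three wildcard forms above (`*`, `?` and `[ ]`) happen to use the same symbols as shell-style patterns, so Python's standard `fnmatch` module gives a quick way to try templates against a word list. This is only an approximation for experimenting: `fnmatchcase` is always case sensitive, and whether Siets Server treats every edge case the same way is not confirmed by this page. The helper `matching_words` and the vocabulary are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_words(template, vocabulary):
    """Return the words that fit a wildcard template, case-sensitively."""
    return [word for word in vocabulary if fnmatchcase(word, template)]


vocabulary = ["Jones", "Jonathan", "Jason", "Josin", "Jonny", "Jonni", "Jonna", "John"]

print(matching_words("Jon*", vocabulary))      # ['Jones', 'Jonathan', 'Jonny', 'Jonni', 'Jonna']
print(matching_words("*athan", vocabulary))    # ['Jonathan']
print(matching_words("J?s?n", vocabulary))     # ['Jason', 'Josin']
print(matching_words("Jonn[iy]", vocabulary))  # ['Jonny', 'Jonni']
```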
Natural language phrases with wildcards used for individual phrase terms:
"software develop*"
would find either phrase "software developer", or phrase "software developers" or phrase "software development" etc.
Siets Server automatically detects ignored words above certain threshold in natural language queries.
By default, Siets ignores common words and characters such as "a", "the", “and”, ”where”, “how” etc, as well as certain single characters and single letters, because they tend to slow down the search without improving the search results.
The SIETS server detects words that appear in the SIETS storage most often and gradually adds them to the ignored words list when loading and indexing large amounts of data.
It is possible to place reasonable restrictions (percentage thresholds, word lengths etc) on ignored words list for each Siets Server data store collection so that limitations best fit the application business logic, user search requirements and type of data content.
Please note that SIETS server still creates the full-text index with all ignored words and their positions. Ignored words are actually being included into the index exactly as they occur in natural language text content, so that a user can request search queries with normally ignored word matches, if necessary.
If a common word or a character was ignored during search query by Siets Server, yet it is essential to getting the results you want, you can include it by preceding it with a plus sign +:
John +and Anna
will find only results with all three words present: "John", "and" and "Anna".
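The effect of the plus prefix on ignored words can be sketched as a simple term filter. This models only the behavior described above, not the server's implementation; the `IGNORED` set is a hypothetical stand-in for the ignored word list that Siets Server maintains per data store, and `effective_terms` is a hypothetical name.

```python
IGNORED = {"a", "in", "and", "the"}  # hypothetical ignored-word list


def effective_terms(query):
    """Terms that must match: ignored words are dropped unless prefixed with '+'."""
    kept = []
    for raw in query.split():
        if raw.startswith("+") and len(raw) > 1:
            kept.append(raw[1:])   # forced term, kept even if normally ignored
        elif raw.lower() not in IGNORED:
            kept.append(raw)
    return kept


print(effective_terms("John and Anna"))              # ['John', 'Anna']
print(effective_terms("John +and Anna"))             # ['John', 'and', 'Anna']
print(effective_terms("+A little John +in London"))  # ['A', 'little', 'John', 'in', 'London']
```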
Enclose two or more words between @ symbols, with the opening @ followed by a number that specifies the maximum distance, in words, within which matches are searched.
Proximity search for two or more words within nearby textual distance:
@ 4 John Smith @
would find documents with the content "John Armitage Smith" and "John Henry Armitage Smith", but would not find a document with the content "John went to the nearest underground station. Smith was not there...".
All of the natural language search options described above can be combined in a single search query, using ( ) as AND, { } as OR and ~ as NOT.
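The proximity example can be checked with a short local sketch. It assumes that "distance" means the difference between word positions and that a match requires this difference to be at most the given number; the page does not define the measure precisely, so treat that as an assumption. `within_distance` is a hypothetical helper, not a Siets API call.

```python
import re
from itertools import product


def within_distance(text, terms, max_distance):
    """True if every term occurs and some set of occurrences spans <= max_distance words."""
    words = re.findall(r"\w+", text.lower())
    positions = [[i for i, w in enumerate(words) if w == t.lower()] for t in terms]
    if any(not p for p in positions):
        return False  # at least one term is missing entirely
    return any(max(combo) - min(combo) <= max_distance for combo in product(*positions))


terms = ["John", "Smith"]
print(within_distance("John Armitage Smith", terms, 4))        # True
print(within_distance("John Henry Armitage Smith", terms, 4))  # True
print(within_distance(
    "John went to the nearest underground station. Smith was not there...",
    terms, 4))                                                 # False
```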
John Smith
(John Smith)
would return documents that contain both words "John" and "Smith" in any order.
Boolean ( ) as AND is the default free text search criterion and can be omitted in simple one-level queries with no nested terms.
{John Smith}
would return documents that contain either the word “John” or the word “Smith”.
John ~Smith
would return documents that contain the word “John", but do not contain word “Smith”.
Combined meta-data search using free text natural language context in specific data fields only:
<name>john</name> <address>piccadilly london</address>
would find all documents with "John" in the <name> field whose <address> field contains both text terms "Piccadilly" and "London" in any order.
Siets customers can freely choose to provide this type of easy to grasp programmable XML data filtering, slicing and pivoting into different result subsets for their intelligent end-users for powerful search driven analytics and reporting.
Siets Server can be instructed with "index=all" policy to index all content from all individual XML data fields into a single common full-text index.
The following query example would also work to provide reasonably good search results, if XML database has been indexed to perform search across all unstructured text and through the entire XML data model.
piccadilly london john
Above example would probably be less precise than content search per specific XML field only, still it will yield the list of results for review to user with text snippets showing context and allowing user to decide which result is most useful.
Search across all XML text content can be combined with meta data search in XML data fields.
Siets customers can flexibly sanitize all FTS queries for greater safety just leaving allowed options to perform plain text and phrase queries for users.
Then anyone with basic web type search skills and not being aware about underlying XML data model can easily search database retrieving results by default sort orders and limited to pre-programmed access logic.
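One way a customer application might restrict end users to plain text and phrase queries is to strip the operator symbols described on this page before the query is sent to the server. The sketch below is an assumption about how such sanitizing could be done on the client side; it is not a Siets feature or API, and the name `sanitize` is hypothetical.

```python
import re

# Operator symbols described on this page: + $ % * ? [ ] @ ( ) { } ~ < >
OPERATOR_CHARS = r'[+$%*?\[\]@(){}~<>]'


def sanitize(query):
    """Reduce a query to plain words and balanced "quoted phrases"."""
    query = re.sub(r"</?\w+[^>]*>", " ", query)   # drop XML field tags, keep their text
    query = query.replace("..", " ")               # break numeric range syntax
    query = re.sub(OPERATOR_CHARS, " ", query)     # drop remaining operator symbols
    if query.count('"') % 2:                       # unbalanced quotes: drop them all
        query = query.replace('"', " ")
    query = " ".join(query.split())                # collapse whitespace
    return re.sub(r'"\s*([^"]*?)\s*"', r'"\1"', query)  # trim spaces inside phrases


print(sanitize('{(John ~Smith) +and} <year>1980..2000</year>'))  # John Smith and 1980 2000
print(sanitize('"software develop*"'))                           # "software develop"
```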
Siets software Boolean AND ( ), OR { } and NOT ~ query syntax can be nested in multiple levels to make different logical combinations of FTS queries with XML field level context queries.
Complex business logic, if required by customer application, can be easily created. A simple example:
{(John Smith) (Abby Brown)}
would return documents that contain either the word "John" and the word "Smith", or the word "Abby" and the word "Brown".
Additionally, unstructured text analytics is possible by similarity of a given textual content to another textual content, with tunable thresholds for the number of significant word occurrences and co-occurrences, for the best results in large data sets with billions of objects.
Another helpful text analytics feature is a "Did you mean that?" type of spell-check correction function, which returns a list of similarly spelled words from your actual XML database index.
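Nested Boolean queries are easier to assemble programmatically than by hand. The helpers below are a minimal sketch that only builds query strings in the ( ), { } and ~ syntax shown above; the helper names are hypothetical, and whether ~ may prefix a whole group rather than a single word is not stated on this page.

```python
def all_of(*parts):
    """Boolean AND group: (a b ...)."""
    return "(" + " ".join(parts) + ")"


def any_of(*parts):
    """Boolean OR group: {a b ...}."""
    return "{" + " ".join(parts) + "}"


def none_of(word):
    """Boolean NOT applied to a single word: ~word."""
    return "~" + word


print(any_of(all_of("John", "Smith"), all_of("Abby", "Brown")))
# {(John Smith) (Abby Brown)}
print(all_of("John", none_of("Smith")))
# (John ~Smith)
```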
Users can store and process data in multiple languages within the same XML document, avoiding tons of localization efforts in their application software code.
An additional 'perk' is fast server-side XML data conversion between national ISO character sets and the UTF-8 data store, if requested through the client API.
All data is always stored in standard UTF-8 data store on Siets Server and can be queried either using UTF-8 or a national ISO character set.
Natural language based querying and search paradigm is the holistic approach of Siets Server software design.
Siets Server powered customer software applications do not require from end-users to learn more complex query syntax forms than those described above.
It enables Siets Server software customers and application developers to start building applications where users need to have only basic skills how to search for relevant information.
Most users are already familiar with the basics of this from their web and corporate database search experience.
Default linguistic search options in Siets Server search query syntax will work well enough just using plain text natural language terms.
Siets Server customers can start providing easy, intuitive, fast and relevant user search experience in their own databases similar to customer satisfaction when using the world's leading web search engines.
Siets users are free to use plain text words or phrases for relevant information search in data, being the most intuitive query terms based on everyone's language knowledge.
Siets Server provides a unique policy tool how to rank customer database search index policy through trainable system of ranking weights, uniquely applied on the custom XML data model for data fields and on the natural language text content that those fields contain.
Most web and mobile users can instantly query and analyze large data volumes in back-end application service systems run by Siets Server, using just a web browser and free text natural language query terms: a few words or a known context phrase. If necessary, users can expand the NLP-only query with a few additional, very easy to learn syntax options described above.
There is no need to learn SQL or similar complex querying languages in order to retrieve information from vast data volumes and rearrange it into grouped, sorted and ordered way, even up to the precise positioning of individual search result entries.
This ranking engine feature makes it possible to replace, with NLP terms, the more complex SQL queries used for combined full-text and structured search, which would typically look like this:
SELECT ... LIKE ... GROUP BY ... ORDER BY ... JOIN
The replacement is a Siets Server ranking policy (a set of relative weightings for the XML data model) that instructs Siets Server to index all customer XML data in an application-specific sort order by desired relevance, so that complex sorting, grouping and ordering of information (the relevance of search results) is performed automatically when users search with free text terms in natural language.
Essentially the index ranking policy enables Siets Server customers to "merge" contextual search and sorted structured data search into one single "most relevant result set" from a user point of view, when "information relevance" is being estimated in relation to the application data model and specific business need.
In Siets Server system customers, using policy tool for index ranking, can flexibly govern Siets Server search behavior aligning it with desired free text search relevance, when data is queried and analyzed in plain natural language terms. Siets Server will automatically enforce ranking policy changes during data modification and would also make content sorting rules uniformly available for all applications without the need to change application software code in all applications using Siets platform.
This relevance ranking engine feature, in combination with natural language processing (NLP) at search time, is probably one of the most powerful Siets Server capabilities. It enables Siets customers to build massively scalable distributed data stores with fast and relevant search in any type of customer database with text-rich language content.
Please see more details about all NLP syntax search options, indexing and relevance "policy" methods in Developer documentation.
Read more about SIETS API specification in Developer Guide: Developer Guide / Siets API
An outstanding functional feature of Siets is that, for any XML based data, Siets Server additionally builds meta-data for each XML field and creates a "virtual search index", which can be queried separately using simple XML syntax in a query, e.g.:
attorney office <city>"new york"</city>
will return only documents that contain the words 'attorney' and 'office' somewhere in the text and whose XML field <city> contains the exact sequential two-word phrase "new york".
This technique of searching data by exact known or expected phrases is a very powerful and intuitive way for users without special knowledge of SQL or XML coding to perform basic analytical queries on any Siets Server database.
Another analytical feature of Siets Server is that it can automatically discover and index all numeric and date values within document text parts and allow to combine full text searches with numeric range searches using classic column type data or numeric indexes.
Siets is using simple syntax for that:
value1..value2
Siets Server automatically recognizes use of double dots '..' in API queries and invokes columnar type of data or numeric index for specific range analytics or reporting filter instead of using full text index.
Siets Server also supports policy-based indexing of all numeric and date values within only specified XML document text parts, which makes it possible to combine full text searches with numeric range searches in those classic column type data or numeric indexes.
For instance, the query:
attorney office <year>1980..2000</year>
will return only documents whose XML field <year> contains values from 1980 to 2000, effectively narrowing search results to the required numeric interval.
Siets Server makes it possible to combine any full text, XML-based fielded or numeric search options to fully exploit Siets speed and capacity.
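Fielded and range queries like the ones above can be assembled with two small string helpers. This is a sketch that only formats query text in the syntax shown on this page; the helper names are hypothetical, the values are inserted without any escaping, and the inclusive treatment of both range bounds in `in_range` is an assumption.

```python
def field(name, value):
    """Restrict a value to one XML field: <name>value</name>."""
    return f"<{name}>{value}</{name}>"


def value_range(low, high):
    """Numeric or date range in the double-dot syntax: low..high."""
    return f"{low}..{high}"


def in_range(value, low, high):
    """Local model of the range filter; assumes both bounds are inclusive."""
    return low <= value <= high


query = " ".join(["attorney", "office", field("year", value_range(1980, 2000))])
print(query)                                # attorney office <year>1980..2000</year>
print(field("city", '"new york"'))          # <city>"new york"</city>
print([y for y in (1979, 1980, 2000, 2001) if in_range(y, 1980, 2000)])  # [1980, 2000]
```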
Siets Server performs nearly instant retrieval of relevant ad hoc query data, filtering it from millions of documents per single server, when using columnar numeric or dates indexes. Since columnar indexes are cached and stored in RAM in Siets Server, software performance at those tasks is stellar.
Example query:
attorney office 1980..2000
will return any documents where the terms 'attorney' and 'office' occur together with any of the numbers 1980, 1981, ..., 1999, 2000 anywhere in the document.
If free text search driven query is used in combination with numeric search range option '..', analytical result set can be additionally sorted by Siets Server on-the-fly with descending or ascending sort order requested in the API call exactly how SQL databases do it.
In contrast to SQL databases, where analytics usually require reading, combining and sorting of tons of data every time an SQL query is executed, Siets Server does not need SQL query optimizers or other complex techniques to speed up analytical querying operations.
Context narrowing by text search terms efficiently slices the Siets Server internal workload down to analytical processing of only a very small resulting data subset that needs to be sorted.
Siets Server performance for search-query driven analytics and reporting functionality could deliver near real-time, sub-second response times in most common use cases, even if used in very large distributed cluster databases with billions of records.
Read more about SIETS search options in Developer Guide: Developer Guide / Search
Application developers can invoke Siets Server data classifying (grouping) functionality on any XML tag values. It counts the number of search result occurrences per facet in a query and returns all facets found.
Developers can use this type of facets-generating analytics to build powerful advanced search applications with categorized by facets result sets to narrow (drill-down), expand or combine unstructured data and XML-field based search criteria as needed for a particular business logic or process.
This XML-drill down feature is specified in document policy file for an XML data field to be indexed as an analytical facet-index:
index="classify"
Siets Server will match all categories to the actual query results and will count the totals of matching results per category into the predefined meta tag <menu>, which is returned along with the set of search results to the user application.
This simplicity of data faceting can be used by customer application to build easy to use, multi-level navigation trees of faceted links and used for drill-down or expand-up browsing of search results subsets per each menu category, without asking end-users typing in those filters manually.
This type of analytical content classification feature among found facets can substantially improve customer satisfaction with web site search functionality. It also offers plenty of convenient navigation programming choices to Siets Server software developer.
In particular, fast-responding e-commerce shop entries for catalogs and sub-catalogs of goods sold can be built dynamically, depending on what the user searches for and what data must be presented in the facet-matching result set.
For example, if the end user issues a query and receives 1000 hits containing a full text term 'car', with just first 100 results shown per page, and car brand categories have been indexed as index="classify" items, then Siets Server API will return also in resulting XML all matching <menu> value items with numeric counters how many documents could be found per each category in the total results set per entire database, e.g.,
Ford (154)
Mercedes (12)
Toyota (40)
This type of query-driven faceted analytical information, returned by Siets API to client application software, can then be used by the user application to build web navigation links that instantly expand or narrow search queries beyond the current subset of results visible on the limited screen.
In the example above, the user application can turn each menu item into a clickable web link that carries the item as the next search query filter for Siets Server. Clicking a link instantly narrows or expands the end-user's choice of car brand information to the respective manufacturer, without the need to browse through all 1000 results.
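The counting and link-building steps above can be sketched locally. The code does not parse a real Siets response, since the exact <menu> layout is not shown on this page; it only models counting hits per category and turning a chosen category into a follow-up fielded query. The field name `brand`, the sample hits and both helper names are hypothetical.

```python
from collections import Counter


def facet_counts(hits, field_name):
    """Count how many hits carry each value of one classified field."""
    return Counter(hit[field_name] for hit in hits if field_name in hit)


def drill_down(query, field_name, value):
    """Narrow a free text query to one facet value using fielded syntax."""
    return f"{query} <{field_name}>{value}</{field_name}>"


hits = [{"brand": "Ford"}] * 3 + [{"brand": "Toyota"}] * 2 + [{"brand": "Mercedes"}]
counts = facet_counts(hits, "brand")

for brand, total in counts.most_common():
    print(f"{brand} ({total})")   # Ford (3), Toyota (2), Mercedes (1), one per line

print(drill_down("car", "brand", "Ford"))  # car <brand>Ford</brand>
```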
This navigation-by-results driven categories can be similarly applied to other classified XML field navigation values, e.g., a fine-tuned navigation can be built by narrowing end user choice by clicking on car model, color, car engine type, type of fuel etc.
Please note, that Siets Server supports building and using hierarchy of multi-level XML-drill down. If top category XML data field contains subfields in XML with additional values, above indexing option 'index="classify"' would return also hierarchy of XML tag with all subfields and actual search hits matching also in subfields.
In this way Siets Server engine can help to organize and build a query specific top-to-bottom XML-based navigation trees called XML-drill down which contain only those categories and subcategories where some matching results are actually present.
For end-users it is extremely powerful and convenient option which does not require entering more specific search keywords: users can just click on the itemized categories returned from Siets Server within <menu> tag values to launch the next relevant search.
The end user will also see the expected number of results in the faceted navigation links, which indicates whether the search query terms could be improved when too many results are generated. This can help avoid unnecessarily broad queries, saving end user time and reducing the workload on back-office server resources.
It also helps to avoid information overload perception by users if a listing of the complete catalog with hundreds or even thousands of all categories is required for review by user, when the end user is interested just into the small subset from all catalog categories.
To summarize, above described analytical data classification feature by simple XML-drill down option and on-the-fly generated facets within Siets Server is a remarkably useful mechanism how to improve user experience for modern web applications.
I am using this faceted search feature extensively in all my own search-driven web projects.
Read more about faceted search here: Siets XML drill-down
Siets Server developers can operate the platform as a distributed XML data store to create, retrieve or update XML documents.
Among its functionality is:
Siets Server is one of the fastest full text search engines on the market with query response times less than 0.005 seconds with RAM use and less than 0.05 seconds with HDD disk use.
All tests are performed on a single server where not stated otherwise. For performance benchmarks of Siets software used in a cluster configuration running a distributed test database on several servers please see Search Performance in a Cluster Configuration. Test results should be comparable to those of Siets Server installed on equivalent hardware:
In all tests number of search transactions per minute is measured on server side, that is, excluding transport time of the result set over network.
As it can be seen from results and proven in practice, when data collection is small and can be entirely cached into primary memory, search performance is above 15000 transactions per minute, but when data collection is larger and at least one disk seek is required, search performance is about 1600 to 1700 transactions per minute.
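The throughput figures above can be converted into average time per query, which makes them easier to compare with the response times quoted elsewhere on this page. The sketch assumes that transactions are processed one after another, so that average time per transaction is simply the inverse of throughput; with concurrent queries the latency of an individual query could be higher. The function names are hypothetical.

```python
def seconds_per_query(transactions_per_minute):
    """Average time per transaction, assuming strictly sequential processing."""
    return 60.0 / transactions_per_minute


def queries_per_second(transactions_per_minute):
    """Throughput expressed per second."""
    return transactions_per_minute / 60.0


print(queries_per_second(15000))           # 250.0
print(seconds_per_query(15000))            # 0.004
print(round(seconds_per_query(1600), 4))   # 0.0375
print(round(seconds_per_query(1700), 4))   # 0.0353
```

Under that assumption, 15000 transactions per minute corresponds to 250 queries per second at about 0.004 seconds each, and 1600 to 1700 transactions per minute corresponds to roughly 0.035 to 0.038 seconds each, which is consistent with the stated bounds of 0.005 seconds from RAM and 0.05 seconds from disk.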
Data is presented for 2000 (memory cached) and 25000 document collections respectively.
Both diagrams illustrate how Siets Server can speed up search queries when running in a so-called 'main-memory' database configuration. Contrary to what common logic suggests, queries with more search terms run faster. Many business applications can benefit from this Siets advantage if truly high-speed query support is needed for time-sensitive business or industrial applications.
Note that second custom system is quite limited and below minimal requirements of Siets Server thus we discourage use of such hardware. However even on this environment Siets performance is outstanding.
Please also note that, contrary to Siets Appliance, custom environments in this test work on Linux kernel version 2.4.
Test results of Siets Search Appliance indexing speed on document collections of different number of documents and different sizes (total loading and indexing time in hours:minutes). Indexing performance test methodology:
Note that indexing performance can be significantly improved (2-4 times) if testing would be done on an enterprise level server with double processors and SCSI RAID multi disk array. Also note that very large data sets can be split among multiple Siets Server hardware nodes and reduction of total indexing time is proportional to the number of cluster nodes.
In this section performance tests were performed on a single server and compared to 3-server cluster configuration.
Test equipment for a single server (used for the reference data):
Test equipment for a 3-server cluster node:
In all tests response times for search transactions in seconds are measured on server side, that is, excluding transport time of the result set over network.
A database of 2.1 million different full text newspaper articles in three languages was used as test content, in an environment as close as possible to real life applications.
One of the most competitive advantages of the Siets search engine is that search speeds are almost the same for relevance-based searches and for rate-based searches. This enables a greater variety of end-user applications where search results should be sorted differently depending on the application business logic needs.
Response time, searching in cluster by relevance
As can be seen from the test results and as proven in practice, cluster configurations can be highly scalable (even to hundreds and thousands of nodes) and maintain almost the same very high search performance, which basically does not depend on the total size of the data.
Response time, searching in cluster by rate
Using Siets in cluster configuration gives benefits for both methods of search and maintains practically the same performance levels. These two performance goals can not be met simultaneously by most of other search engine products on the market.
This capability of Siets system has been achieved through optimization of inverted index structure and smart algorithms effectively using PC memory to cache disk data.
Siets Server uses advanced optimizations where large portions of the index are being kept into main computer memory.
If RAM memory is large enough, Siets Server can operate as in-memory database, almost completely caching all data in RAM memory.
Siets Server operating as in-memory database can do more than 250 queries per second on a single hardware server.
All Siets Server control, indexing and query tasks are effectively separated in multi-threaded architecture.
All transactions are threaded as internal operating system processes. In this way multiple searches can be performed at the same time running as separate processes.
All indexing and update tasks are performed in background with lower priority than search queries avoiding slowdown of search queries.
Siets Server software does not require special optimization for multi-processor hardware as it is supported by generic multi-threading on the engine.
Siets API is based on the simple XML 1.0 standard and the exchange of XML messages over the common http protocol. This makes Siets Server programming completely open to any application and programming language. Development for Siets Server is even simpler than for Web services. Siets API does not use any document type definition schemes of the kind that .NET requires.
Your corporate knowledge and skills can be well used to develop new applications in your favorite in house programming language. Your legacy applications can be improved by adding search functionality matching world class Internet search engines in speed and quality of results.
No proprietary client software needed - just follow Siets API documentation and use your existing tools.
Siets Server API protocol is network and firewall friendly. It uses standard Internet 'http' protocol for messaging between Siets Server and client application.
The message stream over 'http' can be verified and protected against malware through application level firewalls and malware filtering proxies.
Siets engine's built-in numeric search functionality supports additional functionality often used by many e-commerce and mapping applications: sorting of results by geospatial coordinates.
This feature can greatly speed up and at the same time also greatly simplify development of many applications for the location search.
For example, GPS coordinates can be feed to the Siets Server as longitude and latitude based distance and Siets Server will return all matching results sorted according to the shortest distance from the chosen center of reference point.
For example, in Siets Server one can nearly instantly query for all coffee shops in Nevada 1,3,5 or 25 miles around, sorted by closest distance from your GPS navigation enabled car. Siets Server returns results in less than 0.2 second time from database containing millions of records running on a standard PC server hardware.
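To make the idea of "sorted by closest distance within a radius" concrete, here is a minimal client-side sketch using the haversine great-circle formula. It only illustrates the kind of result the feature produces; it is not how the Siets engine computes distances internally, and the record layout and sample coordinates are invented for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_within(records, center, radius_miles):
    """Return (distance, record) pairs inside the radius, closest first."""
    hits = []
    for rec in records:
        d = haversine_miles(center[0], center[1], rec["lat"], rec["lon"])
        if d <= radius_miles:
            hits.append((d, rec))
    hits.sort(key=lambda pair: pair[0])
    return hits


# Made-up sample data: three coffee shops north of a car in Nevada.
shops = [
    {"name": "Shop A", "lat": 36.1700, "lon": -115.1400},
    {"name": "Shop B", "lat": 36.2000, "lon": -115.1400},
    {"name": "Shop C", "lat": 36.6000, "lon": -115.1400},
]
car = (36.1699, -115.1398)
for distance, shop in nearest_within(shops, car, 5):
    print(f"{shop['name']}: {distance:.2f} miles")
```

A linear scan like this is fine for a handful of records; over millions of records the work has to be done against an index on the server, which is the point of having the feature in the engine.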
Many legacy search tools do not support the real-time full text index updates needed for many business applications. Developers then have to build a complex integration between their database, typically an SQL database, and an external search tool, and program index consistency checks themselves.
Siets Server supports real-time index updates for all types of indexes: full text inverted indexes, columnar data and numeric indexes, and meta-structure XML indexes. This makes it possible to use Siets Server as a searchable OLTP database for XML documents.
One can add, modify or delete any XML document in a Siets Server database in real time and from any location over the Internet, using TCP/IP networking and the Siets API protocol. All per-document updates are available for full text search immediately after the data store modification command.
For large batch data updates Siets Server performs automatic background indexing and reports index status to the client application through the API command 'status', so that one can always verify whether bulk loading and indexing have finished properly.
Siets Server checks index reliability and automatically restores index integrity if unexpected shutdowns or equipment failures occur.
Internally the data store and search engine software maintains tables of checksums for all major control data structures.
The engine also keeps a backup log of all updates.
Finally, system administrators can use the 'recover' command as a last resort if all other methods fail to restore index integrity, e.g., after sudden hardware failure or unplanned loss of power without time for a proper shutdown.
This makes it possible to complete most of the last updates correctly after an unscheduled hardware crash: upon restart, Siets Server immediately tries to fix any index inconsistencies it detects.
For mission-critical business applications, Siets Server transaction control daemons are protected by a filter that stops denial-of-service attacks.
Integrated solutions usually process incoming queries in a totally unprotected way and are prone to overload in the uncontrolled Internet environment.
Siets Server, being an all-in-one platform for data storage, search and reliable processing, was designed to be protected against this common Internet risk.
Siets Server can be configured to run only a specified number of parallel queries. When the volume of transactions grows too large in a very short period, all further search transactions are queued until the workload evens out.
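The queueing behaviour described above can be sketched with a bounded worker pool. This is a generic illustration of capping parallel work, written with Python threads rather than operating system processes, and it is not the Siets implementation. The constant MAX_PARALLEL_QUERIES and the run_query function are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_QUERIES = 2  # assumed setting; the real option name is not given here

active = 0
peak = 0
lock = threading.Lock()


def run_query(query):
    """Stand-in for a search; records how many queries run at the same time."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate work
    with lock:
        active -= 1
    return f"results for {query!r}"


# The pool runs at most MAX_PARALLEL_QUERIES tasks at once;
# the remaining submissions wait in its internal queue.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL_QUERIES) as pool:
    results = list(pool.map(run_query, [f"query {i}" for i in range(8)]))

print(len(results), "queries served, peak concurrency:", peak)
```

All eight queries are answered, but never more than the configured number at once, so a burst of traffic lengthens the queue instead of exhausting the server.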
This feature is indispensable for many Internet search applications, where a sudden peak of activity can overload your servers within a few minutes and make them unavailable or even crash them.
Siets Server API supports context-sensitive triggers, which can be activated on incoming or recent data updates by user monitoring applications, by scheduled software agents and by reporting tools doing periodic checks on data.
Application developers can easily create triggers that filter Siets database content by specific keywords, phrases or Boolean expressions in the standard query syntax of Siets API.
Each new trigger has its own unique ID in the Siets database. Monitoring and software agent applications can periodically examine any XML document against selected or all filters established for a specific Siets storage.
All context matches found are returned as filter IDs to the user monitoring application.
When a filter matches, the Siets Server engine can execute a predefined server-side script for event logging or messaging.
Using the Siets alerting feature, a new set of applications can be developed, such as subscription services in which software agents periodically check recent document updates and send notification messages if the updates contain words or phrases that match filters.
Developers can add, modify and delete triggers and examine any document against established triggers in the new API set. This gives them great flexibility to run monitoring as frequently as necessary, or to check for context triggers only on new or recent updates.
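The add, delete and examine cycle can be pictured with a small in-memory model. The sketch below assumes the simplest possible filter, a set of keywords that must all occur in the document, and indexes filters by term so that one pass over the document finds every candidate. The class and method names are hypothetical, and real Siets filters use the Siets query syntax rather than this simplified matching.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class FilterStore:
    """Keyword filters indexed by term (hypothetical model, not the Siets API)."""

    def __init__(self):
        self._next_id = 1
        self._terms = {}                  # filter ID -> set of required terms
        self._by_term = defaultdict(set)  # term -> IDs of filters that use it

    def add(self, keywords):
        """Register a filter and return its unique ID."""
        filter_id = self._next_id
        self._next_id += 1
        terms = set(tokenize(keywords))
        self._terms[filter_id] = terms
        for term in terms:
            self._by_term[term].add(filter_id)
        return filter_id

    def delete(self, filter_id):
        for term in self._terms.pop(filter_id):
            self._by_term[term].discard(filter_id)

    def examine(self, text):
        """Return sorted IDs of filters whose terms all occur in the text."""
        words = set(tokenize(text))
        candidates = set()
        for word in words:
            candidates |= self._by_term.get(word, set())
        return sorted(fid for fid in candidates if self._terms[fid] <= words)


filters = FilterStore()
press_id = filters.add("press release")
docs_id = filters.add("technical documentation")
matches = filters.examine("New press release: the product documentation is online.")
print(matches)
```

Only the "press release" filter matches here, because the sample text does not contain the word "technical". The monitoring application receives the matching filter IDs and decides what to do with them.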
The advantage of processing context filters on the Siets Server engine is dramatically better performance. Siets context filters are processed in real time using the engine's generic full text index data. This yields roughly a 10 to 100 times performance increase compared with context filtering done on an application server that accesses a database with a separate SQL transaction for every content filter check.
This makes it possible to examine any document against tens of thousands of filters in sub-second time.
This performance improvement becomes very important in large-scale enterprise applications or Internet applications with tens of thousands of users who have different individual needs for data monitoring.
Alert-triggered events can be emails, scripts writing alert messages into database files, SMS tools or any other messaging system that sends an alert signal to other applications. This feature helps many businesses subscribe to context agents and stay informed when content-changing update events of interest happen on a Siets engine.
For example, using context alerting, users can subscribe to a news agency's press releases or to technical documentation alerts by specifying keywords or phrases for full text search matches.
SIETS alerting functionality is described in section: Siets API Alerting Functions
Generic mirroring (replication) is supported by Siets Server through its software architecture design. Siets customers do not need to invest in high-end solutions just for this functionality. It's built in.
Replication enables Siets customers to implement highly redundant operational environments where multiple Siets servers run in parallel on different hardware servers, load-balancing the query workload among multiple copies of the entire database.
Every Siets Server deployment listed in the mirroring list of the configuration file will be sent the same data update by the mirror server that receives the update query.
In this way developers do not need to complicate their application logic: Siets Server will automatically mirror updates to all servers configured as 'mirrors'.
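The update flow can be modelled in a few lines: whichever server receives an update applies it locally and forwards it once to every server in its mirror list. The sketch below is an in-memory simulation under that assumption. The MirrorNode class is hypothetical, and the real servers exchange updates over the network according to their configuration files.

```python
class MirrorNode:
    """In-memory stand-in for one server in a mirror group (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.documents = {}
        self.mirrors = []  # other nodes listed as mirrors in this node's configuration

    def update(self, doc_id, xml_text, forwarded=False):
        """Apply an update locally, then forward it once to every configured mirror."""
        self.documents[doc_id] = xml_text
        if not forwarded:  # forwarded updates are not re-sent, which avoids loops
            for peer in self.mirrors:
                peer.update(doc_id, xml_text, forwarded=True)


nodes = [MirrorNode(f"server-{i}") for i in range(3)]
for node in nodes:
    node.mirrors = [peer for peer in nodes if peer is not node]

# The client sends the update to any one server; the others receive it automatically.
nodes[1].update("doc-42", "<document><title>Hello</title></document>")
print(all(n.documents == nodes[1].documents for n in nodes))
```

The client code contains a single update call; the fan-out to the other copies happens on the server side.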
Read more about mirroring here: Siets Cluster Mirroring
Siets Server does not limit the number of documents.
If a single server's RAM and disk storage are too small to accommodate all customer XML documents, Siets Server capacity can be doubled by installing a second hardware server of the same configuration and splitting the data corpus in half.
Each half of the database is then serviced by a separate hardware server, with both servers acting as one large "virtual" database storage.
The process of splitting data into more and more parts (shards) can be repeated periodically as more hardware servers are added.
This would effectively scale out even a giant database, for instance to build an Internet search index with billions of searchable documents.
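One common way to picture such a "virtual" database is a router that assigns every document to exactly one shard and merges search hits from all of them. The sketch below assumes hash-based placement by document ID, which is an assumption of this example; the text above only says that the corpus is split, not how. ShardedStore is a hypothetical name and plain dictionaries stand in for the servers.

```python
import zlib


class ShardedStore:
    """Routes each document to one shard and merges search hits from all shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, doc_id):
        # Stable hash, so the same ID always maps to the same shard.
        index = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, doc_id, text):
        self._shard_for(doc_id)[doc_id] = text

    def search(self, word):
        """Ask every shard, then combine the partial results into one list."""
        hits = []
        for shard in self.shards:
            hits.extend(
                doc_id for doc_id, text in shard.items()
                if word in text.lower().split()
            )
        return sorted(hits)


sharded = ShardedStore(shard_count=2)
for i in range(10):
    text = "Little John went to London" if i % 2 == 0 else "coffee shops in Nevada"
    sharded.put(f"doc-{i}", text)

print(sharded.search("london"))
print([len(shard) for shard in sharded.shards])
```

Each document lives on exactly one shard, while a search fans out to all shards and returns a single combined list, so the application sees one database regardless of how many servers hold the data.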
Siets Server can be configured to run multiple searchable storages (data stores with their own XML document collections) on one hardware server.
Each storage runs as its own Siets Server instance in memory, in an isolated OS daemon process, and uses its own RAM and its own local disk storage folder to process and store data.
Customers can start and stop individual Siets Server storages manually through the Siets Enterprise Manager GUI, giving administrators full and secure control over database availability and security at any time.
There is no limit on the total number of Siets storages per server other than the available local RAM and disk storage space.
With Siets you can create as many storages per server as you like. This is convenient when you need a testing platform or want to prove a new application concept on a copy of your data.
Typically customers can run some 15-20 storages on a single hardware server without running into local RAM or disk storage limits.
Siets Server does not limit the number of users per Siets Server.
Typically Siets Server is used in corporate environments to service only the customer's internal application software, which takes care of all end-user authentication and authorization. In practice that means at most a few, or a few tens of, API users per Siets Server in a typical corporate deployment, while the applications themselves service thousands or even millions of end-user web queries.
Each Siets API client application user can be restricted to certain Siets Server storages only, or to certain Siets API commands only, for internal safety partitioning among developers or production system network administrators.
One can create as many internal SIETS API users as necessary for developer or testing groups to work with a single Siets Server deployment per organization.
For performance reasons, there is no need to encrypt internal Siets API HTTP messaging between Siets Server and client application software unless the business requires it for extra security. Encryption in a well-secured environment would only slow down communication between Siets Server and the customer application software accessing it through Siets API.
For the same reason, Siets Server does not maintain a more complex and slower user-session based authentication system; it provides basic password-based authentication for each Siets API call.
It is assumed that the client application works as middleware in a 3-tier computing system and does not allow external users to access Siets Server directly from outside the corporate firewall.
It is recommended to operate Siets Server only behind corporate firewalls, and even without public IP address access, so that no one can get direct unauthorized access to Siets Server hardware.
This allows for significant savings in any growing business that adds new users and new applications every day.
For more customer benefits please visit section: Solutions