r2 - 29 Apr 2004 - 00:38:17 - LeoGalambosYou are here: TWiki >  Egothor Web  >  CoreModel > EngineModel

Core Model of the Engine

The core data structure of EGOTHOR is a barrel that is denoted as B for the following thoughts. A barrel is an autonomous full-text index that is built for a collection of B.size documents. The B.size value is termed the construction size of a barrel. The barrel structure is static (in its basic implementation), that is, it cannot modify its inner structure to reflect the addition, removal, or change of a document in the collection. Nevertheless, an exception exists: the barrel can simulate the removal of documents by denoting the documents as removed. This can be simply achieved by a bit array that stores 1-bit at position k, if and only if the k-th document was already removed. The number of removed (denoted) documents of barrel B is B.deleted. It is obvious that 0<=B.deleted<=B.size.

The barrel offers two main operations: search(Query) and removeDoc(docID), where Query is a user's query and docID is the unique identification of a document that will be denoted as removed.

Moreover, the barrel can construct a special object that is able to read its inner structure without reading values related to removed documents. The object is constructed by the open() method of the respective barrel, and is known as a barrel reader BarrelReader.

Every barrel has its respective barrel writer class BarrelWriter that can make the barrel reader persistent to the form of a barrel structure. This functionality is guaranteed by the append(reader) method, where reader is the input barrel reader object. The name of the method is chosen correctly, as sometimes this method may have to be called several times. For instance, a barrel that is constructed in inner memory has not significant problems implementing it this way (Earlier it was stated that the basic barrel structure cannot modify its inner structure, but a memory barrel is a higher level implementation). On the other hand, a barrel writer that creates barrel structures on a CD ROM disk may not offer it. This last behaviour is supposed to be default.

Having barrel B, a copy can easily be created when its barrel reader is passed to a barrel writer. This action can be written as BarrelWriter.append(B.open()) (static call is made instead of proper allocation of a barrel writer object and calling its append method, in order to simplify the text), with the final product being a new barrel B' that is a clone of the barrel B. A clone is an independent copy that can even use a different data format. This clone schema can be successfully used in many situations, for instance, when moving a barrel from one network node to another, to create a new copy in a new format, or on another device, etc.

Obvious question is, how a barrel of (construction) size k can appear in the system? The core algorithm has a very simple background that is well-known in many other areas of database technology. A barrel of size 1 can be easily built when constructing the respective inverted lists for a document. When there are enough barrels, they can be merged to a larger barrel, and then be removed from the system. In this way larger and larger barrels can be constructed.

Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r2 < r1 | More topic actions
 
Powered by TWiki
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback