Core Model of the Engine
The core data structure of EGOTHOR is a barrel that is denoted as
B for the following thoughts. A barrel is
an autonomous full-text index that is built for a collection of
B.size documents. The
B.size value is termed the construction size
of a barrel. The barrel structure is static (in its basic
implementation), that is, it cannot modify its inner structure to
reflect the addition, removal, or change of a document in the
collection. Nevertheless, an exception exists: the barrel can
simulate
the removal of documents by denoting the documents as
removed. This can be simply achieved by a bit array that stores 1-bit
at position
k, if and only if the k-th document was already removed.
The number of removed (denoted) documents of
barrel
B is
B.deleted. It is obvious that
0<=B.deleted<=B.size.
The barrel offers two main operations:
search(Query) and
removeDoc(docID), where
Query is a user's query and
docID is the unique identification of a document that will be
denoted as removed.
Moreover, the barrel can construct a special object that is able to
read its inner structure without reading values related to removed
documents. The object is constructed by the
open() method of the
respective barrel, and is known as a barrel reader
BarrelReader.
Every barrel has its respective barrel writer class
BarrelWriter that can make the barrel reader persistent to the
form of a barrel structure. This functionality is guaranteed by the
append(reader) method, where
reader is the input barrel
reader object. The name of the method is chosen correctly, as
sometimes this method may have to be called several times. For
instance, a barrel that is constructed in inner memory has not
significant problems implementing it this way (Earlier it was
stated that the basic barrel structure cannot modify its inner
structure, but a memory barrel is a higher level implementation). On
the other hand, a barrel writer that creates barrel structures on a CD ROM
disk may not offer it. This last behaviour is supposed to be default.
Having barrel
B, a copy can easily be created when its barrel reader
is passed to a barrel writer. This action can be written as
BarrelWriter.append(B.open()) (static call is
made instead of proper allocation of a barrel writer object and
calling its
append method, in order to simplify the text),
with the final product being a new barrel
B' that is a clone of
the barrel
B. A clone is an independent copy that can even use a
different data format. This clone schema can be successfully used
in many situations, for instance, when moving a barrel from one
network node to another, to create a new copy in a new format, or
on another device, etc.
Obvious question is,
how a barrel of (construction) size
k can appear in the system?
The core algorithm has a very simple
background that is well-known in many other areas of database
technology. A barrel of size 1 can be easily built when constructing
the respective inverted lists for a document. When there are enough
barrels, they can be merged to a larger barrel, and then be removed
from the system. In this way larger and larger barrels can be
constructed.