Historically, approaches to the design and construction of a DW follow two main directions, initiated by the founders of the field, Ralph Kimball [1] and Bill Inmon [2].
Kimball's approach designs the DW architecture bottom-up: data marts are built first. An ETL tool then extracts data from multiple sources and loads it into a staging area of a relational database. Next, the data is loaded into the DDW, which is denormalized by design. The data is partitioned into fact tables, which hold numeric transactional data, and dimension tables, which hold the reference information that describes the facts. Data is integrated by aligning (conforming) the dimensions, which ensures that a given data element is defined in the same way for all facts [3].
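To make the dimensional layout concrete, the following minimal sketch builds a tiny star schema in SQLite. The table and column names (dim_product, dim_date, fact_sales, fact_returns) and the sample rows are hypothetical, and the ETL and staging steps are omitted. The two fact tables share one conformed product dimension, which is what allows their measures to be combined by category.

```python
import sqlite3

def build_star_schema() -> sqlite3.Connection:
    """Create a tiny star schema (hypothetical tables) in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: descriptive reference data that gives facts context.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            category    TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,  -- e.g. 20240115
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        -- Fact tables: numeric measures keyed by dimension keys. Both share the
        -- conformed dim_product, so "product" means the same thing in each.
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            quantity    INTEGER NOT NULL,
            amount      REAL    NOT NULL
        );
        CREATE TABLE fact_returns (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            quantity    INTEGER NOT NULL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Computers"), (2, "Mouse", "Accessories")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240210, 2024, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 20240115, 2, 2000.0), (2, 20240115, 10, 150.0),
                      (1, 20240210, 1, 1000.0)])
    conn.executemany("INSERT INTO fact_returns VALUES (?, ?, ?)",
                     [(2, 20240210, 3)])
    conn.commit()
    return conn

def net_units_by_category(conn: sqlite3.Connection) -> list:
    """Drill across both fact tables via the shared (conformed) product dimension."""
    return conn.execute("""
        WITH sold AS (
            SELECT p.category, SUM(f.quantity) AS qty
            FROM fact_sales f JOIN dim_product p USING (product_key)
            GROUP BY p.category
        ), returned AS (
            SELECT p.category, SUM(f.quantity) AS qty
            FROM fact_returns f JOIN dim_product p USING (product_key)
            GROUP BY p.category
        )
        SELECT sold.category, sold.qty - COALESCE(returned.qty, 0) AS net_units
        FROM sold LEFT JOIN returned ON returned.category = sold.category
        ORDER BY sold.category
    """).fetchall()
```

Because both fact tables point at the same dimension rows, each fact table can be aggregated separately and the results joined on a dimension attribute; without a conformed dimension, "category" could mean different things in sales and returns.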
Inmon's concept of DW development starts from defining the subject areas and the objects the enterprise works with, such as customers, products, and suppliers. He defines the DW as "a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." A logical model is created for each key entity, covering all attributes of that entity together with its relationships and dependencies. The advantage of this top-down approach is that the DW serves as a single source of truth for the entire domain, with all data integrated [3].
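For contrast with the star schema, here is a minimal sketch of the normalized, entity-oriented style, assuming SQLite and hypothetical tables for the subject areas named above (supplier, product, customer) plus a purchase relationship. Each fact is stored exactly once, so renaming a supplier is a single-row update that every query sees.

```python
import sqlite3

def build_normalized_model() -> sqlite3.Connection:
    """Create hypothetical normalized entity tables in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce references between entities
    conn.executescript("""
        CREATE TABLE supplier (
            supplier_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE product (
            product_id  INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
        );
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        -- Relationship between entities; no supplier or product names repeated here.
        CREATE TABLE purchase (
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            product_id   INTEGER NOT NULL REFERENCES product(product_id),
            purchased_on TEXT    NOT NULL,
            PRIMARY KEY (customer_id, product_id, purchased_on)
        );
    """)
    conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Bolt", 1), (2, "Nut", 1)])
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)",
                     [(1, 1, "2024-01-15"), (1, 2, "2024-01-15")])
    conn.commit()
    return conn

def customer_purchases(conn: sqlite3.Connection, customer_id: int) -> list:
    """Join across entities to list (product, supplier) pairs bought by a customer."""
    return conn.execute("""
        SELECT p.name, s.name
        FROM purchase pu
        JOIN product  p ON p.product_id  = pu.product_id
        JOIN supplier s ON s.supplier_id = p.supplier_id
        WHERE pu.customer_id = ?
        ORDER BY p.name
    """, (customer_id,)).fetchall()

def rename_supplier(conn: sqlite3.Connection, supplier_id: int, new_name: str) -> None:
    """The supplier name lives in one row only, so one UPDATE is enough."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE supplier SET name = ? WHERE supplier_id = ?",
                     (new_name, supplier_id))
```

The price of avoiding redundancy is that queries need more joins than in the star schema; the benefit is that updates touch a single place.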
Bill Inmon and his company have developed a technology known as textual disambiguation, which applies context to raw text and reformats it into a standard database format. Textual disambiguation is performed by running custom textual ETL/ELT processes and can be applied wherever raw text exists, such as documents, Hadoop, or email. The concept of a semantic layer, which represents corporate data in common terms, is also introduced: the semantic layer presents complex data in well-defined terms and provides a unified, consolidated view of the data of the entire organization [4] [5].
A DW development methodology that integrates data from different sources by combining DW and ontology technologies is considered in [6]. Each source can have its own local ontology, which refers to the global ontology and may specialize or extend it.
The combination of DW, OLAP, and SW technologies is considered in [7], where SW technologies are applied to data modeling and representation, including semantic annotation and semantically aware ETL processes.
In [8], the authors propose an original methodology for building a text data repository and performing OLAP operations on text data. The text documents are first categorized into a hierarchy of concepts based on the contextual similarity between them.
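The roll-up operation over such a concept hierarchy can be sketched as follows. This does not reproduce the method of [8]: it assumes the documents have already been categorized, uses a hypothetical hierarchy and a year dimension, and only shows how document counts are aggregated from specific concepts to more general ones.

```python
from collections import Counter

# Hypothetical concept hierarchy: child -> parent (None marks the root).
PARENT = {
    "Science": None,
    "Physics": "Science",
    "Biology": "Science",
    "Optics": "Physics",
    "Genetics": "Biology",
}

def ancestors(concept, parent=PARENT):
    """Yield the concept itself and every ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = parent[concept]

def roll_up(docs, parent=PARENT):
    """Aggregate document counts into a (concept, year) cube.

    `docs` maps a document id to its (assigned concept, year). Each document is
    counted under its own concept and under every ancestor, so the cell for a
    general concept includes all of its sub-concepts.
    """
    cube = Counter()
    for concept, year in docs.values():
        for c in ancestors(concept, parent):
            cube[(c, year)] += 1
    return cube
```

With documents assigned to Optics, Genetics, and Physics, the cell ("Science", year) counts every document of that year, while ("Physics", year) counts only those filed under Physics or Optics.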
The KB project for IA follows the path proposed by Bill Inmon: the normalized form of the entity structure avoids data redundancy, and a single logical model simplifies data updates and query formulation.
The goal of the project is to investigate approaches and methods for creating a KB for IA based on an SM that is mapped onto a relational data model.
The object of research is knowledge-based IA, in the sense of the definition given in [9].
The subject of research is KR in IA in the form of an SM, understood as a mathematical model of a conceptual structure consisting of a set of concepts and the cognitive connections between them. The SM is represented by a generalized graph in which concepts and entities correspond to nodes and the connections between concepts correspond to arcs [10].
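A minimal in-memory sketch of such a graph, assuming the generalized graph can be read as a labeled directed multigraph whose nodes carry a kind ("concept" or "entity") and whose arcs carry a connection type. The class names and relation labels are hypothetical and are not the mathematical model that the project itself formulates.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Arc:
    """A directed, labeled connection between two nodes of the SM."""
    source: str
    relation: str  # connection type, e.g. "is_a" (hypothetical label)
    target: str

@dataclass
class SemanticModel:
    """SM as a labeled directed multigraph (an assumed reading of 'generalized graph')."""
    nodes: dict[str, str] = field(default_factory=dict)  # node name -> "concept" or "entity"
    arcs: set[Arc] = field(default_factory=set)

    def add_node(self, name: str, kind: str = "concept") -> None:
        self.nodes[name] = kind

    def connect(self, source: str, relation: str, target: str) -> None:
        # An arc may only join nodes that already exist in the model.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name!r}")
        self.arcs.add(Arc(source, relation, target))

    def neighbours(self, name: str, relation: Optional[str] = None) -> list[str]:
        """Targets of arcs leaving `name`, optionally restricted to one relation."""
        return sorted(a.target for a in self.arcs
                      if a.source == name and (relation is None or a.relation == relation))
```

Keeping the arc label as data, rather than hard-coding one relation per structure, lets new kinds of cognitive connections be added without changing the graph's shape.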
Achieving the project goal involves the following tasks:
- formulation of the mathematical model of the SM structure;
- creation of a relational data model based on the mathematical model;
- implementation of the relational model in a specific database management system;
- development of a system of queries to the constructed SM using logic programming;
- execution of a test example on the resulting system.
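The relational-model, implementation, and query tasks can be illustrated together with a minimal sketch, assuming SQLite as the target database and a hypothetical two-table mapping (node and arc) of the SM graph. The query part uses Datalog-style rules evaluated naively in Python as a stand-in for a logic programming system; the project's actual rules and language may differ.

```python
import sqlite3

# Hypothetical relational mapping of the SM graph: one table for nodes,
# one for labeled arcs between them.
SCHEMA = """
CREATE TABLE node (
    node_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    kind    TEXT NOT NULL DEFAULT 'concept'   -- 'concept' or 'entity'
);
CREATE TABLE arc (
    source_id INTEGER NOT NULL REFERENCES node(node_id),
    relation  TEXT    NOT NULL,
    target_id INTEGER NOT NULL REFERENCES node(node_id),
    PRIMARY KEY (source_id, relation, target_id)
);
"""

def add_arc(conn, source, relation, target):
    """Store an arc, creating either endpoint node if it is not there yet."""
    for name in (source, target):
        conn.execute("INSERT OR IGNORE INTO node(name) VALUES (?)", (name,))
    conn.execute(
        """INSERT OR IGNORE INTO arc(source_id, relation, target_id)
           SELECT s.node_id, ?, t.node_id FROM node s, node t
           WHERE s.name = ? AND t.name = ?""",
        (relation, source, target),
    )

def facts(conn, relation):
    """Read all arcs with a given label as a set of (source, target) name pairs."""
    rows = conn.execute(
        """SELECT s.name, t.name FROM arc a
           JOIN node s ON s.node_id = a.source_id
           JOIN node t ON t.node_id = a.target_id
           WHERE a.relation = ?""",
        (relation,),
    )
    return set(rows)

def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog-style rules
         reach(X, Y) :- edge(X, Y).
         reach(X, Z) :- edge(X, Y), reach(Y, Z).
    repeated until no new facts are derived (a fixpoint)."""
    reach = set(edges)
    while True:
        derived = {(x, z) for (x, y) in edges for (y2, z) in reach if y == y2}
        new_facts = derived - reach
        if not new_facts:
            return reach
        reach |= new_facts

def superclasses(conn, name):
    """Query: every concept reachable from `name` along is_a arcs."""
    closure = transitive_closure(facts(conn, "is_a"))
    return sorted(target for (source, target) in closure if source == name)
```

For example, after running executescript(SCHEMA) on an in-memory connection and adding the arcs Dog is_a Mammal, Mammal is_a Animal, and Cat is_a Mammal, superclasses(conn, "Dog") returns ["Animal", "Mammal"]; the second answer is inferred by the recursive rule rather than stored in the arc table.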