Data Integration Under Integrity Constraints

Abstract

Background: This study is founded on the understanding that one of the main objectives of a database is to guarantee consistency, which is why concurrency, security, reliability and integrity control tools are used. Integrity controls are used to prevent any form of semantic error caused by users through carelessness or lack of adequate knowledge. Additionally, many organizations use heterogeneous data sources that must be semantically integrated.

Objectives: This paper explains that, to deal with integrity constraints over the global schema and the source schema, queries can be reformulated and sent to the data sources to derive the most relevant answers. The paper seeks to determine how integrity constraints can be used to derive more information from incomplete sources and from databases with incomplete information. It also seeks to determine how inconsistency caused by contradicting data at the sources affects the whole system and how it can be solved.

Results: A novel model for solving problems caused by data integration constraints was developed. The researcher argues that data mapping can solve both source and global schema problems by building a retrieved global database that satisfies all the foreign key constraints in the global schema.

Conclusion: The paper concludes that queries to a data integration system I = (G, S, M) can specify precisely which tuples to extract from the database. The paper also concludes that work on data integration under constraints should be directed at search algorithms that can produce more relevant and optimized query returns.

1. Data Integration Under Integrity Constraints

1.1 Introduction

One of the key objectives of data integration under integrity constraints is efficiency. However, many organizations have not been able to realize this because they do not know how to optimize the global schema. Data integration systems are known to provide users with uniform access to autonomous data sources, but their success depends on the approach the organization pursues when designing the data integration system. For example, one may decide to use the global-centric approach or the local-centric approach, but the local-centric approach is not preferred because it is not efficient and does not provide reliable search queries, as it uses incomplete information. It is important to note that the main constraints in data integration include foreign key constraints and key constraints, alongside other integrity constraints (Lembo, Lenzerini, & Rosati, 2002).

1.2 Why Data Integration

As the economy becomes increasingly data driven, most people rely on data to make decisions, which means that the data must be adequate to support the best decision. Over time, it has also become clear that data information systems themselves rely on large volumes of data, while software companies are developing ever more complex data management tools. This paper posits that data integration is becoming a challenge because the demand for data is increasing (Fernandez et al., 1999, pp. 614–619).

Data sources are increasing, the types of data are increasing, and so are the volumes, which means that organizations now have to contend with a variety of sources, types and volumes. The main idea behind data integration is to create a robust system that helps organizations and individuals access data, especially data with the highest level of integrity. Most organizations have adopted data federation tools, but these tools have inherent limitations. Additionally, most organizations still consider themselves big data users but have not invested in adequate data integration tools that improve integrity. While the data federation tools mesh up different databases, the drift computing tools help manage the localized data (Johnson & Klug, 1984, pp. 167–189).

1.3 Integrity Constraints

There are various commercial data integration tools, and the most common include Oracle 10g Information Integration, Microsoft SQL Server 2005 and IBM DB2 Information Integrator. These can be combined to improve efficiency and data integrity. There are also many forms of integrity constraints in any data schema, and these constraints limit the ability of an organization to get the most from its data unless a number of modules are combined to handle each form of integrity constraint (Zhou et al., 1995, pp. 4–18).

While data integration is geared at combining different data types from different sources, it is important to be able to combine the real data stored at the sources with the global data onto which they can be mapped. However, such systems raise further requirements. For example, data integration requires that the mapping between the global schema and the data sources be specified. It also requires that the queries expressed on the global schema be processed.

2. Mapping The Global Schema Over The Source Data

The most common model is the GAV (Global-As-View) model, and it is the most preferred mapping approach. Designing a data integration system can be quite challenging, as it requires choosing a method suitable for computing the desired answers to queries. Grant, Gryz, Minker, & Raschid (1997, pp. 444–453) argue that all queries must be posed in terms of the global schema, in such a way that they can be reformulated and sent to the data sources to derive the most relevant answers.

2.1 Assumption In Data Integration Mapping

The way in which mapping assertions are interpreted assumes particular importance in defining the semantics of the data integration system. According to the literature, three different assumptions can be made on the mapping assertions. Under the sound assumption, the data provided by the sources are interpreted as a subset of the global data. Conversely, the mapping is assumed complete when the source data provide a superset of the data of the global schema. The mapping is assumed exact when it is both sound and complete (Kowalski, Sadri, & Soper, 1987, pp. 61–69).
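The three assumptions can be illustrated with plain set comparisons. The following is a minimal sketch, assuming that both the tuples retrieved from the sources and the tuples of the global relation are available as Python collections; the function name classify_mapping is hypothetical and not taken from the cited literature.

```python
def classify_mapping(source_tuples, global_tuples):
    """Classify a mapping as sound, complete, exact, or neither.

    Hypothetical helper: compares the tuples provided by the sources
    with the tuples of the corresponding global relation.
    """
    source, glob = set(source_tuples), set(global_tuples)
    sound = source <= glob      # source data are a subset of the global data
    complete = source >= glob   # source data are a superset of the global data
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"
```

In practice the global data are virtual and usually unknown, so the assumption is declared by the designer rather than checked; the sketch only shows what each assumption means.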

Another important aspect in a data integration system is whether the system is able to materialize data retrieved from the sources (through the mappings). In the materialized approach, the system computes the extension of the structures in the global schema by replicating the data at the sources. Obviously, maintenance of replicated data against updates at the sources is a central aspect in this context. A possible way to deal with this problem is to recompute materialized data when the sources change, but it could be extremely expensive and impractical for dynamic scenarios.

Integrity constraints can be useful for extracting more information, especially when there are incomplete sources or databases with incomplete information. Nevertheless, integrity constraints can also cause inconsistency in the entire system when the data at the sources are contradictory. To manage these issues, it is important to develop a data integration system (Popa et al., 2002, pp. 598–609).

2.2 Data Integration Model

In the beginning, it was noted that data integration is faced with integrity constraints in the system's global schema. A data integration system is seen in terms of three main components: the global schema, the source schema and the mapping. This can be expressed in the form DI = GS + SS + M.

The global schema is similar to a relational schema and may include integrity constraints. Key constraints, general inclusion dependencies and foreign key constraints are the main constraints associated with the global schema. The global schema (G) is the reconciled, integrated and virtual view of all the sources, irrespective of which users query the integration system. It is composed of the relational schema and the constraints, which in this case include both the key constraints and the foreign key constraints. The source schema (S) represents all the sources that the integration system can access; they are wrapped in such a way that they can be viewed as relations. Finally, M is the mapping between the global schema and the source schema, and it is the only connection between the elements of the global schema and those of the source schema.

In this study, the mapping under consideration is the GAV mapping, which associates a query over the source schema with each relation of the global schema. In this case, the data are accessed only during query processing, which is the most common approach because information is requested on demand.
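The three components can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the relations student and enrolled, and the source relations s1 and s2 are all hypothetical, and each GAV view is modelled as a Python function that is evaluated on demand over the source database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, ...]
SourceDB = Dict[str, Set[Row]]
View = Callable[[SourceDB], Set[Row]]


@dataclass
class GlobalSchema:
    relations: Dict[str, int]                      # relation name -> arity
    keys: Dict[str, Tuple[int, ...]]               # relation name -> key positions
    foreign_keys: List[Tuple[str, int, str, int]]  # (child, child_pos, parent, parent_pos)


@dataclass
class IntegrationSystem:
    global_schema: GlobalSchema    # G
    source_schema: Dict[str, int]  # S: source relation name -> arity
    mapping: Dict[str, View]       # M (GAV): global relation -> query over S

    def extension(self, relation: str, source_db: SourceDB) -> Set[Row]:
        """Evaluate the GAV view of one global relation on demand."""
        return set(self.mapping[relation](source_db))


# Hypothetical example: two global relations, each defined as a view
# over a single wrapped source relation.
system = IntegrationSystem(
    global_schema=GlobalSchema(
        relations={"student": 2, "enrolled": 2},
        keys={"student": (0,), "enrolled": (0, 1)},
        foreign_keys=[("enrolled", 0, "student", 0)],
    ),
    source_schema={"s1": 2, "s2": 2},
    mapping={
        "student": lambda db: db["s1"],
        "enrolled": lambda db: db["s2"],
    },
)

source_db = {"s1": {("1", "ann")}, "s2": {("1", "db"), ("2", "ai")}}
students = system.extension("student", source_db)
```

Nothing is materialized here: the source data are read only when extension is called, which mirrors the on-demand access described above.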

2.3 Managing Integrity Constraints

To handle integrity constraints, it is important to recognize that most transaction companies are faced with federation constraints. Currently, most users want to express integrity constraints over the global schema, in terms of their domain of interest. It is also important to note that data from the various sources may fail to satisfy integrity constraints, because the sources are not under the control of the data integration system. Additionally, local data may be consistent on their own but become inconsistent when they are integrated, and in such cases, if the sources are changed, data are lost (Fagin, Kolaitis, & Miller, 2003, pp. 207–224).

One of the main advantages of integrating data under integrity constraints is that it helps in extracting additional information from incomplete sources, as is the case with databases with incomplete information. Additionally, when integrating data under integrity constraints, one is faced with inconsistency in the whole system, which means that one has to adjust the system in order to accommodate the integrity constraints. The contradicting data sources therefore make it important to ensure that

3. Solving The Problems Of Data Integration Under Integrity Constraints

3.1 Accommodating The Integrity Constraints

To accommodate the integrity constraints, especially when there are inconsistencies, it is important to use a system that can handle both the key and the foreign key constraints in the global schema. The GAV approach can be effective in defining all the mappings between the global schema and the source schema because it covers all the heterogeneous data sources, especially relational databases and legacy databases. In the recent past, web sources have also become another source of inconsistencies, which is why companies and individual query users prefer integrating data. In addition, new systems have been developed to deal with non-relational data sources by wrapping them before they are presented to the query processing subsystem. Wrapping the non-relational data sources enables easy incorporation of data queries and data cleaning, thereby resolving conflicts (Fernandez et al., 1998, pp. 414–425; Arenas, Bertossi, & Chomicki, 1999, pp. 68–79).

To fully realize the GAV mapping between the relational sources and the global schema, it is important to first determine the arbitrary queries that will be used, as this allows data cleaning to be incorporated into the queries. It is also important to note that, once conflicts are resolved during extraction from the data sources, the foreign key and key constraints will not cause any further violations, even through the generated tuples. The first step is therefore to delegate the handling of the foreign key and key constraints to the system, so that it can answer any query that depends on joins over attribute values that are not stored in the sources. The existence of these values is guaranteed by the foreign key constraints.

Additionally, the system deals with the foreign key constraints automatically. For example, the query-processing algorithm computes a set of answers to queries that can be considered complete with respect to the semantics of the data integration system. It is useful to distinguish the conceptually different phases, given that the system works mostly in optimized sub-phases that together lead to a complete computation (Li et al., 1998, pp. 564–566).

3.2 How The System Works

There are three sub-phases, carried out in the following order:

1. First, the query is expanded in order to take into account the foreign key and key constraints of the global schema.

2. Secondly, the expanded query is unfolded based on the definitions in the mapping, to obtain a query that can be expressed over the relational sources.

3. Thirdly, the expanded and unfolded query is executed over the relational sources, thereby producing the answers to the initial query.
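The three sub-phases (expansion, unfolding, execution) can be sketched for a deliberately restricted case. The sketch assumes that a query only projects one attribute of one global relation, that foreign keys are single-attribute, and that each global relation is mapped to exactly one source relation; the function names and the student/enrolled example are hypothetical, and general conjunctive queries need a richer rewriting than this.

```python
from typing import Dict, List, Set, Tuple

Query = Tuple[str, int]                 # (relation, position to project)
ForeignKey = Tuple[str, int, str, int]  # child[child_pos] is contained in parent[parent_pos]


def expand(query: Query, foreign_keys: List[ForeignKey]) -> List[Query]:
    """Phase 1: add every projection whose values the foreign keys
    guarantee to appear in the queried attribute (follows chains)."""
    expanded, todo = [query], [query]
    while todo:
        rel, pos = todo.pop()
        for child, child_pos, parent, parent_pos in foreign_keys:
            candidate = (child, child_pos)
            if (parent, parent_pos) == (rel, pos) and candidate not in expanded:
                expanded.append(candidate)
                todo.append(candidate)
    return expanded


def unfold(queries: List[Query], mapping: Dict[str, str]) -> List[Query]:
    """Phase 2: replace each global relation by its source relation."""
    return [(mapping[rel], pos) for rel, pos in queries]


def execute(queries: List[Query], source_db: Dict[str, Set[tuple]]) -> Set[str]:
    """Phase 3: evaluate the union of the projections over the sources."""
    return {row[pos] for rel, pos in queries for row in source_db[rel]}


def answer(query, foreign_keys, mapping, source_db):
    return execute(unfold(expand(query, foreign_keys), mapping), source_db)


# Hypothetical example: enrolled.code references student.code.
foreign_keys = [("enrolled", 0, "student", 0)]
mapping = {"student": "s1", "enrolled": "s2"}
source_db = {"s1": {("1", "ann")}, "s2": {("1", "db"), ("2", "ai")}}
codes = answer(("student", 0), foreign_keys, mapping, source_db)
```

Without the expansion step, only student code "1" would be returned; the foreign key guarantees that a student with code "2" exists even though no source stores that student tuple, so the expanded query returns both codes.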

3.3 Alternative Approach

Consider query answering in a data integration system I = (G, S, M), where M is the mapping between G and S. For each relation r of the global schema, let p(r) denote the query over the source schema that the mapping associates with r. The query p(r) must implement a suitable duplicate record elimination strategy, so that for no source database does it return a pair of duplicate tuples. This eliminates the possibility of duplicate returns. Duplicate record elimination and data cleaning are some of the fundamental objectives of a data integration system.

Additionally, let q be a query posed to I, and let D be a source database. To compute the answers to the original query q with respect to I and D, written q^{I,D}, the algorithm proceeds as follows:

For each relation r of the global schema, the relation r^D is computed by simply evaluating the query p(r) over the source database D. The relations obtained in this way form the retrieved global database ret(I, D). If the results of p(r) do not violate any of the key constraints, it is safe to assume that the retrieved global database satisfies all the key constraints in the global schema. If the retrieved global database also satisfies all the foreign key constraints in the global schema, then the integrity constraints are dealt with by simply evaluating q over ret(I, D), and the query returns the relevant answers. Otherwise, the retrieved global database can be used to build a database for I that satisfies the key constraints, by adding suitable tuples to the relations of the global schema so that the foreign key constraints are satisfied (Grant, Gryz, Minker, & Raschid, 1997, pp. 444–453).
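The procedure can be sketched as follows, under several simplifying assumptions: foreign keys are single-attribute, there are no chains or cycles of foreign keys (so one repair pass is enough), and values that must exist but are not stored anywhere are represented by hypothetical Fresh placeholders. All function names are illustrative and are not taken from the cited work.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Fresh:
    """Placeholder for a value that exists but is not stored in any source."""
    ident: int


def retrieved_global_db(mapping, source_db):
    """Build ret(I, D): evaluate the view p(r) of every global relation r."""
    return {rel: set(view(source_db)) for rel, view in mapping.items()}


def violates_key(rows, key_positions):
    """True if two different rows agree on all key positions."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        if seen.setdefault(key, row) != row:
            return True
    return False


def repair_foreign_keys(db, foreign_keys, arities):
    """Add tuples to parent relations so that every foreign key holds.

    Assumption: a single pass suffices (no chained or cyclic foreign keys).
    """
    fresh_ids = count(1)
    repaired = {rel: set(rows) for rel, rows in db.items()}
    for child, child_pos, parent, parent_pos in foreign_keys:
        present = {row[parent_pos] for row in repaired[parent]}
        missing = {row[child_pos] for row in repaired[child]} - present
        for value in missing:
            new_row = tuple(
                value if i == parent_pos else Fresh(next(fresh_ids))
                for i in range(arities[parent])
            )
            repaired[parent].add(new_row)
    return repaired


# Hypothetical example: enrolled.code references student.code.
mapping = {"student": lambda db: db["s1"], "enrolled": lambda db: db["s2"]}
source_db = {"s1": {("1", "ann")}, "s2": {("1", "db"), ("2", "ai")}}
ret = retrieved_global_db(mapping, source_db)
repaired = repair_foreign_keys(
    ret, [("enrolled", 0, "student", 0)], {"student": 2, "enrolled": 2}
)
student_codes = {row[0] for row in repaired["student"]}
```

Evaluating a query over the repaired database then yields answers that also account for the added tuples; answers that contain a Fresh placeholder describe values that are known to exist but are not known, so a real system would filter them out.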

Queries to a data integration system I = (G, S, M) are posed in terms of the relations in G, and are intended to provide the specification of which data to extract from the virtual database represented by I. The task of specifying which tuples are in the answer to a query is complicated by the existence of several legal global databases, and this requires introducing the notion of certain answers. A tuple t is a certain answer to a query q with respect to a source database D if t ∈ q^B for all global databases B that are legal for I with respect to D.
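The definition of certain answers can be illustrated by intersecting the answers over a set of legal global databases. The set of legal databases is in general infinite, so the sketch below only assumes a small, hand-written sample of two legal databases; the function name and the example data are hypothetical.

```python
def certain_answers(query, legal_databases):
    """Return the tuples that are in query(B) for every given database B."""
    answers = [set(query(db)) for db in legal_databases]
    if not answers:
        raise ValueError("at least one legal global database is required")
    return set.intersection(*answers)


# Two hypothetical legal global databases for the same source database:
# both contain student 2 (required by a foreign key), but they disagree
# on that student's name, which no source provides.
legal_1 = {"student": {("1", "ann"), ("2", "bob")}}
legal_2 = {"student": {("1", "ann"), ("2", "eve")}}


def student_codes(db):
    return {(code,) for code, _name in db["student"]}


def student_rows(db):
    return set(db["student"])


certain_codes = certain_answers(student_codes, [legal_1, legal_2])
certain_rows = certain_answers(student_rows, [legal_1, legal_2])
```

Both codes are certain answers because they appear in every legal database, whereas only the row for student 1 is certain, since the name of student 2 differs from one legal database to another.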

4. Conclusion

The database is composed of three main components: the extensional database (the facts), the intensional database (the rules) and the constraint theory, referred to as the integrity constraints. In the modern communication environment, data integration under integrity constraints is very important because it affects the quality of search returns. This means that the semantics of a data integration system must be defined so as to deal with incomplete information and with any inconsistencies in the system. While data integration is a complex topic, it is one of the key topics likely to dominate work on search algorithms, as it helps in refining search returns. Therefore, as economies become more data driven, there is a growing need for an optimal system that can handle data from global and relational sources. This paper argues that further work on data integration under constraints should be directed at search algorithms that can produce more relevant and optimized query returns.

5. References

J. Grant, J. Gryz, J. Minker, & L. Raschid (1997). Semantic query optimization for object databases, in: Proceedings of the 13th IEEE International Conference on Data Engineering (ICDE’97), Birmingham, UK, pp. 444–453.

C. Li, R. Yerneni, V. Vassalos, H. Garcia-Molina, Y. Papakonstantinou, J.D. Ullman, & M. Valiveti (1998). Capability based mediation in TSIMMIS, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, pp. 564–566.

M.F. Fernandez, D. Florescu, J. Kang, A.Y. Levy, & D. Suciu (1998). Catching the boat with Strudel: experiences with a web-site management system, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, pp. 414–425.

M. Arenas, L.E. Bertossi, & J. Chomicki (1999). Consistent query answers in inconsistent databases, in: Proceedings of the 18th ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS’99), Philadelphia, PA, USA, pp. 68–79.

R. Fagin, P.G. Kolaitis, R.J. Miller, & L. Popa (2003). Data exchange: semantics and query answering, in: Proceedings of the 9th International Conference on Database Theory (ICDT), Siena, Italy, pp. 207–224.

L. Popa, Y. Velegrakis, R.J. Miller, M.A. Hernandez, & R. Fagin (2002). Translating Web data, in: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), Hong Kong, China, pp. 598–609.

R. Kowalski, F. Sadri, & P. Soper (1987). Integrity checking in deductive databases, in: Proceedings of the 13th International Conference on Very Large Data Bases (VLDB’87), Brighton, UK, pp. 61–69.

D. Johnson, & A. Klug (1984). Testing containment of conjunctive queries under functional and inclusion dependencies. Journal of Computer and System Sciences, 28, pp. 167–189.

D. Lembo, M. Lenzerini, & R. Rosati (2002). Source inconsistency and incompleteness in data integration, in: Proc. of KRDB 2002.

F. Fernandez, D. Florescu, A. Levy, & D. Suciu (1999). Verifying integrity constraints on web-sites, in: Proc. of IJCAI’99, pp. 614–619.

G. Zhou, R. Hull, R. King, & J. Franchitti (1995). Using object matching and materialization to integrate heterogeneous databases, in: Proc. of CoopIS’95, pp. 4–18.
