Research papers on distributed database systems

Nested transactions have been proposed to overcome the limitations of the flat transaction model. They extend the notion that transactions are flat entities by allowing a transaction to invoke atomic transactions as well as atomic operations.


They provide safe concurrency within a transaction, allow potential internal parallelism to be exploited, and offer an appropriate control structure to support their execution. In this paper we describe distributed database systems and their transaction processing. We also describe advanced nested transactions, in which the transactions of one system interact with the transactions of another system.

Such nested transactions can be expected to become more important with the introduction of network operating systems and heterogeneous distributed database systems.
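The idea of a transaction invoking other atomic transactions can be sketched in a few lines. The following is a minimal single-threaded illustration, not the protocol of any particular system: the class `NestedTransaction` and its methods are hypothetical names, and locking, concurrency and distribution are deliberately left out. It shows only that a child's writes become visible to the parent on commit, and that a child can abort without aborting its parent.

```python
class NestedTransaction:
    """Toy nested transaction over a dict-based store (illustrative only)."""

    def __init__(self, store=None, parent=None):
        self.parent = parent
        self.base = store if parent is None else None  # only the top level holds the store
        self.writes = {}                               # uncommitted writes of this level

    def begin_child(self):
        # A subtransaction is itself a transaction whose parent is this one.
        return NestedTransaction(parent=self)

    def read(self, key):
        # Look through this level, then each ancestor, then the store.
        txn = self
        while txn is not None:
            if key in txn.writes:
                return txn.writes[key]
            if txn.parent is None:
                return txn.base.get(key)
            txn = txn.parent

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # A child commits into its parent; the top level commits into the store.
        target = self.parent.writes if self.parent is not None else self.base
        target.update(self.writes)
        self.writes = {}

    def abort(self):
        # Discarding local writes leaves the parent untouched.
        self.writes = {}


store = {"balance": 100}
top = NestedTransaction(store)
top.write("balance", 90)

failed = top.begin_child()
failed.write("balance", 0)
failed.abort()                     # parent still sees 90

ok = top.begin_child()
ok.write("audit", "done")
ok.commit()                        # merged into the parent, not yet into the store
top.commit()                       # now visible in the store
```

In this sketch the store is changed only when the top-level transaction commits, which is the property that lets a parent retry or skip a failed subtransaction.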


Finally, we study the concurrency issues of nested transactions with respect to distributed databases.

Introduction

Additionally, this optimized technique is integrated with the proposed site clustering algorithm and cost model for data allocation. The obtained results confirm that the proposed optimized approach outperforms Abdalla to a great extent, both in substantially lessening transmission costs (TC) and in significantly improving distributed database system (DDBS) performance.

To sum up, the contributions and motivations of this work are as follows. The objective function of Abdalla did not incorporate communication costs into distributed query costs, which makes it inefficient both at mitigating communication costs while distributed queries are processed and at evaluating the technique as a whole.

Communication costs, however, are the prime reason for which both Abdalla and the present work set out to find a practical solution capable of minimizing these costs to the greatest extent. In other words, reducing communication costs in practice has been the major concern of the present work, provided that these costs are carefully reflected in the amended objective function.

Moreover, involving communication costs in the objective function would satisfy: (1) the reflection of the actual, or at least near-optimal, reality of Transmission Costs (TC) (Sewisy et al.). In contrast to Abdalla, and to further minimize communication costs, the present work proposes a clustering algorithm for sites, as site clustering has proven to be considerably efficient at lessening communication costs (Sewisy et al.).

Data allocation has widely proven to be an effective factor in promoting DDBS productivity, specifically when it is done appropriately. In full contrast with Abdalla, in which only the replication scenario is adopted and replication is always imposed, the present work adopts replication only when it is necessary, and also considers a non-replication scenario for both works, which helps avoid unnecessary replication and its detrimental effects. The present work, however, evaluates both works while strictly maintaining their circumstances.

This evaluation has been conducted with the modified objective function in mind. To prove the proposed concepts, many experiments under varied circumstances have been conducted, and internal and external evaluations are presented. The rest of this paper is organized as follows. Section 2 covers earlier works that are closely relevant to this work. The site clustering algorithm is stated in section 4. In section 5, the proposed data allocation and replication models are given.

In section 6, the pseudocode of the algorithm is provided. In section 7, experimental results are presented. Finally, conclusions and future work directions are included in section 9. As an example, in Ceri et al. Along the same lines, Zhang and Orlowska present a two-phase horizontal fragmentation (HF) method.


In the first phase, relations were fragmented by primary HF using predicate affinity and the bond energy algorithm. In the second phase, relations were further divided using derived HF. Fragmentation and data allocation were considered together as well. Amer and Abdalla presented a cost model to find an optimal HF in which two scenarios for data allocation were considered, so that no supplemental complexity was added to data placement. In follow-up work, this model was further extended in Abdalla and mathematically shown to be effective at reducing communication costs.

Experimental results, performance analysis and model practicality were not provided, though. On the other hand, a hybrid fragmentation was proposed in Harikumar and Ramachandran to reduce database access time based on a subspace clustering algorithm. Data fragments were generated with respect to tuple and attribute patterns so that closely correlated data were assembled together. In the meantime, Hauglid et al.

Moreover, the feasibility of the approach was experimentally demonstrated. By the same token, Abdel Raouf and Badr gave an enhanced system to perform initial-stage fragmentation and data allocation along with replication at run time in a cloud environment. Site clustering was addressed as well to enhance DDBS performance by increasing local accesses. Meanwhile, Lin et al.

In terms of reducing communication costs, two heuristic algorithms were then given to find a near-optimal allocation scenario. These algorithms were shown to be close to optimal compared to Lin et al. In Amita Goyal Chin, as the first of their kind, partial and full data reallocation heuristics were proposed to minimize costs and keep complexity under control. Moreover, to find an optimal data allocation technique, Abdalla et al.


Actually, POEA was originally sought to integrate some previously proposed concepts used in its earlier peers, including Mukherjee. Dejan Chandra Gope, in turn, developed a dynamic non-replicated data allocation algorithm named NNA. Data reallocation was done with respect to the changing pattern of data access along with time constraints. By the same token, Singh proposed a data allocation framework for non-replicated dynamic DDBS using the threshold algorithm (Ulus and Uysal) and the time constraint algorithm (Singh and Kahlon). Furthermore, this work was shown to be more efficient in terms of long-term performance than threshold algorithms when the access frequency pattern changes rapidly.

The problem of having more than one site qualified to hold the data was also discussed. Lastly and most importantly, in Wiese, the Data Replication Problem (DRP) was formulated to obtain a precise horizontal fragmentation of overlapping fragments. This work aimed at placing an N-copy replication scheme of fragments onto M distinct sites while ensuring that overlapping is precluded. In follow-up work, Wiese was further extended in Wiese et al. Runtime performance was analyzed, and data insertion and deletion were addressed as well.

This work is driven by several goals, with the reduction of Transmission Costs as the top priority. In addition, introducing a mathematically based data allocation, integrating a non-replication scenario for data allocation, and proposing a site clustering algorithm are further aspects in which this paper evolves the approach.

The motivations are presented in the next few lines. They are identified as either general or particular motivations, as follows. After a comprehensive investigation of related work in the context of horizontal fragmentation of relations in relational distributed databases, and to the best of our knowledge, no single work has sought to integrate several techniques for reducing communication costs (horizontal fragmentation, data allocation and replication, site clustering, mathematical models).

Therefore, this work is meant to be a promising and distinguishable approach that is valid as a mathematically based general solution for most problems of DDBS performance, in the sense that all these techniques are combined for the purpose of finding a sustainable solution for DDBS performance improvement. Moreover, this work has been shown to be promising theoretically, experimentally and mathematically.

On the other hand, it is worth noting that Hababeh et al. Both references were adequately cited in our paper, though. After a thorough investigation of all closely relevant earlier studies on DDBS performance enhancement, Abdalla was found to be an interesting, well-designed technique to be examined and significantly extended. The major purpose of Abdalla was to promote DDBS performance by decreasing communication costs. The data allocation process was also not clearly elaborated, as it was given merely theoretically.

Moreover, no evaluation of the technique was provided by which the objective function could be shown to be effective. In other words, these flaws and gaps make Abdalla appear unable to achieve the intended goal of reducing communication costs and promoting DDBS performance, which it was originally meant to satisfy.

These claims, nevertheless, are confirmed via the discussion and evaluation presented in this work. To sum up, using Abdalla as the initial cornerstone, this work sets out to perfect and optimize the technique of Abdalla. In this work, the fragmentation model depends entirely upon the predicate set, Pr = [Pr_1, …, Pr_p].

    In turn, these predicates are assumed to be assigned to all NA attributes under consideration, A = [A_1, …. Based on these requirements, this work utilizes the fragmentation procedure drawn in Abdalla with slight modifications (Fig.). Relations under consideration are defined and their predicates are identified.

    Individually, all of the most frequently used queries that reach each relation are kept and considered regardless of their type, whether retrieval or update. Using these matrices along with the fragmentation cost model, data fragmentation is activated. Based on Eq. As per Eq. Then, using Eq. After that, TFAM is used for each attribute individually to compute the total access cost, and all attributes are then sorted according to their totals.

    While the first equation is used to measure the costs incurred as distributed retrieval queries are processed, Eq. Total Transmission Costs would therefore be calculated using Eq. The effects of the objective function, however, are illustrated in the discussion section below. The presented site clustering algorithm has been designed based on the proven concept of Least Difference Value (LDV) proposed in Sewisy et al.

    In this work, however, the clustering algorithm behaves differently from that of Sewisy et al. Compared to the algorithm proposed in this work, the threshold-value-based algorithm seems in some cases either to reduce the number of site clusters too far or to increase it to an excessive, undesirable range; in both cases this comes at the expense of DDBS performance, as shown in the discussion section.

    The threshold-based algorithm is shown to behave slightly better than Abdalla, though. Therefore, instead of using the threshold algorithm to cluster sites as in Sewisy et al. After that, to keep clustering the remaining sites, the least average communication cost between sites is used as the metric to pull each site into its closest cluster among those already initiated. The number of clusters, on the other hand, is determined by nothing but the behavior of the algorithm. The algorithm is designed, though, so that the number of site clusters is kept at a minimum. Communication costs within and between clusters (CCM) are thus of key importance for data allocation and performance evaluation alike, particularly in the non-replication scenario.
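The two steps described above (seeding clusters from the cheapest pairs of sites, then pulling each remaining site into the cluster with the least average communication cost) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the function name `cluster_sites` is hypothetical, and the number of seed pairs is passed in as a parameter `num_seed_pairs`, whereas the text says the number of clusters emerges from the algorithm itself.

```python
from itertools import combinations


def cluster_sites(cost, num_seed_pairs):
    """Group sites using a symmetric communication-cost matrix.

    cost[i][j] is the communication cost between sites i and j.
    num_seed_pairs is an assumption of this sketch (must be >= 1).
    """
    if num_seed_pairs < 1:
        raise ValueError("at least one seed pair is required")

    n = len(cost)
    unassigned = set(range(n))
    clusters = []

    # Step 1: seed clusters with the cheapest disjoint pairs of sites.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: (cost[p[0]][p[1]], p))
    for a, b in pairs:
        if len(clusters) == num_seed_pairs:
            break
        if a in unassigned and b in unassigned:
            clusters.append([a, b])
            unassigned -= {a, b}

    if not clusters:
        raise ValueError("need at least two sites to seed a cluster")

    # Step 2: each remaining site joins the cluster whose members are,
    # on average, cheapest to reach.
    for site in sorted(unassigned):
        best = min(
            clusters,
            key=lambda members: sum(cost[site][m] for m in members) / len(members),
        )
        best.append(site)

    return [sorted(members) for members in clusters]


costs = [
    [0, 2, 9, 10, 3],
    [2, 0, 8, 9, 4],
    [9, 8, 0, 1, 7],
    [10, 9, 1, 0, 8],
    [3, 4, 7, 8, 0],
]
print(cluster_sites(costs, 2))  # [[2, 3], [0, 1, 4]]
```

With this matrix, sites 2 and 3 form the first cluster (cost 1), sites 0 and 1 the second (cost 2), and site 4 joins the second cluster because its average cost to it (3.5) is lower than to the first (7.5).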

    Finally, as per Ceri et al. As a matter of fact, Table 2 is deliberately taken from Sewisy et al. This superiority is further supported by the results in the evaluation section. Scenario 1 (Phase 1): over clusters, replication adopted. Data fragments are individually assigned to all clusters of sites. Such a procedure is believed to contribute greatly to decreasing TC and increasing data locality, chiefly when retrieval operations outnumber update operations.

    Each fragment is placed in the cluster of highest access cost. Such a mathematically based process is shown to have positive effects on DDBS performance, specifically when update operations outnumber retrieval operations (Sewisy et al.). In each cluster, fragments are lined up to be placed into the sites of that cluster as follows. Firstly, as in Abdalla, a threshold is calculated based on the Average Update Cost (AUC) and Average Retrieval Cost (ARC) of each fragment. On the other hand, when a constraint violation is recorded, the fragment is automatically placed into the site with the next highest AUC inside the same cluster.

    In brief, Eqs. In contrast to the equations given above, while following the same pattern as Eqs.
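The placement rules in the paragraph above can be sketched as two small functions. This is a rough illustration only: the function names are hypothetical, the AUC/ARC threshold is omitted because its formula is not given here, and the "constraint" is assumed to be a simple storage-capacity limit per site, which the text does not specify.

```python
def assign_to_cluster(cluster_access_cost):
    """Pick the cluster with the highest access cost for one fragment.

    cluster_access_cost maps a cluster name to the fragment's access cost there.
    Ties are broken by cluster name so the result is deterministic.
    """
    return max(sorted(cluster_access_cost), key=lambda c: cluster_access_cost[c])


def place_in_site(fragment_size, site_update_cost, free_capacity):
    """Place a fragment in the site with the highest update cost that can hold it.

    If the preferred site violates the (assumed) capacity constraint,
    fall back to the site with the next highest update cost in the same cluster.
    Returns the chosen site, or None if no site in the cluster fits.
    """
    ranked = sorted(site_update_cost, key=lambda s: (-site_update_cost[s], s))
    for site in ranked:
        if free_capacity[site] >= fragment_size:
            free_capacity[site] -= fragment_size  # reserve the space
            return site
    return None


cluster = assign_to_cluster({"C1": 40, "C2": 75})
capacity = {"S1": 100, "S2": 20, "S3": 100}
site = place_in_site(60, {"S1": 30, "S2": 50, "S3": 10}, capacity)
print(cluster, site)  # C2 S1
```

Here S2 has the highest update cost but too little free space, so the fragment falls through to S1, the site with the next highest update cost.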


    For data replication, the data replication model drawn in Sewisy et al. However, this model is slightly modified to make it comply with the work proposed in this paper. Thus, an integer linear program (ILP) representing this problem is presented as follows. For the non-replicated scenario, Eq. Finally, Eqs. As mentioned earlier, while fragmentation is the first process, site clustering is done ahead of data allocation. Using the CA attribute, specify its predicates to activate the fragmentation process as follows:

    Using Ps, the targeted relation is fragmented as per the fragmentation cost model. Select the pair of sites of lowest cost (Least Difference Value) to initiate the first cluster C1. Repeat steps 2 to 4 until all clusters are formed and all sites have been involved in the clustering process. If the query and fragment are at different sites of the same owner cluster. If the query and fragment are at different clusters. Procedure: assign fragments into clusters (replication adopted).

    If Cc. Take Fi off Cc. Place Fi, Cc. Procedure Assign-fragments-into-Sites (fragment, clusters of sites, sites). As per Abdalla, the data requirements of a dataset can be explicitly provided by the DDBS administrator or generated (as adopted in this implementation) using a generator for given attributes, predicates and applications over network sites. To implement this work, the same environment, including software and hardware, in which Abdalla was implemented was deliberately recreated. The initial dataset (Student relation) is collected before running the separate implementation of this work on all experiments of problems 1 and 3.

    These datasets can be either explicitly provided by the DBA or generated (as adopted in this implementation) using a generator program for the given metadata (Table 8). Execution steps are partly illustrated as follows (all tables and pictures are taken from the real implementation). Firstly, all required information is recorded as given in Table 9. As mentioned earlier, for the first four sites in both separate experiments, only the same query frequencies drawn in Abdalla are intentionally used. Query Retrieval Matrix (QRM), which gives how many times each retrieval query runs over each predicate and accesses data.

    Query Update Matrix (QUM), which gives how many times each update query runs over each predicate and accesses data. Finally, fragment information such as sizes and cardinalities is indispensable for data allocation and performance evaluation. This matrix is used to assign fragments to clusters based on the maximum cost concept (Eq.). These matrices are used to individually assign fragments into the sites of clusters. This matrix is used to assign fragments to sites, in each cluster, based on the maximum cost concept.

    Fragment distribution over sites in the replication scenario. Fragment distribution over sites in the no-replication scenario. This scenario, however, is newly proposed in this work, in the sense that it was not drawn in Abdalla. Fragment distribution over sites and clusters alike. On the other hand, for the network of six sites, the information needed is recorded as shown in Table 24 below. As in the first experiment, only the same query frequencies of the first four sites drawn in Abdalla are purposefully taken.

    This matrix gives how many times each retrieval or update query runs over each predicate and accesses data. Fragment distribution over sites and clusters. In light of the contributions addressed above, it can be concluded that this optimized work makes remarkable progress compared with Abdalla in DDBS performance enhancement.

    This progress is supported by the experimental results and performance evaluation of this section. Data locality promotion and transmission cost (TC) reduction are the main factors by which this work is measured. Given the procedure by which this work is produced, data is expected to be as local as possible, leading to the greatest reduction in TC as a consequence. To verify these claims, TC is expressed by the objective function of this work, and internal and external evaluations are made for both works.

    Performance is measured with respect to how much cost is incurred inside the network as distributed queries are processed. In brief, Table 34 shows that five problems, each of which has its own experiments, parameters and variables, are addressed to evaluate both works, Abdalla and the present optimized work of this paper. All problems addressed in this work. Each problem has been investigated through three experiments with its own dataset cardinality, number of queries and number of network sites, considering all data allocation scenarios.

    These scenarios are: Scenario 1: Abdalla with the present work, with the replication scenario imposed for both works; Scenario 2: Abdalla with the present work, with the no-replication scenario for both works; and Scenario 3: Abdalla with the replication scenario and the present work with the no-replication scenario.

    All information concerning the evaluation process is displayed in Table. Additionally, it is important to note that the original queries running against the dataset for all problems are five queries. However, as per the methodology discussed above, each query released from each site is treated as a different query with a different frequency. In other words, each query at each site is processed independently of its replicas at other sites.

    As a result, the total number of queries actually considered increases significantly, depending on the rate at which queries are released over sites. For this demonstration, four experiments (1, 2, 7 and 8) have been illustrated in this paper as per section 7. All experiments, including these separate experiments, are presented in a self-explanatory frame, as they seek to find which work is the best fit for DDBS design. To begin, for the first experiment of the first problem, Figs.
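As a small illustration of how the QRM and QUM could be combined, the sketch below sums the retrieval and update frequencies per predicate and ranks predicates by total access. The layout (one row per query, one column per predicate) and the function names are assumptions of this example; the actual matrices are in the tables of the original paper.

```python
def predicate_access_totals(qrm, qum):
    """Total access frequency per predicate.

    qrm and qum are lists of rows; row q, column p holds how many times
    query q runs over predicate p (retrieval and update respectively).
    """
    rows = list(qrm) + list(qum)
    return [sum(column) for column in zip(*rows)]


def rank_predicates(totals):
    """Predicate indices ordered from most to least accessed."""
    return sorted(range(len(totals)), key=lambda p: (-totals[p], p))


qrm = [[10, 0, 5],
       [2, 8, 0]]
qum = [[1, 1, 0]]
totals = predicate_access_totals(qrm, qum)
print(totals, rank_predicates(totals))  # [13, 9, 5] [0, 1, 2]
```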
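To make the cost accounting concrete, the sketch below tallies a retrieval cost (TC1), an update cost (TC2) and their total for a non-replicated placement. It is an illustration under stated assumptions rather than the paper's objective function, whose equations are not reproduced here: the cost of one request is assumed to be frequency × fragment size × a unit cost that is zero for a local access, `intra_cost` within a cluster and `inter_cost` across clusters. All names and numbers are hypothetical.

```python
def unit_cost(query_site, fragment_site, cluster_of, intra_cost, inter_cost):
    """Per-byte cost of reaching a fragment from the site issuing the query."""
    if query_site == fragment_site:
        return 0                      # local access, nothing is transmitted
    if cluster_of[query_site] == cluster_of[fragment_site]:
        return intra_cost             # different sites, same cluster
    return inter_cost                 # different clusters


def transmission_costs(requests, placement, sizes, cluster_of,
                       intra_cost, inter_cost):
    """Return (TC1, TC2, total) for a list of requests.

    Each request is (kind, site, fragment, frequency), where kind is
    "retrieval" or "update". placement maps each fragment to its single
    site (no replication is assumed in this sketch).
    """
    tc1 = tc2 = 0
    for kind, site, fragment, frequency in requests:
        cost = frequency * sizes[fragment] * unit_cost(
            site, placement[fragment], cluster_of, intra_cost, inter_cost)
        if kind == "retrieval":
            tc1 += cost
        else:
            tc2 += cost
    return tc1, tc2, tc1 + tc2


cluster_of = {"S1": "C1", "S2": "C1", "S3": "C2"}
requests = [
    ("retrieval", "S1", "F1", 10),   # local: costs nothing
    ("retrieval", "S2", "F1", 5),    # same cluster
    ("update", "S3", "F1", 2),       # other cluster
]
print(transmission_costs(requests, {"F1": "S1"}, {"F1": 100},
                         cluster_of, intra_cost=1, inter_cost=4))
# (500, 800, 1300)
```

The example shows why data locality matters for TC: the ten local retrievals contribute nothing, while two cross-cluster updates account for most of the total.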
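The effect of treating each query at each site as a separate query can be shown with a short sketch. The function name and the frequency table are made up for illustration; only the idea (one entry per query and site with a nonzero frequency) comes from the text. With five original queries released from all four sites, for instance, this yields up to 5 × 4 = 20 entries.

```python
def expand_queries(frequencies):
    """Turn per-site frequencies into independent (query, site) entries.

    frequencies maps a query name to a dict of site -> frequency.
    Sites that never issue a query (frequency 0) produce no entry.
    """
    return {
        (query, site): frequency
        for query, per_site in frequencies.items()
        for site, frequency in per_site.items()
        if frequency > 0
    }


frequencies = {
    "Q1": {"S1": 3, "S2": 0},
    "Q2": {"S1": 1, "S2": 4},
}
expanded = expand_queries(frequencies)
print(len(expanded))  # 3
```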

    The present work outperforms Abdalla in terms of TC1 (Fig.). However, both works are observed to be close to each other regarding TC2 (Fig.). Nevertheless, for total TC, these results come substantially in favor of the present work (Fig.). The present work produces lower communication costs than Abdalla. All in all, for this scenario, the present work is observed to contribute significantly to improving DDBS performance. It is worth repeating that performance is measured by how much cost, in bytes, is incurred as distributed queries are processed.

      Problem 1; Experiment 1 - Within a four-site network, the present work is evaluated against Abdalla in the data replication scenario, as both are compared on TC1 (Eq.