Discuss the different techniques for executing an equijoin of two files located at different sites. What main factors affect the cost of data transfer?
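Technique 1: Transfer the second table (from site 2) to the site of the first table (site 1), join them there, then transfer the join result to site 3 (where the result is needed).
Technique 2: Transfer the first table (from site 1) to the site of the second table (site 2), join them there, then transfer the join result to site 3.
Main factors: The amount of data that has to be shipped over the network, i.e. the sizes of the two tables and of the join result. To minimize the data transfer cost, ship the table that contains less data and perform the join at the site that holds the larger table.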
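The comparison between the two techniques can be sketched as a small cost calculation. This is only an illustration: the function name `transfer_costs`, the byte counts and the assumption that cost is simply "bytes shipped" are made up for the example and do not come from the question.

```python
def transfer_costs(size_r1, size_r2, size_result, result_site):
    """Return the bytes shipped for each technique.

    Assumption: table R1 is stored at site 1, table R2 at site 2,
    and the only cost considered is the number of bytes transferred.
    """
    costs = {}
    # (site where the join runs, size of the table shipped to that site)
    for join_site, shipped_input in ((1, size_r2), (2, size_r1)):
        # The result only travels if it is needed at a different site.
        ship_result = 0 if join_site == result_site else size_result
        costs[f"join at site {join_site}"] = shipped_input + ship_result
    return costs


# Hypothetical sizes in bytes: a large table at site 1, a small one at site 2.
costs = transfer_costs(size_r1=1_000_000, size_r2=3_500,
                       size_result=400_000, result_site=3)
best = min(costs, key=costs.get)
print(costs, "->", best)
```

With these assumed sizes, shipping the small table to site 1 is far cheaper, which matches the rule of thumb of moving the table with less data.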
How are joins realized in most NOSQL systems?
Many NOSQL systems do not provide join operations as part of the query language itself
➔ Joins have to be implemented in the application programs
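A minimal sketch of such an application-side join, assuming the store only offers lookups by key. The plain dict `customers` stands in for a key-value collection; all names and records are hypothetical.

```python
# Stand-in for a key-value collection: customer id -> customer record.
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Linus"}}

# Stand-in for a second collection that references customers by id.
orders = [
    {"order_id": "o1", "customer_id": "c1", "total": 30},
    {"order_id": "o2", "customer_id": "c2", "total": 12},
    {"order_id": "o3", "customer_id": "c1", "total": 7},
    {"order_id": "o4", "customer_id": "c9", "total": 5},  # no matching customer
]


def join_orders_with_customers(orders, customers):
    """Inner join done in application code: one key lookup per order."""
    joined = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is not None:  # orders without a match are dropped
            joined.append({**order, "customer_name": customer["name"]})
    return joined


joined = join_orders_with_customers(orders, customers)
```

In a real system each `customers.get(...)` would be a network round trip, which is one reason many NOSQL data models store related data together instead.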
What are the advantages of Distributed Databases? Give at least three examples and explain them briefly!
• Better representation of organizational structures
• Improved shareability and local autonomy
• Increased availability and reliability (due to replication)
• Improved performance
• Economics – it may cost less to create a network of smaller computers with the power of a single large computer
1. Reflects organizational structure: Many organizations are naturally distributed over several locations.
2. Improved shareability and local autonomy: The geographical distribution of an organization can be reflected in the distribution of the data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data.
3. Improved availability: In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS, or a failure of a communication link that makes some sites inaccessible, does not make the entire system inoperable. Distributed DBMSs are designed to continue to function despite such failures.
4. Improved reliability: Because data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
5. Improved performance: As the data is located near the site of “greatest demand,” and given the inherent parallelism of distributed DBMSs, the speed of database access may be better than that achievable from a remote centralized database.
What is consistent Hashing?
Consistent hashing
• Special kind of hashing, which minimizes the number of keys that have to be remapped when the size of a hash table is changed
• Assumes that the result of the hash function h(key) is an integer value, usually in the range 0 to Hmax = 2^n - 1, where n is chosen based on the desired range for the hash values
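The idea can be sketched with a hash ring. The following is a minimal illustration, not a production implementation: it assumes n = 32, derives h(key) from the first 4 bytes of a SHA-256 digest, places one point per node on the ring (no virtual nodes), and uses hypothetical node names "A" to "D".

```python
import bisect
import hashlib

N_BITS = 32                 # assumed value of n
H_MAX = 2**N_BITS - 1       # largest possible hash value


def h(key: str) -> int:
    """Map a key to an integer in the range 0..H_MAX."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted list of (hash value, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        bisect.insort(self._points, (h(node), node))

    def remove(self, node):
        self._points.remove((h(node), node))

    def lookup(self, key):
        """Return the first node at or after h(key), wrapping around the ring."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._points, (h(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.lookup(k) for k in keys}
ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new node "D"; all other keys keep their old node, which is the remapping property described above.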
What are the different categories of NOSQL systems?
1. Document stores:
• Document-based NOSQL systems store data as collections of similar documents
• Documents resemble complex objects
• No schema has to be specified; the documents are self-describing data
• Each document can have different data elements (attributes)
• Documents can be specified in various formats, such as XML or JSON (JavaScript Object Notation)
• Documents are accessible via their document id
Example: MongoDB
2. Key-value stores:
• Every data item (value) must be associated with a unique key
• Retrieving the value by supplying the key must be very fast
• The value can have very different formats in different key-value storage systems:
  1. String / array of bytes: the application using the key-value store has to interpret the structure of the data value
  2. Structured data rows (tuples) similar to relational data
  3. Semi-structured data using a self-describing data format
Examples: Amazon DynamoDB, Project Voldemort
3. Wide column stores:
A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.
Working Principle-
• Vertical partitioning – tables are partitioned by column into column families
• Each column family is stored in its own files
• Versioning of data values is allowed
• The key is multidimensional (in contrast to key-value stores)
Examples: Google's distributed storage system (Bigtable), Apache HBase
4. Graph-Databases:
Data is organized as a graph, which is a collection of nodes, relationships, and properties.
Node-
• Can contain properties
• Can contain labels, which group nodes with the same label into subsets for querying purposes
Relationship-
• Is directed: each relationship has a start node and an end node
• Can contain properties
• Has a relationship type, which helps to identify similar relationship types for querying purposes
Properties-
Store the data items associated with nodes and relationships as a list of key-value pairs
Example: Neo4j
5. Hybrid Systems:
Hybrid SQL‐NoSQL database solutions combine the advantage of being compatible with many SQL applications and providing the scalability of NoSQL ones.
Example: Xeround
What is Eventual Consistency?
Eventual consistency informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
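A toy simulation can make this concrete. It assumes three replicas, version numbers to order the writes and a simple last-writer-wins synchronization step; the names `Replica` and `anti_entropy` are chosen for this example only and do not describe any particular system.

```python
class Replica:
    def __init__(self):
        self.version = 0     # version of the newest write seen so far
        self.value = None

    def write(self, version, value):
        # Last-writer-wins: only accept writes newer than what we hold.
        if version > self.version:
            self.version, self.value = version, value


def anti_entropy(replicas):
    """Propagate the newest known write to every replica."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.write(newest.version, newest.value)


replicas = [Replica() for _ in range(3)]
replicas[0].write(1, "blue")                      # update reaches one replica only
stale_reads = [r.value for r in replicas]         # some replicas still return the old state
anti_entropy(replicas)                            # no new updates; replicas synchronize
converged_reads = [r.value for r in replicas]     # every replica returns the last update
```

Between the write and the synchronization, a read may return either the new or the old value depending on which replica answers; afterwards all reads agree.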
CAP Theorem
In a distributed system with data replication, only two of the following properties can be guaranteed at the same time:
• Consistency: the nodes will have the same copies of a replicated data item visible for various transactions.
• Availability: each read or write request for a data item will either be processed successfully or will receive a message that the operation cannot be completed.
• Partition Tolerance: the system can continue to operate if the network connecting the nodes has a fault that results in two or more partitions, where the nodes in each partition can only communicate with each other.
How is a vertical partitioning of a relation specified? How can a relation be put back together from a complete vertical partitioning?
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.
The primary key is duplicated in every partition so that the original table can be reconstructed: the partitions are put back together with a join operation on the primary key.
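A minimal sketch of this, using lists of dicts as stand-ins for tables. The `employees` data and the helper names `project` and `join_on_key` are hypothetical.

```python
employees = [
    {"emp_id": 1, "name": "Ada", "salary": 5200, "dept": "R&D"},
    {"emp_id": 2, "name": "Linus", "salary": 4800, "dept": "Ops"},
]


def project(rows, columns):
    """Keep only the given columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]


def join_on_key(left, right, key):
    """Join two partitions on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


# Each vertical partition repeats the primary key emp_id.
public_part = project(employees, ["emp_id", "name", "dept"])
payroll_part = project(employees, ["emp_id", "salary"])

rebuilt = join_on_key(public_part, payroll_part, "emp_id")
```

Because both partitions carry `emp_id`, the join pairs up exactly the pieces that belonged to the same original row.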
How is a horizontal partitioning of a relation specified? How can a relation be put back together from a complete horizontal partitioning?
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a specific year.
Every partition has the same columns as the original table (including the primary key), and each row is stored in exactly one partition. The original table is reconstructed with a union operation over all partitions.
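A small sketch of the month-based example, again with lists of dicts standing in for tables. The `sales` rows and the helper names are made up for illustration.

```python
from collections import defaultdict

sales = [
    {"sale_id": 1, "month": 1, "amount": 40},
    {"sale_id": 2, "month": 2, "amount": 15},
    {"sale_id": 3, "month": 1, "amount": 22},
]


def partition_by(rows, column):
    """Split rows into fragments according to the value of one column."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)


def union(fragments):
    """Concatenate all fragments; they are disjoint, so no duplicates arise."""
    return [row for fragment in fragments for row in fragment]


fragments = partition_by(sales, "month")      # one fragment per month
rebuilt = sorted(union(fragments.values()), key=lambda r: r["sale_id"])
```

Each fragment keeps all columns, so the union needs no join condition; the sort only restores the original row order for comparison.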
What are Nest, Unnest, and intersection join operations?
Nest:
Creates a set of values from one or more attributes if the values of the remaining attributes are identical.
Unnest:
The inverse operation of Nest: expands a set-valued attribute back into one tuple per value.
Intersection Join:
Two tuples are associated (joined) if the sets in their qualifying attributes have a nonempty intersection.
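The three operations can be sketched on small in-memory relations. This is an illustrative reading of the definitions above: tuples are dicts, set-valued attributes are Python sets, and the data and function names are hypothetical.

```python
def nest(rows, attr):
    """Collect the values of attr into a set for rows that agree on all other attributes."""
    groups = {}
    for row in rows:
        rest = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(rest, set()).add(row[attr])
    return [{**dict(rest), attr: values} for rest, values in groups.items()]


def unnest(rows, attr):
    """Inverse of nest: emit one flat row per element of the set-valued attr."""
    return [{**row, attr: value} for row in rows for value in sorted(row[attr])]


def intersection_join(left, right, attr):
    """Pair up rows whose set-valued attr values share at least one element."""
    return [(l, r) for l in left for r in right if l[attr] & r[attr]]


flat = [
    {"student": "Ann", "course": "DB"},
    {"student": "Ann", "course": "OS"},
    {"student": "Bob", "course": "DB"},
]
nested = nest(flat, "course")
teachers = [{"teacher": "Kim", "course": {"OS", "AI"}}]
pairs = intersection_join(nested, teachers, "course")
```

Here `nested` holds one row per student with a set of courses, `unnest(nested, "course")` gives back the flat rows, and only Ann is paired with Kim because they share the course "OS".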