Home==>Advance Management of Data(AMD)
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
Compare the focus of traditional relational systems with the focus of NOSQL systems?
Traditional relational systems • centralized data storage • structured data storage • data consistency / ACID compliance(a transaction must be atomic, consistent, isolated, and durable) • aggregation • powerful query languages NOSQL systems • distributed data storage • flexible data storage • schema-Read more
Traditional relational systems
• centralized data storage
• structured data storage
• data consistency / ACID compliance(a transaction must be atomic, consistent, isolated, and durable)
• aggregation
• powerful query languages
NOSQL systems
• distributed data storage
• flexible data storage
• schema-less data sets that include structured and semistructured data
• scalability
• high performance (at scale)
• availability through replication
See lessDiscuss the different techniques for executing an equijoin of two files located at different sites. What main factors affect the cost of data transfer?
Technique 1: Transfer second table (from 2nd site) to 2nd table (first site) and join them, then transfer the result of join into site 3 Technique 2: Transfer 1st table (from 1st site) to the 2nd table (second site) and join them, then transfer the result of join into site 3 Main factors: We have toRead more
Technique 1: Transfer second table (from 2nd site) to 2nd table (first site) and join them, then transfer the result of join into site 3
Technique 2: Transfer 1st table (from 1st site) to the 2nd table (second site) and join them, then transfer the result of join into site 3
Main factors: We have to choose the site and table which contains less data in order to minimize the data transfer cost
See lessHow are joins realized in most NOSQL systems?
Many NOSQL systems do not provide join operations as part of the query language itself ➔ The Joins need to be implemented in the application programs
Many NOSQL systems do not provide join operations as part of the query language itself
➔ The Joins need to be implemented in the application programs
See lessWhat are advantages of Distributes Databases? Give at least three examples and explain them briefly!
• Better representation of organizational structures • Improved shareability and local autonomy • increased availability and reliability (due to replication) • improved performance • Economics - it may cost less to create a network of smaller computers with the power of a single large computer 1. ReRead more
• Better representation of organizational structures
• Improved shareability and local autonomy • increased availability and reliability (due to replication)
• improved performance
• Economics – it may cost less to create a network of smaller computers with the power of a single large computer
1. Reflects organizational structure many organizations are naturally distributed over several locations.
2. Improved shareability and local autonomy The geographical distribution of an organization can be reflected in the distribution of the data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data.
3. Improved availability In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS or a failure of a communication link making some sites inaccessible does not make the entire system inoperable. Distributed DBMSs are designed to continue to function despite such failures.
4. Improved reliability because data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
5. Improved performance as the data is located near the site of “greatest demand,” and given the inherent parallelism of distributed DBMSs, speed of database access may be better than that achievable from a remote centralized database.
See lessWhat is consistent Hashing?
Consistent hashing • Special kind of hashing, which minimizes the number of keys that have to be remapped when the size of a hash table is changed • Assumes that the result of the hash function h(key) is an integer value, usually in the range 0 to Hmax = 2n-1, where n is chosen based on the desiredRead more
Consistent hashing
• Special kind of hashing, which minimizes the number of keys that have to be remapped when the size of a hash table is changed
• Assumes that the result of the hash function h(key) is an integer value, usually in the range 0 to Hmax = 2n-1, where n is chosen based on the desired range for the hash values
See lessDifferent Categories of NOSQL Systems?
1. Document stores: •Document-based NOSQL systems store data as collections of similar documents. • Resemble complex objects • Do not require to specify a schema, but are specified as self-describing data • Each document can have different data elements (attributes) • Can be specified in various forRead more
1. Document stores:
•Document-based NOSQL systems store data as collections of similar documents.
• Resemble complex objects
• Do not require to specify a schema, but are specified as self-describing data
• Each document can have different data elements (attributes)
• Can be specified in various formats, such as XML or JSON (JavaScript Object Notation)
• Are accessible via their document id
Examples: Mongo DB
2. Key-value stores:
• Every data item (value) must be associated with a unique key
• Retrieving the value by supplying the key must be very fast
• Value-
1. Can have very different formats for different key-value storage systems:
2. String / array of bytes: the application using the key-value store has to interpret the structure of the data value
3. Structured data rows (tuples) similar to relational data
4. Semi structured data using a self-describing data format
Example: Amazon Dynamo DB, Project Voldemort
3. Wide column stores:
A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.
Working Principle-
• Vertical partitioning – tables are partitioned by column into column families
• Each column family is stored in its own files
• Versioning of data values is allowed
• The key is multidimensional (in contrast to key-value stores)
Examples: Google distributed storage system (BigTable), Apache Hbase
4. Graph-Databases:
Data is organized as a graph, which is a collection of nodes, relationships, and properties.
Node-
• Can contain properties
• Can contain labels, which groups nodes with the same label into subsets for querying purposes
Relationship-
• Is directed, each relationship has a start node and an end node
• Can contain properties
• Has a relationship type, which helps to identify similar relationship types for querying purposes
Properties-
Store the data items associated with nodes and relationships as list of key-value pairs
Example: Neo4j
5. Hybrid Systems:
Hybrid SQL‐NoSQL database solutions combine the advantage of being compatible with many SQL applications and providing the scalability of NoSQL ones.
Example: Xeround
See lesswhat is Eventual Consistency?
If no new updates are made to a given data item eventual consistency informally guarantees that eventually all accesses to that item will return the last updated value.
If no new updates are made to a given data item eventual consistency informally guarantees that eventually all accesses to that item will return the last updated value.
See lessCAP Theorem
In a distributed system with data replication only two of the following properties can be guaranteed at the same time: • Consistency: the nodes will have the same copies of a replicated data item visible for various transactions • Availability: each read or write request for a data item will eitherRead more
In a distributed system with data replication only two of the following properties can be guaranteed at the same time:
• Consistency: the nodes will have the same copies of a replicated data item visible for various transactions
• Availability: each read or write request for a data item will either be processed successfully or will receive a message that the operation cannot be completed.
• Partition Tolerance: the system can continue to operate if the network connecting the nodes has a fault that results in two or more partitions, where the nodes in each partition can only communicate among each other
See lessHow is a vertical partitioning of a relation specified? How can a relation be put back together from a complete vertical partitioning?
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized. The primaRead more
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.
See lessThe primary key is duplicated to allow the original table to be reconstructed. Using join operation to reconstruct them.
How is a horizontal partitioning of a relation specified? How can a relation be put back together from a complete horizontal partitioning?
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a specificRead more
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a specific year.
The primary key is duplicated to allow the original table to be reconstructed. Using union operation to reconstruct them
See lessWhat are Nest, Unnest, and intersection join operations?
Nest: Creates a set of values from one or more attributes if the values of the remaining attributes are identical. Unnest: Inverse operation Intersection Join: If there is a nonempty intersection set between qualifying attributes, tuples are associated.
Nest:
Creates a set of values from one or more attributes if the values of the remaining attributes are identical.
Unnest:
Inverse operation
Intersection Join:
If there is a nonempty intersection set between qualifying attributes, tuples are associated.
See lesswhat is non first normal form?
The First Normal Form (1NF) is the fundamental requirement for the relational model and states that values of attributes have to be atomic. Simultaneously, 1NF is responsible for several drawbacks of the relational model ➔ an extended data model should not require 1NF. Objects of the Non First NormaRead more
The First Normal Form (1NF) is the fundamental requirement for the relational model and states that values of attributes have to be atomic.
Simultaneously, 1NF is responsible for several drawbacks of the relational model
➔ an extended data model should not require 1NF.
Objects of the Non First Normal Form are defined as follows:
• Every atomic value (such as integer, float, string) is an object.
• If a1, a2, … , an are distinct attribute names and o1, o2, … , on are objects, then (a1:o1, a2:o2, … , an:on) is a tuple object.
• If o1, o2, … , on are objects, then {o1, o2, … , on} is a set object.
Each tuple of a NF2 relation can contain such objects
See lessDiscuss why the weaknesses of the relational data model and relational DBMS may make them unsuitable for advanced database applications?
Poor representation of real-world entities - The process of deriving relations from a conceptual database often leads to the creation of relations that do not directly correspond to modeled entity types. Cause • complex entities consists of other entities • entities are described by multi-valued attRead more
The process of deriving relations from a conceptual database often leads to the creation of relations that do not directly correspond to modeled entity types. Cause • complex entities consists of other entities • entities are described by multi-valued attributes Result • fragmentation of a entity type into many relations • is inefficient and leads to many joins during query processing 2. Semantic Overloading The relational model uses only one construct for representing data and relationships between data: the relation. There is no mechanism to distinguish between entities and relationships, or to distinguish between different kinds of relationship that exist between entities. Often, attributes of different entities are mixed within one relation. Considering only the given relation, we cannot know, which attributes belong to a certain entity or relationship. ➔ the relational model is said to be semantically overloaded 3. Poor support for integrity and general constraints Integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate. Many commercial systems do not fully support these constraints and it is necessary to build them into the applications. This is dangerous and can lead to duplication of effort and inconsistencies. There is no support for general constraints in the relational model, which again means that they have to be built into the DBMS or the application. 4. Homogeneous data structure The relational model assumes both horizontal and vertical homogeneity. • Horizontal homogeneity means that each tuple of a relation must be composed of the same attributes • Vertical homogeneity means that the values in a particular column of a relation must all come from the same domain The first normal form requires that the intersection of a row and column must be an atomic value. Drawbacks of Homogeneity • too restrictive for many (more complex) “real-world” objects • leads to unnatural and inefficient joins. 5. Limited Operations The relational model has only a fixed set of operations, such as set and tuple-oriented operations, operations that are provided in the SQL specification. Since (at least older versions of) SQL is not computationally complete, new operations cannot be specified. This is too restrictive to model the behavior of many real-world objects. 6. Difficulty handling recursive queries Atomicity of data means that repeating groups are not allowed in the relational model. As a result, it is extremely difficult to handle recursive queries, that is, queries about relationships that a relation has with itself (directly or indirectly). 7. Impedance Mismatch Older versions of SQL lack computational completeness and could not be used to develop more complex database applications. To overcome this problem, SQL can be embedded in a procedural programming language. However, this approach produces an impedance mismatch, because different programming paradigms are mixed: • SQL is a declarative language that handles many rows of data at a time • common high-level languages (such as C) are procedural languages that can handle only one row of data at a time Since the type systems are different, type checking is prevented and necessary conversions require a lot of programming effort and runtime resources.
See lessDiscuss the general characteristics of advanced database applications
large number of data types Less portable example: CAD, CAM, OIS Real-time access and large number of users, For example: CAD, Network Management System Large volume of data, For example: GIS, CAM
large number of data types
Less portable example: CAD, CAM, OIS
Real-time access and large number of users, For example: CAD, Network Management System
Large volume of data, For example: GIS, CAM
See lessAdvantage and Disadvantage of Trigger
Advantages: Elimination of redundant code Simplifying modifications Increased security Improved integrity Improved processing power Good fit with the client-server architecture Disadvantage: Performance Overhead Complexity Hidden functionality Cascading effects Cannot be scheduled
Advantages:
Disadvantage:
what are Triggers?
A trigger is an SQL statement that is executed automatically by the DBMS as a side effect of a modification - the triggering event - to a named table. The act of executing a trigger is sometimes known as firing the trigger. Some Use Cases : • validating input data and maintaining complex integrity conRead more
A trigger is an SQL statement that is executed automatically by the DBMS as a side effect of a modification – the triggering event – to a named table.
The act of executing a trigger is sometimes known as firing the trigger.
Some Use Cases :
• validating input data and maintaining complex integrity constraints.
• supporting alerts that action needs to be taken when a table is updated in some way.
• refreshing derived attributes after an update operation.
• maintaining audit information, by recording the changes made, and by whom.
• supporting replication.
The basic format of the CREATE TRIGGER statement is as follows:
CREATE TRIGGER TriggerName
BEFORE | AFTER | INSTEAD OF <triggerEvent> ON <TableName>
[REFERENCING <oldOrNewValuesAliasList>]
[FOR EACH {ROW | STATEMENT}]
[WHEN (triggerCondition)]
<triggerBody>
• Triggering events include insertion, deletion, and update of rows in a table. In the latter case only, a triggering event can also be set to cover specific named columns of a table.
• A trigger has an associated timing of either BEFORE, AFTER, or INSTEAD OF.
• A BEFORE trigger is fired before the associated event occurs
• An AFTER trigger is fired after the associated event occurs
• An INSTEAD OF trigger is fired in place of the trigger event.
Three different types of Triggers
Row-Level Triggers (FOR EACH ROW) • execute for each row of the table that is affected by the triggering event
See lessStatement-Level Triggers (FOR EACH STATEMENT) • execute only once even if multiple rows are affected by the triggering event.
Cascading Triggers .Triggers can also activate themselves one after the other.
This can happen when the trigger action makes a change to the database that has the effect of causing another event that has a trigger associated with it to fire.
How is a procedure different from a function?
Both can modify and return data passed to them as a parameter, but a function can only return a single value to the caller.
Both can modify and return data passed to them as a parameter, but a function can only return a single value to the caller.
See lessWhat are cursors, exceptions, stored procedures
Cursor- A SELECT statement can be used if the query returns exactly one row. To handle a query that can return an arbitrary number of rows, PL/SQL uses cursors to allow the rows of a query result to be accessed one at a time. The cursor can be advanced by one to access the next row. A cursor must beRead more
Cursor- A SELECT statement can be used if the query returns exactly one row. To handle a query that can return an arbitrary number of rows, PL/SQL uses cursors to allow the rows of a query result to be accessed one at a time. The cursor can be advanced by one to access the next row.
A cursor must be
• declared and opened before it can be used
• closed to deactivate it after it is no longer required
Once the cursor has been opened, the rows of the query result can be retrieved one at a time using a FETCH statement.
Exceptions:
An exception is an identifier in PL/SQL raised during the execution of a block that terminates its main body of actions, although some final actions can be performed.
An exception can be raised automatically or explicitly using the RAISE statement
To handle raised exceptions, separate routines called exception handlers are specified.
The exception handler itself is defined at the end of the PL/SQL block.
Stored Procedures:
Sub program in PL/SQL that can take parameter and be invoked. They can modify and return data passed to them as a parameter.
A stored procedure is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database data dictionary.
Typical uses for stored procedures include data-validation (integrated into the database) or access-control mechanisms.
See lessWhat led to the introduction of PL/SQL and SQL/PSM?
Initial versions of SQL have been computationally incomplete (without programming constructs). Later versions of SQL could be embedded in a high-level programming language, but produced an impedance mismatch, because of the mixing of different programming paradigms. SQL/PSM: SQL has been extendedRead more
Initial versions of SQL have been computationally incomplete (without programming constructs). Later versions of SQL could be embedded in a high-level programming language, but produced an impedance mismatch, because of the mixing of different programming paradigms.
SQL/PSM:
SQL has been extended to a full programming language
The extensions are known as SQL/PSM (Persistent Stored Modules)
PL/SQL: PL/SQL (Procedural Language SQL) is the Oracle version of an SQL programming language and has concepts similar to modern programming languages, such as variable and constant declarations, control structures, exception handling, and modularization.
See lessConcepts of the (Extended) Entity Relationship Model
The Extended Entity-Relationship Model is a more complex and high-level model that extends an E-R diagram to include more types of abstraction, and to more clearly express constraints. All of the concepts contained within an E-R diagram are included in the E-R model, along with additional concepts tRead more
The Extended Entity-Relationship Model is a more complex and high-level model that extends an E-R diagram to include more types of abstraction, and to more clearly express constraints. All of the concepts contained within an E-R diagram are included in the E-R model, along with additional concepts that cover more semantic information.
These additional concepts include:- generalization/specialization, union, inheritance, and subclass/super class.
Relation: A relation is a table with columns and rows
Attribute: An attribute is a named column of a relation.
Domain: A domain is the set of allowable values for one or more attributes.
Primary key: the candidate key that is selected to identify tuples uniquely within the relation.
Foreign key: an attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation.
Integrity constraints:
Domain integrity: Describes the restriction on the set of allowed values for an attribute.
Entity Integrity: No attribute of a primary key can be null.
Referential Integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
Referential integrity is a relational database concept, which states that table relationships must always be consistent. In other words, any foreign key field must agree with the primary key that is referenced by the foreign key.
Views: The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the Database but can be produced upon request by a particular user, at the time of request.
Null represents a value for an attribute that is currently unknown or is not applicable for this tuple.
See lesswhat is Data Independence ?
Data Independence A major objective for the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes to lower levels. There are two kinds of data independence: Logical Data Independence: The immunity of the external schema to changes in theRead more
Data Independence
A major objective for the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes to lower levels.
There are two kinds of data independence:
Logical Data Independence: The immunity of the external schema to changes in the conceptual schema
Physical Data Independence: The immunity of the conceptual schema to changes in the internal schema
See lesswhat is Three-Level Architecture?
Three-Level Architecture:- External Level: The users’ view of the database describes that part of the database that is relevant to each user Conceptual Level: The logical view of the database describes what data is stored in the database and the relationships among the data. Internal Level: ThRead more
Three-Level Architecture:-
External Level: The users’ view of the database describes that part of the database that is relevant to each user
Conceptual Level: The logical view of the database describes what data is stored in the database and the relationships among the data.
Internal Level: The physical representation of the database on the computer describes how the data is stored in the database
See less