Issues caused by distributed data storage and processing
Issue 1: Distributed query processing
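Queries on a distributed database:
Some of the relational algebra operations can be performed locally, at the site that stores the data.
E.g.:
Selection
Projection (bag semantics, so no duplicate elimination across sites is needed)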
However:
Some of the relational algebra operations must use relations stored at multiple sites.
Specifically:
When R and S are stored at different sites, the join operation R ⋈ S requires one of the relations to be transferred to the storage location of the other relation before the join can be performed.
Example:
Issue:
How to execute join operations so as to minimize the response (processing) time?
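The simplest version of this decision can be sketched in a few lines of Python. This is only an illustration, not a full query optimizer: it assumes the transfer time is the only cost that matters and that both relations are shipped whole. The function names, relation sizes and link speed are all hypothetical.

```python
def transfer_seconds(size_bytes, bandwidth_bps):
    """Time to ship size_bytes over a link of bandwidth_bps bits per second."""
    return size_bytes * 8 / bandwidth_bps


def plan_join(size_r_bytes, size_s_bytes, bandwidth_bps):
    """Pick which relation to ship so that the transfer time is smallest."""
    cost_r = transfer_seconds(size_r_bytes, bandwidth_bps)
    cost_s = transfer_seconds(size_s_bytes, bandwidth_bps)
    if cost_r <= cost_s:
        return ("ship R to the site of S", cost_r)
    return ("ship S to the site of R", cost_s)


# Hypothetical sizes: R is 10 MB, S is 500 MB, the link is 10 Mbps.
plan, seconds = plan_join(10_000_000, 500_000_000, 10_000_000)
print(plan, seconds)
```

Under these assumed numbers, shipping the smaller relation R takes 8 seconds, while shipping S would take 400 seconds.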
More on distributed query processing
Performance differences between a distributed database and a shared-nothing parallel system:
Distributed database systems have a higher communication cost.
Shared-nothing parallel system:
Storage/processing nodes are located in the same room
The interconnection network used (a LAN) has a Gbps transmission rate
Distributed database:
Storage/processing nodes can be located in different states or even countries
The interconnection network used has a Mbps transmission rate
When distributed databases were first developed, the network speed was only Kbps.
Consequently:
Dominant cost in distributed databases: communication
Dominant cost in a shared-nothing parallel system: disk I/O
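A rough back-of-the-envelope comparison shows why the dominant cost shifts. All numbers in this sketch are assumptions chosen for illustration (a 100 MB relation, a disk that reads 100 MB/s sequentially, and three sample link speeds); they are not measurements.

```python
RELATION_BYTES = 100_000_000       # hypothetical 100 MB relation
DISK_BYTES_PER_SEC = 100_000_000   # assumed 100 MB/s sequential read

# Assumed sample link speeds, in bits per second.
LINK_BPS = {
    "Kbps-era WAN (56 Kbps)": 56_000,
    "Mbps WAN (10 Mbps)": 10_000_000,
    "Gbps LAN (1 Gbps)": 1_000_000_000,
}


def network_seconds(size_bytes, link_bps):
    """Time to send size_bytes over a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps


def disk_seconds(size_bytes, disk_rate):
    """Time to read size_bytes from a disk delivering disk_rate bytes/s."""
    return size_bytes / disk_rate


disk_time = disk_seconds(RELATION_BYTES, DISK_BYTES_PER_SEC)
for name, bps in LINK_BPS.items():
    net_time = network_seconds(RELATION_BYTES, bps)
    dominant = "communication" if net_time > disk_time else "disk I/O"
    print(f"{name}: network {net_time:.1f} s, disk {disk_time:.1f} s -> {dominant}")
```

With these assumed values, both wide-area links are communication bound, while on the Gbps LAN reading the relation from disk takes longer than sending it.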
Issue 2: Distributed transaction processing
Recall: transaction
A transaction must incorporate all of its updates or none of its updates (= atomic)
Transaction processing in a distributed system:
Multiple processors (sites) contain updates to the database.
Ensure the atomic property:
All the updates at all sites are incorporated into the database, or
None of the updates at any site is incorporated into the database
We will study:
Distributed commit protocols to ensure the atomic commit property:
Two-phase commit
Three-phase commit
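As a preview, the core idea of two-phase commit can be sketched as follows. This is a simplified, single-process illustration only: it leaves out logging, timeouts and recovery from site or coordinator failures, which the real protocol must handle. The class and function names are hypothetical.

```python
class Participant:
    """A site holding some of the transaction's updates (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local updates can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                       # phase 2: send the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


# Site B cannot commit, so no site may incorporate its updates.
sites = [Participant("A"), Participant("B", can_commit=False), Participant("C")]
outcome = two_phase_commit(sites)
print(outcome, [p.state for p in sites])
```

Because one site votes no, every site ends in the aborted state, which is the "none of the updates" half of the atomic property.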
Issue 3: Ensuring data consistency (serializability)
Recall:
Concurrent execution of transactions can cause the database state to become inconsistent.
We must also ensure that the schedule (of database operations) is serializable.
We have studied locking to ensure conflict serializability.
Locking and data replication:
When the data is replicated, it is possible that multiple transactions hold conflicting locks on the same data item, because each lock is held on a different physical copy.
In the case of replicated data, we must modify the locking method to handle locks on different (physical) copies of the same (logical) data item.
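The problem, and one possible modification, can be sketched as follows. The modification shown (a writer must lock every copy) is only one of several known schemes and is an assumption here, not necessarily the method these notes go on to use. All names are hypothetical, and only exclusive locks are modeled.

```python
class SiteLockTable:
    """Exclusive locks on the physical copies stored at one site."""

    def __init__(self):
        self.owner = {}  # item name -> transaction holding the lock

    def try_lock(self, item, txn):
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False

    def unlock(self, item, txn):
        if self.owner.get(item) == txn:
            del self.owner[item]


def lock_one_copy(site, item, txn):
    """Unmodified locking: lock only the local physical copy."""
    return site.try_lock(item, txn)


def lock_all_copies(sites, item, txn):
    """Modified locking: a writer locks every copy, or none of them."""
    acquired = []
    for site in sites:
        if site.try_lock(item, txn):
            acquired.append(site)
        else:
            for s in acquired:  # back out so no partial locks remain
                s.unlock(item, txn)
            return False
    return True


# Problem: T1 and T2 each lock a different copy of logical item X.
site1, site2 = SiteLockTable(), SiteLockTable()
naive = (lock_one_copy(site1, "X", "T1"), lock_one_copy(site2, "X", "T2"))

# With the modified method, the second writer is refused.
site3, site4 = SiteLockTable(), SiteLockTable()
strict = (lock_all_copies([site3, site4], "X", "T1"),
          lock_all_copies([site3, site4], "X", "T2"))
print(naive, strict)
```

In the unmodified case both transactions are granted an exclusive lock on X; when every copy must be locked, only T1 succeeds.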