Some nodes (processors) will be in a different state than other nodes.

Goal of a recovery operation:

   Ensure that all nodes (processors) are in the same state:

      All nodes will commit the transaction, or
      All nodes will abort the transaction
The state of the distributed transaction

State of a distributed transaction:

   (Global) state of the transaction = the state of the distributed transaction as agreed on by all nodes (processors)

The state of a distributed transaction is one of the following:

   Committed
   Aborted
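
The definition above can be sketched in a few lines of Python. This is only an illustration: the names GlobalState and agreed_state are hypothetical and do not come from the notes. The function returns the state that all nodes agree on, or None when the nodes disagree, which is exactly the situation recovery has to repair.

```python
from enum import Enum
from typing import Iterable, Optional


class GlobalState(Enum):
    """The two possible states of a distributed transaction."""
    COMMITTED = "committed"
    ABORTED = "aborted"


def agreed_state(local_outcomes: Iterable[GlobalState]) -> Optional[GlobalState]:
    """Return the state every node agrees on.

    Returns None when the nodes disagree or when no node reported a state.
    (Hypothetical helper, used for illustration only.)
    """
    distinct = set(local_outcomes)
    return distinct.pop() if len(distinct) == 1 else None
```

A recovery operation has succeeded when agreed_state no longer returns None for the nodes that took part in the transaction.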
Prelude to the recovery protocol

Fact:

   Sometimes, a node (processor) will have sufficient information locally to determine the (global) state of the transaction.

   In this case:

      The node can take action to redo/undo the transaction immediately

   Other times, a node (processor) will have insufficient information locally to determine the (global) state of the transaction.

   In this case:

      The node must contact other node(s) to find out the state of the transaction !!!
The recovery protocol for distributed transactions that use 2-phase commit

Fact:

   The state of the distributed transaction is recorded in the (local) log of a node (= coordinator or participant)
Last log record for the transaction in a node (= coordinator or participant) was Commit T:

   The state (global decision) of transaction T must have been committed:

      If the node was the coordinator, the crash must have happened after it wrote the Commit T record to its log.

      If the node was a participant, the crash must have happened after it wrote the Commit T record to its log.

      In all cases, the global decision was commit.

   Recovery procedure:

      "Redo the transaction T"

   Note:

      If we use undo logging, we don't have to do anything to "redo a transaction" !!!
Last log record for the transaction in a node (= coordinator or participant) was Abort T:

   The state (global decision) of transaction T must have been aborted:

      If the node was the coordinator, the crash must have happened after it wrote the Abort T record to its log.

      If the node was a participant, the crash must have happened after it wrote the Abort T record to its log.

      In all cases, the global decision was abort.

   Recovery procedure:

      "Undo the transaction T"

   Note:

      If we use redo logging, we don't have to do anything to "undo a transaction" !!!
Last log record for the transaction in a node (= coordinator or participant) was Don't commit T:

   The state (global decision) of transaction T must have been aborted:

      The crash must have happened during phase 1, after the node wrote the Don't commit T record to its log.

      The global state that will be reached must be abort, because a single "don't commit" vote forces the transaction to abort.

   Recovery procedure:

      "Undo the transaction T"

   Note:

      If we use redo logging, we don't have to do anything to "undo a transaction" !!!
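
The three easy cases so far (Commit T, Abort T, Don't commit T) amount to a lookup from the last log record to the global decision, plus the note about which logging style makes the recovery step a no-op. A minimal Python sketch, assuming log records are plain strings and the logging style is given as "undo" or "redo" (both are illustrative conventions, not part of the notes):

```python
from typing import Optional

# Last log record -> global decision, for the cases that can be decided locally.
DECISION_FROM_LAST_RECORD = {
    "Commit T": "commit",
    "Abort T": "abort",
    "Don't commit T": "abort",
}


def local_decision(last_record: str) -> Optional[str]:
    """Return 'commit' or 'abort', or None if the record does not settle it."""
    return DECISION_FROM_LAST_RECORD.get(last_record)


def recovery_action(last_record: str, logging: str) -> Optional[str]:
    """Return 'redo', 'undo', or 'nothing' for a locally decidable record.

    logging is 'undo' or 'redo' (hypothetical labels for the logging style).
    Returns None when the last record alone does not determine the decision.
    """
    decision = local_decision(last_record)
    if decision is None:
        return None
    if decision == "commit":
        # With undo logging there is nothing to do to "redo" a transaction.
        return "nothing" if logging == "undo" else "redo"
    # With redo logging there is nothing to do to "undo" a transaction.
    return "nothing" if logging == "redo" else "undo"
```

Any record that is not in the table makes recovery_action return None, which signals that the node cannot decide from its own log.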
(Hard case) Last log record for the transaction in a node (= coordinator or participant) was Ready T:

   The state (global decision) of transaction T cannot be determined locally, because:

      The crash must have happened during phase 1, after the node wrote the Ready T record to its log.

      From that point, the global state that is reached can be commit (if every participant votes ready).

      But it can also be abort (for example, if some other participant votes "don't commit").
   Recovery procedure:

      if ( the coordinator is operational )
      {
         Ask the coordinator for the global decision;

         if ( decision == commit )
         {
            "Redo" transaction T;
         }
         else
         {
            "Undo" transaction T;
         }
      }
      else
      {
         Done = false;

         for ( each participant P other than myself )
         {
            if ( P is operational )
            {
               Ask P for its local decision;

               if ( decision == commit )
               {
                  "Redo" transaction T;
                  Done = true;      // Final !
               }
               else if ( decision == abort )
               {
                  "Undo" transaction T;
                  Done = true;      // Final
               }
               // else: P's local state was at best "Ready T", keep searching
            }

            if ( Done )
            {
               break;               // Stop asking once a decision is known
            }
         }

         if ( ! Done )
         {
            Wait for the coordinator to come back online;
         }
      }
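
The recovery procedure for the Ready T case can also be written as a small runnable Python sketch. Everything here is an assumption made for illustration: the Peer class stands in for a remote node, its operational flag replaces a real liveness check, and ask() replaces a network request. The function only reports which action to take ("redo", "undo", or "wait").

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Peer:
    """Hypothetical stand-in for a remote coordinator or participant."""
    operational: bool
    decision: Optional[str]  # "commit", "abort", or None if the peer does not know

    def ask(self) -> Optional[str]:
        # A real system would send a message here; this sketch just returns the field.
        return self.decision


def recover_ready(coordinator: Peer, participants: Iterable[Peer]) -> str:
    """Decide what a node whose last record is Ready T should do.

    participants must exclude the recovering node itself.
    Returns 'redo', 'undo', or 'wait'.
    """
    if coordinator.operational:
        # The coordinator knows the global decision.
        return "redo" if coordinator.ask() == "commit" else "undo"

    for peer in participants:
        if not peer.operational:
            continue
        answer = peer.ask()
        if answer == "commit":
            return "redo"   # final: some node already saw the commit decision
        if answer == "abort":
            return "undo"   # final: some node already saw the abort decision
        # Otherwise the peer is itself at best at Ready T: keep searching.

    # Nobody reachable knows the outcome: wait for the coordinator to return.
    return "wait"
```

For example, recover_ready(Peer(False, None), [Peer(True, None), Peer(True, "abort")]) returns "undo", because the second participant already knows the decision. The "wait" result corresponds to the blocking situation in the procedure, where the node has to wait for the coordinator to come back online.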
Finally: a node/processor has no record of the transaction T

   The crash must have happened before the node wrote any log record for T.

   Fact:

      It's always safe to abort an incomplete (uncommitted) transaction unilaterally (because the updates have not yet been made permanent; the undo logs can clean up an incomplete (uncommitted) transaction).

   Recovery procedure:

      Enter "Abort T" into the local log

      We need to do this so that the node can reply to queries on the state of transaction T
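
A minimal Python sketch of this last case, assuming the local log is a list of strings such as "Abort T" (an illustrative representation; the function names are hypothetical). The first function records the unilateral abort, the second shows why the record matters: it is what the node reports when another node asks about the transaction.

```python
from typing import List


def recover_without_record(log: List[str], txn: str) -> None:
    """Abort txn unilaterally if the local log holds no record about it."""
    has_record = any(record.endswith(" " + txn) for record in log)
    if not has_record:
        log.append(f"Abort {txn}")


def answer_query(log: List[str], txn: str) -> str:
    """Reply to another node's query with the last local record for txn."""
    for record in reversed(log):
        if record.endswith(" " + txn):
            return record
    return "unknown"
```

Without the appended "Abort T" record, answer_query would have to return "unknown", and a participant stuck at Ready T could not learn the outcome from this node.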