CS 494/594 -- Distributed Systems --- Fall 2014

The final will be held in 525 Min Kao at 8:00-10:00am on Tuesday, Dec 9

The description of the first part of the semester project is available here. All parts are due Dec 2.

Instructor: Micah Beck
Office: Min Kao 433
Office Hours: Wed 2-3pm and by appt.
Email: mbeck@eecs.utk.edu

TA: Alok Hota

Textbook

Course Syllabus

The course material will generally follow the contents of the book:
  1. Booleans, Predicates and Quantification
  2. The Computational Model
  3. Reasoning About Programs
  4. Small Example Programs
  5. Time, Clocks, and Synchronization
    • What are the necessary properties of a Logical Clock?
    • How can we implement Logical Clocks in a message passing environment?
    • What is the additional property ensured by Vector Clocks?
    • How are vector clocks implemented?
    • How are physical clocks synchronized?
  6. Diffusing Computations (Gossip)
    • What is the definition of the diffusion communication problem?
    • What is the source of ambiguity in the solution?
    • What is the metric used in the proof of termination?
  7. Mutual Exclusion
    • Explain the need for mutual exclusion. Use examples.
    • What are the strengths and weaknesses of the centralized manager solution?
    • Explain Lamport's distributed solution to the mutual exclusion problem.
    • What is a token? How can a token ring be used to implement mutual exclusion?
    • Explain how a token solution can be used on a tree or arbitrary graph.
  8. Dining Philosophers
    • Prove that a completely symmetric solution to the Dining Philosophers problem does not exist.
    • What is deadlock and how can it occur in the Dining Philosophers problem?
    • Give a solution to the Dining Philosophers problem that works in synchronous rounds and in which the N (for N even) philosophers are given unique ids, which are integers assigned clockwise starting with 1. Assume each processor knows its own id.
    • In the Hygienic solution, what is the partial order that is maintained to determine which philosopher can eat next? How is it modified while ensuring that no cycle is introduced?
  9. Snapshots
    • What is a global state of a distributed system?
    • What is a consistent cut or snapshot? What is special about a consistent cut as opposed to an arbitrary global state taken among unsynchronized processors?
    • How can we use logical clocks to take a consistent snapshot?
    • Explain the marker algorithm for taking a consistent snapshot.
  10. Termination Detection
    • What are the conditions required to detect termination of an algorithm in a distributed system?
    • Explain the algorithm for detecting termination using a special "detector" process.
    • What is special about termination that enables this algorithm to work?
  11. Garbage Collection
    • What is the correctness condition for garbage collection, expressed in terms of food, garbage and manure? Define these terms.
    • What is the role of the mutator in garbage collection? What is the necessary restriction on the functioning of the mutator?
    • What is the role of the propagator in garbage collection?
    • What is the ok condition? Use it to express the termination condition for garbage collection.
  12. Byzantine Agreement
    • Give a statement of the Byzantine generals problem.
    • Explain the importance of the validity condition to any specification of the Byzantine Agreement Problem.
    • Explain why the problem is not solvable in an environment where messages between the two generals can be lost.
    • What is the limit on faulty behavior that enables the Byzantine problem to be solved in a fail-stop environment?

Reading Assignments

Date      Assignment
Aug 26    Sivilotti Chapters 1-3
Sept 2    Sivilotti Chapter 4
Sept 4    Sivilotti Chapter 5
Sept 10   Sivilotti Chapter 6
Sept 16   Sivilotti Chapter 7
Oct 7     Sivilotti Chapter 8
Oct 21    Sivilotti Chapter 9
Oct 28    Sivilotti Chapter 10
Oct 30    Sivilotti Chapter 11
Nov 4     Sivilotti Chapter 12

Homework Assignments

  1. (due 9/9) Fill in blanks and do all proofs left to reader in Chapter 3.
  2. (due 9/26) (Solutions)
    1. Complete blanks and proofs in Chapters 5 and 6.
    2. Consider the partial order imposed on students by the grades each receives. We say that the grade for student X is "greater or equal" to the grade for student Y (X ≥ Y) if X has received a grade no worse than Y in every course. Show that this is a partial order, assuming that all students take exactly the same courses. Is there any total order on all students that extends the partial order? Explain why or why not.
    3. Suppose that a processor used vector time in which each vector element had two components, an "epoch" number and the usual timestamp. A processor that received a message with a later "epoch" would update its own epoch to match it, but it would ignore the timestamps on messages it received from an earlier "epoch". A processor could update the epoch in its own timestamp at will. Aside from these rules concerning epochs, the timestamp would be updated and transmitted as usual. What property would such a modified vector clock guarantee? Can you see an application for such a modified vector timestamp?
    4. Give an algorithm that uses a gossip protocol to compute each of the following:
      1. if an integer value is associated with each node, the maximum of those values.
      2. if an integer value is associated with each node, the sum of those values.
  3. (due 11/6)
    1. Complete blanks and proofs in Chapters 7 and 8.
    2. What is the scenario in which the naive solution to the Dining Philosophers problem leads to deadlock?
    3. How does the hygienic solution maintain the property that there are no cycles in the prioritization of philosophers?
    4. If a consistent cut through a distributed system may never have been a true state of the system, why is it considered an acceptable state to store and roll back to in case of error?
    5. Explain why a snapshot taken at a pre-determined logical time will always be consistent.

Midterm Practice Questions

  1. Prove that this program terminates and that at termination N = M*d + r ∧ r < M, where N and M are initial values.

        Program DIV
        var d, r: int

        initially d = 0, r = N
        assign
          (r ≥ M) → d, r := d+1, r-M

  2. How is the value of a vector clock computed? What is the difference between the meaning of an ordinary logical (Lamport) clock and a vector clock?
  3. In the diffusion algorithm every node is in one of three states, idle, active or complete. Explain the difference between the active and complete states.
  4. Explain Lamport's algorithm for mutual exclusion. A formal proof of correctness is not required, but a clear explanation of the algorithm's functioning is.
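
For experimenting with practice question 1, Program DIV can be simulated directly. The following is a minimal Python sketch, not a proof. The function name `div` is hypothetical, and the requirement M > 0 is an assumption added here (the question does not state it) so that the loop is guaranteed to stop. The sketch checks the invariant N = M*d + r after every step.

```python
def div(N, M):
    """Simulate Program DIV by firing its single guarded assignment."""
    # Assumption (not stated in the question): M > 0. Without it the
    # guard r >= M may stay true forever.
    assert M > 0
    d, r = 0, N                      # initially d = 0, r = N
    while r >= M:                    # guard of the only action
        d, r = d + 1, r - M          # simultaneous assignment
        assert N == M * d + r        # invariant holds after every step
    # At the fixed point the guard is false, so r < M.
    return d, r


if __name__ == "__main__":
    print(div(17, 5))  # → (3, 2)
```

Running the simulation only tests particular inputs. The proof asked for in the question still needs an invariant argument and a metric that decreases with every step.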

Final Practice Questions

  1. One way to generate a snapshot of a distributed system is for every processor to maintain a logical clock, for a logical time T to be predetermined, and for each processor to save its state when its logical clock reaches that time. How do we know that the processor states so saved define a consistent cut, and what further work is necessary to capture a complete snapshot?
  2. What must be known about the state of a distributed computation to be able to conclude that it has achieved termination?
  3. In the garbage collection algorithm in Chapter 11, it is not sufficient to allow the propagator to simply add an edge pointing to a node that is food. What more must be done and why?
  4. State all the requirements of the binary consensus problem.
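
The logical-time snapshot described in practice question 1 can be explored with a small simulation. The sketch below is illustrative only and is not taken from the textbook. The event format and the function names `lamport_timestamps` and `cut_at` are assumptions made for this example. It assigns Lamport timestamps to a recorded execution, takes the cut of all events with timestamp at most T, and reports whether every message received inside the cut was also sent inside it, along with the messages still in transit across the cut.

```python
def lamport_timestamps(events):
    """Stamp events with Lamport clock values.

    events: list of (proc, kind, msg) in an order in which every send
    appears before its matching receive (an assumption of this sketch).
    kind is 'local', 'send' or 'recv'; msg names the message or is None.
    """
    clock = {}     # current logical clock of each process
    sent_at = {}   # timestamp carried by each message
    stamped = []
    for proc, kind, msg in events:
        t = clock.get(proc, 0)
        if kind == "recv":
            t = max(t, sent_at[msg])   # catch up to the sender's timestamp
        t += 1                         # every event advances the clock
        clock[proc] = t
        if kind == "send":
            sent_at[msg] = t
        stamped.append((proc, kind, msg, t))
    return stamped


def cut_at(stamped, T):
    """Return (is_consistent, in_transit) for the events with timestamp <= T."""
    inside = [e for e in stamped if e[3] <= T]
    sent = {m for _, k, m, _ in inside if k == "send"}
    received = {m for _, k, m, _ in inside if k == "recv"}
    # Consistent: nothing was received inside the cut but sent outside it.
    return received <= sent, sent - received


if __name__ == "__main__":
    run = [("p", "send", "m1"), ("q", "local", None), ("q", "recv", "m1"),
           ("q", "send", "m2"), ("p", "recv", "m2")]
    print(cut_at(lamport_timestamps(run), 3))  # → (True, {'m2'})
```

In this run the cut at T = 3 is consistent, but message m2 has been sent and not yet received, so the saved processor states alone do not describe everything in the system at the cut.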

Semester Project

The description of the semester project can be found here. Part 1 is due Nov 13.

Course Grading

10%(Homework) + 20%(Project) + 30%(MAX[Midterm, Final]) + 40%(Final)
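
The weighting above can be worked through with a short calculation. This is a minimal sketch, assuming every component is scored on a 0-100 scale. The function name `course_grade` and the sample scores are hypothetical.

```python
def course_grade(homework, project, midterm, final):
    """Weighted course grade; all inputs assumed to be on a 0-100 scale."""
    return (0.10 * homework
            + 0.20 * project
            + 0.30 * max(midterm, final)   # the better of the two exams
            + 0.40 * final)


if __name__ == "__main__":
    # Hypothetical scores: the final (85) beats the midterm (70),
    # so the final score fills the 30% slot as well as the 40% slot.
    print(course_grade(homework=90, project=80, midterm=70, final=85))
```

Because of the MAX term, a final exam score higher than the midterm score effectively counts for 70% of the grade.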