CS560 Lecture notes -- Simulating Scheduling Algorithms

  • Jim Plank
  • CS560: Operating Systems
  • Directory: /bluegreen/homes/plank/cs560/notes/SchedSim
  • Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs560/560/notes/SchedSim/lecture.html

    Why Bother?

    Ok, I'll admit having become bored with the material in CS560 on processor scheduling. Sure, you can list scheduling algorithms along with their plusses and minuses; however, learning scheduling from the book does not give you any insight into exactly how the algorithms work.

    So, I've written a simulator to help evaluate scheduling algorithms. I will not go into the details of the code, because it is too complex at this time. But we will go through a few scheduling scenarios to learn some things.

    Our Job Model

    We are going to simulate a single-processor system that runs a mix of jobs. Jobs are specified in a job file, one job per line. Each line has 18 words:

    1. Job Name.
    2. Interarrival time distribution type (Uniform/Exponential/Normal).
    3. Interarrival time distribution mean.
    4. Interarrival time distribution minimum.
    5. Interarrival type: Serial or Periodic.
    6. Number of iterations distribution type.
    7. Number of iterations distribution mean.
    8. Number of iterations distribution minimum.
    9. Sleep time distribution type.
    10. Sleep time distribution mean. (Units: seconds)
    11. Sleep time distribution minimum.
    12. CPU-burst time distribution type.
    13. CPU-burst time distribution mean. (Units: machine cycles)
    14. CPU-burst time distribution minimum.
    15. IO-burst time distribution type.
    16. IO-burst time distribution mean. (Units: device units)
    17. IO-burst time distribution minimum.
    18. IO-burst device.

    For example, the file job-cpu-1.txt defines one job, called CPU-B. These jobs are generated serially, one after another, with no interarrival time (i.e. as soon as one job finishes, the next begins). Each CPU-B job iterates 10 times, sleeping for zero seconds, then performing 180 million CPU operations, then doing 1 unit's worth of I/O to the console.
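    To make the 18-word format concrete, here is a minimal C sketch of how one job-file line might be read into a structure. The JobSpec and Dist types and the function parse_job_line are hypothetical, as is the sample line in the usage note below (including how CPU-B's zero interarrival time is written); the simulator's real parser and the exact words in its job files may differ.

```c
#include <stdio.h>
#include <string.h>

/* One distribution: type word, mean, and minimum (hypothetical layout). */
typedef struct {
    char type[16];          /* e.g. "Uniform", "Exponential", "Normal" */
    double mean;
    double min;
} Dist;

/* Hypothetical in-memory form of one 18-word job-file line. */
typedef struct {
    char name[32];          /* word 1 */
    Dist interarrival;      /* words 2-4 */
    char arrival_kind[16];  /* word 5: "Serial" or "Periodic" */
    Dist iterations;        /* words 6-8 */
    Dist sleep;             /* words 9-11, seconds */
    Dist cpu_burst;         /* words 12-14, machine cycles */
    Dist io_burst;          /* words 15-17, device units */
    char device[32];        /* word 18 */
} JobSpec;

/* Parse one line; returns 1 if all 18 words were read, 0 otherwise. */
int parse_job_line(const char *line, JobSpec *j)
{
    int n = sscanf(line,
        "%31s %15s %lf %lf %15s %15s %lf %lf %15s %lf %lf "
        "%15s %lf %lf %15s %lf %lf %31s",
        j->name,
        j->interarrival.type, &j->interarrival.mean, &j->interarrival.min,
        j->arrival_kind,
        j->iterations.type, &j->iterations.mean, &j->iterations.min,
        j->sleep.type, &j->sleep.mean, &j->sleep.min,
        j->cpu_burst.type, &j->cpu_burst.mean, &j->cpu_burst.min,
        j->io_burst.type, &j->io_burst.mean, &j->io_burst.min,
        j->device);
    return n == 18;
}
```

    For instance, a hypothetical line such as "CPU-B Uniform 0 0 Serial Uniform 10 10 Uniform 0 0 Uniform 180000000 180000000 Uniform 1 1 CONSOLE" would describe a CPU-B-like job: 10 iterations, no sleep, a 180-million-cycle CPU burst, and one unit of console I/O.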

    The file job-vi-1.txt also has one job, which is intended to model interactive editing sessions such as vi. These jobs are executed serially, and the job interarrival times follow an exponential distribution whose mean is 30 minutes (1800 seconds) and whose minimum is 20 seconds. The number of iterations per job follows an exponential distribution whose mean is 900 and whose minimum is 10. Each iteration sleeps for an average of 2 seconds, then uses an average of 500 cycles' worth of CPU time, then does one unit of console I/O.

    The file job-cpu-vi.txt mixes a CPU-bound job and one VI job. It should run well, since the VI job doesn't use much CPU.

    The file job-cpu-vi-wav.txt has six different VI jobs, plus a WAV job, which represents a CD player (each iteration reads a second's worth of music from the CD-ROM drive, and then sleeps for a second to emulate playing the music).

    Obviously, you may design your own job files.


    The Machine

    You describe a machine with a machine file. This file needs to have a SPEED (in millions of instructions per second), a context switch overhead (in instructions), an optional timer interrupt frequency (specified in instructions), and any number of devices. Devices are specified with names and unit speeds.

    The file machine-1.txt defines a machine running at one million instructions per second, with a context switch overhead of 200 instructions, and four devices of varying speeds -- a disk, a cd-rom drive, a network card, and the console.

    The file machine-2.txt defines a machine equivalent to machine-1.txt but with a timer that interrupts the CPU every 50000 instructions.

    Note in both files, the units of DISK, CD-ROM and NETWORK are MB/s, while the console's units are bytes per second.


    Other Simulator Parameters

    To use the simulator, you have to link it with a scheduler (for example ss-fcfs.c is a non-preemptive first-come-first-serve scheduler), then you execute it with the following parameters:

  • A job description file
  • A machine description file
  • Two random number seeds
  • An output type. There are five of these:

    1. OFF -- no output (although SIGQUIT will show the state of the machine at any time).
    2. EVENTS -- show a line for each simulator event.
    3. SYSTEM -- show the system state after each simulator event.
    4. SAMPLE=delta -- After each delta seconds of simulated time, print out the state of the machine.
    5. QUIT=qtime -- After qtime seconds of simulated time, print out the machine state and quit.

    So, for example, if you run the simulator as follows:

    UNIX> ss-fcfs job-cpu-1.txt machine-1.txt 100 100 EVENTS | more
           0.000000: [CPU-B:000]    Starting
           0.000000: [CPU-B:000] -> Ready
           0.000000: [CPU-B:000] -> Running
         180.000200: [CPU-B:000]    Running
         180.000200: [CPU-B:000] -> IO on CONSOLE (0.000100)
         180.000300: [CPU-B:000]    IO on CONSOLE (0.000100)
         180.000300: [CPU-B:000] -> Ready
         180.000300: [CPU-B:000] -> Running
         360.000300: [CPU-B:000]    Running
         360.000300: [CPU-B:000] -> IO on CONSOLE (0.000100)
         360.000400: [CPU-B:000]    IO on CONSOLE (0.000100)
         360.000400: [CPU-B:000] -> Ready
         360.000400: [CPU-B:000] -> Running
         540.000400: [CPU-B:000]    Running
         540.000400: [CPU-B:000] -> IO on CONSOLE (0.000100)
         540.000500: [CPU-B:000]    IO on CONSOLE (0.000100)
         540.000500: [CPU-B:000] -> Ready
         540.000500: [CPU-B:000] -> Running
    ...
    
    You see the events of running that lone CPU-B process. It starts and runs for 180.0002 seconds (the 0.0002 comes from context-switch overhead). Then it does one byte's worth of I/O, which takes 0.0001 seconds, and then it uses the CPU for another 180 seconds. There is no context-switch overhead this time, because no other process has had the CPU in the meantime. Then another byte's worth of I/O, and so on.
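    The timings in this trace follow directly from machine-1.txt's parameters. The C sketch below does the unit conversions; the Machine struct and the helper names are hypothetical, and the console speed of 10,000 bytes per second is inferred from the 0.0001-second console I/O in the trace rather than read from the machine file.

```c
/* Hypothetical summary of a machine file (field names are illustrative). */
typedef struct {
    double mips;            /* SPEED: millions of instructions per second */
    double cs_overhead;     /* context-switch cost, in instructions */
    double timer_interval;  /* instructions between timer interrupts; 0 = no timer */
} Machine;

/* Seconds needed to execute `cycles` instructions on machine m. */
double cycles_to_seconds(const Machine *m, double cycles)
{
    return cycles / (m->mips * 1e6);
}

/* Seconds for an I/O burst of `units` on a device that moves `units_per_sec`. */
double io_seconds(double units, double units_per_sec)
{
    return units / units_per_sec;
}
```

    On machine-1.txt, the first CPU burst costs 180,000,000 + 200 cycles, or 180.0002 seconds; on machine-2.txt, the 50000-instruction timer corresponds to a 0.05-second time slice.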

    Here's the result of running that one process using the QUIT output flag:

    UNIX> ss-fcfs job-cpu-1.txt machine-1.txt 100 100 QUIT=1000000
    SYSTEM STATE ---------------------------------------------------------
    
      Time: 1000000.000000
    
      Processes:
        555 [CPU-B:555] - Running
    
      Completed-Jobs:
    
           CPU-B COMP:    555 ELAP: 1800.001  CPU: 1800.000  RQ:    0.000   CS:    0.000
                 SL:    0.000   IO:    0.001  IOW:    0.000 MRQ:    0.000 MIOW:    0.000
    
      CPU Utilization:   99.99%  Throughput: 0.00056  Turaround: 1800.00
      Avg-Wait-Time:      0.00   MAXRQ:        0.000  MAXIOW:      0.000
    UNIX>
    
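    Note -- the output first lists the processes that exist at that point in simulated time and the state of each, then statistics on the completed jobs, then statistics on the whole machine. In the job statistics, RQ is time spent on the ready queue, CS is context-switch time, SL is sleep time, IOW is time spent waiting for an I/O device, and MRQ and MIOW are the maximum ready-queue and I/O-wait times. You may use format.awk to turn that output into html that looks like: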

    Time: 1000000.000000 -- Processes:

    PID  Name         State
    555  [CPU-B:555]  Running

    Job Stats (Averages unless specified)

    Name   Completed  Elapsed   CPU       RQ     CS     Sleep  IO     IO-Wait  Max RQ  Max IO-Wait
    CPU-B  555        1800.001  1800.000  0.000  0.000  0.000  0.001  0.000    0.000   0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.99%           0.00056         1800.00          0.00            0.000              0.000


    Test 0 -- An IO-Bound job: A VI process

    Below we test running the IO-Bound VI job in the simulator:

    ss-fcfs job-one-vi.txt machine-1.txt 100 100 QUIT=1000000
    

    Time: 1000000.000000 -- Processes:

    PID Name State
    264 [VI-1:264] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    VI-1 264 1811.081 0.452 0.000 0.000 1810.538 0.090 0.000 0.000 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    0.01% 0.00026 1811.08 0.00 0.000 0.000

    Are things as we'd expect? Well, yes: each job averages 900 iterations, each of which sleeps for an average of 2 seconds, so each job should take about 1800 seconds on average (the same as a CPU-bound job), followed by an average interarrival time of another 1800 seconds. One VI job therefore completes roughly every 3600 seconds, so roughly 555/2 = 277 VI jobs should finish in 1,000,000 seconds. The simulation finished 264.
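    The same back-of-the-envelope estimate as a small C sketch. The function names are hypothetical, and the estimate ignores the distribution minimums and the randomness of the draws, so only ballpark agreement with the simulated count should be expected. Per iteration, 500 cycles at one million instructions per second is 0.0005 seconds of CPU, and one unit of console I/O is 0.0001 seconds.

```c
/* Average VI job duration: iterations * (sleep + CPU + I/O) per iteration. */
double vi_job_seconds(double iters, double sleep_s, double cpu_s, double io_s)
{
    return iters * (sleep_s + cpu_s + io_s);
}

/* A serially generated job occupies (average duration + average interarrival
   gap), so roughly horizon / that many jobs complete in `horizon` seconds. */
double expected_completions(double horizon, double job_secs, double gap_secs)
{
    return horizon / (job_secs + gap_secs);
}
```

    For the VI job this gives 900 * 2.0006 = 1800.54 seconds per job and about 277 completions; for the CPU-bound job (1800 seconds, no gap) it gives about 555, which matches the earlier run.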


    Test 1 -- A CPU-bound job and a VI job

    This first test shows the results of running the simulator on job-cpu-vi.txt. What would we expect, given a decent scheduler, plus what we know from the above runs? Since the VI job uses so little CPU, I'd expect each job to perform about as well as it does when run alone, as in the two runs above.

    Of course, when we run this job file with the non-preemptive FCFS scheduler, we don't get that behavior:

    ss-fcfs job-cpu-vi.txt machine-1.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    362 [VI-1:004] Ready
    560 [CPU-A:555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.009 1800.000 0.006 0.002 0.000 0.001 0.000 0.005 0.000
    VI-1 4 159273.622 0.442 157526.242 0.177 1746.673 0.089 0.000 179.800 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.99% 0.00056 2926.83 1127.21 179.800 0.000

    The CPU-bound jobs finish fine, but since they are never preempted by the VI jobs, a VI job may have to wait up to (roughly) 180 seconds before it gets the CPU (to see that, look at the "Max RQ" value of VI-1). This has dire consequences -- the VI jobs spend an average of over 150,000 seconds on the ready queue, and only four of them finish in 1,000,000 seconds.

    A second scheduler is in ss-fcfsp. This is a first-come-first-served scheduler, but it is preemptive, which means that at every scheduling point, the running job is preempted and put at the end of the ready queue. With job-cpu-vi, this scheduler has great performance, because every time the VI process needs the CPU, it gets it, and when it is not running, the CPU process may run:

    ss-fcfsp job-cpu-vi.txt machine-1.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    817 [VI-1:262] Sleeping
    818 [CPU-A:555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.409 1800.000 0.317 0.091 0.000 0.001 0.000 0.006 0.000
    VI-1 262 1920.192 0.480 0.000 0.192 1919.424 0.096 0.000 0.000 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.99% 0.00082 1838.82 0.34 0.006 0.000

    This is nice -- 262 VI jobs complete, and they spend a negligible amount of time on the ready queue. The CPU jobs do spend some time on the ready queue, when they are preempted so that the VI jobs can do their little amount of processing.

    However, if we run the simulation ten times longer (10,000,000 seconds), we see a problem:

    ss-fcfsp job-cpu-vi.txt machine-1.txt 100 100 QUIT=10000000

    Time: 10000000.000000 -- Processes:

    PID Name State
    1996 [VI-1:647] Ready
    6203 [CPU-A:5555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 5555 1800.102 1800.000 0.078 0.024 0.000 0.001 0.000 0.006 0.000
    VI-1 647 1902.069 0.476 0.000 0.190 1901.307 0.095 0.000 0.000 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    100.00% 0.00062 1810.74 0.11 180.000 0.000

    There are only 647 completed VI processes, and the next one (VI-1:647) is stuck on the ready queue. What has happened? Well, this VI process was unfortunate enough to have its CPU burst (which must be fairly big -- a possibility with exponential distributions) preempted by the CPU-bound process, which can only happen when the CPU process returns from its I/O to the console. From then on, the CPU process gives up the CPU for only 100 cycles at a time (the length of its console I/O), which is less than the 200-cycle context-switch overhead, and the VI process exhibits horrendous performance. Note that the machine's "Max RQ Wait Time" is 180 seconds -- this is the VI process waiting for the CPU process. Note also that the "Job Stats" cover only completed processes, which is why the "Max RQ" for VI processes says 0.000 rather than 180.000.

    To illustrate this problem more clearly, look at job-cpu-2.txt. This file has two identical CPU-bound jobs. When we run it, we see the following:

    ss-fcfsp job-cpu-2.txt machine-1.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    1 [CPU-B:000] Ready
    556 [CPU-A:555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.003 1800.000 0.000 0.002 0.000 0.001 0.000 0.000 0.000
    CPU-B 0 1000000.000 0.000 999901.111 0.555 0.000 0.000 0.000 180.000 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.99% 0.00056 1800.00 0.00 180.000 0.000

    What is happening is that the CPU-B job never gets executed, except when CPU-A stops for I/O (and since the I/O time is shorter than the context-switch time, CPU-B still does no work). This is because whenever CPU-A becomes ready, the scheduler puts it on the ready queue, preempts CPU-B to the end of the queue behind it, and so CPU-A gets the CPU right back. To see this perhaps more clearly, look at the first few events when this job is executed:

           0.000000: [CPU-B:000]    Starting
           0.000000: [CPU-B:000] -> Ready
           0.000000: [CPU-B:000] -> Running
           0.000000: [CPU-A:000]    Starting
           0.000000: [CPU-A:000] -> Ready
           0.000000: [CPU-B:000] -> Ready
           0.000000: [CPU-A:000] -> Running
         180.000200: [CPU-A:000]    Running
         180.000200: [CPU-A:000] -> IO on CONSOLE (0.000100)
         180.000200: [CPU-B:000] -> Running
         180.000300: [CPU-A:000]    IO on CONSOLE (0.000100)
         180.000300: [CPU-A:000] -> Ready
         180.000300: [CPU-B:000] -> Ready
         180.000300: [CPU-A:000] -> Running
         360.000500: [CPU-A:000]    Running
         360.000500: [CPU-A:000] -> IO on CONSOLE (0.000100)
         360.000500: [CPU-B:000] -> Running
         360.000600: [CPU-A:000]    IO on CONSOLE (0.000100)
         360.000600: [CPU-A:000] -> Ready
         360.000600: [CPU-B:000] -> Ready
         360.000600: [CPU-A:000] -> Running
         540.000800: [CPU-A:000]    Running
         540.000800: [CPU-A:000] -> IO on CONSOLE (0.000100)
         etc.
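    The starvation falls out of the queueing rule alone. Below is a minimal C sketch of that rule as described for ss-fcfsp: a newly ready process joins the ready queue, the running process is preempted to the tail behind it, and the head of the queue gets the CPU. The Sched type and the on_ready/on_block functions are hypothetical, not the simulator's API, and the sketch ignores timing and context-switch cost.

```c
#define MAXQ 64   /* assumed large enough; no overflow check */

typedef struct {
    int ids[MAXQ];    /* circular buffer of ready process ids */
    int head, count;
    int running;      /* id on the CPU, or -1 if idle */
} Sched;

static void enqueue(Sched *s, int id)
{
    s->ids[(s->head + s->count) % MAXQ] = id;
    s->count++;
}

static int dequeue(Sched *s)
{
    int id = s->ids[s->head];
    s->head = (s->head + 1) % MAXQ;
    s->count--;
    return id;
}

/* Process `id` becomes ready (arrival or I/O completion): it joins the queue,
   the running process is preempted to the tail, and the head gets the CPU. */
void on_ready(Sched *s, int id)
{
    enqueue(s, id);
    if (s->running >= 0)
        enqueue(s, s->running);
    s->running = dequeue(s);
}

/* The running process blocks (I/O or sleep): hand the CPU to the head. */
void on_block(Sched *s)
{
    s->running = s->count > 0 ? dequeue(s) : -1;
}
```

    Replaying the start of the trace above, with CPU-B as process 1 and CPU-A as process 0, CPU-A is back on the CPU after every one of its I/Os, while CPU-B only holds the CPU during CPU-A's brief console I/O.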
    

    Test #2 -- A Round-Robin Scheduler

    One solution to these problems is to employ a timer interrupt so that jobs get slices of the CPU. Interestingly, simply adding a timer event to the preemptive first-come first-serve scheduler does the trick. As you can see, this alleviates the problem of the one process starving the other:

    ss-fcfsp job-cpu-2.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    552 [CPU-A:276] Ready
    553 [CPU-B:276] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 276 3614.464 1800.000 1807.231 7.232 0.000 0.001 0.000 0.050 0.000
    CPU-B 276 3614.464 1800.000 1807.231 7.232 0.000 0.001 0.000 0.050 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.60% 0.00055 3614.46 1814.46 0.050 0.000

    (You will note that this takes a while to run: about 30 seconds on my Linux box.) You'll also see that the CPU utilization and throughput are slightly lower, since more time is spent context switching.
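    A rough check of these numbers: the C sketch below runs equal CPU-bound jobs round-robin, charging the context-switch overhead before every slice. It is a simplified, hypothetical model (no I/O, no later arrivals, all times in machine cycles), not the simulator's code.

```c
#define MAXJOBS 64

/* Round-robin over n CPU-bound jobs that all start at time 0 with the same
   CPU burst.  A context switch costing `cs` cycles is charged before every
   slice (even if the same job is rescheduled, which only happens when n == 1).
   Writes each job's finish time into finish[] and returns the finish time of
   the last job, or -1 on bad input. */
long long round_robin(int n, long long burst, long long q, long long cs,
                      long long finish[])
{
    long long remaining[MAXJOBS];
    long long t = 0;
    int left = n;

    if (n < 1 || n > MAXJOBS || burst <= 0 || q <= 0)
        return -1;
    for (int i = 0; i < n; i++)
        remaining[i] = burst;

    while (left > 0) {
        for (int i = 0; i < n; i++) {          /* one trip around the ready queue */
            if (remaining[i] == 0)
                continue;                      /* already finished */
            long long slice = remaining[i] < q ? remaining[i] : q;
            t += cs + slice;                   /* switch in, then run one slice */
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish[i] = t;
                left--;
            }
        }
    }
    return t;
}
```

    With machine-2.txt's values (a 50,000-cycle quantum and 200-cycle switches at one million cycles per second), two 180-second bursts both finish after about 361.4 seconds. Ten such bursts per job give roughly 3614 seconds, close to the 3614.464-second elapsed time above, and 3600 slices per burst times 10 bursts times 200 cycles is 7.2 seconds of context switching per job, close to the reported 7.232.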

    Now, if we run job-cpu-vi.txt, we'll see that the VI job has a much more acceptable max wait time:

    ss-fcfsp job-cpu-vi.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    838 [VI-1:285] Starting
    841 [CPU-A:555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.390 1800.000 0.302 0.087 0.000 0.001 0.000 0.006 0.000
    VI-1 285 1677.013 0.418 0.586 0.169 1675.757 0.084 0.000 0.050 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.98% 0.00084 1758.53 0.51 0.050 0.000

    And, if we run six VI jobs and a WAV player along with the CPU-bound job, they all run pretty well:

    ss-fcfsp job-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    4939 [VI-6:266] Sleeping
    5013 [VI-5:270] Sleeping
    5026 [VI-2:292] Sleeping
    5029 [VI-4:280] Sleeping
    5031 [CPU-A:554] Running
    5032 [VI-1:289] Sleeping
    5033 [VI-3:301] Starting
    5036 [WAV:2777] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 554 1803.337 1800.000 2.450 0.886 0.000 0.001 0.000 0.008 0.000
    VI-1 289 1787.156 0.447 0.694 0.181 1785.745 0.089 0.000 0.051 0.000
    VI-2 292 1846.704 0.461 0.755 0.187 1845.209 0.092 0.000 0.050 0.000
    VI-3 301 1629.924 0.410 0.655 0.165 1628.612 0.082 0.000 0.052 0.000
    VI-4 280 1782.503 0.446 0.710 0.181 1781.076 0.089 0.000 0.051 0.000
    VI-5 270 1901.600 0.476 0.760 0.193 1900.076 0.095 0.000 0.051 0.000
    VI-6 266 2064.695 0.515 0.805 0.209 2063.063 0.103 0.000 0.051 0.000
    WAV 2777 360.084 0.031 0.135 0.063 312.550 47.305 0.000 0.054 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.90% 0.00503 1015.53 0.79 0.054 0.000

    However, suppose we run job-10-cpu-vi-wav.txt, which has ten identical CPU-bound jobs, plus two VI jobs and a WAV player:

    ss-fcfsp job-10-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    3102 [CPU-C:055] Ready
    3103 [CPU-B:055] Ready
    3104 [CPU-I:055] Ready
    3105 [CPU-A:055] Running
    3106 [CPU-H:055] Ready
    3108 [CPU-E:055] Ready
    3109 [CPU-F:055] Ready
    3110 [CPU-G:055] Ready
    3111 [CPU-D:055] Ready
    3112 [CPU-J:055] Ready
    3120 [VI-2:261] Sleeping
    3126 [WAV:2041] Sleeping
    3127 [VI-1:263] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 18086.186 1800.000 16278.248 7.937 0.000 0.001 0.000 0.453 0.000
    CPU-B 55 18083.919 1800.000 16275.982 7.936 0.000 0.001 0.000 0.452 0.000
    CPU-C 55 18081.610 1800.000 16273.675 7.935 0.000 0.001 0.000 0.453 0.000
    CPU-D 55 18091.963 1800.000 16284.023 7.938 0.000 0.001 0.000 0.453 0.000
    CPU-E 55 18090.888 1800.000 16282.948 7.939 0.000 0.001 0.000 0.454 0.000
    CPU-F 55 18090.895 1800.000 16282.955 7.939 0.000 0.001 0.000 0.452 0.000
    CPU-G 55 18091.356 1800.000 16283.417 7.938 0.000 0.001 0.000 0.454 0.000
    CPU-H 55 18088.099 1800.000 16280.161 7.937 0.000 0.001 0.000 0.453 0.000
    CPU-I 55 18085.833 1800.000 16277.896 7.937 0.000 0.001 0.000 0.455 0.000
    CPU-J 55 18094.326 1800.000 16286.386 7.940 0.000 0.001 0.000 0.453 0.000
    VI-1 263 2081.723 0.435 341.471 0.175 1739.555 0.087 0.000 0.502 0.000
    VI-2 261 2008.683 0.419 329.326 0.169 1678.685 0.084 0.000 0.502 0.000
    WAV 2041 489.677 0.031 133.132 0.062 309.595 46.858 0.000 0.502 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.54% 0.00312 3858.71 3019.70 0.502 0.000

    While every job makes progress, the maximum ready-queue time of the VI and WAV jobs is very bad: just over half a second. Why? Because whenever one of these I/O-bound jobs needs scheduling, it starts off at the end of the ready queue, and has to wait for all 10 CPU-bound jobs to use their 0.05-second time slices. Clearly, this motivates the need for a smarter scheduling algorithm.
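    The worst case is easy to estimate: a newly ready job sits behind every CPU-bound job, and each of those costs a context switch plus a full quantum. A hypothetical helper in C:

```c
/* Worst-case ready-queue wait (seconds) for an I/O-bound job that becomes
   ready behind n_cpu_bound CPU-bound jobs under round-robin: each job ahead
   of it costs one context switch plus one full quantum.  Ignores any other
   I/O-bound jobs that may also be queued. */
double max_rq_wait(int n_cpu_bound, double quantum_s, double cs_s)
{
    return n_cpu_bound * (quantum_s + cs_s);
}
```

    With 10 CPU-bound jobs, machine-2.txt's 0.05-second quantum, and 0.0002-second context switches, this gives 10 * 0.0502 = 0.502 seconds, which is the Max RQ value reported for VI-1, VI-2 and WAV above.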


    Test 3 -- The "Convoy" effect

    The book talks about the "Convoy" effect, where a long CPU burst can cause I/O-bound jobs to stack up. To illustrate this, first look at job-convoy-1.txt, which has three WAV players sharing the same CD-ROM. When we run this with the FCFS scheduler, all works pretty well, because after their first jobs, the processes end up sleeping for different times, and basically access the CD-ROM drive randomly:

    ss-fcfs job-convoy-1.txt machine-1.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    632 [WAV-2:220] Sleeping
    633 [WAV-3:207] Sleeping
    634 [WAV-1:205] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    WAV-1 205 4877.081 0.032 0.000 0.043 4827.831 48.676 0.499 0.000 0.266
    WAV-2 220 4524.453 0.030 0.000 0.040 4478.773 45.150 0.459 0.000 0.260
    WAV-3 207 4825.881 0.032 0.000 0.043 4777.081 48.264 0.462 0.000 0.247

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    0.00% 0.00063 4737.56 0.51 0.000 0.266

    Now, add a CPU-bound job to the mix (job-convoy-2.txt). Since the scheduler is the non-preemptive FCFS scheduler, the WAV jobs all wait for the CPU-bound job, and when that job gives up the CPU, all three WAV jobs end up competing for the IO device:

    ss-fcfs job-convoy-2.txt machine-1.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    565 [WAV-1:012] Ready
    569 [WAV-3:014] Ready
    587 [WAV-2:019] Ready
    603 [CPU-A:555] Running

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.011 1800.000 0.008 0.002 0.000 0.001 0.000 0.001 0.000
    WAV-1 12 78000.514 0.043 71349.284 0.087 6520.085 65.586 65.430 179.847 0.302
    WAV-2 19 51110.847 0.028 46668.675 0.057 4356.395 42.976 42.716 179.841 0.302
    WAV-3 14 67281.861 0.037 61558.926 0.075 5609.703 56.573 56.547 179.838 0.302

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.99% 0.00060 6413.44 4345.20 179.847 0.302

    Interesting, no? We see the "convoy" effect here, because the long CPU-bound job causes the IO-bound processes to wait not only on the ready queue, but also on the CD-ROM queue. You see this with the high IO-Wait times for the three WAV jobs.

    The time-slicing scheduler doesn't exhibit this effect. As the output below shows, the IO-Wait times drop back to roughly what they were without the CPU-bound job:

    ss-fcfsp job-convoy-2.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    1180 [WAV-3:206] Sleeping
    1182 [WAV-2:208] Sleeping
    1183 [CPU-A:555] Running
    1185 [WAV-1:213] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1800.180 1800.000 0.107 0.072 0.000 0.001 0.000 0.001 0.000
    WAV-1 213 4694.719 0.031 0.089 0.062 4647.171 46.878 0.489 0.050 0.274
    WAV-2 208 4801.816 0.032 0.095 0.063 4753.326 47.815 0.486 0.050 0.288
    WAV-3 206 4837.843 0.032 0.098 0.064 4789.006 48.150 0.494 0.050 0.265

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.99% 0.00118 3379.40 0.43 0.050 0.288

    As an interesting mental exercise, look at the following job file: job-convoy-3.txt. This has 20 CPU-bound jobs with the 3 WAV jobs. Would you expect this to exhibit the convoy effect or not? Think about it.

    My feeling was that this would exhibit the convoy effect, since each WAV job now has to wait for 20 CPU-bound time slices to use the IO device. Look at the output, though:

    ss-fcfsp job-convoy-3.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    1114 [CPU-20:027] Ready
    1115 [CPU-14:027] Ready
    1116 [CPU-10:027] Ready
    1117 [CPU-06:027] Ready
    1118 [CPU-04:027] Ready
    1119 [CPU-12:027] Ready
    1120 [CPU-19:027] Ready
    1121 [CPU-09:027] Ready
    1122 [CPU-11:027] Ready
    1123 [CPU-15:027] Ready
    1124 [CPU-08:027] Ready
    1125 [CPU-13:027] Ready
    1126 [CPU-07:027] Ready
    1127 [CPU-02:027] Running
    1128 [CPU-16:027] Ready
    1129 [CPU-17:027] Ready
    1130 [CPU-03:027] Ready
    1131 [CPU-05:027] Ready
    1132 [CPU-18:027] Ready
    1133 [CPU-01:027] Ready
    1144 [WAV-1:190] Sleeping
    1147 [WAV-3:199] Sleeping
    1148 [WAV-2:197] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-01 27 36155.040 1800.000 34347.670 7.369 0.000 0.001 0.000 0.950 0.000
    CPU-02 27 36150.613 1800.000 34343.244 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-03 27 36153.168 1800.000 34345.798 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-04 27 36145.935 1800.000 34338.567 7.367 0.000 0.001 0.000 0.951 0.000
    CPU-05 27 36153.629 1800.000 34346.260 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-06 27 36145.745 1800.000 34338.378 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-07 27 36150.544 1800.000 34343.176 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-08 27 36149.996 1800.000 34342.628 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-09 27 36147.912 1800.000 34340.544 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-10 27 36145.458 1800.000 34338.090 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-11 27 36148.581 1800.000 34341.213 7.367 0.000 0.001 0.000 0.951 0.000
    CPU-12 27 36146.400 1800.000 34339.032 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-13 27 36150.162 1800.000 34342.793 7.368 0.000 0.001 0.000 0.951 0.000
    CPU-14 27 36144.828 1800.000 34337.460 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-15 27 36149.776 1800.000 34342.407 7.367 0.000 0.001 0.000 0.950 0.000
    CPU-16 27 36151.401 1800.000 34344.032 7.368 0.000 0.001 0.000 0.950 0.000
    CPU-17 27 36152.058 1800.000 34344.689 7.368 0.000 0.001 0.000 0.951 0.000
    CPU-18 27 36154.770 1800.000 34347.401 7.369 0.000 0.001 0.000 0.950 0.000
    CPU-19 27 36147.479 1800.000 34340.111 7.367 0.000 0.001 0.000 0.951 0.000
    CPU-20 27 36143.850 1800.000 34336.483 7.366 0.000 0.001 0.000 0.951 0.000
    WAV-1 190 5233.246 0.033 298.251 0.065 4885.256 49.419 0.221 1.000 0.203
    WAV-2 197 5071.679 0.032 288.103 0.063 4735.549 47.741 0.192 1.000 0.203
    WAV-3 199 5015.408 0.031 283.834 0.062 4684.260 47.021 0.200 1.000 0.203

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.59% 0.00113 19993.04 16624.09 1.000 0.203

    I see no convoy effect. Why?

    Well, the answer is that although each WAV job will have to wait for a bunch of CPU-bound jobs, since the WAV jobs sleep for random periods of time, they interleave with the CPU-bound jobs randomly, and it is unlikely that two WAV jobs get placed on the ready queue adjacently. Thus, no convoy effect. Interesting, no?


    Test 4 -- Reducing the latency of I/O-bound jobs

    The time-sliced scheduler did a poor job of giving quick response times to the I/O-bound jobs when there were 10 CPU-bound jobs in the system. Following the book's lead, I implemented a shortest-job-first (SJF) scheduler. While it is intractable in a normal setting (you don't usually know how long a job will run when you schedule it), we know the CPU-burst time of our jobs, so we can always schedule the job with the shortest CPU-burst time. The ready jobs are kept in a simple red-black tree, and the scheduler is compiled into the executable ss-sjf.

    Here is how it performs on job-10-cpu-vi-wav.txt:

    ss-sjf job-10-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    2 [CPU-C:000] Ready
    3799 [CPU-D:061] Ready
    3817 [CPU-A:061] Ready
    3821 [CPU-I:058] Ready
    3826 [CPU-G:060] Ready
    3828 [CPU-H:062] Running
    3835 [CPU-J:062] Ready
    3840 [CPU-F:063] Ready
    3842 [VI-1:269] Starting
    3849 [CPU-E:063] Ready
    3851 [VI-2:296] Sleeping
    3859 [WAV:2734] Sleeping
    3860 [CPU-B:059] Ready

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 61 16211.851 1799.978 14403.740 8.132 0.000 0.001 0.000 724.125 0.000
    CPU-B 59 16948.417 1799.980 15140.302 8.135 0.000 0.001 0.000 724.114 0.000
    CPU-C 0 1000000.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
    CPU-D 61 16128.788 1799.979 14320.676 8.132 0.000 0.001 0.000 724.121 0.000
    CPU-E 63 15835.012 1799.978 14026.898 8.135 0.000 0.001 0.000 724.123 0.000
    CPU-F 63 15789.067 1799.979 13980.946 8.141 0.000 0.001 0.000 724.122 0.000
    CPU-G 60 16506.173 1799.979 14698.058 8.134 0.000 0.001 0.000 724.123 0.000
    CPU-H 62 15994.142 1799.979 14186.026 8.137 0.000 0.001 0.000 724.121 0.000
    CPU-I 58 17065.990 1799.978 15257.884 8.127 0.000 0.001 0.000 724.122 0.000
    CPU-J 62 16011.645 1799.979 14203.531 8.135 0.000 0.001 0.000 724.122 0.000
    VI-1 269 1942.931 0.483 0.000 0.196 1942.154 0.097 0.000 0.001 0.000
    VI-2 296 1635.607 0.408 0.000 0.165 1634.951 0.082 0.000 0.002 0.000
    WAV 2734 365.646 0.032 0.000 0.064 317.497 48.054 0.000 0.000 0.000

    Overall Stats

    CPU Utilization Job Throughput Turnaround Time Avg. Wait Time Max. RQ Wait Time Max. IO Wait Time
    99.52% 0.00385 2842.01 2063.85 724.125 0.000

    Well, fortunately, this solves the VI problem -- the max RQ times of these jobs have dropped to nearly nothing, since they always get scheduled in preference to the CPU-bound jobs.

    However, there is a problem -- the CPU-C process is getting starved. Why? Well, think about what happens when there is no IO-Bound process. At first, all CPU processes have the same CPU-burst time. One of them is selected, and it then runs until its CPU burst is finished, ahead of all the others. This is because once it does some work, its remaining CPU-burst time is decremented, and is thus less than the others'. When it is done, another of the CPU-bound processes is selected, and it too runs until its burst is finished. Which CPU-bound process is selected is up to the vagaries of the red-black tree library, and I would put money on CPU-C being at the end of the tree, with no process ever placed after it.
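    The starvation argument can be checked with a toy version of SJF selection. This C sketch uses a linear scan over an array instead of ss-sjf's red-black tree, and breaks ties by lowest index as a stand-in for the tree's ordering; the SjfQueue type and both functions are hypothetical.

```c
#define MAXP 32

typedef struct {
    long long burst_left[MAXP];  /* remaining cycles in each current CPU burst */
    int ready[MAXP];             /* 1 if the process is on the ready queue */
    int n;                       /* number of processes in use */
} SjfQueue;

/* Return the ready process with the smallest remaining CPU burst, or -1.
   Ties go to the lowest index (the real tree's tie order is arbitrary). */
int sjf_pick(const SjfQueue *q)
{
    int best = -1;
    for (int i = 0; i < q->n; i++)
        if (q->ready[i] &&
            (best < 0 || q->burst_left[i] < q->burst_left[best]))
            best = i;
    return best;
}

/* Charge one time slice of `quantum` cycles to process `id`. */
void sjf_run_slice(SjfQueue *q, int id, long long quantum)
{
    long long used = q->burst_left[id] < quantum ? q->burst_left[id] : quantum;
    q->burst_left[id] -= used;
}
```

    Start three CPU-bound processes with equal 180,000,000-cycle bursts: process 0 is picked, and after one 50,000-cycle slice its remaining burst is smaller than everyone else's, so it keeps being picked until its burst is done. A process whose burst is only a few hundred cycles, like a VI job, jumps ahead of all of them.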

    Is this the right thing? While you're thinking about that, see how SJF performs on job-10-cpu-one-shorter.txt, which is the same as job-10-cpu-vi-wav.txt, except the first CPU-bound process has 179-second CPU bursts instead of 180-second CPU bursts:

    ss-sjf job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    1 [CPU-B:000] Ready
    2 [CPU-C:000] Ready
    3 [CPU-D:000] Ready
    4 [CPU-E:000] Ready
    5 [CPU-F:000] Ready
    6 [CPU-G:000] Ready
    7 [CPU-H:000] Ready
    8 [CPU-I:000] Ready
    9 [CPU-J:000] Ready
    3815 [VI-1:282] Sleeping
    3888 [VI-2:266] Sleeping
    3893 [CPU-A:555] Running
    3901 [WAV:2786] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 555 1799.144 1789.979 1.085 8.079 0.000 0.001 0.000 0.008 0.000
    CPU-B 0 1000000.000 0.000 998884.725 0.069 0.000 0.000 0.000 1440.103 0.000
    CPU-C 0 1000000.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
    CPU-D 0 1000000.000 0.000 999604.262 0.070 0.000 0.000 0.000 1440.098 0.000
    CPU-E 0 1000000.000 0.000 999964.137 0.069 0.000 0.000 0.000 1440.076 0.000
    CPU-F 0 1000000.000 0.000 999244.421 0.069 0.000 0.000 0.000 1440.108 0.000
    CPU-G 0 1000000.000 0.000 999784.168 0.069 0.000 0.000 0.000 1440.097 0.000
    CPU-H 0 1000000.000 0.000 998704.896 0.070 0.000 0.000 0.000 1440.138 0.000
    CPU-I 0 1000000.000 0.000 999064.552 0.070 0.000 0.000 0.000 1440.112 0.000
    CPU-J 0 1000000.000 0.000 999424.338 0.069 0.000 0.000 0.000 1440.098 0.000
    VI-1 282 1767.495 0.438 0.000 0.178 1766.790 0.088 0.000 0.001 0.000
    VI-2 266 1802.754 0.450 0.000 0.182 1802.031 0.090 0.000 0.001 0.000
    WAV 2786 358.877 0.031 0.000 0.062 311.620 47.164 0.000 0.000 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00389         765.32           1.38            1440.138           0.000

    Again, the VI and WAV jobs work perfectly. However, since CPU-A's burst time is shorter than everyone else's, it gets scheduled in preference to the others, and the others are starved.

    You'll note that all the others (with the exception of CPU-C) do get scheduled when CPU-A does its I/O, but that does them no good. CPU-A's I/O time is less than the context-switch overhead, so CPU-A is ready again almost immediately and retakes the CPU. The others accumulate essentially no CPU time (note the 0.000 in their CPU column). CPU-C is once again caught at the end of the red-black tree and never gets scheduled.

    Now, is it right that one job runs while all the others starve? In terms of throughput, yes; you'll have to think that through for yourself. In terms of fairness, however, no. So I tweaked ss-sjf (the result is ss-sjf-nostarve) so that if a process's CPU burst is longer than the quantum, it goes into a queue that is serviced in FCFS order. Here's how it runs on job-10-cpu-one-shorter.txt:

    ss-sjf-nostarve job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    3808 [CPU-A:055] Ready
    3826 [CPU-B:055] Ready
    3828 [CPU-G:055] Ready
    3829 [CPU-F:055] Ready
    3831 [CPU-H:055] Ready
    3832 [CPU-E:055] Ready
    3833 [CPU-C:055] Ready
    3834 [CPU-J:055] Ready
    3835 [CPU-D:055] Running
    3838 [CPU-I:055] Ready
    3841 [VI-1:279] Sleeping
    3843 [VI-2:282] Starting
    3854 [WAV:2731] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 17995.543 1789.998 16197.457 8.087 0.000 0.001 0.000 0.501 0.000
    CPU-B 55 18083.681 1799.998 16275.555 8.127 0.000 0.001 0.000 0.500 0.000
    CPU-C 55 18093.465 1799.998 16285.336 8.131 0.000 0.001 0.000 0.499 0.000
    CPU-D 55 18095.703 1799.998 16287.573 8.131 0.000 0.001 0.000 0.500 0.000
    CPU-E 55 18090.966 1799.998 16282.837 8.130 0.000 0.001 0.000 0.500 0.000
    CPU-F 55 18090.249 1799.998 16282.120 8.131 0.000 0.001 0.000 0.499 0.000
    CPU-G 55 18089.382 1799.998 16281.254 8.129 0.000 0.001 0.000 0.500 0.000
    CPU-H 55 18090.680 1799.998 16282.551 8.131 0.000 0.001 0.000 0.501 0.000
    CPU-I 55 18099.806 1799.998 16291.672 8.135 0.000 0.001 0.000 0.500 0.000
    CPU-J 55 18093.599 1799.998 16285.469 8.131 0.000 0.001 0.000 0.500 0.000
    VI-1 279 1850.821 0.462 0.000 0.187 1850.080 0.092 0.000 0.001 0.000
    VI-2 282 1702.365 0.424 0.000 0.172 1701.684 0.085 0.000 0.001 0.000
    WAV 2731 366.138 0.032 0.000 0.064 317.925 48.118 0.000 0.000 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00384         3108.18          2331.10         0.501              0.000

    That looks a lot better, doesn't it?
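    The selection rule behind ss-sjf-nostarve can be sketched roughly as follows. This is a guess at its structure, not the simulator's code, and it rests on three assumptions. First, the scheduler still knows each next burst exactly. Second, the short-burst group is always served before the FCFS queue, which is consistent with VI and WAV still showing zero RQ time. Third, FCFS order comes from a hypothetical `ready_seq` stamp. If a preempted long-burst process is re-stamped when it goes back on the queue, the FCFS queue behaves like round-robin. That would fit the Max RQ times of about 0.5 seconds above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative ready-list entry for an omniscient scheduler. */
typedef struct {
    double burst;    /* next CPU burst, assumed known exactly */
    long ready_seq;  /* when it joined the ready list (FCFS order) */
} Proc;

/* Return the index of the process to run, or -1 if none is ready.
   Bursts <= quantum: shortest burst first (SJF).
   Bursts >  quantum: earliest ready_seq first (FCFS), used only
   when no short-burst process is ready (an assumption). */
int nostarve_pick(const Proc *ready, size_t n, double quantum) {
    int best_short = -1, best_long = -1;
    for (size_t i = 0; i < n; i++) {
        if (ready[i].burst <= quantum) {
            if (best_short < 0 || ready[i].burst < ready[best_short].burst)
                best_short = (int)i;
        } else {
            if (best_long < 0 ||
                ready[i].ready_seq < ready[best_long].ready_seq)
                best_long = (int)i;
        }
    }
    return best_short >= 0 ? best_short : best_long;
}
```

    Suppose CPU-A (179 s), CPU-B (180 s, ready earlier) and a VI job with a hypothetical 2 ms burst are ready, with a 50 ms quantum. The VI job is chosen. Without it, CPU-B wins on arrival order even though CPU-A's burst is shorter, which is what keeps CPU-A from monopolizing the CPU.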


    Test 5 -- We are not omniscient.

    Unfortunately, in a real scheduler, we don't know how long the CPU bursts take. So we need to predict. The book gives a prediction algorithm based on exponential decay:

    P(new) = alpha * lastburst + (1 - alpha) * P(old)

    P starts with an initial value (we set it to zero). Alpha is a number between 0 and 1. If alpha is 0, our prediction is always the initial value of P. If it is 1, our prediction is always the last CPU burst. Otherwise, it's a weighted combination of the initial value and all of the previous CPU bursts. The closer alpha is to one, the more heavily it weights the most recent bursts. If you look at the equation, it is an exponentially decaying average. Each older CPU burst picks up an extra factor of (1 - alpha), so for 0 < alpha < 1 the more recent bursts always count more than the less recent ones.
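    Unrolling the recurrence shows the decay explicitly. Let t1, t2, ..., tn be the observed bursts, with tn the most recent, and let P0 be the initial value. Then Pn = alpha*tn + alpha*(1-alpha)*t(n-1) + alpha*(1-alpha)^2*t(n-2) + ... + alpha*(1-alpha)^(n-1)*t1 + (1-alpha)^n*P0. The C sketch below applies the update once per burst. The function names are illustrative and are not taken from ss-exp.

```c
#include <assert.h>
#include <stddef.h>

/* One step of the exponential average:
   P(new) = alpha * lastburst + (1 - alpha) * P(old). */
double predict_next(double alpha, double last_burst, double prev_prediction) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    return alpha * last_burst + (1.0 - alpha) * prev_prediction;
}

/* Feed n observed bursts (oldest first) through the predictor,
   starting from initial prediction p0; return the final prediction. */
double predict_after(double alpha, double p0, const double *bursts, size_t n) {
    double p = p0;
    for (size_t i = 0; i < n; i++)
        p = predict_next(alpha, bursts[i], p);
    return p;
}
```

    With P0 = 0 and bursts 8, 4, 2, alpha = 0 leaves the prediction at 0, and alpha = 1 gives 2 (the last burst). Alpha = 0.5 gives 0.5*2 + 0.25*4 + 0.125*8 = 3.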

    I've hacked this up in ss-exp, where alpha is compiled in. Here are a few values of alpha:

    ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA=0

    Time: 1000000.000000 -- Processes:

    PID Name State
    2 [CPU-C:000] Ready
    3166 [CPU-A:061] Ready
    3180 [CPU-I:061] Ready
    3181 [CPU-B:061] Ready
    3182 [CPU-F:061] Ready
    3183 [CPU-H:061] Running
    3185 [CPU-G:061] Ready
    3186 [CPU-D:061] Ready
    3187 [CPU-E:061] Ready
    3193 [CPU-J:061] Ready
    3204 [WAV:2139] Sleeping
    3205 [VI-2:255] Starting
    3206 [VI-1:251] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 61 16179.135 1790.000 14381.222 7.912 0.000 0.001 0.000 0.404 0.000
    CPU-B 61 16275.727 1800.000 14467.766 7.960 0.000 0.001 0.000 0.404 0.000
    CPU-C 0 1000000.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
    CPU-D 61 16279.946 1800.000 14471.983 7.962 0.000 0.001 0.000 0.404 0.000
    CPU-E 61 16285.371 1800.000 14477.406 7.964 0.000 0.001 0.000 0.404 0.000
    CPU-F 61 16277.357 1800.000 14469.396 7.961 0.000 0.001 0.000 0.403 0.000
    CPU-G 61 16279.455 1800.000 14471.493 7.961 0.000 0.001 0.000 0.404 0.000
    CPU-H 61 16277.934 1800.000 14469.972 7.961 0.000 0.001 0.000 0.404 0.000
    CPU-I 61 16273.796 1800.000 14465.836 7.959 0.000 0.001 0.000 0.404 0.000
    CPU-J 61 16302.120 1800.000 14494.147 7.972 0.000 0.001 0.000 0.403 0.000
    VI-1 251 2097.899 0.447 308.569 0.180 1788.613 0.089 0.000 0.451 0.000
    VI-2 255 2230.469 0.474 327.345 0.190 1902.365 0.095 0.000 0.452 0.000
    WAV 2139 467.093 0.030 116.246 0.061 304.647 46.109 0.000 0.452 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.54%           0.00319         3452.33          2615.68         0.452              0.000

    As you can see, since the prediction is always 0 (the initial value), every process has the same prediction. The process that gets scheduled is simply whichever one is lucky enough to be at the front of the red-black tree. The VI and WAV jobs lose their preference (note their nonzero RQ times), and the unfortunate CPU-C is once again starved.

    Here are the results when alpha is 1.0, and we only look at the last CPU burst:

    ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 1

    Time: 1000000.000000 -- Processes:

    PID Name State
    2049 [CPU-I:113] Ready
    3881 [CPU-A:033] Ready
    3885 [CPU-B:037] Ready
    3889 [CPU-C:042] Ready
    3901 [CPU-J:035] Running
    3902 [CPU-G:035] Ready
    3903 [CPU-E:035] Ready
    3904 [CPU-D:035] Ready
    3907 [CPU-H:145] Ready
    3933 [VI-1:294] Starting
    3936 [VI-2:270] Sleeping
    3938 [CPU-F:036] Ready
    3939 [WAV:2817] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 33 29826.609 1789.997 28028.510 8.101 0.000 0.001 0.000 491520.500 0.000
    CPU-B 37 26633.161 1799.998 24825.024 8.138 0.000 0.001 0.000 393216.600 0.000
    CPU-C 42 23486.506 1799.997 21678.371 8.137 0.000 0.001 0.000 393216.600 0.000
    CPU-D 35 28268.342 1800.000 26460.195 8.146 0.000 0.001 0.000 393216.600 0.000
    CPU-E 35 28268.160 1800.000 26460.013 8.146 0.000 0.001 0.000 393216.600 0.000
    CPU-F 36 27762.958 1799.999 25954.809 8.148 0.000 0.001 0.000 393216.600 0.000
    CPU-G 35 28267.211 1800.000 26459.062 8.148 0.000 0.001 0.000 393216.600 0.000
    CPU-H 145 6828.059 1799.984 5019.935 8.139 0.000 0.001 0.000 24576.450 0.000
    CPU-I 113 4608.716 1799.980 2800.600 8.136 0.000 0.001 0.000 30720.350 0.000
    CPU-J 35 28261.093 1800.000 26452.946 8.146 0.000 0.001 0.000 393216.600 0.000
    VI-1 294 1793.394 0.445 0.602 0.181 1792.077 0.090 0.000 0.093 0.000
    VI-2 270 1844.981 0.458 0.591 0.186 1843.655 0.092 0.000 0.053 0.000
    WAV 2817 354.949 0.031 0.159 0.062 308.070 46.627 0.000 0.056 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00393         2915.46          2149.81         491520.500         0.000

    This is much better, since the IO-bound processes get the CPU in preference to the CPU-bound processes. However, the CPU-bound processes appear to be at the mercy of the red-black tree. CPU-H and CPU-I complete 145 and 113 jobs, while the others complete only between 33 and 42. I haven't done any tracing to see why this is the case.

    Here we set alpha to 0.5. The performance looks pretty much identical to when alpha is 1, with one main difference: the VI/WAV processes have much smaller Max RQ times. Think about why. (The answer lies in what happens when a CPU-bound process does IO and then, perhaps, looks like a better process to schedule than a VI/WAV process.)

    ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 0.5

    Time: 1000000.000000 -- Processes:

    PID Name State
    2199 [CPU-A:007] Ready
    2363 [CPU-D:011] Ready
    2610 [CPU-E:031] Ready
    2663 [CPU-G:146] Ready
    2932 [CPU-J:026] Ready
    3062 [CPU-H:025] Ready
    3680 [CPU-I:030] Ready
    3839 [CPU-F:091] Ready
    3932 [VI-1:278] Sleeping
    3933 [CPU-C:153] Running
    3934 [CPU-B:028] Ready
    3935 [WAV:2796] Sleeping
    3936 [VI-2:302] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 7 80427.720 1789.982 78629.684 8.053 0.000 0.001 0.000 489903.924 0.000
    CPU-B 28 35704.675 1799.979 33896.577 8.117 0.000 0.001 0.000 457083.407 0.000
    CPU-C 153 6532.803 1799.980 4724.686 8.136 0.000 0.001 0.000 268510.822 0.000
    CPU-D 11 54949.375 1799.978 53141.265 8.131 0.000 0.001 0.000 487968.439 0.000
    CPU-E 31 21425.752 1799.980 19617.651 8.119 0.000 0.001 0.000 400598.916 0.000
    CPU-F 91 10728.378 1799.978 8920.276 8.122 0.000 0.001 0.000 460242.783 0.000
    CPU-G 146 4630.732 1799.980 2822.624 8.128 0.000 0.001 0.000 38335.150 0.000
    CPU-H 25 31185.983 1799.978 29377.894 8.109 0.000 0.001 0.000 461193.262 0.000
    CPU-I 30 31122.171 1799.980 29314.073 8.117 0.000 0.001 0.000 399565.231 0.000
    CPU-J 26 28719.107 1799.978 26911.018 8.110 0.000 0.001 0.000 434248.020 0.000
    VI-1 278 1841.408 0.460 0.000 0.186 1840.670 0.092 0.000 0.003 0.000
    VI-2 302 1529.457 0.380 0.000 0.155 1528.845 0.076 0.000 0.002 0.000
    WAV 2796 357.575 0.031 0.000 0.062 310.489 46.993 0.000 0.001 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00392         2527.23          1772.99         489903.924         0.000

    Check out the wild asymmetry with the CPU-bound processes. Think about why that would be the case.


    Test 6 -- Predictions with a threshold

    As with our "nostarve" version of SJF, we'd like to treat those CPU-bound jobs fairly, and clearly the CPU burst prediction is not enough. So I've modified the scheduler so that all predictions over a threshold (90% of the timer quantum) are put into a queue that is serviced in FCFS order. The results (with alpha equal to 0.5) are excellent:

    ss-exp-thresh job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 0.5

    Time: 1000000.000000 -- Processes:

    PID Name State
    3861 [CPU-A:055] Ready
    3875 [CPU-D:055] Ready
    3883 [CPU-I:055] Ready
    3885 [CPU-J:055] Running
    3886 [CPU-B:055] Ready
    3888 [CPU-C:055] Ready
    3889 [CPU-E:055] Ready
    3890 [CPU-F:055] Ready
    3891 [CPU-G:055] Ready
    3892 [CPU-H:055] Ready
    3907 [VI-1:284] Starting
    3908 [WAV:2763] Sleeping
    3909 [VI-2:300] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 17985.957 1789.997 16187.859 8.100 0.000 0.001 0.000 5.800 0.000
    CPU-B 55 18091.333 1799.996 16283.192 8.144 0.000 0.001 0.000 6.050 0.000
    CPU-C 55 18097.953 1799.996 16289.812 8.143 0.000 0.001 0.000 5.750 0.000
    CPU-D 55 18056.583 1799.997 16248.438 8.148 0.000 0.001 0.000 5.750 0.000
    CPU-E 55 18101.097 1799.997 16292.956 8.143 0.000 0.001 0.000 4.900 0.000
    CPU-F 55 18105.307 1799.997 16297.166 8.143 0.000 0.001 0.000 5.800 0.000
    CPU-G 55 18109.048 1799.997 16300.907 8.143 0.000 0.001 0.000 4.750 0.000
    CPU-H 55 18109.588 1799.996 16301.448 8.142 0.000 0.001 0.000 5.800 0.000
    CPU-I 55 18084.637 1799.996 16276.493 8.146 0.000 0.001 0.000 5.800 0.000
    CPU-J 55 18087.457 1799.996 16279.315 8.145 0.000 0.001 0.000 5.800 0.000
    VI-1 284 1921.134 0.479 0.000 0.194 1920.364 0.096 0.000 0.049 0.000
    VI-2 300 1670.144 0.415 0.000 0.169 1669.477 0.083 0.000 0.002 0.000
    WAV 2763 361.856 0.031 0.000 0.063 314.206 47.556 0.000 0.001 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00390         3077.25          2298.29         6.050              0.000
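    Combining the predictor with the threshold rule gives a classifier like the one sketched below. The 90% threshold comes from the text, and the 50 ms quantum is the one mentioned in Test 7. Everything else is an assumption, in particular that a slice cut short by the timer is fed to the predictor as a burst of one full quantum. Under that assumption, with alpha = 0.5 and P starting at 0, a CPU-bound process needs four consecutive full-quantum bursts before it moves to the FCFS queue. Its predictions are 25, 37.5, 43.75 and then 46.875 ms, which is the first value over 45 ms. A process with hypothetical 1 ms bursts never gets there.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the text: predictions above 90% of the quantum
   are treated as CPU-bound and served FCFS. */
bool goes_to_fcfs(double prediction, double quantum) {
    return prediction > 0.9 * quantum;
}

/* Exponential average: P(new) = alpha * lastburst + (1 - alpha) * P(old). */
double predict_next(double alpha, double last_burst, double prev) {
    return alpha * last_burst + (1.0 - alpha) * prev;
}

/* Number of consecutive bursts of length `burst` needed before a
   process starting at prediction p0 lands in the FCFS queue, or -1
   if it is still below the threshold after max_steps bursts. */
int bursts_until_fcfs(double alpha, double p0, double burst,
                      double quantum, int max_steps) {
    double p = p0;
    for (int k = 1; k <= max_steps; k++) {
        p = predict_next(alpha, burst, p);
        if (goes_to_fcfs(p, quantum))
            return k;
    }
    return -1;
}
```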


    Test 7 -- Using a Multilevel Feedback Queue

    Finally, the only part of this that is suboptimal is the context-switch overhead of the CPU-bound processes. It is high because these processes are time-sliced at the granularity of one time quantum, which means that every 50 ms of CPU time is accompanied by .2 ms of context-switch overhead. To fix this, we can use a multilevel feedback queue, both to partition the jobs into IO-bound and CPU-bound queues and to give larger time quanta to the CPU-bound queue.
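    The overhead claim is simple arithmetic. A CPU-bound job that is switched out once per slice pays one context switch per slice of CPU time. The hypothetical helper below uses the .2 ms switch cost and 50 ms quantum from the text to estimate the total context-switch time for one CPU-bound job with 1800 seconds of CPU time.

```c
#include <assert.h>

/* Estimated context-switch time (seconds) for a job that uses
   cpu_seconds of CPU in slices of slice_seconds, paying
   switch_seconds once per slice. Ignores any extra switches
   caused by other jobs becoming ready. */
double cs_overhead(double cpu_seconds, double slice_seconds,
                   double switch_seconds) {
    assert(slice_seconds > 0.0);
    return (cpu_seconds / slice_seconds) * switch_seconds;
}
```

    This gives about 7.2 s per job with one-quantum slices, in the same ballpark as the roughly 8.1 s in the CS column of the tables. For 10-quantum and 100-quantum slices it predicts 0.72 s and 0.072 s. The ss-mlfq-2 runs later show about 1.06 s and 0.49 s, so the gap between this simple model and the measurements widens as the slice grows.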

    First, ss-mlfq-1 implements a three-level queue. The first level holds processes that have just become ready (because they started, or returned from sleeping or IO) or that gave up the CPU voluntarily due to a non-timer event. The second level is for processes that have been evicted from the CPU by one timer interrupt. The third level is for processes that have been evicted by multiple timer interrupts; these are the CPU-bound jobs. The scheduler services each queue in round-robin fashion, but it only gives the CPU to a job in the second-level queue if the first-level queue is empty. Similarly, it only gives the CPU to a job in the third-level queue if the other two queues are empty.
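    The queue-placement rules above can be sketched in C as follows. This is an illustration, not ss-mlfq-1's code. It assumes that "multiple timer interrupts" means consecutive evictions since the process last became ready for a non-timer reason. The names (`Event`, `mlfq_enqueue`, `mlfq_pick`) are hypothetical. Each level is a FIFO, so a process evicted by the timer goes to the back of its level, which gives round-robin within the level.

```c
#include <assert.h>
#include <stddef.h>

#define LEVELS 3
#define MAXQ 64

/* Why a process is being put back on the ready queue. */
typedef enum { EV_START, EV_WAKE, EV_YIELD, EV_TIMER } Event;

typedef struct {
    int pid;
    int timer_evictions;  /* consecutive timer evictions (assumed) */
} Proc;

/* Fixed-size circular FIFO of process pointers. */
typedef struct {
    Proc *slot[MAXQ];
    size_t head, count;
} Queue;

void mlfq_init(Queue levels[LEVELS]) {
    for (int l = 0; l < LEVELS; l++)
        levels[l].head = levels[l].count = 0;
}

static void q_push(Queue *q, Proc *p) {
    assert(q->count < MAXQ);
    q->slot[(q->head + q->count) % MAXQ] = p;
    q->count++;
}

static Proc *q_pop(Queue *q) {
    if (q->count == 0)
        return NULL;
    Proc *p = q->slot[q->head];
    q->head = (q->head + 1) % MAXQ;
    q->count--;
    return p;
}

/* Level 0: started, woke from sleep/IO, or gave up the CPU itself.
   Level 1: evicted by one timer interrupt.
   Level 2: evicted by more than one (the CPU-bound jobs). */
void mlfq_enqueue(Queue levels[LEVELS], Proc *p, Event ev) {
    int level = 0;
    if (ev == EV_TIMER) {
        p->timer_evictions++;
        level = (p->timer_evictions == 1) ? 1 : 2;
    } else {
        p->timer_evictions = 0;
    }
    q_push(&levels[level], p);
}

/* Serve the first non-empty level, in FIFO order within it. */
Proc *mlfq_pick(Queue levels[LEVELS]) {
    for (int l = 0; l < LEVELS; l++) {
        Proc *p = q_pop(&levels[l]);
        if (p != NULL)
            return p;
    }
    return NULL;
}
```

    Take a CPU-bound process that has been evicted by the timer twice in a row, so it sits on level 2. When a VI process wakes up, the VI process is placed on level 0 and picked first.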

    Here is the result of running ss-mlfq-1:

    ss-mlfq-1 job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    3892 [CPU-A:055] Ready
    3909 [CPU-C:055] Ready
    3910 [CPU-D:055] Ready
    3911 [CPU-F:055] Ready
    3912 [CPU-I:055] Ready
    3913 [CPU-H:055] Running
    3914 [CPU-J:055] Ready
    3915 [CPU-B:055] Ready
    3916 [CPU-E:055] Ready
    3919 [CPU-G:055] Ready
    3922 [VI-1:263] Starting
    3934 [VI-2:291] Starting
    3936 [WAV:2820] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 18000.294 1790.000 16202.224 8.069 0.000 0.001 0.000 0.500 0.000
    CPU-B 55 18093.843 1800.000 16285.730 8.112 0.000 0.001 0.000 0.500 0.000
    CPU-C 55 18085.517 1800.000 16277.407 8.109 0.000 0.001 0.000 0.542 0.000
    CPU-D 55 18085.738 1800.000 16277.629 8.108 0.000 0.001 0.000 0.500 0.000
    CPU-E 55 18095.912 1800.000 16287.797 8.114 0.000 0.001 0.000 0.500 0.000
    CPU-F 55 18087.531 1800.000 16279.420 8.110 0.000 0.001 0.000 0.500 0.000
    CPU-G 55 18098.993 1800.000 16290.877 8.114 0.000 0.001 0.000 0.543 0.000
    CPU-H 55 18088.812 1800.000 16280.702 8.110 0.000 0.001 0.000 0.500 0.000
    CPU-I 55 18087.600 1800.000 16279.491 8.109 0.000 0.001 0.000 0.500 0.000
    CPU-J 55 18093.169 1800.000 16285.057 8.112 0.000 0.001 0.000 0.500 0.000
    VI-1 263 1717.583 0.428 0.000 0.172 1716.897 0.086 0.000 0.050 0.000
    VI-2 291 1657.332 0.414 0.001 0.166 1656.668 0.083 0.000 0.050 0.000
    WAV 2820 354.542 0.031 0.000 0.062 307.855 46.594 0.000 0.050 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.52%           0.00392         3027.21          2282.31         0.543              0.000

    Note that this doesn't solve the context-switch problem (the CS column for the CPU-bound jobs is still about 8.1), but it does work nicely, just like our predictive scheduler.

    Now, ss-mlfq-2 is just like ss-mlfq-1, except that it only evicts a CPU-bound job from the CPU if there is a job on another queue, or if 10 time quanta have passed since it was scheduled. In that way, the context-switch overhead of CPU-bound jobs should be reduced:

    ss-mlfq-2 job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000

    Time: 1000000.000000 -- Processes:

    PID Name State
    3911 [CPU-A:055] Ready
    3931 [CPU-J:055] Ready
    3933 [CPU-G:055] Ready
    3934 [CPU-C:055] Ready
    3935 [CPU-B:055] Running
    3939 [CPU-I:055] Ready
    3940 [CPU-D:055] Ready
    3942 [CPU-E:055] Ready
    3944 [CPU-F:055] Ready
    3950 [CPU-H:055] Ready
    3953 [VI-1:276] Sleeping
    3969 [VI-2:289] Sleeping
    3972 [WAV:2845] Sleeping

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 17909.054 1790.000 16118.001 1.052 0.000 0.001 0.000 3.795 0.000
    CPU-B 55 18015.246 1800.000 16214.188 1.058 0.000 0.001 0.000 3.779 0.000
    CPU-C 55 18012.957 1800.000 16211.899 1.057 0.000 0.001 0.000 3.750 0.000
    CPU-D 55 18024.064 1800.000 16223.005 1.058 0.000 0.001 0.000 3.750 0.000
    CPU-E 55 18029.074 1800.000 16228.015 1.059 0.000 0.001 0.000 3.769 0.000
    CPU-F 55 18032.209 1800.000 16231.149 1.059 0.000 0.001 0.000 3.797 0.000
    CPU-G 55 18011.490 1800.000 16210.431 1.057 0.000 0.001 0.000 3.750 0.000
    CPU-H 55 18057.137 1800.000 16256.076 1.060 0.000 0.001 0.000 3.760 0.000
    CPU-I 55 18024.013 1800.000 16222.954 1.058 0.000 0.001 0.000 3.759 0.000
    CPU-J 55 18000.321 1800.000 16199.263 1.057 0.000 0.001 0.000 3.764 0.000
    VI-1 276 1846.165 0.460 0.001 0.184 1845.428 0.092 0.000 0.050 0.000
    VI-2 289 1705.380 0.427 0.001 0.171 1704.695 0.085 0.000 0.050 0.000
    WAV 2845 351.397 0.031 0.000 0.061 305.124 46.181 0.000 0.050 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.91%           0.00396         3007.19          2251.81         3.797              0.000

    Note two things here -- first, the context switch overhead has indeed been reduced. Second, as a result, the CPU utilization has gone up, the job throughput has gone up, and the turnaround time has gone down. In fact, the only metric that has gone the "wrong way" is the Max RQ time, which has increased dramatically. However, since these CPU-bound jobs should not care about response time, this is not really important.
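    The ss-mlfq-2 eviction rule can be sketched as follows, along with the effect of its 10-quantum limit on how often a CPU-bound job is switched out. The function names and the `limit` parameter are hypothetical. The sketch only models timer interrupts for a job on the CPU-bound queue and treats "a job on another queue" as a simple count.

```c
#include <assert.h>
#include <stdbool.h>

/* At a timer interrupt, evict the running CPU-bound job only if some
   other queue has a job, or if it has now run for `limit` quanta
   since it was scheduled (10 in ss-mlfq-2). */
bool should_evict(int quanta_since_scheduled, int jobs_on_other_queues,
                  int limit) {
    assert(limit >= 1);
    return jobs_on_other_queues > 0 || quanta_since_scheduled >= limit;
}

/* Evictions of a lone CPU-bound job over total_quanta of CPU time
   when no other queue ever has work. */
int evictions_alone(int total_quanta, int limit) {
    int evictions = 0, since = 0;
    for (int t = 0; t < total_quanta; t++) {
        since++;
        if (should_evict(since, 0, limit)) {
            evictions++;
            since = 0;
        }
    }
    return evictions;
}
```

    In isolation, the limit cuts evictions by exactly the limit factor: 1000 quanta give 1000, 100 and 10 evictions for limits of 1, 10 and 100. Under this rule, however, every time a VI or WAV job becomes ready, the CPU-bound job is evicted regardless of the limit.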

    As a final aside, if I change the quantum for CPU-bound jobs to 100 times the regular quantum, there is improvement, but not ten-fold:

    Time: 1000000.000000 -- Processes:

    PID Name State
    3902 [CPU-A:055] Ready
    3914 [CPU-C:055] Ready
    3921 [CPU-F:055] Running
    3922 [CPU-B:055] Ready
    3925 [CPU-E:055] Ready
    3930 [CPU-G:055] Ready
    3933 [CPU-J:055] Ready
    3935 [CPU-H:055] Ready
    3940 [CPU-D:055] Ready
    3943 [CPU-I:055] Ready
    3967 [WAV:2839] Sleeping
    3968 [VI-1:296] Starting
    3969 [VI-2:272] Starting

    Job Stats (Averages unless specified)

    Name Completed Elapsed CPU RQ CS Sleep IO IO-Wait Max RQ Max IO-Wait
    CPU-A 55 17900.692 1790.000 16110.203 0.488 0.000 0.001 0.000 10.453 0.000
    CPU-B 55 18003.654 1800.000 16203.162 0.491 0.000 0.001 0.000 10.450 0.000
    CPU-C 55 17973.835 1800.000 16173.344 0.491 0.000 0.001 0.000 10.440 0.000
    CPU-D 55 18049.502 1800.000 16249.009 0.492 0.000 0.001 0.000 10.463 0.000
    CPU-E 55 18008.011 1800.000 16207.519 0.492 0.000 0.001 0.000 10.452 0.000
    CPU-F 55 18002.155 1800.000 16201.663 0.491 0.000 0.001 0.000 10.450 0.000
    CPU-G 55 18009.626 1800.000 16209.133 0.492 0.000 0.001 0.000 10.444 0.000
    CPU-H 55 18025.738 1800.000 16225.245 0.492 0.000 0.001 0.000 10.445 0.000
    CPU-I 55 18062.129 1800.000 16261.635 0.493 0.000 0.001 0.000 10.437 0.000
    CPU-J 55 18021.723 1800.000 16221.230 0.492 0.000 0.001 0.000 10.447 0.000
    VI-1 296 1597.780 0.401 0.001 0.160 1597.137 0.080 0.000 0.050 0.000
    VI-2 272 1867.860 0.465 0.000 0.187 1867.115 0.093 0.000 0.042 0.000
    WAV 2839 351.925 0.031 0.000 0.061 305.583 46.250 0.000 0.050 0.000

    Overall Stats

    CPU Utilization  Job Throughput  Turnaround Time  Avg. Wait Time  Max. RQ Wait Time  Max. IO Wait Time
    99.95%           0.00396         3003.10          2252.71         10.463             0.000

    There should be one final fix, in my opinion. I think that jobs on the first queue should be ordered by prediction, so that when the CPU-bound jobs revert to the top queue, they are scheduled with lower priority than the I/O-bound jobs. That would most likely lower the Max RQ time.