A probability distribution for the IO burst time. An
IO device must also be specified.
Jobs are specified in a job file, in which each line specifies a job. Each
line has 18 words:
- Job Name.
- Interarrival time distribution type (Uniform/Exponential/Normal).
- Interarrival time distribution mean.
- Interarrival time distribution minimum.
- Interarrival type: Serial or Periodic.
- Number of iterations distribution type.
- Number of iterations distribution mean.
- Number of iterations distribution minimum.
- Sleep time distribution type.
- Sleep time distribution mean. (Units: seconds)
- Sleep time distribution minimum.
- CPU-burst time distribution type.
- CPU-burst time distribution mean. (Units: machine cycles)
- CPU-burst time distribution minimum.
- IO-burst time distribution type.
- IO-burst time distribution mean. (Units: device units)
- IO-burst time distribution minimum.
- IO-burst device.
For example, the file job-cpu-1.txt defines
one job, called CPU-B. These jobs are generated serially, one after another,
with no interarrival time (i.e. as soon as one job finishes, the next begins).
Each CPU-B job iterates 10 times, sleeping for zero seconds, then
performing 180 million CPU operations, then doing 1 unit's worth of I/O
to the console.
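To make the 18-word format concrete, a line describing a job like CPU-B might look roughly like the one below. This is only a guess at the concrete syntax -- the keyword spellings and the use of Uniform with equal mean and minimum to express constant values are assumptions, not copied from the real job-cpu-1.txt:
CPU-B Uniform 0 0 Serial Uniform 10 10 Uniform 0 0 Uniform 180000000 180000000 Uniform 1 1 CONSOLE
Reading left to right: the name, the interarrival-time distribution (type, mean, minimum), the Serial arrival type, the iteration-count distribution, the sleep-time distribution, the CPU-burst distribution (in machine cycles), the IO-burst distribution (in device units), and finally the device.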
The file job-vi-1.txt also has one job, whose
intent is to model interactive
editing jobs such as vi. These are executed serially, and the job
interarrival times are defined by an exponential whose mean is 30 minutes
(1800 seconds), and whose minimum is 20 seconds. The jobs themselves iterate
according to an exponential whose mean is 900, and whose minimum is 10, and
each iteration sleeps for an average of 2 seconds, followed by an average
of 500 cycles' worth of CPU time, and one unit of console I/O.
The file job-cpu-vi.txt mixes a CPU-bound
job and one VI job. It should run well, since the VI job doesn't use much
CPU.
The file job-cpu-vi-wav.txt has six
different VI jobs, plus a WAV job, which represents a CD player (each
iteration reads a second's worth of music from the CD-ROM drive, and
then sleeps for a second to emulate playing the music).
Obviously, you may design your own job files.
The Machine
You describe a machine with a machine file. This file needs to have
a SPEED (in millions of instructions per second), a context
switch overhead (in instructions), an optional timer interrupt
frequency (specified in instructions), and any number of devices.
Devices are specified with names and unit speeds.
The file machine-1.txt defines a machine
running at one million instructions per second, with a context switch
overhead of 200 instructions, and four devices of varying speeds --
a disk, a cd-rom drive, a network card, and the console.
The file machine-2.txt defines a machine
equivalent to machine-1.txt but with a timer that interrupts the
CPU every 50000 instructions.
Note that in both files, the units of DISK, CD-ROM and NETWORK
are MB/s, while the console's units are bytes per second.
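As a rough sketch of what such a file might contain (the keywords and the disk, CD-ROM and network speeds here are guesses, not copied from machine-1.txt; only the console speed can be inferred from the events shown later, where one byte of console I/O takes 0.0001 seconds, suggesting roughly 10000 bytes per second):
SPEED 1
CONTEXT-SWITCH 200
DISK 20
CD-ROM 4
NETWORK 10
CONSOLE 10000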
Other Simulator Parameters
To use the simulator, you have to link it with a scheduler
(for example ss-fcfs.c is a non-preemptive
first-come-first-served scheduler), then you execute it with the
following parameters:
An output type. There are five of these:
- OFF -- no output (although SIGQUIT will show the state of
the machine at any time).
- EVENTS -- show a line for each simulator event.
- SYSTEM -- show the system state after each simulator event.
- SAMPLE=delta -- After each delta seconds of simulated
time, print out the state of the machine.
- QUIT=qtime -- After qtime seconds of simulated time,
print out the machine state and quit.
So, for example, if you run the simulator as follows:
UNIX> ss-fcfs job-cpu-1.txt machine-1.txt 100 100 EVENTS | more
0.000000: [CPU-B:000] Starting
0.000000: [CPU-B:000] -> Ready
0.000000: [CPU-B:000] -> Running
180.000200: [CPU-B:000] Running
180.000200: [CPU-B:000] -> IO on CONSOLE (0.000100)
180.000300: [CPU-B:000] IO on CONSOLE (0.000100)
180.000300: [CPU-B:000] -> Ready
180.000300: [CPU-B:000] -> Running
360.000300: [CPU-B:000] Running
360.000300: [CPU-B:000] -> IO on CONSOLE (0.000100)
360.000400: [CPU-B:000] IO on CONSOLE (0.000100)
360.000400: [CPU-B:000] -> Ready
360.000400: [CPU-B:000] -> Running
540.000400: [CPU-B:000] Running
540.000400: [CPU-B:000] -> IO on CONSOLE (0.000100)
540.000500: [CPU-B:000] IO on CONSOLE (0.000100)
540.000500: [CPU-B:000] -> Ready
540.000500: [CPU-B:000] -> Running
...
You see the events of running that lone CPU-B process. It starts and
runs for 180.0002 seconds (the 0.0002 comes from context switch overhead).
Then it does one byte's worth of I/O, which takes 0.0001 seconds, and then
it uses the CPU for another 180 seconds. There is no context switch overhead
here, because no other process has had the CPU. Then another byte's worth of
I/O, and so on.
Here's the result of running that one process using the QUIT output flag:
UNIX> ss-fcfs job-cpu-1.txt machine-1.txt 100 100 QUIT=1000000
SYSTEM STATE ---------------------------------------------------------
Time: 1000000.000000
Processes:
555 [CPU-B:555] - Running
Completed-Jobs:
CPU-B COMP: 555 ELAP: 1800.001 CPU: 1800.000 RQ: 0.000 CS: 0.000
SL: 0.000 IO: 0.001 IOW: 0.000 MRQ: 0.000 MIOW: 0.000
CPU Utilization: 99.99% Throughput: 0.00056 Turaround: 1800.00
Avg-Wait-Time: 0.00 MAXRQ: 0.000 MAXIOW: 0.000
UNIX>
Note -- the output tells you, at that point in the simulation, which
processes exist and what state each is in, then gives information on the
completed jobs, then statistics on the whole machine. You may use format.awk to turn that output into html that
looks like:
Time: 1000000.000000
-- Processes:
PID | Name | State
555 | [CPU-B:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-B | 555 | 1800.001 | 1800.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00056 | 1800.00 | 0.00 | 0.000 | 0.000
Test 0 -- An I/O-bound job: A VI process
Below we test running the I/O-bound VI job in the simulator:
ss-fcfs job-one-vi.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
264 | [VI-1:264] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
VI-1 | 264 | 1811.081 | 0.452 | 0.000 | 0.000 | 1810.538 | 0.090 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
0.01% | 0.00026 | 1811.08 | 0.00 | 0.000 | 0.000
Are things as we'd expect? Well, yes:
Each job averages 900 iterations with a sleep time of
2 seconds, so each job should average about 1800 seconds of elapsed time (the
same as a CPU-bound job), followed by an average interarrival wait of 1800
seconds before the next job starts. Each VI job therefore occupies roughly
3600 seconds -- twice what a CPU-bound job does -- so roughly 555/2 = 277 VI
jobs should finish in 1,000,000 seconds.
Test 1 -- A CPU-bound job and a VI job
This test shows the results of running the simulator on
job-cpu-vi.txt. What
would we expect, given a decent scheduler, plus what we know
from the above run of one CPU-bound job? Well, I'd say the
following:
- Since the VI jobs use very little CPU, we would expect that the
performance of our CPU-bound jobs will change very little. In fact,
after 1,000,000 seconds, I would expect 555 CPU-bound jobs to be
completed, just as in the run above where there was no VI job.
- If the scheduler is working well, I'd expect the VI jobs to complete
with little overhead, so roughly 277 jobs should finish.
Of course, when we run this job file with the non-preemptive FCFS
scheduler, we don't get that behavior:
ss-fcfs job-cpu-vi.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
362 | [VI-1:004] | Ready
560 | [CPU-A:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.009 | 1800.000 | 0.006 | 0.002 | 0.000 | 0.001 | 0.000 | 0.005 | 0.000
VI-1 | 4 | 159273.622 | 0.442 | 157526.242 | 0.177 | 1746.673 | 0.089 | 0.000 | 179.800 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00056 | 2926.83 | 1127.21 | 179.800 | 0.000
The CPU-bound jobs finish fine, but since they do not get preempted by the VI
jobs, the VI jobs have to wait up to (roughly) 180 seconds before they get
the CPU (to see that, look at the "Max RQ" value of VI-1). This has
dire consequences -- the VI jobs have an average of over 150,000 seconds of wait
time, and only four of them finish in 1,000,000 seconds.
A second scheduler is in ss-fcfsp. This is a first-come-first-served
scheduler, but it is preemptive, which means that at every scheduling point,
the running job is preempted and put at the end of the ready queue. With
job-cpu-vi, this scheduler has great performance, because every time
the VI process needs the CPU, it gets it, and when it is not running, the
CPU process may run (a rough sketch of this policy appears after the run below):
ss-fcfsp job-cpu-vi.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
817 | [VI-1:262] | Sleeping
818 | [CPU-A:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.409 | 1800.000 | 0.317 | 0.091 | 0.000 | 0.001 | 0.000 | 0.006 | 0.000
VI-1 | 262 | 1920.192 | 0.480 | 0.000 | 0.192 | 1919.424 | 0.096 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00082 | 1838.82 | 0.34 | 0.006 | 0.000
This is nice -- 262 VI jobs complete, and they spend a negligible amount
of time on the ready queue. The CPU jobs do spend some time on the ready
queue, when they are preempted so that the VI jobs can do their little
amount of processing.
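Since the simulator's real scheduler interface isn't reproduced here, the following is only a minimal sketch in C of the preemptive FCFS idea, with invented types and names (Job, rq_append(), scheduler_event()): on every scheduling event the currently running job, if any, goes to the tail of the ready queue, and whatever is now at the head gets the CPU.
#include <stddef.h>

/* Hypothetical job record -- the real simulator's structures will differ. */
typedef struct job {
  int pid;
  struct job *next;
} Job;

static Job *rq_head = NULL, *rq_tail = NULL;   /* FIFO ready queue */
static Job *running = NULL;                    /* job currently on the CPU */

static void rq_append(Job *j)                  /* put a job at the tail */
{
  j->next = NULL;
  if (rq_tail != NULL) rq_tail->next = j; else rq_head = j;
  rq_tail = j;
}

static Job *rq_pop(void)                       /* take the job at the head */
{
  Job *j = rq_head;
  if (j != NULL) {
    rq_head = j->next;
    if (rq_head == NULL) rq_tail = NULL;
    j->next = NULL;
  }
  return j;
}

/* Called at every scheduling point (job arrival, I/O completion, sleep
   expiry, and so on).  Preemptive FCFS: the running job always goes to the
   back of the queue, and whoever is now at the front gets the CPU. */
void scheduler_event(Job *newly_ready)
{
  if (newly_ready != NULL) rq_append(newly_ready);
  if (running != NULL) rq_append(running);     /* preempt the current job */
  running = rq_pop();                          /* may pick the same job again */
}
The key difference from the non-preemptive version is that the running job is re-queued at every event, instead of only when it blocks.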
However, if we extrapolate this and run the simulation longer, we see a
problem:
ss-fcfsp job-cpu-vi.txt machine-1.txt 100 100 QUIT=10000000
Time: 10000000.000000
-- Processes:
PID | Name | State
1996 | [VI-1:647] | Ready
6203 | [CPU-A:5555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 5555 | 1800.102 | 1800.000 | 0.078 | 0.024 | 0.000 | 0.001 | 0.000 | 0.006 | 0.000
VI-1 | 647 | 1902.069 | 0.476 | 0.000 | 0.190 | 1901.307 | 0.095 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
100.00% | 0.00062 | 1810.74 | 0.11 | 180.000 | 0.000
There are only 647 completed VI processes, and the next one (VI-1 number 647)
is caught on the ready queue. What has happened? Well, this
VI process was unfortunate enough to have its CPU burst (which must
be fairly big -- a possibility with exponential distributions)
get preempted by the CPU-bound
process (this can only happen when the CPU process returns
from its I/O to the console). At this point, the CPU process only
gives up the CPU for 100 cycles at a time, and the VI process exhibits
horrendous performance. Note -- the "Max RQ Wait Time" for the
machine is 180 seconds -- this is the VI process waiting for the
CPU process. Note also that the "Job Stats" are only for
completed processes, which is why the "Max RQ" for VI processes
says 0.006 rather than 180.000.
To illustrate this problem more clearly, look at
job-cpu-2.txt. This file
has two identical CPU-bound jobs. When we run it, we
see the following:
ss-fcfsp job-cpu-2.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
1 | [CPU-B:000] | Ready
556 | [CPU-A:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.003 | 1800.000 | 0.000 | 0.002 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000
CPU-B | 0 | 1000000.000 | 0.000 | 999901.111 | 0.555 | 0.000 | 0.000 | 0.000 | 180.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00056 | 1800.00 | 0.00 | 180.000 | 0.000
What is happening is that the CPU-B job never gets executed, except
when CPU-A stops for I/O (and since the I/O time is shorter than the
context-switch time, CPU-B still gets no work done). This is because
CPU-A gets the CPU back whenever a scheduling decision is made. To see this
more clearly, look at the first few events when this job file
is executed:
0.000000: [CPU-B:000] Starting
0.000000: [CPU-B:000] -> Ready
0.000000: [CPU-B:000] -> Running
0.000000: [CPU-A:000] Starting
0.000000: [CPU-A:000] -> Ready
0.000000: [CPU-B:000] -> Ready
0.000000: [CPU-A:000] -> Running
180.000200: [CPU-A:000] Running
180.000200: [CPU-A:000] -> IO on CONSOLE (0.000100)
180.000200: [CPU-B:000] -> Running
180.000300: [CPU-A:000] IO on CONSOLE (0.000100)
180.000300: [CPU-A:000] -> Ready
180.000300: [CPU-B:000] -> Ready
180.000300: [CPU-A:000] -> Running
360.000500: [CPU-A:000] Running
360.000500: [CPU-A:000] -> IO on CONSOLE (0.000100)
360.000500: [CPU-B:000] -> Running
360.000600: [CPU-A:000] IO on CONSOLE (0.000100)
360.000600: [CPU-A:000] -> Ready
360.000600: [CPU-B:000] -> Ready
360.000600: [CPU-A:000] -> Running
540.000800: [CPU-A:000] Running
540.000800: [CPU-A:000] -> IO on CONSOLE (0.000100)
etc.
Test 2 -- A Round-Robin Scheduler
One solution to these problems is to employ a timer interrupt so that
jobs get slices of the CPU. Interestingly, simply adding a timer event
to the preemptive first-come-first-served scheduler does the trick
(a sketch of the timer hook appears after this run).
As you can see below, this alleviates the problem of one process starving
the other:
ss-fcfsp job-cpu-2.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
552 | [CPU-A:276] | Ready
553 | [CPU-B:276] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 276 | 3614.464 | 1800.000 | 1807.231 | 7.232 | 0.000 | 0.001 | 0.000 | 0.050 | 0.000
CPU-B | 276 | 3614.464 | 1800.000 | 1807.231 | 7.232 | 0.000 | 0.001 | 0.000 | 0.050 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.60% | 0.00055 | 3614.46 | 1814.46 | 0.050 | 0.000
(You will note that this takes a while to run -- about 30 seconds on my
Linux box.) Also, you'll see that the CPU utilization and throughput
are slightly lower, since more time is spent context switching.
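Continuing the hypothetical sketch from earlier, a timer interrupt becomes just one more call into the same scheduling routine, which is why the preemptive FCFS policy needs nothing beyond a timer hook to behave like round-robin (again, this is my sketch, not the actual ss-fcfsp code):
/* Hypothetical timer hook: each time the machine's timer fires, treat it
   as an ordinary scheduling point.  With the preemptive FCFS policy
   sketched above, this rotates the ready queue -- round-robin time slicing. */
void timer_interrupt(void)
{
  scheduler_event(NULL);   /* no newly ready job; just rotate the queue */
}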
Now, if we run job-cpu-vi.txt, we'll
see that the VI job has a much more acceptable max wait time:
ss-fcfsp job-cpu-vi.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
838 | [VI-1:285] | Starting
841 | [CPU-A:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.390 | 1800.000 | 0.302 | 0.087 | 0.000 | 0.001 | 0.000 | 0.006 | 0.000
VI-1 | 285 | 1677.013 | 0.418 | 0.586 | 0.169 | 1675.757 | 0.084 | 0.000 | 0.050 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.98% | 0.00084 | 1758.53 | 0.51 | 0.050 | 0.000
And, if we run six VI jobs and a WAV player along with the CPU-bound
job, they all run pretty well:
ss-fcfsp job-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
4939 | [VI-6:266] | Sleeping
5013 | [VI-5:270] | Sleeping
5026 | [VI-2:292] | Sleeping
5029 | [VI-4:280] | Sleeping
5031 | [CPU-A:554] | Running
5032 | [VI-1:289] | Sleeping
5033 | [VI-3:301] | Starting
5036 | [WAV:2777] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 554 | 1803.337 | 1800.000 | 2.450 | 0.886 | 0.000 | 0.001 | 0.000 | 0.008 | 0.000
VI-1 | 289 | 1787.156 | 0.447 | 0.694 | 0.181 | 1785.745 | 0.089 | 0.000 | 0.051 | 0.000
VI-2 | 292 | 1846.704 | 0.461 | 0.755 | 0.187 | 1845.209 | 0.092 | 0.000 | 0.050 | 0.000
VI-3 | 301 | 1629.924 | 0.410 | 0.655 | 0.165 | 1628.612 | 0.082 | 0.000 | 0.052 | 0.000
VI-4 | 280 | 1782.503 | 0.446 | 0.710 | 0.181 | 1781.076 | 0.089 | 0.000 | 0.051 | 0.000
VI-5 | 270 | 1901.600 | 0.476 | 0.760 | 0.193 | 1900.076 | 0.095 | 0.000 | 0.051 | 0.000
VI-6 | 266 | 2064.695 | 0.515 | 0.805 | 0.209 | 2063.063 | 0.103 | 0.000 | 0.051 | 0.000
WAV | 2777 | 360.084 | 0.031 | 0.135 | 0.063 | 312.550 | 47.305 | 0.000 | 0.054 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.90% | 0.00503 | 1015.53 | 0.79 | 0.054 | 0.000
However, suppose we run
job-10-cpu-vi-wav.txt, which
has ten identical CPU-bound jobs, plus two VI jobs and a WAV player:
ss-fcfsp job-10-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
3102 | [CPU-C:055] | Ready
3103 | [CPU-B:055] | Ready
3104 | [CPU-I:055] | Ready
3105 | [CPU-A:055] | Running
3106 | [CPU-H:055] | Ready
3108 | [CPU-E:055] | Ready
3109 | [CPU-F:055] | Ready
3110 | [CPU-G:055] | Ready
3111 | [CPU-D:055] | Ready
3112 | [CPU-J:055] | Ready
3120 | [VI-2:261] | Sleeping
3126 | [WAV:2041] | Sleeping
3127 | [VI-1:263] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 18086.186 | 1800.000 | 16278.248 | 7.937 | 0.000 | 0.001 | 0.000 | 0.453 | 0.000
CPU-B | 55 | 18083.919 | 1800.000 | 16275.982 | 7.936 | 0.000 | 0.001 | 0.000 | 0.452 | 0.000
CPU-C | 55 | 18081.610 | 1800.000 | 16273.675 | 7.935 | 0.000 | 0.001 | 0.000 | 0.453 | 0.000
CPU-D | 55 | 18091.963 | 1800.000 | 16284.023 | 7.938 | 0.000 | 0.001 | 0.000 | 0.453 | 0.000
CPU-E | 55 | 18090.888 | 1800.000 | 16282.948 | 7.939 | 0.000 | 0.001 | 0.000 | 0.454 | 0.000
CPU-F | 55 | 18090.895 | 1800.000 | 16282.955 | 7.939 | 0.000 | 0.001 | 0.000 | 0.452 | 0.000
CPU-G | 55 | 18091.356 | 1800.000 | 16283.417 | 7.938 | 0.000 | 0.001 | 0.000 | 0.454 | 0.000
CPU-H | 55 | 18088.099 | 1800.000 | 16280.161 | 7.937 | 0.000 | 0.001 | 0.000 | 0.453 | 0.000
CPU-I | 55 | 18085.833 | 1800.000 | 16277.896 | 7.937 | 0.000 | 0.001 | 0.000 | 0.455 | 0.000
CPU-J | 55 | 18094.326 | 1800.000 | 16286.386 | 7.940 | 0.000 | 0.001 | 0.000 | 0.453 | 0.000
VI-1 | 263 | 2081.723 | 0.435 | 341.471 | 0.175 | 1739.555 | 0.087 | 0.000 | 0.502 | 0.000
VI-2 | 261 | 2008.683 | 0.419 | 329.326 | 0.169 | 1678.685 | 0.084 | 0.000 | 0.502 | 0.000
WAV | 2041 | 489.677 | 0.031 | 133.132 | 0.062 | 309.595 | 46.858 | 0.000 | 0.502 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.54% | 0.00312 | 3858.71 | 3019.70 | 0.502 | 0.000
While everything runs as you would expect, the max ready-queue time
of the VI and WAV jobs is very bad. Why? Because whenever one of these
I/O-bound jobs needs scheduling, it starts at the end of the ready
queue and has to wait for the 10 CPU-bound jobs to finish their
time slices. Clearly, this motivates the need for a smarter
scheduling algorithm.
Test 3 -- The "Convoy" effect
The book talks about the "Convoy" effect, where a long CPU burst
can cause I/O-bound jobs to stack up. To illustrate this, first
look at
job-convoy-1.txt, which
has three WAV players sharing the same CD-ROM. When we run this
with the FCFS scheduler, all works pretty well, because after
their first jobs, the processes
end up sleeping at different times, and so they access the CD-ROM
drive at essentially random times:
ss-fcfs job-convoy-1.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
632 | [WAV-2:220] | Sleeping
633 | [WAV-3:207] | Sleeping
634 | [WAV-1:205] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
WAV-1 | 205 | 4877.081 | 0.032 | 0.000 | 0.043 | 4827.831 | 48.676 | 0.499 | 0.000 | 0.266
WAV-2 | 220 | 4524.453 | 0.030 | 0.000 | 0.040 | 4478.773 | 45.150 | 0.459 | 0.000 | 0.260
WAV-3 | 207 | 4825.881 | 0.032 | 0.000 | 0.043 | 4777.081 | 48.264 | 0.462 | 0.000 | 0.247
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
0.00% | 0.00063 | 4737.56 | 0.51 | 0.000 | 0.266
Now, add a CPU-bound job to the mix
(job-convoy-2.txt).
Since the scheduler is the
non-preemptive FCFS scheduler, the WAV jobs all wait for the CPU-bound
job, and when that job gives up the CPU, all three WAV jobs end up
competing for the IO device:
ss-fcfs job-convoy-2.txt machine-1.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
565 | [WAV-1:012] | Ready
569 | [WAV-3:014] | Ready
587 | [WAV-2:019] | Ready
603 | [CPU-A:555] | Running
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.011 | 1800.000 | 0.008 | 0.002 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000
WAV-1 | 12 | 78000.514 | 0.043 | 71349.284 | 0.087 | 6520.085 | 65.586 | 65.430 | 179.847 | 0.302
WAV-2 | 19 | 51110.847 | 0.028 | 46668.675 | 0.057 | 4356.395 | 42.976 | 42.716 | 179.841 | 0.302
WAV-3 | 14 | 67281.861 | 0.037 | 61558.926 | 0.075 | 5609.703 | 56.573 | 56.547 | 179.838 | 0.302
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00060 | 6413.44 | 4345.20 | 179.847 | 0.302
Interesting, no? We see the "convoy" effect here, because the long CPU-bound
job causes the I/O-bound processes to wait not only on the ready queue, but
also on the CD-ROM queue. You can see this in the high IO-Wait times for
the three WAV jobs.
The time-slicing scheduler doesn't exhibit this effect -- as you can see
below, the IO-Wait times are once again negligible:
ss-fcfsp job-convoy-2.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
1180 | [WAV-3:206] | Sleeping
1182 | [WAV-2:208] | Sleeping
1183 | [CPU-A:555] | Running
1185 | [WAV-1:213] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1800.180 | 1800.000 | 0.107 | 0.072 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000
WAV-1 | 213 | 4694.719 | 0.031 | 0.089 | 0.062 | 4647.171 | 46.878 | 0.489 | 0.050 | 0.274
WAV-2 | 208 | 4801.816 | 0.032 | 0.095 | 0.063 | 4753.326 | 47.815 | 0.486 | 0.050 | 0.288
WAV-3 | 206 | 4837.843 | 0.032 | 0.098 | 0.064 | 4789.006 | 48.150 | 0.494 | 0.050 | 0.265
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.99% | 0.00118 | 3379.40 | 0.43 | 0.050 | 0.288
As an interesting mental exercise, look at the following job file:
job-convoy-3.txt. This has
20 CPU-bound jobs with the 3 WAV jobs. Would you expect this
to exhibit the convoy effect or not? Think about it.
My feeling was that this would exhibit the convoy effect, since
each WAV job now has to wait for 20 CPU-bound time slices to
use the IO device. Look at the output, though:
ss-fcfsp job-convoy-3.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
1114 | [CPU-20:027] | Ready
1115 | [CPU-14:027] | Ready
1116 | [CPU-10:027] | Ready
1117 | [CPU-06:027] | Ready
1118 | [CPU-04:027] | Ready
1119 | [CPU-12:027] | Ready
1120 | [CPU-19:027] | Ready
1121 | [CPU-09:027] | Ready
1122 | [CPU-11:027] | Ready
1123 | [CPU-15:027] | Ready
1124 | [CPU-08:027] | Ready
1125 | [CPU-13:027] | Ready
1126 | [CPU-07:027] | Ready
1127 | [CPU-02:027] | Running
1128 | [CPU-16:027] | Ready
1129 | [CPU-17:027] | Ready
1130 | [CPU-03:027] | Ready
1131 | [CPU-05:027] | Ready
1132 | [CPU-18:027] | Ready
1133 | [CPU-01:027] | Ready
1144 | [WAV-1:190] | Sleeping
1147 | [WAV-3:199] | Sleeping
1148 | [WAV-2:197] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-01 | 27 | 36155.040 | 1800.000 | 34347.670 | 7.369 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-02 | 27 | 36150.613 | 1800.000 | 34343.244 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-03 | 27 | 36153.168 | 1800.000 | 34345.798 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-04 | 27 | 36145.935 | 1800.000 | 34338.567 | 7.367 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
CPU-05 | 27 | 36153.629 | 1800.000 | 34346.260 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-06 | 27 | 36145.745 | 1800.000 | 34338.378 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-07 | 27 | 36150.544 | 1800.000 | 34343.176 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-08 | 27 | 36149.996 | 1800.000 | 34342.628 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-09 | 27 | 36147.912 | 1800.000 | 34340.544 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-10 | 27 | 36145.458 | 1800.000 | 34338.090 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-11 | 27 | 36148.581 | 1800.000 | 34341.213 | 7.367 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
CPU-12 | 27 | 36146.400 | 1800.000 | 34339.032 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-13 | 27 | 36150.162 | 1800.000 | 34342.793 | 7.368 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
CPU-14 | 27 | 36144.828 | 1800.000 | 34337.460 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-15 | 27 | 36149.776 | 1800.000 | 34342.407 | 7.367 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-16 | 27 | 36151.401 | 1800.000 | 34344.032 | 7.368 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-17 | 27 | 36152.058 | 1800.000 | 34344.689 | 7.368 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
CPU-18 | 27 | 36154.770 | 1800.000 | 34347.401 | 7.369 | 0.000 | 0.001 | 0.000 | 0.950 | 0.000
CPU-19 | 27 | 36147.479 | 1800.000 | 34340.111 | 7.367 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
CPU-20 | 27 | 36143.850 | 1800.000 | 34336.483 | 7.366 | 0.000 | 0.001 | 0.000 | 0.951 | 0.000
WAV-1 | 190 | 5233.246 | 0.033 | 298.251 | 0.065 | 4885.256 | 49.419 | 0.221 | 1.000 | 0.203
WAV-2 | 197 | 5071.679 | 0.032 | 288.103 | 0.063 | 4735.549 | 47.741 | 0.192 | 1.000 | 0.203
WAV-3 | 199 | 5015.408 | 0.031 | 283.834 | 0.062 | 4684.260 | 47.021 | 0.200 | 1.000 | 0.203
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.59% | 0.00113 | 19993.04 | 16624.09 | 1.000 | 0.203
I see no convoy effect. Why?
Well, the answer is that although each WAV job has to wait behind a
bunch of CPU-bound jobs, the WAV jobs sleep for random periods of
time, so they interleave with the CPU-bound jobs randomly, and it is unlikely
that two WAV jobs get placed on the ready queue next to each other. Thus, no
convoy effect. Interesting, no?
Test 4 -- Reducing the latency of I/O-bound jobs
The time-sliced scheduler did a poor job of getting quick response times
to the I/O-bound jobs when there were 10 CPU-bound jobs in the system.
Following the book's lead, I implemented a shortest-job-first (SJF)
scheduler. While intractable in a normal setting (you don't usually
know how long a job will run when you schedule it), since we know
the CPU-burst time of our jobs, we can use it to always schedule the
job with the shortest CPU-burst time. The ready jobs are kept in a simple
red-black tree, and the scheduler is compiled into the executable ss-sjf.
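The real ss-sjf keeps the ready jobs in a red-black tree; the sketch below uses a plain linear scan instead (the names SJob and sjf_pick() are mine), purely to show the selection rule -- always dispatch the ready job whose remaining CPU burst is smallest.
#include <stddef.h>

/* Hypothetical ready-queue node.  The real simulator keeps these in a
   red-black tree keyed on burst time; a linear scan shows the same idea. */
typedef struct sjob {
  int pid;
  double remaining_burst;        /* known CPU-burst time left, in seconds */
  struct sjob *next;
} SJob;

/* Shortest-job-first selection: unlink and return the ready job with the
   smallest remaining CPU burst, or NULL if the list is empty. */
SJob *sjf_pick(SJob **ready_list)
{
  SJob **best = NULL, **p;

  for (p = ready_list; *p != NULL; p = &(*p)->next) {
    if (best == NULL || (*p)->remaining_burst < (*best)->remaining_burst)
      best = p;
  }
  if (best == NULL) return NULL;
  SJob *j = *best;
  *best = j->next;               /* unlink the chosen job */
  j->next = NULL;
  return j;
}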
Here is how it performs on
job-10-cpu-vi-wav.txt:
ss-sjf job-10-cpu-vi-wav.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
2 | [CPU-C:000] | Ready
3799 | [CPU-D:061] | Ready
3817 | [CPU-A:061] | Ready
3821 | [CPU-I:058] | Ready
3826 | [CPU-G:060] | Ready
3828 | [CPU-H:062] | Running
3835 | [CPU-J:062] | Ready
3840 | [CPU-F:063] | Ready
3842 | [VI-1:269] | Starting
3849 | [CPU-E:063] | Ready
3851 | [VI-2:296] | Sleeping
3859 | [WAV:2734] | Sleeping
3860 | [CPU-B:059] | Ready
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 61 | 16211.851 | 1799.978 | 14403.740 | 8.132 | 0.000 | 0.001 | 0.000 | 724.125 | 0.000
CPU-B | 59 | 16948.417 | 1799.980 | 15140.302 | 8.135 | 0.000 | 0.001 | 0.000 | 724.114 | 0.000
CPU-C | 0 | 1000000.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
CPU-D | 61 | 16128.788 | 1799.979 | 14320.676 | 8.132 | 0.000 | 0.001 | 0.000 | 724.121 | 0.000
CPU-E | 63 | 15835.012 | 1799.978 | 14026.898 | 8.135 | 0.000 | 0.001 | 0.000 | 724.123 | 0.000
CPU-F | 63 | 15789.067 | 1799.979 | 13980.946 | 8.141 | 0.000 | 0.001 | 0.000 | 724.122 | 0.000
CPU-G | 60 | 16506.173 | 1799.979 | 14698.058 | 8.134 | 0.000 | 0.001 | 0.000 | 724.123 | 0.000
CPU-H | 62 | 15994.142 | 1799.979 | 14186.026 | 8.137 | 0.000 | 0.001 | 0.000 | 724.121 | 0.000
CPU-I | 58 | 17065.990 | 1799.978 | 15257.884 | 8.127 | 0.000 | 0.001 | 0.000 | 724.122 | 0.000
CPU-J | 62 | 16011.645 | 1799.979 | 14203.531 | 8.135 | 0.000 | 0.001 | 0.000 | 724.122 | 0.000
VI-1 | 269 | 1942.931 | 0.483 | 0.000 | 0.196 | 1942.154 | 0.097 | 0.000 | 0.001 | 0.000
VI-2 | 296 | 1635.607 | 0.408 | 0.000 | 0.165 | 1634.951 | 0.082 | 0.000 | 0.002 | 0.000
WAV | 2734 | 365.646 | 0.032 | 0.000 | 0.064 | 317.497 | 48.054 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00385 | 2842.01 | 2063.85 | 724.125 | 0.000
Well, fortunately, this solves the VI problem -- the max RQ times of these
jobs have dropped to nearly nothing, since they are always scheduled in
preference to the CPU-bound jobs.
However, there is a problem -- the CPU-C process is getting starved. Why?
Well, think about what happens when no I/O-bound process is runnable. At first,
all CPU processes have the same CPU-burst time. One of them is selected,
and it then gets to run to completion before all the others, because
once it does some work, its remaining CPU burst is smaller than everyone
else's. When it is done, another of the CPU-bound processes is selected
arbitrarily, and it too is executed to completion. Which CPU-bound process is
selected is up to the vagaries of the red-black tree library, and I would put
money on CPU-C sitting at the end of the tree, where it never gets picked.
Is this the right thing? While you're thinking about that, see how SJF performs
on
job-10-cpu-one-shorter.txt,
which is the same as
job-10-cpu-vi-wav.txt, except the
first CPU-bound process has 179-second CPU bursts instead of 180-second
CPU bursts:
ss-sjf job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
1 | [CPU-B:000] | Ready
2 | [CPU-C:000] | Ready
3 | [CPU-D:000] | Ready
4 | [CPU-E:000] | Ready
5 | [CPU-F:000] | Ready
6 | [CPU-G:000] | Ready
7 | [CPU-H:000] | Ready
8 | [CPU-I:000] | Ready
9 | [CPU-J:000] | Ready
3815 | [VI-1:282] | Sleeping
3888 | [VI-2:266] | Sleeping
3893 | [CPU-A:555] | Running
3901 | [WAV:2786] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 555 | 1799.144 | 1789.979 | 1.085 | 8.079 | 0.000 | 0.001 | 0.000 | 0.008 | 0.000
CPU-B | 0 | 1000000.000 | 0.000 | 998884.725 | 0.069 | 0.000 | 0.000 | 0.000 | 1440.103 | 0.000
CPU-C | 0 | 1000000.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
CPU-D | 0 | 1000000.000 | 0.000 | 999604.262 | 0.070 | 0.000 | 0.000 | 0.000 | 1440.098 | 0.000
CPU-E | 0 | 1000000.000 | 0.000 | 999964.137 | 0.069 | 0.000 | 0.000 | 0.000 | 1440.076 | 0.000
CPU-F | 0 | 1000000.000 | 0.000 | 999244.421 | 0.069 | 0.000 | 0.000 | 0.000 | 1440.108 | 0.000
CPU-G | 0 | 1000000.000 | 0.000 | 999784.168 | 0.069 | 0.000 | 0.000 | 0.000 | 1440.097 | 0.000
CPU-H | 0 | 1000000.000 | 0.000 | 998704.896 | 0.070 | 0.000 | 0.000 | 0.000 | 1440.138 | 0.000
CPU-I | 0 | 1000000.000 | 0.000 | 999064.552 | 0.070 | 0.000 | 0.000 | 0.000 | 1440.112 | 0.000
CPU-J | 0 | 1000000.000 | 0.000 | 999424.338 | 0.069 | 0.000 | 0.000 | 0.000 | 1440.098 | 0.000
VI-1 | 282 | 1767.495 | 0.438 | 0.000 | 0.178 | 1766.790 | 0.088 | 0.000 | 0.001 | 0.000
VI-2 | 266 | 1802.754 | 0.450 | 0.000 | 0.182 | 1802.031 | 0.090 | 0.000 | 0.001 | 0.000
WAV | 2786 | 358.877 | 0.031 | 0.000 | 0.062 | 311.620 | 47.164 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00389 | 765.32 | 1.38 | 1440.138 | 0.000
Again, the VI and WAV jobs work perfectly. However, since CPU-A's burst time
is less than all the others', it gets scheduled in preference to the others,
and the others get starved.
You'll note that all the others (with the exception of CPU-C) do get
scheduled when CPU-A does its I/O, but that does not help them, because
the I/O time of CPU-A is less than the context-switch overhead. CPU-C
is once again caught at the end of the red-black tree, and never gets
scheduled.
Now, is it right that one job runs while all the others starve? In terms
of throughput, yes. You'll have to think that through for yourself. However,
in terms of fairness, no. So I tweaked ss-sjf so that if a process's
CPU burst is more than the quantum, then it goes into a queue that gets serviced
in a FCFS manner. Here's how it runs on
job-10-cpu-one-shorter.txt:
ss-sjf-nostarve job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
3808 | [CPU-A:055] | Ready
3826 | [CPU-B:055] | Ready
3828 | [CPU-G:055] | Ready
3829 | [CPU-F:055] | Ready
3831 | [CPU-H:055] | Ready
3832 | [CPU-E:055] | Ready
3833 | [CPU-C:055] | Ready
3834 | [CPU-J:055] | Ready
3835 | [CPU-D:055] | Running
3838 | [CPU-I:055] | Ready
3841 | [VI-1:279] | Sleeping
3843 | [VI-2:282] | Starting
3854 | [WAV:2731] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 17995.543 | 1789.998 | 16197.457 | 8.087 | 0.000 | 0.001 | 0.000 | 0.501 | 0.000
CPU-B | 55 | 18083.681 | 1799.998 | 16275.555 | 8.127 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-C | 55 | 18093.465 | 1799.998 | 16285.336 | 8.131 | 0.000 | 0.001 | 0.000 | 0.499 | 0.000
CPU-D | 55 | 18095.703 | 1799.998 | 16287.573 | 8.131 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-E | 55 | 18090.966 | 1799.998 | 16282.837 | 8.130 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-F | 55 | 18090.249 | 1799.998 | 16282.120 | 8.131 | 0.000 | 0.001 | 0.000 | 0.499 | 0.000
CPU-G | 55 | 18089.382 | 1799.998 | 16281.254 | 8.129 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-H | 55 | 18090.680 | 1799.998 | 16282.551 | 8.131 | 0.000 | 0.001 | 0.000 | 0.501 | 0.000
CPU-I | 55 | 18099.806 | 1799.998 | 16291.672 | 8.135 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-J | 55 | 18093.599 | 1799.998 | 16285.469 | 8.131 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
VI-1 | 279 | 1850.821 | 0.462 | 0.000 | 0.187 | 1850.080 | 0.092 | 0.000 | 0.001 | 0.000
VI-2 | 282 | 1702.365 | 0.424 | 0.000 | 0.172 | 1701.684 | 0.085 | 0.000 | 0.001 | 0.000
WAV | 2731 | 366.138 | 0.032 | 0.000 | 0.064 | 317.925 | 48.118 | 0.000 | 0.000 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00384 | 3108.18 | 2331.10 | 0.501 | 0.000
That looks a lot better, doesn't it?
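Before moving on, here is a rough, self-contained sketch of the "nostarve" rule just described. The names (NJob, nostarve_enqueue(), nostarve_pick()) and the two-pool layout are my own guesses, not the actual ss-sjf-nostarve code: jobs whose burst exceeds the quantum go to a FIFO, and that FIFO is serviced only when no short-burst job is ready.
#include <stddef.h>

/* Hypothetical node and two pools for the "nostarve" SJF variant. */
typedef struct njob {
  int pid;
  double burst;                              /* known CPU burst, in seconds */
  struct njob *next;
} NJob;

static NJob *short_pool = NULL;                    /* bursts <= one quantum */
static NJob *long_head = NULL, *long_tail = NULL;  /* FCFS queue of long bursts */

/* Enqueue rule: bursts longer than the quantum are treated as CPU-bound. */
void nostarve_enqueue(NJob *j, double quantum)
{
  j->next = NULL;
  if (j->burst > quantum) {
    if (long_tail != NULL) long_tail->next = j; else long_head = j;
    long_tail = j;
  } else {
    j->next = short_pool;                    /* order here doesn't matter;  */
    short_pool = j;                          /* nostarve_pick scans for min */
  }
}

/* Dispatch rule: shortest burst among the short jobs first; only when that
   pool is empty does the FCFS queue of CPU-bound jobs get the CPU. */
NJob *nostarve_pick(void)
{
  if (short_pool != NULL) {
    NJob **best = &short_pool, **p;
    for (p = &short_pool; *p != NULL; p = &(*p)->next)
      if ((*p)->burst < (*best)->burst) best = p;
    NJob *j = *best;
    *best = j->next;
    j->next = NULL;
    return j;
  }
  NJob *j = long_head;
  if (j != NULL) {
    long_head = j->next;
    if (long_head == NULL) long_tail = NULL;
    j->next = NULL;
  }
  return j;
}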
Test 5 -- We are not omniscient.
Unfortunately, in a real scheduler, we don't know how long the CPU bursts
take. So we need to predict. The book gives a prediction algorithm based
on exponential decay:
P_new = alpha * lastburst + (1 - alpha) * P_old
P starts with an initial value (we set it to zero).
Alpha is a number between 0 and 1. If alpha is 0, then our prediction is
always the initial value of P. If alpha is 1, then our prediction
is always the last CPU burst. Otherwise, it's some combination
of the initial value and the most recent CPU bursts. Obviously,
the closer alpha is to one, the more heavily it weights the
most recent bursts. If you look at the equation, it is an
exponentially decaying average -- the most recent
CPU bursts are always weighted more than the less recent ones.
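In C the update is a single line; the following small, self-contained sketch (the variable names are mine, not taken from ss-exp) shows how the prediction evolves as bursts are observed:
#include <stdio.h>

/* Exponential averaging of CPU-burst predictions:
   new prediction = alpha * last measured burst + (1 - alpha) * old prediction.
   With alpha = 0 the prediction never moves off its initial value; with
   alpha = 1 it is simply the last burst. */
double predict_next_burst(double prediction, double last_burst, double alpha)
{
  return alpha * last_burst + (1.0 - alpha) * prediction;
}

int main(void)
{
  double bursts[] = { 180.0, 0.0005, 180.0, 0.0005 };   /* made-up history */
  double p = 0.0;                                       /* initial prediction */
  double alpha = 0.5;
  for (int i = 0; i < 4; i++) {
    p = predict_next_burst(p, bursts[i], alpha);
    printf("after burst %.4f: prediction = %.4f\n", bursts[i], p);
  }
  return 0;
}
With an intermediate alpha the prediction reacts quickly to a change in burst length while still remembering some history.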
I've hacked this up in ss-exp, where alpha is
compiled in. Here are a few values of alpha:
ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA=0
Time: 1000000.000000
-- Processes:
PID | Name | State
2 | [CPU-C:000] | Ready
3166 | [CPU-A:061] | Ready
3180 | [CPU-I:061] | Ready
3181 | [CPU-B:061] | Ready
3182 | [CPU-F:061] | Ready
3183 | [CPU-H:061] | Running
3185 | [CPU-G:061] | Ready
3186 | [CPU-D:061] | Ready
3187 | [CPU-E:061] | Ready
3193 | [CPU-J:061] | Ready
3204 | [WAV:2139] | Sleeping
3205 | [VI-2:255] | Starting
3206 | [VI-1:251] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 61 | 16179.135 | 1790.000 | 14381.222 | 7.912 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-B | 61 | 16275.727 | 1800.000 | 14467.766 | 7.960 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-C | 0 | 1000000.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
CPU-D | 61 | 16279.946 | 1800.000 | 14471.983 | 7.962 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-E | 61 | 16285.371 | 1800.000 | 14477.406 | 7.964 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-F | 61 | 16277.357 | 1800.000 | 14469.396 | 7.961 | 0.000 | 0.001 | 0.000 | 0.403 | 0.000
CPU-G | 61 | 16279.455 | 1800.000 | 14471.493 | 7.961 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-H | 61 | 16277.934 | 1800.000 | 14469.972 | 7.961 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-I | 61 | 16273.796 | 1800.000 | 14465.836 | 7.959 | 0.000 | 0.001 | 0.000 | 0.404 | 0.000
CPU-J | 61 | 16302.120 | 1800.000 | 14494.147 | 7.972 | 0.000 | 0.001 | 0.000 | 0.403 | 0.000
VI-1 | 251 | 2097.899 | 0.447 | 308.569 | 0.180 | 1788.613 | 0.089 | 0.000 | 0.451 | 0.000
VI-2 | 255 | 2230.469 | 0.474 | 327.345 | 0.190 | 1902.365 | 0.095 | 0.000 | 0.452 | 0.000
WAV | 2139 | 467.093 | 0.030 | 116.246 | 0.061 | 304.647 | 46.109 | 0.000 | 0.452 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.54% | 0.00319 | 3452.33 | 2615.68 | 0.452 | 0.000
As you can see, since the prediction is always 0 (the initial value),
all processes will have the same prediction, and the process that gets
scheduled is the one lucky enough to be at the front of the red-black
tree. The unfortunate process (CPU-C) gets starved.
Here are the results when alpha is 1.0, and we only look at the
last CPU burst:
ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 1
Time: 1000000.000000
-- Processes:
PID | Name | State
2049 | [CPU-I:113] | Ready
3881 | [CPU-A:033] | Ready
3885 | [CPU-B:037] | Ready
3889 | [CPU-C:042] | Ready
3901 | [CPU-J:035] | Running
3902 | [CPU-G:035] | Ready
3903 | [CPU-E:035] | Ready
3904 | [CPU-D:035] | Ready
3907 | [CPU-H:145] | Ready
3933 | [VI-1:294] | Starting
3936 | [VI-2:270] | Sleeping
3938 | [CPU-F:036] | Ready
3939 | [WAV:2817] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 33 | 29826.609 | 1789.997 | 28028.510 | 8.101 | 0.000 | 0.001 | 0.000 | 491520.500 | 0.000
CPU-B | 37 | 26633.161 | 1799.998 | 24825.024 | 8.138 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-C | 42 | 23486.506 | 1799.997 | 21678.371 | 8.137 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-D | 35 | 28268.342 | 1800.000 | 26460.195 | 8.146 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-E | 35 | 28268.160 | 1800.000 | 26460.013 | 8.146 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-F | 36 | 27762.958 | 1799.999 | 25954.809 | 8.148 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-G | 35 | 28267.211 | 1800.000 | 26459.062 | 8.148 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
CPU-H | 145 | 6828.059 | 1799.984 | 5019.935 | 8.139 | 0.000 | 0.001 | 0.000 | 24576.450 | 0.000
CPU-I | 113 | 4608.716 | 1799.980 | 2800.600 | 8.136 | 0.000 | 0.001 | 0.000 | 30720.350 | 0.000
CPU-J | 35 | 28261.093 | 1800.000 | 26452.946 | 8.146 | 0.000 | 0.001 | 0.000 | 393216.600 | 0.000
VI-1 | 294 | 1793.394 | 0.445 | 0.602 | 0.181 | 1792.077 | 0.090 | 0.000 | 0.093 | 0.000
VI-2 | 270 | 1844.981 | 0.458 | 0.591 | 0.186 | 1843.655 | 0.092 | 0.000 | 0.053 | 0.000
WAV | 2817 | 354.949 | 0.031 | 0.159 | 0.062 | 308.070 | 46.627 | 0.000 | 0.056 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00393 | 2915.46 | 2149.81 | 491520.500 | 0.000
This is much better, since the I/O-bound processes get the CPU in preference
to the CPU-bound processes. However, the CPU-bound processes are still at the
mercy of the luck of the red-black tree. Some of them (CPU-H and CPU-I)
are getting scheduled more than the others. I haven't done any tracing to
see why this is the case.
Here we set alpha to 0.5. The performance looks pretty much identical
to when alpha is 1, with one main difference -- the VI/WAV processes
have smaller max wait times. Think about why. (The answer has to do with what
happens when a CPU-bound process does I/O and, for a moment, looks like a
better process to schedule than a VI/WAV process.)
ss-exp job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 0.5
Time: 1000000.000000
-- Processes:
PID | Name | State
2199 | [CPU-A:007] | Ready
2363 | [CPU-D:011] | Ready
2610 | [CPU-E:031] | Ready
2663 | [CPU-G:146] | Ready
2932 | [CPU-J:026] | Ready
3062 | [CPU-H:025] | Ready
3680 | [CPU-I:030] | Ready
3839 | [CPU-F:091] | Ready
3932 | [VI-1:278] | Sleeping
3933 | [CPU-C:153] | Running
3934 | [CPU-B:028] | Ready
3935 | [WAV:2796] | Sleeping
3936 | [VI-2:302] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 7 | 80427.720 | 1789.982 | 78629.684 | 8.053 | 0.000 | 0.001 | 0.000 | 489903.924 | 0.000
CPU-B | 28 | 35704.675 | 1799.979 | 33896.577 | 8.117 | 0.000 | 0.001 | 0.000 | 457083.407 | 0.000
CPU-C | 153 | 6532.803 | 1799.980 | 4724.686 | 8.136 | 0.000 | 0.001 | 0.000 | 268510.822 | 0.000
CPU-D | 11 | 54949.375 | 1799.978 | 53141.265 | 8.131 | 0.000 | 0.001 | 0.000 | 487968.439 | 0.000
CPU-E | 31 | 21425.752 | 1799.980 | 19617.651 | 8.119 | 0.000 | 0.001 | 0.000 | 400598.916 | 0.000
CPU-F | 91 | 10728.378 | 1799.978 | 8920.276 | 8.122 | 0.000 | 0.001 | 0.000 | 460242.783 | 0.000
CPU-G | 146 | 4630.732 | 1799.980 | 2822.624 | 8.128 | 0.000 | 0.001 | 0.000 | 38335.150 | 0.000
CPU-H | 25 | 31185.983 | 1799.978 | 29377.894 | 8.109 | 0.000 | 0.001 | 0.000 | 461193.262 | 0.000
CPU-I | 30 | 31122.171 | 1799.980 | 29314.073 | 8.117 | 0.000 | 0.001 | 0.000 | 399565.231 | 0.000
CPU-J | 26 | 28719.107 | 1799.978 | 26911.018 | 8.110 | 0.000 | 0.001 | 0.000 | 434248.020 | 0.000
VI-1 | 278 | 1841.408 | 0.460 | 0.000 | 0.186 | 1840.670 | 0.092 | 0.000 | 0.003 | 0.000
VI-2 | 302 | 1529.457 | 0.380 | 0.000 | 0.155 | 1528.845 | 0.076 | 0.000 | 0.002 | 0.000
WAV | 2796 | 357.575 | 0.031 | 0.000 | 0.062 | 310.489 | 46.993 | 0.000 | 0.001 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00392 | 2527.23 | 1772.99 | 489903.924 | 0.000
|
Check out the wild asymmetry with the CPU-bound processes. Think about why that
would be the case.
Test 6 -- Predictions with a threshold
As with our "nostarve" version of SJF, we'd like to treat those
CPU-bound jobs fairly, and clearly the CPU-burst prediction alone is not
enough. So, what I've done is modify the scheduler so that all
predictions that are over a threshold (90% of the timer quantum)
get put into a queue which is serviced in a FCFS manner. The results
(with alpha equal to 0.5) are excellent:
ss-exp-thresh job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000 -- ALPHA = 0.5
Time: 1000000.000000
-- Processes:
PID | Name | State
3861 | [CPU-A:055] | Ready
3875 | [CPU-D:055] | Ready
3883 | [CPU-I:055] | Ready
3885 | [CPU-J:055] | Running
3886 | [CPU-B:055] | Ready
3888 | [CPU-C:055] | Ready
3889 | [CPU-E:055] | Ready
3890 | [CPU-F:055] | Ready
3891 | [CPU-G:055] | Ready
3892 | [CPU-H:055] | Ready
3907 | [VI-1:284] | Starting
3908 | [WAV:2763] | Sleeping
3909 | [VI-2:300] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 17985.957 | 1789.997 | 16187.859 | 8.100 | 0.000 | 0.001 | 0.000 | 5.800 | 0.000
CPU-B | 55 | 18091.333 | 1799.996 | 16283.192 | 8.144 | 0.000 | 0.001 | 0.000 | 6.050 | 0.000
CPU-C | 55 | 18097.953 | 1799.996 | 16289.812 | 8.143 | 0.000 | 0.001 | 0.000 | 5.750 | 0.000
CPU-D | 55 | 18056.583 | 1799.997 | 16248.438 | 8.148 | 0.000 | 0.001 | 0.000 | 5.750 | 0.000
CPU-E | 55 | 18101.097 | 1799.997 | 16292.956 | 8.143 | 0.000 | 0.001 | 0.000 | 4.900 | 0.000
CPU-F | 55 | 18105.307 | 1799.997 | 16297.166 | 8.143 | 0.000 | 0.001 | 0.000 | 5.800 | 0.000
CPU-G | 55 | 18109.048 | 1799.997 | 16300.907 | 8.143 | 0.000 | 0.001 | 0.000 | 4.750 | 0.000
CPU-H | 55 | 18109.588 | 1799.996 | 16301.448 | 8.142 | 0.000 | 0.001 | 0.000 | 5.800 | 0.000
CPU-I | 55 | 18084.637 | 1799.996 | 16276.493 | 8.146 | 0.000 | 0.001 | 0.000 | 5.800 | 0.000
CPU-J | 55 | 18087.457 | 1799.996 | 16279.315 | 8.145 | 0.000 | 0.001 | 0.000 | 5.800 | 0.000
VI-1 | 284 | 1921.134 | 0.479 | 0.000 | 0.194 | 1920.364 | 0.096 | 0.000 | 0.049 | 0.000
VI-2 | 300 | 1670.144 | 0.415 | 0.000 | 0.169 | 1669.477 | 0.083 | 0.000 | 0.002 | 0.000
WAV | 2763 | 361.856 | 0.031 | 0.000 | 0.063 | 314.206 | 47.556 | 0.000 | 0.001 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00390 | 3077.25 | 2298.29 | 6.050 | 0.000
Test 7 -- Using a Multilevel Feedback Queue
Finally, the only part of this that is still suboptimal is the context switch
overhead of the CPU-bound processes. This is high because these processes
are being time-sliced at the granularity of one time quantum, which means
that every 50 ms of CPU time costs 0.2 ms of context switch overhead.
To fix this, we can use a multilevel feedback queue both to partition the
jobs into I/O-bound and CPU-bound queues, and to give larger time quanta to
the CPU-bound queue.
First, ss-mlfq-1 implements a three-level queue. The first level is
where processes go that have just become ready (because they started, or
returned from sleeping or IO), or that gave up the CPU voluntarily due to a
non-timer event.
The second level is for processes that have been evicted from the CPU due
to one timer interrupt. The third level is for processes that have been
evicted due to multiple timer interrupts. These are the CPU-bound jobs.
The scheduler processes each queue in a round-robin fashion, but it only
gives the CPU to a job in the second-level queue if there are none in the
first-level queue. Similarly, it only gives the CPU to a job in the third-level
queue if there are none in the other queues.
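The queue-selection rule itself is small; here is a minimal sketch in C (the MJob type, the array-of-three-queues layout and the function names are assumptions, not the actual ss-mlfq-1 source):
#include <stddef.h>

typedef struct mjob { int pid; struct mjob *next; } MJob;

/* Hypothetical three-level queue: 0 = just became ready or gave up the CPU
   voluntarily, 1 = evicted by one timer interrupt, 2 = evicted repeatedly
   (the CPU-bound jobs). */
static MJob *level_head[3], *level_tail[3];

void mlfq_enqueue(MJob *j, int level)
{
  j->next = NULL;
  if (level_tail[level] != NULL) level_tail[level]->next = j;
  else level_head[level] = j;
  level_tail[level] = j;
}

/* Round-robin within a level, but never touch level 1 while level 0 has
   jobs, and never touch level 2 while levels 0 or 1 have jobs. */
MJob *mlfq_pick(void)
{
  for (int lvl = 0; lvl < 3; lvl++) {
    MJob *j = level_head[lvl];
    if (j != NULL) {
      level_head[lvl] = j->next;
      if (level_head[lvl] == NULL) level_tail[lvl] = NULL;
      j->next = NULL;
      return j;
    }
  }
  return NULL;
}
Which level a job is enqueued on (0, 1 or 2) would be decided by how it last gave up the CPU, as described above.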
Here is the result of running ss-mlfq-1:
ss-mlfq-1 job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
3892 | [CPU-A:055] | Ready
3909 | [CPU-C:055] | Ready
3910 | [CPU-D:055] | Ready
3911 | [CPU-F:055] | Ready
3912 | [CPU-I:055] | Ready
3913 | [CPU-H:055] | Running
3914 | [CPU-J:055] | Ready
3915 | [CPU-B:055] | Ready
3916 | [CPU-E:055] | Ready
3919 | [CPU-G:055] | Ready
3922 | [VI-1:263] | Starting
3934 | [VI-2:291] | Starting
3936 | [WAV:2820] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 18000.294 | 1790.000 | 16202.224 | 8.069 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-B | 55 | 18093.843 | 1800.000 | 16285.730 | 8.112 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-C | 55 | 18085.517 | 1800.000 | 16277.407 | 8.109 | 0.000 | 0.001 | 0.000 | 0.542 | 0.000
CPU-D | 55 | 18085.738 | 1800.000 | 16277.629 | 8.108 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-E | 55 | 18095.912 | 1800.000 | 16287.797 | 8.114 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-F | 55 | 18087.531 | 1800.000 | 16279.420 | 8.110 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-G | 55 | 18098.993 | 1800.000 | 16290.877 | 8.114 | 0.000 | 0.001 | 0.000 | 0.543 | 0.000
CPU-H | 55 | 18088.812 | 1800.000 | 16280.702 | 8.110 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-I | 55 | 18087.600 | 1800.000 | 16279.491 | 8.109 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
CPU-J | 55 | 18093.169 | 1800.000 | 16285.057 | 8.112 | 0.000 | 0.001 | 0.000 | 0.500 | 0.000
VI-1 | 263 | 1717.583 | 0.428 | 0.000 | 0.172 | 1716.897 | 0.086 | 0.000 | 0.050 | 0.000
VI-2 | 291 | 1657.332 | 0.414 | 0.001 | 0.166 | 1656.668 | 0.083 | 0.000 | 0.050 | 0.000
WAV | 2820 | 354.542 | 0.031 | 0.000 | 0.062 | 307.855 | 46.594 | 0.000 | 0.050 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.52% | 0.00392 | 3027.21 | 2282.31 | 0.543 | 0.000
Note that this doesn't solve the context-switch problem, but it does work
nicely, just like our predictive scheduler.
Now, ss-mlfq-2 is just like ss-mlfq-1, except that it only evicts
a CPU-bound job from the CPU if there is a job on another queue, or if 10 time
quanta have passed since it was scheduled. In that way, the context-switch overhead
of CPU-bound jobs should be reduced:
ss-mlfq-2 job-10-cpu-one-shorter.txt machine-2.txt 100 100 QUIT=1000000
Time: 1000000.000000
-- Processes:
PID | Name | State
3911 | [CPU-A:055] | Ready
3931 | [CPU-J:055] | Ready
3933 | [CPU-G:055] | Ready
3934 | [CPU-C:055] | Ready
3935 | [CPU-B:055] | Running
3939 | [CPU-I:055] | Ready
3940 | [CPU-D:055] | Ready
3942 | [CPU-E:055] | Ready
3944 | [CPU-F:055] | Ready
3950 | [CPU-H:055] | Ready
3953 | [VI-1:276] | Sleeping
3969 | [VI-2:289] | Sleeping
3972 | [WAV:2845] | Sleeping
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 17909.054 | 1790.000 | 16118.001 | 1.052 | 0.000 | 0.001 | 0.000 | 3.795 | 0.000
CPU-B | 55 | 18015.246 | 1800.000 | 16214.188 | 1.058 | 0.000 | 0.001 | 0.000 | 3.779 | 0.000
CPU-C | 55 | 18012.957 | 1800.000 | 16211.899 | 1.057 | 0.000 | 0.001 | 0.000 | 3.750 | 0.000
CPU-D | 55 | 18024.064 | 1800.000 | 16223.005 | 1.058 | 0.000 | 0.001 | 0.000 | 3.750 | 0.000
CPU-E | 55 | 18029.074 | 1800.000 | 16228.015 | 1.059 | 0.000 | 0.001 | 0.000 | 3.769 | 0.000
CPU-F | 55 | 18032.209 | 1800.000 | 16231.149 | 1.059 | 0.000 | 0.001 | 0.000 | 3.797 | 0.000
CPU-G | 55 | 18011.490 | 1800.000 | 16210.431 | 1.057 | 0.000 | 0.001 | 0.000 | 3.750 | 0.000
CPU-H | 55 | 18057.137 | 1800.000 | 16256.076 | 1.060 | 0.000 | 0.001 | 0.000 | 3.760 | 0.000
CPU-I | 55 | 18024.013 | 1800.000 | 16222.954 | 1.058 | 0.000 | 0.001 | 0.000 | 3.759 | 0.000
CPU-J | 55 | 18000.321 | 1800.000 | 16199.263 | 1.057 | 0.000 | 0.001 | 0.000 | 3.764 | 0.000
VI-1 | 276 | 1846.165 | 0.460 | 0.001 | 0.184 | 1845.428 | 0.092 | 0.000 | 0.050 | 0.000
VI-2 | 289 | 1705.380 | 0.427 | 0.001 | 0.171 | 1704.695 | 0.085 | 0.000 | 0.050 | 0.000
WAV | 2845 | 351.397 | 0.031 | 0.000 | 0.061 | 305.124 | 46.181 | 0.000 | 0.050 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.91% | 0.00396 | 3007.19 | 2251.81 | 3.797 | 0.000
Note two things here -- first, the context switch overhead has indeed been reduced.
Second, as a result, the CPU utilization has gone up, the job throughput has gone
up, and the turnaround time has gone down. In fact, the only metric that has
gone the "wrong way" is the Max RQ time, which has increased dramatically.
However, since these CPU-bound jobs should not care about response time, this
is not really important.
As a final aside, if I change the quantum for CPU-bound jobs to 100 times the
regular quantum, there is further improvement, but it is not ten-fold:
Time: 1000000.000000
-- Processes:
PID | Name | State
3902 | [CPU-A:055] | Ready
3914 | [CPU-C:055] | Ready
3921 | [CPU-F:055] | Running
3922 | [CPU-B:055] | Ready
3925 | [CPU-E:055] | Ready
3930 | [CPU-G:055] | Ready
3933 | [CPU-J:055] | Ready
3935 | [CPU-H:055] | Ready
3940 | [CPU-D:055] | Ready
3943 | [CPU-I:055] | Ready
3967 | [WAV:2839] | Sleeping
3968 | [VI-1:296] | Starting
3969 | [VI-2:272] | Starting
Job Stats (Averages unless specified)
Name | Completed | Elapsed | CPU | RQ | CS | Sleep | IO | IO-Wait | Max RQ | Max IO-Wait
CPU-A | 55 | 17900.692 | 1790.000 | 16110.203 | 0.488 | 0.000 | 0.001 | 0.000 | 10.453 | 0.000
CPU-B | 55 | 18003.654 | 1800.000 | 16203.162 | 0.491 | 0.000 | 0.001 | 0.000 | 10.450 | 0.000
CPU-C | 55 | 17973.835 | 1800.000 | 16173.344 | 0.491 | 0.000 | 0.001 | 0.000 | 10.440 | 0.000
CPU-D | 55 | 18049.502 | 1800.000 | 16249.009 | 0.492 | 0.000 | 0.001 | 0.000 | 10.463 | 0.000
CPU-E | 55 | 18008.011 | 1800.000 | 16207.519 | 0.492 | 0.000 | 0.001 | 0.000 | 10.452 | 0.000
CPU-F | 55 | 18002.155 | 1800.000 | 16201.663 | 0.491 | 0.000 | 0.001 | 0.000 | 10.450 | 0.000
CPU-G | 55 | 18009.626 | 1800.000 | 16209.133 | 0.492 | 0.000 | 0.001 | 0.000 | 10.444 | 0.000
CPU-H | 55 | 18025.738 | 1800.000 | 16225.245 | 0.492 | 0.000 | 0.001 | 0.000 | 10.445 | 0.000
CPU-I | 55 | 18062.129 | 1800.000 | 16261.635 | 0.493 | 0.000 | 0.001 | 0.000 | 10.437 | 0.000
CPU-J | 55 | 18021.723 | 1800.000 | 16221.230 | 0.492 | 0.000 | 0.001 | 0.000 | 10.447 | 0.000
VI-1 | 296 | 1597.780 | 0.401 | 0.001 | 0.160 | 1597.137 | 0.080 | 0.000 | 0.050 | 0.000
VI-2 | 272 | 1867.860 | 0.465 | 0.000 | 0.187 | 1867.115 | 0.093 | 0.000 | 0.042 | 0.000
WAV | 2839 | 351.925 | 0.031 | 0.000 | 0.061 | 305.583 | 46.250 | 0.000 | 0.050 | 0.000
Overall Stats
CPU Utilization | Job Throughput | Turnaround Time | Avg. Wait Time | Max. RQ Wait Time | Max. IO Wait Time
99.95% | 0.00396 | 3003.10 | 2252.71 | 10.463 | 0.000
There should be one final fix, in my opinion. I think that jobs on the first
queue should be ordered by prediction, so that when the CPU-bound jobs revert
to the top queue, they are scheduled with lower priority than the I/O-bound
jobs. That would most likely lower the Max RQ time.