From rich@cs.utk.edu Fri Feb 19 16:33:58 1999
Date: Fri, 19 Feb 1999 16:34:50 -0500 (EST)
From: rich@cs.utk.edu (Rich the Wolski)
To: plank@cs.utk.edu
Subject: STIGMATA 99 reviews

From pc99@cs.usask.ca Wed Jan 13 11:54 PST 1999
Date: Wed, 13 Jan 1999 13:54:28 -0600 (CST)
From: Program Chair - SIGMETRICS 99
To: rich@cs.ucsd.edu
Subject: 1999 ACM SIGMETRICS paper 1035.5144 reviews

----- Review of Paper 1035.5144 -------

Title: Predicting the CPU Availability of Time-shared Unix Systems
Reviewer type: PC member
Reviewer: 0-0

Originality:     [2]
Technical merit: [2]
Readability:     [3]
Relevance:       [3]
Overall rating:  [2]

Recommended Action: Weak Reject

---Comments for the Author---

In this paper, the authors discuss forecasts of CPU availability on time-shared
Unix systems. In general, I feel that although the application the authors
study is very interesting, the technical content of this paper is weak and
needs to be improved.

In the first part of the paper, the authors discuss measurement using "uptime",
"vmstat", and a probe process. Equations (1) and (5) are not clear to me: I am
not convinced of the forms of these two equations, nor of how they relate to
each other. It seems that the CPU availability computed in Equations (1) and
(2) is not the same quantity as that measured by the probe process; therefore,
I doubt whether the measurement-accuracy section (Section 2.2) is meaningful.
Another factor affecting the accuracy of measurement is the stochastic nature
of the probe process and of the two measures; however, this was not considered
by the authors.

In the second part of the paper, the authors discuss prediction results.
Looking at Figures (1) and (3), it appears that CPU availability has periodic
components, as evidenced by the autocorrelation function. This should be
explained. Also, since the authors used one day's worth of data, I doubt
whether the data are stationary and whether the approaches generally used for a
stationary process are still valid. In addition, the authors state that
"self-similarity is often interpreted as an indication of unpredictability". I
do not agree with this statement, because it is known that long-range
dependence may help prediction (see reference [1]).

[1] Jan Beran. "Statistics for Long-Memory Processes", Chapman & Hall, 1994.
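For context on the measurements questioned above, here is a minimal sketch of
how CPU availability is often estimated at user level from uptime and vmstat.
It is a hypothetical illustration only: the 1/(load + 1) heuristic, the
function names, and the parsing details are assumptions of this sketch, not the
paper's Equations (1), (2), or (5).

    import subprocess

    def availability_from_uptime():
        """Heuristic: a new CPU-bound process on a host whose one-minute load
        average is L gets roughly 1/(L + 1) of the CPU."""
        out = subprocess.check_output(["uptime"], text=True)
        # Assumes the Linux-style tail "load average: 0.42, 0.35, 0.30";
        # the format varies slightly across Unix flavors.
        load1 = float(out.rsplit("load average:", 1)[1].split(",")[0])
        return 1.0 / (load1 + 1.0)

    def availability_from_vmstat(samples=3):
        """Heuristic: average the CPU idle column over a few one-second vmstat
        samples (column layout varies by Unix flavor)."""
        out = subprocess.check_output(["vmstat", "1", str(samples + 1)], text=True)
        lines = out.strip().splitlines()
        idle_col = lines[1].split().index("id")   # header row names the columns
        rows = lines[3:]                          # skip the since-boot summary row
        idles = [float(r.split()[idle_col]) for r in rows]
        return sum(idles) / len(idles) / 100.0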

----- Review of Paper 1035.5144 -------

Title: Predicting the CPU Availability of Time-shared Unix Systems
Reviewer type: PC member
Reviewer: 0-1

Originality:     [2]
Technical merit: [2]
Readability:     [3]
Relevance:       [3]
Overall rating:  [2]

Recommended Action: Weak Reject

---Comments for the Author---

This paper considers the problem of predicting available CPU performance to
support dynamic schedulers, motivated partly by recent advances in distributed
system environments. The basic problem I have with this paper is that it makes
a relatively small technical contribution to the field, one that is not
sufficient to warrant acceptance into the SIGMETRICS program.

Most of the methods used in Section 3 assume that the time series is
stationary, but the authors do not seem to have tested for stationarity. These
tests must be performed (if they haven't been), and the paper needs to describe
their results. If the time-series data is not stationary (which would not be
surprising), then most of the methods in Section 3 are not being correctly
applied.

Also, were the experiments presented in Section 2 repeated for different days
and periods? If so, these results should be presented. If not, the authors
should conduct more than a single set of experiments and present the results in
Section 2.

----- Review of Paper 1035.5144 -------

Title: Predicting the CPU Availability of Time-shared Unix Systems
Reviewer type: PC member
Reviewer: 1-0

Originality:     [1]
Technical merit: [1]
Readability:     [2]
Relevance:       [1]
Overall rating:  [1]

Recommended Action: Reject

---Comments for the Author---

This paper presents a measurement study of the efficacy of a method for
measuring CPU availability in order to effectively schedule applications in a
meta-computing setting. Via measurements, the author(s) illustrate the
effectiveness of two Unix utilities for measuring load versus the NWS sensor,
which combines both utilities to compute the load. The experimental platform is
composed of three graduate-student workstations and three departmental servers
at UCSD. The author(s) report the mean error of the three methods over a
24-hour period, then present forecasts of CPU availability using the three
methodologies and conclude that the prediction error is at an acceptable level.
Finally, they characterize the degree of workload self-similarity over both the
short and the long term. Their conclusion that recent history is often a good
predictor of the short-term future is not surprising, given the bursty nature
of workloads reported by many previous studies in similar settings.

My main reservation has to do with the general approach. Isn't it more useful
to be able to predict how much time each of the executing jobs still needs to
finish? It seems that being able to predict the remaining execution time of a
job (in the same spirit as the Harchol-Balter and Downey SIGMETRICS '96 paper)
would be of great use to scheduling in a meta-computing setting. The analysis
of the self-similar behavior (especially the long-term behavior) does not seem
to be as relevant as the remaining execution time of the current jobs,
especially when the problem is approached purely from the scheduling point of
view. Characterization studies are by nature inductive, covering only one set
of the possibilities.

The referee's suggestion is that the authors should experiment more (i.e., look
at more machines of different types and at various time frames -- could the
reported results change from day to day?) to better support their conclusions
and to give more validity to their observations. For instance, reporting only
the mean error on measurements across the whole 24-hour period can be
misleading -- at some observation instances errors may be very high, while at
others they may not be as bad (e.g., it would be interesting to report the
variance of the measurement errors in Table 1). If such a situation occurs, the
scheduler can be misled.

----- Review of Paper 1035.5144 -------

Title: Predicting the CPU Availability of Time-shared Unix Systems
Reviewer type: referee
Reviewer: 2-0

Originality:     [3]
Technical merit: [3]
Readability:     [3]
Relevance:       [4]
Overall rating:  [3]

Recommended Action: Weak Accept

---Comments for the Author---

(Summary of my understanding of the paper)

This paper studies short-term (10-second horizon) and medium-term (5-minute
horizon) forecasting of the CPU availability of time-shared Unix systems. The
authors begin by showing that CPU availability, by which they mean the
percentage of the machine's cycles that a potential new normal-priority process
would receive, can be accurately measured under normal conditions using
information provided by the standard uptime and vmstat utilities. They
introduce a hybrid sensor that performs more accurately when there are
reduced-priority processes, but find that it has difficulty detecting
long-running processes. Next, they evaluate the one-step-ahead (10 seconds into
the future) prediction error of prediction algorithms implemented in the
Network Weather Service on a small collection of measurement traces with a
10-second granularity. The main result here is that the one-step-ahead
prediction error is on par with the measurement error, which bodes well for
using prediction. However, they find that their traces exhibit self-similarity
(an effect noted by others), which leads them to believe that longer-term
prediction may be difficult. Happily, this does not appear to be the case, and
they are able to make useful predictions of the average availability over the
next five minutes.

(Comments)

I'm most impressed by the first part of the paper. Showing that one can (in
most cases) accurately estimate the percentage of the CPU a new process would
get from measurements of the load average and other metrics available at user
level is a useful contribution. Although the sample size from which the
conclusion is drawn is somewhat small (six machines, 24 hours), the result
jibes with my experience. Given the data, it is difficult to judge how useful
the hybrid sensor is.

The second part of the paper, which studies prediction, is less convincing.
Although I am familiar with the prediction methods used in the Network Weather
Service, I'm at a loss as to which of them is actually being studied in this
paper. It is indeed very interesting that one-step-ahead (10-second) prediction
errors for CPU availability are about the same as the measurement error.
However, the authors don't offer an explanation for why this is so, and in the
absence of a description of the prediction method it is difficult for the
reader to judge. Perhaps prediction decorrelates measurement error, which
results in some gain for short-term predictions? It would be very useful to see
what "gain" prediction provides over the raw variance of the signal.
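As a concrete illustration of that comparison, a minimal sketch (the function
and variable names are hypothetical, not from the paper): it reports the
one-step-ahead RMSE next to the raw standard deviation of the measured signal,
plus the fraction of the signal's variance that the predictor removes.

    import numpy as np

    def prediction_gain(measured, predicted):
        """Compare one-step-ahead prediction error with the raw variability of
        the signal (i.e., the error of always predicting the mean)."""
        measured = np.asarray(measured, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        mse = np.mean((measured - predicted) ** 2)
        var = np.var(measured)
        return {
            "rmse": float(np.sqrt(mse)),
            "signal_std": float(np.sqrt(var)),
            "gain": 1.0 - mse / var,   # > 0 means the predictor beats the signal's own variance
        }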

The point of prediction is to provide a more tightly bounded, high-confidence
estimate of future availability. I would like to see how much tighter the
bounds are given prediction.

The study of the autocorrelation structure of CPU availability and the
confirmation of self-similarity is interesting. However, Section 3.2 addresses
one-step prediction errors for the series aggregated over 5 minutes, which is
not exactly the "medium term" I was expecting from the abstract; I was
expecting an analysis of k-step-ahead errors on the original series. It isn't
too surprising that the aggregated series doesn't become vastly less
predictable. Suppose the k-step-ahead prediction error is N(0, s_k); then the
prediction error for the n-step aggregate is N(0, sqrt(s_1^2 + ... + s_n^2)),
which for s_k = S is N(0, sqrt(n)*S) -- i.e., the error grows as O(sqrt(n))
with aggregation level n. Of course, how well this analysis holds depends on
how white and normal the prediction errors are, which isn't explained.

(Minor nitpicks)

p. 2, last line: chaotic systems are not the simplest systems that display
self-similarity -- let's not throw out fractional ARIMAs, FGNs, etc. just yet.
Furthermore, they are not necessarily unpredictable. See, for example:

@Book{ABARBANEL-CHAOTIC-DATA-BOOK,
  author    = "Henry Abarbanel",
  title     = "Analysis of Observed Chaotic Data",
  publisher = "Springer",
  series    = "Institute for Nonlinear Science",
  year      = "1996",
}

p. 4: While the image of grad students devoting themselves to research at the
end of the semester was hilarious, I would have preferred to also have a more
detailed description of the machines and workloads. Furthermore, from your
description, it seems the data are really representative of your department's
end-of-semester behavior, not its overall behavior.

Reporting absolute error as a percentage is confusing. I know what you meant,
but I had to keep reminding myself. I really would have liked to see a
comparison of prediction error and the variance of what was being predicted.

p. 9: I believe this is known as a multiple-experts problem in AI. You may want
to look at:

@InProceedings{ON-LINE-LEARNING-MTS-PROCESS-MIGRATION-CBURCH-COLT97,
  author    = "Avrim Blum and Carl Burch",
  title     = "On-line Learning and the Metrical Task System Problem",
  booktitle = "Proceedings of the 10th Annual Conference on Computational
               Learning Theory ({COLT} '97)",
  pages     = "45--53",
  year      = "1997",
}

p. 12: I would really avoid dragging in chaos when it may not be necessary. The
simplest self-similar systems are not chaotic, and many chaotic systems are
predictable.

Using graphs instead of tables would help the presentation considerably.

Some refs you may want to consider adding:

@Article{WORKSTATION-AVAIL-STATS-CONDOR,
  author  = "Matt W. Mutka and Miron Livny",
  title   = "The Available Capacity of a Privately Owned Workstation Environment",
  journal = "Performance Evaluation",
  volume  = "12",
  number  = "4",
  pages   = "269--284",
  month   = "July",
  year    = "1991",
}
O'Hallaron", title = "An Evaluation of Linear Models for Host Load Prediction", institution = "School of Computer Science, Carnegie Mellon University", year = "1998", number = "CMU-CS-TR-98-148", month = "November", } @Unpublished{PRED-BASED-SCHED-DIST-COMP-SAMADANI-UNPUB-96, author = "Mehrdad Samadani and Erich Kalthofen", title = "On Distributed Scheduling Using Load Prediction From Past Information", note = "Abstracts published in Proceedings of the 14th annual {ACM} Symposium on the Principles of Distributed Computing (PODC'95, pp. 261) and in the Third Workshop on Languages, Compilers and Run-time Systems for Scalable Computers (LCR'95, pp. 317--320)", year = "1996", } ----- Review of Paper 1035.5144 ------- Title: Predicting the CPU Availability of Time-shared Unix Systems Reviewer type : PC member Reviewer: 3-0 Originality: [2] Technical merit: [2] Readability: [3] Relevance: [2] Overall rating: [2] Recommended Action: Weak Reject ---Comments for the Author--- In this paper, the authors study the problem of making short term and medium term forecasts of CPU availability on time-shared UNIX systems. The authors show that (a) simple techniques are reasonably effective in predicting short-term CPU loads/availability (b) CPU load traces exhibit self-similarity. The authors claim that conclusion (a) above is surprising. On the other hand, most of the classical studies on load-sharing in distributed systems (the Eager, Lazowska, Zahorjan paper for example) take for granted that current load is a reasonable indicator of load in the near future. While the authors have studied the problem of short term load prediction more thoroughly, in my opinion, they have merely confirmed the conventional wisdom. The fact that the authors work is motivated by "application level scheduling" for parallel programs -- as opposed to load sharing in distributed systems -- does not affect the problem of load prediction. As to conclusion (b) above regarding self-similarity, the authors state that self-similar behavior does not seem to affect the effectiveness of short-term load prediction. As such, this result seems to have no ramifications for cpu scheduling, etc. At this stage, when all kinds of computer systems phenomena (file traffic, network traffic, etc.) have been shown to self similar, yet another self similarity result is not interesting by itself. Some other concerns/comments: 1) Both the measurement techniques presented in the paper appear to not work in certain situations -- as pointed out by the authors. Another situation in which the "NWS-hybrid" technique will probably be inaccurate is when the host being measured is running I/O intensive applications. 2) The authors analysis is based on a single 24 hour period. They should verify their results for additional days. Overall, I believe that while this paper is interesting, its research contribution is small. ----- Review of Paper 1035.5144 ------- Title: Predicting the CPU Availability of Time-shared Unix Systems Reviewer type : PC member Reviewer: 4-0 Originality: [2] Technical merit: [3] Readability: [3] Relevance: [3] Overall rating: [3] Recommended Action: Weak Accept ---Comments for the Author--- Summary: The paper show how simple estimates of future CPU availability can be based on past behaviour with suprising accuracy. The show that both the next 10 seconds and the next 5 minutes estimates to be in error by about 10% for several unix workstations. 
The authors claim that such information will be helpful in guiding
metacomputing scheduling decisions.

Comments:

- I question whether a 5-minute window is really very useful for metacomputing.
  This is my major objection to the paper. In a local-area network with a fast
  interconnect this may be fine, but in a distributed system the overhead of
  starting a remote computation may require a good estimate for a longer window
  of time.

- The proposed techniques in this paper seem to be only an epsilon contribution
  beyond those in references 29-31.

----- Review of Paper 1035.5144 -------

Title: Predicting the CPU Availability of Time-shared Unix Systems
Reviewer type: referee
Reviewer: 5-0

Originality:     [2]
Technical merit: [2]
Readability:     [3]
Relevance:       [3]
Overall rating:  [2]

Recommended Action: Weak Reject

---Comments for the Author---

(Minor nitpick about the paper's submission process: it was all but impossible
not to figure out who wrote the paper, given the repeated mention of UCSD and
the previous NWS work, and the direct citation of the NWS papers by Wolski et
al. Oops! No matter.)

Generally, this paper is good. I believed the premise, and it did a fine job of
treating both the strengths and the weaknesses of the approach. This adds
credibility to the work. But there are three big weaknesses in this paper, in
my opinion:

1. The paper promises to report on the authors' availability prediction
mechanisms and the efficacy of their approach. About half the paper delivers on
this, but more than seven pages are devoted to a philosophical discussion of
self-similarity. While there is some place for this in the paper, I think it
needs to be greatly reduced in emphasis, as it distracts and detracts from the
overall message. Also, there are some technical problems with the
self-similarity discussion:

   a. All of the self-similarity measurement tools they mention (e.g., Pox
   plots) depend on the process being measured being stationary (i.e., the mean
   of the time series does not change over time). Even the definition of
   self-similarity depends on this property. CPU load is likely stationary over
   time scales of milliseconds to many minutes or a small number of hours,
   depending on the nature of the jobs being run, but stationarity almost
   certainly goes out the window at time scales of a day or more, once daily
   cycles and the like kick in. It is not meaningful to talk about
   self-similarity for non-stationary processes. (A sketch of the
   rescaled-range estimation at issue appears after this review.)

   b. Use Occam's razor: are self-similarity and long-term autocorrelation the
   simplest abstractions for what is going on in Figures 1 and 2, or are very
   human effects, like specific long-running jobs being launched, dominating
   these pictures?

2. Almost no discussion is given of how this tool is going to be used, or of
what sorts of systems will be studied with it. The authors directly admit (as
they correctly should) that the accuracy of their predictions varies greatly
with the nature of the jobs being run on the machines under observation, and
that the predictions can be tuned if the nature of the jobs is somewhat known.
Right now, the paper implicitly spins the tool as a general-purpose prediction
tool, but as such it will perform poorly given the huge number of special-case
jobs that break its assumptions. Should this tool be used for load balancing of
processes in a dedicated cluster of workstations running a fine-grained
parallel job? (Certainly not, given the 10-second prediction granularity.) Will
it be used for the one-time selection of an unloaded host on which to
batch-execute a job?
What other sorts of applications are relevant and non-relevant for this tool?
Will many users of a shared collection of CPUs be using this tool? If so, will
they all pounce on the least-loaded CPU every cycle, instantly making it the
most loaded CPU and thereby introducing nasty feedback effects like those in
naive transport congestion control or routing implementations? Or is the tool
intended for a single user of a dedicated pool of resources? These sorts of
questions and usage models need to be addressed, and either explicitly dealt
with or explicitly dismissed as out of scope for the paper.

3. I'm a big fan of demonstrating the effectiveness of a tool by using it in a
real situation, and reporting anecdotal or measurement evidence of how well it
performed on the real-world task. This sort of material should be added to the
paper, I think. Show me interesting observations that you were able to make
only because of your system.

Some other nitpicks:

- It is mentioned only briefly, in a single sentence in the second paragraph of
  page 5, that the prediction interval is the subsequent 10 seconds of CPU
  time. All of the graphs and tables should explicitly state this kind of
  relevant time quantum for the data presented; otherwise it is too easy to
  make false assumptions about what is being reported.

- I don't like how the measurement errors are presented in the tables. Assume
  that the measured CPU availability is 80% and the predicted availability is
  10%. Is the measurement error supposed to be

      error = 80% / 10%      = a factor of 8 = 800%   (relative error), or
      error = abs(80% - 10%) = 70%                    (absolute difference)?

  Obviously Equations 4 and 5 suggest the latter, but when an error is reported
  as a percentage, that usually means it is a relative number. Just looking at
  the tables, I mistakenly assumed the former while quickly scanning through
  the paper.

- Are there timescales of interest other than 10 seconds? Can the system be
  easily modified to report predictions over other timescales, or over multiple
  timescales (optimally)?

- What other systems attempt to do this sort of prediction? How does your
  system stack up against theirs, and why? There is currently __no__ related
  work presented in this paper, which is a huge hole.
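Regarding point 1(a) of the last review, here is a minimal sketch of the kind
of rescaled-range (R/S, Pox-plot-style) Hurst-exponent estimate at issue; it is
a hypothetical illustration, not the estimator used in the paper. Because the
statistic accumulates deviations from each block's mean, a drifting mean (e.g.,
a daily cycle) inflates R/S at large block sizes and biases the fitted exponent
upward, which is exactly the stationarity caveat the reviewer raises.

    import numpy as np

    def rescaled_range(block):
        """R/S statistic for one block: range of the cumulative deviations from
        the block mean, divided by the block's standard deviation."""
        block = np.asarray(block, dtype=float)
        dev = np.cumsum(block - block.mean())
        s = block.std()
        return (dev.max() - dev.min()) / s if s > 0 else np.nan

    def hurst_rs(series, min_block=16):
        """Estimate the Hurst exponent H as the slope of log(R/S) versus
        log(block size). H near 0.5 suggests short-range dependence; H > 0.5
        suggests long-range dependence -- provided the series is stationary."""
        series = np.asarray(series, dtype=float)
        sizes, rs_means = [], []
        size = min_block
        while size <= len(series) // 2:
            blocks = [series[i:i + size]
                      for i in range(0, len(series) - size + 1, size)]
            rs_means.append(np.nanmean([rescaled_range(b) for b in blocks]))
            sizes.append(size)
            size *= 2
        slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
        return slope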