One of the keys to understanding setjmp() and longjmp() is to understand machine layout, as described in the assembler and malloc lectures of the past few weeks. The state of a program depends completely on the contents of its memory (i.e. the code, globals, heap, and stack), and the contents of its registers. The registers include the stack pointer (sp), frame pointer (fp), and program counter (pc). What setjmp() does is save the contents of the registers so that longjmp() can restore them later. In this way, longjmp() "returns" to the state of the program when setjmp() was called.
Specifically:
#include <setjmp.h>
int setjmp(jmp_buf env);

This says to save the current state of the registers into env. If you look in /usr/include/setjmp.h, you'll see that jmp_buf is defined as:

#define _JBLEN  9
typedef struct { int _jb[_JBLEN + 1]; } jmp_buf[1];

This is an irritating way of saying that jmp_buf is an array of _JBLEN+1 integers.
So, when you call setjmp(), you pass it the address of an array of integers, and it stores the value of the registers in that array. Setjmp() returns 0 when you call it in this way.
void longjmp(jmp_buf env, int val);

Longjmp() resets the registers to the values saved in env. This includes the sp, fp and pc. What this means is that longjmp() doesn't return. Instead, when you call it, control returns from the setjmp() call that saved env, as if that call had just finished. This is because the pc is restored along with the other registers. This time, setjmp() returns the val argument of longjmp(), which should not be zero (read the man page: if you pass zero, setjmp() returns 1 instead). Thus, when setjmp() returns a non-zero value, you know that longjmp() was called and is returning through setjmp().
As an example, look at the following code (in sj1.c):
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <setjmp.h>

int main()
{
  jmp_buf env;
  int i;

  i = setjmp(env);
  if (i == 10) exit(0);
  printf("i = %d\n", i);
  longjmp(env, i+1);
  printf("Does this line get printed?\n");
  return 0;
}
When we run this, we see that this is a pretty convoluted way to print the numbers from 0 through 9:
UNIX> ./sj1
i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
UNIX>

Let's walk through it. First, we call setjmp(), and it returns 0. Then we call longjmp() with a value of 1, which causes the code to return from setjmp() with a value of 1. We continue to do this until setjmp() finally returns 10, and we exit.
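The rule about val is easy to check in isolation. The following is a minimal sketch, not part of the lecture's code: setjmp_result_for() is a made-up helper that jumps back to its own setjmp() exactly once and reports what setjmp() returned on that second return.

```c
#include <setjmp.h>

/* Hypothetical helper (not from the lecture files): longjmp() back to a
   setjmp() in this same, still-active function, passing val, and report
   the value that setjmp() returned the second time. */
int setjmp_result_for(int val)
{
  jmp_buf env;
  volatile int jumped = 0;   /* volatile: changed between setjmp() and longjmp() */
  int r;

  r = setjmp(env);           /* returns 0 the first time through */
  if (!jumped) {
    jumped = 1;
    longjmp(env, val);       /* never returns; control resumes at setjmp() */
  }
  return r;                  /* the value setjmp() returned after the jump */
}
```

Calling setjmp_result_for(5) should give back 5, while setjmp_result_for(0) should give back 1, since longjmp() is not allowed to make setjmp() return zero.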
Setjmp() and longjmp() are usually used so that if an error is detected deep within a long chain of procedure calls, the error may be dealt with efficiently by longjmp-ing out of the procedure that detects the error. This avoids having to return from each procedure and test return values. It is basically the C way of doing what "try/catch" clauses do in C++.
For an example, look at sj2.c. It looks to be complicated, but really isn't. What happens is that there is a complicated series of procedure calls -- proc_1 through proc_4. If proc_4's argument is zero, then it flags the error by calling longjmp(). Otherwise, things proceed normally. As you can see, if you call sj2 with all positive arguments, then everything is ok. However, if you call it with all zeros, it will make the longjmp() call, and flag an error:
UNIX> sj2 1 2 3 4
proc_1(1, 2, 3, 4) = 4
UNIX> sj2 0 0 0 0
Error -- bad value of i (0), j (0), k (0), l (0)
UNIX>
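Since sj2.c is not reproduced here, the following is only a sketch of the same pattern, not that program. All of the names (Error_env, level_1() through level_3(), run_checked()) are invented for illustration. The innermost procedure detects a bad argument and longjmp()s straight back to the caller that did the setjmp(), so the intermediate procedures never have to test a return value.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf Error_env;   /* hypothetical global; filled in by run_checked() */

static int level_3(int l)
{
  if (l == 0) longjmp(Error_env, 1);   /* bail out past level_2 and level_1 */
  return l;
}

static int level_2(int k, int l)        { return k + level_3(l); }
static int level_1(int j, int k, int l) { return j + level_2(k, l); }

/* Returns 0 and stores the sum in *result on success.
   Returns -1 if level_3() flagged an error with longjmp(). */
int run_checked(int j, int k, int l, int *result)
{
  if (setjmp(Error_env) != 0) {          /* we get here only via longjmp() */
    fprintf(stderr, "Error: bad value of l\n");
    return -1;
  }
  *result = level_1(j, k, l);            /* the normal path */
  return 0;
}
```

Note that run_checked() is still active whenever level_3() calls longjmp(), so the saved sp and fp refer to a live stack frame.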
#include <setjmp.h>
#include <stdlib.h>
#include <stdio.h>

int a(char *s, jmp_buf env)
{
  int i;

  i = setjmp(env);
  printf("Setjmp returned -- i = %d, 0x%lx\n", i, (unsigned long) s);
  printf("s = %s\n", s);
  return i;
}

int b(char *s, jmp_buf env)
{
  printf("In B: Calling longjmp(env, i)\n");
  longjmp(env, 3);
}

int main(int argc, char **argv)
{
  jmp_buf env;

  if (a("Jim", env) != 0) exit(0);
  b(NULL, env);
}
When we execute it, we get the following:
UNIX> sj3
Setjmp returned -- i = 0, 0x400836
s = Jim
In B: i=3. Calling longjmp(env, i)
Setjmp returned -- i = 3, 0x400693
s = UH??H???
UNIX>

So, exactly what is happening? When the main() routine is first called, the stack looks as follows:
              Stack
        |----------------|
        |                |
        |                |
        |                |
        |                |
        |                |
        |                | <-------- sp
        | env[0]         |
        | env[1]         |
        | env[2]         |               pc = main
        | env[3]         |
        | ....           |
        | env[8]         |
        | other stuff    | <------- fp
        |--------------- |

Now, main() calls a(). First it pushes the arguments on the stack in reverse order, and then jsr is called, which pushes the return pc on the stack, and the old fp. The fp and sp are changed to make an empty stack frame for a():
                                   Stack
                             |----------------|
                             |                |
                             |                | <--------- sp, fp
              /------------- | old fp in main |
              |              | old pc in main |
              |   "Jim" <--- | s = "Jim"      |
              |         /--- | pointer to env |
              |         \--> | env[0]         |
              |              | env[1]         |
              |              | env[2]         |               pc = a
              |              | env[3]         |
              |              | ....           |
              |              | env[8]         |
              \------------> | other stuff    |
                             |--------------- |
The first thing that a() does is allocate room for its local variable i:
                                   Stack
                             |----------------|
                             |                | <--------- sp
                             | i              | <--------- fp
              /------------- | old fp in main |
              |              | old pc in main |
              |   "Jim" <--- | s = "Jim"      |
              |         /--- | pointer to env |
              |         \--> | env[0]         |
              |              | env[1]         |
              |              | env[2]         |               pc = a
              |              | env[3]         |
              |              | ....           |
              |              | env[8]         |
              \------------> | other stuff    |
                             |--------------- |

Then it calls setjmp(). This saves the current state of the registers. In other words, it saves the current values of sp, fp, and pc. Now, a() prints "i = 0", and "s = Jim", and then returns to main(). Now the stack looks as before, except that env is initialized to the state of the machine when a() called setjmp():
              Stack
        |----------------|
        |                |
        |                |
        |                |
        |                |
        |                |
        |                | <----------- sp
        | env[0]         |
        | env[1]         |
        | env[2]         |               pc = main
        | env[3]         |
        | ....           |
        | env[8]         |
        | other stuff    | <------------ fp
        |--------------- |

Now, main() calls b(), and the stack looks as follows:
                                   Stack
                             |----------------|
                             |                |
                             |                | <--------- sp, fp
              /------------- | old fp in main |
              |              | old pc in main |
              |              | i = 3          |
              |         /--- | pointer to env |
              |         \--> | env[0]         |
              |              | env[1]         |
              |              | env[2]         |               pc = b
              |              | env[3]         |
              |              | ....           |
              |              | env[8]         |
              \------------> | other stuff    |
                             |--------------- |

Then longjmp() is called. The registers are restored to their values when a() called setjmp(), and the pc returns from setjmp() in a(). However, the values in the stack are the same as they were for b():
                                   Stack
                             |----------------|
                             |                | <--------- sp
                             | i = 2          | <--------- fp
              /------------- | old fp in main |
              |              | old pc in main |
              |              | s?? = 3        |
              |         /--- | pointer to env |
              |         \--> | env[0]         |
              |              | env[1]         |
              |              | env[2]         |               pc = a
              |              | env[3]         |
              |              | ....           |
              |              | env[8]         |
              \------------> | other stuff    |
                             |--------------- |

You should see the problem. The stack is in a bad state. In particular, a() expects there to be a (char *) where s is, and instead, there is the integer value 3. Thus, when it tries to print out s, it tries to find a string at memory location 3, and dumps core.
This is a very common bug with setjmp() and longjmp() -- to use them properly, you CANNOT RETURN FROM THE PROCEDURE THAT CALLS setjmp(). This is sometimes called "longjmp-ing up the stack." As you can see, this bug is subtle -- the stack frame for b() looks a lot like the stack frame for a(), and thus this bug might slip by unnoticed for a while.
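For contrast, here is a minimal sketch of the safe arrangement, under the assumption that the procedure calling setjmp() is still active when longjmp() runs. The names safe_caller() and jump_back() are invented for illustration and are not part of sj3.c.

```c
#include <setjmp.h>

/* Hypothetical counterpart to b() in sj3.c: it only jumps. */
static void jump_back(jmp_buf env, int val)
{
  longjmp(env, val);
}

/* Hypothetical counterpart to a(): it calls setjmp() and does NOT return
   before the longjmp() happens, so its stack frame is intact afterwards. */
int safe_caller(const char *s, const char **seen)
{
  jmp_buf env;
  int i;

  i = setjmp(env);                  /* env records this frame's sp, fp, pc */
  if (i == 0) jump_back(env, 3);    /* this frame is still on the stack */

  *seen = s;                        /* s was never overwritten, so it is still valid */
  return i;
}
```

Here the frame that setjmp() recorded sits below jump_back()'s frame the whole time, so when the registers are restored, s still holds the pointer that was passed in, and safe_caller() returns 3.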
#include <signal.h>
#include <unistd.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

int i, j;
long T0;
jmp_buf Env;

void alarm_handler(int dummy)
{
  long t1;

  t1 = time(0) - T0;
  /* Print "has" or "have" to agree with the count, as in the output below. */
  printf("%ld second%s %s passed: j = %d. i = %d\n", t1,
     (t1 == 1) ? "" : "s", (t1 == 1) ? "has" : "have", j, i);
  if (t1 >= 8) {
    printf("Giving up\n");
    longjmp(Env, 1);
  }
  alarm(1);
  signal(SIGALRM, alarm_handler);
}

int main()
{
  signal(SIGALRM, alarm_handler);
  alarm(1);

  if (setjmp(Env) != 0) {
    printf("Gave up: j = %d, i = %d\n", j, i);
    exit(1);
  }

  T0 = time(0);

  for (j = 0; j < 10000; j++) {
    for (i = 0; i < 1000000; i++);
  }
  return 0;
}
This program longjmps out of alarm_handler after 8 seconds have passed, and then prints "Gave up". Be sure you can trace through this program:
UNIX> sh4
1 second has passed: j = 482. i = 549695
2 seconds have passed: j = 964. i = 948276
3 seconds have passed: j = 1447. i = 322623
4 seconds have passed: j = 1927. i = 801765
5 seconds have passed: j = 2410. i = 22333
6 seconds have passed: j = 2889. i = 39442
7 seconds have passed: j = 3372. i = 219445
8 seconds have passed: j = 3852. i = 857985
Giving up
Gave up: j = 3852, i = 857985
UNIX>
One issue about calling longjmp() from a signal handler is that the operating system may not realize that you have left the signal handler. Specifically, when you are in an alarm handler, you cannot catch SIGALRM, because the operating system has disabled the signal. For example, two_alarm.c modifies sh4.c to put a while(1) loop in the alarm handler:
void alarm_handler(int dummy)
{
  long t1;

  alarm(1);
  signal(SIGALRM, alarm_handler);

  t1 = time(0) - T0;
  /* t1 is a long, so it needs %ld rather than %d. */
  printf("%ld second%s %s passed: j = %d. i = %d\n", t1,
     (t1 == 1) ? "" : "s",
     (t1 == 1) ? "has" : "have", j, i);
  while(1);    /* This is the only new code. */
}
When we run this, we only get one line of text because the operating system will not generate SIGALRM while it is in the alarm handler.
This has an interesting effect on longjmp. If we longjmp out of the alarm handler, as we did in sh4.c, does the operating system know to re-enable SIGALRM? Take a look at two_alarm_setjmp_1.c:
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>

int i, j;
long T0;
jmp_buf Env;

void alarm_handler(int dummy)
{
  long t1;

  alarm(1);
  signal(SIGALRM, alarm_handler);

  t1 = time(0) - T0;
  printf("%ld second%s %s passed: j = %d. i = %d\n", t1,
     (t1 == 1) ? "" : "s",
     (t1 == 1) ? "has" : "have", j, i);
  longjmp(Env, t1);     /* Instead of returning from the alarm handler, we longjmp(). */
}

int main()
{
  signal(SIGALRM, alarm_handler);
  alarm(1);

  if (setjmp(Env) != 0) {
    printf("Gave up: j = %d, i = %d\n", j, i);
    exit(1);
  }

  T0 = time(0);
  j = 0;
  i = 0;

  if (setjmp(Env) == 8) {
    printf("Gave up\n");
    exit(0);
  }

  for (; j < 10000; j++) {
    for (; i < 1000000; i++) ;
    i = 0;
  }

  printf("Done: Time = %ld\n", time(0)-T0);
  return 0;
}
This is similar to two_alarm.c, except we longjmp() out of the alarm handler, and then continue running the code. Its output depends on the implementation of the operating system. Here it is on mamba (a Linux box) in 2010:
UNIX> two_alarm_setjmp_1
1 second has passed: j = 476. i = 27986
Done: Time = 21
UNIX>

As you can see, the operating system does not re-enable SIGALRM when we longjmp() out of the handler. Here it is on my Macintosh (again in 2010):
UNIX> two_alarm_setjmp_1
1 second has passed: j = 357. i = 652488
2 seconds have passed: j = 719. i = 160936
3 seconds have passed: j = 1079. i = 682336
4 seconds have passed: j = 1438. i = 479525
5 seconds have passed: j = 1797. i = 62895
6 seconds have passed: j = 2153. i = 660999
7 seconds have passed: j = 2508. i = 813449
8 seconds have passed: j = 2874. i = 267243
Gave up
UNIX>

Well, that's confusing. In its laconic way, the setjmp() man page helps to clear up the confusion:
NOTES
       POSIX does not specify whether setjmp() will save the signal context.
       (In System V it will not. In 4.3BSD it will, and there is a function
       _setjmp that will not.) If you want to save signal masks, use sigsetjmp().

What this means is that the operating system maintains "signal context" for your process. These are the signals that are enabled or disabled. On some operating systems (like my Macintosh), this is saved in the setjmp() call, and on some (like mamba) it is not. For consistency, there are procedures sigsetjmp() and siglongjmp():
int sigsetjmp(sigjmp_buf env, int savesigs);
void siglongjmp(sigjmp_buf env, int val);

If you call sigsetjmp() with savesigs equal to one, it will save the signal context and restore it on a siglongjmp() call. That allows you to siglongjmp() out of the alarm handler. two_alarm_setjmp_2.c does this and works properly on both machines:
On mamba:
On my Macintosh:
Here, mamba is faster than my Macintosh.
The code is written in a "re-entrant" way -- when it is interrupted by SIGALRM, the alarm handler siglongjmp's back into main, which prints out the current state of the generation, and then calls enumerate_primes1() again with current values.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* for alarm() */
#include <signal.h>
#include <setjmp.h>

sigjmp_buf Env;

void alarm_handler(int dummy)
{
  alarm(1);
  signal(SIGALRM, alarm_handler);
  siglongjmp(Env, 1);
}

void enumerate_primes1(int *current_test, int *largest_prime)
{
  int i;

  while(1) {
    /* Trial division: stop at the first divisor, or once i*i exceeds the number. */
    for(i = 2; i*i <= *current_test && *current_test % i != 0; i++) ;
    if (*current_test % i != 0) *largest_prime = *current_test;
    *current_test = *current_test + 1;
  }
}

int main()
{
  int test, largest_prime;
  int time;

  test = 2;
  largest_prime = 2;
  time = 0;
  signal(SIGALRM, alarm_handler);
  alarm(1);

  time += sigsetjmp(Env, 1);
  printf("%4d Largest Prime: %10d\n", time, largest_prime);
  enumerate_primes1(&test, &largest_prime);
}
Once again, make sure you can trace through the code. Here it is running.
UNIX> prime_1
   0 Largest Prime:          2
   1 Largest Prime:    1052287
   2 Largest Prime:    1729841
   3 Largest Prime:    2310593
   4 Largest Prime:    2836727
   5 Largest Prime:    3326567
   6 Largest Prime:    3787877
...

A second program generates primes a little more efficiently. It maintains a Dllist of all primes less than p, and then traverses that Dllist to see if p is divisible by any prime numbers that are less than or equal to sqrt(p). The code is in prime_2.c:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <setjmp.h>
#include "dllist.h"

sigjmp_buf Env;

void alarm_handler(int dummy)
{
  alarm(1);
  signal(SIGALRM, alarm_handler);
  siglongjmp(Env, 1);
}

int is_prime(int p, Dllist l)
{
  Dllist tmp;
  int i;

  dll_traverse(tmp, l) {
    i = tmp->val.i;
    if (i*i > p) return 1;
    if (p % i == 0) return 0;
  }
  return 1;
}

void enumerate_primes2(int *current_test, int *largest_prime, Dllist l)
{
  while(1) {
    if (is_prime(*current_test, l)) {
      dll_append(l, new_jval_i(*current_test));
      *largest_prime = *current_test;
    }
    *current_test = *current_test + 1;
  }
}

int main()
{
  int test, largest_prime;
  int time;
  Dllist l;

  test = 2;
  largest_prime = 2;
  l = new_dllist();
  time = 0;
  signal(SIGALRM, alarm_handler);
  alarm(1);

  time += sigsetjmp(Env, 1);
  printf("%4d Largest Prime: %10d\n", time, largest_prime);
  enumerate_primes2(&test, &largest_prime, l);
}
It generates more primes than the first program:
UNIX> prime_2
   0 Largest Prime:          2
   1 Largest Prime:    3262639
   2 Largest Prime:    5595433
   3 Largest Prime:    7650317
   4 Largest Prime:    9531449
   5 Largest Prime:   11299507
   6 Largest Prime:   12981097
....

Finally, the program prime_12.c alternates between the two prime number generators, giving each one second of time and printing out how many primes each generates at two second intervals. Here's the main():
int main()
{
  int test1, largest_prime1;
  int test2, largest_prime2;
  long time;     /* a local variable; the declaration was missing from this listing */
  Dllist l;

  test1 = 2;
  largest_prime1 = 2;
  test2 = 2;
  largest_prime2 = 2;
  l = new_dllist();
  time = 0;
  signal(SIGALRM, alarm_handler);
  alarm(1);

  time += sigsetjmp(Env, 1);
  if (time%2 == 0) {
    printf("%4ld EP1: %10d EP2: %10d\n", time/2, largest_prime1, largest_prime2);
    enumerate_primes1(&test1, &largest_prime1);
  } else {
    enumerate_primes2(&test2, &largest_prime2, l);
  }
  return 0;
}
When we run it, we see that the second one whups up on the first one:
UNIX> prime_12
   0 EP1:          2 EP2:          2
   1 EP1:    1052561 EP2:    3265061
   2 EP1:    1729757 EP2:    5602151
   3 EP1:    2311819 EP2:    7663367
   4 EP1:    2839519 EP2:    9369739
   5 EP1:    3329453 EP2:   11159677
...
UNIX> gcc -O -o sh4_opt sh4.c
UNIX> sh4_opt
UNIX>

The compiler analyzed the main loop and decided that since it did nothing, it could be eliminated:

  for (j = 0; j < 10000; j++) {
    for (i = 0; i < 1000000; i++);
  }
While that's not a particularly bad thing, you need to be aware that when the compiler optimizes, it doesn't really care about your setjmp()/longjmp() calls.
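If you do want a busy loop like this to survive optimization, one common approach (an assumption here, not something sh4.c does) is to declare the counters volatile, which obliges the compiler to perform every read and write. A minimal sketch, with spin() as a made-up name and the loop bounds passed in as parameters:

```c
/* Counters declared volatile so that the optimizer must keep every update.
   This mirrors the globals i and j in sh4.c, but the volatile qualifier
   and the spin() wrapper are additions for illustration. */
static volatile int i, j;

/* Runs the same doubly nested empty loop and returns the final value of j. */
int spin(int outer, int inner)
{
  for (j = 0; j < outer; j++) {
    for (i = 0; i < inner; i++) ;   /* empty body, but each i++ is a real store */
  }
  return j;
}
```

With the sh4.c bounds, that would be spin(10000, 1000000). The price is that the loop really does run, which is the point when it is standing in for work that a signal handler is supposed to interrupt.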
Let's take a pathological example (one that drove me crazy during my 2010 lecture, as I had optimization on by default and was not using the makefile):
UNIX> make prime_12_opt
gcc -I/home/plank/cs360/include -O -o prime_12_opt prime_12.c /home/plank/cs360/objs/libfdr.a
UNIX> prime_12_opt
   0 EP1:          2 EP2:          2

It hangs? Let's explore: If we print out the value of time after the setjmp, we see something strange:
UNIX> prime_12_opt
Time = 0
   0 EP1:          2 EP2:          2
Time = 1
Time = 1
Time = 1
Time = 1
...

What???? There is an explanation. When it optimizes, the compiler is happy to store certain frequently-used variables in registers rather than on the stack. That's where it is storing time, which means that time is saved during setjmp(), when its value is zero, and restored to zero at every longjmp() call. Is there anything you can do about this? Well, perhaps you should heed the advice of the setjmp() man page:
setjmp() and sigsetjmp() make programs hard to understand and maintain. If possible an alternative should be used.
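That said, the C standard does offer one remedy: a local variable that is modified between setjmp() and longjmp() keeps a well-defined value only if it is declared volatile, which forces it to live in memory rather than in a register. The following is a minimal sketch of that fix, not a change to prime_12.c. The names count_ticks(), tick() and ticks are invented, and the update is written as a separate statement instead of "time += sigsetjmp(...)" so that the variable is read only after the jump has landed.

```c
#include <setjmp.h>

static jmp_buf Env;

/* Stands in for the alarm handler: it just jumps back. */
static void tick(void)
{
  longjmp(Env, 1);
}

/* Hypothetical stand-in for main() in prime_12.c: counts how many times
   control has come back through setjmp(), up to limit. */
int count_ticks(int limit)
{
  volatile int ticks = 0;      /* volatile: stored in memory, so updates survive longjmp() */

  if (setjmp(Env) != 0) ticks++;   /* setjmp() is called once; later returns come from longjmp() */
  if (ticks < limit) tick();       /* "work" that gets interrupted and jumps back */
  return ticks;
}
```

Without the volatile qualifier, an optimizing compiler is free to keep ticks in a register, in which case each longjmp() could restore the stale value saved by setjmp() and the loop might never finish.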
So why do I teach you setjmp()/longjmp()? First, just because you shouldn't use something doesn't mean that others won't. You should be prepared to understand setjmp()/longjmp() code. Second, it's a great way to understand the interaction of registers and program state. Third, it makes great dinner-time conversation. You can thank me later....