ARRAYS and ALGORITHMS (2)
Here we'll look at some more routines.
---------
SELECTION SORT
This is one alternative to the bubble sort. It does tend to be
slightly faster. We should note that the best sorting routines
tend to be much more difficult to write and understand for people new
to programming. With the bubble sort, we compared successive
pairs of values and switched values as needed. What the selection
sort does is to find the smallest value in the list, and put it into
slot 0. Then we find the next smallest value and put it into slot
1, etc. We'll assume we have a list with 8 elements: 17,
67, -2, 33, 45, 19, 106, 3:
for i in
range(7):
#on each pass, we will find the smallest element in the remaining list
for j in
range(i+1,8,1): #these are the other
values to look at
if mylist[i] > mylist[j]: #found a
smaller value--so swap
temp = mylist[i]
mylist[i] = mylist[j]
mylist[j] = temp
watch what happens.
1) i=0 j=1 why? j starts with
the value i+1, so if i was 3, j would start with 4, etc
is mylist[0] > mylist[1]? is 17 > 67? no--so do not
swap
values
17, 67, -2, 33, 45, 19, 106, 3
i=0 j=2 is mylist[0] >
mylist[2]? is 17 > -2? yes--so
swap:
-2, 67, 17, 33, 45, 19, 106, 3
i=0 j=3 is mylist[0] >
mylist[3]? is -2 > 33? no--do not
swap
-2, 67, 17, 33, 45, 19, 106, 3
i=0, j=4, 5, 6 -2 is not greater than 45, 19, or
106 no swapping
i=0, j=7 -2 is not graeter
than 3--do not swap.
-2, 67, 17, 33, 45, 19, 106, 3
the smallest value is in slot 0. the rest of the list is not
(yet) sorted. now we want the next smallest value to go into slot
1:
2) i=1 j=2 is 67>17? yes. swap.
-2, 17,
67, 33, 45, 19, 106, 3
i=1, j=3, 4, 5, 6 17 is less than 33, 45, 19, and
106--no swapping
i=1 j=7 is 17 > 3? yes--so
swap:
-2, 3, 67, 33,
45, 19, 106, 17
after two passes, the two smallest elements are in slots
0 and 1--the rest of the list is not (yet) sorted.
3) i=2 j=3 compare 67 and 33. etc. the 3rd smallest
value--17--will wind up in slot 2 on this pass.
etc.
Note that the list we're comparing items in gets shorter and shorter
each pass we make through it--this is one of the reasons that the
selection short usually runs faster than the bubble sort.
"Usually?" yes--suppose we have a list which is already
sorted. The bubble sort makes one pass through the list--nothing
needs to be swapped--we're all done. The selection sort is still
making lots of passes through the same sorted list. But still--on
average, the selection sort is faster.
-------------------------------------------------
SEARCHING.
Suppose we have a list of 100 names (or numbers) and this list has
already been sorted. How can we find a particular value in this
list? We'll do this in much the same way that you would
find a name in a phone book.
We want 3 variables: startpoint, endpoint, and midpoint.
For a list, or part of a list, startpoint will the the index of the
first value in the list, endpoint will be the index of the last value
in the list. The middle value will be at midpoint.
We are looking for NAME in mylist.
Initially--we set startpoint to 0, endpoint to 99--these are the
beginning and end of the whole 100-value list. midpoint will be
(startpoint + endpoint)/2: thus initially, midpoint will be 50
(we'll round up). We look at the name in mylist[midpoint].
How does this compare to NAME?
1) mylist[midpoint] equals NAME. we've found what we were looking
for.
2) mylist[midpoint] < NAME In this case, NAME is in the half
of the list from midpoint to endpoint. So what we will do now is
to say:
startpoint = midpoint
(endpoint doesn't change)
midpoint = (startpoint + endpoint)/2 and we
repeat the process--in this example, startpoint=50, endpoint=99, and
midpoint = 75.
we only have half of the previous list to check.
3) mylist[midpoint] > NAME. In this case, NAME is in the half
of the list from startpoint to midpoint, so we now say:
(startpoint doesn't change)
endpoint = midpoint
midpoint = (startpoint +
endpoint)/2 and here we search mylist from slot 0 to
slot 50 by checking slot 25, etc
This is actually a very efficient process--it's also how you yourselves
find names in a phone book, or on a page in the phone book. On
each pass, we throw out half of the remaining names: If our
original list was size 100, on the next pass we're searching a list of
size 50, then searching a list of size 25, etc. Eventually
we reach a list of size 1. Imagine a VERY small town with only 1
inhabitant (Ludwig Snarf, e.g.). There's a phone book for
Snarfville--and there's only one name in the phone book. (It's a very
small phone book). How long does it take you to look up Ludwig
Snarf's phone number?
If you start with 2 and divide it in half, you get 1. If you
start with 4 and divide it in half twice, you get 1. If you start
with 64 and divide it in half 6 times you get 1. If you start
with 1 million and divide it in half 20 times you get 1.
Why? 4 is 2 to the second power. 64 is 2 to the 6th, 1 million is
2 to the 20th power. If you like mathy stuff, we say that 20 is
log-base-2 of 1,000,000. This means that if we start with a list
of size 1 million and divide it in half 20 times we get 1--in a list of
1 million names chopping the list as shown above gets us to a list of
size 1. How many more chops do we need to do if the list
was 2 million rather than 1 million? Just one more chop--that's the
nice thing about this algorithm