We're going to get started.
Handouts are by the door if
anybody didn't pick one up.
My name is Charles Leiserson.
I will be lecturing this course
this term, Introduction to
Algorithms, with Erik Demaine.
In addition,
this is an SMA course,
a Singapore MIT Alliance course
which will be run in Singapore
by David Hsu.
And so all the lectures will be
videotaped and made available on
the Web for the Singapore
students, as well as for MIT
students who choose to watch
them on the Web.
If you don't want to be on the
videotape, you should sit in
the back row.
OK?
Otherwise, you will be on it.
There is a video recording
policy, but it seems like they
ran out.
If anybody wants to see it,
please pass the copies around a
little bit once you're done
reading, or you can come up.
I did secure one copy.
Before we get into the content
of the course,
let's briefly go over the
course information because there
are some administrative things
that we sort of have to do.
As you can see,
this term we have a big staff.
Take a look at the handout
here.
We have six TAs this term,
which is two more TAs than we
normally get for this course.
That means recitations will be
particularly small.
There is a World Wide Web page,
and you should bookmark that
and go there regularly because
that is where everything will be
distributed.
Email.
Even though we give you our
email addresses, you should not
be emailing the individual
members of the staff directly.
You should email us generally.
And the reason is you will get
much faster response.
And also, for any
communications,
generally we like to monitor
what the communications are so
it's helpful to have emails
coming to everybody on the
course staff.
As I mentioned,
we will be doing distance
learning this term.
And so you can watch lectures
online if you choose to do that.
I would recommend,
for people who have the
opportunity to watch,
to come live.
It's better live.
You get to interact.
There's an intangible that
comes with having it live.
In fact, in addition to the
videos, I meet weekly with the
Singapore students so that they
have a live session as well.
Prerequisites.
The prerequisites for this
course are 6.042,
which is Math for Computer
Science, and 6.001.
You basically need discrete
mathematics and probability,
as well as programming
experience to take this course
successfully.
People who do not have that
background should not be in the
class.
We will be checking
prerequisites.
If you have any questions,
please come to talk to us after
class.
Let's see.
Lectures are here.
For SMA students,
they have the videotapes and
they will also have a weekly
meeting.
Students must attend a one-hour
recitation session each week.
There will be new material
presented in the recitation.
Unlike the lectures,
they will not be online.
Unlike the lectures,
there will not be lecture notes
distributed for the recitations
in general.
And, yet, there will be
material there that is directly
on the exams.
And so every term we say oh,
when did you cover that?
That was in recitation.
You missed that one.
So, recitations are mandatory.
And, in particular,
also let me just mention your
recitation instructor is the one
who assigns your final grade.
So we have a grading meeting to
keep everybody normalized,
but your recitation instructor
has the final say on your grade.
Handouts.
Handouts are available on the
course Web page.
We will not generally,
except for this one,
first handout,
be bringing handouts to class.
Textbook is this book,
Introduction to Algorithms.
MIT students can get it any of
the local bookstores,
including the MIT Coop.
There is also a new online
service that provides textbooks.
You can also get a discount if
you buy it at the MIT Press
Bookstore.
There is a coupon in the MIT
Student Telephone Directory for
a discount on MIT Press books.
And you can use that to
purchase this book at a
discount.
Course website.
This is the course website.
It links to the Stellar
website, which is where,
actually, everything will be
kept.
And SMA students have their own
website.
Some students find this course
particularly challenging,
so we will have extra help.
We will post weekly office
hours on the course website for
the TAs.
And then as an experiment this
term, we are going to offer
homework labs for this class.
A homework lab is a place and a
time you can go to do homework
alongside other people in the
course.
And there will be typically two
TAs who staff the lab.
And so, as you're working on
your homework,
you can get help from the TAs
if you need it.
We're going to schedule those,
and the course calendar will
say where and when they will be
held, but usually Sundays 2:00
to 4:00 pm, or else it will be
some evening.
I think the first one is an
evening, right?
Near to when the homework is
due.
Your best bet is try to do the
homework in advance of the
homework lab.
But then, if you want extra
help, or if you want to talk
over your solutions with
people, that's a good time --
because, as we will discuss,
problem sets can be solved in
collaboration with other people
in the class.
In addition,
there are several peer
assistance programs.
Also the office of Minority
Education has an assistance
program, and those usually get
booked up pretty quickly.
If you're interested in those,
it's a good idea to make an
appointment soon to get there
and get help.
The homework labs,
I hope a lot of people will try
that out.
We've never done this.
I don't know of any other
course.
Do other people know of courses
at MIT that have done this?
6.011 did it,
OK.
Good.
And was it successful in that
class?
It never went,
OK.
Good.
[LAUGHTER] We will see.
If it's not paying off then we
will just return to ordinary
office hours for those TAs,
but I think for some students
that is a good opportunity.
If you wish to be registered in
this course, you must sign up on
the course Web page.
So, that is requirement one.
It must be done today.
You will find it difficult to
pass the course if you are not
in the class.
And you should notify your TA
if you decide to drop so that we
can get you off and stop the
mailings, stop the spam.
And you should register today
before 7:00 PM.
And then we're going to email
your recitation assignment to
you before Noon tomorrow.
And if you don't receive this
information by Thursday Noon,
please send us an email to the
course staff generally,
not to me individually,
saying that you didn't receive
your recitation assignment.
And so if you haven't received
it by Thursday noon, you want
to let us know.
I think generally they are
going to send them out tonight
or at least by tomorrow morning.
Yeah.
OK.
SMA students don't have to
worry about this.
Problem sets.
We have nine problem sets that
we project will be assigned
during the semester.
A couple things about problem
sets.
Late homeworks won't generally
be accepted; if you have
extenuating circumstances you
should make prior arrangements
with your recitation instructor.
In fact, almost all of the
administrative stuff,
you shouldn't come to me to ask
and say can I hand in something
late?
You should be talking to your
recitation instructor.
You can read the other things
about the form,
but let me just mention that
there are exercises that should
be solved but not handed in as
well to give you drill on the
material.
I highly recommend doing
the exercises.
They both test your
understanding of the material,
and exercises have this way of
finding themselves on quizzes.
You're often asked to describe
algorithms.
And here is a little outline of
what you can use to describe an
algorithm.
The grading policy is something
that I somehow have to cover.
And always every term there are
at least a couple of students
who pretend like I never showed
them this.
If you skip problems it has a
nonlinear effect on your grade.
Nonlinear, OK?
If you don't skip any problems,
no effect on your grade.
If you skip one problem,
a hundredth of a letter grade,
we can handle that.
But two problems it's a tenth.
And, as you see,
by the time you have skipped
five problems --
This is not problem sets,
by the way.
This is problems,
OK?
You're down a third of a letter
grade.
And if you skip nine or more
problems, so that's typically
about three to four problem
sets, you don't pass the class.
I always have some students
coming at the end of the year
saying oh, I didn't do any of my
problems.
Can you just pass me because I
did OK on the exams?
Answer no, a very simple answer
because we've said it upfront.
So, the problem sets are an
integral part of the course.
Collaboration policy.
This is extremely important so
everybody pay attention.
If you are asleep now wake up.
Like that's going to wake
anybody up, right?
[LAUGHTER] The goal of
homework.
Professor Demaine and my
philosophy is that the goal of
homework is to help you learn
the material.
And one way of helping to learn
is not to just be stuck and
unable to solve something
because then you're in no better
shape when the exam rolls
around, which is where we're
actually evaluating you.
So, you're encouraged to
collaborate.
But there are some commonsense
things about collaboration.
If you go and you collaborate
to the extent that all you're
doing is getting the information
from somebody else,
you're not learning the
material and you're not going to
do well on the exams.
In our experience,
students who collaborate
generally do better than
students who work alone.
But you owe it to yourself,
if you're going to work in a
study group, to be prepared for
your study group meeting.
And specifically you should
spend a half an hour to 45
minutes on each problem before
you go to group so you're up to
speed and you've tried out your
ideas.
And you may have solutions to
some, you may be stuck on some
other ones, but at least you
applied yourself to it.
After 30 to 45 minutes,
if you cannot get the problem,
just sitting there and banging
your head against it makes no
sense.
It's not a productive use of
your time.
And I know most of you have
issues with having time on your
hands, right?
Like it's not there.
So, don't go banging your head
against problems that are too
hard or where you don't
understand what's going on or
whatever.
That's when the study group can
help out.
And, as I mentioned,
we'll have homework labs which
will give you an opportunity to
go and do that and coordinate
with other students rather than
necessarily having to form your
own group.
And the TAs will be there.
If your group is unable to
solve the problem then talk to
other groups or ask your
recitation instructor.
And, that's how you go about
solving them.
Writing up the problem sets,
however, is your individual
responsibility and should be
done alone.
You don't write up your problem
solutions with other students,
you write them up on your own.
And on your problem sets,
because this is an academic
place and we understand that
the sourcing of academic
information is very important,
if you collaborated on
solutions you should write a
list of the collaborators.
Say: I worked with so-and-so on
this solution.
It does not affect your grade.
It's just a question of being
scholarly.
It is a violation of this
policy to submit a problem
solution that you cannot orally
explain to a member of the
course staff.
You say oh, well,
my write-up is similar to that
other person's.
I didn't copy them.
We may ask you to orally
explain your solution.
If you are unable to,
according to this policy,
the presumption is that you
cheated.
So, do not write up stuff that
you don't understand.
You should be able to write up
the stuff that you understand.
Understand why you're putting
down what you're putting down.
If it isn't obvious,
no collaboration whatsoever is
permitted on exams.
Exams is when we evaluate you.
And now we're not interested in
evaluating other people,
we're interested in evaluating
you.
So, no collaboration on exams.
We will have a take-home exam
for the second quiz.
You should look at the
schedule.
If there are problems with the
schedule of that,
we want to know early.
And we will give you more
details about the collaboration
in the lecture on Monday,
November 28th.
Now, generally,
the lectures here,
they're mandatory and you have
to know them,
but I know that some people say
gee, 9:30 is kind of early,
especially on a Monday or
whatever.
It can be kind of early to get
up.
However, on Monday,
November 28th,
you fail the exam if you do not
show up to lecture on time.
That one day you must show up.
Any questions about that?
That one day you must show up
here, even if you've been
watching them on the Web.
And generally,
if you think you have
transgressed,
the best is to come to us to
talk about it.
We can usually work something
out.
It's when we find somebody has
transgressed from a third-party
or from obvious analyses that we
do with homeworks,
that's when things get messy.
So, if you think,
for some reason or other,
oh, I may have done something
wrong, please come and talk to
us.
We actually were students once,
too, albeit many years ago.
Any questions?
So, this class has great
material.
Fabulous material.
And it's really fun,
but you do have to work hard.
Let's talk content.
This is the topic of the first
part of the course.
The first part of the course is
focused on analysis.
The second part of the course
is focused on design.
Before you can do design,
you have to master a bunch of
techniques for analyzing
algorithms.
And then you'll be in a
position to design algorithms
that you can analyze and that
are efficient.
The analysis of algorithms is
the theoretical study --
-- of computer program
performance --
-- and resource usage.
And a particular focus on
performance.
We're studying how to make
things fast.
In particular,
computer programs.
We also will discover and talk
about other resources such as
communication,
such as memory,
whether RAM memory or disk
memory.
There are other resources that
we may care about,
but predominantly we focus on
performance.
Because this is a course about
performance, I like to put
things in perspective a little
bit by starting out and asking,
in programming,
what is more important than
performance?
If you're in an engineering
situation and writing code,
writing software,
what's more important than
performance?
Correctness.
Good.
OK.
What else?
Simplicity can be.
Very good.
Yeah.
Maintainability often much more
important than performance.
Cost.
And what type of cost are you
thinking?
No, I mean cost of what?
We're talking software here,
right?
What type of cost do you have
in mind?
There are some costs that are
involved when programming like
programmer time.
So, programmer time is another
thing also that might be.
Stability.
Robustness of the software.
Does it break all the time?
What else?
Come on.
We've got a bunch of engineers
here.
A lot of things.
How about features?
Features can be more important.
Having a wider collection of
features than your competitors.
Functionality.
Modularity.
Is it designed in a way where
you can make changes in a local
part of the code and you don't
have to make changes across the
code in order to affect a simple
change in the functionality?
There is one big one which
definitely, especially in the
'90s, was like the big thing in
computers.
The big thing.
Well, security actually.
Good.
I don't even have that one
down.
Security is excellent.
That's actually been more in
the 2000s.
Security has been far more
important often than
performance.
Scalability has been important,
although scalability,
in some sense,
is performance related.
But, yes, scalability is good.
What was the big breakthrough
and why do people use Macintosh
rather than Windows,
those people who are of that
religion?
User-friendliness.
Wow.
If you look at the number of
cycles of computers that went
into user-friendliness in the
'90s, it grew from almost
nothing to where it's now the
vast part of the computation
goes into user-friendly.
So, all those things are more
important than performance.
This is a course on
performance.
Then you can say OK,
well, why do we bother and why
study algorithms and performance
if it's at the bottom of the
heap?
Almost always people would
rather have these other things
than performance.
You go off and you say to
somebody, would I rather have
performance or more
user-friendliness?
It's almost always more
important than performance.
Why do we care then?
Yeah?
That wasn't user-friendly.
Sometimes performance is
correlated with
user-friendliness,
absolutely.
Nothing is more frustrating
than sitting there waiting,
right?
So, that's a good reason.
What are some other reasons
why?
Sometimes they have real-time
constraints so they don't
actually work unless they
perform adequately.
Yeah?
Hard to get,
well, we don't usually quantify
user-friendliness so I'm not
sure, but I understand what
you're saying.
He said we don't get
exponential performance
improvements in
user-friendliness.
We often don't get that in
performance either,
by the way.
[LAUGHTER] Sometimes we do,
but that's good.
There are several reasons that
I think are important.
One is that often performance
measures the line between the
feasible and the infeasible.
We have heard some of these
things.
For example,
when there are real-time
requirements,
if it's not fast enough it's
simply not functional.
Or, if it uses too much memory
it's simply not going to work
for you.
And, as a consequence,
what you find is algorithms are
on the cutting edge of
entrepreneurship.
If you're talking about just
re-implementing stuff that
people did ten years ago,
performance isn't that
important at some level.
But, if you're talking about
doing stuff that nobody has done
before, one of the reasons often
that they haven't done it is
because it's too time-consuming.
Things don't scale and so
forth.
So, that's one reason,
is the feasible versus
infeasible.
Another thing is that
algorithms give you a language
for talking about program
behavior, and that turns out to
be a language that has been
pervasive through computer
science, is that the theoretical
language is what gets adopted by
all the practitioners because
it's a clean way of thinking
about things.
A good way I think about
performance, and the reason it's
on the bottom of the heap,
is sort of like performance is
like money, it's like currency.
You say what good does a stack
of hundred dollar bills do for
you?
Would you rather have food or
water or shelter or whatever?
And you're willing to pay those
hundred dollar bills,
if you have hundred dollar
bills, for that commodity.
Even though water is far more
important to your living.
Well, similarly,
performance is what you use to
pay for user-friendliness.
It's what you pay for security.
And you hear people say,
for example,
that I want greater
functionality,
so people will program in Java,
even though it's much slower
than C, because they'll say it
costs me maybe a factor of three
or something in performance to
program in Java.
But Java is worth it because
it's got all these
object-oriented features and so
forth, exception mechanisms and
so on.
And so people are willing to
pay a factor of three in
performance.
So, that's why you want
performance because you can use
it to pay for these other things
that you want.
And that's why,
in some sense,
it's on the bottom of the heap,
because it's the universal
thing that you quantify.
Do you want to spend a factor
of two on this or spend a factor
of three on security,
et cetera?
And, in addition,
the lessons generalize to other
resource measures like
communication,
like memory and so forth.
And the last reason we study
algorithm performance is it's
tons of fun.
Speed is always fun,
right?
Why do people drive fast cars,
race horses,
whatever?
Rockets, et cetera,
why do we do that?
Because speed is fun.
Ski.
Who likes to ski?
I love to ski.
I like going fast on those
skis.
It's fun.
Hockey, fast sports,
right?
We all like the fast sports.
Not all of us,
I mean.
Some people say he's not
talking to me.
OK, let's move on.
That's sort of a little bit of
a notion as to why we study
this, is that it does,
in some sense,
form a common basis for all
these other things we care
about.
And so we want to understand
how can we generate money for
ourselves in computation?
We're going to start out with a
very simple problem.
It's one of the oldest problems
that has been studied in
algorithms, is the problem of
sorting.
We're going to actually study
this for several lectures
because sorting contains many
algorithmic techniques.
The sorting problem is the
following.
We have a sequence ⟨a_1,
a_2, ..., a_n⟩ of numbers as
input.
And our output is a permutation
⟨a'_1, a'_2, ..., a'_n⟩ of
those numbers.
A permutation is a
rearrangement of the numbers.
Every number appears exactly
once in the rearrangement such
that, and I sometimes use a
dollar sign to mean "such
that," a'_1 ≤ a'_2 ≤ ... ≤ a'_n.
That is, they are monotonically
increasing in size.
Take a bunch of numbers,
put them in order.
Here's an algorithm to do it.
It's called insertion sort.
And we will write this
algorithm in what we call
pseudocode.
It's sort of a programming
language, except it's got
English in there often.
And it's just a shorthand for
writing things precisely.
So this sorts A from 1 to n.
And here is the code for it.
This is what we call
pseudocode.
And if you don't understand the
pseudocode then you should ask
questions about any of the
notations.
You will start to get used to
it as we go on.
One thing is that in the
pseudocode we use indentation,
where in most languages they
have some kind of begin-end
delimiters like curly braces or
something in Java or C,
for example.
We just use indentation.
The whole idea of the
pseudocode is to try to get the
algorithms as short as possible
while still understanding what
the individual steps are.
In practice,
there actually have been
languages that use indentation
as a means of showing the
nesting of things.
It's generally a bad idea,
because if things go over one
page to another,
for example,
you cannot tell what level of
nesting it is.
Whereas, with explicit braces
it's much easier to tell.
So, there are reasons why this
is a bad notation if you were
doing software engineering.
But it's a good one for us
because it just keeps things
short and makes fewer things to
write down.
So, this is insertion sort.
Let's try to figure out a
little bit what this does.
It basically takes an array A.
The thing to understand is that
we're running the outer loop
with j from 2 to n,
and the inner loop starts at
j minus 1 and goes down until
it hits zero.
Basically, if we look at any
point in the algorithm,
we essentially are looking at
some element here j.
A of j, the jth element.
And what we do essentially is
we pull a value out here that we
call the key.
And at this point the important
thing to understand,
and we'll talk more about this
in recitation on Friday,
is that there is an invariant
that is being maintained by this
loop each time through.
And the invariant is that this
part of the array is sorted.
And the goal each time through
the loop is to increase,
is to add one to the length of
the things that are sorted.
And the way we do that is we
pull out the key and we just
copy values up like this.
And keep copying up until we
find the place where this key
goes, and then we insert it in
that place.
And that's why it's called
insertion sort.
We just sort of move the
things, copy the things up until
we find where it goes,
and then we put it into place.
And now we have it from A from
one to j is sorted,
and now we can work on j plus
one.
Let's give an example of that.
Imagine we are doing 8,
2, 4, 9, 3, 6.
We start out with j equals 2.
And we figure out that we want
to insert it there.
Now we have 2,
8, 4, 9, 3, 6.
Then we look at the four and
say oh, well,
that goes over here.
We get 2, 4,
8, 9, 3, 6 after the second
iteration of the outer loop.
Then we look at 9 and discover
immediately it just goes right
there.
Very little work to do on that
step.
So, we have exactly the same
output after that iteration.
Then we look at the 3 and
that's going to be inserted over
there.
2, 3, 4, 8, 9,
6.
And finally we look at the 6
and that goes in there.
2, 3, 4, 6, 8,
9.
And at that point we are done.
Question?
The array initially starts at
one, yes.
A[1...n], OK?
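The blackboard pseudocode itself isn't captured in this transcript, so here is a sketch of the algorithm just described in Python (0-indexed, rather than the A[1..n] indexing of the lecture), traced on the lecture's example:

```python
def insertion_sort(A):
    """Sort list A in place by inserting each A[j] into the sorted prefix A[:j]."""
    for j in range(1, len(A)):        # invariant: A[:j] is already sorted
        key = A[j]                    # pull out the key
        i = j - 1
        while i >= 0 and A[i] > key:  # copy larger values up one slot
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key                # insert the key where it belongs
    return A

# The lecture's example: 8, 2, 4, 9, 3, 6
print(insertion_sort([8, 2, 4, 9, 3, 6]))  # [2, 3, 4, 6, 8, 9]
```

This is only a sketch of the lecture's idea, not the course's official pseudocode; the name `insertion_sort` and the in-place return are choices made here for illustration.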
So, this is the insertion sort
algorithm.
And it's the first algorithm
that we're going to analyze.
And we're going to pull out
some tools that we have from our
math background to help to
analyze it.
First of all,
let's take a look at the issue
of running time.
The running time depends,
of this algorithm depends on a
lot of things.
One thing it depends on is the
input itself.
For example,
if the input is already sorted
--
-- then insertion sort has very
little work to do.
Because every time through it's
going to be like this case.
It doesn't have to shuffle too
many guys over because they're
already in place.
Whereas, in some sense,
what's the worst case for
insertion sort?
If it is reverse sorted then
it's going to have to do a lot
of work because it's going to
have to shuffle everything over
on each step of the outer loop.
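One way to see this input dependence concretely is to instrument the sort and count element copies; `count_shifts` below is a hypothetical instrumentation written for this transcript, not something from the lecture:

```python
def count_shifts(A):
    """Run insertion sort on a copy of A and count how many element copies it makes."""
    A = list(A)
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:  # each copy up is one unit of shuffling work
            A[i + 1] = A[i]
            shifts += 1
            i -= 1
        A[i + 1] = key
    return shifts

print(count_shifts([1, 2, 3, 4, 5, 6]))  # 0: already sorted, nothing to shuffle
print(count_shifts([6, 5, 4, 3, 2, 1]))  # 15: reverse sorted, n(n-1)/2 copies
```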
In addition to the actual input
it depends, of course,
on the input size.
Here, for example,
we did six elements.
It's going to take longer if
we, for example,
do six times ten to the ninth
elements.
If we were sorting a lot more
stuff, it's going to take us a
lot longer.
Typically, the way we handle
that is we are going to
parameterize things in the input
size.
We are going to talk about time
as a function of the size of
things that we are sorting so we
can look at what is the behavior
of that.
And the last thing I want to
say about running time is
generally we want upper bounds on
the running time.
We want to know that the time
is no more than a certain
amount.
And the reason is because that
represents a guarantee to the
user.
If I say it's not going to run,
for example,
if I tell you here's a program
and it won't run more than three
seconds, that gives you real
information about how you could
use it, for example,
in a real-time setting.
Whereas, if I said here's a
program and it goes at least
three seconds,
you don't know if it's going to
go for three years.
It doesn't give you that much
guarantee if you are a user of
it.
Generally we want upper bounds
because it represents a
guarantee to the user.
There are different kinds of
analyses that people do.
The one we're mostly going to
focus on is what's called
worst-case analysis.
And this is what we do usually
where we define T of n to be the
maximum time on any input of
size n.
So, it's the maximum input,
the maximum it could possibly
cost us on an input of size n.
What that does is,
if you look at the fact that
sometimes the inputs are better
and sometimes they're worse,
we're looking at the worst case
of those because that's the way
we're going to be able to make a
guarantee.
It always does something rather
than just sometimes does
something.
So, we're looking at the
maximum.
Notice that if I didn't have
maximum then T(n) in some sense
is a relation,
not a function,
because the time on an input of
size n depends on which input of
size n.
I could have many different
times, but by putting the
maximum at it,
it turns that relation into a
function because there's only
one maximum time that it will
take.
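The definition can be made literal for small n: take some cost measure of the algorithm on each input, then take the maximum over every input of size n. The sketch below uses element copies as the cost measure and brute-forces all n! permutations, which is of course only feasible for tiny n and is purely illustrative:

```python
from itertools import permutations

def steps(A):
    """Element copies insertion sort performs on A (one possible cost measure)."""
    A = list(A)
    cost = 0
    for j in range(1, len(A)):
        key, i = A[j], j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            cost += 1
            i -= 1
        A[i + 1] = key
    return cost

def T(n):
    """Worst-case time: the maximum cost over every input of size n."""
    return max(steps(p) for p in permutations(range(n)))

print([T(n) for n in range(2, 7)])  # [1, 3, 6, 10, 15] = n(n-1)/2
```

The maximum is achieved by the reverse-sorted input, matching the earlier discussion; taking the max is what turns the relation into a function.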
Sometimes we will talk about
average case.
Sometimes we will do this.
Here T of n is then the
expected time over all inputs of
size n.
It's the expected time.
Now, if I talk about expected
time, what else do I need to say
here?
What does that mean,
expected time?
I'm sorry.
Raise your hand.
Expected inputs.
What does that mean,
expected inputs?
I need more math.
What do I need by expected time
here, math?
You have to take the time of
every input and then average
them, OK.
That's kind of what we mean by
expected time.
Good.
Not quite.
I mean, what you say is
completely correct,
except it's not quite enough.
Yeah?
It's the time of every input
times the probability that it
will be that input.
It's a way of taking a weighted
average, exactly right.
How do I know what the
probability of every input is?
How do I know what the
probability a particular input
occurs is in a given situation?
I don't.
I have to make an assumption.
What's that assumption called?
What kind of assumption do I
have to meet?
I need an assumption --
-- of the statistical
distribution of inputs.
Otherwise, expected time
doesn't mean anything because I
don't know what the probability
of something is.
In order to do probability,
you need some assumptions and
you've got to state those
assumptions clearly.
One of the most common
assumptions is that all inputs
are equally likely.
That's called the uniform
distribution.
Every input of size n is
equally likely,
that kind of thing.
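Under that uniform assumption, the expected time is just the average cost over all n! equally likely inputs. A brute-force sketch for tiny n (again an illustration written for this transcript, not lecture material):

```python
from itertools import permutations
from statistics import mean

def shifts(A):
    """Element copies insertion sort performs on input A."""
    A = list(A)
    count = 0
    for j in range(1, len(A)):
        key, i = A[j], j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            count += 1
            i -= 1
        A[i + 1] = key
    return count

def expected_shifts(n):
    """Average cost over all n! inputs, each weighted equally (uniform distribution)."""
    return mean(shifts(p) for p in permutations(range(n)))

print(expected_shifts(4))  # 3.0, i.e. n(n-1)/4: half the worst case
```

A random permutation has n(n-1)/4 inversions on average, so the expected cost is half the worst case; still quadratic.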
But there are other ways that
you could make that assumption,
and they may not all be true.
This is much more complicated,
as you can see.
Fortunately,
all of you have a strong
probability background.
And so we will not have any
trouble addressing these
probabilistic issues of dealing
with expectations and such.
If you don't,
time to go and say gee,
maybe I should take that
Probability class that is a
prerequisite for this class.
The last one I am going to
mention is best-case analysis.
And this I claim is bogus.
Bogus.
No good.
Why is best-case analysis
bogus?
Yeah?
The best-case probably doesn't
ever happen.
Actually, it's interesting
because for the sorting problem,
the most common things that get
sorted are things that are
already sorted interestingly,
or at least almost sorted.
For example,
one of the most common things
that are sorted is check numbers
by banks.
They tend to come in,
in the same order that they are
written.
They're sorting things that are
almost always sorted.
I mean, it's good.
Why upper bound,
not lower bound?
Yeah, you want to make a
guarantee.
And so why is this not a
guarantee?
You're onto something there,
but we need a little more
precision here.
How can I cheat?
Yeah?
Yeah, you can cheat.
You cheat.
You take any slow algorithm
that you want and just check for
some particular input,
and if it's that input,
then you say immediately yeah,
OK, here is the answer.
And then it's got a good
best-case.
But I didn't tell you anything
about the vast majority of what
is going on.
So, you can cheat with a slow
algorithm that works fast on
some input.
It doesn't really do much for
you so we normally don't worry
about that.
Let's see.
What is insertion sort's
worst-case time?
Now we get into some sort of
funny issues.
First of all,
it sort of depends on the
computer you're running on.
Whose computer,
right?
Is it a big supercomputer or is
it your wristwatch?
They have different
computational abilities.
And when we compare algorithms,
we compare them typically for
relative speed.
This is if you compared two
algorithms on the same machine.
You could argue,
well, it doesn't really matter
what the machine is because I
will just look at their relative
speed.
But, of course,
I may also be interested in
absolute speed.
Is one algorithm actually
better no matter what machine
it's run on?
And so this kind of gets sort
of confusing as to how I can
talk about the worst-case time
of an algorithm of a piece of
software when I am not talking
about the hardware because,
clearly, if I had run on a
faster machine,
my algorithms are going to go
faster.
So, this is where you get the
big idea of algorithms.
Which is why algorithms is such
a huge field,
why it spawns companies like
Google, like Akamai,
like Amazon.
Why algorithmic analysis,
throughout the history of
computing, has been such a huge
success, is our ability to
master and to be able to take
what is apparently a really
messy, complicated situation and
reduce it to being able to do
some mathematics.
And that idea is called
asymptotic analysis.
And the basic idea of
asymptotic analysis is to ignore
machine-dependent constants --
-- and, instead of the actual
running time,
look at the growth of the
running time.
So, we don't look at the actual
running time.
We look at the growth.
Let's see what we mean by that.
This is a huge idea.
It's not a hard idea,
otherwise I wouldn't be able to
teach it in the first lecture,
but it's a huge idea.
We are going to spend a couple
of lectures understanding the
implications of that and will
basically be doing it throughout
the term.
And if you go on to be
practicing engineers,
you will be doing it all the
time.
In order to do that,
we adopt some notations that
are going to help us.
In particular,
we will adopt asymptotic
notation.
Most of you have seen some kind
of asymptotic notation.
Maybe a few of you haven't,
but mostly you should have seen
a little bit.
The one we're going to be using
in this class predominantly is
theta notation.
And theta notation is pretty
easy notation to master because
all you do is,
from a formula,
just drop low order terms and
ignore leading constants.
For example,
if I have a formula like 3n^3 +
90n^2 - 5n + 6046,
I say, well,
what low-order terms do I drop?
Well, n^3 is a bigger term
than n^2.
I am going to drop all these
terms and ignore the leading
constant, so I say that's
Theta(n^3).
That's pretty easy.
So, that's theta notation.
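A quick numerical sketch of why dropping low-order terms and the leading constant is safe (the loop bounds here are my own choice, just to show the trend):

```python
def f(n):
    # The formula from the board: 3n^3 + 90n^2 - 5n + 6046.
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

# As n grows, the low-order terms and the constant stop mattering:
# f(n)/n^3 approaches the leading constant 3, and theta notation
# throws that constant away too, leaving Theta(n^3).
for n in (10, 1_000, 1_000_000):
    print(n, f(n) / n**3)   # ratio approaches 3
```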
Now, this is an engineering way
of manipulating theta notation.
There is actually a
mathematical definition for
this, which we are going to talk
about next time,
which is a definition in terms
of sets of functions.
And, you are going to be
responsible, this is both a math
and a computer science
engineering class.
Throughout the course you are
going to be responsible both for
mathematical rigor as if it were
a math course and engineering
commonsense because it's an
engineering course.
We are going to be doing both.
This is the engineering way of
understanding what you do,
so you're responsible for being
able to do these manipulations.
You're also going to be
responsible for understanding
the mathematical definition of
theta notation and of its related
O notation and omega notation.
If I take a look as n
approaches infinity,
a Theta(n^2) algorithm always
beats, eventually,
a Theta(n^3) algorithm.
As n gets bigger,
it doesn't matter what these
other terms were if I were
describing the absolute precise
behavior in terms of a formula.
If I had a Theta(n^2)
algorithm, it would always be
faster for sufficiently large n
than a Theta(n^3) algorithm.
It wouldn't matter what those
low-order terms were.
It wouldn't matter what the
leading constant was.
This one will always be faster.
Even if you ran the Theta(n^2)
algorithm on a slow computer and
the Theta(n^3) algorithm on a
fast computer.
The great thing about
asymptotic notation is it
satisfies our issue of being
able to compare both relative
and absolute speed,
because we are able to do this
no matter what the computer
platform.
On different platforms we may
get different constants here,
machine-dependent constants for
the actual running time,
but if I look at the growth as
the size of the input gets
larger, the asymptotics
generally won't change.
For example,
I will just draw that as a
picture.
If I have n on this axis and
T(n) on this axis.
This may be,
for example,
a Theta(n^3) algorithm and this
may be a Theta(n^2) algorithm.
There is always going to be
some point n_0 where for
everything larger the Theta(n^2)
algorithm is going to be cheaper
than the Theta(n^3) algorithm
no matter how much advantage
you give it at the beginning in
terms of the speed of the
computer you are running on.
Now, from an engineering point
of view, there are some issues
we have to deal with because
sometimes it could be that that
n_0 is so large that the
computers aren't big enough to
run the problem.
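The crossover point can be computed directly. The constants here are hypothetical, chosen only to give the Theta(n^3) algorithm a hundredfold head start:

```python
# Hypothetical constants: the Theta(n^3) algorithm runs on a machine 100x
# faster, so it costs n^3 time units while the Theta(n^2) algorithm,
# on the slow machine, costs 100*n^2.
def cubic_on_fast_machine(n):
    return n**3

def quadratic_on_slow_machine(n):
    return 100 * n**2

# The crossover point n_0: past it, the quadratic algorithm always wins,
# no matter how big a head start the faster machine provided.
n0 = next(n for n in range(1, 10**6)
          if quadratic_on_slow_machine(n) < cubic_on_fast_machine(n))
print(n0)  # 101: 100*n^2 < n^3 exactly when n > 100
```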
That's why we,
nevertheless,
are interested in some of the
slower algorithms,
because some of the slower
algorithms, even though
asymptotically they will be
slower, may still be faster on
reasonable sizes of things.
And so we have to both balance
our mathematical understanding
with our engineering commonsense
in order to do good programming.
So, just having done analysis
of algorithms doesn't
automatically make you a good
programmer.
You also need to learn how to
program and use these tools in
practice to understand when they
are relevant and when they are
not relevant.
There is a saying.
If you want to be a good
programmer, you just program
every day for two years,
and you will be an excellent
programmer.
If you want to be a world-class
programmer, you can program
every day for ten years,
or you can program every day
for two years and take an
algorithms class.
Let's get back to what we were
doing, which is analyzing
insertion sort.
We are going to look at the
worst-case.
Which, as we mentioned before,
is when the input is reverse
sorted.
The biggest element comes first
and the smallest last because
now every time you do the
insertion you've got to shuffle
everything over.
You can write down the running
time by looking at the nesting
of loops.
What we do is we sum up.
What we assume is that every
operation, every elemental
operation is going to take some
constant amount of time.
But we don't have to worry
about what that constant is
because we're going to be doing
asymptotic analysis.
As I say, the beauty of the
method is that it causes all
these things that are real
distinctions to sort of vanish.
We sort of look at them from
30,000 feet rather than from
three millimeters or something.
Each of these operations is
going to sort of be a basic
operation.
One way to think about this,
in terms of counting
operations, is counting memory
references.
How many times do you actually
access some variable?
That's another way of sort of
thinking about this model.
When we do that,
well, we're going to go through
this loop, j is going from 2 to
n, and then we're going to add
up the work that we do within
the loop.
We can sort of write that in
math as summation of j equals 2
to n.
And then what is the work that
is going on in this loop?
Well, the work that is going on
in this loop varies,
but in the worst case how many
operations are going on here for
each value of j?
For a given value of j,
how much work goes on in this
loop?
Can somebody tell me?
Asymptotically.
It's j times some constant,
so it's theta j.
So, there is theta j work going
on here because this loop starts
out with i being j minus 1,
and then it's doing just a
constant amount of stuff for
each step of the value of i,
and i is running from j minus
one down to zero.
So, we can say that is theta j
work that is going on.
Do people follow that?
OK.
And now we have a formula we
can evaluate.
What is the evaluation?
If I want to simplify this
formula, what is that equal to?
Sorry. In the back there.
Yeah. OK.
That's just Theta(n^2),
good.
Because when you say it's
the sum of consecutive numbers,
you mean what?
What's the mathematic term we
have for that so we can
communicate?
You've got to know these things
so you can communicate.
It's called what type of
sequence?
It's actually a series,
but that's OK.
What type of series is this
called?
Arithmetic series,
good.
Wow, we've got some sharp
people who can communicate.
This is an arithmetic series.
You're basically summing 1 + 2
+ 3 + 4, some constants in
there, but basically it's 1 + 2
+ 3 + 4 + 5 + 6 up to n.
That's Theta(n^2).
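The Theta(j) inner-loop work and the arithmetic-series total can be checked by instrumenting insertion sort. The counter is my own addition, not part of the algorithm:

```python
def insertion_sort_counting(a):
    """Insertion sort, instrumented (my addition) to count element shifts."""
    a, shifts = list(a), 0
    for j in range(1, len(a)):        # j goes from 2 to n in 1-indexed terms
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:  # Theta(j) work in the worst case
            a[i + 1] = a[i]
            shifts += 1
            i -= 1
        a[i + 1] = key
    return a, shifts

# Reverse-sorted input is the worst case: inserting the j-th element shifts
# all j-1 earlier ones, so the total is the arithmetic series
# 1 + 2 + ... + (n-1) = n(n-1)/2, which is Theta(n^2).
n = 100
result, shifts = insertion_sort_counting(list(range(n, 0, -1)))
print(shifts)  # 4950 = 100*99/2
```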
If you don't know this math,
there is a chapter in the book,
or you could have taken the
prerequisite.
Arithmetic series.
People have this vague
recollection.
Oh, yeah.
Good.
Now, you have to learn these
manipulations.
We will talk about it a bit next
time, but you have to learn your
theta manipulations for what
works with theta.
And you have to be very careful
because theta is a weak
notation.
A strong notation is something
like Leibniz notation from
calculus where the chain rule is
just canceling two things.
It's just fabulous that you can
cancel in the chain rule.
And Leibniz notation just
expresses that so directly you
can manipulate.
Theta notation is not like
that.
If you think it is like that
you are in trouble.
You really have to think of
what is going on under the theta
notation.
And it is more of a descriptive
notation than it is a
manipulative notation.
There are manipulations you can
do with it, but unless you
understand what is really going
on under the theta notation you
will find yourself in trouble.
And next time we will talk a
little bit more about theta
notation.
Is insertion sort fast?
Well, it turns out for small n
it is moderately fast.
But it is not at all fast for
large n.
So, I am going to give you an
algorithm that is faster.
It's called merge sort.
I wonder if I should leave
insertion sort up.
Why not.
I am going to write on this
later, so if you are taking
notes, leave some space on the
left.
Here is merge sort of an array
A from 1 up to n.
And it is basically three
steps.
If n equals 1 we are done.
Sorting one element,
it is already sorted.
All right.
Recursive algorithm.
Otherwise, what we do is we
recursively sort A from 1 up to
the ceiling of n over 2.
And the array A of the ceiling
of n over 2 plus one up to n.
So, we sort two halves of the
input.
And then, three,
we take those two lists that we
have done and we merge them.
And, to do that,
we use a merge subroutine which
I will show you.
The key subroutine here is
merge, and it works like this.
I have two lists.
Let's say one of them is 20.
I am doing this in reverse
order.
I have sorted this like this.
And then I sort another one.
I don't know why I do it this
order, but anyway.
Here is my other list.
I have my two lists that I have
sorted.
So, this is A[1] to A[⌈n/2⌉]
and A[⌈n/2⌉+1] to A[n] for the
way it will be called in this
program.
And now to merge these two,
what I want to do is produce a
sorted list out of both of them.
What I do is first observe
where is the smallest element of
any two lists that are already
sorted?
It's in one of two places,
the head of the first list or
the head of the second list.
I look at those two elements
and say which one is smaller?
This one is smaller.
Then what I do is output into
my output array the smaller of
the two.
And I cross it off.
And now where is the next
smallest element?
And the answer is it's going to
be the head of one of these two
lists.
Then I cross out this guy and
put him here and circle this
one.
Now I look at these two guys.
This one is smaller so I output
that and circle that one.
Now I look at these two guys,
output 9.
So, every step here is some
fixed number of operations that
is independent of the size of
the arrays at each step.
Each individual step is just me
looking at two elements and
picking out the smallest and
advancing some pointers into the
array so that I know where the
current head of that list is.
And so, therefore,
the time is order n on n total
elements.
The time to actually go through
this and merge two lists is
order n.
We sometimes call this linear
time because it's not quadratic
or whatever.
It is proportional to n,
proportional to the input size.
It's linear time.
I go through and just do this
simple operation,
just working up these lists,
and in the end I have done
essentially n operations,
order n operations each of
which cost constant time.
That's a total of order n time.
Everybody with me?
OK.
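The three steps on the board, plus the two-finger merge just walked through, can be sketched in Python (a minimal sketch, not the book's pseudocode):

```python
def merge(left, right):
    """Two-finger merge: the smallest remaining element is at the head of one
    of the two sorted lists, so compare the heads, output the smaller one,
    and advance that pointer. Theta(n) total work on n elements."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])      # one list ran out; the other's tail is sorted
    out.extend(right[j:])
    return out

def merge_sort(a):
    """The three steps from the board."""
    if len(a) <= 1:                    # 1. one element is already sorted
        return list(a)
    mid = (len(a) + 1) // 2            # ceiling of n/2
    return merge(merge_sort(a[:mid]),  # 2. recursively sort the two halves
                 merge_sort(a[mid:]))  # 3. merge the sorted halves

print(merge_sort([20, 13, 7, 2, 12, 11, 9, 1]))
```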
So, this is a recursive
program.
We can actually now write what
is called a recurrence for this
program.
The way we do that is say let's
let the time to sort n elements
to be T(n).
Then how long does it take to
do step one?
That's just constant time.
We just check to see if n is 1,
and if it is we return.
That's independent of the size
of anything that we are doing.
It just takes a certain number
of machine instructions on
whatever machine and we say it
is constant time.
We call that theta one.
This is actually a little bit
of an abuse if you get into it.
And the reason is because
typically in order to say it you
need to say what it is growing
with.
Nevertheless,
we use this as an abuse of the
notation just to mean it is a
constant.
So, that's an abuse just so
people know.
But it simplifies things if I
can just write theta one.
And it basically means the same
thing.
Now we recursively sort these
two things.
How can I describe that?
The time to do this,
I can describe recursively as T
of ceiling of n over 2 plus T of
n minus ceiling of n over 2.
That is actually kind of messy,
so what we will do is just be
sloppy and write 2T(n/2).
So, this is just us being
sloppy.
And we will see on Friday in
recitation that it is OK to be
sloppy.
That's the great thing about
algorithms.
As long as you are rigorous and
precise, you can be as sloppy as
you want.
[LAUGHTER] This is sloppy
because I didn't worry about
what was going on,
because it turns out it doesn't
make any difference.
And we are going to actually
see that that is the case.
And, finally,
I have to merge the two sorted
lists which have a total of n
elements.
And we just analyze that using
the merge subroutine.
And that takes us to theta n
time.
That allows us now to write a
recurrence for the performance
of merge sort.
Which is to say that T of n is
equal to theta 1 if n equals 1
and 2T of n over 2 plus theta of
n if n is bigger than 1.
Because either I am doing step
one or I am doing all steps one,
two and three.
Here I am doing step one and I
return and I am done.
Or else I am doing step one,
I don't return,
and then I also do steps two
and three.
So, I add those together.
I could say theta n plus theta
1, but theta n plus theta 1 is
just theta n because theta 1 is
a lower order term than theta n
and I can throw it away.
It is either theta 1 or it is
2T of n over 2 plus theta n.
Now, typically we won't be
writing this.
Usually we omit this.
If it makes no difference to
the solution of the recurrence,
we will usually omit constant
base cases.
In algorithms,
it's not true generally in
mathematics, but in algorithms
if you are running something on
a constant size input it takes
constant time always.
So, we don't worry about what
this value is.
And it turns out it has no real
impact on the asymptotic
solution of the recurrence.
How do we solve a recurrence
like this?
I now have T of n expressed in
terms of T of n over 2.
That's in the book and it is
also in Lecture 2.
We are going to do Lecture 2 to
solve that, but in the meantime
what I am going to do is give
you a visual way of
understanding what this costs,
which is one of the techniques
we will elaborate on next time.
It is called a recursion tree
technique.
And I will use it for the
actual recurrence that is almost
the same 2T(n/2),
but I am going to actually
explicitly, because I want you
to see where it occurs,
plus some constant times n
where c is a constant greater
than zero.
So, we are going to look at
this recurrence with a base case
of order one.
I am just making the constant
in here, the upper bound on the
constant be explicit rather than
implicit.
And the way you do a recursion
tree is the following.
You start out by writing down
the left-hand side of the
recurrence.
And then what you do is you say
well, that is equal to,
and now let's write it as a
tree.
I do c of n work plus now I am
going to have to do work on each
of my two children.
T of n over 2 and T of n over
2.
If I sum up what is in here,
I get this because that is what
the recurrence says,
T(n)=2T(n/2)+cn.
I have 2T(n/2)+cn.
Then I do it again.
I have cn here.
I now have here cn/2.
And here is cn/2.
And each of these now has a
T(n/4).
And these each have a T(n/4).
And this has a T(n/4).
And I keep doing that,
the dangerous dot,
dot, dots.
And, if I keep doing that,
I end up with it looking like
this.
And I keep going down until I
get to a leaf.
And a leaf, I have essentially
a T(1).
That is T(1).
And so the first question I ask
here is, what is the height of
this tree?
Yeah.
It's log n.
It's actually very close to
exactly log n because I am
starting out at the top with n
and then I go to n/2 and n/4 and
all the way down until I get to
1.
The number of halvings of n
until I get to 1 is log n so the
height here is log n.
It's OK if it is constant times
log n.
It doesn't matter.
How many leaves are in this
tree, by the way?
How many leaves does this tree
have?
Yeah.
The number of leaves,
once again, is actually pretty
close.
It's actually n.
If you took it all the way
down.
Let's make some simplifying
assumption.
n is a perfect power of 2,
so it is an integer power of 2.
Then this is exactly log n to
get down to T(1).
And then there are exactly n
leaves, because the number of
leaves here, the number of nodes
at this level is 1,
2, 4, 8.
And if I go down height h,
I have 2 to the h leaves,
2 to the log n,
that is just n.
We are doing math here,
right?
Now let's figure out how much
work, if I look at adding up
everything in this tree I am
going to get T(n),
so let's add that up.
Well, let's add it up level by
level.
How much do we have in the
first level?
Just cn.
If I add up the second level,
how much do I have?
cn.
How about if I add up the third
level?
cn.
How about if I add up all the
leaves?
Theta n.
It is not necessarily cn
because the boundary case may
have a different constant.
It is actually theta n,
but cn all the way here.
If I add up the total amount,
that is equal to cn times log
n, because that's the height,
that is how many cn's I have
here, plus theta n.
And this is a higher order term
than this, so this goes away,
get rid of the constants,
that is equal to theta(n lg n).
And theta(n lg n) is
asymptotically faster than
theta(n^2).
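The recursion-tree sum can be checked numerically by evaluating the recurrence directly (taking c = 1 and T(1) = 1, my choices for the constants):

```python
from math import log2

def T(n, c=1.0):
    """Evaluate T(n) = 2T(n/2) + c*n with T(1) = 1, for n a power of 2
    (the lecture's simplifying assumption)."""
    return 1.0 if n == 1 else 2 * T(n // 2, c) + c * n

# Closed form for powers of 2: c*n*lg(n) from the lg(n) tree levels costing
# cn each, plus n for the n leaves costing 1 each. That is Theta(n lg n).
for n in (8, 64, 1024):
    print(n, T(n), n * log2(n) + n)   # the two columns agree
```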
So, merge sort,
on a large enough input size,
is going to beat insertion
sort.
Merge sort is going to be a
faster algorithm.
Sorry, you guys,
I didn't realize you couldn't
see over there.
You should speak up if you
cannot see.
So, this is a faster algorithm
because theta(n lg n) grows more
slowly than theta(n^2).
And merge sort asymptotically
beats insertion sort.
Even if you ran insertion sort
on a supercomputer,
somebody running on a PC with
merge sort for sufficiently large
input will clobber them because
actually n^2 is way bigger than
n log n once you get the n's to
be large.
And, in practice,
merge sort tends to win here
for n bigger than,
say, 30 or so.
If you have a very small input
like 30 elements,
insertion sort is a perfectly
decent sort to use.
But merge sort is going to be a
lot faster even for something
that is only a few dozen
elements.
It is going to actually be a
faster algorithm.
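A back-of-the-envelope model of that crossover, with operation-count constants that are my own guesses rather than measured values, lands in the same few-dozen range:

```python
from math import log2

# Guessed constants, only to illustrate why the crossover is a few dozen
# elements: insertion sort does roughly n^2/4 comparisons on average, while
# merge sort does roughly 2*n*lg(n), reflecting its larger constant-factor
# overhead per element.
def insertion_cost(n):
    return n * n / 4

def merge_cost(n):
    return 2 * n * log2(n)

crossover = next(n for n in range(2, 1000)
                 if insertion_cost(n) > merge_cost(n))
print(crossover)   # a few dozen, consistent with "n bigger than, say, 30"
```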
Those are sort of the lessons,
OK?
Remember that to get your
recitation assignments and
attend recitation on Friday.
Because we are going to be
going through a bunch of the
things that I have left on the
table here.
And see you next Monday.