The word stochastic is an adjective in English that describes something that was randomly determined.^{[1]} The word first appeared in English to describe a mathematical object called a stochastic process, but now in mathematics the terms stochastic process and random process are considered interchangeable.^{[2]}^{[3]}^{[4]}^{[5]}^{[6]} The word, with its current definition meaning random, came from German, but it originally came from Greek στόχος (stókhos), meaning 'aim, guess'.^{[1]}
The term stochastic is used in many different fields, particularly where stochastic or random processes are used to represent systems or phenomena that seem to change in a random way. The term is used in the natural sciences such as biology,^{[7]} chemistry,^{[8]} ecology,^{[9]} neuroscience,^{[10]} and physics,^{[11]} as well as in technology and engineering fields such as image processing, signal processing,^{[12]} information theory,^{[13]} computer science^{[14]} (including the field of artificial intelligence), cryptography^{[15]} and telecommunications.^{[16]} It is also used in finance, because of seemingly random changes in financial markets,^{[17]}^{[18]}^{[19]} as well as in medicine, linguistics, music, media, colour theory, botany, manufacturing, and geomorphology.
Stochastic social science theory is similar to systems theory.
Transcription
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Today we're going to study stochastic processes and, among them, one type in particular: discrete time. We'll focus on discrete time, and I'll explain what that means right now. A stochastic process is a collection of random variables indexed by time, a very simple definition. So we have either random variables X0, X1, X2, and so on, starting from 0, or random variables Xt indexed by a real time t greater than or equal to 0. So the time variable can be discrete, or it can be continuous. The first kind we'll call discrete-time stochastic processes, and the second kind continuous-time stochastic processes. For example, a discrete-time stochastic process can look something like this. These are the values x0, x1, x2, x3, and so on, and they are random variables. What I drew is just one realization of the stochastic process, but all these variables are supposed to be random. A continuous-time stochastic process can look something like that. It doesn't have to be continuous: it can jump, and jump again, and so on. And all these values are random values. So that's just a very informal description. A slightly different point of view, which is slightly preferred when you want to do some math with it, is the alternative definition: a stochastic process is a probability distribution over paths, over a space of paths. So you have a bunch of possible paths that you can take, and you're given some probability distribution over them. Then one path is one realization; another realization will look different, and so on. The first definition, that it's a collection of random variables indexed by time, is the more intuitive one.
But the second one, if you want to do some math with it, from the formal point of view, will be more helpful, and you'll see why that's the case later. Let me show you some more examples. Here is one way to describe a stochastic process; let me show you three stochastic processes. Number one: f(t) equals t, with probability 1. Number two: f(t) is equal to t for all t, with probability 1/2, or f(t) is equal to minus t for all t, with probability 1/2. Number three: for each t, f(t) is equal to t or minus t, with probability 1/2 each. The first one is quite easy to picture. There's really nothing random in here; this happens with probability 1. Your path is just f(t) equals t, and we're only looking at t greater than or equal to 0 here. So that's number 1. Number 2 is either this line or this line. It is a stochastic process, even though, if you think of it as a collection of random variables, it doesn't really look like one. Under the alternative definition, you have two possible paths that you can take: you either take this path, with probability 1/2, or this path, with probability 1/2. At each point t, your value f(t) is a random variable: it's either t or minus t, and the choice is the same for all t. So the values are dependent on each other: if you know one value, you automatically know all the other values. The third one is even more interesting. Now, for each t, we get rid of this dependency. At every single point, you'll be on either the top line or the bottom line. If you really want to draw the picture, it will bounce back and forth, up and down, infinitely often, and it'll just look like two lines. So I hope this gives you some feeling for stochastic processes and, just a tiny bit, for why we want to describe them in this language. Any questions?
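The three example processes can be sampled directly. The sketch below is a minimal Python illustration, not part of the lecture: the function name `sample_path` is hypothetical, and restricting t to a finite grid of times is an assumption made so that a realization fits in a list.

```python
import random

def sample_path(kind, times, rng):
    """Return one realization [f(t) for t in times] of the three example processes.

    kind 1: f(t) = t with probability 1 (nothing random).
    kind 2: a single fair coin flip decides the whole path: f(t) = t for all t or f(t) = -t for all t.
    kind 3: an independent fair coin flip at every t decides f(t) = t or f(t) = -t.
    """
    if kind == 1:
        return [t for t in times]
    if kind == 2:
        sign = rng.choice([1, -1])  # one flip shared by every t
        return [sign * t for t in times]
    if kind == 3:
        return [rng.choice([1, -1]) * t for t in times]  # a fresh flip for each t
    raise ValueError("kind must be 1, 2 or 3")

rng = random.Random(0)
times = list(range(6))
for kind in (1, 2, 3):
    print(kind, sample_path(kind, times, rng))
```

In process 2, knowing one value fixes the sign for the whole path; in process 3, each value carries no information about the others beyond the fact that it is t or minus t.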
So when you use a stochastic process to model something going on in real life, like a stock price, usually what happens is you stand at time t. You know all the values in the past, and you don't know the future. But you want to know something about it: you want to draw some intelligent conclusion about the future, based on the past. For the first stochastic process, it's easy. No matter where you stand, you know exactly what's going to happen in the future. For the second one, it's also the same. Even though it's random, once you know what happened at some point, you know the path has to be this line, if it's here, or that line, if it's there. But the third one is different. No matter what you know about the past, even if you know all the values in the past, it doesn't give any information about the future. Well, it's not true to say it gives no information at all: we know that each value has to be t or minus t. You just don't know which. So when you're given a stochastic process and you're standing at some time, you don't know what the future is, but most of the time you have at least some level of control given by the probability distribution. In the second example, you could determine the whole line. In the third, because the probability distribution at each point only gives t or minus t, you know the value will be one of those two points, but you don't know more than that. So the study of stochastic processes is, basically: you look at the given probability distribution, and you want to say something intelligent about the future as t goes on. There are three types of questions that we mainly study here. The first type, (a), is: what are the dependencies in the sequence of values? For example, if you know the price of a stock on all past dates, up to today, can you say anything intelligent about future stock prices? Those types of questions.
Type (b) is: what is the long-term behavior of the sequence? Think about the law of large numbers or the central limit theorem that we talked about last time. The third type, (c), is less relevant for our course, but I'll still write it down: what are the boundary events? How often will something extreme happen, like a stock price dropping by more than 10% over 5 consecutive days? How often will that kind of event happen? For a different example, if you model a call center, you might want to know the probability that, over a period of time, at least 90% of the phones are idle, and that kind of thing. So that was an introduction. Any questions? There are really lots of stochastic processes. One of the most important ones is the simple random walk. Today I will focus on discrete-time stochastic processes. Later in the course, we'll go on to continuous-time stochastic processes, and you'll see Brownian motion, Ito's lemma, and all those things. Right now we'll study discrete time, and later you'll see that the two theories are really parallel: the simple random walk, for example, has a counterpart in continuous-time stochastic processes, which you'll see later. I think discrete-time processes are easier to understand, which is why we start with them, and it will really help later if you understand them well, because much of the knowledge carries over to continuous time. So what is a simple random walk? Let Yi be IID (independent, identically distributed) random variables, taking values 1 or minus 1, each with probability 1/2. Let X0 equal 0, and define, for each time t, X sub t as the sum of Yi from i equals 1 to t. Then the sequence of random variables X0, X1, X2, and so on is called a one-dimensional simple random walk. I'll just refer to it as the simple random walk, or the random walk. That is the definition.
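A minimal sketch of the definition: X0 = 0, and each step adds an independent Yi equal to 1 or minus 1 with probability 1/2. The function name `simple_random_walk` and the optional seeded generator are illustrative choices, not from the lecture.

```python
import random

def simple_random_walk(steps, rng=None):
    """Return [X0, X1, ..., X_steps] with X0 = 0 and Xt = Y1 + ... + Yt,
    where the Yi are IID and equal to +1 or -1 with probability 1/2 each."""
    rng = rng or random.Random()
    path = [0]
    for _ in range(steps):
        y = 1 if rng.random() < 0.5 else -1  # one fair coin toss
        path.append(path[-1] + y)
    return path

print(simple_random_walk(10, random.Random(42)))
```

Plotting the returned list against its index gives the up-and-down picture; reading it as positions on a line gives the "walk" picture.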
Let's try to plot it. At time 0, we start at 0. Then, depending on the value of Y1, you either go up or go down. Let's say we went up; that's time 1. At time 2, depending on the value of Y2, you either go up one step from here or go down one step. Let's say we went up again, then down, then up, up, something like that, and it continues. Another way to look at it, and the reason we call it a random walk, is this: if you plot your values of Xt over time on a line, you start at 0, and you go right, right, left, right, right, left, left, left. So the trajectory is like a walk you take on this line, but it's random: each time you go right or left. Those are two representations. The first picture looks a little clearer, because the line picture loses track of when each step happened. Something like that is the trajectory. From what we learned last time, we can already say something intelligent about the simple random walk. For example, if you apply the central limit theorem to the sequence, what information do you get? Over a long time, let's say t is a very large number, what can you say about the distribution of Xt at time t? AUDIENCE: Is it close to 0? PROFESSOR: Close to 0. But by close to 0, what do you mean? There should be a scale. Some would say that 1 is close to 0. Some people would say that 100 is close to 0. So do you have some idea of how close to 0 it will be? Anybody? AUDIENCE: So the variance will be small. PROFESSOR: Sorry? AUDIENCE: The variance will be small. PROFESSOR: The variance will be small. About how large will the variance be? AUDIENCE: 1 over n. PROFESSOR: 1 over n? AUDIENCE: Over t. PROFESSOR: 1 over t? Anybody else want to offer a different answer? AUDIENCE: [INAUDIBLE]. PROFESSOR: 1 over square root of t, probably. AUDIENCE: [INAUDIBLE]. AUDIENCE: The variance would be [INAUDIBLE]. PROFESSOR: Oh, you're right, sorry. The variance will be 1 over t.
And the standard deviation will be 1 over square root of t. What I'm saying is, by the central limit theorem... AUDIENCE: [INAUDIBLE]. Are you looking at the sums or are you looking at the averages? PROFESSOR: I'm looking at Xt. Ah. That's a very good point: t versus square root of t. Thank you. AUDIENCE: That's very different. PROFESSOR: Yes, very, very different. I was confused; sorry about that. The variance 1 over t is for the average, Xt over t. Here is the right statement: we saw last time that, if t is really, really large, 1 over the square root of t times Xt is close to the normal distribution with mean 0 and variance 1. So Xt over the square root of t looks like a standard normal, which means the value at time t is distributed approximately like a normal distribution with mean 0 and variance t, that is, standard deviation square root of t. So what you said was right: it's close to 0, and the scale you're looking at is about the square root of t. So it won't go too far away from 0. If you draw the two curves square root of t and minus square root of t, your simple random walk, on a very large scale, won't go too far away from these two curves. The extreme values it can take (I didn't draw it correctly) are t and minus t, because all the steps can be 1 or all the steps can be minus 1. Even though, theoretically, you can be that far away from the x-axis, in reality you're going to stay on the scale of these curves; you're going to play within this area, mostly. AUDIENCE: I think that [INAUDIBLE]. PROFESSOR: Yes, that was a very vague statement. By the normal approximation, within 2 square roots of t you will be inside this interval about 95% of the time, and within 3 square roots of t about 99.7% of the time. And there's even a theorem saying that you will cross these two curves infinitely often. So if you wait a very, very long time, if you live long enough, you will come back up even if you go down here for a while.
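The square-root-of-t scale can be checked exactly rather than by simulation. Since Xt = 2K - t, where K, the number of +1 steps, is Binomial(t, 1/2), the sketch below computes the exact distribution of Xt with `fractions.Fraction`, confirms mean 0 and variance t, and measures how often the walk is within 2 square roots of t. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt

def walk_distribution(t):
    """Exact distribution {value: probability} of Xt for the simple random walk:
    Xt = 2K - t, where K ~ Binomial(t, 1/2) counts the +1 steps."""
    return {2 * k - t: Fraction(comb(t, k), 2 ** t) for k in range(t + 1)}

def mean_and_variance(dist):
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

t = 400
dist = walk_distribution(t)
mean, var = mean_and_variance(dist)
print(mean, var)  # 0 400, i.e. mean 0 and variance t
inside = sum(p for x, p in dist.items() if abs(x) <= 2 * sqrt(t))
print(float(inside))  # roughly 0.95, as the normal approximation predicts
```

The variance of Xt itself grows like t; only the average Xt over t has variance 1 over t.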
In this picture, you might think that, in some cases, you always stay in the negative region. But there's a theorem saying that that's not the case: with probability 1, as time goes to infinity, you will cross the x-axis infinitely often. In fact, you will meet these two curves infinitely often as well. So those are some interesting facts about the simple random walk. There are a lot more interesting things, but I'm just giving an overview in this course. Unfortunately, I can't talk about all of these fun things, but let me still show you some properties and one nice computation. Some properties of a random walk: first, the expectation of Xk is equal to 0. That's really easy to prove. The second important property is called independent increments. If you look at times t0 less than t1 less than, up to, tk, then the random variables X sub t(i+1) minus X sub t(i) are mutually independent. What this says is that what happens from time 1 to 10 is irrelevant to what happens from time 20 to 30. That can easily be shown from the definition. I won't do that, but you should try it as an exercise. The third one is called stationary increments. That means, for all h greater than or equal to 0 and t greater than or equal to 0, the distribution of X(t+h) minus Xt is the same as the distribution of X sub h. Again, this easily follows from the definition. What it says is that, if you look at an interval of the same length, then what happens inside this interval does not depend on your starting point; the distribution is the same. Moreover, from the second property, if these intervals do not overlap, the increments are independent. So those are the two properties we're talking about here, and you'll see them appearing again and again, because stochastic processes having these properties are really nice, in some sense. They are fundamental stochastic processes.
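The independent and stationary increment properties can be probed with a quick Monte Carlo experiment. This is only an empirical sanity check under arbitrary choices (t = 30, h = 10, 10,000 seeded trials), not a proof: it compares the increment X(t+h) - Xt with X(h), whose variances should both be about h, and checks that the two non-overlapping increments are nearly uncorrelated. The helper names are hypothetical.

```python
import random
from statistics import mean, pvariance

def increment_samples(t, h, trials, rng):
    """Sample X(h) - X(0) and X(t+h) - X(t) for many independent simple random walks.
    Requires t >= h so that the intervals [0, h] and [t, t + h] do not overlap."""
    assert t >= h
    early, late = [], []
    for _ in range(trials):
        steps = [rng.choice((1, -1)) for _ in range(t + h)]
        early.append(sum(steps[:h]))      # X(h) - X(0): steps Y1 .. Yh
        late.append(sum(steps[t:t + h]))  # X(t+h) - X(t): steps Y(t+1) .. Y(t+h)
    return early, late

def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pvariance(a) * pvariance(b)) ** 0.5

early, late = increment_samples(t=30, h=10, trials=10000, rng=random.Random(3))
print(mean(late), pvariance(late), pvariance(early))  # mean near 0, both variances near h = 10
print(correlation(early, late))                        # near 0
```

Zero correlation does not by itself prove independence; here it is just consistent with the property, which follows from the increments being built from disjoint sets of coin tosses.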
And the simple random walk is like the fundamental stochastic process. So let's look at one interesting problem about the simple random walk. Example: you play a game, a coin-toss game. Let's say I play with Peter. I bet $1 at each turn, and Peter tosses a fair coin; it's either heads or tails. If it's heads, he wins the $1. If it's tails, I win $1. So, from my point of view, in this coin-toss game my balance goes up by $1 or down by $1 at each turn. Let's say I start from a $0 balance, even though that's not possible in practice. Then, assuming the coin is fair, with a 50-50 chance, my balance exactly follows the simple random walk. And then I say the following: You know what? I want to make money. So I'm going to play until I win $100 or I lose $100. What is the probability that I will stop after winning $100? AUDIENCE: 1/2. PROFESSOR: 1/2 because? AUDIENCE: [INAUDIBLE]. PROFESSOR: Yes. It happens with probability 1/2, and this is by symmetry: every sequence of coin tosses that gives a winning run, when you flip every toss, gives a losing run, so we have a one-to-one correspondence between the two. That was good. Now let me change it. What if I play until I win $100 or lose $50? In other words, I look at the random walk, I look at the first time that it hits either this line or that line, and then I stop. What is the probability that I will stop after winning $100? AUDIENCE: [INAUDIBLE]. PROFESSOR: 1/3? Let me see. Why 1/3? AUDIENCE: [INAUDIBLE]. PROFESSOR: So you're saying the probability that you hit this line first is 1/2, right? It's 1/2, 1/2. But you're saying that from here, it's the same again. So it should be 1/4 here, 1/2 times 1/2. You've got a good intuition. It is 1/3, actually. AUDIENCE: [INAUDIBLE].
PROFESSOR: And then, once you hit it, it's the same afterwards? I'm not sure if there is a way to make an argument out of it. There might or might not be; I was thinking of a different way, and I just don't see it right now. In general, if you put a line at B and a line at minus A, then the probability of hitting B first is A over A plus B, and the probability of hitting minus A first is B over A plus B. So, in this case, with B equal to 100 and A equal to 50, the probability of winning $100 first is 50 over 150, which is 1/3, and the probability of losing $50 first is 100 over 150, which is 2/3. This can be proved. It's actually not that difficult to prove; the hard part is finding the right way to look at it. So fix B and A. For each k between minus A and B, define f of k as the probability that you hit the line B first, before hitting minus A, when you start at k. This is close to what you were saying: instead of looking at one fixed starting point, we vary the starting point and look at all of them at once. What we are interested in is computing f of 0. What we know is that f of B is equal to 1 and f of minus A is equal to 0. And there's one recursive formula that matters to us. If you start at k, you either go up or go down: you go up with probability 1/2 and you go down with probability 1/2. Then the process starts afresh, because of the independent and stationary increments. So if you go up, the probability that you hit B first is exactly f of k plus 1; if you go down, it's f of k minus 1. So f of k equals 1/2 times f of k plus 1, plus 1/2 times f of k minus 1. That gives you a recursive formula with two boundary values, and when you solve it, you get A over A plus B for f of 0.
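The recursion for f(k) can be solved mechanically. A minimal sketch, assuming integers A, B greater than or equal to 1 and exact arithmetic via `fractions.Fraction` (the function name `hit_upper_first` is hypothetical): rearranging f(k) = (f(k+1) + f(k-1))/2 as f(k+1) = 2 f(k) - f(k-1) lets us march upward from f(-A) = 0 with a trial second value, then rescale so that f(B) = 1. Because the recursion is linear and f(-A) = 0 is unchanged by scaling, the rescaled sequence satisfies both boundary conditions.

```python
from fractions import Fraction

def hit_upper_first(A, B):
    """Return {k: f(k)} for -A <= k <= B, where f(k) is the probability that a
    simple random walk started at k hits +B before -A (assumes A, B >= 1).

    Uses f(-A) = 0, f(B) = 1 and f(k) = (f(k+1) + f(k-1)) / 2."""
    # Unnormalized solution with g(-A) = 0 and trial value g(-A + 1) = 1.
    g = {-A: Fraction(0), -A + 1: Fraction(1)}
    for k in range(-A + 1, B):
        g[k + 1] = 2 * g[k] - g[k - 1]  # the recursion solved for f(k+1)
    scale = g[B]  # divide by g(B) so that f(B) = 1
    return {k: v / scale for k, v in g.items()}

f = hit_upper_first(A=50, B=100)
print(f[0])  # 1/3
```

The solution is the straight line f(k) = (k + A)/(A + B). The audience's intuition can also be made exact for this particular case: to win, the walk must first reach +50 before -50 (probability 1/2); from +50 it reaches +100 or falls back to 0 first with probability 1/2 each, and from 0 the game starts over. So the winning probability p satisfies p = (1/2)(1/2 + p/2), which gives p = 1/3.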
I won't go into details, but what I wanted to show is that the simple random walk really has these two properties, and even more powerful ones. So it's really easy to control, and, at the same time, it's quite universal. It's not a weak model. It's rather restricted, but it's a really good model for a mathematician. From the practical point of view, you'll have to twist some things slightly, but in many cases you can approximate a process by the simple random walk. And, as you can see, you can do computations with the simple random walk by hand. So that was the most important example of a stochastic process. Now let's talk about more stochastic processes. The second one is called the Markov chain. Let me write that part down, actually. A Markov chain, unlike the simple random walk, is not a single stochastic process: a stochastic process is called a Markov chain if it has a certain property. What we want to capture with Markov chains is the following statement: they are the stochastic processes for which the effect of the past on the future is summarized entirely by the current state. That's quite a vague statement, so here is what we're trying to capture. Look at some generic stochastic process at time t. You know all the history up to time t, and you want to say something about the future. If it's a Markov chain, you don't even have to know all of this history; this part is really irrelevant. What matters is the value at this last point, the last time. All of the effect of the past on the future is contained in this single value; nothing else matters. Of course, this is a very special type of stochastic process. For most other stochastic processes, the future depends on the whole history, and in that case they're more difficult to analyze.
But Markov chains are more manageable, and still, lots of interesting things turn out to be Markov chains. For example, the simple random walk is a Markov chain, right? Say your walk went like that. Then what happens after time t really just depends on how high this point is. What happened before doesn't matter at all, because we're just making new coin tosses every time. This value can affect the future, because that's where you're going to start your process from. So it is a Markov chain: this part is irrelevant; only the current value matters. Let me define it a little more formally. A discrete-time stochastic process is a Markov chain if the probability that X at time t plus 1 is equal to some value s, given the whole history X0, X1, up to Xt, is equal to the probability that X(t+1) is equal to s given only the value Xt, for all t greater than or equal to 0 and all s. This is the mathematical way of writing down the statement: the probability of the value X(t+1), given all the values up to time t, is the same as its probability given only the last value. The simple random walk is a Markov chain because both of these probabilities are just 1/2 or 0. Let me write it down. Example: for the random walk, the probability that X(t+1) equals s, given Xt, is equal to 1/2 if s is equal to Xt plus 1 or Xt minus 1, and 0 otherwise. So it really depends only on the last value, Xt. Any questions? All right. There's an important special case: a Markov chain in which all the Xi take values in some finite set S. In that case it's really easy to describe the Markov chain. Denote by P sub ij the probability that, if at time t you are at i, you jump to j at time t plus 1, for every pair of states i, j.
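The Markov property translates directly into a simulation loop: each new state is drawn using only the current state's transition probabilities, and the earlier history is never consulted. The sketch below assumes states are numbered 0 to m - 1 and takes the transition probabilities as a list of rows; the function name and the three-state matrix are made up for illustration and do not come from the lecture.

```python
import random

def simulate_markov_chain(P, start, steps, rng):
    """Simulate a finite Markov chain on states 0..m-1.

    P[i][j] is the probability of jumping from i to j in one step. Each new
    state is sampled from the row of the current state only (the Markov property)."""
    for row in P:
        assert abs(sum(row) - 1) < 1e-12, "each row must sum to 1"
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choices(range(len(P)), weights=P[current])[0])
    return path

# Hypothetical 3-state chain (say sunny / cloudy / rainy), chosen only for illustration.
P = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
print(simulate_markov_chain(P, start=0, steps=10, rng=random.Random(5)))
```

The simple random walk fits the same pattern, except that its state space is infinite, so its "rows" cannot be stored as a finite list.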
It's a finite set, so I might just as well call it the set of integers from 1 to m, to make the notation easier. First of all, for each i, the sum of Pij over all j in S is equal to 1, because if you start at i, you have to jump somewhere in your next step; so if you sum over all the possible states, you get 1. And a very interesting object is the matrix A, called the transition probability matrix, which has Pij in row i and column j. This tells you everything about the Markov chain: everything about the stochastic process is contained in this matrix, because a future state only depends on the current state. So if you know where the chain is at time t, you can read off all the information you want from the matrix. For example, if the chain is at state 1 right now, what is the probability that it will jump to state 2 at the next time? Just look at the entry in row 1 and column 2, that is, (i, j) = (1, 2). Not only that: the matrix describes a single step, the probability that you jump from i to j in one step, but using it you can also compute the probability that you go from i to j in two steps. Let's define q sub ij as the probability that X at time t plus 2 is equal to j, given that X at time t is equal to i. Can you describe the matrix of the q sub ij in terms of the matrix A? Anybody? Multiplication? Very good: it's A squared. Why? Let me write it in a different way. qij is the sum, over all intermediate states k, of the probability that you jump from i to k first, times the probability that you then jump from k to j. If you look at what this means, each entry is the dot product of row i of A with column j of A, and that's exactly the (i, j) entry of A squared.
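The identity qij = sum over k of Pik times Pkj can be checked with exact arithmetic. In this sketch, `two_step` sums over intermediate states explicitly, `matmul` is an ordinary matrix product, and the two agree; both helper names and the three-state matrix are made up for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Ordinary matrix product: entry (i, j) is row i of X dotted with column j of Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def two_step(P):
    """qij = sum over intermediate states k of P[i][k] * P[k][j]:
    first jump from i to k, then from k to j."""
    states = range(len(P))
    return [[sum(P[i][k] * P[k][j] for k in states) for j in states] for i in states]

# Made-up three-state transition matrix (each row sums to 1), for illustration only.
F = Fraction
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0), F(1, 2), F(1, 2)]]

Q = two_step(P)
print(Q == matmul(P, P))                   # True: the two-step matrix is A squared
print([sum(row) for row in matmul(Q, P)])  # three-step rows still sum to 1
```

Multiplying by P once more gives the three-step probabilities, and so on for any number of steps.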
If you want the three-step or four-step probabilities, all you have to do is multiply again and again. Really, this matrix contains all the information you want if you have a finite Markov chain. That's very important. For the simple random walk, I told you that it is a Markov chain, but it does not have a transition probability matrix of this kind, because its state space is not finite. So be careful. For finite Markov chains, though, there is one matrix that describes everything. I said it like it's something very deep, but if you think about it, you just wrote down all the probabilities, so of course it describes everything. Here's an example. You have a machine, and on a given day it's either broken or working. It's a silly example. If it's working today, then tomorrow it's broken with probability 0.01 and working with probability 0.99. If it's broken today, the probability that it's repaired by the next day is 0.8, and it stays broken with probability 0.2. This is the kind of Markov chain used in engineering applications. In this case, S is also called the state space, because in many cases what you're modeling are the states of some system, like broken or working, or rainy, sunny, or cloudy for the weather. So the elements of S are states, and you call S the state set as well. Let's look at the matrix. We have two states, working and broken. Working to working is 0.99, working to broken is 0.01, broken to working is 0.8, and broken to broken is 0.2. Now the question: what happens if you start from some state, let's say it is working today, and you go a very, very long time, like a year or 10 years? Then the distribution on the day 10 years from now is A to the 3,650 times the vector (1, 0), which gives some probability vector (p, q).
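Following the question above, the distribution after 3,650 days can be computed by pushing the starting distribution forward one day at a time. This Python sketch stores the machine example as a row-stochastic matrix P (row = today's state, column = tomorrow's state), so a distribution is a row vector multiplied by P on the right; multiplying a column vector by A, as in the lecture, is the same computation when A is the transpose of this P. The helper name `evolve` is hypothetical.

```python
def evolve(P, dist, days):
    """Advance a distribution (row vector) by `days` steps: new[j] = sum_i dist[i] * P[i][j].
    Doing this `days` times is the same as multiplying by P to the power `days`."""
    m = len(P)
    for _ in range(days):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]
    return dist

# Machine example. State 0 = working, state 1 = broken.
# Row i lists tomorrow's probabilities given today's state i.
P = [[0.99, 0.01],
     [0.80, 0.20]]
p, q = evolve(P, [1.0, 0.0], 3650)
print(p, q)  # approximately 80/81 = 0.98765... and 1/81 = 0.01234...
```

The limit (80/81, 1/81) is the distribution that balances the flows between the two states: 0.01 p = 0.8 q with p + q = 1.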
p will be the probability that the machine is working at that time, and q the probability that it's broken. What will p and q be? That's the question we're trying to answer. We haven't learned how to do this yet, but let's think about it. I'm going to cheat a little bit and say: over a long period of time, the probability distribution on day 3,650 and the one on day 3,651 shouldn't be that different; they should be about the same. Let's make that assumption. I'm not proving it here (I do know it's true), but that's the assumption. Under that assumption, you can solve for p and q. Approximately, A to the 3,650 times (1, 0) is the same as A to the 3,651 times (1, 0). That means (p, q) is about the same as A times (p, q). Does anybody remember what this is? Yes: (p, q) will be an eigenvector of this matrix. Over a long period of time, the probability distribution you observe will be the eigenvector. And what's the eigenvalue? 1; at least in this case, it looks like it's 1. Now I'll make one more connection. Do you remember the Perron-Frobenius theorem? This is a matrix whose entries are all positive, so there is a largest eigenvalue, which is positive and real, and there is an all-positive eigenvector corresponding to it. What I'm trying to say is that that's going to be your (p, q). But let me not jump to the conclusion yet. By Perron-Frobenius, there exists an eigenvalue, the largest one, lambda greater than 0, and an eigenvector (v1, v2) where v1 and v2 are positive. Moreover, lambda has multiplicity 1. I'll get back to that later. So let's write this down: A times (v1, v2) is equal to lambda times (v1, v2). Writing out A times (v1, v2), we get 0.99 v1 plus 0.01 v2, and 0.8 v1 plus 0.2 v2, which should equal lambda times (v1, v2). You could solve for v1 and v2 now, but before doing that (sorry about that): this is flipped.
Yes, everybody, the matrix should have been flipped from the beginning: for the product with a column vector, the columns have to be indexed by today's state, so the first coordinate is 0.99 v1 plus 0.8 v2, and the second is 0.01 v1 plus 0.2 v2. Now sum the two coordinates on both sides. On the left, you get 0.99 v1 plus 0.01 v1 plus 0.8 v2 plus 0.2 v2, which is v1 plus v2. On the right, you get lambda times (v1 plus v2). Since v1 and v2 are positive, v1 plus v2 is not 0, so lambda is equal to 1. So the eigenvalue guaranteed by the Perron-Frobenius theorem is 1, and what you'll find here is the eigenvector corresponding to the largest eigenvalue, which is equal to 1. That's something very general; it's not just about this matrix and this particular example. If you're given a Markov chain and its transition matrix, and all the entries are positive, the Perron-Frobenius theorem guarantees the existence of such a vector. So, in general, if the transition matrix of a Markov chain has positive entries, then there exists a vector (pi 1, ..., pi m), which I'll just call v, such that A v is equal to v. And that will be the long-term behavior, as explained: over a long time, if the distribution converges to something, it has to satisfy that equation, and by the Perron-Frobenius theorem we know there is a vector satisfying it. So if it converges, it will converge to that. In fact, if all the entries are positive, then it does converge, so we know the long-term behavior of the system. Such a vector v is called the stationary distribution. It's not quite right to say that a vector is a distribution, but what I mean is: consider the probability distribution over S in which a random variable X equals i with probability pi i. If you start from this distribution, then in the next step you'll have exactly the same distribution. That's called a stationary distribution. Any questions?
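Instead of iterating, the stationary distribution can be found by solving the linear equations directly. A minimal sketch, assuming a row-stochastic matrix P with all entries positive (so the solution is unique by Perron-Frobenius): it solves v P = v, which is the equation A v = v above with A the transpose of P, after replacing one redundant equation by the normalization v1 + ... + vm = 1. The function name is hypothetical, and the elimination uses exact `Fraction` arithmetic.

```python
from fractions import Fraction

def stationary_distribution(P):
    """Solve v P = v with sum(v) = 1 by Gauss-Jordan elimination over exact fractions.
    Assumes the solution is unique, which holds when all entries of P are positive."""
    m = len(P)
    # Equation j: sum_i P[i][j] * v_i - v_j = 0, stored as an augmented row.
    M = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(m)] + [Fraction(0)]
         for j in range(m)]
    M[-1] = [Fraction(1)] * m + [Fraction(1)]  # replace a redundant equation by normalization
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

# Machine example: state 0 = working, state 1 = broken (row = today's state).
P = [[Fraction(99, 100), Fraction(1, 100)],
     [Fraction(8, 10), Fraction(2, 10)]]
print(stationary_distribution(P))  # [Fraction(80, 81), Fraction(1, 81)]
```

Starting the machine in this distribution leaves it unchanged from one day to the next, which is exactly the defining property of a stationary distribution.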
AUDIENCE: So [INAUDIBLE]? PROFESSOR: Yes. Very good question. The Perron-Frobenius theorem says there is exactly one eigenvector, up to scaling, corresponding to the largest eigenvalue, and the largest eigenvalue turns out to be 1. So there will be a unique stationary distribution if all the entries are positive. AUDIENCE: [INAUDIBLE]? PROFESSOR: This one? AUDIENCE: [INAUDIBLE]? PROFESSOR: Maybe. It's a good point. Huh? Something is wrong. Can anybody help me? This part looks questionable. AUDIENCE: Just kind of a [INAUDIBLE] question: is that topic covered in portions of [INAUDIBLE]? The other eigenvalues of the matrix are smaller than 1. So when you take powers of the transition probability matrix, the contributions of the eigenvalues smaller than 1 shrink to 0 under repeated multiplication. In the limit they're 0, but until you get to the limit, you still have them. Essentially, that kind of behavior is transient behavior that dissipates, but the behavior corresponding to the stationary distribution persists. PROFESSOR: But, as you mentioned, this argument seems to show that every lambda has to be 1, right? Is that your point? You're right. I don't see what the problem is right now. I'll think about it later; I don't want to spend our time trying to find what's wrong. But the conclusion is right: there will be a unique stationary distribution, and so on. Let me make a note here. Now let me move on to the final topic. It's called a martingale, and this is another collection of stochastic processes. What we're trying to model here is a fair game: stochastic processes which are fair games. Formally, a stochastic process is a martingale if, for all t, the expected value of X(t+1), given X0, X1, up to Xt, is equal to Xt. Let me reiterate it. At time t, if you look at what's going to happen at time t plus 1 and take the expectation, it has to be exactly equal to the value Xt. So we have this stochastic process, and at time t you are at Xt. At time t plus 1, lots of things can happen.
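The apparent problem with the coordinate-sum argument can be seen concretely in the machine example. Summing the two coordinates of A v = lambda v gives v1 + v2 = lambda (v1 + v2), which forces lambda = 1 only when v1 + v2 is not 0. The Perron-Frobenius eigenvector is positive, so its coordinates cannot sum to 0; the other eigenvector, (1, -1), has coordinates summing to 0, so the argument says nothing about its eigenvalue, which here is 0.19 (the trace 1.19 minus the eigenvalue 1). The sketch below, written in the lecture's column convention with exact fractions and hypothetical helper names, checks both eigen-equations and the resulting formula A^n (1, 0) = (80/81, 1/81) + (1/81)(0.19)^n (1, -1), whose second term is the transient part that dissipates.

```python
from fractions import Fraction

# Column convention: column = today's state, row = tomorrow's state,
# so A times the column vector (p, q) gives tomorrow's distribution.
A = [[Fraction(99, 100), Fraction(8, 10)],
     [Fraction(1, 100), Fraction(2, 10)]]

def apply(M, v):
    """Matrix-vector product for a 2 x 2 matrix."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

stationary = [Fraction(80, 81), Fraction(1, 81)]  # eigenvalue 1, positive entries
transient = [Fraction(1), Fraction(-1)]           # coordinates sum to 0
second = Fraction(19, 100)                        # the other eigenvalue, 0.19

print(apply(A, stationary) == stationary)                      # True
print(apply(A, transient) == [second * x for x in transient])  # True

def distribution_after(n):
    """A^n applied to (1, 0): the distribution n days after a working day."""
    v = [Fraction(1), Fraction(0)]
    for _ in range(n):
        v = apply(A, v)
    return v

n = 5
closed_form = [s + Fraction(1, 81) * second ** n * d for s, d in zip(stationary, transient)]
print(distribution_after(n) == closed_form)  # True: the transient part shrinks like 0.19**n
```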
It might go to this point, that point, that point, and so on. But the probability distribution is designed so that the expected value over all these is exactly equal to the value at Xt. So it's kind of centered at Xt, centered meaning in the probabilistic sense: the expectation is equal to that. So if your value at time t was something else, your values at time t plus 1 will be centered at this value instead of that value. And the reason I'm saying it models a fair game is because, if this is like your balance over some game, in expectation, you're not supposed to win any money at all. And I will later tell you more about that. So, examples. First, a random walk is a martingale. What else? Second one, now let's say you're in a casino and you're playing roulette. The balance of a roulette player is not a martingale, because the game is designed so that the expected value of each bet is less than 0. You're supposed to lose money. Of course, at one instance, you might win money. But in expected value, you're designed to go down. So it's not a martingale. It's not a fair game. The game is designed for the casino, not for you. The third one is a funny example. I just made it up to show that there are many possible ways that a stochastic process can be a martingale. If the Yi are IID random variables such that Yi is equal to 2 with probability 1/3 and 1/2 with probability 2/3, then let X0 equal 1 and Xk equal Y1 times Y2 up to Yk. Then that is a martingale. So at each step, you'll either multiply by 2 or multiply by 1/2, that is, divide by 2. And the probability distribution is given as 1/3 and 2/3. Then Xk is a martingale. The reason is that you can compute the expected value. The expected value of Xk plus 1, given Xk up to [INAUDIBLE], is equal to the expected value of Yk plus 1 times Yk up to Y1. The part Yk up to Y1 is Xk. And Yk plus 1 is designed so that its expected value is equal to 1: 2 times 1/3 plus 1/2 times 2/3 is 1. So it's a martingale. I mean, your balance will fluctuate a lot: double, double, double, half, half, half, and so on.
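Each of the three examples reduces to a one-step calculation. The sketch below uses exact fractions. For the additive games, the martingale condition is that the expected change is 0. For the multiplicative example, it is that E[Y] = 1. The roulette odds are an assumption: an American wheel, where a $1 bet on red wins with probability 18/38. The lecture does not specify the wheel.

```python
from fractions import Fraction as F

def expected_value(dist):
    """Expected value of a finite distribution given as {outcome: probability}."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in dist.items())

# Simple random walk: the balance changes by +1 or -1 with probability 1/2 each.
walk_step = {F(1): F(1, 2), F(-1): F(1, 2)}

# $1 bet on red, assuming an American wheel with 18 red pockets out of 38.
roulette_step = {F(1): F(18, 38), F(-1): F(20, 38)}

# Multiplicative example: X_{k+1} = X_k * Y_{k+1}, Y = 2 w.p. 1/3, Y = 1/2 w.p. 2/3.
multiplier = {F(2): F(1, 3), F(1, 2): F(2, 3)}

print(expected_value(walk_step))      # 0     -> E[X_{k+1} | past] = X_k
print(expected_value(roulette_step))  # -1/19 -> the balance drifts down
print(expected_value(multiplier))     # 1     -> E[X_{k+1} | past] = X_k * 1
```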
But still, in expectation, you will always maintain it. I mean the expectation at all times is equal to 1, if you look at it from the beginning. You look at time 1, then the expected value of X1, and so on. Any questions on the definition or examples? So the random walk is an example which is both a Markov chain and a martingale. But these are really two different concepts. Try not to confuse the two. There are Markov chains which are not martingales. There are martingales which are not Markov chains. And there are some things which are both, like a simple random walk. There are some which are neither. They really are just two separate things. Let me conclude with one interesting theorem about martingales. It really reinforces your intuition, at least the intuition behind the definition, that a martingale is a fair game. It's called the optional stopping theorem. I will write it down more formally later, but the message is this. If you play a martingale game, if it's a game you play and it's your balance, then no matter what strategy you use, your expected gain can be neither positive nor negative. Even if you try so hard to lose money, you won't be able to do that. Even if you try so hard to win money, like trying to invent something really, really cool and ingenious, you should not be able to win money. Your expected value is just fixed. That's the content of the theorem. Of course, there are technical conditions that have to be there. So if you're playing a martingale game, then you're not supposed to win or lose, at least in expectation. So before stating the theorem, I have to define what a stopping time means. Given a stochastic process, a nonnegative-integer-valued random variable tau is called a stopping time if, for every integer k greater than or equal to 0, the event tau less than or equal to k depends only on X1 to Xk. So that is something very, very strange. I want to define something called a stopping time.
It will be a nonnegative-integer-valued random variable, so it will be 0, 1, 2, and so on. That means it will be some time index. And look at the event that tau is less than or equal to k, that is, the event that you stop at a time less than or equal to k. Your decision only depends on the events up to k, on the values of the stochastic process up to time k. In other words, say this is some strategy you want to use, and by strategy I mean some rule for when you stop playing. You have a strategy defined like this: you play some k rounds, and then you look at the outcome. You say, OK, now I think it's in my favor; I'm going to stop. You have a predefined set of strategies. And if that strategy only depends on the values of the stochastic process up to right now, then it's a stopping time. If it's some strategy that depends on future values, it's not a stopping time. Let me show you by example. Remember the coin toss game, whose balance was a random walk: you either win $1 or lose $1. So, in the coin toss game, let tau be the first time at which the balance becomes $100; then tau is a stopping time. Or, if you stop at either $100 or negative $50, that's still a stopping time. Remember that we discussed it? We look at our balance. We stop either at the time when we win $100 or when we lose $50. That is a stopping time. But I think it's better to tell you an example of what is not a stopping time. That will really help. So, in the same game, let tau be the time of the first peak. By peak, I mean the time right before you start going down, so that would be your tau. So the first time you start to go down, you're going to stop. That's not a stopping time. To see formally why, first of all, if you want to decide whether time t is a peak or not, you have to refer to the value at time t plus 1. By just looking at values up to time t, you don't know if it's going to be a peak or if it's going to continue.
So the event that you stop at time t depends on time t plus 1 as well, which doesn't fall into this definition. So that's what we're trying to distinguish by defining a stopping time. In the earlier cases it was clear: at each time, you know whether you have to stop or not. But if you define tau in this way, your decision depends on future values of the outcome, so it's not a stopping time under this definition. Any questions? Does it make sense? Yes? AUDIENCE: Could you still have tau as the stopping time, if you were referring to t, and then t minus 1 was greater than [INAUDIBLE]? PROFESSOR: So. AUDIENCE: Let's say, yeah, it was [INAUDIBLE]. PROFESSOR: So the time after the peak, the first time after the peak? AUDIENCE: Yes. PROFESSOR: Yes, that will be a stopping time. So, three: if tau is tau 0 plus 1, where tau 0 is the first peak, then it is a stopping time. So the optional stopping theorem that I promised says the following. Suppose we have a martingale, and tau is a stopping time. And further suppose that there exists a constant T such that tau is always less than or equal to T. So you have some strategy which is a finite strategy. You can't go on forever. You have some bound on the time, and your stopping time always ends before that time. In that case, the expectation of your value at the stopping time is always equal to the balance at the beginning. That value is your balance when you've stopped, if that's what it's modeling. So no matter what strategy you use, if you're a mortal being, then you cannot win. That's the content of this theorem. I wanted to prove it, but I won't, because I think I'm running out of time. But let me show you one very interesting corollary of this, applied to that number one. So number one is a stopping time. It's not clear that there is a bounded time where you always stop before that time. But this theorem does apply to that case.
So I'll just forget about that technical issue. So, corollary: it applies not immediately, but it does apply to case 1 given above. And then what it says is that the expectation of X tau is equal to 0. But X at tau is either 100 or negative 50, because you're always going to stop at the first time where you either hit $100 or minus $50. So this is 100 times some probability p plus (1 minus p) times minus 50. There's some probability p that you stop at 100. With all the rest, you're going to stop at minus 50. We know it's equal to 0. What it gives is, I hope it gives me the right thing I'm thinking about, p, 100, yes: it's 150p minus 50 equals 0, so p is 1/3. And if you remember, that was exactly the computation we got. So that's just a neat application. But the content of this is really interesting, so try to contemplate it philosophically. Suppose something can be modeled perfectly using martingales, that it really fits into the mathematical formulation of a martingale. Then you're not supposed to win. So that's it for today. And next week, Peter will give wonderful lectures. See you next week.
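The corollary can be checked by simulation. This sketch assumes the barriers are scaled down to +$10 and -$5 so that it runs quickly. The ratio is the same, so optional stopping predicts 10p - 5(1 - p) = 0, that is, p = 1/3, just as in the lecture's 150p - 50 = 0. This tau is not bounded, but the stopped balance always stays between the two barriers. That is the kind of technical condition the lecture sets aside.

```python
import random

def play_until_barrier(upper, lower, rng):
    """Fair coin-toss game: win or lose $1 per toss, stop on first hit of upper or lower.

    The stopping rule looks only at the current balance, so tau is a stopping time.
    """
    balance = 0
    while lower < balance < upper:
        balance += 1 if rng.random() < 0.5 else -1
    return balance

def estimate(upper=10, lower=-5, trials=20_000, seed=0):
    """Return (fraction of games ending at the upper barrier, mean final balance)."""
    rng = random.Random(seed)
    results = [play_until_barrier(upper, lower, rng) for _ in range(trials)]
    p_upper = sum(r == upper for r in results) / trials
    mean_final = sum(results) / trials
    return p_upper, mean_final

p_upper, mean_final = estimate()
# Optional stopping predicts E[X_tau] = 0, so p = -lower / (upper - lower) = 5/15 = 1/3.
print(p_upper, mean_final)  # close to 1/3 and 0, up to sampling error
```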
Contents
 1 Etymology
 2 Artificial intelligence
 3 Mathematics
 4 Natural science
 5 Physics
 6 Biology
 7 Creativity
 8 Computer science
 9 Finance
 10 Geomorphology
 11 Language and linguistics
 12 Manufacturing
 13 Media
 14 Medicine
 15 Music
 16 Social sciences
 17 Subtractive color reproduction
 18 See also
 19 Notes
 20 References
 21 Further reading
 22 External links
Etymology
The word stochastic in English was originally used as an adjective with the definition "pertaining to conjecturing", stemming from a Greek word meaning "to aim at a mark, guess", and the Oxford English Dictionary gives the year 1662 as its earliest occurrence.^{[1]} In his work on probability Ars Conjectandi, originally published in Latin in 1713, Jakob Bernoulli used the phrase "Ars Conjectandi sive Stochastice", which has been translated to "the art of conjecturing or stochastics".^{[20]} This phrase was used, with reference to Bernoulli, by Ladislaus Bortkiewicz,^{[21]} who in 1917 wrote in German the word Stochastik in the sense of "random". The term stochastic process first appeared in English in a 1934 paper by Joseph Doob.^{[1]} For the term and a specific mathematical definition, Doob cited another 1934 paper, where the term stochastischer Prozeß was used in German by Aleksandr Khinchin,^{[22]}^{[23]} though the German term had been used earlier in 1931 by Andrey Kolmogorov.^{[24]}
Artificial intelligence
In artificial intelligence, stochastic programs work by using probabilistic methods to solve problems, as in simulated annealing, stochastic neural networks, stochastic optimization, genetic algorithms, and genetic programming. A problem itself may be stochastic as well, as in planning under uncertainty.
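Simulated annealing is a compact example of a stochastic program. The sketch below minimizes a hypothetical one-dimensional test function, a Rastrigin-style function with a local minimum near every integer. The objective, the geometric cooling schedule, and all parameter values are illustrative assumptions. Uphill moves are accepted with probability exp(-delta / t), which lets the search escape local minima while the temperature t is high.

```python
import math
import random

def rastrigin_1d(x):
    """Toy objective with a local minimum near every integer; global minimum f(0) = 0."""
    return x * x + 10 * (1 - math.cos(2 * math.pi * x))

def simulated_annealing(f, x0, steps=20_000, t0=10.0, cooling=0.999, step=1.0, seed=0):
    """Return the best (x, f(x)) visited by a random-neighbour annealing search."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step, step)  # random neighbour
        fc = f(candidate)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

print(simulated_annealing(rastrigin_1d, x0=4.5))
```

Tracking the best point visited, rather than returning only the final state, is a common design choice. Otherwise, a late uphill move could discard a good solution.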
Mathematics
In the early 1930s, Aleksandr Khinchin gave the first mathematical definition of a stochastic process as a family of random variables indexed by the real line.^{[25]}^{[22]}^{[a]} Further fundamental work on probability theory and stochastic processes was done by Khinchin as well as other mathematicians such as Andrey Kolmogorov, Joseph Doob, William Feller, Maurice Fréchet, Paul Lévy, Wolfgang Doeblin, and Harald Cramér.^{[27]}^{[28]} Decades later Cramér referred to the 1930s as the "heroic period of mathematical probability theory".^{[28]}
In mathematics, specifically probability theory, the theory of stochastic processes is considered to be an important contribution to mathematics^{[29]} and it continues to be an active topic of research for both theoretical reasons and applications.^{[30]}^{[31]}^{[32]}
The word stochastic is used to describe other terms and objects in mathematics. Examples include a stochastic matrix, which describes a stochastic process known as a Markov process, and stochastic calculus, which involves differential equations and integrals based on stochastic processes such as the Wiener process, also called the Brownian motion process.
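A short numerical experiment shows why stochastic calculus needs its own rules. This sketch assumes the Wiener process is simulated with independent Gaussian increments on a uniform grid. Left-endpoint sums of W dW approach (W_T^2 - T) / 2, not the W_T^2 / 2 that ordinary calculus would suggest. The extra -T / 2 comes from the quadratic variation of the path: the squared increments sum to approximately T.

```python
import math
import random

def ito_integral_w_dw(T=1.0, n=100_000, seed=0):
    """Approximate the Ito integral of W dW on [0, T] with left-endpoint sums.

    Returns (left-point sum, closed form (W_T**2 - T) / 2) for one simulated path.
    """
    rng = random.Random(seed)
    dt = T / n
    w = 0.0
    total = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        total += w * dw                     # integrand evaluated at the LEFT endpoint
        w += dw
    return total, (w * w - T) / 2

approx, exact = ito_integral_w_dw()
print(approx, exact)  # agree closely; the "-T" term comes from quadratic variation
```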
Natural science
One of the simplest continuous-time stochastic processes is Brownian motion. This was first observed by botanist Robert Brown while looking through a microscope at pollen grains in water.
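Brown's observation can be mimicked with a simple simulation. This sketch assumes each particle's position in the plane receives independent Gaussian increments, with unit variance per unit time in each coordinate. The number of particles and steps is arbitrary. The hallmark of Brownian motion is that the mean squared displacement grows linearly in time, here E|X_t|^2 = 2t.

```python
import math
import random

def mean_squared_displacement(particles=2_000, steps=100, t=1.0, seed=0):
    """Simulate independent 2-D Brownian paths and return the mean of |X_t|^2.

    Each coordinate gets independent N(0, dt) increments (unit diffusion per axis),
    so the exact value is E|X_t|^2 = 2 * t.
    """
    rng = random.Random(seed)
    sd = math.sqrt(t / steps)
    total = 0.0
    for _ in range(particles):
        x = y = 0.0
        for _ in range(steps):
            x += rng.gauss(0.0, sd)
            y += rng.gauss(0.0, sd)
        total += x * x + y * y
    return total / particles

msd = mean_squared_displacement()
print(msd)  # close to 2.0
```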
Physics
The name "Monte Carlo" for the stochastic Monte Carlo method was popularized by physics researchers Stanisław Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, among others. The name is a reference to the Monte Carlo Casino in Monaco where Ulam's uncle would borrow money to gamble.^{[33]} The use of randomness and the repetitive nature of the process are analogous to the activities conducted at a casino. Earlier methods of simulation and statistical sampling generally worked the other way around: simulation was used to test a previously understood deterministic problem, whereas the Monte Carlo method uses random sampling to solve deterministic problems. Though examples of an "inverted" approach do exist historically, they were not considered a general method until the popularity of the Monte Carlo method spread.
Perhaps the most famous early use was by Enrico Fermi in the 1930s, when he used a random method to calculate the properties of the newly discovered neutron. Monte Carlo methods were central to the simulations required for the Manhattan Project, though they were severely limited by the computational tools of the time. Therefore, it was only after electronic computers were first built (from 1945 on) that Monte Carlo methods began to be studied in depth. In the 1950s they were used at Los Alamos for early work relating to the development of the hydrogen bomb, and became popularized in the fields of physics, physical chemistry, and operations research. The RAND Corporation and the U.S. Air Force were two of the major organizations responsible for funding and disseminating information on Monte Carlo methods during this time, and the methods began to find wide application in many different fields.
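The core idea of the Monte Carlo method, estimating a deterministic quantity by averaging random samples, fits in a few lines. The sketch below uses the textbook example of estimating pi from uniform points in the unit square. It is an illustration, not one of the historical calculations described above.

```python
import random

def estimate_pi(samples=200_000, seed=0):
    """Monte Carlo: the fraction of uniform points in the unit square that fall
    inside the quarter circle x^2 + y^2 <= 1 approaches pi / 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi())  # about 3.14; the error shrinks roughly like 1/sqrt(samples)
```

Quadrupling the number of samples roughly halves the typical error, which is why early Monte Carlo work was limited by the available computing power.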
Monte Carlo methods require large quantities of random numbers, and their use spurred the development of pseudorandom number generators, which were far quicker to use than the tables of random numbers that had previously been used for statistical sampling.
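A pseudorandom number generator is a deterministic recurrence whose output looks random and can be reproduced exactly from a seed. The sketch below is a linear congruential generator, one of the simplest families, using the commonly quoted "Numerical Recipes" constants. It illustrates the idea only. It is not presented as one of the generators used in the early Monte Carlo work.

```python
class LinearCongruentialGenerator:
    """x_{n+1} = (a * x_n + c) mod m, scaled to [0, 1).

    Constants are the widely quoted "Numerical Recipes" values; this is an
    illustration only and is not suitable for cryptography or serious simulation.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.state, self.a, self.c, self.m = seed % m, a, c, m

    def next_int(self):
        """Advance the state and return it as an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self):
        """Return the next value as a float in [0, 1)."""
        return self.next_int() / self.m


gen = LinearCongruentialGenerator(seed=0)
print(gen.next_int())  # 1013904223, i.e. (a * 0 + c) mod 2**32
```

The same seed always yields the same sequence. That reproducibility is useful for debugging simulations, and it is also why such generators are only "pseudo" random.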
Biology
Stochastic resonance: In biological systems, introducing stochastic "noise" has been found to help improve the signal strength of the internal feedback loops for balance and other vestibular communication.^{[34]} It has been found to help diabetic and stroke patients with balance control.^{[35]} Many biochemical events also lend themselves to stochastic analysis. Gene expression, for example, has a stochastic component through the molecular collisions—as during binding and unbinding of RNA polymerase to a gene promoter—via the solution's Brownian motion.
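The phenomenon can be reproduced with a toy threshold detector. This is not a model of the vestibular system. A sine wave whose amplitude (0.8) is below the detection threshold (1.0) never triggers the detector on its own. Moderate Gaussian noise lets the output track the signal, and very strong noise drowns it again. All parameter values are assumptions chosen for illustration.

```python
import math
import random

def detector_signal_correlation(noise_sd, n=20_000, amplitude=0.8, threshold=1.0, seed=0):
    """Toy threshold detector: output 1 when signal + noise exceeds the threshold.

    The sine wave alone (amplitude < threshold) never crosses, so the output only
    carries information about the signal when noise is added.
    Returns the Pearson correlation between the detector output and the signal.
    """
    rng = random.Random(seed)
    signal = [amplitude * math.sin(2 * math.pi * i / 100) for i in range(n)]
    output = [1.0 if s + rng.gauss(0.0, noise_sd) > threshold else 0.0 for s in signal]
    ms, mo = sum(signal) / n, sum(output) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(signal, output))
    var_s = sum((s - ms) ** 2 for s in signal)
    var_o = sum((o - mo) ** 2 for o in output)
    if var_o == 0.0:
        return 0.0  # detector never fired (or always fired): no information
    return cov / math.sqrt(var_s * var_o)

for sd in (0.0, 0.3, 10.0):
    print(sd, round(detector_signal_correlation(sd), 3))
```

The expected pattern is a correlation of 0 without noise, a clearly positive value at moderate noise, and a small value at very high noise. That rise and fall is the signature of stochastic resonance.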
Creativity
Simonton (2003, Psychological Bulletin) argues that scientific creativity, the creativity of scientists, is a constrained stochastic behaviour, such that new theories in all sciences are, at least in part, the product of a stochastic process.
Computer science
Stochastic ray tracing is the application of Monte Carlo simulation to the computer graphics ray tracing algorithm. "Distributed ray tracing samples the integrand at many randomly chosen points and averages the results to obtain a better approximation. It is essentially an application of the Monte Carlo method to 3D computer graphics, and for this reason is also called Stochastic ray tracing."^{[citation needed]}
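The averaging step can be sketched for a single pixel. This example assumes a hypothetical scene consisting of one disk. It treats "does a ray through this point hit the disk?" as a simple point-in-disk test. Real distributed ray tracing also randomizes lens position, time, and reflection directions. The averaged hit fraction is a Monte Carlo estimate of the pixel integral, which is what produces smooth, anti-aliased edges.

```python
import random

def pixel_coverage(px, py, cx, cy, r, samples=256, seed=0):
    """Estimate how much of the unit pixel [px, px+1) x [py, py+1) is covered by
    a disk of radius r centred at (cx, cy), by averaging random sample points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = px + rng.random(), py + rng.random()  # random point inside the pixel
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:  # stand-in for "does this ray hit?"
            hits += 1
    return hits / samples

# A unit disk centred on the pixel's corner covers a quarter circle of the pixel.
print(pixel_coverage(0, 0, 0.0, 0.0, 1.0))  # roughly 0.785 (= pi / 4), up to sampling noise
```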
Stochastic forensics analyzes computer crime by viewing computers as stochastic processes.
Finance
The financial markets use stochastic models to represent the seemingly random behaviour of assets such as stocks, commodities, relative currency prices (i.e., the price of one currency compared to that of another, such as the price of the US dollar compared to that of the euro), and interest rates. These models are then used by quantitative analysts to value options on stock prices, bond prices, and interest rates (see Markov models). Stochastic modeling is also at the heart of the insurance industry.
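The sketch below illustrates how such models value options. It prices a European call option by Monte Carlo under geometric Brownian motion (the Black-Scholes model) and compares the result with the closed-form price. The model choice and the parameter values are assumptions for illustration, not a description of any particular market practice.

```python
import math
import random

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form European call price under geometric Brownian motion."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def monte_carlo_call(s0, k, r, sigma, t, paths=200_000, seed=0):
    """Simulate terminal prices S_T = S_0 exp((r - sigma^2/2) t + sigma sqrt(t) Z)
    under the risk-neutral measure and average the discounted payoff max(S_T - K, 0)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(s_t - k, 0.0)
    return math.exp(-r * t) * payoff_sum / paths

params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
print(black_scholes_call(**params))  # about 10.45
print(monte_carlo_call(**params))    # close to the closed form, up to sampling error
```

The closed form exists only for simple contracts like this one. Simulation is the more general tool, since the payoff function can be swapped for path-dependent or exotic payoffs.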
Geomorphology
The formation of river meanders has been analyzed as a stochastic process.
Language and linguistics
Nondeterministic approaches in language studies are largely inspired by the work of Ferdinand de Saussure, for example, in functionalist linguistic theory, which argues that competence is based on performance.^{[36]}^{[37]} This distinction in functional theories of grammar should be carefully distinguished from the langue and parole distinction. To the extent that linguistic knowledge is constituted by experience with language, grammar is argued to be probabilistic and variable rather than fixed and absolute. This conception of grammar as probabilistic and variable follows from the idea that one's competence changes in accordance with one's experience with language. Though this conception has been contested,^{[38]} it has also provided the foundation for modern statistical natural language processing^{[39]} and for theories of language learning and change.^{[40]}
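In statistical natural language processing, the idea that grammatical knowledge is probabilistic and learned from experience is often made concrete with n-gram models. The sketch below estimates bigram probabilities P(next word | previous word) by maximum likelihood from a tiny hypothetical corpus. It illustrates the general approach rather than any specific theory cited above.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def bigram_model(sentences):
    """Maximum-likelihood bigram probabilities P(next | previous) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # sentence start/end markers
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: Fraction(c, sum(followers.values())) for nxt, c in followers.items()}
        for prev, followers in counts.items()
    }

corpus = ["the cat sat", "the dog sat", "the cat ran"]  # toy corpus (hypothetical)
model = bigram_model(corpus)
print(model["the"])  # {'cat': Fraction(2, 3), 'dog': Fraction(1, 3)}
```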
Manufacturing
Manufacturing processes are assumed to be stochastic processes. This assumption is largely valid for both continuous and batch manufacturing processes. Testing and monitoring of the process are recorded using a process control chart, which plots a given process control parameter over time. Typically a dozen or many more parameters will be tracked simultaneously. Statistical models are used to define limit lines, which indicate when corrective actions must be taken to bring the process back to its intended operational window.
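The limit lines on a control chart are often set at three standard deviations either side of the process mean. This sketch assumes hypothetical baseline measurements taken while the process was in control. It estimates sigma with the sample standard deviation, a simplification of the moving-range estimate that textbook individuals charts usually use.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Centre line and +/- 3-sigma limits estimated from in-control baseline data.

    Simplification: uses the sample standard deviation; textbook individuals
    charts usually estimate sigma from the average moving range instead.
    """
    centre = mean(baseline)
    sigma = stdev(baseline)
    return centre - 3 * sigma, centre, centre + 3 * sigma

def out_of_control(points, lcl, ucl):
    """Indices of measurements that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]

baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]  # hypothetical measurements
lcl, centre, ucl = control_limits(baseline)
print(out_of_control([10.0, 10.3, 11.2, 9.7], lcl, ucl))  # [2] (only 11.2 is outside)
```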
This same approach is used in the service industry where parameters are replaced by processes related to service level agreements.
Media
The marketing and the changing movement of audience tastes and preferences, as well as the solicitation of and the scientific appeal of certain film and television debuts (i.e., their opening weekends, wordofmouth, topofmind knowledge among surveyed groups, star name recognition and other elements of social media outreach and advertising), are determined in part by stochastic modeling. A recent attempt at repeat business analysis was done by Japanese scholars^{[citation needed]} and is part of the Cinematic Contagion Systems patented by Geneva Media Holdings, and such modeling has been used in data collection from the time of the original Nielsen ratings to modern studio and television test audiences.
Medicine
Stochastic effect, or "chance effect", is one classification of radiation effects that refers to the random, statistical nature of the damage. In contrast to deterministic effects, its severity is independent of dose; only the probability of an effect increases with dose.
Music
In music, mathematical processes based on probability can generate stochastic elements.
Stochastic processes may be used in music to compose a fixed piece or may be produced in performance. Stochastic music was pioneered by Iannis Xenakis, who coined the term stochastic music. Specific examples of mathematics, statistics, and physics applied to music composition are the use of the statistical mechanics of gases in Pithoprakta, statistical distribution of points on a plane in Diamorphoses, minimal constraints in Achorripsis, the normal distribution in ST/10 and Atrées, Markov chains in Analogiques, game theory in Duel and Stratégie, group theory in Nomos Alpha (for Siegfried Palm), set theory in Herma and Eonta,^{[41]} and Brownian motion in N'Shima.^{[citation needed]} Xenakis frequently used computers to produce his scores, such as the ST series including Morsima-Amorsima and Atrées, and founded CEMAMu. Earlier, John Cage and others had composed aleatoric or indeterminate music, which is created by chance processes but does not have the strict mathematical basis (Cage's Music of Changes, for example, uses a system of charts based on the I Ching). Lejaren Hiller and Leonard Isaacson used generative grammars and Markov chains in their 1957 Illiac Suite. Modern electronic music production techniques make these processes relatively simple to implement, and many hardware devices such as synthesizers and drum machines incorporate randomization features. Generative music techniques are therefore readily accessible to composers, performers, and producers.
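A first-order Markov chain can generate a melody by choosing each note with probabilities that depend only on the previous note. This is the technique the text names for Analogiques and the Illiac Suite. The transition table below is invented for illustration and is not taken from Xenakis's or Hiller and Isaacson's work.

```python
import random

# Hypothetical first-order transition table over note names: from each note,
# the probabilities of the next note. Each row sums to 1.
TRANSITIONS = {
    "C": {"D": 0.5, "E": 0.3, "G": 0.2},
    "D": {"C": 0.4, "E": 0.6},
    "E": {"D": 0.3, "F": 0.3, "G": 0.4},
    "F": {"E": 0.7, "G": 0.3},
    "G": {"C": 0.6, "E": 0.4},
}

def generate_melody(start, length, seed=None):
    """Random walk on the transition table: each note depends only on the previous one."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        options = TRANSITIONS[melody[-1]]
        melody.append(rng.choices(list(options), weights=list(options.values()))[0])
    return melody

print(" ".join(generate_melody("C", 16, seed=1)))
```

Fixing the seed reproduces a given "composition" exactly. Leaving it unset produces a new melody each run, closer to a chance process in performance.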
Social sciences
Stochastic social science theory is similar to systems theory in that events are interactions of systems, although with a marked emphasis on unconscious processes. The event creates its own conditions of possibility, rendering it unpredictable if simply for the number of variables involved. Stochastic social science theory can be seen as an elaboration of a kind of 'third axis' in which to situate human behavior alongside the traditional 'nature vs. nurture' opposition. See Julia Kristeva on her usage of the 'semiotic', Luce Irigaray on reverse Heideggerian epistemology, and Pierre Bourdieu on polythetic space for examples of stochastic social science theory.^{[citation needed]}
Subtractive color reproduction
When color reproductions are made, the image is separated into its component colors by taking multiple photographs filtered for each color. One resultant film or plate represents each of the cyan, magenta, yellow, and black data. Color printing is a binary system, where ink is either present or not present, so all color separations to be printed must be translated into dots at some stage of the workflow. Traditional amplitude-modulated line screens had problems with moiré but were used until stochastic screening became available. A stochastic (or frequency-modulated) dot pattern creates a sharper image.
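The simplest form of stochastic screening places dots by comparing each cell's tone against a random threshold. Dot density, rather than dot size, then reproduces the tone. The sketch below works on a hypothetical grid of ink coverages. Production FM screens typically use error diffusion or blue-noise threshold masks, which give smoother results than the white-noise thresholding shown here.

```python
import random

def stochastic_screen(gray, seed=0):
    """Frequency-modulated (stochastic) screening by random thresholding.

    `gray` is a 2-D list of ink coverages in [0, 1]. Each cell gets a dot (1) when
    its coverage exceeds a uniformly random threshold, so dot density, not dot
    size, reproduces the tone, and the irregular placement avoids a regular grid.
    """
    rng = random.Random(seed)
    return [[1 if value > rng.random() else 0 for value in row] for row in gray]

tone = [[0.25] * 200 for _ in range(200)]  # flat 25% tint
dots = stochastic_screen(tone)
coverage = sum(map(sum, dots)) / (200 * 200)
print(coverage)  # close to 0.25
```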
See also
Notes
 ^ Doob, when citing Khinchin, uses the term 'chance variable', which used to be an alternative term for 'random variable'.^{[26]}
References
 ^ ^{a} ^{b} ^{c} ^{d} "Stochastic". Oxford Dictionaries. Oxford University Press.
 ^ Robert J. Adler; Jonathan E. Taylor (29 January 2009). Random Fields and Geometry. Springer Science & Business Media. pp. 7–8. ISBN 9780387481166.
 ^ David Stirzaker (2005). Stochastic Processes and Models. Oxford University Press. p. 45. ISBN 9780198568148.
 ^ Loïc Chaumont; Marc Yor (19 July 2012). Exercises in Probability: A Guided Tour from Measure Theory to Random Processes, Via Conditioning. Cambridge University Press. p. 175. ISBN 9781107606555.
 ^ Murray Rosenblatt (1962). Random Processes. Oxford University Press. p. 91.
 ^ Olav Kallenberg (8 January 2002). Foundations of Modern Probability. Springer Science & Business Media. pp. 24 and 25. ISBN 9780387953137.
 ^ Paul C. Bressloff (22 August 2014). Stochastic Processes in Cell Biology. Springer. ISBN 9783319084886.
 ^ N.G. Van Kampen (30 August 2011). Stochastic Processes in Physics and Chemistry. Elsevier. ISBN 9780080475363.
 ^ Russell Lande; Steinar Engen; Bernt-Erik Sæther (2003). Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press. ISBN 9780198525257.
 ^ Carlo Laing; Gabriel J Lord (2010). Stochastic Methods in Neuroscience. OUP Oxford. ISBN 9780199235070.
 ^ Wolfgang Paul; Jörg Baschnagel (11 July 2013). Stochastic Processes: From Physics to Finance. Springer Science & Business Media. ISBN 9783319003276.
 ^ Edward R. Dougherty (1999). Random processes for image and signal processing. SPIE Optical Engineering Press. ISBN 9780819425133.
 ^ Thomas M. Cover; Joy A. Thomas (28 November 2012). Elements of Information Theory. John Wiley & Sons. p. 71. ISBN 9781118585771.
 ^ Michael Baron (15 September 2015). Probability and Statistics for Computer Scientists, Second Edition. CRC Press. p. 131. ISBN 9781498760607.
 ^ Jonathan Katz; Yehuda Lindell (31 August 2007). Introduction to Modern Cryptography: Principles and Protocols. CRC Press. p. 26. ISBN 9781584885863.
 ^ François Baccelli; Bartlomiej Blaszczyszyn (2009). Stochastic Geometry and Wireless Networks. Now Publishers Inc. pp. 200–. ISBN 9781601982643.
 ^ J. Michael Steele (2001). Stochastic Calculus and Financial Applications. Springer Science & Business Media. ISBN 9780387950167.
 ^ Marek Musiela; Marek Rutkowski (21 January 2006). Martingale Methods in Financial Modelling. Springer Science & Business Media. ISBN 9783540266532.
 ^ Steven E. Shreve (3 June 2004). Stochastic Calculus for Finance II: Continuous-Time Models. Springer Science & Business Media. ISBN 9780387401010.
 ^ O. B. Sheĭnin (2006). Theory of probability and statistics as exemplified in short dictums. NG Verlag. p. 5. ISBN 9783938417409.
 ^ Oscar Sheynin; Heinrich Strecker (2011). Alexandr A. Chuprov: Life, Work, Correspondence. V&R unipress GmbH. p. 136. ISBN 9783899718126.
 ^ ^{a} ^{b} Doob, Joseph (1934). "Stochastic Processes and Statistics". Proceedings of the National Academy of Sciences of the United States of America. 20 (6): 376–379. doi:10.1073/pnas.20.6.376. PMC 1076423.
 ^ Khintchine, A. (1934). "Korrelationstheorie der stationären stochastischen Prozesse". Mathematische Annalen. 109 (1): 604–615. doi:10.1007/BF01449156. ISSN 0025-5831.
 ^ Kolmogoroff, A. (1931). "Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung". Mathematische Annalen. 104 (1): 1. doi:10.1007/BF01457949. ISSN 0025-5831.
 ^ Vere-Jones, David (2006). "Khinchin, Aleksandr Yakovlevich": 4. doi:10.1002/0471667196.ess6027.pub2.
 ^ Snell, J. Laurie (2005). "Obituary: Joseph Leonard Doob". Journal of Applied Probability. 42 (1): 251. doi:10.1239/jap/1110381384. ISSN 0021-9002.
 ^ Bingham, N. (2000). "Studies in the history of probability and statistics XLVI. Measure into probability: from Lebesgue to Kolmogorov". Biometrika. 87 (1): 145–156. doi:10.1093/biomet/87.1.145. ISSN 0006-3444.
 ^ ^{a} ^{b} Cramér, Harald (1976). "Half a Century with Probability Theory: Some Personal Recollections". The Annals of Probability. 4 (4): 509–546. doi:10.1214/aop/1176996025. ISSN 0091-1798.
 ^ Applebaum, David (2004). "Lévy processes: From probability to finance and quantum groups". Notices of the AMS. 51 (11): 1336–1347.
 ^ Jochen Blath; Peter Imkeller; Sylvie Rœlly (2011). Surveys in Stochastic Processes. European Mathematical Society. pp. 5–. ISBN 9783037190722.
 ^ Michel Talagrand (12 February 2014). Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer Science & Business Media. pp. 4–. ISBN 9783642540752.
 ^ Paul C. Bressloff (22 August 2014). Stochastic Processes in Cell Biology. Springer. pp. vii–ix. ISBN 9783319084886.
 ^ Douglas Hubbard "How to Measure Anything: Finding the Value of Intangibles in Business" p. 46, John Wiley & Sons, 2007
 ^ Hänggi, P. (2002). "Stochastic Resonance in Biology How Noise Can Enhance Detection of Weak Signals and Help Improve Biological Information Processing". ChemPhysChem. 3 (3): 285–90. doi:10.1002/1439-7641(20020315)3:3<285::AID-CPHC285>3.0.CO;2-A. PMID 12503175.
 ^ Priplata, A.; et al. (2006). "Noise-Enhanced Balance Control in Patients with Diabetes and Patients with Stroke" (PDF). Ann Neurol. 59: 4–12. doi:10.1002/ana.20670. PMID 16287079.
 ^ Newmeyer, Frederick. 2001. "The Prague School and North American functionalist approaches to syntax" Journal of Linguistics 37, pp. 101–126. "Since most American functionalists adhere to this trend, I will refer to it and its practitioners with the initials 'USF'. Some of the more prominent USFs are Joan Bybee, William Croft, Talmy Givon, John Haiman, Paul Hopper, Marianne Mithun and Sandra Thompson. In its most extreme form (Hopper 1987, 1988), USF rejects the Saussurean dichotomies such as langue vs. parole and synchrony vs. diachrony. All adherents of this tendency feel that the Chomskyan advocacy of a sharp distinction between competence and performance is at best unproductive and obscurantist; at worst theoretically unmotivated."
 ^ Bybee, Joan. "Usage-based phonology." p. 213 in Darnel, Mike (ed). 1999. Functionalism and Formalism in Linguistics: General papers. John Benjamins Publishing Company
 ^ Chomsky (1959). Review of Skinner's Verbal Behavior, Language, 35: 26–58
 ^ Manning and Schütze, (1999) Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA
 ^ Bybee (2007) Frequency of use and the organization of language. Oxford: Oxford University Press
 ^ Ilias Chrissochoidis, Stavros Houliaras, and Christos Mitsakis, "Set theory in Xenakis' EONTA", in International Symposium Iannis Xenakis, ed. Anastasia Georgaki and Makis Solomos (Athens: The National and Kapodistrian University, 2005), 241–249.
Further reading
 See the stochastic process of an 8-foot-tall (2.4 m) Probability Machine comparing stock market returns to the randomness of the beans dropping through the quincunx pattern on YouTube, from Index Funds Advisors IFA.com
 Formalized Music: Thought and Mathematics in Composition by Iannis Xenakis, ISBN 1576470792
 Frequency and the Emergence of Linguistic Structure by Joan Bybee and Paul Hopper (eds.), ISBN 1588110281/ISBN 9027229481 (Eur.)
 The Stochastic Empirical Loading and Dilution Model provides documentation and computer code for modeling stochastic processes in Visual Basic for Applications.