
Erdős–Rényi model

From Wikipedia, the free encyclopedia

An Erdős–Rényi–Gilbert graph with 1000 vertices at the critical edge probability, showing a large component and many small ones

In the mathematical field of graph theory, the Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959.[1][2] Edgar Gilbert introduced the other model contemporaneously with and independently of Erdős and Rényi.[3] In the model of Erdős and Rényi, all graphs on a fixed vertex set with a fixed number of edges are equally likely. In the model introduced by Gilbert, also called the Erdős–Rényi–Gilbert model,[4] each edge has a fixed probability of being present or absent, independently of the other edges. These models can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.

YouTube Encyclopedic

  • 2.2.2A Introduction to random graph models
  • Omer Angel: Bootstrap percolation on Erdos-Renyi graphs
  • Imaginary Erdős Number - Numberphile
  • CS224W: Machine Learning with Graphs | 2021 | Lecture 14.2 - Erdos Renyi Random Graphs
  • Was Erdős on drugs ?

Transcription

Hello. Welcome to week two of Social Network Analysis. This week we'll be focusing on modeling social networks. Now, why would you want to model social networks? Just last week we showed how you can learn many interesting things just by analyzing the network structure of real-world networks. What good is a model? Well, a model is a simple representation of that complex real-world network, and it allows you to test certain assumptions, because in order to formulate your model you have to implement those assumptions and see whether the network generated from your model in any way resembles the real-world network you're seeing. So, for example, if you were looking at a social network and you discovered that some people had many more friends than others, you could say: oh, well, I think what's leading to this is just random fluctuation. People meet and form edges at random, so some people are going to have more edges and some people are going to have fewer edges, and that's my model; it's going to be a completely random graph model. What happens next is that you can test this. You can generate networks according to your model and see if indeed you get some individuals who have many more friends than others. If this doesn't happen, if your model can't reproduce the distribution of degrees that you see in the real network, well, that means that your model needs to be adjusted: the way you're thinking about the world and the way that friendship formation works is actually not what you thought, and your model needs to be extended. These kinds of models also often just serve as a straw man. You know that the model is not realistic; people don't just meet at random. But you'd like to know exactly how your network is different from such a random network, and that gives you further insight into the measurements that you're taking on your real-world network. Because if your real-world network looks very similar to, say, a random graph model, you can see there's not much of interest going on here. But if it looks very different, if the numbers you get from the model and the numbers you get from your real network are very divergent, then you can say: aha, this is an interesting feature of my real-world network, and that is what I will pursue further. This very basic network model was first formulated by Erdős and Rényi, two Hungarian mathematicians. Erdős was an eccentric and very, very productive. He traveled from conference to conference, from one collaborator's door to the next, saying, "my mind is open, let's write some papers," and after he finished the papers he would move on to the next place. There's actually a network of coauthors, and it's prestigious to be close to Erdős: if you have an Erdős number of one, it means you are one of his many, many direct collaborators; if you have an Erdős number of two, it means you are a collaborator of a collaborator. So people like to calculate their Erdős number. Mine is four, by the way, which isn't impressive at all. He collaborated a lot with Rényi, who is actually credited with the quote that mathematicians are machines that turn coffee into theorems. He and Erdős were very productive, and among their 32 joint publications are several well-known ones on the Erdős–Rényi random graph model, where they could derive exact results about precisely this model of a network where you connect nodes at random.
There are basically very few assumptions here besides the size of the network, the number of nodes: you just have the probability that any two nodes are connected, the probability p. An alternative formulation is that you fix the total number of edges, M. The mathematical treatment is a little more difficult in the second case, and in any case the two formulations become very similar once the network is large, so we'll focus on the first one, where you're just fixing the probability that any two nodes are connected. What do these networks look like? Here I've laid out the nodes on a circle, I think it's twenty nodes, and I've added some edges at random. As we learned last week, sometimes a layout is more effective if nodes that are directly connected are close to each other in the layout, so I applied a spring layout algorithm and got this very typical shape of an Erdős–Rényi random graph. The first thing that we can derive for the Erdős–Rényi random graph is its degree distribution. This tells us how many nodes in the network have no neighbors, how many have one neighbor, how many have two neighbors, et cetera. It's going to get a little bit mathematical, but if you've ever had basic probability and statistics you'll recognize the binomial distribution; that's all it is. And if you've never had this level of math, don't worry: what I'm going to say you can also build up just through intuition and through playing with the demo, so it's not that you have to derive the binomial distribution from scratch or anything like that. I'm bringing it in more to show that, if you're a mathematician, there's a whole lot of interesting properties you can derive about networks, and I'm just going to be scratching the surface of that. The goal is to illustrate how you can intuit some properties when you have a nice model of the network. So, again, our model is that we have n nodes, and with probability p we add an edge between every pair of nodes. We're kind of flipping a coin: one node, out of the n, is going to look at the n − 1 other nodes, and for each one it's going to flip a coin to see whether they're going to share an edge or not. Now, to test your understanding, I'd like you to imagine a network where you're holding the probability that any two nodes are connected, the value p, constant, and you're increasing the size of the network, the number of nodes. I want you to think about what happens to the average degree: how many other nodes, on average, is a node connected to? To help you with this, I have created a NetLogo model. Let me just get into that model. What you're going to do is make sure that the connection rule is set to probability: you're not going to use the fixed number of neighbors, you're going to fix the probability of linking. Here I've just made it two percent. To start with 100 nodes, I'm just going to click this Erdős–Rényi button, and maybe I'll let it lay out, just to see what it looks like. Let me try that a few more times, and here I see the average degree. Now I'll increase this to maybe 300 and generate another Erdős–Rényi random graph, and I'm also going to look at the average degree. I might do that a few more times, and I might increase to almost 1000 nodes, again generating an Erdős–Rényi random graph; it has to work a little bit harder for a larger graph.
And then I'm going to, again, look at the average degree. See if you can understand what's going on in this case. Now that we've seen what these random networks characteristically look like, we can return to the problem of deriving the degree distribution. The degree distribution is just a probability distribution that says: the probability that a node has degree zero, meaning no neighbors, is a certain quantity; the probability that it has one neighbor is this; the probability that it has two neighbors is something else, et cetera. So this is our probability distribution, and the probabilities, of course, have to sum to one. The way this might happen, if we do a little thought experiment, is that each node is looking at the n − 1 other nodes and flipping a biased coin, which with probability p says "link to that particular node" and with probability 1 − p says "uh-uh, not that one." The distribution that describes this coin-flipping process exactly is the binomial distribution. So what we're interested in is: out of n − 1 tries, what is the probability that k of them are successful? That is, what is the probability that the node will have degree k? First we need the number of different ways that we can choose k nodes out of n − 1 — this is the binomial coefficient — times the probability that you succeeded k times, which is p to the k, times the probability that you failed n − 1 − k times, which is (1 − p) to the n − 1 − k. So let's do a little bit of comprehension here. What is the maximum degree that a node can have in a graph that has n nodes? Here we're assuming that the graph is simple, meaning you can't have multiple edges between two nodes — it's like two people can only have one friendship between them, not two or three or four — and this is the kind of graph that the Erdős–Rényi random graph model produces. So what is the maximum degree in this case? Let's delve a little bit more into the binomial distribution. For a lot of you this will be a review of basic probability, but let's do it nonetheless. Imagine that you have an eight-node graph with a probability p of any two nodes being connected. What we'd like to know is: what is the probability that any given node is connected to four others? We're going to color those four blue, and the three nodes that it's not connected to we're going to color white. First we need the binomial coefficient, which says how many ways there are of choosing four out of seven. Well, you can first imagine that you can uniquely identify all the nodes, so that it matters whether A goes first and B goes second, or B goes first and A goes second. There are seven factorial different ways of making such an ordering, because for the first spot you have seven choices, for the second spot you have the remaining six, for the third spot the remaining five, et cetera. So that's seven factorial if you could distinguish all the nodes separately, but we're only distinguishing at the level of whether a trial was successful or not — whether a node is white or blue — which means that the three white nodes are completely swappable. So our seven factorial is now divided by a three factorial in the denominator, which is the number of different arrangements that we no longer distinguish between.
Similarly, for the nodes that we did connect to, there are four factorial different ways of ordering them, and we don't care about their particular ordering either, so we have another factor of four factorial in the denominator. In general, this binomial coefficient for choosing k items out of n − 1 is (n − 1) factorial in the numerator, with k factorial and (n − 1 − k) factorial in the denominator. Let's test our understanding: what is the number of ways of choosing two items out of five? Now let's get back to the distribution. Imagine that we have a particular way of choosing the k out of n − 1, so here it's four out of seven. For example, if it was white, white, blue, blue, blue, white, blue, the probability of that particular event is (1 − p) times (1 − p) times p times p times p times (1 − p) times p — that is, p to the fourth times (1 − p) to the third. And since we don't care about the particular order of white and blue, we add the binomial coefficient back in. So, in the end, the binomial distribution tells us that the probability of this particular node having degree four in our eight-node, probability-p graph is 7 choose 4, which is 35, times p to the fourth times (1 − p) to the third. And that's it; that's how it works. If p equals 0.5, you get this very symmetric-looking distribution, where it's most likely that the node will have degree three or four, but it could also end up, with low probability, with no neighbors at all or connected to all seven of the other nodes. If, however, the probability is only a tenth, then you have a distribution that's much more skewed: it's much more likely that a node will be connected to no others, and very unlikely that it would have anything past degree four. Once you have this distribution, you can get well-known properties such as the mean, which gives us the average degree. From the binomial distribution we can tell that this average degree z is equal to (n − 1) times p. In general, if you had an arbitrary degree distribution, the way you could calculate the mean is to take the expected value, which is the sum over all possible outcomes — here we're interested in the degree x — of x times the probability of x occurring. To test our understanding, let's see what the average degree would be of a graph with ten nodes where the probability of any two nodes linking is a third. You can similarly compute the variance for the binomial distribution: the variance, which is just the standard deviation squared, is (n − 1) times p times (1 − p). It's computed in a very similar fashion — we won't have to do this in any assignments or anything like that — but in general you take the expected value of (x minus the mean) squared, which is the definition of variance; it's just this quantity summed over all possible values of x, times the probability of each particular value that x can take. The binomial distribution gets quite impractical to compute, because the binomial coefficient, with those factorials in it, can't be computed for very large numbers, so you typically use approximations. In the limit of p being small you can use the Poisson distribution, and in the limit of the graph being large — n is large — you can use the normal distribution.
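To make the worked example above concrete, here is a minimal Python sketch (standard library only; the helper name degree_pmf is purely illustrative) that computes the binomial degree probabilities for the eight-node example, along with the mean (n − 1)p and the variance (n − 1)p(1 − p):

    from math import comb

    def degree_pmf(n: int, p: float, k: int) -> float:
        """Probability that a given node in G(n, p) has exactly k neighbors."""
        m = n - 1  # each node has n - 1 potential neighbors
        return comb(m, k) * p**k * (1 - p) ** (m - k)

    n = 8
    print(comb(7, 4))  # 35 ways of choosing 4 neighbors out of 7
    print(comb(5, 2))  # 10, the answer to the "two out of five" question
    for p in (0.5, 0.1):
        pmf = [degree_pmf(n, p, k) for k in range(n)]
        print(p, [round(x, 3) for x in pmf])
        print("mean:", (n - 1) * p, "variance:", (n - 1) * p * (1 - p))

With p = 0.5 the printed distribution is symmetric around degree three or four; with p = 0.1 it is skewed toward degree zero, matching the two cases described above.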
And the way that the Poisson and the normal are going to look for large graphs is a nicely symmetric distribution around the mean value. Of course, if the probability is very, very low, then the Poisson distribution is actually not symmetric: it might be pushed up against degree zero and fall off from there. But the main point to take away is that we're not going to see many nodes that have, say, a degree that's many times higher than the average degree. That's something you simply don't see, because this probability drops off exponentially as you get higher and higher in degree. So the basic insight that we get out of this is that there are no hubs — no nodes with extremely high connectivity, extremely high degree. You just don't see this.
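The "no hubs" point can be checked directly with a small simulation; the following sketch (standard library only, with the illustrative parameters n = 1000 and p = 0.01) flips a biased coin for every pair of nodes and compares the maximum degree with the average degree (n − 1)p:

    import random
    from itertools import combinations

    def sample_gnp_degrees(n, p, seed=0):
        """Sample G(n, p) and return the degree of every node."""
        rng = random.Random(seed)
        deg = [0] * n
        for i, j in combinations(range(n), 2):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
        return deg

    degrees = sample_gnp_degrees(n=1000, p=0.01)
    avg = sum(degrees) / len(degrees)
    print(f"average degree ~ {avg:.2f} (theory: {999 * 0.01:.2f})")
    print(f"maximum degree = {max(degrees)}")  # only a few times the mean, never a hub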

Definition

There are two closely related variants of the Erdős–Rényi random graph model.

A graph generated by the binomial model of Erdős and Rényi (p = 0.01)
  • In the G(n, M) model, a graph is chosen uniformly at random from the collection of all graphs which have n nodes and M edges. The nodes are considered to be labeled, meaning that graphs obtained from each other by permuting the vertices are considered to be distinct. For example, in the G(3, 2) model, there are three two-edge graphs on three labeled vertices (one for each choice of the middle vertex in a two-edge path), and each of these three graphs is included with probability 1/3.
  • In the G(n, p) model, a graph is constructed by connecting n labeled nodes randomly. Each edge is included in the graph with probability p, independently of every other edge. Equivalently, the probability of generating each graph that has n nodes and M edges is
    p^M (1 − p)^(C(n, 2) − M), where C(n, 2) = n(n − 1)/2 is the number of possible edges.
    The parameter p in this model can be thought of as a weighting function; as p increases from 0 to 1, the model becomes more and more likely to include graphs with more edges and less and less likely to include graphs with fewer edges. In particular, the case p = 0.5 corresponds to the case where all 2^C(n, 2) graphs on n vertices are chosen with equal probability.

The behavior of random graphs is often studied in the case where n, the number of vertices, tends to infinity. Although p and M can be fixed in this case, they can also be functions depending on n. For example, the statement that almost every graph in G(n, 2 ln(n)/n) is connected means that, as n tends to infinity, the probability that a graph on n vertices with edge probability 2 ln(n)/n is connected tends to 1.
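As a minimal sketch of the two definitions above (standard library only; the helper names sample_gnm and sample_gnp are illustrative), G(n, M) can be sampled by drawing exactly M of the C(n, 2) possible edges uniformly, while G(n, p) keeps each possible edge independently with probability p:

    import random
    from itertools import combinations

    def sample_gnm(n, m, rng):
        """Uniformly random graph on n labeled nodes with exactly m edges."""
        return set(rng.sample(list(combinations(range(n), 2)), m))

    def sample_gnp(n, p, rng):
        """Each of the C(n, 2) possible edges is kept independently with probability p."""
        return {e for e in combinations(range(n), 2) if rng.random() < p}

    rng = random.Random(42)
    print(len(sample_gnm(10, 15, rng)))     # always exactly 15 edges
    print(len(sample_gnp(10, 1 / 3, rng)))  # about C(10, 2) / 3 = 15 edges on average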

Comparison between the two models

The expected number of edges in G(n, p) is C(n, 2)p = n(n − 1)p/2, and by the law of large numbers any graph in G(n, p) will almost surely have approximately this many edges (provided the expected number of edges tends to infinity). Therefore, a rough heuristic is that if pn² → ∞, then G(n, p) should behave similarly to G(n, M) with M = n(n − 1)p/2 as n increases.

For many graph properties, this is the case. If P is any graph property which is monotone with respect to the subgraph ordering (meaning that if A is a subgraph of B and A satisfies P, then B satisfies P as well), then the statements "P holds for almost all graphs in G(n, p)" and "P holds for almost all graphs in G(n, M) with M = n(n − 1)p/2" are equivalent (provided pn² → ∞). For example, this holds if P is the property of being connected, or if P is the property of containing a Hamiltonian cycle. However, this will not necessarily hold for non-monotone properties (e.g. the property of having an even number of edges).

In practice, the G(n, p) model is the one more commonly used today, in part due to the ease of analysis allowed by the independence of the edges.
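The heuristic above can be illustrated with a small experiment; the sketch below (standard library only, with the illustrative values n = 50 and p = 0.09, so M = round(C(n, 2)p)) estimates the probability of the monotone property "connected" under both models and typically finds the two fractions close:

    import random
    from itertools import combinations
    from math import comb

    def connected(n, edges):
        """Check that all n nodes are reachable from node 0."""
        adj = {i: [] for i in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n

    def frac_connected(sampler, trials=200):
        return sum(connected(*sampler()) for _ in range(trials)) / trials

    n, p = 50, 0.09
    M = round(comb(n, 2) * p)
    rng = random.Random(0)
    pairs = list(combinations(range(n), 2))
    gnp = lambda: (n, [e for e in pairs if rng.random() < p])
    gnm = lambda: (n, rng.sample(pairs, M))
    print(frac_connected(gnp), frac_connected(gnm))  # similar fractions of connected samples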

Properties of G(n, p)

With the notation above, a graph in G(n, p) has on average C(n, 2)p edges. The distribution of the degree of any particular vertex is binomial:[5]

P(deg(v) = k) = C(n − 1, k) p^k (1 − p)^(n − 1 − k),

where n is the total number of vertices in the graph. Since

C(n − 1, k) p^k (1 − p)^(n − 1 − k) → (np)^k e^(−np) / k!  as n → ∞ with np held constant,

this distribution is Poisson for large n and np = const.
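A quick numerical check of this limit (a sketch using only the standard library; the value np = 4 is an arbitrary choice) compares the Binomial(n − 1, p) and Poisson(np) probabilities as n grows with np held fixed:

    from math import comb, exp, factorial

    def binom_pmf(m, p, k):
        return comb(m, k) * p**k * (1 - p) ** (m - k)

    def poisson_pmf(lam, k):
        return lam**k * exp(-lam) / factorial(k)

    lam = 4.0  # np held constant
    for n in (20, 100, 1000):
        p = lam / n
        gap = max(abs(binom_pmf(n - 1, p, k) - poisson_pmf(lam, k)) for k in range(min(n, 30)))
        print(n, round(gap, 5))  # the largest pointwise gap shrinks as n grows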

In a 1960 paper, Erdős and Rényi[6] described the behavior of G(n, p) very precisely for various values of p. Their results included that:

  • If np < 1, then a graph in G(n, p) will almost surely have no connected components of size larger than O(log(n)).
  • If np = 1, then a graph in G(n, p) will almost surely have a largest component whose size is of order n^(2/3).
  • If np → c > 1, where c is a constant, then a graph in G(n, p) will almost surely have a unique giant component containing a positive fraction of the vertices. No other component will contain more than O(log(n)) vertices.
  • If p < (1 − ε) ln(n)/n for some constant ε > 0, then a graph in G(n, p) will almost surely contain isolated vertices, and thus be disconnected.
  • If p > (1 + ε) ln(n)/n, then a graph in G(n, p) will almost surely be connected.

Thus ln(n)/n is a sharp threshold for the connectedness of G(n, p).
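This sharp threshold is easy to observe empirically; the sketch below (standard library only, with the illustrative choices n = 400 and 50 samples per point) estimates the probability of connectedness for p just below and just above ln(n)/n:

    import random
    from itertools import combinations
    from math import log

    def is_connected_gnp(n, p, rng):
        """Sample G(n, p) and test whether every node is reachable from node 0."""
        adj = {i: set() for i in range(n)}
        for i, j in combinations(range(n), 2):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        return len(seen) == n

    rng = random.Random(1)
    n = 400
    for factor in (0.5, 1.0, 1.5):
        p = factor * log(n) / n
        hits = sum(is_connected_gnp(n, p, rng) for _ in range(50))
        print(f"p = {factor} * ln(n)/n -> connected in {hits}/50 samples")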

Further properties of the graph can be described almost precisely as n tends to infinity. For example, there is a k(n) (approximately equal to 2log2(n)) such that the largest clique in G(n, 0.5) has almost surely either size k(n) or k(n) + 1.[7]

Thus, even though finding the size of the largest clique in a graph is NP-complete, the size of the largest clique in a "typical" graph (according to this model) is very well understood.
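For small n this can be observed directly; the sketch below (standard library only, using a naive Bron–Kerbosch search, with the illustrative choice n = 50) compares the largest clique found in one sample of G(n, 0.5) with 2log2(n). For n this small the observed value typically falls somewhat below the asymptotic prediction:

    import random
    from itertools import combinations
    from math import log2

    def sample_adj(n, p, rng):
        adj = {i: set() for i in range(n)}
        for i, j in combinations(range(n), 2):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
        return adj

    def max_clique_size(adj):
        """Exact maximum clique size via Bron-Kerbosch enumeration (no pivoting)."""
        best = 0
        def bron_kerbosch(r, p, x):
            nonlocal best
            if not p and not x:
                best = max(best, len(r))
                return
            for v in list(p):
                bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
                p = p - {v}
                x = x | {v}
        bron_kerbosch(set(), set(adj), set())
        return best

    rng = random.Random(3)
    n = 50
    adj = sample_adj(n, 0.5, rng)
    print("largest clique found:", max_clique_size(adj))
    print("2*log2(n):", round(2 * log2(n), 1))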

Edge-dual graphs of Erdős–Rényi graphs are graphs with nearly the same degree distribution, but with degree correlations and a significantly higher clustering coefficient.[8]

Relation to percolation

In percolation theory one examines a finite or infinite graph and removes edges (or links) randomly. Thus the Erdős–Rényi process is in fact unweighted link percolation on the complete graph. (One refers to percolation in which nodes and/or links are removed with heterogeneous weights as weighted percolation.) As percolation theory has many of its roots in physics, much of the research was done on lattices in Euclidean spaces. The transition at np = 1 from a giant component to small components has analogs for these graphs, but for lattices the transition point is difficult to determine. Physicists often refer to the study of the complete graph as a mean-field theory. Thus the Erdős–Rényi process is the mean-field case of percolation.

Some significant work was also done on percolation on random graphs. From a physicist's point of view this would still be a mean-field model, so the justification of the research is often formulated in terms of the robustness of the graph, viewed as a communication network. Given a random graph of n ≫ 1 nodes with an average degree ⟨k⟩, remove a fraction 1 − p′ of the nodes at random and keep only the remaining fraction p′. There exists a critical percolation threshold p′_c = 1/⟨k⟩ below which the network becomes fragmented, while above p′_c a giant connected component of order n exists. The relative size of the giant component, P_∞, is given by[6][1][2][9]

P_∞ = p′ [1 − exp(−⟨k⟩ P_∞)].
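The relative size P_∞ can be obtained numerically by fixed-point iteration of the relation above; this sketch (standard library only, with an illustrative average degree ⟨k⟩ = 4, so the threshold is p′_c = 0.25) shows the solution collapsing toward 0 at and below the threshold and growing above it:

    from math import exp

    def giant_component_fraction(k_avg: float, p_occupied: float) -> float:
        """Iterate P = p' * (1 - exp(-<k> * P)) to its fixed point."""
        P = 1.0  # start from the fully occupied guess
        for _ in range(1000):
            P = p_occupied * (1 - exp(-k_avg * P))
        return P

    k_avg = 4.0
    for p_occ in (0.1, 0.25, 0.5, 1.0):   # threshold is 1/<k> = 0.25
        print(p_occ, round(giant_component_fraction(k_avg, p_occ), 4))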

Caveats

Both of the two major assumptions of the G(n, p) model (that edges are independent and that each edge is equally likely) may be inappropriate for modeling certain real-life phenomena. In particular, Erdős–Rényi graphs have low clustering, unlike many social networks.[10] Some modeling alternatives include the Barabási–Albert model and the Watts–Strogatz model. These alternative models are not percolation processes, but instead represent a growth model and a rewiring model, respectively. Another alternative family of random graph models, capable of reproducing many real-life phenomena, is that of exponential random graph models.
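The low-clustering caveat is easy to see numerically: in G(n, p) the probability that two neighbors of a vertex are themselves connected is simply p, so the expected clustering coefficient equals p. The sketch below (standard library only, with the illustrative values n = 1000 and p = 0.01) measures it on one sample:

    import random
    from itertools import combinations

    def avg_clustering(n: int, p: float, seed: int = 0) -> float:
        """Sample G(n, p) and return its average local clustering coefficient."""
        rng = random.Random(seed)
        adj = {i: set() for i in range(n)}
        for i, j in combinations(range(n), 2):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
        coeffs = []
        for v in range(n):
            nbrs = list(adj[v])
            k = len(nbrs)
            if k < 2:
                continue
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            coeffs.append(2 * links / (k * (k - 1)))
        return sum(coeffs) / len(coeffs) if coeffs else 0.0

    print(avg_clustering(n=1000, p=0.01))  # close to p = 0.01, far below typical social networks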

History

The G(n, p) model was first introduced by Edgar Gilbert in a 1959 paper studying the connectivity threshold mentioned above.[3] The G(n, M) model was introduced by Erdős and Rényi in their 1959 paper. As with Gilbert, their first investigations concerned the connectivity of G(n, M), with the more detailed analysis following in 1960.

Continuum limit representation of critical G(n, p)

The Brownian motion with quadratic negative drift is decomposed into Brownian excursions of decreasing sizes.

A continuum limit of the graph was obtained when p is of order 1/n.[11] Specifically, consider the sequence of graphs G_n = G(n, 1/n + λn^(−4/3)) for λ ∈ ℝ. The limit object can be constructed as follows:

  • First, generate a diffusion W^λ(t) = W(t) + λt − t²/2, where W(t) is a standard Brownian motion.
  • From this process, we define the reflected process B^λ(t) = W^λ(t) − min_{0 ≤ s ≤ t} W^λ(s). This process can be seen as containing many successive excursions (not quite Brownian excursions, see [12]). Because the drift of W^λ is dominated by −t²/2, these excursions become shorter and shorter as t increases. In particular, they can be sorted in order of decreasing lengths: we can partition the time axis into intervals (l_j, r_j) of decreasing lengths such that B^λ restricted to (l_j, r_j) is a Brownian excursion e_j for each j.
  • Now, consider an excursion e_j. Construct a random graph as follows:
    A Brownian excursion e_j. Here, the point process has a single point, marked with a red dot. The red line corresponds to a single internal node of the associated tree T_{e_j}; the green line corresponds to a leaf of T_{e_j}. If one adds an edge between the two nodes, one obtains a graph with a single cycle.
    • Construct a real tree T_{e_j} from the excursion e_j (see Brownian tree).
    • Consider a Poisson point process with unit intensity on the region under the excursion. To each point (t, y) of the process with y ≤ e_j(t) corresponds an underlying internal node and a leaf of the tree T_{e_j}. Identifying the two vertices, the tree becomes a graph G_{e_j}.

Applying this procedure, one obtains a sequence of random infinite graphs of decreasing sizes: G_{e_1}, G_{e_2}, …. The theorem[11] states that this sequence corresponds in a certain sense to the limit object of G_n as n → +∞.
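The first two steps of this construction can be visualized with a crude simulation; the sketch below (standard library only; the step size, time horizon, and λ = 1 are illustrative choices, and the grid-based excursion detection is only approximate) discretizes W^λ(t) = W(t) + λt − t²/2, reflects it at its running minimum, and lists the excursion lengths, which indeed become shorter as t grows:

    import random
    from math import sqrt

    def excursion_lengths(lam=1.0, horizon=30.0, dt=0.001, seed=7):
        """Euler-Maruyama simulation of W^lambda, reflected at its running minimum."""
        rng = random.Random(seed)
        steps = int(horizon / dt)
        w, running_min, b = 0.0, 0.0, []
        for i in range(steps):
            t = i * dt
            w += rng.gauss(0.0, sqrt(dt)) + (lam - t) * dt  # dW + (lambda - t) dt
            running_min = min(running_min, w)
            b.append(w - running_min)                       # reflected process B^lambda
        lengths, start = [], None
        for i, x in enumerate(b):
            if x > 0 and start is None:
                start = i                                   # an excursion begins
            elif x == 0 and start is not None:
                lengths.append((i - start) * dt)            # it ends at a new minimum
                start = None
        return sorted(lengths, reverse=True)

    print(excursion_lengths()[:5])  # a few long early excursions, then ever shorter ones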

See also

  • Rado graph – Infinite graph containing all countable graphs, the graph formed by extending the G(n, p) model to graphs with a countably infinite number of vertices. Unlike in the finite case, the result of this infinite process is (with probability 1) the same graph, up to isomorphism.
  • Dual-phase evolution – Process that drives self-organization within complex adaptive systems describes ways in which properties associated with the Erdős–Rényi model contribute to the emergence of order in systems.
  • Exponential random graph models – statistical models for network analysis describe a general probability distribution of graphs on "n" nodes given a set of network statistics and various parameters associated with them.
  • Stochastic block model, a generalization of the Erdős–Rényi model for graphs with latent community structure
  • Watts–Strogatz model – Method of generating random small-world graphs
  • Barabási–Albert model – Scale-free network generation algorithm

References

  1. ^ a b Erdős, P.; Rényi, A. (1959). "On Random Graphs. I" (PDF). Publicationes Mathematicae. 6 (3–4): 290–297. doi:10.5486/PMD.1959.6.3-4.12. S2CID 253789267. Archived (PDF) from the original on 2020-08-07. Retrieved 2011-02-23.
  2. ^ a b Bollobás, B. (2001). Random Graphs (2nd ed.). Cambridge University Press. ISBN 0-521-79722-5.
  3. ^ a b Gilbert, E.N. (1959). "Random Graphs". Annals of Mathematical Statistics. 30 (4): 1141–1144. doi:10.1214/aoms/1177706098.
  4. ^ Fienberg, Stephen E. (2012). "A brief history of statistical models for network analysis and open challenges". Journal of Computational and Graphical Statistics. 21 (4): 825–839. doi:10.1080/10618600.2012.738106. MR 3005799. S2CID 52232135.
  5. ^ Newman, Mark. E. J.; Strogatz, S. H.; Watts, D. J. (2001). "Random graphs with arbitrary degree distributions and their applications". Physical Review E. 64 (2): 026118. arXiv:cond-mat/0007235. Bibcode:2001PhRvE..64b6118N. doi:10.1103/PhysRevE.64.026118. PMID 11497662. S2CID 360112., Eq. (1)
  6. ^ a b Erdős, P.; Rényi, A. (1960). "On the evolution of random graphs" (PDF). Magyar Tudományos Akadémia Matematikai Kutató Intézetének Kőzleményei [Publications of the Mathematical Institute of the Hungarian Academy of Sciences]. 5: 17–61. Archived (PDF) from the original on 2021-02-01. Retrieved 2011-11-18. The probability p used here refers there to
  7. ^ Matula, David W. (February 1972). "The employee party problem". Notices of the American Mathematical Society. 19: A-382.
  8. ^ Ramezanpour, A.; Karimipour, V.; Mashaghi, A. (April 2003). "Generating correlated networks from uncorrelated ones". Physical Review E. 67 (4): 046107. arXiv:cond-mat/0212469. Bibcode:2003PhRvE..67d6107R. doi:10.1103/PhysRevE.67.046107. PMID 12786436. S2CID 33054818.
  9. ^ Bollobás, B.; Erdős, P. (1976). "Cliques in Random Graphs". Mathematical Proceedings of the Cambridge Philosophical Society. 80 (3): 419–427. Bibcode:1976MPCPS..80..419B. doi:10.1017/S0305004100053056. S2CID 16619643.
  10. ^ Saberi, Abbas Ali (March 2015). "Recent advances in percolation theory and its applications". Physics Reports. 578: 12. arXiv:1504.02898. Bibcode:2015PhR...578....1S. doi:10.1016/j.physrep.2015.03.003. S2CID 119209128. Retrieved 30 January 2022.
  11. ^ a b Addario-Berry, L.; Broutin, N.; Goldschmidt, C. (2012-04-01). "The continuum limit of critical random graphs". Probability Theory and Related Fields. 152 (3): 367–406. doi:10.1007/s00440-010-0325-4. ISSN 1432-2064. S2CID 253980763.
  12. ^ Aldous, David (1997-04-01). "Brownian excursions, critical random graphs and the multiplicative coalescent". The Annals of Probability. 25 (2). doi:10.1214/aop/1024404421. ISSN 0091-1798. S2CID 16578106.

Literature

  • West, Douglas B. (2001). Introduction to Graph Theory (2nd ed.). Prentice Hall. ISBN 0-13-014400-2.
  • Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.
