
1 Analyzing the Wikipedia voters network [27 points]

import snap

G = snap.LoadEdgeList(snap.TNGraph, "Wiki-Vote.txt", 0, 1)
snap.PrintInfo(G, "Wiki-Vote", "result.txt", False)


Wiki-Vote: Directed
  Nodes:                    7115
  Edges:                    103689
  Zero Deg Nodes:           0
  Zero InDeg Nodes:         4734
  Zero OutDeg Nodes:        1005
  NonZero In-Out Deg Nodes: 1376
  Unique directed edges:    103689
  Unique undirected edges:  100762
  Self Edges:               0
  BiDir Edges:              5854
  Closed triangles:         608389
  Open triangles:           12720413
  Frac. of closed triads:   0.045645
  Connected component size: 0.993113
  Strong conn. comp. size:  0.182713
  Approx. full diameter:    6
  90% effective diameter:   3.791225

1. The number of nodes in the network.


2. The number of nodes with a self-edge (self-loop).


3. The number of directed edges in the network.


4. The number of undirected edges in the network.


5. The number of reciprocated edges in the network.


6. The number of nodes of zero out-degree.


7. The number of nodes of zero in-degree.


k1 = 0
k2 = 0
for NI in G.Nodes():
    if NI.GetOutDeg() > 10:
        k1 += 1
    if NI.GetInDeg() < 10:
        k2 += 1
print(k1, k2)

8. The number of nodes with more than 10 outgoing edges (out-degree > 10).


9. The number of nodes with fewer than 10 incoming edges (in-degree < 10).
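The quantities asked for in this section can be cross-checked without SNAP on a small example. The following is a minimal sketch, assuming the graph is available as a list of (src, dst) pairs; the function name `edge_list_stats` and the toy edge list are hypothetical and not part of the assignment.

```python
def edge_list_stats(edges):
    """Count basic quantities for a directed graph given as (src, dst) pairs."""
    directed = set(edges)  # unique directed edges
    nodes = {n for edge in directed for n in edge}

    out_deg = dict.fromkeys(nodes, 0)
    in_deg = dict.fromkeys(nodes, 0)
    for u, v in directed:
        out_deg[u] += 1
        in_deg[v] += 1

    return {
        "nodes": len(nodes),
        # Nodes that have an edge to themselves.
        "self_loop_nodes": sum(1 for u, v in directed if u == v),
        "directed_edges": len(directed),
        # Unordered pairs: (u, v) and (v, u) collapse into one entry.
        "undirected_edges": len({frozenset(edge) for edge in directed}),
        # Directed edges whose reverse edge is also present (self-loops excluded).
        "reciprocated_directed_edges": sum(
            1 for u, v in directed if u != v and (v, u) in directed
        ),
        "zero_out_deg": sum(1 for d in out_deg.values() if d == 0),
        "zero_in_deg": sum(1 for d in in_deg.values() if d == 0),
    }


# Hypothetical toy graph: one reciprocated pair (1 <-> 2) and one self-loop (3 -> 3).
toy_edges = [(1, 2), (2, 1), (2, 3), (3, 3), (4, 1), (4, 5)]
stats = edge_list_stats(toy_edges)
print(stats)
```

The sketch counts each direction of a reciprocated pair separately. In the Wiki-Vote output above, 103689 - 100762 = 2927 and the reported BiDir Edges value is 5854 = 2 × 2927, which is consistent with that convention, though the exact definition SNAP uses is worth confirming in its documentation.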


2 Further Analyzing the Wikipedia voters network [33 points]

1. (18 points) Plot the distribution of out-degrees of nodes in the network on a log-log scale. Each data point is a pair (x, y) where x is a positive integer and y is the number of nodes in the network with out-degree equal to x. Restrict the range of x between the minimum and maximum out-degrees. You may filter out data points with a 0 entry. For the log-log scale, use base 10 for both x and y axes.

snap.PlotOutDegDistr(G, "Wiki-Vote", "Wiki-Vote Out Degree")
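To make the (x, y) pairs behind such a plot explicit, here is a minimal sketch that builds them by hand from an edge list, assuming each directed edge appears once. The helper name `out_degree_loglog_points` and the toy edges are hypothetical; nodes with out-degree 0 never appear as a source and are therefore filtered out automatically.

```python
import math
from collections import Counter


def out_degree_loglog_points(edges):
    """Return (log10 x, log10 y) pairs, where y nodes have out-degree x > 0."""
    out_deg = Counter(u for u, _ in edges)  # out-degree of every source node
    hist = Counter(out_deg.values())        # out-degree x -> number of nodes y
    return [(math.log10(x), math.log10(y)) for x, y in sorted(hist.items())]


# Hypothetical toy graph: node 1 has out-degree 2; nodes 2, 4, 5 have out-degree 1.
toy_edges = [(1, 2), (1, 3), (2, 3), (4, 1), (5, 1)]
points = out_degree_loglog_points(toy_edges)
for log_x, log_y in points:
    print(log_x, log_y)
```

These pairs are what a base-10 log-log scatter plot would display on its axes.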

2. (15 points) Compute and plot the least-square regression line for the out-degree distribution in the log-log scale plot. Note that we want to find coefficients a and b such that the function log10 y = a · log10 x + b, equivalently, y = 10^b · x^a, best fits the out-degree distribution. What are the coefficients a and b? For this part, you might want to use the method called polyfit in NumPy with deg parameter equal to 1.

import math
import numpy as np
maxOutDeg = 0
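for NI in G.Nodes():
    if NI.GetOutDeg() > maxOutDeg:
        maxOutDeg = NI.GetOutDeg()

log10x = []
y = []
# Note: range(1, maxOutDeg) stops one short of the maximum out-degree.
for deg in range(1, maxOutDeg):
    pointNo = G.CntOutDegNodes(deg)  # number of nodes with out-degree == deg
    if pointNo != 0:
        log10x.append(math.log10(int(deg)))
        # y holds raw counts, so the fit below is y = a * log10(x) + b;
        # append math.log10(pointNo) instead to fit log10(y) as the question asks.
        y.append(pointNo)
print(np.polyfit(log10x, y, deg=1))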


[-164.99965984  355.24262157]
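For comparison, a fit of log10 y against log10 x can be sketched with the closed-form least-squares formulas from the standard library alone. The function name `fit_loglog` and the synthetic data are hypothetical; the data follow y = 100 · x^(-2) exactly, so the expected coefficients are a = -2 and b = 2.

```python
import math


def fit_loglog(points):
    """Least-squares a, b for log10(y) = a * log10(x) + b; needs x > 0 and y > 0."""
    log_x = [math.log10(x) for x, _ in points]
    log_y = [math.log10(y) for _, y in points]
    n = len(points)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    a = s_xy / s_xx          # slope of the regression line
    b = mean_y - a * mean_x  # intercept
    return a, b


# Synthetic power law y = 10**2 * x**(-2), so the fit should recover a = -2, b = 2.
points = [(x, 100.0 * x ** -2) for x in (1, 2, 4, 8, 16)]
a, b = fit_loglog(points)
print(round(a, 6), round(b, 6))
```

With NumPy, `np.polyfit(log_x, log_y, deg=1)` returns the same two coefficients in the order [a, b], since it lists the highest power first.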

3 Finding Experts on the Java Programming Language on StackOverflow [40 points]

1. The number of weakly connected components in the network.

G = snap.LoadEdgeList(snap.TNGraph, "stackoverflow-Java.txt", 0, 1)
Components = G.GetWccs()



2. The number of edges and the number of nodes in the largest weakly connected component.

MxWcc = G.GetMxWcc()
snap.PrintInfo(MxWcc, "MxWcc", "result-MxWcc.txt", False)


MxWcc: Directed
  Nodes:                    131188
  Edges:                    322486
  Zero Deg Nodes:           0
  Zero InDeg Nodes:         78365
  Zero OutDeg Nodes:        26008
  NonZero In-Out Deg Nodes: 26815
  Unique directed edges:    322486
  Unique undirected edges:  322371
  Self Edges:               15035
  BiDir Edges:              15265
  Closed triangles:         41388
  Open triangles:           51596519
  Frac. of closed triads:   0.000802
  Connected component size: 1.000000
  Strong conn. comp. size:  0.032953
  Approx. full diameter:    12
  90% effective diameter:   5.527031


322486 131188
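What the two weakly-connected-component questions compute can be illustrated without SNAP. The following is a minimal union-find sketch, assuming the graph is given as (src, dst) pairs and that every node appears in at least one edge; `wcc_summary` and the toy edges are hypothetical names used only for illustration.

```python
from collections import Counter


def wcc_summary(edges):
    """Return (number of WCCs, nodes in the largest WCC, edges in the largest WCC)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Edge direction is ignored: an edge in either direction joins two components.
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    nodes_per_root = Counter(find(n) for n in list(parent))
    edges_per_root = Counter(find(u) for u, _ in edges)
    largest_root, largest_nodes = nodes_per_root.most_common(1)[0]
    return len(nodes_per_root), largest_nodes, edges_per_root[largest_root]


# Hypothetical toy graph with components {1, 2, 3}, {4, 5}, and {6} (a self-loop).
toy_edges = [(1, 2), (3, 2), (4, 5), (6, 6)]
num_wccs, largest_nodes, largest_edges = wcc_summary(toy_edges)
print(num_wccs, largest_nodes, largest_edges)
```

Isolated nodes that never appear in an edge list are invisible to this sketch, which is one reason to rely on the graph library for the real data set.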

3. IDs of the top 3 most central nodes in the network by PageRank scores.

PRankH = G.GetPageRank()
top1, top2, top3 = [0, 0], [0, 0], [0, 0]
for item in PRankH:
    if PRankH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, PRankH[item]]
    elif PRankH[item] > top2[1]:
        top3 = top2
        top2 = [item, PRankH[item]]
    elif PRankH[item] > top3[1]:
        top3 = [item, PRankH[item]]
print(top1[0], top2[0], top3[0])


992484 135152 22656
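The manual top1/top2/top3 bookkeeping can be written more compactly with `heapq.nlargest`. This is a sketch on a plain dictionary of hypothetical scores; `top_k_ids` is an illustrative name, and applying it to a SNAP hash table would assume that table supports iteration and `[]` lookup the way the loop above uses it.

```python
import heapq


def top_k_ids(scores, k=3):
    """Return the k keys with the highest scores, best first."""
    return heapq.nlargest(k, scores, key=lambda node_id: scores[node_id])


# Hypothetical scores keyed by node ID.
scores = {10: 0.05, 20: 0.40, 30: 0.15, 40: 0.30, 50: 0.10}
top_ids = top_k_ids(scores)
print(top_ids)
```

The same helper works for any score table, so it can rank PageRank, hub, or authority scores without duplicating the comparison logic.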

4. IDs of the top 3 hubs and top 3 authorities in the network by HITS scores.

NIdHubH, NIdAuthH = G.GetHits()
top1, top2, top3 = [0, 0], [0, 0], [0, 0]
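# Caution: the comparisons below read PRankH (PageRank), not NIdHubH, so this
# loop reproduces the PageRank ranking; use NIdHubH[item] to rank by hub score.
for item in NIdHubH:
    if PRankH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, PRankH[item]]
    elif PRankH[item] > top2[1]:
        top3 = top2
        top2 = [item, PRankH[item]]
    elif PRankH[item] > top3[1]:
        top3 = [item, PRankH[item]]
print(top1[0], top2[0], top3[0])

top1, top2, top3 = [0, 0], [0, 0], [0, 0]
# Same caution for authorities: use NIdAuthH[item] to rank by authority score.
for item in NIdAuthH:
    if PRankH[item] > top1[1]:
        top3 = top2
        top2 = top1
        top1 = [item, PRankH[item]]
    elif PRankH[item] > top2[1]:
        top3 = top2
        top2 = [item, PRankH[item]]
    elif PRankH[item] > top3[1]:
        top3 = [item, PRankH[item]]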
print(top1[0], top2[0], top3[0])


992484 135152 22656
992484 135152 22656
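Hub and authority scores generally rank nodes differently from each other, which a small power-iteration sketch of HITS makes visible. The function name `hits`, the fixed iteration count, and the L2 normalization are assumptions chosen for illustration; SNAP's own implementation may differ in these details.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)

    def normalized(scores):
        norm = math.sqrt(sum(s * s for s in scores.values()))
        return {n: s / norm for n, s in scores.items()}

    for _ in range(iterations):
        # Authority of v: sum of the hub scores of nodes pointing to v.
        auth = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            auth[v] += hub[u]
        auth = normalized(auth)

        # Hub score of u: sum of the authority scores of nodes u points to.
        hub = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            hub[u] += auth[v]
        hub = normalized(hub)

    return hub, auth


# Hypothetical toy graph: nodes 1 and 2 point to 3 and 4; node 5 points only to 4.
toy_edges = [(1, 3), (1, 4), (2, 3), (2, 4), (5, 4)]
hub, auth = hits(toy_edges)
print("best authority:", max(auth, key=auth.get))
print("hub scores:", {n: round(s, 3) for n, s in sorted(hub.items())})
```

In this toy graph node 4 is the strongest authority, while nodes 1 and 2 are the strongest hubs, so the two rankings do not coincide.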

