Case Study: Social Network Analysis

This case study demonstrates graph algorithms on a social network, identifying influential users and bridges between communities.

Overview

We'll analyze a small social network with 10 people across three communities:

  • Tech Community: Alice, Bob, Charlie, Diana (densely connected)
  • Art Community: Eve, Frank, Grace (moderately connected)
  • Isolated Group: Henry, Iris, Jack (small triangle)

Two critical bridges connect these communities:

  • Diana ↔ Eve (Tech ↔ Art)
  • Grace ↔ Henry (Art ↔ Isolated)

Running the Example

cargo run --example graph_social_network

Expected output: Social network analysis with degree centrality, PageRank, and betweenness centrality rankings.

Network Construction

Building the Graph

use aprender::graph::Graph;

let edges = vec![
    // Tech community (densely connected)
    (0, 1), // Alice - Bob
    (1, 2), // Bob - Charlie
    (2, 3), // Charlie - Diana
    (0, 2), // Alice - Charlie (shortcut)
    (1, 3), // Bob - Diana (shortcut)

    // Art community (moderately connected)
    (4, 5), // Eve - Frank
    (5, 6), // Frank - Grace
    (4, 6), // Eve - Grace (shortcut)

    // Bridge between tech and art
    (3, 4), // Diana - Eve (BRIDGE)

    // Isolated group
    (7, 8), // Henry - Iris
    (8, 9), // Iris - Jack
    (7, 9), // Henry - Jack (triangle)

    // Bridge to isolated group
    (6, 7), // Grace - Henry (BRIDGE)
];

let graph = Graph::from_edges(&edges, false);

Network Properties

  • Nodes: 10 people
  • Edges: 13 friendships (undirected)
  • Average degree: 2.6 connections per person
  • Structure: Three communities with two bridge nodes

Analysis 1: Degree Centrality

Results

Top 5 Most Connected People:
  1. Charlie - 0.333 (normalized degree centrality)
  2. Diana - 0.333
  3. Eve - 0.333
  4. Bob - 0.333
  5. Henry - 0.333

Interpretation

Degree centrality measures direct friendships. Multiple people tie at 0.333, meaning they each have 3 friends out of 9 possible connections (3/9 = 0.333).

Key Insights:

  • Tech community members (Bob, Charlie, Diana) are well-connected within their group
  • Eve connects the Tech and Art communities (bridge role)
  • Henry connects the Art community to the Isolated group (another bridge)

Limitation: Degree centrality only counts direct friends, not the importance of those friends. For example, being friends with influential people doesn't increase your degree score.

Analysis 2: PageRank

Results

Top 5 Most Influential People:
  1. Henry - 0.1196 (PageRank score)
  2. Grace - 0.1141
  3. Eve - 0.1117
  4. Bob - 0.1097
  5. Charlie - 0.1097

Interpretation

PageRank considers both quantity and quality of connections. Henry ranks highest despite having the same degree as others because he's in a tightly connected triangle (Henry-Iris-Jack).

Key Insights:

  • Henry's triangle: The Isolated group (Henry, Iris, Jack) forms a complete subgraph where everyone knows everyone. This tight clustering boosts PageRank.
  • Grace and Eve: Bridge nodes gain influence from connecting different communities
  • Bob and Charlie: Well-connected within Tech community, but not bridges

Why Henry > Eve?

  • Henry: In a triangle (3 edges among 3 nodes = maximum density)
  • Eve: Connects two communities but not in a triangle
  • PageRank rewards tight clustering

Real-world analogy: Henry is like a local influencer in a close-knit community, while Eve is like a connector between distant groups.

Analysis 3: Betweenness Centrality

Results

Top 5 Bridge People:
  1. Eve - 24.50 (betweenness centrality)
  2. Diana - 22.50
  3. Grace - 22.50
  4. Henry - 18.50
  5. Bob - 8.00

Interpretation

Betweenness centrality measures how often a node lies on shortest paths between other nodes. High scores indicate critical bridges.

Key Insights:

  • Eve (24.50): Connects Tech (4 people) ↔ Art (3 people). Most paths between these communities pass through Eve.
  • Diana (22.50): The Tech side of the Tech-Art bridge. Paths from Alice/Bob/Charlie to Art community pass through Diana.
  • Grace (22.50): Connects Art ↔ Isolated group. Critical for reaching Henry/Iris/Jack.
  • Henry (18.50): The Isolated side of the Art-Isolated bridge.

Network fragmentation:

  • Removing Eve: Tech and Art communities disconnect
  • Removing Grace: Art and Isolated group disconnect
  • Removing both: Network splits into 3 disconnected components

Real-world impact:

  • Social networks: Eve and Grace are "connectors" who introduce people across groups
  • Organizations: These individuals are critical for cross-team communication
  • Supply chains: Removing these nodes disrupts flow

Comparing All Three Metrics

PersonDegreePageRankBetweennessRole
Eve0.3330.111724.50Critical bridge (Tech ↔ Art)
Diana0.3330.107622.50Bridge (Tech side)
Grace0.3330.114122.50Critical bridge (Art ↔ Isolated)
Henry0.3330.119618.50Triangle leader, bridge (Isolated side)
Bob0.3330.10978.00Well-connected (Tech)
Charlie0.3330.10976.00Well-connected (Tech)

Key Findings

  1. Most influential overall: Henry (highest PageRank due to triangle)
  2. Most critical bridges: Eve and Grace (highest betweenness)
  3. Well-connected locally: Bob and Charlie (high degree, low betweenness)

Actionable Insights

For team building:

  • Encourage Eve and Grace to mentor others (they connect communities)
  • Recognize Henry's leadership in the Isolated group
  • Bob and Charlie are strong within Tech but need cross-team exposure

For risk management:

  • Eve and Grace are single points of failure for communication
  • Add redundant connections (e.g., direct link between Tech and Isolated)
  • Cross-train people outside their primary communities

Performance Notes

CSR Representation Benefits

The graph uses Compressed Sparse Row (CSR) format:

  • Memory: 50-70% reduction vs HashMap
  • Cache misses: 3-5x fewer (sequential access)
  • Construction: O(n + m) time

For this 10-node, 13-edge graph, the difference is minimal. Benefits appear at scale:

  • 10K nodes, 50K edges: HashMap ~240 MB, CSR ~84 MB
  • 1M nodes, 5M edges: HashMap runs out of memory, CSR fits in 168 MB

PageRank Numerical Stability

Aprender uses Kahan compensated summation to prevent floating-point drift:

let mut sum = 0.0;
let mut c = 0.0;  // Compensation term

for value in values {
    let y = value - c;
    let t = sum + y;
    c = (t - sum) - y;  // Recover low-order bits
    sum = t;
}

Result: Σ PR(v) = 1.0 within 1e-10 precision.

Without Kahan summation:

  • 10 nodes: error ~1e-9 (acceptable)
  • 100K nodes: error ~1e-5 (problematic)
  • 1M nodes: error ~1e-4 (PageRank scores invalid)

Parallel Betweenness

Betweenness computation uses Rayon for parallelization:

let partial_scores: Vec<Vec<f64>> = (0..n_nodes)
    .into_par_iter()  // Parallel iterator
    .map(|source| brandes_bfs_from_source(source))
    .collect();

Speedup (Intel i7-8700K, 6 cores):

  • Serial: 450 ms (10K nodes)
  • Parallel: 95 ms (10K nodes)
  • 4.7x speedup

The outer loop is embarrassingly parallel (no synchronization needed).

Real-World Applications

Social Media Influencer Detection

Problem: Identify influencers in a Twitter network.

Approach:

  1. Build graph from follower relationships
  2. PageRank: Find overall influence (considers follower quality)
  3. Betweenness: Find connectors between communities (e.g., tech ↔ fashion)
  4. Degree: Find accounts with many followers (raw popularity)

Result: Target influential accounts for marketing campaigns.

Organizational Network Analysis

Problem: Improve cross-team communication in a company.

Approach:

  1. Build graph from email/Slack interactions
  2. Betweenness: Identify critical connectors
  3. PageRank: Find informal leaders (high influence)
  4. Degree: Find highly collaborative individuals

Result: Promote connectors, add redundancy, prevent information silos.

Supply Chain Resilience

Problem: Identify single points of failure in a logistics network.

Approach:

  1. Build graph from supplier-manufacturer relationships
  2. Betweenness: Find critical warehouses/suppliers
  3. Simulate removal (betweenness = 0 → fragmentation)
  4. Add redundancy to high-betweenness nodes

Result: More resilient supply chain, reduced disruption risk.

Toyota Way Principles in Action

Muda (Waste Elimination)

CSR representation eliminates HashMap pointer overhead:

  • 50-70% memory reduction
  • 3-5x fewer cache misses
  • No performance cost (same Big-O complexity)

Poka-Yoke (Error Prevention)

Kahan summation prevents numerical drift in PageRank:

  • Naive summation: O(n·ε) error accumulation
  • Kahan: maintains Σ PR(v) = 1.0 within 1e-10

Result: Correct PageRank scores even on large graphs (1M+ nodes).

Heijunka (Load Balancing)

Rayon work-stealing balances BFS tasks across cores:

  • Nodes with more edges take longer
  • Work-stealing prevents idle threads
  • Near-linear speedup on multi-core CPUs

Exercises

  1. Add a new edge: Connect Alice (0) to Eve (4). How does this change:

    • Diana's betweenness? (should decrease)
    • Alice's betweenness? (should increase)
    • PageRank distribution?
  2. Remove a bridge: Delete the Diana-Eve edge (3, 4). What happens to:

    • Betweenness scores? (Diana/Eve should drop)
    • Graph connectivity? (Tech and Art communities disconnect)
  3. Compare directed vs undirected: Change is_directed to true. How does PageRank change?

    • Directed: influence flows one way
    • Undirected: bidirectional influence
  4. Larger network: Generate a random graph with 100 nodes, 500 edges. Measure:

    • Construction time
    • PageRank convergence iterations
    • Betweenness speedup (serial vs parallel)

Further Reading

  • Graph Algorithms: Newman, M. (2018). "Networks" (comprehensive textbook)
  • PageRank: Page, L., et al. (1999). "The PageRank Citation Ranking"
  • Betweenness: Brandes, U. (2001). "A Faster Algorithm for Betweenness Centrality"
  • Social Network Analysis: Wasserman, S., Faust, K. (1994). "Social Network Analysis"

Summary

  • Degree centrality: Local popularity (direct friends)
  • PageRank: Global influence (considers friend quality)
  • Betweenness: Bridge role (connects communities)
  • Key insight: Different metrics reveal different roles in the network
  • Performance: CSR format, Kahan summation, parallel Brandes enable scalable analysis
  • Applications: Social media, organizations, supply chains

Run the example yourself:

cargo run --example graph_social_network