graph-ml/ Pathfinding & Similarity
Last Updated: October 20, 2018

Navigating connections and comparing nodes based on topology.

Pathfinding & Similarity

These algorithms focus on the relationships between specific nodes, either by finding the best path between them or by calculating how "similar" they are.

1. Pathfinding Algorithms

  • Shortest Path (SSSP): Finds the quickest route from a source node to all other nodes.
    • Unweighted: Counts the number of hops.
    • Weighted: Considers costs like distance, time, or price (Dijkstra).
  • Minimum Spanning Tree (MST): Finds the subset of edges that connects all nodes with the minimum total weight.
    • Use Case: Designing cost-effective electrical grids or telecommunications networks.
  • A* Search: An optimized version of SSSP that uses heuristics to reach a specific goal faster.

2. Similarity Algorithms

Similarity algorithms assign a score to a pair of nodes based on how much their neighborhoods overlap.

  • Cosine Similarity: Measures the cosine of the angle between two node attribute vectors.
  • Jaccard Similarity: Compares the number of shared neighbors vs. the total number of unique neighbors.
    • Use Case: "People also bought" recommendations or duplicate account detection.

Topological Link Prediction determines the likelihood of a future relationship between two nodes based on the existing graph structure.

  • Common Neighbors: The more friends two people share, the more likely they are to become friends.
  • Adamic-Adar: Similar to common neighbors but gives higher weight to shared neighbors that have few connections (rare connections are more meaningful).

4. K-Nearest Neighbors (KNN)

KNN classifies a node based on the properties of its k most similar neighbors.

  • Use Case: Predicting a customer's churn risk based on the behavior of their most "similar" peers.

[!IMPORTANT] SSSP Complexity: Calculating the shortest path from every node to every other node is computationally expensive ($O(V^3)$). For large graphs, focus on Single-Source ($O(V+E)$) or localized searches.