Experiment 2 – Consensus Algorithms

Purpose

Apply consensus algorithms to the duplicate records identified in Experiment 1 in order to establish a canonical identity for each supplier through normalization, similarity scoring, and consensus determination.

Input Data

Duplicates carried over from Experiment 1 (CSV data loading and duplicate detection).

Algorithms Applied

  • Normalization: convert names to lowercase, remove punctuation and stop words, and expand abbreviations.
  • Similarity comparison: Levenshtein distance and Jaro-Winkler similarity, with pairs treated as matches when their similarity exceeds 80%.
  • Consensus determination: frequency-based voting, hierarchical clustering, and synthesis of a merged record for information completeness.
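
The three steps above can be illustrated with a minimal Python sketch. It is not the experiment's actual implementation: the stop-word list, the abbreviation map, and the function names (normalize, levenshtein, similarity, consensus_name) are hypothetical, chosen only for illustration. The similarity score here is a Levenshtein distance normalized to the range 0 to 1, standing in for the combination of Levenshtein and Jaro-Winkler described above, and only frequency-based voting is shown; hierarchical clustering and record synthesis are left out.

```python
import string
from collections import Counter

# Hypothetical lists: the source does not give the real stop words or abbreviations.
STOP_WORDS = {"the", "and", "of"}
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated",
                 "ltd": "limited", "co": "company"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop stop words."""
    text = name.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tok for tok in tokens if tok not in STOP_WORDS)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-wise DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def consensus_name(variants, threshold=0.80):
    """Frequency-based voting: pick the most common normalized form.

    Returns the winning form and the share of variants whose similarity
    to it exceeds the threshold (assumed here to be the 80% cut-off).
    """
    if not variants:
        raise ValueError("at least one variant is required")
    normalized = [normalize(v) for v in variants]
    winner, _ = Counter(normalized).most_common(1)[0]
    supporters = [n for n in normalized if similarity(n, winner) > threshold]
    return winner, len(supporters) / len(normalized)


if __name__ == "__main__":
    group = ["ACME Corp.", "Acme Corporation", "acme corp", "Acme Corportion"]
    print(consensus_name(group))
```

In this sketch the misspelled variant does not change the vote, but it still counts as a supporter of the winning form because its similarity to that form is above the threshold. If the share of supporters is low, the group probably mixes more than one supplier and would be a candidate for the clustering step.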

Run Consensus Analysis