Project Summary – Consensus Supplier Data Validation

1. Project Overview

Project Title: Consensus Method for Supplier Data Validation in the Philippines

Core R&D Activity: Consensus-based validation of TIN, supplier name, and address from receipts.

Business Context: Unlike markets with public company registries, the Philippines lacks a centralized database for verifying supplier details. The details printed on receipts are often inconsistent (misspellings, missing fields, conflicting addresses), which creates accounting and compliance problems. Establishing reliable supplier records therefore requires reconciling many imperfect observations of the same supplier through consensus techniques.

2. Hypotheses & Technical Uncertainty

  • Can consensus validation resolve supplier identities more effectively than single-source verification?
  • Will TIN normalization improve grouping accuracy of supplier records?
  • Can clustering algorithms handle abbreviations and spelling variations in supplier names?
  • Will completeness scoring reconstruct missing or partial addresses reliably?
  • Can queue management and scaling methods support real-time processing at high transaction volumes?
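The first hypothesis can be made concrete with a small sketch of field-level consensus. This is an illustration under stated assumptions, not the project's implementation: the function name `consensus`, the simple majority vote, and the `min_agreement` threshold of 0.6 are all hypothetical choices.

```python
from collections import Counter

def consensus(values, min_agreement=0.6):
    """Pick the most frequent non-empty value among several receipt observations.

    Returns (value, agreement). The value is None when no observation exists
    or when the winner's share falls below min_agreement (an assumed threshold).
    """
    observed = [v for v in values if v]  # ignore missing fields
    if not observed:
        return None, 0.0
    value, count = Counter(observed).most_common(1)[0]
    agreement = count / len(observed)
    return (value if agreement >= min_agreement else None), agreement

# Three receipts agree on one spelling, one disagrees, one has no name.
winner, share = consensus(["ABC Trading", "ABC Trading", "ABC Tradng", None])
```

A single-source check would accept whichever value it saw first; the vote only accepts a value once enough independent receipts agree on it.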

3. Experimental Activities

Experiment 1 – TIN Normalization

Corrected ~95% of entries using standardization and checksum validation.
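A minimal sketch of the standardization step follows. It assumes a TIN made of a 9-digit base plus a 3-digit branch code, and it assumes that a missing branch code can be defaulted to "000"; both are illustrative assumptions. The checksum step is deliberately left out because this summary does not specify the algorithm. The function name `normalize_tin` is hypothetical.

```python
import re
from typing import Optional

def normalize_tin(raw: str) -> Optional[str]:
    """Return the TIN as 'XXX-XXX-XXX-BBB', or None if the digit count is unusable."""
    digits = re.sub(r"\D", "", raw)  # drop labels, spaces, dashes, and other non-digits
    if len(digits) == 9:
        digits += "000"  # assumption: default branch code when none is printed
    if len(digits) != 12:
        return None  # too few or too many digits to standardize safely
    return "-".join(digits[i:i + 3] for i in range(0, 12, 3))

print(normalize_tin("TIN: 123 456 789"))   # differently formatted inputs map to one key
print(normalize_tin("123-456-789-001"))
```

Mapping every printed variant to one canonical string gives the later grouping steps a stable key to work with.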

Experiment 2 – Name Resolution

Combined fuzzy string matching with domain-specific dictionaries (for example, abbreviation expansions) to improve clustering of supplier names.
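The combination of a dictionary and fuzzy matching can be sketched as below. The abbreviation table, the 0.85 similarity threshold, the greedy first-match clustering, and the supplier names are all hypothetical; the standard library's `difflib.SequenceMatcher` stands in for whatever matcher the project actually used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical domain dictionary: abbreviation -> expanded form.
ABBREVIATIONS = {
    "corp": "corporation",
    "inc": "incorporated",
    "co": "company",
    "phils": "philippines",
}

def canonical(name):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def cluster_names(names, threshold=0.85):
    """Greedily attach each name to the first cluster whose representative is similar enough."""
    clusters = []  # each cluster is a list; its first member is the representative
    for name in names:
        key = canonical(name)
        for cluster in clusters:
            if SequenceMatcher(None, key, canonical(cluster[0])).ratio() >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar cluster found, start a new one
    return clusters

clusters = cluster_names([
    "Mabuhay Trading Corp.",
    "MABUHAY TRADING CORPORATION",
    "Mabuhai Trading Corporation",
    "Sampaguita Hardware Inc",
])
```

Expanding abbreviations before comparing means "Corp." and "Corporation" match exactly, so the fuzzy threshold only has to absorb genuine misspellings.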

Experiment 3 – Address Completeness

Applied scoring logic to reconstruct full addresses from partial or inconsistent entries.
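One way such scoring logic could work is sketched here. The field list, the weights, and the rule that each observation's vote counts in proportion to its own completeness are assumptions made for illustration; the sample addresses are invented.

```python
from collections import Counter

# Hypothetical address fields and weights; the weights sum to 1.0.
FIELD_WEIGHTS = {
    "street": 0.3,
    "barangay": 0.2,
    "city": 0.3,
    "province": 0.1,
    "postal_code": 0.1,
}

def completeness(addr):
    """Score an address by the total weight of its non-empty fields."""
    return sum(w for f, w in FIELD_WEIGHTS.items() if addr.get(f))

def reconstruct(observations):
    """Merge partial addresses field by field, weighting votes by source completeness."""
    merged = {}
    for field in FIELD_WEIGHTS:
        votes = Counter()
        for obs in observations:
            value = (obs.get(field) or "").strip()
            if value:
                # Title-case so that casing differences do not split the vote.
                votes[value.title()] += completeness(obs)
        if votes:
            merged[field] = votes.most_common(1)[0][0]
    return merged

receipts = [
    {"street": "12 Rizal St", "city": "Quezon City"},
    {"street": "12 rizal st", "barangay": "Bagumbayan",
     "city": "Quezon City", "postal_code": "1110"},
    {"city": "Manila"},
]
merged = reconstruct(receipts)
```

Fields that no receipt supplies (here, `province`) stay absent rather than being guessed, so the merged record's completeness score shows what is still missing.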

Experiment 4 – Queue Scaling

Introduced priority queue methods that scaled consensus validation efficiently under load.
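The queueing idea can be illustrated with Python's `heapq` module. The class name `ValidationQueue` and the notion of a numeric priority per supplier (for example, the number of conflicting observations) are hypothetical; the summary does not describe the actual scheduling policy.

```python
import heapq
import itertools

class ValidationQueue:
    """Priority queue of supplier keys awaiting consensus validation."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves insertion order

    def push(self, supplier_key, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), supplier_key))

    def pop(self):
        _, _, supplier_key = heapq.heappop(self._heap)
        return supplier_key

    def __len__(self):
        return len(self._heap)

queue = ValidationQueue()
queue.push("123-456-789-000", priority=1)
queue.push("987-654-321-000", priority=5)
queue.push("555-555-555-000", priority=5)
order = [queue.pop() for _ in range(len(queue))]
```

Push and pop are both O(log n), and equal-priority items come out in arrival order, which keeps behaviour predictable as the backlog grows.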

4. Results

  • TIN normalization: Corrected ~95% of entries through standardization and checksum validation
  • Supplier names: Clustered and resolved with high precision
  • Addresses: Reconstructed to complete, standardized formats
  • Scalability: Queue management maintained processing efficiency under stress tests

5. Compliance & Future Work

Compliance: Designed in alignment with the Philippine Data Privacy Act of 2012 (DPA) and GDPR principles.

Future Work:

  • Expand to multilingual datasets (Tagalog, English, regional languages)
  • Introduce dynamic dictionaries for industry-specific supplier names
  • Extend consensus framework to global markets lacking central company databases