A large company handling huge volumes of consumer transaction data struggled to extract useful information from receipts that arrived in tens of thousands of formats. Traditional OCR and template-based systems failed to scale, creating operational bottlenecks and limiting downstream analytics. Blackstraw delivered a layout-agnostic receipt intelligence system that processes 500,000 receipts per day across more than 76,000 formats. The work also led to four patents and one peer-reviewed workshop paper.
The client operated at internet scale, processing receipts with widely varying layouts, fonts, and structures. Frequent format changes made rule-based extraction fragile and costly to maintain. As volume increased, manual intervention and template updates became unmanageable, which limited reliable automation and delayed the use of transaction data. To support growth and allow continuous processing, the client required a document intelligence system that could adapt to new receipt formats without depending on preset templates or manual rule engineering.
Graph-Based Document Decoding Architecture: Designed a layout-agnostic receipt decoding system using Graph Attention Networks (GATs) to model relationships between text elements in unstructured documents.
Vision + Text Intelligence: Combined visual cues and textual context to accurately detect and interpret text lines across highly variable receipt layouts.
Template-Free Adaptation: Eliminated dependency on predefined templates, enabling the system to dynamically adapt to previously unseen receipt formats.
High-Throughput Inference Pipelines: Built scalable inference pipelines to support continuous, high-volume receipt ingestion at internet scale.
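To make the graph-based idea above more concrete, the following is a minimal sketch of how text elements on a receipt might be linked into a graph and passed through one graph-attention step. It is not the production system: the `TextBox` type, the `y_weight` row bias, the three toy features, and the fixed attention vectors `A_SRC` and `A_DST` are all assumptions made for illustration, and the learned projection matrix of a real GAT layer is left out (treated as the identity).

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical OCR output: a text string and its normalised centre point."""
    text: str
    x: float
    y: float


def knn_edges(boxes, k=2, y_weight=5.0):
    """Link each box to its k nearest neighbours.

    Vertical distance is scaled by y_weight (an assumed heuristic) so that
    boxes on the same printed row count as closer than boxes in the same column.
    """
    edges = {}
    for i, a in enumerate(boxes):
        dists = sorted(
            (math.hypot(a.x - b.x, y_weight * (a.y - b.y)), j)
            for j, b in enumerate(boxes)
            if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


def featurize(box):
    """Toy node features: position plus the fraction of digit characters."""
    digits = sum(ch.isdigit() for ch in box.text)
    return [box.x, box.y, digits / max(len(box.text), 1)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def attention_layer(features, edges, a_src, a_dst):
    """Single-head GAT-style aggregation (learned projection omitted).

    For node i and each neighbour j (including i itself), the score is
    LeakyReLU(a_src . h_i + a_dst . h_j); a softmax over the neighbourhood
    turns scores into weights used to average the neighbour features.
    """
    outputs, all_alphas = [], []
    for i, h_i in enumerate(features):
        nbrs = [i] + edges[i]  # self-loop plus graph neighbours
        scores = [leaky_relu(dot(a_src, h_i) + dot(a_dst, features[j])) for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        outputs.append(
            [sum(al * features[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(h_i))]
        )
        all_alphas.append(alphas)
    return outputs, all_alphas


# Made-up receipt fragment: item names on the left, prices on the right.
boxes = [
    TextBox("MILK", 0.2, 0.30), TextBox("2.49", 0.8, 0.30),
    TextBox("BREAD", 0.2, 0.35), TextBox("1.99", 0.8, 0.35),
    TextBox("TOTAL", 0.2, 0.60), TextBox("4.48", 0.8, 0.60),
]
# Stand-ins for learned attention parameters (assumed values).
A_SRC = [0.5, -0.5, 1.0]
A_DST = [-0.5, 0.5, 1.0]

edges = knn_edges(boxes)
feats = [featurize(b) for b in boxes]
out, alphas = attention_layer(feats, edges, A_SRC, A_DST)
```

Because the graph is built from geometry rather than from a per-merchant template, the same code applies to any layout; in a trained model the attention weights would be learned so that, for example, a price attends to the item name on its row.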
Format-Agnostic Processing at Scale: Successfully processed 76,000+ receipt layouts without manual template creation or rule maintenance.
Massive Throughput Enablement: Enabled continuous processing of 500,000 receipts per day, supporting enterprise-scale ingestion requirements.
Defensible Intellectual Property: Resulted in 4 patents and 1 peer-reviewed workshop publication, establishing technical differentiation and research credibility.
Foundation for Global Document Automation: Positioned the client to scale receipt and document intelligence globally without proportional increases in operational complexity.
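As a rough sense of what 500,000 receipts per day means for pipeline sizing, the arithmetic can be sketched as below. Only the daily volume comes from the case study; the per-receipt latency of 1.5 seconds and the 2x headroom factor are assumed values chosen purely for illustration, and `required_workers` is a hypothetical helper.

```python
import math

RECEIPTS_PER_DAY = 500_000          # figure stated in the case study
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400


def required_workers(receipts_per_day, seconds_per_receipt, headroom=2.0):
    """Estimate parallel workers needed to sustain the average arrival rate.

    seconds_per_receipt and headroom are assumptions, not measured values.
    """
    rate = receipts_per_day / SECONDS_PER_DAY  # receipts per second
    return math.ceil(rate * seconds_per_receipt * headroom)


avg_rate = RECEIPTS_PER_DAY / SECONDS_PER_DAY
workers = required_workers(RECEIPTS_PER_DAY, seconds_per_receipt=1.5)

print(f"average rate: {avg_rate:.2f} receipts/s")  # about 5.79 receipts/s
print(f"workers at 1.5 s per receipt with 2x headroom: {workers}")  # 18
```

The average rate is just under six receipts per second; real traffic is rarely uniform, which is why the sketch includes a headroom factor rather than sizing to the mean.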