Embracing Iteration: Building Robust LLM Systems in Production
Introduction
When tasked with extracting information from rate confirmation PDFs, our team faced a common challenge in modern ML engineering: balancing perfect solutions against practical implementations. This experience taught me valuable lessons about iterative development and the importance of building systems that can learn and improve in production.
The Challenge: PDF Processing Complexity
Our goal was seemingly straightforward: extract specific information from rate confirmation PDFs. However, we quickly discovered that the solution would be anything but simple. PDFs came in various formats, and no single approach, whether using vision models, OCR, or markdown converters, worked consistently across all documents.
We had two potential paths forward:
- Conduct extensive pre-production testing to find the optimal combination of tools for each PDF type
- Build a system that could learn and adapt in production
The Fallback Architecture
Instead of trying to perfect our system before deployment, we developed a dynamic fallback mechanism. Here's how it works:
- Each extraction attempt is scored based on the quality of the extracted sections
- Results and scores are persisted to build a knowledge base
- Different combinations of tools (vision models, OCR, markdown converters) are tried in sequence
- The system learns which combinations work best for different PDF types and sections
- The fallback sequence automatically adjusts based on historical performance
While this approach is computationally expensive in the beginning, it becomes more efficient over time as the system learns the optimal tool combinations for different document types.
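In code, the core loop might look something like the following minimal sketch. It assumes hypothetical extractor callables, a simple JSON file as the persisted knowledge base, and a fixed quality threshold; the real system's scoring, storage, and tool set will differ.

```python
import json
from pathlib import Path

# Hypothetical extractor callables: each takes a PDF path and returns
# (sections, score), where the score rates the quality of the extraction.
EXTRACTORS = {
    "vision_model": lambda pdf: ({}, 0.0),      # placeholder implementations
    "ocr": lambda pdf: ({}, 0.0),
    "markdown_converter": lambda pdf: ({}, 0.0),
}

HISTORY_FILE = Path("extraction_history.json")  # persisted knowledge base


def load_history() -> dict:
    return json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else {}


def ordered_extractors(pdf_type: str, history: dict) -> list[str]:
    """Order extractors by their historical average score for this PDF type."""
    scores = history.get(pdf_type, {})

    def avg(name: str) -> float:
        past = scores.get(name, [])
        return sum(past) / len(past) if past else 0.0

    return sorted(EXTRACTORS, key=avg, reverse=True)


def extract_with_fallback(pdf_path: str, pdf_type: str, min_score: float = 0.8) -> dict:
    """Try extractors in learned order, falling back until one scores well enough."""
    history = load_history()
    best_sections, best_score = {}, 0.0
    for name in ordered_extractors(pdf_type, history):
        sections, score = EXTRACTORS[name](pdf_path)
        # Persist every attempt so the ordering keeps improving over time.
        history.setdefault(pdf_type, {}).setdefault(name, []).append(score)
        if score > best_score:
            best_sections, best_score = sections, score
        if score >= min_score:
            break  # good enough; no need to keep falling back
    HISTORY_FILE.write_text(json.dumps(history, indent=2))
    return best_sections
```

The ordering step is where the "learning" happens: every attempt feeds the score history, so extractors that have worked well for a given PDF type get tried first on the next document of that type.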
Lessons in Production ML Engineering
As a younger ML engineer, I would have been hesitant to deploy a system without handling every edge case. However, experience has taught me several valuable lessons:
First, perfect solutions often come at the cost of delayed deployment and missed opportunities for real-world learning. Instead, it's crucial to:
- Deploy quickly with proper safeguards
- Implement comprehensive logging
- Monitor system performance
- Iterate based on production data
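As a concrete illustration of the logging and monitoring points above, here is a minimal sketch using Python's standard logging module; the helper and field names are illustrative, not our actual instrumentation.

```python
import logging

logger = logging.getLogger("pdf_extraction")
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)


def log_attempt(pdf_id: str, extractor: str, score: float, duration_s: float) -> None:
    """Record enough context per attempt to debug failures and spot drift later."""
    logger.info(
        "extraction_attempt pdf_id=%s extractor=%s score=%.2f duration_s=%.2f",
        pdf_id, extractor, score, duration_s,
    )
```

Structured, per-attempt records like these are what make the later iteration possible: without them, you cannot tell which tool combinations are failing or why.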
Second, when working with LLMs, validation becomes even more critical: model outputs can look plausible while being wrong, so a deep understanding of the problem space is what lets you verify results even as the system evolves.
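One common way to enforce this is to validate every extraction against an explicit schema before it reaches downstream systems. The sketch below uses Pydantic; the field names are illustrative assumptions, not our actual schema.

```python
from datetime import date
from pydantic import BaseModel, ValidationError, field_validator


class RateConfirmation(BaseModel):
    """Fields the extraction is expected to produce (illustrative names)."""
    carrier_name: str
    rate_usd: float
    pickup_date: date

    @field_validator("rate_usd")
    @classmethod
    def rate_must_be_positive(cls, v: float) -> float:
        if v <= 0:
            raise ValueError("rate must be positive")
        return v


def validate_extraction(raw: dict) -> RateConfirmation | None:
    """Reject LLM output that doesn't match the schema instead of passing it on."""
    try:
        return RateConfirmation(**raw)
    except ValidationError as exc:
        # A failed parse is a signal to log the attempt and trigger the next fallback.
        print(f"validation failed: {exc}")
        return None
```

Schema checks like this catch the plausible-but-wrong outputs that simple string inspection misses, and a validation failure can feed directly back into the fallback sequence.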
The Power of Iterative Development
The key insight from this project is that it's okay to start with an inefficient solution if it's coupled with:
- Robust fallback mechanisms
- Comprehensive logging
- Clear understanding of limitations
- Ability to learn from production data
This approach allows us to deliver value to customers immediately while continuously improving our system based on real-world usage patterns.
Conclusion
The journey from a complex PDF processing challenge to a learning, adapting system highlights a crucial truth in modern ML engineering: embracing iteration and building systems that can learn from production data is often more valuable than striving for perfect solutions upfront.
For other ML engineers, especially those early in their careers, remember that it's okay to deploy systems that aren't perfectly optimized. What's important is having the right monitoring, fallbacks, and improvement mechanisms in place. This approach not only delivers value faster but often results in more robust and practical solutions.