Navigating the Financial Labyrinth
A 19-node agentic pipeline that turns seven disconnected data sources into IRS-ready tax schedules, a retirement portfolio dashboard, and filled PDF forms.
The Challenge
Every year, seven disconnected financial systems speak different languages. Schwab brokerage data arrives as CSV exports. Retirement accounts pipe through Plaid. Crypto lives in Coinbase. YNAB manages three separate budgets. Buildium rental management spits out PDFs. Tax documents scatter across portals. Manual configuration lives in scattered spreadsheets.
The old process: one human, spreadsheets, phone calls, and weeks of frustration. Tax season was the breaking point. A recent accountant switch triggered a 20-person email thread. Deductible expenses were missed. Categories were inconsistent. The question became urgent: what if personal finance ran through enterprise-grade pipeline architecture?
The answer: a 19-node LangGraph orchestrator, seven source connectors, four validation layers, dbt transformation with 13 models and 58 tests, and four MCP servers that answer financial questions conversationally. From chaos to automation. From weeks to hours.
The Architecture
The Maker-Checker Pattern
Every calculation is deterministic Python first. Then a Gemini node validates the result independently. If they agree, the pipeline proceeds. If they disagree, the pipeline halts and flags the discrepancy for human review. LLMs validate; they do not calculate. Financial data demands accuracy you can audit and understand, and this pattern delivers both.
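A minimal sketch of the pattern. The function and field names here are illustrative, not the system's actual code, and the "checker" is a second deterministic pass standing in for the Gemini validation call so the example runs without an API key:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    agrees: bool
    maker_value: float
    checker_value: float

def maker_total_dividends(rows: list[dict]) -> float:
    """Maker: deterministic calculation over source rows."""
    return round(sum(r["amount"] for r in rows if r["type"] == "dividend"), 2)

def checker_total_dividends(rows: list[dict]) -> float:
    """Checker: stands in for the independent LLM validation node;
    in the real pipeline this would be a Gemini call, not Python."""
    return round(sum(r["amount"] for r in rows if r.get("type") == "dividend"), 2)

def maker_checker(rows: list[dict], tolerance: float = 0.01) -> CheckResult:
    maker = maker_total_dividends(rows)
    checker = checker_total_dividends(rows)
    agrees = abs(maker - checker) <= tolerance
    if not agrees:
        # Halt here and flag for human review instead of proceeding.
        print(f"DISCREPANCY: maker={maker} checker={checker}")
    return CheckResult(agrees, maker, checker)

rows = [
    {"type": "dividend", "amount": 120.50},
    {"type": "interest", "amount": 33.10},
    {"type": "dividend", "amount": 47.25},
]
result = maker_checker(rows)
```

The key design choice: disagreement is a hard stop, not a warning, so nothing unverified reaches the tax schedules.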
The Conversational Layer
The pipeline operates in two modes. Batch mode runs nightly: ingest, transform, validate, output. But financial questions do not always wait for batch windows. Four MCP servers expose the validated data as a conversational interface.
| MCP Server | Example Query |
|---|---|
| finance-portfolio | "What is my current allocation by bucket?" |
| finance-rental | "Calculate this month's P&L for property number 2." |
| finance-tax | "What is my estimated tax liability for the quarter?" |
| finance-ynab | "What did I spend on property insurance last quarter?" |
Each server queries the validated mart layer, so results are current, auditable, and rooted in the deterministic pipeline. Batch automation and conversational interaction together make the financial data both reliable and accessible.
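The kind of read-only query a finance-portfolio tool might expose can be sketched as follows. The table name and columns are hypothetical, and `sqlite3` stands in for DuckDB here so the sketch runs with no extra dependencies:

```python
import sqlite3

# Hypothetical mart table; sqlite3 substitutes for the DuckDB warehouse.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mart_portfolio (bucket TEXT, market_value REAL)")
con.executemany(
    "INSERT INTO mart_portfolio VALUES (?, ?)",
    [("equities", 60000.0), ("bonds", 25000.0), ("cash", 15000.0)],
)

def allocation_by_bucket(conn) -> dict[str, float]:
    """Answer 'What is my current allocation by bucket?' as a
    percentage breakdown computed from the validated mart layer."""
    total = conn.execute("SELECT SUM(market_value) FROM mart_portfolio").fetchone()[0]
    rows = conn.execute(
        "SELECT bucket, SUM(market_value) FROM mart_portfolio GROUP BY bucket"
    ).fetchall()
    return {bucket: round(100 * value / total, 1) for bucket, value in rows}

alloc = allocation_by_bucket(con)
```

Because the tool only reads from the mart layer, a conversational answer can never drift from what the nightly batch run validated.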
The Stack
| Layer | Technology | Role |
|---|---|---|
| Orchestration | LangGraph | 19-node stateful graph with conditional routing |
| Validation | Gemini | 7 maker-checker nodes including IRS PDF checker |
| Transformation | dbt | 13 models, 58 tests, staging/intermediate/mart layers |
| Storage | DuckDB | Local analytical warehouse, zero cloud dependency |
| Source APIs | Python | Schwab, YNAB, Buildium, Plaid, Coinbase integrations |
| Conversational | MCP Servers | 4 domain-specific servers for interactive queries |
| Tax Output | Python / PDF | IRS form filler producing Schedule B, C, D, E |
| Portfolio | Google Sheets | Multi-tab dashboard with holdings, buckets, rebalancing |
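The testing discipline behind the "58 tests" line works like dbt's generic tests: declarative checks such as `not_null` and `unique` that return failing rows. A simplified Python rendering of those two checks, with invented sample data:

```python
def not_null_failures(rows: list[dict], column: str) -> list[dict]:
    """dbt-style not_null test: return rows missing a value in `column`."""
    return [r for r in rows if r.get(column) is None]

def unique_failures(rows: list[dict], column: str) -> list[dict]:
    """dbt-style unique test: return rows whose `column` value repeats."""
    seen, dupes = set(), []
    for r in rows:
        value = r[column]
        if value in seen:
            dupes.append(r)
        seen.add(value)
    return dupes

# Hypothetical staging rows with one null category and one duplicate id.
staging_rows = [
    {"txn_id": "t1", "category": "rent"},
    {"txn_id": "t2", "category": None},
    {"txn_id": "t2", "category": "insurance"},
]
null_fails = not_null_failures(staging_rows, "category")
dupe_fails = unique_failures(staging_rows, "txn_id")
```

In dbt these checks are declared in YAML against each model's columns; any failing row fails the build before bad data reaches the mart layer.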
The Result
The first full pipeline run surfaced deductible expenses invisible to the manual process. Missed depreciation schedules. Overlooked rental operating costs. A categorization inconsistency that looked small but compounded. The findings led to an amended return and an additional refund.
Now, the entire tax year processes in hours instead of weeks. Every number is tested. Every categorization is validated. Every edge case is documented. The system runs every night. Results populate within minutes.
But the pipeline is more than tax prep. It is a foundation. Portfolio allocation decisions are based on validated cost basis. Rental property decisions are grounded in accurate cash flow analysis. Retirement planning reflects real dividend income and options returns. The data foundation became reliable enough to make decisions on.
What This Demonstrates
Data Engineering
Connectors, transforms, mart layers, testing discipline. The unglamorous foundation that makes everything else possible.
AI/ML Engineering
Maker-checker validation, agentic orchestration, conversational interfaces. AI adds judgment, not determinism.
Domain Expertise
IRS rules, cost basis calculation, bucket strategy logic. The domain rules that transform raw data into insight.
Built with AI: Claude was the development partner for this entire system -- from initial design through implementation and iteration. Every component, test, and integration emerged from human-AI collaboration.
Need a Guide Through Your Data Labyrinth?
This pipeline was built with the same architecture patterns, validation rigor, and AI-augmented workflows we bring to every engagement. If your data lives in too many places and your decisions run on hope -- we should talk.
Start a Conversation