Navigating the Financial Labyrinth

A 19-node agentic pipeline that turns seven disconnected data sources into IRS-ready tax schedules, a retirement portfolio dashboard, and filled PDF forms.

The Challenge

Every year, seven disconnected financial systems speak different languages. Schwab brokerage data arrives as CSV exports. Retirement accounts pipe through Plaid. Crypto lives in Coinbase. YNAB manages three separate budgets. Buildium rental management spits out PDFs. Tax documents scatter across portals. Manual configuration lives in scattered spreadsheets.

The old process: one human, spreadsheets, phone calls, and weeks of frustration. Tax season was the breaking point. A recent accountant switch triggered a 20-person email thread. Deductible expenses were missed. Categories were inconsistent. The question became urgent: what if personal finance ran through enterprise-grade pipeline architecture?

The answer: a 19-node LangGraph orchestrator, seven source connectors, four validation layers, dbt transformation with 13 models and 58 tests, and four MCP servers that answer financial questions conversationally. From chaos to automation. From weeks to hours.

The Architecture

Data Sources (7 connectors, raw data, schema-heterogeneous):

- Schwab API
- Plaid / John Hancock
- Coinbase API
- YNAB (3 budgets)
- Buildium CSV
- Tax Portals
- Manual Config

Transformation: dbt (13 models, 58 tests) → DuckDB analytical warehouse

Validation: 7 Gemini maker-checker nodes + IRS PDF checker

Output Targets:

- IRS PDF Forms
- Google Sheets Portfolio Dashboard
- MCP Query Servers

The Maker-Checker Pattern

1. Python Calculation
2. Gemini Validation
3. Agreement / Investigate

Every calculation is deterministic Python first. Then, a Gemini node validates the result independently. If they agree, we proceed. If they disagree, the pipeline halts and flags the discrepancy for human review. LLMs validate, they do not calculate. Financial data demands accuracy that you can audit and understand. This pattern ensures both.
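The pattern can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the checker here is a stub standing in for the real Gemini node, and `TOLERANCE`, the function names, and the Schedule B example are all invented for the sketch.

```python
# Minimal maker-checker sketch. The checker is a stub standing in for a
# Gemini node; TOLERANCE and all names here are illustrative assumptions.
from dataclasses import dataclass

TOLERANCE = 0.01  # dollars; assumed threshold for "agreement"

@dataclass
class CheckResult:
    value: float
    agreed: bool
    note: str = ""

def calculate_interest_total(rows):
    """Maker: deterministic Python sum of 1099-INT line items."""
    return round(sum(r["interest"] for r in rows), 2)

def check_independently(rows):
    """Checker: in the real pipeline an LLM node re-derives the figure
    independently; here a stub recomputes it."""
    return round(sum(r["interest"] for r in rows), 2)

def maker_checker(rows):
    maker = calculate_interest_total(rows)
    checker = check_independently(rows)
    if abs(maker - checker) <= TOLERANCE:
        return CheckResult(maker, True)
    # Disagreement: halt and flag for human review instead of guessing.
    return CheckResult(maker, False, f"maker={maker} checker={checker}")

result = maker_checker([{"interest": 412.10}, {"interest": 87.95}])
print(result.agreed, result.value)  # → True 500.05
```

The key design choice is that the checker never overwrites the maker's number: on disagreement the pipeline stops rather than picking a winner.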

The Conversational Layer

The pipeline operates in two modes. Batch mode runs nightly: ingest, transform, validate, output. But financial questions do not always wait for batch windows. Four MCP servers expose the validated data as a conversational interface.
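The batch side can be pictured as a halt-on-failure chain of stages. Everything below is a placeholder sketch under that framing, not the project's code (the real orchestration is a LangGraph graph, and the stage bodies here are invented):

```python
# Sketch of nightly batch mode: ingest -> transform -> validate -> output.
# Stage bodies are placeholders; the real system is a LangGraph graph.
def ingest(state):    state["raw"] = ["schwab.csv", "ynab.json"]; return True
def transform(state): state["mart"] = {"rows": len(state["raw"])}; return True
def validate(state):  return state["mart"]["rows"] > 0  # maker-checker step
def output(state):    state["artifacts"] = ["schedule_b.pdf"]; return True

def run_nightly():
    state = {}
    for stage in (ingest, transform, validate, output):
        if not stage(state):
            state["halted_at"] = stage.__name__  # halt; flag for review
            break
    return state

print(run_nightly()["artifacts"])  # → ['schedule_b.pdf']
```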

| MCP Server | Example Query |
| --- | --- |
| finance-portfolio | "What is my current allocation by bucket?" |
| finance-rental | "Calculate this month's P&L for property number 2." |
| finance-tax | "What is my estimated tax liability for the quarter?" |
| finance-ynab | "What did I spend on property insurance last quarter?" |

Each server queries the validated mart layer, so results are current, auditable, and rooted in the deterministic pipeline. Batch automation and conversational interaction together make the financial data both reliable and accessible.
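Underneath, each such tool reduces to a read-only SQL query against a mart table. Here is a hedged sketch of what a finance-portfolio style query might look like; `sqlite3` stands in for DuckDB so the snippet is self-contained, and the `mart_holdings` table and its columns are invented for illustration:

```python
# Illustrative finance-portfolio style tool. sqlite3 stands in for
# DuckDB; mart_holdings and its columns are invented for this sketch.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mart_holdings (bucket TEXT, market_value REAL)")
con.executemany(
    "INSERT INTO mart_holdings VALUES (?, ?)",
    [("equity", 60000.0), ("bonds", 30000.0), ("cash", 10000.0)],
)

def allocation_by_bucket(con):
    """Answer 'What is my current allocation by bucket?' from the mart."""
    rows = con.execute(
        """
        SELECT bucket,
               ROUND(100.0 * SUM(market_value) /
                     (SELECT SUM(market_value) FROM mart_holdings), 1) AS pct
        FROM mart_holdings
        GROUP BY bucket
        ORDER BY pct DESC
        """
    ).fetchall()
    return dict(rows)

print(allocation_by_bucket(con))  # → {'equity': 60.0, 'bonds': 30.0, 'cash': 10.0}
```

Because the tool only reads from tables that dbt has already tested, the conversational answer inherits the batch pipeline's guarantees.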

The Stack

| Layer | Technology | Role |
| --- | --- | --- |
| Orchestration | LangGraph | 19-node stateful graph with conditional routing |
| Validation | Gemini | 7 maker-checker nodes including IRS PDF checker |
| Transformation | dbt | 13 models, 58 tests, staging/intermediate/mart layers |
| Storage | DuckDB | Local analytical warehouse, zero cloud dependency |
| Source APIs | Python | Schwab, YNAB, Buildium, Plaid, Coinbase integrations |
| Conversational | MCP Servers | 4 domain-specific servers for interactive queries |
| Tax Output | Python / PDF | IRS form filler producing Schedule B, C, D, E |
| Portfolio | Google Sheets | Multi-tab dashboard with holdings, buckets, rebalancing |

The Result

The first full pipeline run surfaced deductible expenses invisible in the manual process: missed depreciation schedules, overlooked rental operating costs, and a categorization inconsistency that looked small but compounded. The system led to an amended return and an additional refund.

Now, the entire tax year processes in hours instead of weeks. Every number is tested. Every categorization is validated. Every edge case is documented. The system runs every night. Results populate within minutes.

But the pipeline is more than tax prep. It is a foundation. Portfolio allocation decisions are based on validated cost basis. Rental property decisions are grounded in accurate cash flow analysis. Retirement planning reflects real dividend income and options returns. The data foundation became reliable enough to make decisions on.
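To give a flavor of the kind of calculation the deterministic layer owns, here is a sketch of FIFO cost-basis matching. The lot structure, function name, and the FIFO assumption itself are illustrative simplifications, not necessarily this pipeline's actual method:

```python
# Illustrative FIFO cost-basis sketch; the lot structure and the FIFO
# choice are assumptions, not necessarily this pipeline's actual method.
from collections import deque

def fifo_realized_gain(lots, sell_qty, sell_price):
    """lots: deque of (qty, unit_cost) purchase lots, oldest first.
    Returns realized gain for selling sell_qty shares at sell_price,
    consuming lots in order."""
    remaining, gain = sell_qty, 0.0
    while remaining > 0:
        qty, cost = lots[0]
        matched = min(qty, remaining)
        gain += matched * (sell_price - cost)
        remaining -= matched
        if matched == qty:
            lots.popleft()                    # lot fully consumed
        else:
            lots[0] = (qty - matched, cost)   # partial lot remains
    return round(gain, 2)

lots = deque([(10, 100.0), (10, 120.0)])
print(fifo_realized_gain(lots, 15, 130.0))  # → 350.0
```

Because this is plain Python over validated inputs, the number it produces is auditable line by line, which is exactly what the maker-checker pattern then double-checks.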

What This Demonstrates

Data Engineering

Connectors, transforms, mart layers, testing discipline. The unglamorous foundation that makes everything else possible.

AI/ML Engineering

Maker-checker validation, agentic orchestration, conversational interfaces. AI adds judgment, not determinism.

Domain Expertise

IRS rules, cost basis calculation, bucket strategy logic. The domain rules that transform raw data into insight.

Built with AI: Claude was the development partner for this entire system -- from initial design through implementation and iteration. Every component, test, and integration emerged from human-AI collaboration.

Need a Guide Through Your Data Labyrinth?

This pipeline was built with the same architecture patterns, validation rigor, and AI-augmented workflows we bring to every engagement. If your data lives in too many places and your decisions run on hope -- we should talk.

Start a Conversation