07. Solution Architecture & Pre-Sales for AI/GenAI

The two roles you are targeting, an NVIDIA Generative AI Solution Architect and an enterprise contact-center GenAI Solution Architect, are roughly half customer-facing pre-sales work: discovery, workshops, demos, POC scoping, reference architectures, build-vs-buy framing, and translating between business and technical buyers. The build skills you already have are necessary but not sufficient. This file covers the other half.

The running example throughout: Meridian Bank wants to "add AI to customer support." You are the SA who has to turn that vague ask into a scoped, funded production pilot — a support-agent assistant with RAG over their policy and product docs. Every section returns to it.

What a GenAI Solution Architect actually does

A Solution Architect (SA) is the primary technical owner of a customer relationship, before and after the sale. The job is not to build the customer's system end-to-end. The job is to design the right system, prove it can work, and de-risk the decision to buy and deploy. You spend more time in rooms with the customer than in your own codebase.

The day splits roughly:

Activity | Share | What it looks like
Customer-facing | 40-50% | Discovery calls, workshops, demos, exec readouts, POC reviews
Technical design | 20-25% | Reference architectures, POC scoping, build-vs-buy analysis
Hands-on building | 15-20% | POC prototypes, demo assets, sample integrations (not production code)
Sales collaboration | 10-15% | Deal strategy, RFP/RFI responses, SOW input, sizing
Enablement / writing | 5-10% | Customer training, internal field enablement, design docs

The NVIDIA SA job description names this directly: serve as the primary technical domain expert for pre- and post-sale, build proof-of-concept solutions and reference architectures, collaborate with sales on pre-sales activities, and translate complex technical concepts for customer audiences during the sales process. The contact-center role is the same motion aimed at one vertical: scope a virtual-agent / RAG deployment from pilot to full rollout.

The mental shift from IC engineering: your output is not code, it is a customer's confident, correct decision. A beautiful architecture the customer does not understand or trust is a failure. A simpler architecture they fund and deploy is a win.

SA vs SE vs Sales Engineer vs Architect vs Consultant

These titles overlap and vary by company. The distinctions that matter:

Role | Primary motion | Owns the number? | Builds production code?
Solution Architect (SA) | Design + prove the solution; pre- and post-sale technical ownership | Influences, not owns | Prototypes/POCs, not prod
Solutions Engineer / Sales Engineer (SE) | Technical sales support; demos and POCs to close deals | Tied to a quota with the AE | Demos and POCs
(Software) Architect | Internal system design; long-term technical direction | No | Sometimes
Consultant / Delivery Architect | Post-sale implementation, billable delivery | No (or utilization target) | Yes, delivery code

At the cloud and platform vendors the lines blur. NVIDIA, AWS, and Databricks use "Solution Architect" for a deeply technical pre/post-sales role. Google Cloud calls the equivalent "Customer Engineer." Salesforce and most SaaS companies say "Solutions Engineer" and tie it tightly to a quota with an Account Executive. The deeper the product is technically, the more the role drifts from "demo to close" toward "architect and prove." A GenAI SA sits on the architect-heavy end: the product is hard, the buyer is skeptical, and a bad architecture sinks the deal six months later.

Pre-sales vs post-sales. Pre-sales is everything before the contract: discovery, workshops, demos, POC, proposal. Post-sales is adoption: deployment guidance, scaling, expansion, renewal. SAs who only do pre-sales get a reputation for overpromising; SAs who only do post-sales never shape the deal. The strong ones own the arc, which is why "land and expand" lives with the SA, not just the AE.

Discovery and qualification

The customer says: "We want to add AI to customer support." That is not a problem statement. It is a solution someone already picked, wrapped around an unnamed pain. Your first job is to find the pain.

Feature request vs business outcome. "We want a chatbot" is a feature request. "Our support team takes 40 hours a week answering the same policy questions, and average handle time is killing our CSAT" is a business outcome. You cannot scope, size, or measure a feature request. You can do all three with an outcome. Discovery is the work of converting one into the other.

Good discovery questions for Meridian Bank:

  • What does a support agent do today when a customer asks about, say, overdraft policy? Walk me through it.
  • How many tickets a month? What fraction are repetitive vs novel?
  • What does "good" look like in 6 months? What number moves?
  • Who feels this pain most — agents, customers, or the support VP's budget?
  • What have you already tried? Why did it stall?
  • Where does the authoritative answer live today, and how often does it change?
  • What happens if the AI gets an answer wrong? Who is liable?
  • Who has to say yes for this to get funded?

Notice the last one. Discovery is also qualification: is this a real deal, or a research project with no budget and no buyer?

Qualification frameworks, adapted for GenAI. BANT (Budget, Authority, Need, Timeline) is the lightweight version — fine for an early call. For a six-figure, multi-stakeholder, multi-month GenAI deal, use MEDDIC or MEDDPICC:

Letter | Meaning | GenAI-specific question
M | Metrics | What measurable outcome justifies spend? (handle time, deflection rate, CSAT)
E | Economic buyer | Who controls the budget? Often a VP, not the engineer you're talking to
D | Decision criteria | Accuracy bar? Latency bar? Data residency? On-prem requirement?
D | Decision process | Pilot → eval → procurement → security review → contract. What are the gates?
I | Identify (implicate) pain | Make the cost of inaction concrete and felt
C | Champion | Who inside the org sells this when you're not in the room?
(P) | Paper process | Procurement, legal, security review, data-processing agreements
(C) | Competition | Build-it-ourselves, ChatGPT Enterprise, a rival vendor

The Champion is often the most predictive element of whether a deal closes. For Meridian, your champion might be the support-platform engineering lead who is tired of being asked for a chatbot and wants a credible partner. Find that person, arm them, and keep them informed. MEDDPICC's extra letters (Paper process and Competition) exist because enterprise GenAI deals above six figures hit procurement and competitive dynamics that neither BANT nor plain MEDDIC models.

The discipline that separates SAs from order-takers: disqualify early and honestly. If there is no metric, no economic buyer, and no decision process, you have a science fair, not a deal. Saying so politely saves everyone a quarter.
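The disqualification rule above can be sketched as a small checklist. This is a minimal illustration, not a standard tool: the `Deal` fields and the `qualify` function are hypothetical names, and treating metric, economic buyer, and decision process as the hard requirements simply mirrors the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class Deal:
    """Hypothetical MEDDPICC checklist; a field is True once evidence exists."""
    metrics: bool = False
    economic_buyer: bool = False
    decision_criteria: bool = False
    decision_process: bool = False
    paper_process: bool = False
    identified_pain: bool = False
    champion: bool = False
    competition: bool = False


# Per the text: with none of these three, it is a science fair, not a deal.
HARD_REQUIREMENTS = ("metrics", "economic_buyer", "decision_process")


def qualify(deal: Deal) -> tuple[str, list[str]]:
    """Return a verdict plus the checklist items that are still missing."""
    missing = [f.name for f in fields(deal) if not getattr(deal, f.name)]
    if all(name in missing for name in HARD_REQUIREMENTS):
        return "disqualify", missing
    if missing:
        return "keep qualifying", missing
    return "qualified", missing


# Illustrative state after a first call with Meridian (assumed, not real data).
meridian = Deal(metrics=True, identified_pain=True, champion=True)
verdict, gaps = qualify(meridian)
print(verdict, gaps)
```

The list of gaps is the useful part: it doubles as the agenda for the next discovery call.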

Running design and architecture workshops

Once a real problem is named, you converge the room on an architecture. This happens in a workshop, not over email.

Get the right people in the room. A GenAI architecture workshop with only engineers produces a system nobody will fund; with only executives, one nobody can build. For Meridian you want: the support VP (owns the metric and budget), the platform engineering lead (your champion, owns integration), a security/compliance person (owns the veto), and a frontline support lead (owns ground truth about real tickets). Missing the security person is the classic mistake — they show up at the end and reset the whole design over data residency.

Structure that works for a half-day session:

  1. Restate the problem and the target metric out loud. Get agreement before any architecture.
  2. Map the current-state flow on the whiteboard. Where does an answer come from today?
  3. Identify the few places AI changes that flow. Resist redesigning everything.
  4. Sketch a candidate architecture live. Let the room poke holes.
  5. Name constraints explicitly: data can't leave the VPC, latency under 3s, must integrate with their existing Zendesk.
  6. Converge on one architecture to prototype, plus the open questions a POC must answer.

Converging a room. Disagreement in workshops is usually about unstated constraints, not architecture. When the security lead resists the cloud LLM and the engineer wants it for speed, the real conflict is data residency, not model choice. Surface the constraint, write it on the board, and the architecture often picks itself. Your job is to make the implicit explicit, then let the constraints decide — the same move that resolves engineering design disagreements, applied to a room of mixed stakeholders.

Technical demos and POCs

A demo that lands vs a demo that flops. The flop demos the technology: "look, it calls an LLM, here's the latency graph." The demo that lands shows the customer their own pain disappearing. For Meridian, do not demo a generic chatbot. Demo it answering their overdraft-policy question, using their doc, with a citation back to their source — even if the backend is held together with tape. Relevance beats polish. The room remembers seeing their problem solved, not your architecture diagram.

Demo rules:

  • Use the customer's data and vocabulary, never lorem-ipsum.
  • Show one thing working completely, not five things half-working.
  • Show a failure case honestly and how the system handles it (citation, refusal, escalation). Skeptical buyers trust an SA who shows the seams.
  • Never demo something you can't put into the POC. A faked capability becomes a promise.

Scoping a POC / pilot. This is where deals are won or lost. A POC without written success criteria and an exit gate becomes the endless POC — six months of "almost there," no decision, no revenue, your time gone. Scope it like an engineering contract:

Element | Meridian POC
Single use case | Overdraft and fee policy questions only
Success criteria | ≥85% answer accuracy on a 200-question gold set; <3s p95 latency; correct citation on every answer
Dataset | 500 real anonymized tickets + current policy docs
Timeline | 6 weeks, fixed
Exit gate | Hits criteria → fund production pilot. Misses → documented learnings, no auto-extension
Owner on their side | The champion, with named time committed

Databricks publicly runs a 6-week RAG POC, and that cadence is common for a reason: it is long enough to prove value and short enough to force a decision. The fixed timeline and the exit gate are the whole point. "Let's just keep iterating" is the sound of a deal dying.

Avoiding the endless-POC trap: define success before you build, get the customer to sign off on the gold set, and put the go/no-go decision date in writing. If the criteria slip, that is a new POC with a new gate, not a quiet extension.
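The Meridian success criteria are concrete enough to check mechanically. The sketch below is one possible way to do that, assuming each gold-set question has already been graded into a record; `GoldResult`, `p95`, and `exit_gate` are hypothetical names, and the sample run is synthetic data made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GoldResult:
    """One graded gold-set question (hypothetical record shape)."""
    correct: bool      # grader marked the answer correct
    cited: bool        # answer carried a correct citation
    latency_s: float   # end-to-end latency in seconds


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def exit_gate(results: list[GoldResult], min_accuracy: float = 0.85,
              max_p95_s: float = 3.0) -> tuple[bool, dict[str, bool]]:
    """Apply the three written criteria; all must hold to pass the gate."""
    if not results:
        raise ValueError("gold set is empty")
    accuracy = sum(r.correct for r in results) / len(results)
    checks = {
        "accuracy": accuracy >= min_accuracy,
        "p95_latency": p95([r.latency_s for r in results]) < max_p95_s,
        "citations": all(r.cited for r in results),
    }
    return all(checks.values()), checks


# Synthetic 200-question run: 172 correct, every answer cited.
run = [GoldResult(correct=i < 172, cited=True,
                  latency_s=1.2 if i < 190 else 2.8) for i in range(200)]
go, detail = exit_gate(run)
print("fund production pilot" if go else "documented learnings, no extension", detail)
```

Keeping the thresholds as explicit parameters makes it obvious when someone changes them: new thresholds mean a new POC with a new gate.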

Designing and presenting reference architectures

A reference architecture is the picture the customer takes to their own leadership and security team. It must be correct, tailored, and explainable in five minutes.

Tailor to their constraints, not your favorite stack. The same Meridian support-agent problem yields different architectures depending on constraints:

  • Cloud-OK, fast-moving: managed LLM API (Bedrock/Azure OpenAI/Anthropic), managed vector store, RAG over their docs, citations, a guardrail layer. Ships fastest.
  • Strict data residency (bank, likely): model inside their VPC or on-prem, self-hosted embeddings, vector DB in their account, no data leaving the boundary. Slower, costlier, but passes security.
  • Tight budget / low volume: smaller model, aggressive caching, deflect-then-escalate rather than full automation.
  • Thin AI skills internally: lean on a platform (NVIDIA NIM microservices, a managed agent platform) so they're not maintaining serving infra they can't staff.

You present the one that fits their cloud, their data rules, their budget, and their team's skills. An architecture the customer cannot operate is a liability you handed them.

Build-vs-buy framing in front of a customer. Customers ask "should we build this ourselves?" The honest SA does not always say buy. The framing:

  • Build when it is core differentiation, you have the talent, and you'll maintain it for years.
  • Buy / use a platform when it is undifferentiated plumbing (serving, vector infra, guardrails) and speed-to-value matters.
  • For Meridian: buy the platform and model serving (not their differentiation, can't staff it), build the integration and the eval/gold-set discipline (this is their quality moat).

Saying "build this part yourselves, you don't need us for it" is the most trust-building sentence an SA can say. It is also how you win the next, bigger deal.

Communicating to mixed audiences

The hardest SA skill is talking to a business buyer and a technical buyer in the same room without losing either.

The two languages:

Concern | Business stakeholder hears | Technical stakeholder hears
Quality | "Right answer more than 8 times out of 10, cites its source" | "85% accuracy on a 200-item gold set, hybrid retrieval + reranker"
Speed | "Feels instant to the customer" | "p95 under 3 seconds end-to-end"
Risk | "It says 'I don't know' instead of guessing" | "Refusal policy + retrieval-grounding + confidence threshold"
Cost | "Pays back in ~5 months from reduced handle time" | "~$0.02 per resolved query at projected volume"
Time-to-value | "Pilot live in 6 weeks, full rollout next quarter" | "POC scope, then phased deployment behind a feature flag"

Lead with the business framing, then offer the technical depth to whoever wants it. The skill is the live translation: the support VP asks "is it accurate?" and you answer "it gets the right answer more than 8 times in 10 and always shows the source it used." Then you turn to the engineer and add "85% on the gold set, and we gate every change against it."

Handling a skeptical CTO. Skepticism is usually informed, not hostile. The CTO has seen AI demos fail in production. Do not oversell. Acknowledge the failure modes before they raise them: "Yes, LLMs hallucinate — here's the grounding and refusal design that contains it, and here's the eval that proves it." A CTO trusts the SA who names the risk first.

Handling a non-technical buyer. Avoid jargon entirely; anchor on the metric and the analogy to something they already trust. "Think of it like a very fast junior agent who only answers from the approved policy manual and escalates anything it's unsure about." Then prove it with their own data in the demo.

Objection handling and competitive positioning

Objections are buying signals — a customer who has stopped objecting has stopped considering you. Handle them straight, never by overpromising.

Objection | Weak response | Strong response
"It'll hallucinate / be inaccurate" | "Our model is very accurate" | "It will sometimes be wrong. We contain it: grounded retrieval, citations, refusal on low confidence, and a gold-set eval that gates every change. Here's the measured accuracy."
"Our data is sensitive / regulated" | "It's secure" | "Walk me through your residency rules. We can keep the model and data inside your VPC, no training on your data, full audit logs. Let's get your security lead in the next session."
"It's too expensive" | "It pays for itself" | "Here's per-query cost at your volume vs. agent time saved. We start with one use case so you prove ROI before scaling spend."
"Why not just use ChatGPT?" | "Ours is better" | "ChatGPT is great for general questions. It doesn't know your policies, can't cite your docs, won't meet your residency rules, and has no eval gate. That's the gap this closes."
"Vendor X says they do this too" | Trash the competitor | "They're solid. The difference for your situation is [specific: residency, eval discipline, integration with your stack]. Let's compare on your actual criteria."

Two rules. First, never promise zero hallucinations or 100% accuracy — the one time it fails, you lose all credibility, and the technical buyer knows the claim is false anyway. Second, never trash a competitor; position on the customer's specific decision criteria. Disparagement reads as insecurity.
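The "grounded retrieval, citations, refusal on low confidence" answer to the hallucination objection can be made concrete with a small sketch. It assumes a retriever that returns passages with a relevance score on a 0 to 1 scale and a `generate` callable standing in for whatever model the customer approves; `Passage`, `answer_or_refuse`, and the 0.6 threshold are hypothetical and would be tuned against the gold set.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = "I don't know. Escalating to a human agent."


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # retrieval/reranker relevance, assumed to be in [0, 1]


def answer_or_refuse(question: str, passages: list[Passage],
                     generate: Callable[[str, str], str],
                     min_score: float = 0.6) -> dict:
    """Answer only from passages above the threshold; otherwise refuse and escalate."""
    grounded = [p for p in passages if p.score >= min_score]
    if not grounded:
        return {"answer": REFUSAL, "citations": [], "escalate": True}
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in grounded)
    return {
        "answer": generate(question, context),
        "citations": [p.doc_id for p in grounded],
        "escalate": False,
    }


def fake_generate(question: str, context: str) -> str:
    """Stand-in for a real model call; echoes the first grounded passage."""
    return "Per policy: " + context.splitlines()[0]


hits = [Passage("overdraft-policy", "(placeholder policy text)", 0.82),
        Passage("marketing-faq", "(placeholder unrelated text)", 0.21)]
print(answer_or_refuse("What is the overdraft fee?", hits, fake_generate))
print(answer_or_refuse("What is the overdraft fee?", hits[1:], fake_generate))
```

The second call is the failure case worth showing in a demo: nothing clears the threshold, so the system refuses and escalates instead of guessing.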

Working with sales and business development

The SA does not own the quota, but the SA owns whether the deal is technically winnable. In pre-sales you and the Account Executive (AE) operate as a pair: the AE owns the relationship, commercials, and close; you own technical credibility, the solution, and de-risking.

RFP / RFI responses. When Meridian's procurement sends a 60-question RFP, the SA writes the technical answers: architecture, security, accuracy approach, integration, SLAs. Answer the actual question, map each answer to their stated criteria, and flag anything you can't truthfully commit to rather than fudging it. A fudged RFP answer becomes a contractual obligation.

SOW and proposal basics. A Statement of Work names scope, deliverables, timeline, success criteria, responsibilities (yours vs theirs), and what's explicitly out of scope. The out-of-scope section prevents the endless-POC and scope-creep death. For the Meridian pilot: in scope is the overdraft/fee use case and the eval harness; out of scope is voice, other languages, and other product lines — those are phase two.

Sizing and estimation. Customers and AEs need numbers. You estimate: token/query volume → model and infra cost per month, integration effort in engineer-weeks, and timeline to pilot and to full rollout. Reasonable contact-center cadences: a full Advanced RAG system in 4-6 weeks; a multi-agent or voice-AI system in 6-10 weeks. Anchor estimates in ranges with stated assumptions; a false-precision single number you'll later miss is worse than an honest range.
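A range with stated assumptions can be a few lines of arithmetic. The sketch below is illustrative only: every volume, token count, and price is a placeholder rather than a real vendor rate or a Meridian figure, and `Assumptions`, `monthly_model_cost`, and `payback_months` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assumptions:
    """Stated sizing assumptions; all values below are placeholders."""
    queries_per_month: int
    tokens_in_per_query: int    # prompt + retrieved context
    tokens_out_per_query: int   # generated answer
    usd_per_1k_in: float        # placeholder price, not a real rate
    usd_per_1k_out: float       # placeholder price, not a real rate


def monthly_model_cost(a: Assumptions) -> float:
    """Token volume -> model cost per month."""
    per_query = (a.tokens_in_per_query / 1000 * a.usd_per_1k_in
                 + a.tokens_out_per_query / 1000 * a.usd_per_1k_out)
    return a.queries_per_month * per_query


def payback_months(upfront_usd: float, monthly_saving_usd: float,
                   monthly_run_usd: float) -> Optional[float]:
    """Months to recover the upfront cost; None if it never pays back."""
    net = monthly_saving_usd - monthly_run_usd
    return upfront_usd / net if net > 0 else None


# Quote a low and a high scenario instead of one false-precision number.
low = Assumptions(20_000, 3_000, 300, 0.001, 0.002)
high = Assumptions(50_000, 6_000, 500, 0.003, 0.006)
print(f"model cost: ${monthly_model_cost(low):,.0f} to "
      f"${monthly_model_cost(high):,.0f} per month")
print("payback (months):", payback_months(100_000, 21_000, 1_000))
```

Handing the customer the `Assumptions` alongside the number lets them correct an input rather than argue with the output.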

Customer enablement and field training

Part of the post-sale SA job is making the customer and your own field team self-sufficient.

Customer enablement. After the Meridian pilot succeeds, run a workshop for their engineers on operating the system: how the eval gate works, how to update the gold set when policy changes, how to read the traces when an answer looks wrong. A customer who can operate the system renews and expands. A customer dependent on you for every change churns the moment you're busy.

Field enablement. At a vendor like NVIDIA or AWS, SAs also train the wider sales and SA org: reusable demo assets, reference architectures, "how to scope a RAG POC" playbooks. This is how one SA's learning becomes the whole field team's capability. It is also the highest-leverage work an SA does — it scales you past your own calendar.

First 90 days as an AI SA

Phase | Focus
Days 1-30 | Learn the product stack deeply enough to demo it cold. Shadow senior SAs on discovery calls and workshops. Read recent won and lost deal post-mortems. Build your own demo environment.
Days 31-60 | Run discovery on a real (low-stakes) account with a senior SA watching. Build and deliver your first scoped demo. Co-author one RFP response. Learn the qualification framework the org actually uses.
Days 61-90 | Own a POC end-to-end: scope it, set success criteria, run the workshop, deliver the readout. Start building one reusable asset for the field team. Have a point of view on one architecture decision the org debates.

A day in the life. Morning: discovery call with a new prospect (mostly listening, mapping pain to outcome). Late morning: build session refining a demo with the customer's data. Afternoon: architecture workshop converging a different customer's room on a reference design, then a 30-minute exec readout translating the POC results into ROI for a VP. End of day: write the SOW scope for the deal that's ready and update the AE on two others. Maybe two hours of hands-on building across the whole day — the rest is rooms and writing.

Interview Q&A

Q1. A customer says "we want to add AI to our support flow." Walk me through your first call.

Don't pitch. Run discovery: map the current support flow, find where the real pain and cost sit, name a measurable target outcome, and qualify whether there's a budget, an economic buyer, and a decision process. Convert the feature request ("a chatbot") into a business outcome ("cut handle time on repetitive policy questions"). Leave with a defined problem, a candidate metric, and the names of who has to say yes.

Common wrong answer to avoid: jumping straight to proposing a RAG architecture before you understand the problem, the constraints, or whether the deal is even real.

Q2. Whiteboard a GenAI architecture for a regulated bank that wants a support-agent assistant.

Clarify constraints first: data residency almost certainly means nothing leaves their boundary. Then: ingestion and chunking of policy/product docs → embeddings (self-hosted, in their VPC) → vector store in their account → hybrid retrieval + reranker → LLM (in-VPC or approved private endpoint) with a grounded, cite-or-refuse prompt → guardrail layer (PII, refusal on low confidence) → escalation to a human agent → an eval gold set gating every change, plus audit logging. State the latency and accuracy targets and the human-in-the-loop path.

Common wrong answer to avoid: drawing a generic cloud-API RAG diagram without asking about data residency, then watching the security lead veto the whole thing.

Q3. Your POC has run for three months and the customer keeps asking for "just one more tweak." What do you do?

Name it: this is an endless POC, which means no exit gate was set or it's being ignored. Stop and re-anchor on the written success criteria. If they were met, force the go/no-go decision; that was the deal. If they were never set, set them now with a fixed date and a clear gate, and reframe further changes as a new, separately scoped phase. Iteration without a decision date is a deal quietly dying.

Common wrong answer to avoid: continuing to iterate to keep the customer happy, which burns your time and signals to procurement that the product isn't ready.

Q4. A skeptical CTO says "these things just hallucinate, I don't trust it." How do you respond?

Agree, then contain. "You're right, LLMs hallucinate, so we design around it." Walk through grounded retrieval, citations, refusal on low confidence, and the eval gold set that measures and gates accuracy. Show a failure case in the demo and how the system handles it gracefully. Naming the risk before they do, and proving you've engineered for it, builds more trust than any accuracy claim.

Common wrong answer to avoid: insisting your model is "highly accurate" or "doesn't hallucinate". The CTO knows that's false and you lose the room.

Q5. The business buyer and the lead engineer are in the same readout. How do you present POC results?

Lead with the business outcome in plain language: "answers the right policy question more than 8 times in 10, cites its source, pays back in about 5 months from reduced handle time." Then offer the technical depth for whoever wants it: 85% on the 200-item gold set, hybrid retrieval, p95 under 3s, per-query cost. Translate live when the VP asks "is it accurate?" Don't make the engineer sit through marketing or the VP sit through architecture.

Common wrong answer to avoid: pitching one audience and losing the other, with all ROI slides (engineer checks out) or all architecture (VP checks out).

Q6. The customer asks "why shouldn't we just build this ourselves?" You sell a platform. What do you say?

Be honest about the build-vs-buy line. Tell them what they should build (their integration and their eval/gold-set discipline, because that's their quality moat) and what isn't worth building: model serving, vector infra, guardrail plumbing they can't staff to maintain. Recommending they build part of it themselves is the most trust-building thing you can do and it usually wins the larger deal.

Common wrong answer to avoid: arguing they should buy everything, which signals you're protecting a sale rather than solving their problem.

Q7. How do you scope a POC so it actually closes?

One use case, a written gold set with measurable success criteria, real customer data, a fixed timeline (6 weeks is a good default), a named owner on their side, and an explicit exit gate: hit the criteria → fund the production pilot; miss → documented learnings, no automatic extension. Get the customer to sign off on the gold set and the decision date before you build anything.

Common wrong answer to avoid: scoping a broad "explore what's possible" POC with no success criteria. That's the recipe for the endless POC.

Q8. A competitor is in the deal claiming the same capabilities. How do you position?

Never trash them. Anchor on the customer's actual decision criteria and show where the difference matters for their situation: data residency, eval discipline, integration with their existing stack, total cost at their volume. Make them compare on the criteria you're genuinely strong on, expressed in their terms. Confidence plus specificity beats disparagement, which only reads as insecurity.

Common wrong answer to avoid: badmouthing the competitor or making vague "we're just better" claims with no grounding in the customer's criteria.

Q9. (Cumulative) The POC met its accuracy target offline but answers look wrong in the customer's live pilot. Is this a discovery problem, an architecture problem, or an eval problem?

Usually a discovery/eval problem before an architecture one. If offline passed but live fails, the gold set didn't reflect real user behavior: the questions real agents and customers ask differ from the curated set (the same lesson as building eval sets from real traffic, not clean internal demos). Re-derive the gold set from real pilot tickets, then check retrieval freshness and chunking. Reach for re-architecting the retrieval stack only after you've confirmed the eval was measuring the wrong thing.

Common wrong answer to avoid: immediately blaming the model or rebuilding the retrieval pipeline before checking whether the gold set ever matched real usage.
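One quick way to test the "gold set didn't match real usage" hypothesis in Q9 is to compare the topic mix of the gold set with the topic mix of live pilot tickets. The sketch below assumes each question has already been given a topic label (by hand or by a classifier, which is outside this sketch); `topic_shares` and `coverage_gap` are hypothetical names and the sample labels are synthetic.

```python
from collections import Counter


def topic_shares(topics: list[str]) -> dict[str, float]:
    """Fraction of questions per topic label."""
    if not topics:
        raise ValueError("no topics supplied")
    counts = Counter(topics)
    return {t: c / len(topics) for t, c in counts.items()}


def coverage_gap(gold_topics: list[str],
                 live_topics: list[str]) -> tuple[float, list[str]]:
    """Return (total variation distance, live topics absent from the gold set).

    The distance is 0 when the two mixes are identical and 1 when they share
    no topics at all.
    """
    gold = topic_shares(gold_topics)
    live = topic_shares(live_topics)
    all_topics = set(gold) | set(live)
    distance = 0.5 * sum(abs(gold.get(t, 0.0) - live.get(t, 0.0))
                         for t in all_topics)
    unseen = sorted(t for t in live if t not in gold)
    return distance, unseen


# Synthetic labels: the curated gold set never covered disputes.
gold_set = ["overdraft"] * 150 + ["fees"] * 50
live_pilot = ["overdraft"] * 80 + ["fees"] * 40 + ["disputes"] * 80
distance, unseen = coverage_gap(gold_set, live_pilot)
print(f"topic-mix distance: {distance:.2f}; missing from gold set: {unseen}")
```

A large distance or a non-empty list of unseen topics points at the eval, not the retrieval stack, which is the order of investigation the answer above recommends.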

Apply now (10 min)

Take one of your two target JDs — the NVIDIA GenAI SA or the contact-center GenAI SA — and produce two artifacts for a named prospect (use Meridian Bank or a real company in that vertical).

1. Discovery question list (8-10 questions). Write the questions you'd ask on the first call, organized so they uncover the business outcome and qualify the deal. Cover: current-state flow, pain and cost, target metric, prior attempts, constraints (especially data residency), failure-mode liability, economic buyer, and decision process. Mark which MEDDIC letter each question is probing.

2. One-page reference architecture sketch. ASCII or boxes-and-arrows: ingestion → embeddings → vector store → retrieval + rerank → LLM with cite-or-refuse → guardrails → escalation → eval gold set + audit logging. Annotate it with their constraints (where data lives, latency target, accuracy target) and one explicit build-vs-buy call. Then write the three sentences you'd say presenting it to a mixed business/technical room.

If you can do both cold for either JD, you can run an SA first round.

How the SA / pre-sales motion runs at real companies

  • NVIDIA — Gen AI Solution Architects are the primary technical experts pre- and post-sale, building POCs and reference architectures on the full NVIDIA AI stack and embedding with partners to deploy at scale.
  • AWS — Solutions Architects pair with account teams to design customer architectures on AWS services, run well-architected reviews, and de-risk migrations and GenAI builds (Bedrock, Connect) before commit.
  • Google Cloud — the equivalent role is "Customer Engineer," owning technical discovery, demos, and POCs for Vertex AI and Gemini Enterprise deals alongside the account executive.
  • Databricks — Solutions Architects run scoped engagements like the published 6-week RAG POC, proving lakehouse + GenAI value against a customer's own data before a larger commitment.
  • Snowflake — Sales Engineers / SAs design data-platform and Cortex AI architectures, handling the data-governance and residency objections that dominate enterprise data deals.
  • Salesforce — Solutions Engineers tie tightly to an Account Executive's quota, demoing Agentforce and Einstein against the prospect's CRM data to close.
  • Microsoft — Cloud Solution Architects and technical specialists drive Azure OpenAI and Copilot adoption, leaning on the partner ecosystem for delivery while they shape the architecture.
  • Anthropic / OpenAI — Applied AI / forward-deployed engineers and solutions architects scope enterprise deployments, tune prompts and guardrails against the customer's data, and translate model capabilities into a concrete production design.
  • Palantir — Forward-Deployed Engineers sit on-site, embedded with the customer, blurring SA and delivery: discover the problem, build the solution, and prove value in the customer's environment.
  • Cohere — solutions / forward-deployed engineers focus on enterprise RAG and on-prem/VPC deployments where data residency is the deciding constraint, exactly the bank scenario.
  • Twilio / Amazon Connect — contact-center solution architects scope virtual-agent and agent-assist GenAI deployments, moving customers from pilot to full rollout on voice and chat.
  • Cresta / Sierra — contact-center AI vendors run SAs who prove deflection and handle-time gains in a scoped pilot before enterprise rollout, the contact-center JD's exact motion.
  • Snowflake / Databricks partner accelerators — pre-built RAG accelerators exist specifically so SAs can stand up a credible POC in weeks instead of months and force a faster decision.
  • IBM / Accenture — delivery-heavy consultants and solution architects own the post-sale build and integration, the billable end of the spectrum where the SA also does delivery code.
  • Glean / Writer (enterprise RAG) — SAs lead with the customer's own knowledge base in the demo, because relevance to the buyer's documents is what lands the deal, not generic capability.
