Agentic AI Is Here—Are Your QA and QC Practices Ready?

The rise of artificial intelligence (AI), particularly large language models (LLMs) and agentic AI systems, is reshaping the software landscape. Far beyond a tool for productivity gains—such as automating repetitive tasks or generating code snippets—AI is fundamentally altering how software is designed, developed, and validated. This transformation brings profound implications for Quality Assurance (QA) and Quality Control (QC), the twin pillars of software quality. While traditional software relied on deterministic processes with predictable outcomes, AI introduces non-determinism, complexity, and interconnectedness that challenge long-standing assumptions. In domains like Customer Relationship Management (CRM), where systems like Salesforce orchestrate critical business processes, these shifts demand a reevaluation of how we ensure software is both the right solution and built correctly.

This article explores the paradigm shift AI imposes on software design and its ripple effects on QA and QC. Drawing on the nuances of non-deterministic systems and the unique challenges of agentic AI, we’ll examine why traditional approaches fall short and propose new strategies to navigate this uncharted territory. Through a CRM-focused example, we’ll ground these concepts in a practical context, illustrating how businesses can adapt to harness AI’s potential while safeguarding quality.

The Evolution of Software Design: From Deterministic to Probabilistic

Traditional software design assumes predictability. A function in a Java program, given the same inputs, produces the same outputs—every time. Developers and testers rely on this determinism to define requirements, write code, and verify behavior. In contrast, AI systems, particularly LLMs like those powering ChatGPT, Gemini, Claude, or Grok, operate probabilistically. The same prompt can yield different responses depending on factors like temperature settings, training data drift, or even the whims of random sampling. This non-determinism isn’t a flaw; it’s a feature that enables creativity, adaptability, and human-like reasoning.
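
The role temperature plays here can be made concrete with a small sketch. The toy logits below are illustrative, not from any real model, but the mechanics mirror how LLM sampling works: at low temperature one token dominates and outputs are near-deterministic; at high temperature the distribution flattens and the same prompt can yield different completions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities; higher temperature flattens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate completions.
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, temperature=0.2)  # near-deterministic
hot = softmax_with_temperature(logits, temperature=2.0)   # much flatter, more varied sampling
```

At `temperature=0.2` the top candidate absorbs almost all the probability mass; at `2.0` the alternatives become genuinely likely, which is where the run-to-run variability QA and QC must handle comes from.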

Agentic AI—systems that autonomously make decisions, interact with tools, and orchestrate workflows—takes this further. Imagine a Salesforce-based agentic AI designed to manage customer interactions: it might query a CRM database, generate personalized emails using an LLM, and schedule follow-ups via an external calendar API, all without human intervention. Each component introduces variability, and their interactions create emergent behaviors that defy simple prediction. Software design in this context shifts from crafting rigid logic to defining flexible frameworks that guide AI toward desired outcomes while tolerating uncertainty.

This evolution redefines the role of developers. Instead of writing explicit instructions, they design guardrails—constraints and objectives that steer AI behavior. For example, in a Salesforce environment, developers might configure an agent to prioritize high-value leads based on CRM data, but the exact phrasing of its outreach emails depends on the LLM’s interpretation. This shift demands new skills: prompt engineering, probabilistic modeling, and systems thinking to anticipate how components interact. The result is software that’s less a monolith and more a living ecosystem, constantly adapting to new inputs and contexts.

QA and QC: A Tale of Two Disciplines

To understand AI’s impact, we must first revisit the distinction between QA and QC, as articulated by software engineering pioneers like Ian Sommerville. QA asks, “Are we building the right thing?” It’s a strategic, process-oriented discipline focused on aligning software with user needs, business goals, and ethical standards. QC, conversely, asks, “Are we building it right?” It’s tactical, centered on testing and inspecting deliverables to ensure they meet specifications.

In traditional software, QA might involve validating that a CRM feature, like Salesforce’s Opportunity Management, addresses sales teams’ needs for pipeline tracking. QC would then test the feature—checking that calculations for deal stages are accurate and the UI renders correctly across browsers. Both rely on determinism: QA assumes requirements can be clearly defined, and QC expects consistent outputs to verify against.

AI upends these assumptions. A Salesforce agentic AI, for instance, might autonomously score leads, draft emails, and log interactions. QA must grapple with questions like: Does the system prioritize the right leads? Are its communications culturally appropriate? QC faces even thornier issues: How do you test a system where identical inputs produce varied outputs? These challenges require rethinking both disciplines, particularly in the CRM domain, where trust and precision are paramount.

The QA Challenge: Building the Right Thing in a Non-Deterministic World

QA’s mission—ensuring software solves the right problem—becomes exponentially harder with AI. Non-determinism blurs the line between “right” and “wrong.” Consider a Salesforce agentic AI tasked with automating customer support ticket resolution. The “right” solution might mean resolving 90% of tickets without human intervention while maintaining customer satisfaction. But how do you define “satisfaction” when the AI’s responses vary? QA must shift from rigid requirements to flexible objectives, balancing measurable outcomes (e.g., resolution rate) with qualitative goals (e.g., tone consistency).

Use Case: Salesforce Agentic AI for Lead Nurturing

Let’s ground this in a CRM example. A company uses Salesforce to manage leads, deploying an agentic AI to nurture prospects. The system:

·     Analyzes CRM data to identify high-potential leads.

·     Generates personalized email drafts using an LLM.

·     Schedules follow-ups based on prospect responses, integrating with an external calendar tool.
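
The three stages above can be sketched as a chained pipeline. Everything here is a stub under assumed names (`score_lead`, `draft_email`, `schedule_follow_up` are hypothetical, not Salesforce APIs); in a real system the email draft would come from an LLM and the scheduling from an external calendar API, each adding its own variability.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Lead:
    name: str
    engagement_score: float  # e.g., derived from CRM activity history
    opted_out: bool = False

def score_lead(lead: Lead) -> float:
    """Stub: a real system would combine purchase history, engagement, and firmographics."""
    return lead.engagement_score

def draft_email(lead: Lead) -> str:
    """Stub: a real system would call an LLM here, and output would vary run to run."""
    return f"Hi {lead.name}, thanks for your interest in our product."

def schedule_follow_up(lead: Lead, days: int = 3) -> datetime:
    """Stub: a real system would call an external calendar API."""
    return datetime.now() + timedelta(days=days)

def nurture(lead: Lead, threshold: float = 0.7):
    """Chain the three stages, skipping opted-out or low-potential leads."""
    if lead.opted_out or score_lead(lead) < threshold:
        return None
    return draft_email(lead), schedule_follow_up(lead)

result = nurture(Lead("Ada", engagement_score=0.9))
```

Note the early exits: opt-out and threshold checks sit in the orchestration layer, not inside the LLM prompt, which is exactly the kind of handoff QA has to specify.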

QA’s Role: QA starts by defining the system’s purpose. The “right thing” here is maximizing conversions while preserving brand voice and avoiding spam-like behavior. Key questions include:

·      Does the AI correctly identify high-potential leads based on data like purchase history or engagement metrics?

·      Are emails personalized without being overly informal or off-brand?

·      Does the system respect opt-out requests to comply with regulations like GDPR?

To answer these, QA must:

·      Engage Stakeholders: Collaborate with sales teams, marketing, and compliance officers to align on goals. For instance, sales might prioritize lead volume, while marketing emphasizes tone.

·      Define Success Metrics: Combine quantitative measures (e.g., conversion rate increase of 15%) with qualitative checks (e.g., emails score ≥8/10 for relevance in human reviews).

·      Simulate Workflows: Model real-world scenarios, like a lead responding negatively, to ensure the AI adapts appropriately (e.g., pausing outreach).
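
The workflow-simulation step can be scripted as a table of scenarios and expected agent behaviors. The keyword matching below is a crude stand-in; a production system would classify replies with a sentiment model or an LLM, but the QA artifact (the scenario table itself) looks the same either way.

```python
NEGATIVE_MARKERS = {"unsubscribe", "stop", "not interested"}  # crude stand-in for sentiment analysis

def next_action(reply: str) -> str:
    """Decide what the agent should do after a prospect reply.
    Hypothetical policy: negative sentiment pauses outreach, questions go to a human."""
    text = reply.lower()
    if any(marker in text for marker in NEGATIVE_MARKERS):
        return "pause_outreach"
    if "?" in text:
        return "escalate_to_human"
    return "continue_nurturing"

# Scenarios QA might script before launch, mapping reply to expected behavior:
scenarios = {
    "Please stop emailing me": "pause_outreach",
    "Can you tell me more about pricing?": "escalate_to_human",
    "Thanks, looks interesting": "continue_nurturing",
}
```

Running the scenario table against the real agent, rather than a stub, is what turns this from a unit test into the workflow simulation QA needs.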

The challenge lies in variability. The LLM might generate emails that are brilliant one day and awkward the next. QA must anticipate this, setting boundaries—like requiring emails to avoid certain phrases—while accepting that exact outputs can’t be prescribed. This is a departure from traditional QA, where requirements like “the button turns blue on hover” left no room for ambiguity.
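
A boundary like "avoid certain phrases" is one of the few parts of this system that can be checked deterministically, even though the email text itself varies. A minimal sketch, with an illustrative banned-phrase list:

```python
BANNED_PHRASES = ["act now", "limited time offer", "100% guaranteed"]  # illustrative brand guardrails

def violates_guardrails(email_body: str) -> list[str]:
    """Return the banned phrases found, case-insensitively; an empty list means the draft passes."""
    lowered = email_body.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]

draft = "Act now! This limited time offer is 100% guaranteed to boost your pipeline."
violations = violates_guardrails(draft)
```

The check constrains the output space without prescribing exact outputs, which is the general shape of AI-era requirements: hard boundaries around a soft interior.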

Ethical Considerations

AI also raises ethical stakes for QA. In our Salesforce example, what if the AI prioritizes leads based on biased data, favoring certain demographics? QA must proactively design processes to detect and mitigate bias, such as auditing lead-scoring algorithms or sampling outputs for fairness. This isn’t just about building the right thing for the business—it’s about ensuring the system aligns with societal values.

The QC Challenge: Building It Right When “Right” Is a Range

QC, focused on verifying correctness, faces a more immediate hurdle: how do you test a system that doesn’t produce consistent results? Traditional QC relies on deterministic test cases—input X should yield output Y. With LLMs, output Y might be a range of responses, some excellent, others flawed. Agentic AI compounds this, as interactions between components create unpredictable outcomes.

Revisiting the Salesforce Example

For our lead-nurturing AI, QC must verify that:

·      Lead-scoring logic correctly ranks prospects based on CRM data.

·      Email drafts meet quality standards (e.g., grammatically correct, on-brand).

·      Follow-up scheduling respects constraints (e.g., no emails sent at 3 AM).

QC’s Role: Traditional tests won’t suffice. QC must adopt probabilistic approaches:

·      Statistical Testing: Evaluate outputs against distributions, not single cases. For instance, test that 95% of emails score above a certain quality threshold (e.g., using metrics like readability or sentiment analysis).

·      Robustness Checks: Probe edge cases, like incomplete CRM data or ambiguous prospect replies, to ensure the system doesn’t break. For example, if a lead’s email bounces, does the AI retry appropriately?

·      End-to-End Validation: Test the entire pipeline, from lead scoring to scheduling, to catch errors in component interactions. A perfectly scored lead is useless if the follow-up email is scheduled for the wrong time zone.
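
The statistical-testing idea can be sketched as a seeded sampling harness. The scorer below is simulated with a Gaussian for illustration; in practice it would evaluate real LLM drafts with readability, sentiment, or brand checks, but the QC gate (a fraction-above-threshold assertion over many samples) is the same.

```python
import random

def quality_score(email: str) -> float:
    """Stand-in scorer: here we simulate scores around 0.85; a real pipeline would
    combine readability, sentiment, and brand-compliance metrics on actual drafts."""
    return random.gauss(0.85, 0.05)

def fraction_above_threshold(n_samples: int, threshold: float, seed: int = 42) -> float:
    """Sample many outputs and report the fraction clearing the quality bar."""
    random.seed(seed)  # fixed seed so the statistical test itself is repeatable
    scores = [quality_score("draft") for _ in range(n_samples)]
    return sum(s >= threshold for s in scores) / n_samples

passing = fraction_above_threshold(n_samples=1000, threshold=0.75)
# QC gate: require that at least 95% of sampled drafts clear the bar.
```

The key shift from traditional QC is that the assertion targets a distribution ("95% of 1,000 drafts pass") rather than a single input-output pair.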

The Versioning Problem

A unique complication arises from how LLMs are updated. Unlike traditional APIs, where versions are clearly labeled (e.g., Salesforce API v60.0), public LLMs often receive opaque updates. If the LLM powering our email drafts changes overnight—say, becoming more formal—QC tests might fail unexpectedly. Without the ability to pin to a specific model version, teams must continuously revalidate outputs, increasing testing overhead. In our example, QC might need to retest email tone weekly to ensure it aligns with brand guidelines, a burden traditional software rarely imposes.
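
When the model can change without a version bump, one pragmatic response is to keep a baseline of quality scores and flag when fresh samples depart from it. A minimal sketch, with illustrative numbers; the z-score-on-the-mean test is one simple choice among many drift detectors.

```python
from statistics import mean, stdev

def detect_drift(baseline: list[float], current: list[float], z_threshold: float = 3.0) -> bool:
    """Flag drift when the current mean quality score departs from the baseline mean
    by more than z_threshold standard errors of the baseline."""
    standard_error = stdev(baseline) / (len(baseline) ** 0.5)
    return abs(mean(current) - mean(baseline)) > z_threshold * standard_error

baseline_scores = [0.84, 0.86, 0.85, 0.83, 0.87, 0.85, 0.86, 0.84]
after_update = [0.70, 0.72, 0.69, 0.71, 0.73, 0.70, 0.72, 0.71]  # tone shifted after an opaque model update
drifted = detect_drift(baseline_scores, after_update)
```

Running a check like this on a schedule (weekly, as in the example above) converts the versioning problem from a surprise into a monitored metric.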

Agentic Complexity

Agentic AI’s interconnectedness amplifies QC challenges. If our Salesforce AI misinterprets a calendar API’s response (e.g., scheduling conflicts), the error might cascade, flooding a prospect with emails. QC must test not just individual components but their interactions, using techniques like:

·      Scenario-Based Testing: Simulate real-world workflows, like a prospect replying with a question, to verify system resilience.

·      Chaos Engineering: Introduce failures (e.g., API downtime) to ensure the AI recovers gracefully.

·      Tracing: Log intermediate outputs to pinpoint where errors occur, like a misparsed CRM field leading to an irrelevant email.
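
Chaos-style testing and tracing combine naturally: inject a failure, then verify both that the agent recovers and that the trace shows where things went wrong. A sketch with a hypothetical flaky calendar dependency:

```python
import time

trace_log = []  # intermediate outcomes, so failures can be pinpointed later

def with_retries(step_name, fn, max_attempts=3, base_delay=0.01):
    """Run a pipeline step, logging each attempt; back off exponentially and retry on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
            trace_log.append((step_name, attempt, "ok"))
            return result
        except Exception as exc:
            trace_log.append((step_name, attempt, f"error: {exc}"))
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Chaos-style stub: the calendar API fails twice, then recovers.
calls = {"count": 0}
def flaky_calendar_api():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("calendar API down")
    return "slot booked"

outcome = with_retries("schedule_follow_up", flaky_calendar_api)
```

After the run, `trace_log` contains one entry per attempt, so a cascading failure (the flood-of-emails scenario above) can be traced back to the exact step and attempt where it started.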

The Multiplier Effect of Agentic AI

Agentic AI, common in CRM systems like Salesforce, introduces a multiplier effect on both QA and QC. Each component—CRM data analysis, LLM-generated content, external tool integration—adds variability. When chained, small deviations can snowball. In our lead-nurturing example:

·      A slight bias in lead scoring (e.g., overemphasizing recent activity) might prioritize low-value prospects.

·      This could trigger off-tone emails, eroding trust.

·      Misaligned scheduling might then spam prospects, violating compliance.

QA must anticipate these cascades during design, defining clear handoffs between components (e.g., validating lead scores before email generation). QC must test the system holistically, ensuring errors don’t propagate. The complexity grows exponentially with more components, as each interaction creates new failure modes. Traditional software rarely faces this scale of interdependence, making agentic AI a frontier for quality practices.
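
A handoff check between components is the simplest way to stop a deviation from snowballing. The sketch below gates a (stubbed) email generator on a validated lead score; the function names and score range are illustrative assumptions.

```python
def validate_handoff(lead_score, *, lo=0.0, hi=1.0):
    """Gate between pipeline components: refuse to pass a malformed score downstream."""
    if not isinstance(lead_score, (int, float)):
        raise TypeError(f"lead score must be numeric, got {type(lead_score).__name__}")
    if not lo <= lead_score <= hi:
        raise ValueError(f"lead score {lead_score} outside [{lo}, {hi}]")
    return lead_score

def generate_email_if_valid(lead_score):
    """Only invoke the (stubbed) email generator when the handoff check passes."""
    try:
        validate_handoff(lead_score)
    except (TypeError, ValueError):
        return None  # stop the cascade instead of propagating a bad input
    return "email draft"
```

The design choice is that each component trusts validated inputs only; a bad score halts the chain at the boundary rather than surfacing three components later as a tone-deaf email to the wrong prospect.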

Rethinking QA and QC for the AI Era

To navigate these challenges, businesses must evolve their approaches:

For QA:

·      Flexible Requirements: Define goals as ranges (e.g., “emails should convert 10-15% of leads”) rather than fixed outputs.

·      Stakeholder Collaboration: Involve diverse teams—sales, legal, ethics—to ensure the system aligns with multifaceted needs.

·      Ethical Audits: Regularly assess AI for bias, fairness, and compliance, especially in regulated domains like CRM.

For QC:

·      Probabilistic Metrics: Use statistical tools to evaluate output distributions, like sampling 1,000 emails to check quality.

·      Automated Monitoring: Deploy real-time checks to flag anomalies, such as a sudden drop in email relevance.

·      Hybrid Validation: Combine automated tests with human reviews to catch nuanced errors, like tone mismatches.
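
The automated-monitoring idea can be sketched as a rolling-mean anomaly flag over email relevance scores. The window size and floor are illustrative; production monitoring would likely add alerting and more robust statistics, but the shape is the same.

```python
from collections import deque

class RelevanceMonitor:
    """Flag an anomaly when the rolling mean of email relevance scores drops below a floor.
    A lightweight stand-in for production monitoring."""

    def __init__(self, window: int = 5, floor: float = 0.7):
        self.scores = deque(maxlen=window)
        self.floor = floor

    def observe(self, score: float) -> bool:
        """Record a score; return True once a full window's rolling mean signals an anomaly."""
        self.scores.append(score)
        rolling_mean = sum(self.scores) / len(self.scores)
        return len(self.scores) == self.scores.maxlen and rolling_mean < self.floor

monitor = RelevanceMonitor()
healthy = [monitor.observe(s) for s in [0.9, 0.85, 0.88, 0.9, 0.87]]
degraded = [monitor.observe(s) for s in [0.5, 0.4, 0.45, 0.5, 0.42]]
```

Because the flag fires on a window rather than a single score, one awkward email doesn't page anyone, but a sustained relevance drop (say, after an opaque model update) does.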

In our Salesforce example, QA might simulate a year of lead nurturing to refine the system’s objectives, while QC could run Monte Carlo simulations to map possible outcomes, identifying rare but costly failures. These methods, while resource-intensive, reflect the reality of AI-driven software.
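
A Monte Carlo run over the pipeline can be sketched in a few lines. The per-stage failure probabilities below are illustrative assumptions, not measured rates; the point is that chaining stages lets simulation surface compound failures that are individually rare but costly together.

```python
import random

def simulate_nurture_run(rng: random.Random) -> bool:
    """One Monte Carlo trial of the lead-nurturing pipeline.
    Stage failure probabilities are illustrative assumptions, not measured rates."""
    misscored = rng.random() < 0.05  # lead scoring picks a poor prospect
    off_tone = rng.random() < (0.10 if misscored else 0.02)  # a bad score worsens drafting
    misscheduled = rng.random() < 0.03  # calendar integration slips
    return off_tone and misscheduled  # the costly compound failure

def estimate_compound_failure_rate(trials: int = 100_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    failures = sum(simulate_nurture_run(rng) for _ in range(trials))
    return failures / trials

rate = estimate_compound_failure_rate()
```

With these assumed probabilities the compound failure lands well under 0.1% of runs, which is exactly the kind of rare-but-costly outcome that single-case testing would never exercise.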

Beyond Productivity: AI as a Paradigm Shift

AI’s impact transcends productivity. Automating email drafts or lead scoring saves time, but the true revolution lies in how it redefines software itself. Systems are no longer static artifacts but dynamic entities that learn, adapt, and occasionally surprise. This shift demands a corresponding evolution in QA and QC, moving from deterministic checklists to probabilistic frameworks that embrace uncertainty.

In CRM, where customer trust is the currency, getting this right is critical. Our Salesforce agentic AI must nurture leads effectively (QA’s domain) and execute flawlessly (QC’s responsibility). Missteps—whether prioritizing the wrong prospects or sending tone-deaf emails—can erode relationships built over years. By rethinking quality practices, businesses can harness AI’s potential while safeguarding what matters most.

Conclusion

AI, particularly agentic AI, is not just a tool but a new way of building software. Its non-determinism and interconnectedness challenge traditional QA and QC, demanding approaches that balance flexibility with rigor. In the CRM world, where Salesforce powers mission-critical processes, these changes are both a risk and an opportunity. By designing systems that anticipate variability, testing them probabilistically, and aligning them with user needs, we can build AI-driven software that’s not only innovative but trustworthy. The journey is complex, but the destination—a world where software evolves alongside its users—is worth the effort.