What should I look for when buying AI hiring software?

When buying AI hiring software, demand three things: explainability (the ability to see exactly why a candidate received a specific score, not just a label), control over training data (the AI should learn from your organisation's own standards via active learning, not a generic third-party dataset), and auditability (a complete scoring record that can be produced for any candidate in a compliance review). Vendors who cannot demonstrate all three create legal exposure for your organisation.

What is explainable AI in hiring?

Explainable AI in hiring means you can trace every candidate score back to the specific evidence that produced it: the candidate's actual response, the criteria applied to evaluate it, and why that response was rated the way it was. It is different from transparency (which describes the system in general) because it accounts for individual decisions. Explainability supports compliance with GDPR and NYC Local Law 144, and it is what lets you respond to candidate inquiries about why they were not selected.

What is active learning in hiring software?

Active learning in hiring software means the AI model is continuously updated based on feedback from your team. When a recruiter or hiring manager reviews a candidate response and signals whether it is strong or weak, that feedback trains the model to better understand your organisation's specific standards for the role. Without active learning, the AI uses pre-set criteria determined by the vendor — which may not reflect what good performance looks like in your organisation.

What is the legal risk of using black-box AI in hiring?

Using black-box AI in hiring creates significant legal exposure. If your organisation cannot explain why a candidate was not selected, you cannot defend that decision under employment discrimination law. The EEOC has clarified that employers remain legally responsible for the discriminatory impact of AI tools they use, even if those tools were developed by a third party. Black-box systems also make it impossible to detect patterns of adverse impact across demographic groups before they become a liability.

What is the difference between AI transparency and AI explainability?

Transparency describes how an AI system is built — the vendor publishes documentation about their model architecture and general methodology. Explainability describes a specific decision about a specific candidate — the ability to show what evidence produced a particular score and why. Transparency tells you about the system. Explainability tells you about the decision. Both matter, but when a candidate challenges their outcome or a regulator requests documentation, it is explainability you will need.

What training data do AI hiring vendors use?

Most AI hiring vendors train their models on aggregate third-party data collected across many organisations, roles, and industries. This means the AI's definition of a "good" candidate response may reflect standards that are irrelevant or even counterproductive for your specific roles. Best practice is to use AI that is trained on your organisation's own data — either through active learning from your team's feedback, or by supplying your own benchmarks. Always ask vendors to specify the source of their training data and whether it can be customised to your organisation.

Can AI hiring tools discriminate against candidates?

Yes. AI hiring tools can produce patterns of adverse impact — systematically disadvantaging candidates from certain demographic groups — without any discriminatory intent. This typically happens when training data reflects historical hiring patterns that were themselves biased, or when the model learns to use credential or background proxies that correlate with demographic characteristics. Reputable vendors conduct regular bias audits and share the results. Any vendor unwilling to share bias audit findings should be treated with significant caution.

What does NYC Local Law 144 require for AI hiring tools?

NYC Local Law 144 requires employers using automated employment decision tools (AEDTs) to conduct annual independent bias audits, publish the results publicly, and notify candidates before the tool is used in their assessment. It applies to employers and employment agencies that use AEDTs to screen candidates for roles based in New York City. The law is widely viewed as a model for broader AI hiring regulation and is a useful compliance baseline even for organisations outside New York.

How do I know if an AI hiring vendor's scoring is biased?

Ask the vendor for their most recent bias audit results, specifically looking for adverse impact ratios across demographic groups including gender, race, and age. A reputable vendor will share this documentation. Also ask how the model was trained and whether training data was audited for historical bias before use. Finally, test the platform with a set of matched candidate responses that differ only in demographic information — any systematic difference in scores signals a bias problem.
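The matched-response test described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already collected the platform's scores for pairs of responses that are identical except for demographic information. The function name `mean_paired_gap`, the sample scores and the 2-point tolerance are all hypothetical and are not taken from any vendor or regulation.

```python
from statistics import mean

def mean_paired_gap(pairs):
    """Average score difference (variant A minus variant B) across matched pairs.

    Each pair holds the scores for two responses that are identical except
    for demographic information, so an unbiased scorer should give a gap near zero.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one matched pair")
    return mean(a - b for a, b in pairs)

# Hypothetical scores returned by the platform for five matched pairs.
matched_scores = [(74, 70), (68, 66), (81, 77), (59, 56), (72, 69)]

gap = mean_paired_gap(matched_scores)

# Illustrative tolerance only; pick a real threshold with statistical and legal advice.
TOLERANCE = 2.0
needs_investigation = abs(gap) > TOLERANCE

print(f"mean gap = {gap:.1f}, investigate = {needs_investigation}")
```

With only a handful of pairs a gap like this could be noise, so treat the result as a prompt to ask the vendor harder questions rather than as proof of bias.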

If I cannot see the training data, is the AI explainable?

No. Genuine explainability requires access to what the model was trained to look for. If training data is undisclosed or described only in vague general terms, you cannot verify whether the grading criteria are appropriate for your roles, cannot audit for bias, and cannot defend a candidate score in a compliance review. Being able to show that a candidate's response was evaluated against specific, visible criteria is the foundation of explainability — and that is only possible when the underlying standards are accessible.


How to Evaluate AI Hiring Vendors: A Buyer's Guide to Transparent AI

The AI hiring market is growing fast and the claims are getting louder. Before you commit to any tool that uses artificial intelligence to grade, rank, or screen candidates, there are questions every buyer should be asking — and clear answers they should be demanding.

AI is now embedded in hiring tools across every category — applicant tracking, skills assessments, video interviews, resume screening, and reference checking. Each vendor promises faster decisions, less bias, and better hires. What most of them do not promise — and what most buyers fail to ask for — is transparency.

This guide is written for HR leaders, talent acquisition directors, and procurement teams evaluating AI-powered hiring software. It covers what explainability actually means, why a score alone is never enough, how training data shapes every decision the AI makes, and the legal exposure your organisation accepts when it relies on a system it cannot explain.

The questions at the end of this guide are designed to be taken directly into vendor conversations. Any vendor unwilling or unable to answer them clearly is telling you something important.

The promise and the problem with AI hiring tools

The promise is compelling. AI hiring tools can process thousands of candidate responses in the time it takes a recruiter to review ten. They can apply consistent scoring criteria at scale, reduce the variance introduced by different reviewers, and surface candidates who might be missed by keyword-matched résumé screening.

The problem is that most AI tools are designed to produce outputs, not explanations. They tell you a candidate scored 73 out of 100. They might even categorise that as "strong" or "needs development." But they rarely tell you what evidence produced that score, whose definition of good performance the model was trained on, or why a candidate who scored 58 was deemed unsuitable for your specific role.

That gap between output and explanation is where legal exposure lives, where bias hides, and where employer confidence in the tool erodes over time.

Explainability vs. transparency: they are not the same

These two terms are often used interchangeably in vendor materials. They are meaningfully different.

Transparency means you can see how a system is built. A vendor might publish a technical whitepaper, share their model architecture, or provide a general description of how their algorithm works. This is a starting point, but it is not sufficient on its own.

Explainability means you can account for a specific decision about a specific candidate. Not "our model considers communication skills, problem-solving, and attention to detail," but rather "this candidate received this score on this question because their response demonstrated these specific attributes, and here is the evidence."

Transparency describes the system. Explainability describes the decision. Both matter, but when a candidate asks why they were not selected, or a regulator asks you to justify a screening outcome, it is explainability you will need.

The score is not enough

Many AI hiring tools offer what they describe as "score explanations." They will tell you that a candidate scored highly on communication, or that they "demonstrated strong analytical reasoning." This kind of labelling feels informative but is rarely sufficient.

Consider the analogy of a student receiving an exam grade. Telling the student they scored 62% and that this was "below average" communicates an outcome. It does not communicate what they got wrong, which answers were marked down, or what a correct answer would have looked like. A student cannot improve from a number. A teacher cannot verify their marking was consistent. And an institution cannot defend that grade if the student challenges it.

The same logic applies to candidate scoring. A score with a descriptive label is an assertion. A score with the underlying evidence — the specific response, the criteria applied, and why that response was rated the way it was — is a justification. You need justifications, not assertions.

What to demand from any AI grading tool

The exact candidate response that was scored
The specific criteria applied to that response
Why the response met or did not meet those criteria
What a higher-scoring response would look like
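One way to picture these four demands is as the fields of a per-question scoring record. The sketch below is hypothetical: the class names `CriterionResult` and `ScoreRecord`, their fields and the sample values are assumptions made for illustration, not any vendor's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json

@dataclass
class CriterionResult:
    criterion: str   # the specific criterion applied to the response
    met: bool        # whether the response met it
    rationale: str   # why the response did or did not meet it

@dataclass
class ScoreRecord:
    candidate_id: str
    question: str
    response: str    # the exact candidate response that was scored
    score: int
    criteria: List[CriterionResult] = field(default_factory=list)
    higher_scoring_example: str = ""  # what a stronger response would look like

    def is_justified(self) -> bool:
        """True only when the score is backed by evidence, not just a number."""
        return bool(
            self.response
            and self.criteria
            and self.higher_scoring_example
            and all(c.rationale for c in self.criteria)
        )

# Hypothetical record for one question.
record = ScoreRecord(
    candidate_id="cand-001",
    question="How would you handle an upset customer?",
    response="I would listen first, restate the issue, then offer two options.",
    score=73,
    criteria=[
        CriterionResult("Acknowledges the customer's concern", True,
                        "Response opens with listening and restating the issue."),
        CriterionResult("Sets a follow-up timeline", False,
                        "No timeframe or next step is mentioned."),
    ],
    higher_scoring_example="Adds a specific follow-up time and an owner for the fix.",
)

print(json.dumps(asdict(record), indent=2))
print("justified:", record.is_justified())
```

A record that carries only `score` and a label would fail `is_justified()`, which is the difference between an assertion and a justification.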

The dataset problem: whose idea of "good" is the AI using?

Every AI model is trained on data. That data encodes a definition of what a "good" or "poor" candidate response looks like. The most important question you can ask any AI hiring vendor is: where did that definition come from?

If the model was trained on generic, third-party data — aggregated responses from across many different organisations, roles, and industries — then the AI's notion of good performance may have nothing to do with what good performance means in your organisation. A financial services firm and an early-stage tech startup may value very different things in a customer success hire. A generic model trained on aggregate data cannot know which standard applies to your role.

Worse, if the training data itself contained historical bias — skewing toward responses from candidates who were hired but later performed poorly, or from groups that were historically over-represented in your pipeline — the AI will replicate and amplify that bias at scale.

You must be able to see what data the AI was trained on, or at minimum understand clearly whether that data comes from your organisation's own hiring history or from an undisclosed third-party pool.

Active learning: why your data must drive the AI

The gold standard for AI hiring tools is a model that learns from your organisation's own standards — not someone else's. This is the principle behind active learning: the AI is continuously updated based on feedback from your team, so that its definition of a strong response reflects your specific requirements for each role.

In practice, this means that when a hiring manager or subject matter expert reviews a candidate response and marks it as strong or weak, that feedback is fed back into the model. Over time, the AI becomes calibrated to what your organisation genuinely values — not to a generic industry average.

The key consideration for buyers is whether this process is genuinely low-touch for your team. Active learning should not require your hiring managers to grade hundreds of responses manually before the AI becomes useful. A well-implemented system surfaces a small number of responses for review and uses that feedback to update the model intelligently, without creating a significant burden on your team.
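One common way to keep review low-touch is uncertainty sampling, where the model surfaces only the responses it is least sure about. The sketch below assumes the model outputs a probability that each response is "strong". The function name `select_for_review`, the response IDs and the probabilities are hypothetical, and a given vendor may use a different selection strategy.

```python
def select_for_review(predictions, budget=3):
    """Pick the responses the model is least certain about.

    `predictions` maps a response ID to the model's predicted probability
    that the response is 'strong'. Probabilities closest to 0.5 are the
    least certain, so those are surfaced to human reviewers first.
    """
    ranked = sorted(predictions.items(), key=lambda item: abs(item[1] - 0.5))
    return [response_id for response_id, _ in ranked[:budget]]

# Hypothetical model outputs for six candidate responses.
predictions = {
    "resp-01": 0.97,
    "resp-02": 0.52,
    "resp-03": 0.08,
    "resp-04": 0.45,
    "resp-05": 0.61,
    "resp-06": 0.88,
}

to_review = select_for_review(predictions, budget=3)
print(to_review)  # ['resp-02', 'resp-04', 'resp-05']
```

The reviewers' strong or weak labels for these few responses would then be added to the training data, so the team's effort goes where it changes the model most.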

If a vendor cannot clearly explain how their model is updated based on your organisation's feedback, or if the answer is that it is not — that the model is static and pre-trained — then you are not in control of what "good" means in your hiring process. The AI is.

The control question

If you are not actively teaching the AI what good performance looks like for your roles, then the AI is making those decisions independently — using standards you did not set, cannot inspect, and may not be able to justify.

If you cannot see the data, it is not explainable

Explainability is not just a feature. It is a prerequisite for responsible use of AI in hiring. And real explainability requires access to the data that drove the decision.

When an AI grades a candidate response, the explanation for that grade is only meaningful if you can trace it back to what the model was trained to look for. If the training data is proprietary, hidden, or described only in vague terms in a whitepaper, then you cannot truly verify whether the AI's grading criteria are appropriate for your role. You are taking the vendor's word for it.

This matters in several practical situations. If a candidate challenges their assessment outcome, you need to be able to show the specific evidence that informed their score. If an internal audit flags a pattern of disparate impact across demographic groups, you need to be able to examine the data to understand why. If a regulator requests documentation of your selection process, you need a paper trail that links scores to observable evidence — not to an opaque model.

The ability to show the data is not optional for organisations that take their compliance obligations seriously. Ask every vendor to show you, concretely, what an auditor would see if they examined a specific candidate's scoring record.

Black boxes and legal risk

A black-box AI system is one where inputs go in, outputs come out, and no one — not even the vendor — can fully account for what happened in between. In consumer applications, this is sometimes an acceptable trade-off for performance. In employment decisions, it is not.

The core legal risk is simple: if your organisation cannot explain why a candidate was not selected, you are vulnerable. Employment discrimination law in most jurisdictions requires that selection decisions be defensible on non-discriminatory grounds. "The algorithm said so" is not a defence. In fact, an inability to explain the basis for a decision is, in some legal interpretations, itself evidence of an inadequate process.

This risk is compounded by the fact that AI systems can produce patterns of disparate impact without any discriminatory intent. A model trained on historical data from a workforce with limited diversity may systematically score certain groups lower — not because anyone designed it to, but because the training data reflected historical inequities. If you cannot see how the model works and what data it uses, you have no way to detect or correct these patterns before they become a liability.
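A standard first check for such patterns is the impact ratio: each group's selection rate divided by the rate of the most-selected group. The sketch below uses hypothetical group names and counts. The 0.8 threshold reflects the widely used four-fifths rule of thumb, which is a screening heuristic rather than a legal determination.

```python
def impact_ratios(outcomes):
    """Compute impact ratios from selection counts.

    `outcomes` maps a group name to (selected, total), with total > 0.
    Returns each group's selection rate divided by the highest group rate.
    """
    rates = {group: selected / total for group, (selected, total) in outcomes.items()}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any selected candidates")
    return {group: rate / best for group, rate in rates.items()}

# Hypothetical screening outcomes: (candidates advanced, candidates assessed).
outcomes = {
    "group_a": (48, 120),  # selection rate 0.40
    "group_b": (27, 90),   # selection rate 0.30
    "group_c": (20, 80),   # selection rate 0.25
}

ratios = impact_ratios(outcomes)

# Four-fifths rule of thumb: flag groups whose ratio falls below 0.8.
flagged = sorted(group for group, ratio in ratios.items() if ratio < 0.8)

for group, ratio in ratios.items():
    print(f"{group}: impact ratio = {ratio:.3f}")
print("flagged:", flagged)
```

Running this check requires access to outcomes broken down by group, which is exactly what a black-box tool makes hard to obtain.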

What a black box costs you

You cannot respond to candidate inquiries about why they were screened out
You cannot detect whether the AI is producing systematically biased outcomes
You cannot provide adequate documentation in an audit or legal proceeding
You cannot verify that the AI's grading criteria align with your actual performance standards

The regulatory landscape

Regulation of AI in hiring is accelerating globally. Organisations evaluating vendors now should be buying ahead of this curve, not scrambling to catch up.

New York City Local Law 144

NYC Local Law 144 requires employers using automated employment decision tools (AEDTs) to conduct annual independent bias audits, publish the results publicly, and notify candidates before the tool is used in their assessment. It applies to employers and employment agencies that use AEDTs to screen candidates for roles based in New York City. The law is among the most prescriptive in the world and is widely viewed as a model for broader regulation.

EU AI Act

The European Union's AI Act classifies AI tools used in employment as "high-risk" systems, subject to strict requirements around transparency, data governance, human oversight, and documentation. Organisations using high-risk AI systems are required to maintain detailed technical documentation and provide meaningful explanations of AI-assisted decisions to individuals.

EEOC guidance on AI and disparate impact

The US Equal Employment Opportunity Commission has issued guidance making clear that employers remain responsible for the discriminatory impact of AI tools they use, even if those tools were developed by a third party. Delegating selection decisions to an AI vendor does not transfer the employer's legal obligations: the vendor makes the call, but the liability stays with the employer.

GDPR and data subject rights

Under GDPR, individuals have rights in relation to automated decision-making, including the right to meaningful information about the logic involved in decisions made solely by automated processing and the right to request human intervention in those decisions. For organisations operating in the EU or processing data of EU residents, this creates a direct requirement to be able to explain AI-assisted hiring decisions to candidates on request.

10 questions to ask every AI hiring vendor

Take these into every vendor demo and evaluation call. Strong vendors will answer them clearly and with supporting documentation. Evasive or vague answers are significant red flags.

1. What is the source of your training data?

You need to know whether the AI was trained on generic third-party data, on your organisation's data, or on a combination. Generic training data means the AI's standards are not your standards.

2. Can I see the training data that informs scoring for my specific assessments?

Not just a description — the actual data. If a vendor cannot show you the data, or says the data is proprietary and cannot be shared, you cannot verify that their grading criteria are appropriate for your roles.

3. How does your model update based on feedback from my organisation?

You need active learning — the AI should improve based on your team's feedback over time. If the model is static and pre-trained, it is using standards that were set without your input and will not adapt to your specific needs.

4. How much time does active learning require from my team?

Active learning should be low-touch. If keeping the AI calibrated to your standards requires significant manual effort from hiring managers, the workflow will break down in practice.

5. For a specific candidate score, can you show me the exact evidence that produced it?

Ask to see a worked example. The vendor should be able to show you the candidate's response, the criteria applied, and why the response was scored the way it was — not just a label or a general description.

6. What would an auditor see if they requested the scoring record for a rejected candidate?

This is the compliance test. The answer should be: a full record of the candidate's responses, the criteria used to grade each one, and the basis for the overall score. Anything less is inadequate for audit purposes.

7. Have you conducted a bias audit of your AI tool? Can I see the results?

Reputable vendors conduct regular audits for adverse impact across demographic groups and share those results. If a vendor has not conducted a bias audit or will not share the findings, that is a serious concern.

8. What is your process for detecting and correcting disparate impact?

Not just how they audit for it, but what they do when they find it. The answer should include a clear process for investigating causes and updating the model.

9. What documentation do you provide to support compliance with the EU AI Act, NYC Local Law 144, or EEOC guidelines?

Vendors serious about compliance will have documentation ready. Be specific about the regulations that apply to your organisation's footprint.

10. What human oversight does your platform support, and how can reviewers override AI-generated scores?

AI should support human decision-making, not replace it. A strong platform makes it easy for reviewers to see the AI's reasoning, disagree with it, and override it where necessary.

What good looks like

A genuinely transparent AI hiring tool is one where every grading decision can be traced back to observable evidence, where the training data came from your organisation and continues to be shaped by your team's feedback, and where an auditor, a regulator, or a rejected candidate can be shown exactly why a score was what it was.

This is not a futuristic standard. It is a reasonable baseline that responsible vendors should already meet. The difference between tools that meet it and tools that do not is the difference between AI that genuinely supports better hiring decisions and AI that creates the appearance of rigour without the accountability to back it up.

How Vervoe approaches this

Vervoe's AI grading is built around active learning from your organisation's own data. When your team reviews a candidate response, that feedback trains the model to understand what good performance looks like for your specific role — not a generic industry average. The training data is yours, the grading criteria are visible, and every score is backed by the candidate's actual response.

The result is a grading system you can explain to a candidate, defend in an audit, and trust to reflect your own standards — not someone else's.
