Document fraud & KYC glossary

Plain-English definitions of the terms that come up in document fraud detection, AI KYC, and forensic verification. Each entry includes a one-sentence definition and a longer explanation. Linked entries show where each concept fits in a real workflow.

Document fraud detection

Document fraud detection is the process of analysing a document file - PDF, image, or scan - for evidence of editing, forgery, or AI generation.

Document fraud detection combines structural analysis (PDF object streams, font tables, metadata), pixel-level forensics (Error Level Analysis, noise residue mapping), and AI-generation detection to identify documents that have been tampered with or fabricated. Modern detection systems run 100+ checks per document and return a verdict in under a minute.

Tamper check

Tamper check is a forensic analysis that determines whether a document has been edited, forged, or AI-generated.

A complete tamper check inspects the file's internal structure, the pixel-level properties of any images, and the logical consistency of the content (arithmetic, dates, field relationships). Visual inspection alone catches less than 20% of sophisticated edits; a forensic tamper check catches 90%+.

Related: Run a tamper check

AI KYC

AI KYC is the use of artificial intelligence to automate Know Your Customer verification - reading documents, matching faces, and detecting forgery.

Modern AI KYC stacks combine OCR, face matching, liveness detection, and document forensics. The biggest gap in most AI KYC pipelines is document forensics, which is where deepfake and AI-generated IDs slip through. Forensic layers like TamperCheck sit alongside traditional KYC providers to close this gap.

Related: AI KYC solution · KYC use case

Deepfake document

Deepfake document is a document - typically an ID, payslip, or bank statement - fabricated by a generative AI model rather than edited from a real source.

Deepfake documents pass visual inspection because every field is plausibly formatted and the layout matches real templates. They're detected forensically: generative AI models leave spectral signatures, characteristic noise patterns, and compression artifacts that real scanners and printers don't produce.

Error Level Analysis (ELA)

Error Level Analysis (ELA) is a forensic technique that detects edited regions in an image by recompressing it and mapping the differences.

When a JPEG is saved repeatedly, each compression introduces predictable error patterns. Edited regions have different compression histories than the original, so ELA highlights them as bright spots in the analysis map. ELA is one of the foundational checks in pixel-level document forensics.

MRZ (Machine Readable Zone)

MRZ (Machine Readable Zone) is the two- or three-line string of OCR-readable characters at the bottom of a passport or ID, encoding the document's key fields with checksums.

MRZ lines follow the ICAO Doc 9303 format and include the holder's name, document number, nationality, date of birth, sex, expiry date, and check digits. Forensic verification confirms that the MRZ characters are valid, the check digits match, and the encoded fields agree with the visible portions of the document.
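
As an illustration of the checksum step, here is a minimal sketch of the ICAO Doc 9303 check-digit calculation: each character is mapped to a value (digits as themselves, A-Z as 10-35, the filler `<` as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are illustrative, not part of any particular product's API.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value under ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":  # filler character
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Compute the check digit for an MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, stated_check_digit: str) -> bool:
    """Return True when the stated check digit matches the computed one."""
    return stated_check_digit == str(mrz_check_digit(field))


# Values from the ICAO specimen passport
print(mrz_check_digit("L898902C3"))  # document number -> 6
print(mrz_check_digit("740812"))     # date of birth   -> 2
```

A field whose visible value has been edited without recomputing the check digit fails `mrz_field_is_valid`, which is why this is usually one of the first MRZ checks to run.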

Synthetic identity fraud

Synthetic identity fraud is fraud committed using a fabricated identity that doesn't correspond to any real person - typically built from a mix of real and fake personal data.

Synthetic identities are common in credit fraud: a fraudster combines a real SSN (often belonging to a child or deceased person) with a fabricated name, address, and DOB to build credit. Document fraud detection plays a role by catching the supporting documents (IDs, payslips, utility bills) that synthetic-identity applications submit.

Document tampering

Document tampering is any unauthorised modification of a document - editing fields, inserting transactions, swapping photos, or altering metadata.

Document tampering ranges from crude edits (Photoshop on a JPG) to sophisticated structural edits (modifying PDF objects directly) to fully synthetic generation. Tampering detection identifies edits across all of these layers, often catching changes that are invisible to the human eye.

Related: Automated tampering detection

Automated document verification

Automated document verification is software-driven verification of document authenticity, identity matching, and data extraction, with no human reviewer in the happy path.

Automated verification typically combines OCR (read the document), data validation (do the fields make sense), forensic analysis (is the document authentic), and policy logic (does this document satisfy the workflow's requirements). Verdicts return in under a minute, and human reviewers only see flagged cases.

Forensic document analysis

Forensic document analysis is the application of scientific techniques to determine the authenticity, origin, and editing history of a document.

Forensic document analysis predates computers - examiners historically used microscopes, ink analysis, and handwriting comparison. Modern digital forensic analysis inspects PDF internals, font metrics, pixel-level signals, and metadata, and increasingly uses ML models trained on examples of known forgeries.

Liveness detection

Liveness detection is a check that confirms the person submitting a selfie or video is physically present, not a photo, video replay, or deepfake.

Liveness detection sits alongside face matching - it's about the person, not the document. Document forensics (TamperCheck's focus) is the parallel check on the documents themselves. A complete KYC stack uses both: liveness on the selfie, forensics on the ID.

PDF producer metadata

PDF producer metadata is the set of metadata fields embedded in a PDF that identify the software that created or last modified it.

Genuine documents from a known issuer (a bank, a payroll provider, a government agency) have consistent producer metadata. Tampered documents often show traces of editing tools - Adobe Acrobat, online PDF editors, LibreOffice - that the issuer would not use. This is one of the strongest single tamper signals.
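
A rough sketch of the producer check, under loud assumptions: it only finds a `/Producer` entry written as a plain literal string in an uncompressed part of the file, and it does not handle escaped parentheses, hex or UTF-16 strings, XMP metadata, or compressed object streams (a real PDF parser is needed for those). The issuer name and the allowlist are hypothetical.

```python
import re

# Naive pattern: /Producer followed by a literal string with no nested parentheses.
_PRODUCER_PATTERN = re.compile(rb"/Producer\s*\(([^)]*)\)")


def extract_producer(pdf_bytes: bytes):
    """Return the first plain-text /Producer value found, or None."""
    match = _PRODUCER_PATTERN.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


def producer_is_expected(producer, expected_substrings) -> bool:
    """Check the producer string against an allowlist for a known issuer."""
    if producer is None:
        return False
    lowered = producer.lower()
    return any(expected.lower() in lowered for expected in expected_substrings)


# Hypothetical allowlist for a hypothetical issuer
EXPECTED_FOR_EXAMPLE_BANK = ["ExampleBank Statement Engine"]

sample = b"<< /Producer (Online PDF Editor 3.1) /Creator (Unknown) >>"
producer = extract_producer(sample)
print(producer, producer_is_expected(producer, EXPECTED_FOR_EXAMPLE_BANK))
```

A mismatch is a signal rather than proof: it is best treated as one finding that feeds the overall verdict.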

Running balance arithmetic

Running balance arithmetic is the check that the running balance on a bank statement - the cumulative total after each transaction - reconciles across every row.

Forensic verification of a bank statement includes confirming that every transaction's running balance equals the previous balance plus or minus the transaction amount. A single edited number breaks the chain, which makes this one of the easiest checks to automate, even against otherwise sophisticated edits.
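
The reconciliation can be sketched in a few lines. This is a minimal illustration that assumes the rows have already been extracted as signed amounts (credits positive, debits negative) and stated balances; the function name and row layout are assumptions, not a specific product's format.

```python
from decimal import Decimal


def find_balance_breaks(opening_balance, rows):
    """Return the indices of rows whose stated balance does not reconcile.

    rows is a list of (signed_amount, stated_balance) pairs given as strings,
    so Decimal avoids floating-point rounding errors on currency values.
    """
    breaks = []
    previous = Decimal(opening_balance)
    for index, (amount, stated) in enumerate(rows):
        expected = previous + Decimal(amount)
        stated_balance = Decimal(stated)
        if stated_balance != expected:
            breaks.append(index)
        # Continue from the stated balance so one edit is not
        # reported again on every later row.
        previous = stated_balance
    return breaks


clean = [("-200.00", "800.00"), ("1500.00", "2300.00"), ("-50.00", "2250.00")]
# Same statement with the second row's balance inflated by 1000.00
edited = [("-200.00", "800.00"), ("1500.00", "3300.00"), ("-50.00", "2250.00")]

print(find_balance_breaks("1000.00", clean))   # []
print(find_balance_breaks("1000.00", edited))  # [1, 2]
```

An edited balance typically breaks the chain twice: once at the edited row and once at the next row, where the untouched balance no longer follows from it.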

JPEG ghost

JPEG ghost is a pixel-level forensic technique that reveals regions of an image that were saved at a different JPEG quality than the rest.

When an image is edited and re-saved, the edited region carries the compression history of its source, which often differs from the rest of the image. JPEG ghost analysis recompresses the image at various quality levels and highlights the regions whose compression history doesn't match.

Photo zone substitution

Photo zone substitution is an identity document attack where the original portrait has been replaced with a different person's photo.

Photo zone substitution is detected via boundary contrast analysis (the edge of the photo zone is unusually sharp or unusually blurred), compression mismatch (the photo region has a different JPEG signature than the rest of the document), and sharpness gradients (the photo is sharper or softer than the surrounding card).

Document verification API

Document verification API is a REST API that accepts a document upload and returns a structured verdict - authentic, tampered, or inconclusive - typically with findings and a risk score.

A document verification API is the integration shape most B2B teams use to add forensic checks to their workflows. One POST per document, one structured response per verdict. Async variants use webhooks for batch processing.
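
The one-POST, one-verdict shape might look like the sketch below. Everything specific here is hypothetical: the URL, the bearer-token header, the raw-bytes upload, and the response field names (`verdict`, `risk_score`, `findings`) are placeholders for illustration, not the TamperCheck API or any other real provider's contract.

```python
import json
import urllib.request
from dataclasses import dataclass

API_URL = "https://api.example.com/v1/verify"  # hypothetical endpoint
VALID_VERDICTS = {"authentic", "tampered", "inconclusive"}


@dataclass
class Verdict:
    verdict: str
    risk_score: float
    findings: list


def build_verify_request(document_bytes: bytes, api_key: str, url: str = API_URL):
    """Build (but do not send) a POST request carrying one document."""
    return urllib.request.Request(
        url,
        data=document_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",        # assumed auth scheme
            "Content-Type": "application/octet-stream",  # assumed upload format
        },
    )


def parse_verdict(body: bytes) -> Verdict:
    """Parse the assumed JSON response into a structured verdict."""
    payload = json.loads(body)
    verdict = payload["verdict"]
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return Verdict(verdict, float(payload["risk_score"]), list(payload.get("findings", [])))
```

In a synchronous integration the request would be sent with `urllib.request.urlopen(request)` and the response body passed to `parse_verdict`; in an async variant the same parsing would run inside the webhook handler.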

Related: TamperCheck API docs

Risk score

Risk score is a calibrated 0–100 numeric output of a document verification system, where lower means more likely authentic and higher means more likely tampered.

Risk scores let workflows make policy decisions: auto-approve below a threshold, auto-flag above one, and route the ambiguous middle to a human reviewer. Good risk scores are calibrated per document type because the distribution of edits differs between, say, bank statements and passports.
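
A threshold policy of this kind could be sketched as follows. The threshold values and document-type names are made up for illustration; real thresholds would come from calibration on each document type.

```python
# Hypothetical per-document-type thresholds: (approve_below, flag_above)
THRESHOLDS = {
    "bank_statement": (25, 75),
    "passport": (15, 85),
}
DEFAULT_THRESHOLDS = (20, 80)


def route_by_risk(score: float, document_type: str) -> str:
    """Map a 0-100 risk score to a workflow decision."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    approve_below, flag_above = THRESHOLDS.get(document_type, DEFAULT_THRESHOLDS)
    if score < approve_below:
        return "auto_approve"
    if score > flag_above:
        return "auto_flag"
    return "human_review"  # the ambiguous middle band


print(route_by_risk(10, "passport"))        # auto_approve
print(route_by_risk(20, "passport"))        # human_review
print(route_by_risk(20, "bank_statement"))  # auto_approve
```

The same score of 20 routes differently for the two document types, which is the practical effect of per-type calibration.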

Zero document storage

Zero document storage is a processing model in which documents are analysed and immediately discarded, with no copy retained on the verification provider's infrastructure.

Zero storage is a compliance posture - it removes the verification provider from the data retention map entirely. This is increasingly required by KYC, lending, and insurance compliance regimes and is a core design principle in modern document forensics platforms.

OCR (Optical Character Recognition)

OCR (Optical Character Recognition) is the process of extracting text from an image or PDF.

OCR is necessary but not sufficient for document verification - it tells you what the document says, not whether the document is authentic. Forensic verification complements OCR by checking whether the text the OCR extracted has been edited, fabricated, or generated by AI.

Document deepfake

Document deepfake is a fully synthetic document produced by a generative AI model from a text prompt or template.

Document deepfakes differ from edited documents because there's no original to compare against - the entire file is fabricated. Detection relies on spectral signatures and noise patterns characteristic of generative models, which differ from the patterns produced by real scanners and printers.

Income verification

Income verification is the process of confirming an applicant's stated income, typically using payslips, bank statements, or tax documents.

Income verification is the primary fraud surface in personal lending, mortgages, and rental applications. Forensic document analysis is the modern complement to traditional income verification - confirming not just the figures but that the documents themselves are authentic.

KYC (Know Your Customer)

KYC (Know Your Customer) is the regulatory and operational process of verifying the identity of a customer at onboarding and during the customer lifecycle.

KYC requirements vary by jurisdiction but typically include identity verification, address verification, and ongoing monitoring for sanctions and PEP status. Document forensics is a key layer in modern KYC - catching forged or AI-generated identity documents that pass template-based checks.

AML (Anti-Money Laundering)

AML (Anti-Money Laundering) is the regulatory framework requiring financial institutions to detect, prevent, and report money laundering.

AML and KYC are intertwined - KYC is the identification step, AML the ongoing monitoring. Document fraud detection supports AML compliance by catching the falsified documents (bank statements, source-of-funds letters, corporate registrations) that money launderers submit.

Liveness vs document forensics

Liveness vs document forensics is the distinction between two checks: liveness confirms the person is real; document forensics confirms the document is real. They solve different halves of the same problem.

A complete identity verification stack runs liveness on the selfie or video and forensic checks on the submitted documents. Either one alone leaves a major attack surface open - a real person can submit forged documents, and a real document can be paired with a deepfake selfie.

BNPL fraud (Buy Now, Pay Later fraud)

BNPL fraud (Buy Now, Pay Later fraud) is fraud committed against Buy Now, Pay Later providers using synthetic identities or forged income documents.

BNPL is a high-volume, fast-decision credit product, which makes it a prime target for synthetic-identity and document fraud. Most BNPL fraud loss traces to forged or AI-generated payslips, bank statements, and identity documents - detectable via document forensics at the application step.