Data Cleansing in Preparation for SEPA: Part One

While Gartner defines Big Data as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”, others take a more pragmatic approach: “I know it when I see it”. Both are helpful, but neither really addresses the problem or the solution.

It may be more useful to define Big Data as “a partially formatted information stream too large to store permanently, which can only be processed as it passes through”. This definition is potentially easier to use, and it adapts to the capacity available for storage, the type of processing required and how much computing power can be deployed to do it. What is Big Data today may be a more tractable problem tomorrow. Today’s Big Data methods tend to restrict the window of time, triage the data or intentionally discard data if overloaded.

So is payments information Big Data? Looking at a year of data from the UK’s six billion automated clearing house (ACH) payments gives an estimated storage cost of around £700 per year, because the data is easy to store on cheap, modern hard drives. If the processing is merely grouping payments by destination account, it is likely to be tractable. If, however, it involves deriving useful additional information, such as spotting trends and identifying fraud networks, that may take more processing.

Because payments may be mission-critical for businesses and are crucial for consumers, ensuring the quality of the underlying information is essential, and migrating to new formats adds unwelcome risk to the process. Information can be validated locally, but performing more than superficial validation at the point where all the separate payment streams come together may become a Big Data problem, more because of the size of the detection and exception processing than because of the size of the data file itself. Ensuring that data is valid upfront, before it is consolidated into the payments file, is therefore vital.

Payments Cleansing and Validation Method

Ensuring data integrity in payments is about cleansing input data to give confidence in payments. Businesses and consumers rely on data that is as accurate as it can be, firstly because customers expect payments to succeed in 100% of cases and, secondly, because businesses face considerable costs when payments fail: around €50 to €70 to rectify each failed transaction, according to the Euro Banking Association. Being unprepared for the February 2014 SEPA deadline, which essentially means failing to execute a data cleansing and validation process, could cost businesses dearly as they face heavy additional charges from banks for repairing faulty transactions.

Where organisations are forced to standardise data for SEPA, it is vitally important to cleanse the data source before using it, to ensure accuracy and avoid payment failure. Data cleansing results in lower error rates and, in payment terms, more reliable payment services. The ideal method is to create an extract from the database, restructure it as appropriate and then perform an initial validation. Some customers have good data hygiene, and for them this is sufficient to identify exceptions. In many cases, however, with different data entry standards in different countries, it is necessary to address issues with the way the data is stored.

In the second case, manual adjustment of the input data is necessary to achieve the highest possible conversion rate, for example by addressing problems with invalid bank identifiers in the Netherlands. On average this resolves around 50% of the errors. At this stage the data is separated into good data, which can simply be converted to the international bank account number (IBAN) standard, and bad data, which requires contact with the account holder to obtain correct information.

As the deadline looms, many organisations may be tempted to convert basic bank account numbers (BBAN) to IBAN using a simple, algorithmic method, but this will perpetuate the level of error inherent in banking data at the moment. They need to validate their current data before migrating to IBAN format and becoming SEPA-compliant. A thorough validation process is crucial to ensure that the migrated data is not only in the correct format and inclusive of relevant content, such as bank identifier codes (BICs), but has also passed an integrity check. The most valuable and cost-effective way for businesses to migrate their data is to separate good data from bad via a validation solution and then address the bad data, which should in total cost less than €1 per account number held in the database.
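The difference between a purely algorithmic conversion and a validate-first migration can be sketched as follows. This is a minimal illustration, not a production method: the check-digit arithmetic follows the standard IBAN mod-97 scheme, while `german_bban_to_iban`, `migrate` and the `is_valid_domestic` callback are hypothetical names introduced here. The callback stands in for whatever format, content and integrity checks a real validation solution applies.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, str]  # (bank_code, account_number), an assumed layout


def iban_check_digits(country: str, bban: str) -> str:
    """Return the two IBAN check digits (mod-97 scheme) for a country and BBAN."""
    # Append the country code and a "00" placeholder, then map letters to
    # numbers (A=10 ... Z=35); digits map to themselves.
    rearranged = bban + country + "00"
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return f"{98 - int(numeric) % 97:02d}"


def german_bban_to_iban(bank_code: str, account: str) -> str:
    """Mechanical conversion only: it cannot tell whether the account is real."""
    bban = bank_code + account.zfill(10)  # naive left-padding to 10 digits
    return "DE" + iban_check_digits("DE", bban) + bban


def migrate(
    records: Iterable[Record],
    is_valid_domestic: Callable[[str, str], bool],
) -> Tuple[List[str], List[Record]]:
    """Convert records that pass validation; set the rest aside for repair."""
    good: List[str] = []
    bad: List[Record] = []
    for bank_code, account in records:
        if is_valid_domestic(bank_code, account):
            good.append(german_bban_to_iban(bank_code, account))
        else:
            bad.append((bank_code, account))
    return good, bad


# The widely published German example IBAN
print(german_bban_to_iban("37040044", "532013000"))  # DE89370400440532013000
```

Note that `german_bban_to_iban` will happily produce a well-formed IBAN for any digits it is given, which is exactly why the validation step has to come first.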

It is important to identify all the conditions which may affect payments information to ensure that domestic data is correct for the purpose intended. For example, knowing up-front whether an account can support the SEPA Direct Debit Business to Business (SDD B2B) scheme will provide insight into how commercial collections can be performed.

To minimise the time it takes to validate and convert, it is most efficient to use automation where necessary, but there is also value in a solution that can build in consultancy expertise to help fix current data where problems occur. With properly validated data, a business’s new database has a vastly reduced error rate, so the business can derive value from SEPA in the form of more efficient systems and streamlined processes, as well as increased success rates and better customer service.

Challenges of Data Cleansing and why Validation is Essential

There are many reasons why an account number can be invalid for use: closure of the associated branch, merger of the bank, update of the clearing and bank codes, invalid formatting or a simple mis-transcription of any part of the account number. It is also possible for an account to be valid, but invalid for the purpose intended. For example, a business customer account currently used for a domestic direct debit may be intended to move to the SDD B2B scheme, but the owning bank does not support it.

We group these errors into three types:

  • Format – where the account information breaches the rules of the country or bank regarding length or the types of characters it contains.
  • Content – where the bank codes contained within the account are invalid or no longer correct.
  • Integrity – where the account number breaches numeric or syntax rules for the branch.

From Experian’s analysis of customers’ data from its SEPA conversion service, the most common error is related to content, closely followed by format. Although integrity errors account for around 1.5%, this is likely to be an underestimate because integrity errors may not be apparent if there are also format or content errors.
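The three error types, and the way an earlier failure can hide a later one, can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the German-style field lengths in `format_ok`, the tiny `KNOWN_BANK_CODES` directory, and in particular the `integrity_ok` rule, which is a toy check digit and not any real bank's method.

```python
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    FORMAT = "format"
    CONTENT = "content"
    INTEGRITY = "integrity"


# Hypothetical directory of bank codes that are currently live.
KNOWN_BANK_CODES = {"37040044"}


def format_ok(bank_code: str, account: str) -> bool:
    # Assumed layout: 8-digit bank code, account number of 1 to 10 digits.
    return (
        bank_code.isdigit() and len(bank_code) == 8
        and account.isdigit() and 1 <= len(account) <= 10
    )


def content_ok(bank_code: str, account: str) -> bool:
    return bank_code in KNOWN_BANK_CODES


def integrity_ok(bank_code: str, account: str) -> bool:
    # TOY RULE, for illustration only: the last digit must equal the sum of
    # the preceding digits modulo 10. Real rules vary by bank and branch.
    *body, last = (int(d) for d in account)
    return sum(body) % 10 == last


def classify(bank_code: str, account: str) -> Optional[ErrorType]:
    """Return the first error type found, or None if all checks pass."""
    if not format_ok(bank_code, account):
        return ErrorType.FORMAT
    if not content_ok(bank_code, account):
        return ErrorType.CONTENT
    if not integrity_ok(bank_code, account):
        return ErrorType.INTEGRITY
    return None


print(classify("3704004", "1236"))    # ErrorType.FORMAT
print(classify("99999999", "1236"))   # ErrorType.CONTENT
print(classify("37040044", "1234"))   # ErrorType.INTEGRITY
print(classify("37040044", "1236"))   # None
```

Because the checks run in order and stop at the first failure, a record counted under format or content is never tested for integrity, which is one way an integrity error rate can end up understated.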

Bank account data differs from country to country: the number and length of the fields are rarely the same in two EU member states. Germany uses an 8-digit bank code and a 10-digit account number, whereas France uses four pieces of information: two bank codes (bank and branch), an account number and a pair of check digits. The IBAN was introduced to smooth over these differences and has been important in standardising the way these numbers are presented. However, the key information, and therefore the variation, is still contained within it.
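To show how the country-specific fields survive inside the IBAN, the sketch below splits an IBAN back into its domestic parts. The field widths are the published German (8 + 10) and French (5 + 5 + 11 + 2) BBAN layouts; the names `BBAN_LAYOUTS` and `split_iban` are hypothetical, and only these two countries are covered.

```python
from typing import Dict, List, Tuple

# (field name, width) pairs per country; only two countries for illustration.
BBAN_LAYOUTS: Dict[str, List[Tuple[str, int]]] = {
    "DE": [("bank_code", 8), ("account_number", 10)],
    "FR": [
        ("bank_code", 5),
        ("branch_code", 5),
        ("account_number", 11),
        ("check_digits", 2),
    ],
}


def split_iban(iban: str) -> Dict[str, str]:
    """Split an IBAN into country, IBAN check digits and domestic fields."""
    compact = iban.replace(" ", "").upper()
    country, bban = compact[:2], compact[4:]
    layout = BBAN_LAYOUTS.get(country)
    if layout is None:
        raise ValueError(f"no layout known for country {country!r}")
    if len(bban) != sum(width for _, width in layout):
        raise ValueError("BBAN length does not match the country layout")

    fields = {"country": country, "iban_check": compact[2:4]}
    pos = 0
    for name, width in layout:
        fields[name] = bban[pos:pos + width]
        pos += width
    return fields


print(split_iban("DE89 3704 0044 0532 0130 00"))
print(split_iban("FR14 2004 1010 0505 0001 3M02 606"))
```

The two results have different sets of keys, which is the point: the IBAN gives a common wrapper, but any validation of the contents still has to be country-aware.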

In addition, there are domestic practices which can further complicate the process. In Germany the Unterkontonummer (sub-account number) is a two-digit field originally intended to separate the purpose of a transaction within an account. Virtually all transactions are for the default purpose, so the sub-account number is set to zero and customarily omitted by German consumers. For example, account 7876543100 might be presented as 78765431. Simply converting to the IBAN format would pad it with leading zeros, giving 0078765431: a potentially valid, but different, account.
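The padding problem is easy to reproduce. In the sketch below, `pad_naive` is what a simple conversion does, while `pad_with_default_subaccount` is a hypothetical repair that assumes an 8-digit number has lost its "00" sub-account. That assumption is made purely for illustration; whether it holds depends on the bank, so in practice it needs to be confirmed rather than applied blindly.

```python
def pad_naive(account: str) -> str:
    """Left-pad with zeros to 10 digits, as a simple conversion would."""
    return account.zfill(10)


def pad_with_default_subaccount(account: str, core_length: int = 8) -> str:
    """Hypothetical repair: restore an omitted "00" sub-account.

    Assumes that a number of exactly `core_length` digits is missing its
    two-digit sub-account. This is an illustrative heuristic only.
    """
    if len(account) == core_length:
        return account + "00"
    return account.zfill(10)


presented = "78765431"  # customer omitted the sub-account "00"
print(pad_naive(presented))                    # 0078765431
print(pad_with_default_subaccount(presented))  # 7876543100
```

Both results are ten digits long and would pass a simple length check, so the error only surfaces if the number is validated against the bank's own rules or the payment fails.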

In the Netherlands, where Experian operates the IBAN BIC service on behalf of the Dutch Banking Association (NVB), there is a different problem: there are no branch codes, so it is impossible to tell reliably, solely from the account number, which bank holds an account. To supply this missing information, the Netherlands IBAN includes (as the UK IBAN does) a bank identifier such as INGB. However, this cannot be populated without knowing which bank the account is with, and most Dutch businesses do not store this information as it has historically not been relevant for domestic clearing.
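A short sketch makes the dependency concrete. The Dutch IBAN layout (NL, two check digits, a four-letter bank identifier, a ten-digit account number) and the mod-97 check digits are standard; the `bank_of` mapping is a hypothetical stand-in for an external lookup, since the account number alone does not contain the answer.

```python
from typing import Mapping, Optional


def iban_check_digits(country: str, bban: str) -> str:
    """Return the two IBAN check digits (mod-97 scheme) for a country and BBAN."""
    # Letters map to numbers (A=10 ... Z=35); digits map to themselves.
    rearranged = bban + country + "00"
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return f"{98 - int(numeric) % 97:02d}"


def dutch_account_to_iban(account: str, bank_of: Mapping[str, str]) -> Optional[str]:
    """Build a Dutch IBAN, or return None if the holding bank is unknown.

    `bank_of` is a hypothetical account-to-bank-identifier lookup.
    """
    bank_id = bank_of.get(account)
    if bank_id is None:
        return None  # cannot be converted without knowing the bank
    bban = bank_id + account.zfill(10)
    return "NL" + iban_check_digits("NL", bban) + bban


lookup = {"417164300": "ABNA"}  # illustrative data only
print(dutch_account_to_iban("417164300", lookup))  # NL91ABNA0417164300
print(dutch_account_to_iban("123456789", lookup))  # None
```

Accounts that come back as `None` are not necessarily wrong; they simply cannot be migrated until the bank identifier is obtained from another source.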

Finally, the banking system is not set in stone: banks close branch codes periodically, which renders the IBANs of accounts at those branches invalid. Mergers affect the validity of BICs, which are required until 2016 for cross-border payments. Simple consolidation or restructuring of a bank’s payment systems can also invalidate previously valid data. It is therefore vital to check and cleanse now rather than rely on data hygiene exercises undertaken previously.

