How to balance data access and security in fintech testing
Using real data is beneficial in software testing -- but teams must be careful not to compromise security and privacy. Six core strategies for fintech testing can help.
In fintech, real customer data provides the most powerful and realistic software testing scenarios. Yet regulations and standards -- or a company's security team -- may insist on controls or restricted permissions that make that impossible.
The security team is not wrong. The company may be obligated to keep customers' Social Security numbers, birth dates and full names private, because anyone with that personally identifiable information (PII) could use it for identity theft or fraud. Similarly, tests that include live credit card numbers can facilitate fraud and abuse.
The testers aren't wrong either. The best test includes conditions actually seen in production. With live data, it's more likely the software will perform consistently across testing and production environments.
Fortunately, there are ways to balance security with excellent testing practices. Most of these strategies are intended for transactional systems -- such as those used for insurance claims processing, monthly billing and interest calculations -- but they apply to any system that uses PII and raises concerns about using production data.
6 core strategies for fintech testing
These six strategies can help software teams balance accurate fintech testing with data security.
Use a golden master
Most systems can export and import data, at least for backup purposes. The golden master takes that idea one step further by creating a simple example test data set. That data set has known cases, such as a user with bad credit, a user with great credit, a user under 18 who cannot legally enter a contract, and others.
With consistently known good data, the team can write static test cases, checking the same users for the same expected results on every run. The simplest option is to store the export in version control or a test data management tool. Note that in some cases the export will have dates in it, such as the dates on an insurance claim, and the program may need to update the dates on import.
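The date adjustment on import can be sketched as follows. This is a minimal illustration, not a definitive implementation: the JSON layout, the field names (exported_on, claim_date) and the function load_golden_master are all assumptions made for the example. The idea is to shift every date by the age of the export, so each claim keeps the same distance from "today" on every run.

```python
import json
from datetime import date

# Hypothetical golden master export. In practice this would be a file kept
# in version control or a test data management tool.
GOLDEN_MASTER_JSON = """
{
  "exported_on": "2023-01-15",
  "claims": [
    {"user": "bad_credit_user", "claim_date": "2023-01-10"},
    {"user": "great_credit_user", "claim_date": "2022-12-01"},
    {"user": "minor_user", "claim_date": "2023-01-14"}
  ]
}
"""


def load_golden_master(raw, today):
    """Load the export and shift each claim date by the export's age."""
    data = json.loads(raw)
    exported_on = date.fromisoformat(data["exported_on"])
    offset = today - exported_on  # how stale the export is, as a timedelta
    shifted = []
    for claim in data["claims"]:
        original = date.fromisoformat(claim["claim_date"])
        # Copy the record so the loaded export itself is left unchanged.
        shifted.append({**claim, "claim_date": (original + offset).isoformat()})
    return shifted


if __name__ == "__main__":
    for claim in load_golden_master(GOLDEN_MASTER_JSON, date.today()):
        print(claim)
```

Passing today as a parameter, rather than calling date.today() inside the loader, keeps the static test cases repeatable.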
Mask identifiable information
To minimize the risk of identity theft, data masking takes production data and scrambles fields such as names, birthdays or Social Security numbers so that the values no longer identify a real person but are still in a valid format. This enables teams to perform realistic, accurate software tests while protecting sensitive data.
This still leaves the problem of access to the original, pre-transformed data. Some tools exist to automate data masking with security controls, so no tester or programmer has access to the "upstream" pre-scrambled data.
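As a rough sketch of what such masking might look like, the example below derives replacement values from a keyed hash (HMAC), so the same input always maps to the same masked output and related records stay consistent. It is an illustration under stated assumptions, not a description of any particular masking tool: the key handling, the fake name lists and the function names are hypothetical, and the masking job is assumed to run where testers cannot reach it.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: this key is held only by the masking job, never by testers.
MASKING_KEY = b"replace-with-a-secret-from-a-vault"

# Hypothetical replacement names; a real job would use a larger pool.
FAKE_FIRST_NAMES = ["Alex", "Blair", "Casey", "Drew", "Emery", "Finley"]
FAKE_LAST_NAMES = ["Archer", "Brooks", "Carver", "Dalton", "Ellis", "Foster"]


def _digest_int(value, purpose):
    """Keyed hash of a value as an integer; same input gives same output."""
    message = f"{purpose}:{value}".encode()
    mac = hmac.new(MASKING_KEY, message, hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def mask_ssn(ssn):
    """Return a scrambled SSN that is still in AAA-GG-SSSS form."""
    n = _digest_int(ssn, "ssn")
    area = 1 + n % 665                      # 001-665: skips 000, 666, 900-999
    group = 1 + (n // 665) % 99             # 01-99
    serial = 1 + (n // (665 * 99)) % 9999   # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_name(full_name):
    """Replace a real name with one drawn from the fake name lists."""
    n = _digest_int(full_name, "name")
    first = FAKE_FIRST_NAMES[n % len(FAKE_FIRST_NAMES)]
    last = FAKE_LAST_NAMES[(n // len(FAKE_FIRST_NAMES)) % len(FAKE_LAST_NAMES)]
    return f"{first} {last}"


def mask_birth_date(born, person_key):
    """Shift a birth date by up to 180 days in either direction."""
    n = _digest_int(person_key, "dob")
    return born + timedelta(days=n % 361 - 180)


if __name__ == "__main__":
    print(mask_ssn("123-45-6789"))
    print(mask_name("Jane Doe"))
    print(mask_birth_date(date(1990, 5, 1), "customer-42"))
```

One trade-off to keep in mind: shifting a birth date can move a user across an age boundary such as 18, so boundary cases are better covered with hand-built test users.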
Follow the permission principle
When I worked at an insurance company, I wrote a simple code library to determine if any individual person had coverage as of the current date. The unit tests used my own personal information. It wasn't a HIPAA violation because I granted the company permission. When I left the company, somebody else took over this strategy and maintained the code.
This approach can work well, but it isn't ideal when running a large number of tests at the same time in the same database. As such, personal IDs used with permission can be a stopgap while you wait for synthetic users.
Test with synthetic users
With this approach, testers call a code library to request a particular type of user -- based, for example, on age or credit score -- and get back a unique user ID. If every test asks for a new synthetic user, there won't be "collisions" in the database, which happen when the same user is reused across tests. For example, if a test applies for a loan over and over with the same user, that user could trip an unintended scenario where credit is overextended.
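A minimal sketch of such a library might look like the following. The class and method names (SyntheticUserFactory, request_user) and the default ranges are assumptions for illustration. A real version would also create the user in the test database, which is omitted here.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticUser:
    user_id: str
    age: int
    credit_score: int


class SyntheticUserFactory:
    """Hands out a fresh, never-reused synthetic user for each request."""

    def __init__(self, seed=0):
        self._ids = itertools.count(1)    # monotonically increasing ID source
        self._rng = random.Random(seed)   # seeded so test runs are repeatable

    def request_user(self, min_age=18, max_age=90, min_score=300, max_score=850):
        """Return a new user whose age and credit score fall in the ranges."""
        if min_age > max_age or min_score > max_score:
            raise ValueError("empty range requested")
        return SyntheticUser(
            user_id=f"synthetic-{next(self._ids):06d}",
            age=self._rng.randint(min_age, max_age),
            credit_score=self._rng.randint(min_score, max_score),
        )


if __name__ == "__main__":
    factory = SyntheticUserFactory()
    minor = factory.request_user(min_age=13, max_age=17)
    good_credit = factory.request_user(min_score=750)
    print(minor)
    print(good_credit)
```

Because each call draws a new ID from the counter, two tests never share a user, even if they ask for the same age or credit profile.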
Partner with customer service
Access to some production data is usually essential for customer service. When customer service reports a problem in production, its staff can provide the information needed to debug, isolate and fix the issue. This is an essential aspect of testing. Formalizing this process could allow the team to access some production data some of the time.
Use high-volume automated testing
One company I worked with ran through a few thousand users every day, producing a text file that would become a mail merge. For every build, they took two golden masters of previous production data, ran the production version of the software and the new build, and compared the output. This moved their number of test cases from a few dozen over a few days to a few thousand over a couple of hours.
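The comparison step described above can be sketched roughly as follows. This is an illustration only: the two build functions are hypothetical stand-ins (a real setup would likely invoke two deployed versions of the software), and the record layout is invented for the example. The candidate build contains a deliberate regression so the comparison has something to find.

```python
def compare_builds(records, production_build, candidate_build):
    """Run both builds on every record and collect any differing outputs."""
    differences = []
    for record in records:
        expected = production_build(record)
        actual = candidate_build(record)
        if expected != actual:
            differences.append((record, expected, actual))
    return differences


# Hypothetical stand-in for the version currently in production.
def production_build(record):
    return f"Dear {record['name']}, your balance is ${record['balance']:.2f}."


# Hypothetical stand-in for the new build, with a deliberate regression:
# negative balances lose their sign.
def candidate_build(record):
    return f"Dear {record['name']}, your balance is ${abs(record['balance']):.2f}."


# Tiny stand-in for a golden master; a real one would hold thousands of users.
GOLDEN_MASTER = [
    {"name": "User A", "balance": 120.5},
    {"name": "User B", "balance": -35.0},
    {"name": "User C", "balance": 0.0},
]

if __name__ == "__main__":
    for record, expected, actual in compare_builds(
        GOLDEN_MASTER, production_build, candidate_build
    ):
        print("MISMATCH for", record["name"])
        print("  production:", expected)
        print("  candidate: ", actual)
```

A difference is not automatically a bug, since the new build may have changed the output on purpose, so each mismatch still needs a human decision.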
In regulated markets, teams might need to combine this automated testing approach with data masking or randomization. For example, an analysis of production users by age or credit score might make it possible to create a golden master of synthetic users that match production use.
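One way to approach that analysis, sketched under simplifying assumptions: reduce production ages to counts per ten-year band, then draw synthetic ages in the same proportions. Only the aggregate counts leave production, not individual records. The function names and the ten-year band width are choices made for this example, and the same pattern could be applied to credit scores.

```python
import random
from collections import Counter


def age_band(age):
    """Map an age to its ten-year band, e.g. 27 -> (20, 29)."""
    lower = (age // 10) * 10
    return (lower, lower + 9)


def band_distribution(ages):
    """Count how many users fall in each age band (aggregate data only)."""
    return Counter(age_band(age) for age in ages)


def synthetic_ages(distribution, size, seed=0):
    """Draw synthetic ages whose band proportions follow the distribution."""
    rng = random.Random(seed)  # seeded so the golden master is repeatable
    bands = list(distribution)
    weights = [distribution[band] for band in bands]
    chosen = rng.choices(bands, weights=weights, k=size)
    return [rng.randint(low, high) for low, high in chosen]


if __name__ == "__main__":
    # Hypothetical sample standing in for ages observed in production.
    production_ages = [23, 27, 31, 35, 38, 44, 61]
    distribution = band_distribution(production_ages)
    print(distribution)
    print(synthetic_ages(distribution, size=10))
```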