How Cashfree Payments Reached 0 Config Errors with a Scalable TDM Framework


Nisith Pati

Imagine this: you are running your tests, and someone (or some other test) has modified your test data. Bam! Test fails → pipeline fails → deployment stops.

Managing test data effectively is a critical challenge in large-scale systems. In industries like fintech, where companies deal with an enormous volume of data points, poorly structured test data, scattered across multiple test cases and used by various services, often leads to redundant configurations and integration test (IT) failures. Stability across environments becomes difficult to maintain when test data is manually altered, modified by automation, or applied inconsistently.

To address this, we built a centralised and rule-driven Test Data Management (TDM) framework. Result? 0 config-related failures for migrated services!

In our recent brown bag session, we discussed how TDM has significantly improved test data reliability, reduced configuration failures, and streamlined the integration testing process.

Before TDM, test data management at Cashfree Payments posed several challenges:

  • Scattered Test Data: Test data was hardcoded in various test cases, leading to inconsistency and redundancy.
  • Frequent Configuration Failures: Manually altered configurations caused IT failures, making debugging difficult.
  • Lack of Visibility: Teams had no centralised view of which test data was being used across services.
  • Redundant Data Creation: Different teams often created duplicate test data for similar use cases, leading to inefficiencies.

To mitigate these issues, we needed a single source of truth that would standardise and manage test data efficiently. TDM provides this in four steps:

  1. Test Data Storage: TestData is mapped to ConfigRules, linking it to specific Configs.
  2. Validation & Auto-Updates: Helm-based crons run periodically to check for rule violations and restore tampered data.
  3. Alert Mechanism: If invalid data is detected, alerts are sent to the respective service owners.
  4. Integration with ITs: Validated test data is automatically injected into integration test environments.
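
As an illustration only, the mapping in step 1 and the check in step 2 could be modelled as below. This is a minimal sketch, not the framework's actual schema: `ConfigRule`, `TestData`, their fields, and the `violations` helper are all assumed names, and the live config is a plain in-memory map.

```java
import java.util.List;
import java.util.Map;

// Illustrative model only: every name and field here is an assumption.
final class TdmModelSketch {
    // A rule pins one config to the value the tests expect, e.g. UPI_Txn -> "active".
    record ConfigRule(String configId, String expectedValue) {}

    // A test-data entry, the rules it must satisfy, and the services that use it.
    record TestData(String id, List<ConfigRule> rules, List<String> services) {}

    // Returns the rules whose expected value differs from the live config value.
    // A config that is missing from liveConfig counts as a violation.
    static List<ConfigRule> violations(TestData data, Map<String, String> liveConfig) {
        return data.rules().stream()
                .filter(rule -> !rule.expectedValue().equals(liveConfig.get(rule.configId())))
                .toList();
    }

    public static void main(String[] args) {
        TestData merchant = new TestData(
                "merchant-001",
                List.of(new ConfigRule("UPI_Txn", "active"), new ConfigRule("Cards_Txn", "active")),
                List.of("pgpaymentsvc", "pgcards"));

        // Simulate another test flipping UPI_Txn in the shared environment.
        Map<String, String> liveConfig = Map.of("UPI_Txn", "inactive", "Cards_Txn", "active");

        System.out.println("Violated rules: " + violations(merchant, liveConfig));
    }
}
```

Keeping the rules as data next to the entry is what lets a periodic job decide, without any test-specific knowledge, whether an entry has been tampered with.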

TDM centralises and standardises test data using:

  • Config Rules & Mapping: Each TestData entry is mapped to relevant configs via ConfigRules.
  • Helm-Based Crons: A cron job scheduled through Helm periodically validates that each TestData entry sticks to its corresponding ConfigRules. The job does more than validate: when a rule is violated, it also auto-updates the TestData through the corresponding service APIs.
    [Figure: Validation & Updation Flow of TDM]
  • Service Ownership Model: Each TestData entry is associated with the services that use it, and each service has an owner. If validation or auto-update fails, an alert is triggered in a channel, tagging that owner and the respective team. This is the last-ditch way to correct the data when the auto-update fails for any reason.
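
The validate, auto-update, then alert order described above can be sketched as a single pass over one TestData entry. This is a hedged illustration, not the actual cron code: `ConfigApi`, `Alerter`, and `validateAndRepair` are hypothetical names, and the service API and alert channel are replaced by in-memory fakes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: ConfigApi and Alerter are hypothetical stand-ins for the
// service APIs and the alert channel that a periodic job would call.
final class TdmCronSketch {
    interface ConfigApi {
        // Returns true if the service accepted the restored value.
        boolean restore(String testDataId, String configId, String expectedValue);
    }

    interface Alerter {
        void notifyOwner(String owner, String message);
    }

    // One validation pass over a single TestData entry.
    // Returns true if every rule holds or was restored; alerts the owners otherwise.
    static boolean validateAndRepair(String testDataId,
                                     Map<String, String> expectedRules,
                                     Map<String, String> liveConfig,
                                     List<String> owners,
                                     ConfigApi api,
                                     Alerter alerter) {
        boolean healthy = true;
        for (Map.Entry<String, String> rule : expectedRules.entrySet()) {
            String configId = rule.getKey();
            String expected = rule.getValue();
            if (expected.equals(liveConfig.get(configId))) {
                continue; // rule holds, nothing to do
            }
            // Rule violated: try to restore the value through the service API first.
            if (!api.restore(testDataId, configId, expected)) {
                // Auto-update failed: fall back to alerting every owner.
                healthy = false;
                for (String owner : owners) {
                    alerter.notifyOwner(owner,
                            "Auto-update failed for " + testDataId + ": " + configId
                                    + " should be " + expected);
                }
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        Map<String, String> liveConfig = new HashMap<>(Map.of("UPI_Txn", "inactive"));
        List<String> alerts = new ArrayList<>();

        // Fake API that writes straight into the in-memory config.
        ConfigApi api = (id, configId, value) -> {
            liveConfig.put(configId, value);
            return true;
        };
        Alerter alerter = (owner, message) -> alerts.add(owner + ": " + message);

        boolean healthy = validateAndRepair("merchant-001", Map.of("UPI_Txn", "active"),
                liveConfig, List.of("payments-team"), api, alerter);
        System.out.println("healthy=" + healthy + ", live=" + liveConfig + ", alerts=" + alerts);
    }
}
```

Trying the repair before alerting keeps the channel quiet for the common case, so an alert always means a human actually needs to step in.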

TDM enforces strict integrity checks to prevent inconsistent test data:

  • Rule-Based Validation: Each TestData entry must comply with predefined rules.
  • Redundancy Prevention: Existing test data must be reused instead of creating duplicates. We block a new entry if its ConfigRules are a subset of an existing TestData entry’s ConfigRules.
  • Conflict Resolution: Conflicting test data (same configId but different ConfigRules) is automatically rejected. For example, a merchant cannot be added with both a UPI_Txn inactive rule and a UPI_Txn active rule.
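
The redundancy and conflict checks lend themselves to a small worked example. The sketch below is one possible reading of those rules, with assumed names throughout (`ConfigRule`, `Decision`, `review`): a candidate is rejected as conflicting if it pins the same configId to two different values, and as redundant if its rules are a subset of an existing entry's rules.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: names and the exact order of checks are assumptions.
final class TdmIntegritySketch {
    record ConfigRule(String configId, String expectedValue) {}

    enum Decision { ACCEPT, REJECT_CONFLICT, REJECT_REDUNDANT }

    // Conflict: the same configId is pinned to two different values.
    static boolean hasConflict(List<ConfigRule> rules) {
        Map<String, String> seen = new HashMap<>();
        for (ConfigRule rule : rules) {
            String previous = seen.putIfAbsent(rule.configId(), rule.expectedValue());
            if (previous != null && !previous.equals(rule.expectedValue())) {
                return true;
            }
        }
        return false;
    }

    // Redundancy: the candidate's rules are a subset of some existing entry's rules,
    // so that existing entry can be reused instead.
    static boolean isRedundant(List<ConfigRule> candidate, List<List<ConfigRule>> existing) {
        Set<ConfigRule> wanted = new HashSet<>(candidate);
        return existing.stream().anyMatch(rules -> new HashSet<>(rules).containsAll(wanted));
    }

    static Decision review(List<ConfigRule> candidate, List<List<ConfigRule>> existing) {
        if (hasConflict(candidate)) {
            return Decision.REJECT_CONFLICT;
        }
        if (isRedundant(candidate, existing)) {
            return Decision.REJECT_REDUNDANT;
        }
        return Decision.ACCEPT;
    }

    public static void main(String[] args) {
        ConfigRule upiActive = new ConfigRule("UPI_Txn", "active");
        ConfigRule upiInactive = new ConfigRule("UPI_Txn", "inactive");
        ConfigRule cardsActive = new ConfigRule("Cards_Txn", "active");
        List<List<ConfigRule>> existing = List.of(List.of(upiActive, cardsActive));

        System.out.println(review(List.of(upiActive, upiInactive), existing));
        System.out.println(review(List.of(upiActive), existing));
        System.out.println(review(List.of(upiInactive), existing));
    }
}
```

Because records compare by value, the subset test reduces to `containsAll` on sets of rules, with no custom equality code.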

TDM integrates seamlessly with our environment factory pattern, allowing test data to be injected dynamically into ITs. To use TDM, teams simply need to:

  1. Ensure variable names in the TestData table match the POJO definitions.
  2. Call a single function to retrieve validated test data.
  3. Rely on the backend script to handle all mapping and injections.
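
To show why step 1 matters, here is a minimal sketch of name-based binding using standard Java reflection. It is not the TDM backend script: `MerchantTestData`, `TEST_DATA_TABLE`, `getTestData`, and `bind` are hypothetical names, the sample values are made up, and the TestData table is an in-memory map.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Illustrative only: all names and sample values here are assumptions.
final class TdmInjectionSketch {
    // Hypothetical POJO: field names mirror the TestData column names exactly.
    static class MerchantTestData {
        String merchantId;
        String upiTxn;
    }

    // In-memory stand-in for the TestData table, keyed by entry id.
    static final Map<String, Map<String, String>> TEST_DATA_TABLE = Map.of(
            "merchant-001", Map.of("merchantId", "M123", "upiTxn", "active"));

    // The "single function": look up a row and bind it onto the requested POJO type.
    static <T> T getTestData(String id, Class<T> type) {
        Map<String, String> row = TEST_DATA_TABLE.get(id);
        if (row == null) {
            throw new IllegalArgumentException("No test data with id " + id);
        }
        return bind(row, type);
    }

    // Copies each column onto the POJO field with the same name.
    // A column with no matching field fails loudly instead of being skipped.
    static <T> T bind(Map<String, String> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> column : row.entrySet()) {
                Field field = type.getDeclaredField(column.getKey());
                field.setAccessible(true);
                field.set(target, column.getValue());
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Columns do not match " + type.getSimpleName() + " fields: " + e, e);
        }
    }

    public static void main(String[] args) {
        MerchantTestData merchant = getTestData("merchant-001", MerchantTestData.class);
        System.out.println(merchant.merchantId + " / " + merchant.upiTxn);
    }
}
```

With binding driven purely by names, a renamed column or field surfaces as an immediate error at lookup time rather than as a silently empty value inside a test.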

Since implementing TDM, we have observed:

  • Zero config-related IT failures in services like pgpaymentsvc and pgcards.
  • Improved visibility into which test data is used by which services.
  • Reduced redundancy, as multiple teams now reuse existing test data instead of creating new entries.

TDM has enhanced test reliability and visibility and has reduced the risks associated with manual/automated data changes.

Moving forward, we aim to expand TDM coverage across more services to further eliminate config-related IT failures. We are also working on:

  • Enhancing UI-based access to improve the developer experience in managing test data.
  • Scaling the framework to support more complex testing scenarios, such as data creation and manipulation.
  • Supporting an ephemeral environment to create and inject data on demand.

Stay tuned to the Cashfree Payments Tech Blog for more insights on how we continue to innovate in this space!
