Validation and Transformation Webhooks

In this section you will learn how to provide a hosted validation webhook that validates incoming data for an Osmos Uploader and/or Pipeline.

Validation Webhooks provide support for running arbitrary per-field validation logic on data before it is written to destination connectors. This lets you prevent bad data from being written to your systems, adding another layer of protection on top of the built-in validations that Osmos provides. They are compatible with both Pipelines and the Uploader, and they work with data from any source connector.

By using validation webhooks, any desired validation logic can be added to any destination connector. This includes things like querying internal databases, making requests to private/internal APIs, performing conditional checks based on values from multiple fields, and much more.

Validation webhooks are called both during the data cleanup process and during transformation, just before data is written to the destination connector. While the user is mapping columns, applying QuickFixes, and performing other data cleanup actions, validation webhooks will be called to make sure that the output of the configured transforms is valid for all rows.

Writeback provides the ability to programmatically "write back" data to individual fields, working as a part of Validation Webhooks when a validation endpoint has been configured. After writeback, no additional validation is performed on values that are returned from the validation endpoint; they are treated as the final source of truth since they come directly from the customer's own API. If there are type validation errors or other validation errors for that cell, writeback will override them.

Configuring

Validation webhooks are set up during the Connector configuration process while creating a new Destination Connector. Click the "Show Advanced Options" button at the bottom of the Connector configurator UI.

To initiate the validation webhook, provide the HTTP endpoint at which your webhook is available. Your validation webhook must be publicly available on the internet. If your endpoint is behind a firewall or other restrictive system, please contact Osmos support for help with whitelisting IPs.

Additional configuration options are available when adding the webhook URL: the batch size, the max parallel requests, and the max request size in bytes. Note: These fields are only available once you enter the URL.

Validation Webhook Batch Size

Sets the maximum number of rows per request. The value entered in this field is the maximum number of rows that will be sent to the provided validation endpoint in each request. The field defaults to 10,000 rows.

If maxRequestSizeBytes is set, then the request will either include the batchSize number of records or be maxRequestSizeBytes bytes long, whichever limit is encountered first.

Validation Webhook Max Parallel Requests

Adds support for running multiple validation webhook requests in parallel. This means that multiple chunks of rows to validate will be sent to the customer API simultaneously. In this field, enter the maximum number of concurrent requests that will be sent to the provided validation endpoint. The field defaults to 1,000 requests at a time.
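To illustrate what a cap on parallel requests means in practice, here is a minimal sketch of a concurrency-limited dispatcher. It is not Osmos's implementation; the function name mapWithConcurrency and its parameters are hypothetical, and the worker stands in for whatever sends one chunk of rows to your endpoint. The takeaway for webhook authors is that your endpoint should be prepared to serve up to the configured number of requests at once.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight at any time, and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner repeatedly claims the next unclaimed item until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners; together they drain the list.
  const runnerCount = Math.max(0, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let r = 0; r < runnerCount; r++) {
    runners.push(run());
  }
  await Promise.all(runners);
  return results;
}
```

With limit set to 1, chunks are sent strictly one after another; larger values trade load on your endpoint for shorter overall validation time.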

Validation Webhook Max Request Size in Bytes

Adds support for setting a maximum request size for individual validation webhook requests. This is useful if you occasionally have very large pieces of data that cause your webhook to return a 413. If left unset, there is effectively no size limit. The request will be limited by either the batch size or request size, whichever limit is hit first. If your data is larger than your maximum request size, a single record will be sent in each request. The calculation is best-effort and will right-size requests based on a number of factors. It is best to set the limit below your actual threshold to avoid accidentally exceeding the limit.
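The interaction between the batch size and the maximum request size can be sketched as follows. This is only an illustration of the "whichever limit is hit first" rule under simple assumptions: the names chunkRows and byteLength are hypothetical, field values are assumed to be strings, and size is measured as the UTF-8 length of the JSON encoding. The actual Osmos calculation is described as best-effort and may differ.

```typescript
// Assumed request shapes (field values are taken to be strings here).
type FieldInput = { fieldName: string; value: string };
type RowInput = FieldInput[];

// Approximate size of a value as the UTF-8 length of its JSON encoding.
function byteLength(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Split rows into batches holding at most `batchSize` rows and, when
// `maxRequestSizeBytes` is set, roughly at most that many bytes.
// A row that exceeds the byte limit on its own is sent in a batch by itself.
function chunkRows(
  rows: RowInput[],
  batchSize: number,
  maxRequestSizeBytes?: number
): RowInput[][] {
  const batches: RowInput[][] = [];
  let current: RowInput[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const row of rows) {
    const rowBytes = byteLength(row) + 1; // +1 for the separating comma
    const hitRowLimit = current.length >= batchSize;
    const hitByteLimit =
      maxRequestSizeBytes !== undefined &&
      current.length > 0 &&
      currentBytes + rowBytes > maxRequestSizeBytes;

    if (hitRowLimit || hitByteLimit) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    current.push(row);
    currentBytes += rowBytes;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

For example, five small rows with a batch size of 2 and no byte limit produce batches of 2, 2, and 1 rows; adding a byte limit smaller than any single row produces five batches of one row each.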

API Specification

Validation Webhooks use a simple JSON-based schema for providing data for validation and receiving validation outcomes. Data is provided in batches of up to 100,000 rows and sent to the endpoint as an HTTP POST request.

Request Payload Schema

The request body is a two-dimensional JSON-encoded array of data to validate: a top-level array of rows, each of which contains an array of objects containing a field name and a field value.

An example request consisting of four rows with two fields may look like this:

[
  [
    { "fieldName": "color", "value": "green" },
    { "fieldName": "shape", "value": "square" }
  ],
  [
    { "fieldName": "color", "value": "yellow" },
    { "fieldName": "shape", "value": "square" }
  ],
  [
    { "fieldName": "color", "value": "blue" },
    { "fieldName": "shape", "value": "circle" }
  ],
  [
    { "fieldName": "color", "value": "red" },
    { "fieldName": "shape", "value": "three sided polygon" }
  ]
]
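For webhook authors working in TypeScript, the request body described above could be modeled with types like the ones below. These are a sketch inferred from the example, not an official schema: the type names and the rowToRecord helper are hypothetical, and values are assumed to be strings because that is what the example shows.

```typescript
// Assumed shape of one field in a row, based on the example request.
type FieldInput = { fieldName: string; value: string };

// A row is an ordered list of fields; the request is an ordered list of rows.
type RowInput = FieldInput[];
type ValidationRequest = RowInput[];

// Convenience helper: index a row by field name so that a check on one
// field can look at the values of other fields in the same row.
function rowToRecord(row: RowInput): Record<string, string> {
  const record: Record<string, string> = {};
  for (const field of row) {
    record[field.fieldName] = field.value;
  }
  return record;
}

// Usage with the first row of the example request.
const firstRow: RowInput = [
  { fieldName: "color", value: "green" },
  { fieldName: "shape", value: "square" },
];
const firstRecord = rowToRecord(firstRow); // { color: "green", shape: "square" }
```

Because the response must preserve field ordering, a lookup like this is best used only for reading values; outcomes should still be produced by iterating over the original row array.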

Response Schema

Your validation webhook is expected to return a two-dimensional JSON array of validation outcomes, matching the shape of the request body. It should consist of a top-level array of rows, each of which contains an array of validation outcomes for each field in that row. The ordering of rows and fields should match that of the request.

There are four possible outcomes for each field:

  • Success: the field is valid and can be written to the destination system.

  • Warning: the field isn't invalid and won't be blocked from being written to the destination system. However, a message will be displayed to the user during the data cleanup process to indicate the situation.

  • Error: the field is invalid and will be rejected from being written to the destination system. An error message will be shown during the training process and the user will be blocked from saving the transformation until the error is resolved. During transformation, the record will be marked as an error and the underlying pipeline/uploader will need to be retrained.

  • Writeback: the specified replacement value will be written back into the cell, overwriting what was there previously.

For the error and warning cases, a message can optionally be provided to aid the user performing the cleanup by explaining the reason the field is invalid or providing some extra context. Additionally, the error case may include an array of strings that are valid for that field. When included, these values will be shown to the user in a dropdown menu on the transformation builder.

The following TypeScript types represent the options for a response from a validation webhook regarding a value within a field:

type FieldValidationOutput =
    | boolean
    | {
            isValid: boolean;
            errorMessage?: string;
            warningMessage?: string;
            validOptions?: string[];
      }
    | { 
            replacement: string;
            infoMessage?: string;
      };
    
type RowValidationOutput = FieldValidationOutput[];

type ValidationResponse = RowValidationOutput[];

Boolean values can be provided as validation output for values within a field, with true corresponding to valid and false corresponding to invalid. However, it is recommended that you provide error or warning messages in order to help users know how to resolve the validation failure.

Writeback output is mutually exclusive with validation output, i.e. one can't both provide a writeback value via replacement: and mark a field as invalid. If a writeback value is returned, it is assumed that the provided replacement value is valid.
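One way to read the types above is as a mapping from each response value to one of the four outcomes. The sketch below makes that mapping explicit. The Outcome type and the classify function are hypothetical names, and treating a valid field with a warningMessage as a warning is an interpretation based on the outcome descriptions, not a documented rule.

```typescript
type FieldValidationOutput =
  | boolean
  | {
      isValid: boolean;
      errorMessage?: string;
      warningMessage?: string;
      validOptions?: string[];
    }
  | {
      replacement: string;
      infoMessage?: string;
    };

type Outcome = "success" | "warning" | "error" | "writeback";

// Map a single field's validation output to one of the four outcomes.
function classify(output: FieldValidationOutput): Outcome {
  // Bare booleans: true is valid, false is invalid.
  if (typeof output === "boolean") {
    return output ? "success" : "error";
  }
  // Writeback is mutually exclusive with validation output.
  if ("replacement" in output) {
    return "writeback";
  }
  if (!output.isValid) {
    return "error";
  }
  // Assumption: a valid field that carries a warning message is a warning.
  return output.warningMessage !== undefined ? "warning" : "success";
}
```

A helper like this can be handy in unit tests for your webhook, to confirm that each rule produces the kind of outcome you intended.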

Here is a possible response to the request made earlier in the article:

[
  [
    {
      "isValid": true,
      "warningMessage": "The color green will not be supported in the future"
    },
    { "isValid": true }
  ],
  [
    true,
    true
  ],
  [
    { 
      "isValid": false,
      "errorMessage": "All circles must be red" 
    },
    { "isValid": true }
  ],
  [
    { "isValid": true },
    { 
      "replacement": "triangle",
      "infoMessage": "The submitted value was replaced" 
    }
  ]
]

As you can see in this example, all eight fields from the request have been validated. The number and ordering of rows matches that of the request, along with the number and ordering of fields within those rows.
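The validation rules implied by the example response could be implemented along the following lines. This is a minimal sketch of the core logic only, with no HTTP handling; the function names validateRow and validate are hypothetical, and the rules are reconstructed from the example messages. Where the example returns a bare true, this sketch returns the equivalent { isValid: true }.

```typescript
type FieldInput = { fieldName: string; value: string };
type RowInput = FieldInput[];
type FieldValidationOutput =
  | boolean
  | { isValid: boolean; errorMessage?: string; warningMessage?: string; validOptions?: string[] }
  | { replacement: string; infoMessage?: string };
type RowValidationOutput = FieldValidationOutput[];
type ValidationResponse = RowValidationOutput[];

// Validate one row, returning exactly one outcome per field, in order.
function validateRow(row: RowInput): RowValidationOutput {
  // Index values by field name so a rule can inspect sibling fields.
  const values: Record<string, string> = {};
  for (const field of row) {
    values[field.fieldName] = field.value;
  }

  return row.map((field): FieldValidationOutput => {
    if (field.fieldName === "color") {
      // Cross-field rule: a circle must be red.
      if (values["shape"] === "circle" && field.value !== "red") {
        return { isValid: false, errorMessage: "All circles must be red" };
      }
      if (field.value === "green") {
        return {
          isValid: true,
          warningMessage: "The color green will not be supported in the future",
        };
      }
    }
    // Writeback rule: normalize a verbose shape name.
    if (field.fieldName === "shape" && field.value === "three sided polygon") {
      return { replacement: "triangle", infoMessage: "The submitted value was replaced" };
    }
    return { isValid: true };
  });
}

// Validate a whole request body, preserving row order.
function validate(request: RowInput[]): ValidationResponse {
  return request.map(validateRow);
}

// The example request from earlier in the article.
const exampleRequest: RowInput[] = [
  [{ fieldName: "color", value: "green" }, { fieldName: "shape", value: "square" }],
  [{ fieldName: "color", value: "yellow" }, { fieldName: "shape", value: "square" }],
  [{ fieldName: "color", value: "blue" }, { fieldName: "shape", value: "circle" }],
  [{ fieldName: "color", value: "red" }, { fieldName: "shape", value: "three sided polygon" }],
];
const exampleResponse = validate(exampleRequest);
```

Because the response is built with map over the incoming rows and fields, it always has the same shape and ordering as the request.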

Invalid Response Handling

If a validation endpoint returns a non-200 response code, is unreachable, or fails for some other reason, the validation request to the endpoint will be retried up to 5 times with a 15-second timeout window. Osmos also applies backoff and jitter between retries. All error types will be retried.

If the issue persists after several attempts over the course of the ~100 second window, the default behavior is to reject all records being validated as invalid. Details about the error that was encountered will be included in error records which can be viewed on the connector details page of the destination connector or on the retrain page for the Pipeline or Uploader.
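For readers unfamiliar with retrying with backoff and jitter, the general pattern looks something like the sketch below. This is a generic illustration, not Osmos's code: the names backoffDelayMs and withRetries are hypothetical, the "full jitter" formula is one common variant, and the base delay (1 second) and cap (15 seconds) are assumed values chosen for the example. Only the retry count of 5 comes from the text above; the per-request timeout is not modeled.

```typescript
// Delay before the next retry: a random fraction of an exponentially
// growing window, capped at `capMs` (the "full jitter" variant).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed base delay
  capMs: number = 15000, // assumed upper bound
  random: () => number = Math.random
): number {
  const window = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return random() * window;
}

// Run `op`, retrying on any error up to `maxRetries` additional times.
// `sleep` is injectable so the helper can be tested without real waiting.
async function withRetries<T>(
  op: () => Promise<T>,
  maxRetries: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

The practical consequence for webhook authors is that the same batch may arrive more than once, so validation logic is safest when it is idempotent and free of side effects that should happen only once.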

For cases where the lengths of the returned validation array are not equal to the number of elements provided in the request array, the behavior is undefined. Please make sure that you return exactly one validation outcome for every field in the request.
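Since a shape mismatch leads to undefined behavior, it may be worth checking the response against the request before sending it. A minimal sketch of such a check is shown below; the function name findShapeMismatch is hypothetical, and it only compares row and field counts, not the content of each outcome.

```typescript
// Returns a description of the first shape mismatch between the request
// and the response, or null when the row and field counts line up.
function findShapeMismatch(
  request: unknown[][],
  response: unknown[][]
): string | null {
  if (response.length !== request.length) {
    return `expected ${request.length} rows but got ${response.length}`;
  }
  for (let i = 0; i < request.length; i++) {
    if (response[i].length !== request[i].length) {
      return `row ${i}: expected ${request[i].length} outcomes but got ${response[i].length}`;
    }
  }
  return null;
}
```

A webhook could call this just before serializing its response and fail loudly in its own logs if the result is not null, which makes the problem far easier to diagnose than an undefined outcome downstream.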

User Experience

The examples of validations and writeback listed earlier in the article can be seen from a user's perspective here:

Limitations

Validation Webhooks work with all destination connectors except the "Call an Osmos API HTTP" Connector.
