Send partial row updates to Real-Time Customer Profile using Data Prep
Streaming upserts in Data Prep allow you to send partial row updates to Real-Time Customer Profile data while also creating and establishing new identity links with a single API request.
By streaming upserts, you can retain the format of your data while translating that data to Real-Time Customer Profile PATCH requests during ingestion. Based on the inputs you provide, Data Prep allows you to send a single API payload and translate the data to both Real-Time Customer Profile PATCH and Identity Service CREATE requests.
This document provides information on how to stream upserts in Data Prep.
Getting started
This overview requires a working understanding of the following components of Adobe Experience Platform:
- Data Prep: Data Prep allows data engineers to map, transform, and validate data to and from Experience Data Model (XDM).
- Identity Service: Gain a better view of individual customers and their behavior by bridging identities across devices and systems.
- Real-Time Customer Profile: Provides a unified customer profile in real time based on aggregated data from multiple sources.
- Sources: Experience Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
Use streaming upserts in Data Prep
Streaming upserts high-level workflow
Streaming upserts in Data Prep work as follows:
- You must first create and enable a dataset for Profile consumption. See the guide on enabling a dataset for Profile for more information.
- If new identities must be linked, then you must also create an additional dataset with the same schema as your Profile dataset.
- Once your dataset(s) are prepared, you must create a dataflow to map your incoming request to the Profile dataset.
- Next, you must update the incoming request to include the necessary headers. These headers define:
  - The data operation to be performed with Profile: create, merge, or delete.
  - The optional identity operation to be performed with Identity Service: create.
Configure the identity dataset
If new identities must be linked, then you must create and pass an additional dataset in the incoming payload. When creating an identity dataset, you must ensure that the following requirements are met:
- The identity dataset must have the same associated schema as the Profile dataset. A mismatch of schemas may lead to inconsistent system behavior.
- However, you must ensure that the identity dataset is different from the Profile dataset. If the datasets are the same, then data will be overwritten instead of updated.
- While the initial dataset must be enabled for Profile, the identity dataset should not be enabled for Profile. Otherwise, data will also be overwritten instead of updated. However, the identity dataset should be enabled for Identity Service.
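The requirements above can be checked before you send any data. The following Python sketch is illustrative only: the DatasetInfo class, its field names, and the check_identity_dataset function are hypothetical helpers and not part of any Platform API. It assumes you have already looked up each dataset's ID, schema ID, and enablement flags yourself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical summary of a dataset; field names are assumptions."""
    dataset_id: str
    schema_id: str
    profile_enabled: bool
    identity_enabled: bool


def check_identity_dataset(profile_ds: DatasetInfo, identity_ds: DatasetInfo) -> List[str]:
    """Return a list of violated requirements (empty when all are met)."""
    problems = []
    if identity_ds.dataset_id == profile_ds.dataset_id:
        problems.append("identity dataset must be different from the Profile dataset")
    if identity_ds.schema_id != profile_ds.schema_id:
        problems.append("identity dataset must use the same schema as the Profile dataset")
    if not profile_ds.profile_enabled:
        problems.append("Profile dataset must be enabled for Profile")
    if identity_ds.profile_enabled:
        problems.append("identity dataset must not be enabled for Profile")
    if not identity_ds.identity_enabled:
        problems.append("identity dataset must be enabled for Identity Service")
    return problems
```

An empty list means the pair of datasets satisfies every requirement listed above. Each string in a non-empty list names one requirement to fix.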
Required fields in the schemas associated with the identity dataset
If your schema contains required fields, validation of the dataset must be suppressed in order to enable Identity Service to only receive the identities. You can suppress validation by applying the disabled value to the acp_validationContext parameter. See the example below:
curl -X POST 'https://platform.adobe.io/data/foundation/catalog/dataSets/62257bef7a75461948ebcaaa' \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {IMS_ORG}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
"tags": {
"acp_validationContext": [
"disabled"
],
"unifiedProfile": [
"enabled:false"
],
"unifiedIdentity": [
"enabled:true"
]
}
}'
Incoming payload structure
The following displays an example of an incoming payload structure that establishes new identity links.
Payload with identity configuration
{
"header": {
"flowId": "923e2ac3-3869-46ec-9e6f-7012c4e23f69",
"imsOrgId": "{ORG_ID}",
"datasetId": "621fc19ab33d941949af16c8",
"operations": {
"data": "create" (default)/"merge"/"delete",
"identity": "create",
"identityDatasetId": "621fc19ab33d941949af16d9"
}
}
... //The raw data attributes are included here as the key/value pairs of the "body" property.
}
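- flowId: The ID of the dataflow.
- imsOrgId: The ID of your organization.
- datasetId: The ID of the Profile-enabled target dataset that your dataflow maps to.
- operations: The operations that Data Prep performs for the incoming request.
- operations.data: The data operation to be performed with Real-Time Customer Profile: create (default), merge, or delete.
- operations.identity: The optional identity operation to be performed with Identity Service: create.
- operations.identityDatasetId: The ID of the identity dataset. Required when operations.identity is set to create.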
Supported operations
The following operations are supported by Real-Time Customer Profile:
- create (default)
- merge
- delete
The following operations are supported by Identity Service:
- create: If create is passed as a value for operations.identity, then Data Prep generates an XDM entity create request for Identity Service. If the identity already exists, then the identity is ignored. Note: If operations.identity is set to create, then the identityDatasetId must also be specified. The XDM entity create message that the Data Prep component generates internally is generated for this dataset ID.
Payload without identity configuration
If new identities do not need to be linked, then you can omit the identity and identityDatasetId parameters from the operations property. Doing so sends data only to Real-Time Customer Profile and skips the Identity Service. See the payload below for an example:
{
"header": {
"flowId": "923e2ac3-3869-46ec-9e6f-7012c4e23f69",
"imsOrgId": "{ORG_ID}",
"datasetId": "621fc19ab33d941949af16c8",
"operations": {
"data": "create"/"merge"/"delete",
}
}
... //The raw data attributes are included here as the key/value pairs of the "body" property.
}
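Both header variants shown above can be assembled with one helper. The following Python sketch is a minimal illustration: build_header and its parameter names are hypothetical and not part of a Platform SDK. It encodes only the rules stated in this document: data defaults to create, the only identity operation is create, identityDatasetId must accompany identity, and the identity dataset must differ from the Profile dataset.

```python
from typing import Optional

# Operations named in this document.
PROFILE_OPERATIONS = ("create", "merge", "delete")


def build_header(
    flow_id: str,
    ims_org_id: str,
    dataset_id: str,
    data_operation: str = "create",
    identity_dataset_id: Optional[str] = None,
) -> dict:
    """Build the "header" object of a streaming upsert payload (hypothetical helper)."""
    if data_operation not in PROFILE_OPERATIONS:
        raise ValueError(f"unsupported data operation: {data_operation!r}")

    operations = {"data": data_operation}
    if identity_dataset_id is not None:
        if identity_dataset_id == dataset_id:
            raise ValueError("identity dataset must differ from the Profile dataset")
        # "create" is the only identity operation, and it requires a dataset ID.
        operations["identity"] = "create"
        operations["identityDatasetId"] = identity_dataset_id

    return {
        "flowId": flow_id,
        "imsOrgId": ims_org_id,
        "datasetId": dataset_id,
        "operations": operations,
    }
```

Calling build_header without identity_dataset_id produces the payload variant that skips Identity Service. Passing an identity dataset ID adds both identity properties together, so one cannot be set without the other.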
Dynamically pass primary identities
For XDM updates, the schema must be enabled for Profile and contain a primary identity. You can specify the primary identity of an XDM schema in two ways:
- Designate a static field as the primary identity in the XDM schema.
- Designate one of the identity fields as the primary identity through the identity map field group in the XDM schema.
Designate a static field as the primary identity field in the XDM schema
In the example below, state, homePhone.number, and other attributes are upserted with their respective given values into the Profile with the primary identity of sampleEmail@gmail.com. An XDM entity update message is then generated by the streaming Data Prep component. Real-Time Customer Profile then uses that XDM update message to upsert the profile record.
curl -X POST 'https://dcs.adobedc.net/collection/9aba816d350a69c4abbd283eb5818ec3583275ffce4880ffc482be5a9d810c4b' \
-H 'Content-Type: application/json' \
-H 'x-adobe-flow-id: d5262d48-0f47-4949-be6d-795f06933527' \
-d '{
"header": {
"flowId" : "d5262d48-0f47-4949-be6d-795f06933527",
"imsOrgId": "{ORG_ID}",
"datasetId": "62259f817f62d71947929a7b",
"operations": {
"data": "create"
}
},
"body": {
"homeAddress": {
"country": "US",
"state": "GA",
"region": "va7"
},
"homePhone": {
"number": "123.456.799"
},
"identityMap": {
"Email": [{
"id": "sampleEmail@gmail.com",
"primary": true
}]
},
"personalEmail": {
"address": "sampleEmail@gmail.com",
"primary": true
},
"personID": "346576345",
"_id": "346576345",
"timestamp": "2021-05-05T17:51:45.1880+02",
"workEmail": "sampleWorkEmail@gmail.com"
}
}'
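The same request can be prepared from a script. The following Python sketch uses only the standard library and mirrors the two headers in the curl call above. The function name build_streaming_request is hypothetical, and the function only builds the request object; it does not send anything.

```python
import json
import urllib.request


def build_streaming_request(collection_url: str, flow_id: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request equivalent to the curl example (hypothetical helper).

    collection_url is the streaming endpoint of your own connection.
    payload is a dict with the "header" and "body" properties.
    """
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        collection_url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "x-adobe-flow-id": flow_id,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Because json.dumps serializes the payload, a structurally invalid body is caught locally instead of being rejected by the endpoint.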
Designate one of the identity fields as the primary identity through the identity map field group in the XDM schema
In this example, the header contains the operations attribute with the identity and identityDatasetId properties. This allows data to be merged with Real-Time Customer Profile and identities to be passed to Identity Service.
curl -X POST 'https://dcs.adobedc.net/collection/9aba816d350a69c4abbd283eb5818ec3583275ffce4880ffc482be5a9d810c4b' \
-H 'Content-Type: application/json' \
-H 'x-adobe-flow-id: d5262d48-0f47-4949-be6d-795f06933527' \
-d '{
"header": {
"flowId" : "d5262d48-0f47-4949-be6d-795f06933527",
"imsOrgId": "{ORG_ID}",
"datasetId": "62259f817f62d71947929a7b",
"operations": {
"data": "merge",
"identity": "create",
"identityDatasetId": "6254a93b851ecd194b64af9e"
}
},
"body": {
"homeAddress": {
"country": "US",
"state": "GA",
"region": "va7"
},
"homePhone": {
"number": "123.456.799"
},
"identityMap": {
"Email": [{
"id": "sampleEmail@gmail.com",
"primary": true
}]
},
"personalEmail": {
"address": "sampleEmail@gmail.com",
"primary": true
},
"personID": "346576345",
"_id": "346576345",
"timestamp": "2021-05-05T17:51:45.1880+02",
"workEmail": "sampleWorkEmail@gmail.com"
}
}'
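Which identity a payload marks as primary can be read from its identityMap. The following Python sketch is a hypothetical helper, not a Platform API. It assumes the identityMap shape used in the examples above: a mapping from namespace to a list of objects with an id and an optional primary flag.

```python
from typing import Optional, Tuple


def find_primary_identity(body: dict) -> Optional[Tuple[str, str]]:
    """Return (namespace, id) of the first identity flagged primary, or None."""
    for namespace, identities in body.get("identityMap", {}).items():
        for identity in identities:
            if identity.get("primary") is True:
                return namespace, identity["id"]
    return None
```

A result of None means the body carries no primary identity in its identityMap, which is worth checking before sending because XDM updates require one.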
Known limitations and key considerations
The following outlines a list of known limitations to consider when streaming upserts with Data Prep:
- The streaming upserts method should only be used when sending partial row updates to Real-Time Customer Profile. Partial row updates are not consumed by data lake.
- The streaming upserts method does not support updating, replacing, or removing identities. New identities are created if they do not exist. Hence, the identity operation must always be set to create. If an identity already exists, the operation is a no-op.
- The streaming upserts method currently does not support Adobe Experience Platform Web SDK and Adobe Experience Platform Mobile SDK.
Next steps
By reading this document, you should now understand how to stream upserts in Data Prep to send partial row updates to your Real-Time Customer Profile data, while also creating and linking identities with a single API request. For more information on other Data Prep features, please read the Data Prep overview. To learn how to use mapping sets within the Data Prep API, please read the Data Prep developer guide.