This functionality is available to customers who have purchased the Real-Time CDP Prime and Ultimate package, Adobe Journey Optimizer, or Customer Journey Analytics. Contact your Adobe representative for more information.
This article explains the workflow required to use the Flow Service API to export datasets from Adobe Experience Platform to your preferred cloud storage location, such as Amazon S3, SFTP locations, or Google Cloud Storage.
TIP
You can also use the Experience Platform user interface to export datasets. Read the export datasets UI tutorial for more information.
Datasets available for exporting datasets-to-export
The datasets that you can export depend on the Experience Platform application (Real-Time CDP, Adobe Journey Optimizer), the tier (Prime or Ultimate), and any add-ons that you purchased (for example: Data Distiller).
This guide requires a working understanding of the following components of Adobe Experience Platform:
Experience Platform datasets: All data that is successfully ingested into Adobe Experience Platform is persisted within the Data Lake as datasets. A dataset is a storage and management construct for a collection of data, typically a table, that contains a schema (columns) and fields (rows). Datasets also contain metadata that describes various aspects of the data they store.
Sandboxes: Experience Platform provides virtual sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.
The following sections provide additional information that you must know in order to export datasets to cloud storage destinations in Platform.
Required permissions permissions
To export datasets, you need the View Destinations, View Datasets, and Manage and Activate Dataset Destinations access control permissions. Read the access control overview or contact your product administrator to obtain the required permissions.
To ensure that you have the necessary permissions to export datasets and that the destination supports exporting datasets, browse the destinations catalog. If a destination has an Activate or an Export datasets control, then you have the appropriate permissions.
Reading sample API calls reading-sample-api-calls
This tutorial provides example API calls to demonstrate how to format your requests. These include paths, required headers, and properly formatted request payloads. Sample JSON returned in API responses is also provided. For information on the conventions used in documentation for sample API calls, see the section on how to read example API calls in the Experience Platform troubleshooting guide.
Gather values for required and optional headers gather-values-headers
In order to make calls to Platform APIs, you must first complete the Experience Platform authentication tutorial. Completing the authentication tutorial provides the values for each of the required headers in all Experience Platform API calls, as shown below:
Authorization: Bearer {ACCESS_TOKEN}
x-api-key: {API_KEY}
x-gw-ims-org-id: {ORG_ID}
Resources in Experience Platform can be isolated to specific virtual sandboxes. In requests to Platform APIs, you can specify the name and ID of the sandbox that the operation will take place in. These are optional parameters.
For descriptions of the terms that you will be encountering in this API tutorial, read the glossary section of the API reference documentation.
Gather connection specs and flow specs for your desired destination gather-connection-spec-flow-spec
Before starting the workflow to export a dataset, identify the connection spec and flow spec IDs of the destination to which you intend to export datasets. Use the table below for reference.
| Destination | Connection spec | Flow spec |
|---|---|---|
| Amazon S3 | 4fce964d-3f37-408f-9778-e597338a21ee | 269ba276-16fc-47db-92b0-c1049a3c131f |
| Azure Blob Storage | 6d6b59bf-fb58-4107-9064-4d246c0e5bb2 | 95bd8965-fc8a-4119-b9c3-944c2c2df6d2 |
| Azure Data Lake Gen 2 (ADLS Gen2) | be2c3209-53bc-47e7-ab25-145db8b873e1 | 17be2013-2549-41ce-96e7-a70363bec293 |
| Data Landing Zone (DLZ) | 10440537-2a7b-4583-ac39-ed38d4b848e8 | cd2fc47e-e838-4f38-a581-8fff2f99b63a |
| Google Cloud Storage | c5d93acb-ea8b-4b14-8f53-02138444ae99 | 585c15c4-6cbf-4126-8f87-e26bff78b657 |
| SFTP | 36965a81-b1c6-401b-99f8-22508f1e6a26 | 354d6aad-4754-46e4-a576-1b384561c440 |
You need these IDs to construct various Flow Service entities. You also need to refer to parts of the connection spec itself to set up certain entities, so you must first retrieve the connection spec of your destination from the Flow Service API, as shown in the example below.
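For illustration, a minimal sketch of such a retrieval call is shown below. Replace {CONNECTION_SPEC_ID} with the connection spec ID of your destination from the table above; the header values are the ones gathered in the authentication step, and the x-sandbox-name header is optional.

```shell
# Sketch: retrieve the connection spec for your destination
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/connectionSpecs/{CONNECTION_SPEC_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```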
Follow the steps below to set up a dataset dataflow to a cloud storage destination. For some steps, the requests and responses differ between the various cloud storage destinations. In those cases, use the tabs on the page to retrieve the requests and responses specific to the destination that you want to connect and export datasets to. Be sure to use the correct connection spec and flow spec for the destination you are configuring.
Retrieve a list of datasets retrieve-list-of-available-datasets
To retrieve a list of datasets eligible for activation, start by making an API call to the below endpoint.
Note that to retrieve eligible datasets, the connection spec ID used in the request URL must be the data lake source connection spec ID, 23598e46-f560-407b-88d5-ea6207e49db0, and the two query parameters outputField=datasets and outputType=activationDatasets must be specified. All other query parameters are the standard ones supported by the Catalog Service API.
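A sketch of this call, assuming the /configs path segment on the connection specs endpoint, is shown below; start and limit are standard Catalog Service paging parameters and are optional.

```shell
# Sketch: list datasets eligible for activation (export)
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/connectionSpecs/23598e46-f560-407b-88d5-ea6207e49db0/configs?outputType=activationDatasets&outputField=datasets&start=0&limit=20' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```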
A successful response contains a list of datasets eligible for activation. These datasets can be used when constructing the source connection in the next step.
Create a source connection create-source-connection
After retrieving the list of datasets that you want to export, you can create a source connection using those dataset IDs.
Request
Create source connection - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
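For illustration, a minimal sketch of the source connection request is shown below. The payload shape, in particular the data.format value and the params.datasets list, is an assumption; replace {DATASET_ID} with an eligible dataset ID retrieved in the previous step.

```shell
# Sketch: create a source connection referencing the datasets to export
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/sourceConnections' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Dataset export source connection",
    "description": "Source connection for the datasets to be exported",
    "connectionSpec": {
        "id": "23598e46-f560-407b-88d5-ea6207e49db0",
        "version": "1.0"
    },
    "data": {
        "format": "parquet_xdm"
    },
    "params": {
        "datasets": [
            { "dataSetId": "{DATASET_ID}" }
        ]
    }
}'
```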
A successful response returns the ID (id) of the newly created source connection and an etag. Note down the source connection ID as you will need it later when creating the dataflow.
Please also remember that:
The source connection created in this step needs to be linked to a dataflow for its datasets to be activated to a destination. See the create a dataflow section for information on how to link a source connection to a dataflow.
The dataset IDs of a source connection cannot be modified after creation. If you need to add or remove datasets from a source connection, you must create a new source connection and link the ID of the new source connection to the dataflow.
Create a (target) base connection create-base-connection
A base connection securely stores the credentials to your destination. Depending on the destination type, the credentials needed to authenticate against that destination can vary. To find these authentication parameters, first retrieve the connection spec for your desired destination as described in the section Gather connection specs and flow specs and then look at the authSpec of the response. Reference the tabs below for the authSpec properties of all supported destinations.
Amazon S3
Amazon S3 - Connection spec showing auth spec
Note the highlighted line with inline comments in the connection spec example below, which provide additional information about where to find the authentication parameters in the connection spec.
Azure Blob Storage - Connection spec showing auth spec
Note the highlighted line with inline comments in the connection spec example below, which provide additional information about where to find the authentication parameters in the connection spec.
Azure Data Lake Gen 2 (ADLS Gen2) - Connection spec showing auth spec
Note the highlighted line with inline comments in the connection spec example below, which provide additional information about where to find the authentication parameters in the connection spec.
Google Cloud Storage - Connection spec showing auth spec
Note the highlighted line with inline comments in the connection spec example below, which provide additional information about where to find the authentication parameters in the connection spec.
{
"items": [
{
"id": "c5d93acb-ea8b-4b14-8f53-02138444ae99",
"name": "Google Cloud Storage",
"providerId": "14e34fac-d307-11e9-bb65-2a2ae2dbcce4",
"version": "1.0",
"authSpec": [ // describes the authentication parameters
{
"name": "Google Cloud Storage authentication credentials",
"type": "GoogleCloudStorageAuth",
"spec": {
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "defines auth params required for connecting to google cloud storage connector.",
"type": "object",
"properties": {
"accessKeyId": {
"description": "Access Key Id for the user account",
"type": "string"
},
"secretAccessKey": {
"description": "Secret Access Key for the user account",
"type": "string",
"format": "password"
}
},
"required": [
"accessKeyId",
"secretAccessKey"
]
}
}
],
//...
SFTP
SFTP - Connection spec showing auth spec
NOTE
The SFTP destination contains two separate items in the auth spec, as it supports both password and SSH key authentication.
Note the highlighted line with inline comments in the connection spec example below, which provide additional information about where to find the authentication parameters in the connection spec.
Using the properties specified in the authentication spec (that is, authSpec from the response), you can create a base connection with the required credentials, specific to each destination type, as shown in the examples below:
Amazon S3
Request
Amazon S3 - Base connection request
TIP
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the Amazon S3 destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
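For illustration, a minimal sketch of an Amazon S3 base connection request is shown below. The auth.specName value and the credential parameter names (s3AccessKey, s3SecretKey) are assumptions; take the exact names from the authSpec returned for the Amazon S3 connection spec.

```shell
# Sketch: create a base connection holding Amazon S3 credentials
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/connections' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Amazon S3 base connection",
    "auth": {
        "specName": "Access Key",
        "params": {
            "s3AccessKey": "{AWS_ACCESS_KEY_ID}",
            "s3SecretKey": "{AWS_SECRET_ACCESS_KEY}"
        }
    },
    "connectionSpec": {
        "id": "4fce964d-3f37-408f-9778-e597338a21ee",
        "version": "1.0"
    }
}'
```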
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the Azure Blob Storage destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Azure Data Lake Gen 2 (ADLS Gen2) - Base connection request
TIP
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the Azure Data Lake Gen 2 (ADLS Gen2) destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
No authentication credentials are required for the Data Landing Zone destination. For more information, refer to the authenticate to destination section of the Data Landing Zone destination documentation page.
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the Google Cloud Storage destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
SFTP with password - Base connection request
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the SFTP destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
SFTP with SSH key - Base connection request
For information on how to obtain the required authentication credentials, refer to the authenticate to destination section of the SFTP destination documentation page.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Note the connection ID from the response. This ID will be required in the next step when creating the target connection.
Create a target connection create-target-connection
Next, you need to create a target connection which stores the export parameters for your datasets. Export parameters include location, file format, compression, and other details. Refer to the targetSpec properties provided in the destination’s connection spec to understand the supported properties for each destination type. Reference the tabs below for the targetSpec properties of all supported destinations.
WARNING
Exports to JSON files are supported in a compressed mode only. Exports to Parquet files are supported in a compressed and uncompressed mode.
Amazon S3 - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
Azure Blob Storage - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
Azure Data Lake Gen 2 (ADLS Gen2) - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
{
"items": [
{
"id": "be2c3209-53bc-47e7-ab25-145db8b873e1",
"name": "Azure Data Lake Gen2",
"providerId": "14e34fac-d307-11e9-bb65-2a2ae2dbcce4",
"version": "1.0",
"authSpec": [...],
"encryptionSpecs": [...],
"targetSpec": { // describes the target connection parameters
"name": "User based target",
"type": "UserNamespace",
"spec": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"path": {
"title": "Folder path",
"description": "Enter the path to your Azure Data Lake Storage folder",
"type": "string"
},
"fileType": {...}, // not applicable to dataset destinations
"datasetFileType": {
"conditional": {
"field": "flowSpec.attributes._workflow",
"operator": "CONTAINS",
"value": "DATASETS"
},
"title": "File Type",
"description": "Select file format",
"type": "string",
"enum": [
"JSON",
"PARQUET"
]
},
"csvOptions":{...}, // not applicable to dataset destinations
"compression": {
"title": "Compression format",
"description": "Select the desired file compression format.",
"type": "string",
"enum": [
"NONE",
"GZIP"
]
}
},
"required": [
"path",
"datasetFileType",
"compression",
"fileType"
]
}
//...
Data Landing Zone (DLZ)
Data Landing Zone (DLZ) - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
Google Cloud Storage - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
SFTP - Connection spec showing target connection parameters
Note the highlighted lines with inline comments in the connection spec example below, which provide additional information about where to find the target spec parameters in the connection spec. You can also see in the example below which target parameters are not applicable to dataset export destinations.
By using the above spec, you can construct a target connection request specific to your desired cloud storage destination, as shown in the tabs below.
Amazon S3
Request
Amazon S3 - Target connection request
TIP
For information on how to obtain the required target parameters, refer to the fill in destination details section of the Amazon S3 destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
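For illustration, a minimal sketch of an Amazon S3 target connection request is shown below. The bucketName and path parameter names are assumptions; take the exact parameter names from the targetSpec returned for the Amazon S3 connection spec. The datasetFileType and compression values follow the target spec shown earlier.

```shell
# Sketch: create a target connection holding the Amazon S3 export parameters
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/targetConnections' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Amazon S3 target connection",
    "baseConnectionId": "{BASE_CONNECTION_ID}",
    "connectionSpec": {
        "id": "4fce964d-3f37-408f-9778-e597338a21ee",
        "version": "1.0"
    },
    "params": {
        "bucketName": "{YOUR_BUCKET_NAME}",
        "path": "{FOLDER_PATH}",
        "datasetFileType": "JSON",
        "compression": "GZIP"
    }
}'
```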
For information on how to obtain the required target parameters, refer to the fill in destination details section of the Azure Blob Storage destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
For information on how to obtain the required target parameters, refer to the fill in destination details section of the Azure Data Lake Gen 2 (ADLS Gen2) destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
For information on how to obtain the required target parameters, refer to the fill in destination details section of the Data Landing Zone destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
For information on how to obtain the required target parameters, refer to the fill in destination details section of the Google Cloud Storage destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
For information on how to obtain the required target parameters, refer to the fill in destination details section of the SFTP destination documentation page. For other supported values of datasetFileType, see the API reference documentation.
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Note the Target Connection ID from the response. This ID will be required in the next step when creating the dataflow to export datasets.
Create a dataflow create-dataflow
The final step in the destination configuration is to set up a dataflow. A dataflow ties together previously created entities and also provides options for configuring the dataset export schedule. To create the dataflow, use the payloads below, depending on your desired cloud storage destination, and replace the entity IDs from previous steps.
Amazon S3
Request
Create dataset dataflow to Amazon S3 destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
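For illustration, a minimal sketch of the dataflow creation request for Amazon S3 is shown below. The flow spec ID comes from the table at the top of this tutorial; the scheduleParams field names and values are assumptions and should be adjusted to your desired export schedule.

```shell
# Sketch: create a dataflow that exports the selected datasets to Amazon S3
curl -X POST \
  'https://platform.adobe.io/data/foundation/flowservice/flows' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Export datasets to Amazon S3",
    "description": "Exports the selected datasets to an Amazon S3 bucket",
    "flowSpec": {
        "id": "269ba276-16fc-47db-92b0-c1049a3c131f",
        "version": "1.0"
    },
    "sourceConnectionIds": [
        "{SOURCE_CONNECTION_ID}"
    ],
    "targetConnectionIds": [
        "{TARGET_CONNECTION_ID}"
    ],
    "transformations": [],
    "scheduleParams": {
        "exportMode": "DAILY_FULL_EXPORT",
        "frequency": "hour",
        "interval": 24,
        "startTime": 1700000000
    }
}'
```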
Create dataset dataflow to Azure Blob Storage destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Create dataset dataflow to Azure Data Lake Gen 2 (ADLS Gen2) destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Create dataset dataflow to Data Landing Zone destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Create dataset dataflow to Google Cloud Storage destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Create dataset dataflow to SFTP destination - Request
Note the highlighted lines with inline comments in the request example, which provide additional information. Remove the inline comments in the request when copy-pasting the request into your terminal of choice.
Note the Dataflow ID from the response. This ID will be required in the next step when retrieving the dataflow runs to validate the successful dataset exports.
Get the dataflow runs get-dataflow-runs
To check the executions of a dataflow, use the Dataflow Runs API:
Request
Get dataflow runs - Request
In the request to retrieve dataflow runs, add the dataflow ID that you obtained in the previous step as a query parameter.
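For illustration, a sketch of the call is shown below; the property=flowId== filter syntax is assumed from the Flow Service API's standard query filtering.

```shell
# Sketch: list runs for a specific dataflow
curl -X GET \
  'https://platform.adobe.io/data/foundation/flowservice/runs?property=flowId=={DATAFLOW_ID}' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
```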
When exporting datasets, Experience Platform creates a .json or .parquet file in the storage location that you provided. Expect a new file to be deposited in your storage location according to the export schedule you provided when creating a dataflow.
Experience Platform creates a folder structure in the storage location you specified, where it deposits the exported dataset files. A new folder is created for each export time, following the pattern below:
The default file name is randomly generated and ensures that exported file names are unique.
Sample dataset files sample-files
The presence of these files in your storage location is confirmation of a successful export. To understand how the exported files are structured, you can download a sample .parquet file or .json file.
Note the difference in file format between the two file types, when compressed:
When exporting compressed JSON files, the exported file format is json.gz
When exporting compressed parquet files, the exported file format is gz.parquet
API error handling api-error-handling
The API endpoints in this tutorial follow the general Experience Platform API error message principles. Refer to API status codes and request header errors in the Platform troubleshooting guide for more information on interpreting error responses.
Next steps next-steps
By following this tutorial, you have successfully connected Platform to one of your preferred batch cloud storage destinations and set up a dataflow to the respective destination to export datasets. See the following pages for more details, such as how to edit existing dataflows using the Flow Service API: