Ingest data into Adobe Experience Platform

Last update: Wed Jun 14 2023 00:00:00 GMT+0000 (Coordinated Universal Time)

Topics:
Data Ingestion

CREATED FOR:

Developer
User
Admin
Leader

Adobe Experience Platform lets you import data into Platform as batch files. Examples of data you might ingest include profile data exported from a CRM system as a flat file (such as a Parquet file) or data that conforms to a known Experience Data Model (XDM) schema in the Schema Registry.

Getting started

In order to complete this tutorial, you must have access to Experience Platform. If you do not have access to an organization in Experience Platform, please speak to your system administrator before proceeding.

If you would prefer to ingest data using the Data Ingestion APIs, begin by reading the Batch Ingestion developer guide.
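
For orientation only, the sketch below shows what the first call in an API-based workflow might look like in Python. It builds, but does not send, a request that opens a new batch for an existing dataset. The endpoint URL, header names, request body shape, and the "prod" sandbox default are assumptions that do not come from this tutorial and should be confirmed against the Batch Ingestion developer guide. The function name build_create_batch_request and its parameters are hypothetical and chosen for illustration.

```python
import json
import urllib.request

# Assumed Batch Ingestion endpoint; confirm it in the developer guide before use.
BATCH_API_URL = "https://platform.adobe.io/data/foundation/import/batches"


def build_create_batch_request(dataset_id, access_token, api_key, org_id,
                               sandbox="prod", file_format="json"):
    """Build (without sending) a POST request that would open a batch for a dataset."""
    # This tutorial names Parquet and JSON as the supported file types.
    if file_format not in ("json", "parquet"):
        raise ValueError(f"unsupported format: {file_format}")

    # Assumed body shape: the target dataset plus the format of the incoming files.
    body = json.dumps({
        "datasetId": dataset_id,
        "inputFormat": {"format": file_format},
    }).encode("utf-8")

    # Assumed header names for authentication and organization/sandbox scoping.
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(BATCH_API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send the request. Uploading files to the batch and marking it complete are separate steps that the developer guide describes.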

Datasets workspace

The Datasets workspace within Experience Platform allows you to view and manage all of the datasets that your organization has created, as well as create new ones.

View the Datasets workspace by clicking Datasets in the left-hand navigation. The Datasets workspace contains a list of datasets, including columns showing name, created (date and time), source, schema, and last batch status, as well as the date and time the dataset was last updated.

NOTE

Click the filter icon next to the Search bar to show only the datasets that are enabled for Profile.

Create a dataset

To create a dataset, click Create Dataset in the top right corner of the Datasets workspace.

On the Create Dataset screen, select whether you would like to “Create Dataset from Schema” or “Create Dataset from CSV File”.

For this tutorial, a schema will be used to create the dataset. Click Create Dataset from Schema to continue.

Select dataset schema

On the Select Schema screen, choose a schema by clicking the radio button beside the schema you wish to use. For this tutorial, the dataset will be made using the Loyalty Members schema. To find a specific schema quickly, use the search bar to filter the list.

Once you have selected the radio button next to the schema you wish to use, click Next.

Configure dataset

On the Configure Dataset screen, you must give your dataset a name and may also provide a description.

Notes on Dataset Names:

Dataset names should be short and descriptive so that the dataset can be easily found in the library later.
Dataset names must be unique, meaning each name should also be specific enough that it will not be reused in the future.
It is best practice to provide additional information about the dataset using the description field, as it may help other users differentiate between datasets in the future.

Once the dataset has a name and, optionally, a description, click Finish.

Dataset activity

An empty dataset has now been created and you have been returned to the Dataset Activity tab in the Datasets workspace. You should see the name of the dataset in the top-left corner of the workspace, along with a notification that “No batches have been added.” This is to be expected since you have not added any batches to this dataset yet.

On the right-hand side of the Datasets workspace you will see the Info tab containing information related to your new dataset such as dataset ID, name, description, table name, schema, streaming, and source. The Info tab also includes information about when the dataset was created and its last modified date.

Also in the Info tab is a Profile toggle that is used for enabling your dataset for use with Real-Time Customer Profile. Use of this toggle, and Real-Time Customer Profile, will be explained in more detail in the section that follows.

Enable dataset for Real-Time Customer Profile

Datasets are used for ingesting data into Experience Platform, and that data is ultimately used to identify individuals and stitch together information coming from multiple sources. That stitched-together information is called a Real-Time Customer Profile. So that Platform knows which information to include in the Real-Time Customer Profile, datasets can be marked for inclusion using the Profile toggle.

By default, this toggle is off. If you turn on the Profile toggle, all data ingested into the dataset will be used to help identify an individual and stitch together their Real-Time Customer Profile.

To learn more about Real-Time Customer Profile and working with identities, please review the Identity Service documentation.

To enable the dataset for Real-Time Customer Profile, click the Profile toggle in the Info tab.

A dialog will appear asking you to confirm that you want to enable the dataset for Real-Time Customer Profile.

Click Enable and the toggle will turn blue, indicating it is on.

Add data to dataset

Data can be added into a dataset in a number of different ways. You could choose to use Data Ingestion APIs or an ETL partner such as Unifi or Informatica. For this tutorial, data will be added to the dataset using the Add Data tab within the UI.

To begin adding data to the dataset, click on the Add Data tab. You can now drag and drop files or browse your computer for the files you wish to add.

NOTE

Platform supports two file types for data ingestion: Parquet and JSON. You may add up to five files at a time, and each file can be at most 1 GB in size.
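
The limits in this note can be checked locally before you drag files into the UI. The following Python sketch is one possible pre-check, not part of Platform itself. The function name check_upload is hypothetical, file types are judged only by their .parquet or .json extension, and "1 GB" is read as 10^9 bytes, which is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".parquet", ".json"}  # file types named in the note above
MAX_FILES_PER_UPLOAD = 5                    # "up to five files at a time"
MAX_FILE_SIZE_BYTES = 10**9                 # assumption: "1 GB" read as 10^9 bytes


def check_upload(paths):
    """Return a list of problems found; an empty list means no problems were found."""
    paths = [Path(p) for p in paths]
    problems = []

    # The limit on the number of files applies to the selection as a whole.
    if len(paths) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(paths)} files selected; at most {MAX_FILES_PER_UPLOAD} can be added at a time"
        )

    # Type, existence, and size are checked per file; only the first issue per file is reported.
    for path in paths:
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{path.name}: unsupported file type '{path.suffix}'")
        elif not path.is_file():
            problems.append(f"{path.name}: file not found")
        elif path.stat().st_size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{path.name}: larger than 1 GB")
    return problems
```

For example, check_upload(["loyalty.json"]) returns an empty list when loyalty.json exists and is within the size limit. This check does not validate the file contents against the dataset schema.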

Upload a file

Once you drag and drop (or browse and select) a Parquet or JSON file that you wish to upload, Platform will immediately begin to process the file and an Uploading dialog will appear on the Add Data tab showing the progress of your file upload.

Dataset metrics

After the file has finished uploading, the Dataset Activity tab no longer shows that “No batches have been added.” Instead, the Dataset Activity tab now shows dataset metrics. All metrics will show “0” at this stage as the batch has not yet loaded.

At the bottom of the tab is a list showing the Batch ID of the data that was just ingested through the “Add data to dataset” process. Also included is information related to the batch, including ingested date, number of records ingested, and the current batch status.

Batch details

Click on the Batch ID to view a Batch Overview, showing additional details regarding the batch. Once the batch has finished loading, the information about the batch will update to show the number of records ingested and the file size. The status will also change to “Success” or “Failed”. If the batch fails, the Error Code section will contain details about any errors that occurred during ingestion.

For more information and frequently asked questions regarding batch ingestion, see the Batch Ingestion troubleshooting guide.

To return to the Dataset Activity screen, click the name of the dataset (Loyalty Details) in the breadcrumb.

Preview dataset

Once the dataset is ready, an option to Preview Dataset appears at the top of the Dataset Activity tab.

Click Preview Dataset to open a dialog showing sample data from within the dataset. If the dataset was created using a schema, details for the dataset schema will appear on the left side of the preview. You can expand the schema using the arrows to see the schema structure. Each column header in the preview data represents a field in the dataset.

Next steps and additional resources

Now that you have created a dataset and successfully ingested data into Experience Platform, you can repeat these steps to create a new dataset or ingest more data into the existing dataset.

To learn more about batch ingestion, please read the Batch Ingestion overview and supplement your learning by watching the video below.

WARNING

The Platform UI shown in the following video is out-of-date. Please refer to the documentation above for the latest UI screenshots and functionality.

https://video.tv.adobe.com/v/27269?quality=12&learn=on

Transcript

Hi there, data ingestion gives you the ability to bring your data together in one open and scalable platform. When your data is mapped to schemas, it becomes easy to combine data from multiple sources and do things like creating a Real-Time Customer Profile. In this video, we will ingest loyalty data from our Luma brand and map it to the Luma Loyalty Members schema that we created in a separate video. Let’s take a look at how we can ingest loyalty data into Adobe Experience Platform.

When you first log into Platform, you land on the homepage. From here, you will select Datasets on the left to navigate to the Datasets workspace. Once you arrive in the Datasets workspace, you will be presented with a list of datasets already in Platform. From here you can view and manage all datasets in your organization. To create a new dataset, you will need to click on the Create Dataset button in the upper right corner of the page. Once you click Create Dataset, you will be given two options: Create Dataset from Schema and Create Dataset from CSV File. Since we have already defined the Luma Loyalty Members schema in another video, let’s select the Create Dataset from Schema option. Next, you need to select the Luma Loyalty schema from a list of available schemas, and then click Next. Now you need to give this dataset a friendly name. We will call it Luma Loyalty Data. Click on Finish, and an empty dataset gets created for our loyalty data.

Adobe Experience Platform lets users ingest data into a dataset through batch ingestion and streaming ingestion. Batch ingestion lets you import data in a batch from any number of data sources. Streaming ingestion allows users to send data to Platform in real time from client and server-side devices. For batch ingestion, you could select the dataset we created when setting up the source connector workflow. Data can also be ingested into a dataset using the data ingestion API. The Add Data UI tool lets you perform some initial testing of the data to ensure it looks right before configuring the source or using the API for data ingestion.

In the right panel, under the Add Data section, when performing a batch data ingestion into a dataset, partial ingestion enables ingestion of valid records, with a specific error threshold for failed records before the entire batch fails. Enabling partial ingestion also allows you to perform an error diagnosis or error download using the API for failed records. The error threshold will allow you to set the percentage of acceptable errors before the entire batch fails. By default, this value is set to 5%.

Next, let’s add data to this dataset. Let’s drag the file containing loyalty data stored in JSON format into the panel. As soon as you drop the file in the interface, a batch gets created with the initial status of loading, and then moves to a processing state. You can also see that the “No batches have been added” message gets replaced with batch metrics. To find more information about the batch, you can click on the Batch ID to get more details. The Batch Overview page shows the current status, the number of records ingested, file size, and a few additional details. In our case, we have zero failed records. When there is a record failure, its associated error code and description get displayed within the Batch Overview page. Let’s navigate to the Dataset Activity page, where you can see that the batch status changed from processing to success. Let’s also preview the dataset to make sure that the data ingestion was successful.

If you want to use your loyalty data in Real-Time Customer Profile, you will need to enable it by toggling the button in the right panel. This lets the Real-Time Customer Profile service know to start enriching customer profiles with any data in this dataset. Next, you need to confirm that you want to enable this dataset for Real-Time Customer Profile. Once you click Enable, this dataset is now ready to enrich profiles stored in Real-Time Customer Profile. Let’s upload the loyalty data again to our dataset with Real-Time Customer Profile enabled. In the next successful batch run, data ingested into our dataset will be used to create real-time customer profiles. Our dataset contains identity fields that will be populated and then be used to build identities in Platform. Platform Identity Service helps you to gain a better view of your customer and their behavior by bridging identities across devices and systems, allowing you to deliver impactful personal digital experiences in real time. I hope this video provides you with an overview of data ingestion in Adobe Experience Platform.
