Flatten nested data structures for use with third-party BI tools
Adobe Experience Platform Query Service supports the FLATTEN
setting when connecting to a database through third-party BI tools. This feature flattens nested data structures in third-party BI tools to improve their usability and reduce the required workload to retrieve, analyze, transform, and report data.
Many BI tools like Tableau and Power BI do not natively support nested data structures. The FLATTEN
setting removes the need to create SQL views on top of your data to provide a flat version, or to use Query Service CTAS
or INSERT INTO
jobs to duplicate your datasets into flat versions when using ad hoc schemas.
The FLATTEN
setting pulls the structure of each leaf field into the root of the table and names the field after the original namespace. This allows you to use dot notation to match a field to its Experience Data Model (XDM) path while preserving the field’s context.
Prerequisites
Using the FLATTEN
setting requires a working understanding of the following components of Adobe Experience Platform:
-
XDM System: A high-level overview of XDM and its implementation in Experience Platform.
- Create an ad hoc schema: An XDM schema with fields that are namespaced for usage only by a single dataset, is referred to as an ad hoc schema. Ad hoc schemas are used in various data ingestion workflows for Experience Platform and creating certain kinds of source connections.
-
Sandboxes: Experience Platform provides virtual sandboxes that partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.
-
Nested data structures: This document provides examples of how to create, process, or transform datasets with complex data types including nested data structures.
Connect to a database using the FLATTEN setting connect-with-flatten
The FLATTEN
setting flattens nested data structures into separate columns where the attribute name becomes the column name that holds the row values. When working with data in BI tools that do not support nested data structures, this setting improves the usability of ad hoc schemas and reduces the necessary workload.
When connecting to Query Service with your chosen third-party client, append the FLATTEN
setting to the database name. For information on how to connect a specific BI tool, please see its respective documentation in the connect clients to Query Service overview. The connection string should contain:
- The sandbox name.
- A colon followed by
all
or a specific dataset ID, view ID, or database name. - A question mark (?) followed by the
FLATTEN
keyword.
The input should take the following format:
{sandbox_name}:{all/ID/database_name}?FLATTEN
An example connection string might look as below:
prod:all?FLATTEN
Example example
The example schema used in this guide employs the standard field group Commerce Details, which utilizes the commerce
object structure and the productListItems
array. See the XDM documentation for more information on the Commerce Details field group. A representation of the schema structure can be seen in the image below.
If your BI tool does not support nested data structures, it can be difficult to reference nested fields should they contain serialized values (such as commerce
and productListItems
in the example schema). These values may appear as parts of a single encoded commerce
string field and are not realistically unusable.
The following values represent commerce.order.priceTotal
(3018.0), commerce.order.purchaseID
(c9b5aff9-25de-450b-98f4-4484a2170180), and commerce.purchases.value
(1.0) in poorly formatted nested fields.
("(3018.0,c9b5aff9-25de-450b-98f4-4484a2170180)","(1.0)")
By using the FLATTEN
setting, you can access separate fields within your schema or whole sections of the nested data structure by using dot notation and their original pathname. An example of this format using the commerce
field group is given below.
commerce.order.priceTotal
commerce.order.purchaseID
commerce.purchases.value
The FLATTEN
setting has certain limitations when dealing with other data structures. Full details are provided in the limitations section.
Use quotation marks for fields in queries quotation-marks
The flattened root fields now use dot notation to match their XDM paths. When used in a query the fields need to be enclosed in quotation marks (" ").
The SQL example below displays the original state of the nested query:
SELECT YEAR(timestamp) AS year,
SUM(commerce.order.priceTotal) AS revenue
FROM purchase_events_dataset
WHERE commerce.purchases.value > 0
GROUP BY 1;
When using the flattened data fields, the query is written using dot notation and enclosed in quotation marks as seen below:
SELECT YEAR(timestamp) AS year,
SUM("commerce.order.priceTotal") AS revenue
FROM purchase_events_dataset
WHERE "commerce.purchases.value" > 0
GROUP BY 1;
Limitations limitations
The FLATTEN
setting does not currently flatten the following data structures:
EXPLODE
function to access arrays.To resolve both Map and Array limitations you need to use the BI tools raw SQL editing like the Advanced Options -> SQL Statement in Power BI.
BI tools such as raw SQL editing are necessary to resolve both map and array limitations. See the guide on how to use Power BI advanced options to enter a custom SQL query in the SQL statement section.
Next steps
This document covered how to flatten nested data structures for use with third-party BI tools. If you have not already connected your client, see the client connection overview for a list of supported third-party clients.