SQL syntax in Query Service
You can use standard ANSI SQL for SELECT
statements and other limited commands in Adobe Experience Platform Query Service. This document covers the SQL syntax supported by Query Service.
SELECT queries select-queries
The following syntax defines a SELECT
query supported by Query Service:
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT [( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ SNAPSHOT { SINCE start_snapshot_id | AS OF end_snapshot_id | BETWEEN start_snapshot_id AND end_snapshot_id } ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT | MINUS } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start ]
The following subsections provide details on additional clauses that you can use in your queries, provided they follow the format outlined above.
SNAPSHOT clause
This clause can be used to incrementally read data from a table based on snapshot IDs. A snapshot ID is a checkpoint marker, represented by a Long-type number, that is applied to a data lake table each time data is written to it. The SNAPSHOT clause attaches itself to the table relation that it is used next to.
[ SNAPSHOT { SINCE start_snapshot_id | AS OF end_snapshot_id | BETWEEN start_snapshot_id AND end_snapshot_id } ]
Example
SELECT * FROM Customers SNAPSHOT SINCE 123;
SELECT * FROM Customers SNAPSHOT AS OF 345;
SELECT * FROM Customers SNAPSHOT BETWEEN 123 AND 345;
SELECT * FROM Customers SNAPSHOT BETWEEN HEAD AND 123;
SELECT * FROM Customers SNAPSHOT BETWEEN 345 AND TAIL;
SELECT * FROM (SELECT id FROM CUSTOMERS SNAPSHOT BETWEEN 123 AND 345) C;
SELECT * FROM Customers SNAPSHOT SINCE 123 INNER JOIN Inventory SNAPSHOT AS OF 789 ON Customers.id = Inventory.id;
A SNAPSHOT clause works with a table or table alias, but not on top of a subquery or view. A SNAPSHOT clause works anywhere that a SELECT query on a table can be applied.

You can also use HEAD and TAIL as special offset values for snapshot clauses. Using HEAD refers to an offset before the first snapshot, while TAIL refers to an offset after the last snapshot.
If an expired snapshot ID is used, one of two behaviors occurs, depending on whether the optional fallback behavior flag (resolve_fallback_snapshot_on_failure) is set:
- If the optional fallback behavior flag is set, Query Service chooses the earliest available snapshot, sets it as the start snapshot, and returns the data between the earliest available snapshot and the specified end snapshot. This data is inclusive of the earliest available snapshot.
- If the optional fallback behavior flag is not set, an error is returned.
WHERE clause
By default, matches produced by a WHERE clause on a SELECT query are case-sensitive. If you want matches to be case-insensitive, use the keyword ILIKE instead of LIKE.
[ WHERE condition { LIKE | ILIKE | NOT LIKE | NOT ILIKE } pattern ]
The logic of the LIKE and ILIKE clauses is explained in the following table:

| Clause | Equivalent operator |
| --- | --- |
| WHERE condition LIKE pattern | `~~` |
| WHERE condition NOT LIKE pattern | `!~~` |
| WHERE condition ILIKE pattern | `~~*` |
| WHERE condition NOT ILIKE pattern | `!~~*` |
Example
SELECT * FROM Customers
WHERE CustomerName ILIKE 'a%';
This query returns customers with names beginning with “A” or “a”.
JOIN
A SELECT
query that uses joins has the following syntax:
SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT OUTER JOIN | FULL JOIN | FULL OUTER JOIN]
ON join condition
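For illustration, a hedged sketch using a hypothetical Orders table alongside the Customers table from the other examples in this document (the Orders table and its column names are assumptions, not part of any real dataset):

```sql
-- Returns each customer's orders; customers with no orders are excluded.
-- A LEFT JOIN would instead keep such customers, with NULL order values.
SELECT c.CustomerName, o.OrderTotal
FROM Customers c
INNER JOIN Orders o
  ON c.id = o.customer_id;
```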
UNION, INTERSECT, and EXCEPT
The UNION
, INTERSECT
, and EXCEPT
clauses are used to combine or exclude like rows from two or more tables:
SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT | MINUS]
SELECT statement 2
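As a sketch of the difference between these set operations, assuming Customers and OnlineCustomers share a compatible schema (as in the INSERT INTO example in this document):

```sql
-- All distinct names from either table; use UNION ALL to keep duplicates.
SELECT CustomerName FROM Customers
UNION
SELECT CustomerName FROM OnlineCustomers;

-- Only names that appear in both tables.
SELECT CustomerName FROM Customers
INTERSECT
SELECT CustomerName FROM OnlineCustomers;

-- Names in Customers that do not appear in OnlineCustomers.
SELECT CustomerName FROM Customers
EXCEPT
SELECT CustomerName FROM OnlineCustomers;
```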
CREATE TABLE AS SELECT create-table-as-select
The following syntax defines a CREATE TABLE AS SELECT
(CTAS) query:
CREATE TABLE table_name [ WITH (schema='target_schema_title', rowvalidation='false', label='PROFILE') ] AS (select_query)
schema: Optional. The title of the XDM schema on which the new dataset is based.
rowvalidation: Optional. Specifies whether row-level validation is performed on every new batch ingested into the newly created dataset. The default value is true.
label: Optional. Use the value profile to label your dataset as enabled for profile. This means that your dataset automatically gets marked for profile as it gets created. See the derived attribute extension document for more information on the use of label.
select_query: A SELECT statement. The syntax of the SELECT query can be found in the SELECT queries section.
Example
CREATE TABLE Chairs AS (SELECT color, count(*) AS no_of_chairs FROM Inventory i WHERE i.type=="chair" GROUP BY i.color)
CREATE TABLE Chairs WITH (schema='target schema title', label='PROFILE') AS (SELECT color, count(*) AS no_of_chairs FROM Inventory i WHERE i.type=="chair" GROUP BY i.color)
CREATE TABLE Chairs AS (SELECT color FROM Inventory SNAPSHOT SINCE 123)
The SELECT statement must have an alias for aggregate functions such as COUNT, SUM, MIN, and so on. Also, the SELECT statement can be provided with or without parentheses (). You can provide a SNAPSHOT clause to read incremental deltas into the target table.

INSERT INTO
The INSERT INTO
command is defined as follows:
INSERT INTO table_name select_query
table_name: The name of the table that you want to insert the data into.
select_query: A SELECT statement. The syntax of the SELECT query can be found in the SELECT queries section.
Example
INSERT INTO Customers SELECT SupplierName, City, Country FROM OnlineCustomers;
INSERT INTO Customers (SELECT * FROM OnlineCustomers SNAPSHOT AS OF 345)
The SELECT statement can be provided in parentheses (). Also, the schema of the result of the SELECT statement must conform to that of the table defined in the INSERT INTO statement. You can provide a SNAPSHOT clause to read incremental deltas into the target table.

Most fields in a real XDM schema are not found at the root level, and SQL does not permit the use of dot notation. To achieve a realistic result using nested fields, you must map each field in your INSERT INTO path.
To INSERT INTO
nested paths, use the following syntax:
INSERT INTO [dataset]
SELECT struct([source field1] as [target field in schema],
[source field2] as [target field in schema],
[source field3] as [target field in schema]) [tenant name]
FROM [dataset]
Example
INSERT INTO Customers SELECT struct(SupplierName as Supplier, City as SupplierCity, Country as SupplierCountry) _Adobe FROM OnlineCustomers;
DROP TABLE
The DROP TABLE
command drops an existing table and deletes the directory associated with the table from the file system if it is not an external table. If the table does not exist, an exception occurs.
DROP TABLE [IF EXISTS] [db_name.]table_name
IF EXISTS: If specified, no exception is thrown if the table does not exist.
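For example, to remove the Chairs table from the CTAS examples above (the second form suppresses the exception if the table is absent):

```sql
DROP TABLE Chairs;
DROP TABLE IF EXISTS Chairs;
```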
CREATE DATABASE
The CREATE DATABASE
command creates an Azure Data Lake Storage (ADLS) database.
CREATE DATABASE [IF NOT EXISTS] db_name
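A minimal example, using an illustrative database name:

```sql
CREATE DATABASE IF NOT EXISTS dg1;
```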
DROP DATABASE
The DROP DATABASE
command deletes the database from an instance.
DROP DATABASE [IF EXISTS] db_name
IF EXISTS: If specified, no exception is thrown if the database does not exist.
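A minimal example, using an illustrative database name:

```sql
DROP DATABASE IF EXISTS dg1;
```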
DROP SCHEMA
The DROP SCHEMA
command drops an existing schema.
DROP SCHEMA [IF EXISTS] db_name.schema_name [ RESTRICT | CASCADE]
IF EXISTS: If specified, no exception is thrown if the schema does not exist.
RESTRICT: The default mode. The schema is dropped only if it does not contain any objects; otherwise, an error is returned.
CASCADE: Drops the schema together with all objects contained in the schema.
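For example, where dg1.schema1 is an illustrative schema name:

```sql
-- Fails if the schema still contains objects.
DROP SCHEMA IF EXISTS dg1.schema1 RESTRICT;

-- Drops the schema together with everything it contains.
DROP SCHEMA IF EXISTS dg1.schema1 CASCADE;
```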
CREATE VIEW
The following syntax defines a CREATE VIEW
query for a dataset. This dataset can be an ADLS or accelerated store dataset.
CREATE VIEW view_name AS select_query
view_name: The name of the view to be created.
select_query: A SELECT statement. The syntax of the SELECT query can be found in the SELECT queries section.
Example
CREATE VIEW V1 AS SELECT color, type FROM Inventory
CREATE OR REPLACE VIEW V1 AS SELECT model, version FROM Inventory
The following syntax defines a CREATE VIEW
query which creates a view in the context of a database and schema.
Example
CREATE VIEW db_name.schema_name.view_name AS select_query
CREATE OR REPLACE VIEW db_name.schema_name.view_name AS select_query
db_name: The name of the database.
schema_name: The name of the schema.
view_name: The name of the view to be created.
select_query: A SELECT statement. The syntax of the SELECT query can be found in the SELECT queries section.
Example
CREATE VIEW db_name.schema_name.V1 AS SELECT color, type FROM Inventory;
CREATE OR REPLACE VIEW V1 AS SELECT model, version FROM Inventory;
SHOW VIEWS
The following query shows the list of views.
SHOW VIEWS;
Db Name | Schema Name | Name | Id | Dataset Dependencies | Views Dependencies | TYPE
----------------------------------------------------------------------------------------------
qsaccel | profile_agg | view1 | view_id1 | dwh_dataset1 | | DWH
| | view2 | view_id2 | adls_dataset | adls_views | ADLS
(2 rows)
DROP VIEW
The following syntax defines a DROP VIEW
query:
DROP VIEW [IF EXISTS] view_name
IF EXISTS: If specified, no exception is thrown if the view does not exist.
view_name: The name of the view to be deleted.
Example
DROP VIEW v1
DROP VIEW IF EXISTS v1
Anonymous block anonymous-block
An anonymous block consists of two sections: executable and exception-handling sections. In an anonymous block, the executable section is mandatory. However, the exception-handling section is optional.
The following example shows how to create a block with one or more statements to be executed together:
$$BEGIN
statementList
[EXCEPTION exceptionHandler]
$$END
exceptionHandler:
WHEN OTHER
THEN statementList
statementList:
: (statement (';')) +
Below is an example using an anonymous block.
$$BEGIN
SET @v_snapshot_from = select parent_id from (select history_meta('email_tracking_experience_event_dataset') ) tab where is_current;
SET @v_snapshot_to = select snapshot_id from (select history_meta('email_tracking_experience_event_dataset') ) tab where is_current;
SET @v_log_id = select now();
CREATE TABLE tracking_email_id_incrementally
AS SELECT _id AS id FROM email_tracking_experience_event_dataset SNAPSHOT BETWEEN @v_snapshot_from AND @v_snapshot_to;
EXCEPTION
WHEN OTHER THEN
DROP TABLE IF EXISTS tracking_email_id_incrementally;
SELECT 'ERROR';
$$END;
Conditional statements in an anonymous block conditional-anonymous-block-statements
The IF-THEN-ELSE control structure enables the conditional execution of a list of statements when a condition is evaluated as TRUE. This control structure is only applicable within an anonymous block. If this structure is used as a standalone command, it results in a syntax error (“Invalid command outside Anonymous Block”).
The code snippet below demonstrates the correct format for an IF-THEN-ELSE conditional statement in an anonymous block.
IF booleanExpression THEN
List of statements;
ELSEIF booleanExpression THEN
List of statements;
ELSEIF booleanExpression THEN
List of statements;
ELSE
List of statements;
END IF
Example
The example below executes SELECT 200;.
$$BEGIN
SET @V = SELECT 2;
SELECT @V;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT 200;
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT 'DEFAULT';
END IF;
END$$;
This structure can be used with raise_error() to return a custom error message. The code block seen below terminates the anonymous block with “custom error message”.
Example
$$BEGIN
SET @V = SELECT 5;
SELECT @V;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT 200;
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT raise_error('custom error message');
END IF;
END$$;
Nested IF statements
Nested IF statements are supported within anonymous blocks.
Example
$$BEGIN
SET @V = SELECT 1;
IF @V = 1 THEN
SELECT 100;
IF @V > 0 THEN
SELECT 1000;
END IF;
END IF;
END$$;
Exception blocks
Exception blocks are supported within anonymous blocks.
Example
$$BEGIN
SET @V = SELECT 2;
IF @V = 1 THEN
SELECT 100;
ELSEIF @V = 2 THEN
SELECT raise_error(concat('custom-error for v= ', '@V' ));
ELSEIF @V = 3 THEN
SELECT 300;
ELSE
SELECT 'DEFAULT';
END IF;
EXCEPTION WHEN OTHER THEN
SELECT 'THERE WAS AN ERROR';
END$$;
Auto to JSON auto-to-json
Query Service supports an optional session-level setting to return top-level complex fields from interactive SELECT queries as JSON strings. The auto_to_json setting allows data from complex fields to be returned as JSON, which can then be parsed into JSON objects using standard libraries.

Set the feature flag auto_to_json to true before executing any SELECT query that contains complex fields.
set auto_to_json=true;
Before setting the auto_to_json flag
The following table provides an example query result before the auto_to_json
setting is applied. The same SELECT query (as seen below) that targets a table with complex fields was used in both scenarios.
SELECT * FROM TABLE_WITH_COMPLEX_FIELDS LIMIT 2;
The results are as follows:
_id | _experience | application | commerce | dataSource | device | endUserIDs | environment | identityMap | placeContext | receivedTimestamp | timestamp | userActivityRegion | web | _adcstageforpqs
-----------------------------------+----------------------------------------------------------------------------+--------------+----------------------------------------------+------------+--------------------------------------------------------------------+--------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------+-------------------------------------------------------------------------+-----------------------+-----------------------+--------------------+--------------------------------------------------------------------------------------+-----------------
31892EE15DE00000-401D52664FF48A52 | ("("("(1,1)","(1,1)")","(-209479095,4085488201,-2105158467,2189808829)")") | (background) | (NULL,"(USD,NULL)",NULL,NULL,NULL,NULL,NULL) | (475341) | (32,768,1024,205202,https://ns.adobe.com/xdm/external/deviceatlas) | ("("(31892EE080007B35-E6CE00000000000,"(AAID)",t)")") | ("(en-US,f,f,t,1.6,"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7",490,1125)",xo.net,64.3.235.13) | [AAID -> "{(31892EE080007B35-E6CE00000000000,t)}"] | ("("(34.01,-84.0)",lawrenceville,US,524,30043,ga)",600) | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | (UT1) | ("(f,Search Results,"(1.0)")","(http://www.google.com/search?ie=UTF-8&q=,internal)") |
31892EE15DE00000-401B92664FF48AE8 | ("("("(1,1)","(1,1)")","(-209479095,4085488201,-2105158467,2189808829)")") | (background) | (NULL,"(USD,NULL)",NULL,NULL,NULL,NULL,NULL) | (475341) | (32,768,1024,205202,https://ns.adobe.com/xdm/external/deviceatlas) | ("("(31892EE100007BF3-215FE00000000001,"(AAID)",t)")") | ("(en-US,f,f,t,1.5,"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7",768,556)",ntt.net,219.165.108.145) | [AAID -> "{(31892EE100007BF3-215FE00000000001,t)}"] | ("("(34.989999999999995,138.42)",shizuoka,JP,392005,420-0812,22)",-240) | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | (UT1) | ("(f,Home - JJEsquire,"(1.0)")","(NULL,typed_bookmarked)") |
(2 rows)
After setting the auto_to_json flag
The following table demonstrates the difference in results that the auto_to_json
setting has on the resulting dataset. The same SELECT query was used in both scenarios.
_id | receivedTimestamp | timestamp | _experience | application | commerce | dataSource | device | endUserIDs | environment | identityMap | placeContext | userActivityRegion | web | _adcstageforpqs
-----------------------------------+-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------
31892EE15DE00000-401D52664FF48A52 | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | {"analytics":{"customDimensions":{"eVars":{"eVar1":"1","eVar2":"1"},"props":{"prop1":"1","prop2":"1"}},"environment":{"browserID":-209479095,"browserIDStr":"4085488201","operatingSystemID":-2105158467,"operatingSystemIDStr":"2189808829"}}} | {"userPerspective":"background"} | {"order":{"currencyCode":"USD"}} | {"_id":"475341"} | {"colorDepth":32,"screenHeight":768,"screenWidth":1024,"typeID":"205202","typeIDService":"https://ns.adobe.com/xdm/external/deviceatlas"} | {"_experience":{"aaid":{"id":"31892EE080007B35-E6CE00000000000","namespace":{"code":"AAID"},"primary":true}}} | {"browserDetails":{"acceptLanguage":"en-US","cookiesEnabled":false,"javaEnabled":false,"javaScriptEnabled":true,"javaScriptVersion":"1.6","userAgent":"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7","viewportHeight":490,"viewportWidth":1125},"domain":"xo.net","ipV4":"64.3.235.13"} | {"AAID":[{"id":"31892EE080007B35-E6CE00000000000","primary":true}]} | {"geo":{"_schema":{"latitude":34.01,"longitude":-84.0},"city":"lawrenceville","countryCode":"US","dmaID":524,"postalCode":"30043","stateProvince":"ga"},"localTimezoneOffset":600} | {"dataCenterLocation":"UT1"} | {"webPageDetails":{"isHomePage":false,"name":"Search Results","pageViews":{"value":1.0}},"webReferrer":{"URL":"http://www.google.com/search?ie=UTF-8&q=","type":"internal"}} |
31892EE15DE00000-401B92664FF48AE8 | 2022-09-02 19:47:14.0 | 2022-09-02 19:47:14.0 | {"analytics":{"customDimensions":{"eVars":{"eVar1":"1","eVar2":"1"},"props":{"prop1":"1","prop2":"1"}},"environment":{"browserID":-209479095,"browserIDStr":"4085488201","operatingSystemID":-2105158467,"operatingSystemIDStr":"2189808829"}}} | {"userPerspective":"background"} | {"order":{"currencyCode":"USD"}} | {"_id":"475341"} | {"colorDepth":32,"screenHeight":768,"screenWidth":1024,"typeID":"205202","typeIDService":"https://ns.adobe.com/xdm/external/deviceatlas"} | {"_experience":{"aaid":{"id":"31892EE100007BF3-215FE00000000001","namespace":{"code":"AAID"},"primary":true}}} | {"browserDetails":{"acceptLanguage":"en-US","cookiesEnabled":false,"javaEnabled":false,"javaScriptEnabled":true,"javaScriptVersion":"1.5","userAgent":"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; ja-jp) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7","viewportHeight":768,"viewportWidth":556},"domain":"ntt.net","ipV4":"219.165.108.145"} | {"AAID":[{"id":"31892EE100007BF3-215FE00000000001","primary":true}]} | {"geo":{"_schema":{"latitude":34.989999999999995,"longitude":138.42},"city":"shizuoka","countryCode":"JP","dmaID":392005,"postalCode":"420-0812","stateProvince":"22"},"localTimezoneOffset":-240} | {"dataCenterLocation":"UT1"} | {"webPageDetails":{"isHomePage":false,"name":"Home - JJEsquire","pageViews":{"value":1.0}},"webReferrer":{"type":"typed_bookmarked"}} |
(2 rows)
Resolve fallback snapshot on failure resolve-fallback-snapshot-on-failure
The resolve_fallback_snapshot_on_failure
option is used to resolve the issue of an expired snapshot ID. Snapshot metadata expires after two days and an expired snapshot can invalidate the logic of a script. This can be a problem when using anonymous blocks.
Set the resolve_fallback_snapshot_on_failure
option to true to override a snapshot with a previous snapshot ID.
SET resolve_fallback_snapshot_on_failure=true;
The following line of code overrides the @from_snapshot_id
with the earliest available snapshot_id
from metadata.
$$ BEGIN
SET resolve_fallback_snapshot_on_failure=true;
SET @from_snapshot_id = SELECT coalesce(last_snapshot_id, 'HEAD') FROM checkpoint_log a JOIN
(SELECT MAX(process_timestamp)process_timestamp FROM checkpoint_log
WHERE process_name = 'DIM_TABLE_ABC' AND process_status = 'SUCCESSFUL' )b
on a.process_timestamp=b.process_timestamp;
SET @to_snapshot_id = SELECT snapshot_id FROM (SELECT history_meta('DIM_TABLE_ABC')) WHERE is_current = true;
SET @last_updated_timestamp= SELECT CURRENT_TIMESTAMP;
INSERT INTO DIM_TABLE_ABC_Incremental
SELECT * FROM DIM_TABLE_ABC SNAPSHOT BETWEEN @from_snapshot_id AND @to_snapshot_id WHERE NOT EXISTS (SELECT _id FROM DIM_TABLE_ABC_Incremental a WHERE _id=a._id);
INSERT INTO checkpoint_log
SELECT
'DIM_TABLE_ABC' process_name,
'SUCCESSFUL' process_status,
cast( @to_snapshot_id AS string) last_snapshot_id,
cast( @last_updated_timestamp AS TIMESTAMP) process_timestamp;
EXCEPTION
WHEN OTHER THEN
SELECT 'ERROR';
END
$$;
Data asset organization
It is important to logically organize your data assets within the Adobe Experience Platform data lake as they grow. Query Service extends SQL constructs that enable you to logically group data assets within a sandbox. This method of organization allows for the sharing of data assets between schemas without the need to move them physically.
The following SQL constructs using standard SQL syntax are supported for you to logically organize your data.
CREATE DATABASE dg1;
CREATE SCHEMA dg1.schema1;
CREATE table t1 ...;
CREATE view v1 ...;
ALTER TABLE t1 ADD PRIMARY KEY (c1) NOT ENFORCED;
ALTER TABLE t2 ADD FOREIGN KEY (c1) REFERENCES t1(c1) NOT ENFORCED;
See the logical organization of data assets guide for a more detailed explanation on Query Service best practices.
Table exists
The table_exists
SQL command is used to confirm whether a table currently exists in the system. The command returns a boolean value: true
if the table does exist, and false
if the table does not exist.
By validating whether a table exists before running the statements, the table_exists
feature simplifies the process of writing an anonymous block to cover both the CREATE
and INSERT INTO
use cases.
The following syntax defines the table_exists
command:
$$
BEGIN
#Set mytableexist to true if the table already exists.
SET @mytableexist = SELECT table_exists('target_table_name');
#Create the table if it does not already exist (this is a one time operation).
CREATE TABLE IF NOT EXISTS target_table_name AS
SELECT *
FROM profile_dim_date limit 10;
#Insert data only if the table already exists. Check if @mytableexist = 'true'
INSERT INTO target_table_name (
select *
from profile_dim_date
WHERE @mytableexist = 'true' limit 20
) ;
EXCEPTION
WHEN other THEN SELECT 'ERROR';
END $$;
Inline inline
The inline
function separates the elements of an array of structs and generates the values into a table. It can only be placed in the SELECT
list or a LATERAL VIEW
.
The inline
function cannot be placed in a select list where there are other generator functions.
By default, the columns produced are named “col1”, “col2”, and so on. If the expression is NULL, then no rows are produced. You can change the default column names with the RENAME command.

Example
> SELECT inline(array(struct(1, 'a'), struct(2, 'b'))), 'Spark SQL';
The example returns the following:
1 a Spark SQL
2 b Spark SQL
This second example further demonstrates the concept and application of the inline
function. The data model for the example is illustrated in the image below.
Example
select inline(productListItems) from source_dataset limit 10;
The values taken from the source_dataset
are used to populate the target table.
Spark SQL commands
The subsection below covers the Spark SQL commands supported by Query Service.
SET
The SET
command sets a property and either returns the value of an existing property or lists all the existing properties. If a value is provided for an existing property key, the old value is overridden.
SET property_key = property_value
property_key: The name of the property whose value is to be set or returned.
property_value: The value of the property to be set.
To return the value for any setting, use SET [property key] without a property_value.
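For example, using the auto_to_json session setting covered elsewhere in this document:

```sql
-- Set a property.
SET auto_to_json = true;

-- Return the current value of the property.
SET auto_to_json;
```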
PostgreSQL commands
The subsections below cover the PostgreSQL commands supported by Query Service.
ANALYZE TABLE analyze-table
The ANALYZE TABLE
command performs a distribution analysis and statistical calculations for the named table or tables. The use of ANALYZE TABLE
varies depending on whether the datasets are stored on the accelerated store or the data lake. See their respective sections for more information on its use.
COMPUTE STATISTICS on the accelerated store compute-statistics-accelerated-store
The ANALYZE TABLE
command computes statistics for a table on the accelerated store. The statistics are calculated on executed CTAS or ITAS queries for a given table on the accelerated store.
Example
ANALYZE TABLE <original_table_name>
The following is a list of statistical calculations that are available after using the ANALYZE TABLE command:
- field
- data-type
- count
- distinct-count
- missing
- max
- min
- mean
- stdev
COMPUTE STATISTICS on the data lake compute-statistics-data-lake
You can now calculate column-level statistics on Azure Data Lake Storage (ADLS) datasets with the COMPUTE STATISTICS
SQL command. Compute column statistics on either the entire dataset, a subset of a dataset, all columns, or a subset of columns.
COMPUTE STATISTICS
extends the ANALYZE TABLE
command. However, the COMPUTE STATISTICS
, FILTERCONTEXT
, and FOR COLUMNS
commands are not supported on accelerated store tables. These extensions for the ANALYZE TABLE
command are currently only supported for ADLS tables.
Example
ANALYZE TABLE tableName FILTERCONTEXT (timestamp >= to_timestamp('2023-04-01 00:00:00') and timestamp <= to_timestamp('2023-04-05 00:00:00')) COMPUTE STATISTICS FOR COLUMNS (commerce, id, timestamp);
The FILTERCONTEXT command calculates statistics on a subset of the dataset based on the filter condition provided. The FOR COLUMNS command targets specific columns for analysis.
The Statistics ID and the statistics generated are only valid for each session and cannot be accessed across different PSQL sessions.

Limitations:
- Statistics generation is not supported for array or map data types.
- Computed statistics are not persisted across sessions.

The skip_stats_for_complex_datatypes flag ensures that queries do not fail when statistics are computed on columns with complex data types. To be notified with an error for such columns instead, set the flag to false:

SET skip_stats_for_complex_datatypes = false
The console output appears as seen below.
| Statistics ID |
| ---------------------- |
| adc_geometric_stats_1 |
(1 row)
You can then query the computed statistics directly by referencing the Statistics ID. Use the Statistics ID or the alias name, as shown in the example statement below, to view the output in full. To learn more about this feature, see the alias name documentation.
-- This statement gets the statistics generated for `alias adc_geometric_stats_1`.
SELECT * FROM adc_geometric_stats_1;
Use the SHOW STATISTICS
command to display the metadata for all the temporary statistics generated in the session. This command can help you refine the scope of your statistical analysis.
SHOW STATISTICS;
An example output of SHOW STATISTICS is seen below.
statsId | tableName | columnSet | filterContext | timestamp
----------------------+---------------+-----------+-----------------------------+--------------------
adc_geometric_stats_1 | adc_geometric | (age) | | 25/06/2023 09:22:26
demo_table_stats_1 | demo_table | (*) | ((age > 25)) | 25/06/2023 12:50:26
age_stats | castedtitanic | (age) | ((age > 25) AND (age < 40)) | 25/06/2023 09:22:26
See the dataset statistics documentation for more information.
TABLESAMPLE tablesample
Adobe Experience Platform Query Service provides sample datasets as part of its approximate query processing capabilities.

Dataset samples are best used when you do not need an exact answer for an aggregate operation over a dataset. To conduct more efficient exploratory queries on large datasets, use the TABLESAMPLE feature to issue an approximate query that returns an approximate answer.
Sample datasets are created with uniform random samples from existing Azure Data Lake Storage (ADLS) datasets, using only a percentage of records from the original. The dataset sample feature extends the ANALYZE TABLE
command with the TABLESAMPLE
and SAMPLERATE
SQL commands.
In the example below, line one demonstrates how to compute a 5% sample of the table. Line two demonstrates how to compute a 5% sample from a filtered view of the data within the table.
Example
ANALYZE TABLE tableName TABLESAMPLE SAMPLERATE 5;
ANALYZE TABLE tableName FILTERCONTEXT (timestamp >= to_timestamp('2023-01-01')) TABLESAMPLE SAMPLERATE 5;
See the dataset samples documentation for more information.
BEGIN
The BEGIN command, or alternatively the BEGIN WORK or BEGIN TRANSACTION command, initiates a transaction block. Any statements entered after the BEGIN command are executed in a single transaction until an explicit COMMIT or ROLLBACK command is given. This command is the same as START TRANSACTION.
BEGIN
BEGIN WORK
BEGIN TRANSACTION
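A minimal sketch of a transaction block, reusing the INSERT INTO example from earlier in this document:

```sql
BEGIN;
INSERT INTO Customers SELECT SupplierName, City, Country FROM OnlineCustomers;
COMMIT;
```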
CLOSE
The CLOSE
command frees the resources associated with an open cursor. After the cursor is closed, no subsequent operations are allowed on it. A cursor should be closed when it is no longer needed.
CLOSE name
CLOSE ALL
If CLOSE name
is used, name
represents the name of an open cursor that must be closed. If CLOSE ALL
is used, all open cursors are closed.
DEALLOCATE
To deallocate a previously prepared SQL statement, use the DEALLOCATE
command. If you did not explicitly deallocate a prepared statement, it is deallocated when the session ends. More information about prepared statements can be found in the PREPARE command section.
DEALLOCATE name
DEALLOCATE ALL
If DEALLOCATE name
is used, name
represents the name of the prepared statement that must be deallocated. If DEALLOCATE ALL
is used, all the prepared statements are deallocated.
DECLARE
The DECLARE
command allows a user to create a cursor, which can be used to retrieve a small number of rows out of a larger query. After the cursor is created, rows are fetched from it using FETCH
.
DECLARE name CURSOR FOR query
name: The name of the cursor to be created.
query: A SELECT or VALUES command that provides the rows to be returned by the cursor.

EXECUTE
The EXECUTE
command is used to execute a previously prepared statement. Since prepared statements only exist during a session, the prepared statement must have been created by a PREPARE
statement executed earlier in the current session. More information about using prepared statements can be found in the PREPARE
command section.
If the PREPARE
statement that created the statement specified some parameters, a compatible set of parameters must be passed to the EXECUTE
statement. If these parameters are not passed in, an error is raised.
EXECUTE name [ ( parameter ) ]
name: The name of the prepared statement to execute.
parameter: The actual value of a parameter to the prepared statement. This must be an expression yielding a value that is compatible with the data type of the parameter, as determined when the prepared statement was created.
EXPLAIN
The EXPLAIN
command displays the execution plan for the supplied statement. The execution plan shows how the tables referenced by the statement will be scanned. If multiple tables are referenced, it shows what join algorithms are used to bring together the required rows from each input table.
EXPLAIN statement
To define the format of the response, use the FORMAT
keyword with the EXPLAIN
command.
EXPLAIN FORMAT { TEXT | JSON } statement
FORMAT: Use the FORMAT command to specify the output format. The available options are TEXT or JSON. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults to TEXT.
statement: Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, CREATE TABLE AS, or CREATE MATERIALIZED VIEW AS statement whose execution plan you want to see.

Note that any output that a SELECT statement might return is discarded when it is run with the EXPLAIN keyword. Other side effects of the statement happen as usual.

Example
The following example shows the plan for a simple query on a table with a single integer column and 10000 rows:
EXPLAIN SELECT * FROM foo;
QUERY PLAN
---------------------------------------------------------
Seq Scan on foo (dataSetId = "6307eb92f90c501e072f8457", dataSetName = "foo") [0,1000000242,6973776840203d3d,6e616c58206c6153,6c6c6f430a3d4d20,74696d674c746365]
(1 row)
FETCH
The FETCH command retrieves rows using a previously created cursor.
FETCH num_of_rows [ IN | FROM ] cursor_name
num_of_rows: The number of rows to fetch.
cursor_name: The name of the cursor to fetch rows from.
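DECLARE and FETCH are typically used together: the cursor captures the larger query, and FETCH pulls rows from it in batches. A minimal sketch, assuming a hypothetical Customers table (the cursor and table names are illustrative):

```sql
-- Create a cursor over a larger query, then retrieve rows in small batches.
DECLARE customer_cursor CURSOR FOR SELECT * FROM Customers;

-- Fetch the next 10 rows from the cursor.
FETCH 10 FROM customer_cursor;
```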
PREPARE prepare
The PREPARE command lets you create a prepared statement. A prepared statement is a server-side object that can be used to templatize similar SQL statements.
Prepared statements can take parameters, which are values that are substituted into the statement when it is executed. Parameters are referred to by position, using $1, $2, and so on.
Optionally, you can specify a list of parameter data types. If a parameter's data type isn't listed, the type can be inferred from the context.
PREPARE name [ ( data_type [, ...] ) ] AS SELECT
name: The name given to the prepared statement. It must be unique within the session.
data_type: The data type of a parameter to the prepared statement. To refer to the parameters in the prepared statement itself, use $1, $2, and so on.
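PREPARE and EXECUTE are typically used together. A minimal sketch, assuming a hypothetical Orders table with an order_date column:

```sql
-- Create a prepared statement with one positional parameter ($1).
PREPARE recent_orders (date) AS SELECT * FROM Orders WHERE order_date >= $1;

-- Execute the prepared statement, supplying a value for $1.
EXECUTE recent_orders ('2023-01-01');
```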
ROLLBACK
The ROLLBACK command undoes the current transaction and discards all the updates made by the transaction.
ROLLBACK
ROLLBACK WORK
SELECT INTO
The SELECT INTO command creates a new table and fills it with data computed by a query. The data is not returned to the client, as it is with a normal SELECT command. The new table's columns have the names and data types associated with the output columns of the SELECT command.
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
* | expression [ [ AS ] output_name ] [, ...]
INTO [ TEMPORARY | TEMP | UNLOGGED ] [ TABLE ] new_table
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
[ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
More information about the standard SELECT query parameters can be found in the SELECT query section. This section only lists parameters that are exclusive to the SELECT INTO command.
TEMPORARY or TEMP: If specified, the table is created as a temporary table.
UNLOGGED: If specified, the table is created as an unlogged table.
new_table: The name of the table to be created.
Example
The following query creates a new table films_recent consisting of only recent entries from the table films:
SELECT * INTO films_recent FROM films WHERE date_prod >= '2002-01-01';
SHOW
The SHOW command displays the current setting of runtime parameters. These variables can be set using the SET statement, by editing the postgresql.conf configuration file, through the PGOPTIONS environment variable (when using libpq or a libpq-based application), or through command-line flags when starting the Postgres server.
SHOW name
SHOW ALL
name: The name of the runtime parameter to display. Available parameters include the following:
SERVER_VERSION: This parameter shows the server's version number.
SERVER_ENCODING: This parameter shows the server-side character set encoding.
LC_COLLATE: This parameter shows the database's locale setting for collation (text ordering).
LC_CTYPE: This parameter shows the database's locale setting for character classification.
IS_SUPERUSER: This parameter shows if the current role has superuser privileges.
ALL: Show the values of all configuration parameters.
Example
The following query shows the current setting of the parameter DateStyle.
SHOW DateStyle;
DateStyle
-----------
ISO, MDY
(1 row)
COPY
The COPY command duplicates the output of any SELECT query to a specified location. The user must have access to this location for this command to succeed.
COPY query
TO '%scratch_space%/folder_location'
[ WITH FORMAT 'format_name']
query: The SELECT query whose results you want to copy.
format_name: An optional parameter specifying the format that the results are copied in. The format_name can be one of parquet, csv, or json. By default, the value is parquet.
The copied data is written to the following location:
adl://<ADLS_URI>/users/<USER_ID>/acp_foundation_queryService/folder_location/<QUERY_ID>
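For illustration, the following sketch copies the results of a query to a folder in the scratch space as CSV (the table, column, and folder names are hypothetical):

```sql
-- Duplicate the query output to the scratch space in CSV format.
COPY (SELECT customer_id, city FROM Customers)
TO '%scratch_space%/customer_export'
WITH FORMAT 'csv';
```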
ALTER TABLE alter-table
The ALTER TABLE command lets you add or drop primary or foreign key constraints and add columns to the table.
ADD or DROP CONSTRAINT
The following SQL queries show examples of adding constraints to or dropping constraints from a table. Primary key and foreign key constraints can be added to multiple columns with comma-separated values. You can create composite keys by passing two or more column name values, as seen in the examples below.
Define primary or composite keys
ALTER TABLE table_name ADD CONSTRAINT PRIMARY KEY ( column_name ) NAMESPACE namespace
ALTER TABLE table_name ADD CONSTRAINT PRIMARY KEY ( column_name1, column_name2 ) NAMESPACE namespace
Define a relationship between tables based on one or more keys
ALTER TABLE table_name ADD CONSTRAINT FOREIGN KEY ( column_name ) REFERENCES referenced_table_name ( primary_column_name )
ALTER TABLE table_name ADD CONSTRAINT FOREIGN KEY ( column_name1, column_name2 ) REFERENCES referenced_table_name ( primary_column_name1, primary_column_name2 )
Define an identity column
ALTER TABLE table_name ADD CONSTRAINT PRIMARY IDENTITY ( column_name ) NAMESPACE namespace
ALTER TABLE table_name ADD CONSTRAINT IDENTITY ( column_name ) NAMESPACE namespace
Drop a constraint/relationship/identity
ALTER TABLE table_name DROP CONSTRAINT PRIMARY KEY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT PRIMARY KEY ( column_name1, column_name2 )
ALTER TABLE table_name DROP CONSTRAINT FOREIGN KEY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT FOREIGN KEY ( column_name1, column_name2 )
ALTER TABLE table_name DROP CONSTRAINT PRIMARY IDENTITY ( column_name )
ALTER TABLE table_name DROP CONSTRAINT IDENTITY ( column_name )
table_name: The name of the table that you are editing.
column_name: The name of the column that the constraint is applied to.
referenced_table_name: The name of the table referenced by the foreign key.
primary_column_name: The name of the column referenced by the foreign key.
Add or drop primary and secondary identities
To add or delete constraints for both primary and secondary identity table columns, use the ALTER TABLE command.
The following examples add a primary identity and a secondary identity by adding constraints.
ALTER TABLE t1 ADD CONSTRAINT PRIMARY IDENTITY (id) NAMESPACE 'IDFA';
ALTER TABLE t1 ADD CONSTRAINT IDENTITY(id) NAMESPACE 'IDFA';
Identities can also be removed by dropping constraints, as seen in the example below.
ALTER TABLE t1 DROP CONSTRAINT PRIMARY IDENTITY (c1) ;
ALTER TABLE t1 DROP CONSTRAINT IDENTITY (c1) ;
For more detailed information, see the document on setting identities in an ad hoc dataset.
ADD COLUMN
The following SQL queries show examples of adding columns to a table.
ALTER TABLE table_name ADD COLUMN column_name data_type
ALTER TABLE table_name ADD COLUMN column_name_1 data_type1, column_name_2 data_type2
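As a concrete sketch of the syntax above (the table, column names, and data types are illustrative):

```sql
-- Add a single column to an existing table.
ALTER TABLE Customers ADD COLUMN loyalty_tier varchar(50);

-- Add two columns in a single statement.
ALTER TABLE Customers ADD COLUMN visit_count integer, last_visit date;
```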
Supported data types
The following table lists the accepted data types for adding columns to a table with Postgres SQL, XDM, and the Accelerated Database Recovery (ADR) in Azure SQL.
| Postgres SQL | XDM | ADR (Azure SQL) | Description |
| --- | --- | --- | --- |
| bigint | int8 | bigint | |
| integer | int4 | integer | |
| smallint | int2 | smallint | |
| tinyint | int1 | tinyint | |
| varchar(len) | string | varchar(len) | varchar is best used when the sizes of the column data entries vary considerably. |
| double | float8 | double precision | FLOAT8 and FLOAT are valid synonyms for DOUBLE PRECISION. double precision is a floating-point data type. Floating-point values are stored in 8 bytes. |
| double precision | float8 | double precision | FLOAT8 is a valid synonym for double precision. double precision is a floating-point data type. Floating-point values are stored in 8 bytes. |
| date | date | date | date data types are 4-byte stored calendar date values without any timestamp information. The range of valid dates is from 01-01-0001 to 12-31-9999. |
| datetime | datetime | datetime | datetime includes the qualifiers of year, month, day, hour, second, and fraction. A datetime declaration can include any subset of these time units joined in that sequence, or comprise only a single time unit. |
| char(len) | string | char(len) | The char(len) keyword is used to indicate that the item is a fixed-length character. |
ADD SCHEMA
The following SQL query shows an example of adding a table to a database / schema.
ALTER TABLE table_name ADD SCHEMA database_name.schema_name
REMOVE SCHEMA
The following SQL query shows an example of removing a table from a database / schema.
ALTER TABLE table_name REMOVE SCHEMA database_name.schema_name
Parameters
table_name: The name of the table that you are editing.
column_name: The name of the column that you want to add.
data_type: The data type of the column that you want to add, from the supported data types listed above.
SHOW PRIMARY KEYS
The SHOW PRIMARY KEYS command lists all the primary key constraints for the given database.
SHOW PRIMARY KEYS
tableName | columnName | datatype | namespace
------------------+----------------------+----------+-----------
table_name_1 | column_name1 | text | "ECID"
table_name_2 | column_name2 | text | "AAID"
SHOW FOREIGN KEYS
The SHOW FOREIGN KEYS command lists all the foreign key constraints for the given database.
SHOW FOREIGN KEYS
tableName | columnName | datatype | referencedTableName | referencedColumnName | namespace
------------------+---------------------+----------+---------------------+----------------------+-----------
table_name_1 | column_name1 | text | table_name_3 | column_name3 | "ECID"
table_name_2 | column_name2 | text | table_name_4 | column_name4 | "AAID"
SHOW DATAGROUPS
The SHOW DATAGROUPS command returns a table of all associated databases. For each database, the table includes schema, group type, child type, child name, and child ID.
SHOW DATAGROUPS
Database | Schema | GroupType | ChildType | ChildName | ChildId
-------------+-------------------+-----------+----------------------+----------------------------------------------------+--------------------------------------
adls_db | adls_scheema | ADLS | Data Lake Table | adls_table1 | 6149ff6e45cfa318a76ba6d3
adls_db | adls_scheema | ADLS | Accelerated Store | _table_demo1 | 22df56cf-0790-4034-bd54-d26d55ca6b21
adls_db | adls_scheema | ADLS | View | adls_view1 | c2e7ddac-d41c-40c5-a7dd-acd41c80c5e9
adls_db | adls_scheema | ADLS | View | adls_view4 | b280c564-df7e-405f-80c5-64df7ea05fc3
SHOW DATAGROUPS FOR table
The SHOW DATAGROUPS FOR 'table_name' command returns a table of all associated databases that contain the parameter as its child. For each database, the table includes schema, group type, child type, child name, and child ID.
SHOW DATAGROUPS FOR 'table_name'
Parameters
table_name: The name of the table that you want to find associated databases for.
Database | Schema | GroupType | ChildType | ChildName | ChildId
-------------+-------------------+-----------+----------------------+----------------------------------------------------+--------------------------------------
dwh_db_demo | schema2 | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce
dwh_db_demo | schema1 | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce
qsaccel | profile_aggs | QSACCEL | Accelerated Store | _table_demo2 | d270f704-0a65-4f0f-b3e6-cb535eb0c8ce