Test KQL table mappings inline

Azure Data Explorer makes it possible to ingest data from external sources in many ways.

This can be done using eg. the database data connections, using programming code via the SDKs, or using the Azure Stream Analytics output for Azure Data Explorer.

Once a table is created and the incoming file/message format (CSV, JSON, etc.) and the data structure in the file or message stream are identified, a recurring theme is designing the table mapping for that format.

In this post, a typical mapping for a JSON message is shown.

Testing a mapping can be cumbersome if you rely on the default ingest methods, which involve resources outside Azure Data Explorer.

Can this be done in an easier way?

Yes, it can.

But first, let’s set up a test environment.

I assume you have an ADX cluster set up with a database in it.

You have access to the Query editor.

We create a table first:

.create table arraytest
(
    d: string,
    m: int,
    t: double,
    h: double,
    ts: datetime
)
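
If you want to verify the result, the getschema operator lists the columns and their types (a quick sanity check, not strictly required):

arraytest
| getschema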

And we want to ingest a JSON message like this:

{
  "d": "dev1", 
  "tel": [ 
    {
      "t": 42.1, 
      "h": 56.1
    } 
  ], 
  "m": 21,
  "ts": "2023-05-17T18:23" 
}

So, we need to read three fields from the top level and two values from the first (and only) element in that array!

Reading a value from an array seems straightforward, but the documentation is not immediately clear about it.

The first documentation you find regarding JSON mapping is more about demonstrating mv-expand, where multiple array elements are ‘pivoted’ into multiple rows.

But what I needed was a description like this one. There, the documentation talks about separate elements in a dynamic structure, treated as an array or dictionary.
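
Before writing the mapping, you can sanity-check the path logic in the query editor with a dynamic literal (a minimal sketch; the tostring/todouble conversions are my own choice):

// parse a sample message and read the same paths the mapping will use
print msg = dynamic({"d": "dev1", "tel": [{"t": 42.1, "h": 56.1}], "m": 21, "ts": "2023-05-17T18:23"})
| project d = tostring(msg.d), t = todouble(msg.tel[0].t), h = todouble(msg.tel[0].h)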

So we need a mapping like this:

.create table arraytest ingestion json mapping "JsonMapping"
'['
'    { "column" : "d", "datatype" : "string", "Properties":{"Path":"$.d"}},'
'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel[0].t"}},'
'    { "column" : "h", "datatype" : "double", "Properties":{"Path":"$.tel[0].h"}},'
'    { "column" : "m", "datatype" : "int", "Properties":{"Path":"$.m"}},'
'    { "column" : "ts", "datatype" : "datetime", "Properties":{"Path":"$.ts"}}'
']'

This is a mapping for messages represented in the JSON format. It’s named ‘JsonMapping’.

The most important parts are the paths.

A top-level value is read using the ‘$.somepropertyname’ path.

A value from an array element is read using the ‘$.somepropertyname[index].somepropertyname’ path, where index 0 addresses the first element.
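
To check that the mapping was stored as intended, you can list the JSON mappings defined on the table:

.show table arraytest ingestion json mappings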

We are almost ready to test.

Normally, Azure Data Explorer ingests in an asynchronous way, using batches.

While developing, I want an immediate response; waiting on the batching mechanism is not productive.

For convenience, I therefore enable streaming ingest for this table:

.alter table arraytest policy streamingingestion enable

Note: this has an impact on cluster resource usage. If this ‘direct response’ is not needed in production, please skip it.

Note 2: This only works if streaming ingest is enabled on the cluster level.
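
You can verify the table policy afterwards:

.show table arraytest policy streamingingestion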

Finally, we can test the mapping.

Instead of testing this using some external tool, we test it inline, in the Azure Data Explorer query editor:

// successful ingest

.ingest inline into table arraytest with (format = "json", ingestionMappingReference = "JsonMapping") <|
{"d": "dev1", "tel": [ {"t": 42.1, "h": 56.1} ], "m": 21, "ts": "2023-05-17T18:21" }
{"d": "dev2", "tel": [ {"t": 42.2, "h": 56.2} ], "m": 22, "ts": "2023-05-17T18:22" }
{"d": "dev3", "tel": [ {"t": 42.3, "h": 56.3} ], "m": 23, "ts": "2023-05-17T18:23" }

Here, I ingest three lines in one go. Notice that each line contains the ‘tel’ array with a single element, exactly as the mapping expects.

Compare this with the mapping as seen above.

We actually look for the first array element.

Note: According to the documentation, you can even find elements starting from the end of the array (retrieving the ‘index’-th value from the end of the array).

Note: if you have a JSON dictionary, you could use the element name instead of the index.
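
For example, with a hypothetical message shape like {"tel": {"living": {"t": 42.1}}} (my own example), the mapping line would name the element instead of using an index:

'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel.living.t"}},'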

Once executed, let’s check the table content:

arraytest

The outcome is perfect: three rows are added, with the values from the first array element mapped into their own columns.

You can count the number of records being added. You can also compare the rows by hand.
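
For example, a quick count should show three records:

arraytest
| count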

It’s recommended to also check in another way whether the ingestion succeeded:

.show ingestion failures

This gives a historical overview of failed ingestion attempts.

This information is available for up to fourteen days.
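
Because the output of this .show command behaves like a regular query result, it can be filtered, which is convenient on a busy cluster:

.show ingestion failures
| where Table == "arraytest"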

Warning: the formatting of the separate lines to add can influence the outcome. If the lines end with a comma separator instead of a newline, not all lines are ingested! Worse, this is not registered as an ingestion failure.

DropMappedFields transformation

Mapping transformations also support collecting the fields that would otherwise be dropped:


A DropMappedFields transformation maps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings.

This helps to check if any fields are not mapped.

To test this, I recreated the same table with two extra columns of the dynamic type:

.create table arraytest
(
    d: string,
    m: int,
    t: double,
    h: double,
    ts: datetime,
    dmroot: dynamic,
    dmtel: dynamic
)

I updated the mapping:

.create table arraytest ingestion json mapping "JsonMapping"
'['
'    { "column" : "d", "datatype" : "string", "Properties":{"Path":"$.d"}},'
'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel[0].t"}},'
'    { "column" : "h", "datatype" : "double", "Properties":{"Path":"$.tel[0].h"}},'
'    { "column" : "m", "datatype" : "int", "Properties":{"Path":"$.m"}},'
'    { "column" : "ts", "datatype" : "datetime", "Properties":{"Path":"$.ts"}},'
'    { "Column": "dmroot", "Properties": { "Path": "$", "Transform":"DropMappedFields" }},'
'    { "Column": "dmtel", "Properties": { "Path": "$.tel", "Transform":"DropMappedFields" }}'
']'

The ‘dmroot’ column is filled with left-overs from the root ‘$’.

The ‘dmtel’ column is filled with left-overs from the tel element ‘$.tel’.

Here is the test I run:

.ingest inline into table arraytest with (format = "json", ingestionMappingReference = "JsonMapping") <|
{"d": "dev1", "tel": [ {"t": 42.1, "h": 56.1, "x":41}], "m": 21, "ts": "2023-05-17T18:21", "y":441 }
{"d": "dev2", "tel": [ {"t": 42.2, "h": 56.2, "x":42}], "m": 22, "ts": "2023-05-17T18:22", "y":442 }
{"d": "dev3", "tel": [ {"t": 42.3, "h": 56.3, "x":43}, {"t": 1, "h": 2, "x":3}], "m": 23, "ts": "2023-05-17T18:23" }

I added extra values both on the root level (the ‘y’ value) and on the first element of the array (the ‘x’ value).

The third row gets a second array element; there, the ‘y’ value is omitted.

If I run this, this is the outcome:

At the root level, in ‘dmroot’, only the ‘y’ value is seen; unmapped values inside the array are not picked up here.

In ‘dmtel’, we see both the unmapped ‘x’ on the first array element and the complete second array element (on the third row).
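
You can inspect this yourself by projecting the dynamic columns:

arraytest
| project d, dmroot, dmtel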

This demonstrates nicely how fields that fall outside the mapping can still be captured.

Conclusion

We have seen how to test an Azure Data Explorer table mapping a bit more conveniently by ingesting messages inline, in the query editor.

A similar approach could work for table policies too.

Alongside this, we have seen how table ingestion mapping can work on arrays and dictionaries.

This little nugget makes working with your ADX a bit more convenient.

This post is the tenth part of an Azure Data Explorer blog series.
