Test KQL table mappings inline

Azure Data Explorer makes it possible to ingest data from external sources in many ways.

This can be done using eg. the database data connections, using programming code via the SDKs, or using the Azure Stream Analytics output for Azure Data Explorer.

Once a table is created and the incoming file/message format (CSV, JSON, etc.) and the data structure in the file or message stream are identified, a recurring theme is designing the table mapping for that format.

In this post, a typical mapping for a JSON message is shown.

Testing a mapping can be cumbersome if you rely on the default ingest methods, which involve resources outside Azure Data Explorer.

Can this be done in an easier way?

Yes, it can.

But first, let’s set up a test environment.

I assume you have an ADX cluster set up with a database in it.

You have access to the Query editor.

We create a table first:

.create table arraytest
(
    d: string,
    m: int,
    t: double,
    h: double,
    ts: datetime
)
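
If you want to verify the result, the getschema operator lists the columns and their types (a quick sanity check, not strictly required):

arraytest
| getschema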

And we want to ingest a JSON message like this:

{
  "d": "dev1", 
  "tel": [ 
    {
      "t": 42.1, 
      "h": 56.1
    } 
  ], 
  "m": 21,
  "ts": "2023-05-17T18:23" 
}

So, we need to read three fields from the top level and two values from the first (and only) element in that array!

Reading a value from an array seems straightforward, but the documentation is not immediately clear about it.

The first documentation you find regarding JSON mapping is more about demonstrating mv-expand, where multiple array elements are ‘pivoted’ into multiple rows.

But what I needed was a description like this one. There, the documentation talks about separate elements in a dynamic structure, treated as an array or dictionary.
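
Before writing the mapping, you can sanity-check the path logic in the query editor with a dynamic literal (a minimal sketch; the tostring/todouble conversions are my own choice):

// parse a sample message and read the same paths the mapping will use
print msg = dynamic({"d": "dev1", "tel": [{"t": 42.1, "h": 56.1}], "m": 21, "ts": "2023-05-17T18:23"})
| project d = tostring(msg.d), t = todouble(msg.tel[0].t), h = todouble(msg.tel[0].h)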

So we need a mapping like this:

.create table arraytest ingestion json mapping "JsonMapping"
'['
'    { "column" : "d", "datatype" : "string", "Properties":{"Path":"$.d"}},'
'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel[0].t"}},'
'    { "column" : "h", "datatype" : "double", "Properties":{"Path":"$.tel[0].h"}},'
'    { "column" : "m", "datatype" : "int", "Properties":{"Path":"$.m"}},'
'    { "column" : "ts", "datatype" : "datetime", "Properties":{"Path":"$.ts"}}'
']'

This is a mapping for messages represented in the JSON format. It’s named ‘JsonMapping’.

The most important parts are the paths.

A top-level value is read using the ‘$.somepropertyname’ path.

A value from an array element is read using the ‘$.somepropertyname[index].somepropertyname’ path, where index 0 addresses the first element.
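
To check that the mapping was stored as intended, you can list the JSON mappings defined on the table:

.show table arraytest ingestion json mappings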

We are almost ready to test.

Normally, Azure Data Explorer ingests in an asynchronous way, using batches.

While developing, I want an immediate response; waiting on the batching mechanism is not productive.

For convenience, I therefore enable streaming ingest for this table:

.alter table arraytest policy streamingingestion enable

Note: this has an impact on cluster resource usage. If this ‘direct response’ is not needed in production, please skip it.

Note 2: This only works if streaming ingest is enabled on the cluster level.
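
You can verify the table policy afterwards:

.show table arraytest policy streamingingestion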

Finally, we can test the mapping.

Instead of testing this using some external tool, we test it inline, in the Azure Data Explorer query editor:

// successful ingest

.ingest inline into table arraytest with (format = "json", ingestionMappingReference = "JsonMapping") <|
{"d": "dev1", "tel": [ {"t": 42.1, "h": 56.1} ], "m": 21, "ts": "2023-05-17T18:21" }
{"d": "dev2", "tel": [ {"t": 42.2, "h": 56.2} ], "m": 22, "ts": "2023-05-17T18:22" }
{"d": "dev3", "tel": [ {"t": 42.3, "h": 56.3} ], "m": 23, "ts": "2023-05-17T18:23" }

Here, I ingest three lines in one go. Notice that each line contains the ‘tel’ array with a single element, exactly as the mapping expects.

Compare this with the mapping as seen above.

We actually look for the first array element.

Note: According to the documentation, you can even find elements starting from the end of the array (retrieving the ‘index’-th value from the end of the array).

Note: if you have a JSON dictionary, you could use the element name instead of the index.
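
For example, with a hypothetical message shape like {"tel": {"living": {"t": 42.1}}} (my own example), the mapping line would name the element instead of using an index:

'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel.living.t"}},'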

Once executed, let’s check the table content:

arraytest

The outcome is perfect: three rows are added, with the values from the first array element mapped into their own columns.

You can count the number of records being added. You can also compare the rows by hand.
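
For example, a quick count should show three records:

arraytest
| count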

It’s recommended to also check in another way whether the ingestion succeeded:

.show ingestion failures

This gives a historical overview of failed ingestion attempts.

This information is available for up to fourteen days.
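
Because the output of this .show command behaves like a regular query result, it can be filtered, which is convenient on a busy cluster:

.show ingestion failures
| where Table == "arraytest"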

Warning: the formatting of the separate lines to add can influence the outcome. If the lines end with a comma separator instead of a newline, not all lines are ingested! Worse, this is not registered as an ingestion failure.

DropMappedFields transformation

Mapping transformations also support collecting the fields that would otherwise be dropped:


A DropMappedFields transformation maps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings.

This helps to check if any fields are not mapped.

To test this, I recreated the same table with two extra columns of the dynamic type:

.create table arraytest
(
    d: string,
    m: int,
    t: double,
    h: double,
    ts: datetime,
    dmroot: dynamic,
    dmtel: dynamic
)

I updated the mapping:

.create table arraytest ingestion json mapping "JsonMapping"
'['
'    { "column" : "d", "datatype" : "string", "Properties":{"Path":"$.d"}},'
'    { "column" : "t", "datatype" : "double", "Properties":{"Path":"$.tel[0].t"}},'
'    { "column" : "h", "datatype" : "double", "Properties":{"Path":"$.tel[0].h"}},'
'    { "column" : "m", "datatype" : "int", "Properties":{"Path":"$.m"}},'
'    { "column" : "ts", "datatype" : "datetime", "Properties":{"Path":"$.ts"}},'
'    { "Column": "dmroot", "Properties": { "Path": "$", "Transform":"DropMappedFields" }},'
'    { "Column": "dmtel", "Properties": { "Path": "$.tel", "Transform":"DropMappedFields" }}'
']'

The ‘dmroot’ column is filled with left-overs from the root ‘$’.

The ‘dmtel’ column is filled with left-overs from the tel element ‘$.tel’.

Here is the test I run:

.ingest inline into table arraytest with (format = "json", ingestionMappingReference = "JsonMapping") <|
{"d": "dev1", "tel": [ {"t": 42.1, "h": 56.1, "x":41}], "m": 21, "ts": "2023-05-17T18:21", "y":441 }
{"d": "dev2", "tel": [ {"t": 42.2, "h": 56.2, "x":42}], "m": 22, "ts": "2023-05-17T18:22", "y":442 }
{"d": "dev3", "tel": [ {"t": 42.3, "h": 56.3, "x":43}, {"t": 1, "h": 2, "x":3}], "m": 23, "ts": "2023-05-17T18:23" }

I added extra values both on the root level (the ‘y’ value) and on the first element of the array (the ‘x’ value).

The third row gets a second array element; there, the ‘y’ value is omitted.

If I run this, this is the outcome:

At the root level, in ‘dmroot’, only the ‘y’ value is seen; unmapped values inside the array are not picked up here.

In ‘dmtel’, we see both the unmapped ‘x’ on the first array element and the complete second array element (on the third row).
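
You can inspect this yourself by projecting the dynamic columns:

arraytest
| project d, dmroot, dmtel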

This demonstrates nicely how fields that fall outside the mapping can still be captured.

Conclusion

We have seen how to test an Azure Data Explorer table mapping a bit more conveniently by ingesting messages inline, in the query editor.

A similar approach could work for table policies too.

Alongside this, we have seen how table ingestion mapping can work on arrays and dictionaries.

This little nugget makes working with your ADX a bit more convenient.

This post is the tenth part of an Azure Data Explorer blog series.
