Add more flexibility to StreamAnalytics with reference data

I started this series about IoT Hub communication to show how easy it is to get a long way. But we reached a point where simple telemetry communication is not the issue anymore, we want to make it stable, production ready and configurable.

In the previous blogs we used some fixed values in StreamAnalytics, just to generate some extra values (baselumen). Now we want to make this configurable.

This is part 6 of a series of blogs about IoT Hub communication using UWP:

  • Part 1: A very simple complex IoT Hub example
  • Part 2: Using the Visual Studio 2015 IoT Hub Extension and IoT Hub Device explorer
  • Part 3: Adding the power of BI to your IoT Hub
  • Part 4: Closing the Windows IoT Core feedback loop using Azure Functions
  • Part 5: PowerBI showing telemetry of past ten minutes, always!
  • Part 6: Add more flexibility to StreamAnalytics

StreamAnalytics supports reference data. And the solution is pretty simple: a blob with reference data (json or csv) must be provided in some Azure Storage and this is defined as an input.

This reference data can be combined with your telemetry using a T-SQL like ‘join’.

And to make it more interesting, it’s even possible to provide multiple blobs, each relative to a certain timestamp. So in time, StreamAnalytics can switch to a newer version of the same reference data.

This means we do not have to shut down our StreamAnalytics, just to alter reference data.

So let’s create a StreamAnalytics and add some reference data.

Create SteamAnalytics and simulate input

By now you must be familiar with how to create a StreamAnalytics. Create an EventHub and add it as output named ehoutput. And create an IoTHub as input named hubinput.

The query we want to use in out StreamAnalytics is this:

CAST(time as datetime) as time,
CAST(temperature as float) as temperature,
400 as baselumen,
CAST(lumen as float) as lumen

First, we want to test the query using som test data.

So here is some test telemetry data save it as telemetry.json:


Just below the query, there is a test option. Just ‘feed’ the query this file and see that the query executes:


Great, the query is working, We see a baselumen of 400 for each device.

Let’s look at using reference data:

Adding a reference to reference data

When you add an input, you are asked to choose between a data stream or reference data. Until now we have always chosen the first option, now we choose the latter:


Remember, we are just telling StreamAnalytics how reference data can be discovered. So we need some Azure storage account, a path pattern which describes where the data is stored:


Here we tell StreamAnalytics the data will be available at ref/YYYY/MM/DD/HH/ref.json:


And the format of the reference data will be passed as json:


So now the administration is done. Let’s add actual data:

Adding reference data in storage

For this demonstration, we will use a simple example. We tell StreamAnalytics which device uses which baselumen. Note that the baselumen for this device is 600:


Note: in this demonstration, I only use one device so that one is added. Please add more devices on separate lines if needed.

Save this reference data as ref.json. We now want to upload this file into the blob storage. Unfortunately, the Azure portal has no abilities to actually add or delete blobs in a storage container. You can see the container but not touch it:


There are several free and free blob storage explorers. But in this example, I use Visual Studio. Within the Service Explorer view the container can be opened and an Upload Blob button is shown.

So select the ref.json. StreamAnalytics must be able to access it like All telemetry arriving after may 31 2016, will use this reference data. So we have to store it in a path like ‘ref/2016/06/01/00’:


So no we have a file with reference data in blob storage:


Just for the fun of it, let’s check it out in the Azure portal:


Not that’s a lot of blades!

Altering the query to reference the data

Ok, we have reference data so let’s reference it in a new version of the query:

CAST(H.time as datetime) as time,
CAST(H.temperature as float) as temperature,
R.baselumen as baselumen,
CAST(H.lumen as float) as lumen
hubinput H
JOIN refinput R
ON R.devicename = H.devicename

We join the incoming telemetry with the reference data based on the name of the device the telemetry is coming from. The baselumen column is replaced by a reference data column reference.

Testing the Stream Analytics query

Before we submit real telemetry, let’s test the query by hand. Because a second input is defined, we have to provide two input files:


Now check the test output. The telemetry of device ‘DeviceOne’ is enriched with a baselumen of 600.


Let’s go for the real deal. Let’s pass real telemetry.

Passing real telemetry through StreamAnalytics from device

Just like in previous parts of this series, we use an IoTHub for telemetry input. I use for this demonstration an Azure Function as output (See Part 4 for more details). Why? Azure Function provides logging by default. So after

So after sending some telemetry from ‘DeviceOne’ we see the telemetry including the expected baselumen with value 600!


So it’s quite simple to provide reference data.

Using updated reference data, upload new file

But what if a few days later, or even a few hours later, the reference data changes?

Let’s add a second file:


The update has an altered baselumen, now 700.

Save it, with the same filename but a slightly changed path, so it will become active twenty hours later:


So now we have two files active:


And again, after passing some telemetry we get updated telemetry with baselumen 700:


So we have seen how reference data can be added and used in StreamAnalytics. And we have proven that updated reference data will be used once the start date  of the data is reached.