The Azure IoT Hub is your cloud gateway for ingesting telemetry.
The IoT Hub does not persist incoming messages for the long term, so these must be forwarded to other Azure services.
Traditionally, the messages are exposed over an Event Hub-compatible endpoint.
More recently, IoT Hub message routing was added, where specific Azure services can be connected as endpoints without any custom code:
At this moment we can define:
- The built-in endpoint (to keep the original way of distributing messages)
- Event Hub
- Service Bus (Topics/Queues)
- Storage account, blob storage (perfect for cheap cold storage)
Lately, a native endpoint for CosmosDB has been made available.
This takes away the pain of having to set up extra resources between the IoT Hub and CosmosDB just to transport messages from one resource to another; until now, this was mostly done using a Stream Analytics job or custom Azure Functions.
In this post, let’s check out the new endpoint.
When using IoT Hub routing, it all starts with adding a specific endpoint first and then consuming that endpoint for a specific route.
An example is a built-in endpoint named ‘events’ as a replacement for the Event Hub compatible endpoint:
We can route incoming telemetry (or IoT Hub-specific events) to this endpoint and add a condition if needed:
Note: you need to add this specific route explicitly if you are adding other routes to other endpoints. The standard behavior is that, once custom routes are introduced, the Event Hub-compatible endpoint only acts as a fallback route.
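For illustration, route conditions are written in the IoT Hub routing query language. A hypothetical condition that only routes messages carrying a raised alert (the ‘waterLevelAlert’ application property and device name are assumptions here, not part of any default setup) could look like:

```
waterLevelAlert = 'true' AND $connectionDeviceId = 'coffeemaker1'
```

Application properties are referenced by name; system properties such as the device ID are prefixed with ‘$’.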
Let’s see what is needed for the CosmosDB endpoint:
Next to a unique endpoint name, you need to reference an existing CosmosDB collection in a database in an account.
You also need to specify the partition key.
To limit CosmosDB costs, it’s a best practice to query data from the context of a partition key. Here, it’s the combination of both the deviceId and the year/month of telemetry creation (you could even add the Day part to the key).
This way, we try to fill CosmosDB (logical partitions) in the smartest way so querying is still affordable. Check this documentation for more details.
Note: Want to know more about partitioning? Check out this video.
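To make the deviceId + year/month composition above concrete, here is a minimal sketch in plain JavaScript (the function name is my own; the actual partition key value is generated by the IoT Hub endpoint configuration, not by your code):

```javascript
// Compose a synthetic partition key value: '<deviceId>-<year>-<month>'.
// This mirrors the deviceId + year/month combination described above.
function buildPartitionKey(deviceId, date) {
  const year = date.getUTCFullYear();
  // Months are zero-based in JavaScript; pad to two digits.
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  return `${deviceId}-${year}-${month}`;
}

// All telemetry of one device within one month lands in the same
// logical partition, keeping queries on that partition affordable.
console.log(buildPartitionKey('coffeemaker1', new Date(Date.UTC(2023, 1, 7))));
// → coffeemaker1-2023-02
```

Adding the day part would simply mean appending one more padded segment to the same template.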
So, before we can add the endpoint, we need to add a CosmosDB account.
Creating a Free CosmosDB account
Note: If you already have an account, skip this part.
Note: This is not a full CosmosDB tutorial. If you are interested in CosmosDB details, check out this learning path on MS Learn.
Within the same resource group, I add this new CosmosDB account:
Cosmos DB exposes several APIs around the same data storage, so we need to pick one:
Just for convenience, I used the NoSQL API. You are free to pick another one.
Note: Because IoT telemetry is typically identified as time-series data, you should try to handle it that way.
From there, a wizard is started:
After filling in an account name, I’m asked if I’m interested in the free tier. Let’s try to make this as affordable as possible. Yes, I am!
Although one of the strongest points of CosmosDB is the worldwide availability of data, I’m not interested at this moment:
I’m also not interested in redundancy or backups:
Let’s review the settings and start the creation of the account:
After a few minutes, the account is created, and we are allowed to create a container (inside a database):
Now, we add the database and container at once:
The database name and container name are straightforward (in lowercase).
By default, the partition key is set to ‘/id’…
Be sure to change this in the separate field. I use ‘/partitionKey’ here.
Why? Because ‘/id’ is the standard unique document ID, which is different for every new document. Our partition key, by contrast, stays the same for all telemetry of one device within one month!
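To illustrate the difference, a stored telemetry document could then carry both fields (all values below are made up for illustration):

```
{
  "id": "c0ffee00-1111-2222-3333-444444444444",
  "partitionKey": "coffeemaker1-2023-02",
  "deviceId": "coffeemaker1",
  "waterLevel": 42
}
```

The ‘id’ is unique per document; the ‘partitionKey’ groups all documents of one device for one month.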
Press OK and see, in the Data Explorer, the container is created:
We are now ready to create and attach the CosmosDB endpoint.
Creating the CosmosDB endpoint and route
Go back to the IoT Hub and start creating the endpoint:
The dialog is smart enough to make all fields selectable. It even knows the name of the partition key.
Note: Because I use the Free tier of the IoT Hub, I’m only allowed to add one endpoint.
Once created, we consume the endpoint in a route. We keep the default data source and query:
At this moment, we have set up both an IoT Hub and CosmosDB and connected those two resources.
Let’s see how telemetry arrives inside the container.
Ingesting device telemetry
I sent telemetry from a Coffee maker simulation:
You can see the JSON message bodies being sent.
Once we check out the container, we see the message arrive:
But if we look closer, we see the body is showing gibberish!
This is actually the standard Base64-encoded body of incoming messages.
We can check this by decoding it:
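Any Base64 tool will do; in JavaScript (the language of the device code later on) a quick round-trip check looks like this (the payload itself is made up for illustration):

```javascript
// The routed document body is a Base64-encoded byte array.
// Simulate such a stored value here.
const storedBody = Buffer.from('{"waterLevel":42}', 'utf-8').toString('base64');

// Decoding the stored value reveals the original JSON payload again.
const decoded = Buffer.from(storedBody, 'base64').toString('utf-8');
console.log(decoded);                    // → {"waterLevel":42}
console.log(JSON.parse(decoded).waterLevel); // → 42
```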
Why is this?
Devices are free to send whatever message they want to expose: JSON, XML, encrypted data, compressed data, pictures, etc.
To support this, the Message body is just a byte array. Byte arrays are stored in base64 format to prevent corruption.
Can we fix this? Can we see the original JSON message?
This is a simple fix on the side of the sender, the device.
First, let’s start over with an empty collection.
We remove all ‘corrupted’ rows.
The Data Explorer has a nice trick for this: we just shorten the retention (time to live) to a few seconds using the settings page of the container:
Note: this is a neat trick but please don’t try this in production!
Do not forget to set this setting back to the original selection:
Now, we fix the code of the device:
```javascript
message.contentType = 'application/json';
message.contentEncoding = 'utf-8';
```
Note: This functionality is both offered by the Azure IoT Device SDKs and the plain non-SDK MQTT support.
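As an illustration of what that fix amounts to, the helper below is my own sketch, not SDK code: with the Azure IoT Device SDK for Node.js you would set these two fields on an `azure-iot-device` `Message` before calling `client.sendEvent(...)`.

```javascript
// Mark the payload as UTF-8 JSON so IoT Hub and downstream endpoints
// can interpret the body instead of storing opaque Base64 bytes.
function prepareTelemetryMessage(payload) {
  return {
    body: JSON.stringify(payload),
    contentType: 'application/json', // tells IoT Hub the body is JSON
    contentEncoding: 'utf-8',        // tells IoT Hub how the bytes are encoded
  };
}

const message = prepareTelemetryMessage({ waterLevel: 42, waterLevelAlert: false });
console.log(message.contentType, message.contentEncoding);
// → application/json utf-8
```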
Now, execute the device code again:
If we check the incoming IoT Hub telemetry (e.g. using the IoT Explorer or Visual Studio Code), we see the encoding is part of the communication:
Now we check the Data Explorer again:
Yes, the data is arriving again, including the human-readable message body.
Notice both the system properties and application properties (e.g. waterLevelAlert) are available too.
Although the IoT Hub integration for CosmosDB is in preview, the functionality is promising.
This direct endpoint takes away the need for custom plumbing (eg. an Azure Function) just to pass on the telemetry to the CosmosDB database.
If you are in need of custom mapping or enrichment with additional business logic, you still have to rely on extra logic such as Azure Functions, Azure Stream Analytics, or Azure Logic Apps.
Once messages arrive in CosmosDB, you can react to the events using the CosmosDB change feed.
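As a sketch of that pattern (the function name and document shape are my own; in practice an Azure Functions Cosmos DB trigger or the change feed processor would invoke your handler with each batch of changed documents):

```javascript
// Change feed handler sketch: receives the documents inserted or updated
// in the container since the last checkpoint, and reacts to alerts.
function onTelemetryChanged(documents) {
  const alerts = [];
  for (const doc of documents) {
    // React only to telemetry that raised the water level alert
    // (the application property shown earlier in this post).
    if (doc.waterLevelAlert === true || doc.waterLevelAlert === 'true') {
      alerts.push(doc.deviceId);
    }
  }
  return alerts;
}

console.log(onTelemetryChanged([
  { deviceId: 'coffeemaker1', waterLevelAlert: 'true' },
  { deviceId: 'coffeemaker2', waterLevelAlert: 'false' },
]));
// → [ 'coffeemaker1' ]
```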