Azure IoT Edge Blob module posts BlockBlobs blocks dosed in Storage

Last year, I wrote a blog about the Azure Blob Storage on IoT Edge module. Back then it was still in preview, but now it is generally available.

The module still provides the same functionality: you can read and write to blob storage with the same SDK and programming model you use for handling blobs in Azure cloud storage.
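For example, connecting from a .NET module looks exactly the same as connecting to cloud storage; only the connection string points at the local module. A minimal sketch (the module name, account name, and key are placeholders from your own deployment):

using Microsoft.WindowsAzure.Storage;

class LocalConnect
{
    static void Main()
    {
        // Same SDK as for cloud storage; only the endpoint differs.
        // <module name>, <account name> and <key> come from the blob module deployment.
        var account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=http;BlobEndpoint=http://<module name>:11002/<account name>;AccountName=<account name>;AccountKey=<key>");
        var client = account.CreateCloudBlobClient();
        System.Console.WriteLine(client.BaseUri);
    }
}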

There are some limitations regarding the API of the Blob module (e.g. no support for blob leases), but there are also extra features.

The most interesting feature is:

It enables you to automatically upload data to Azure from your local block blob storage using deviceToCloudUpload properties

Yes, you can configure the blob storage module running on your IoT Edge device to automatically upload blobs to the cloud. This is a great data pump!

Microsoft enumerates some advantages in the documentation. For me, this is the ideal way to move low-priority raw data to the cloud in a cheap but reliable way, without much effort.

I was especially interested in the BlockBlob synchronization:

The module is uploading a blob and the internet connection goes away; when the connectivity is back again, it uploads only the remaining blocks and not the whole blob.

This is potentially the most efficient solution, especially for large files.

Let’s check out how this works.

This is the architecture of my test:

We need a custom module that fills blobs with blocks locally. The local blobs are automatically synchronized with an Azure Blob Storage account (after configuration). And we check the arrival of the blobs using Microsoft Azure Storage Explorer.

Note: we do not have to configure any IoT Edge routes to bind these modules.

Configuring the Blob module

First we need to deploy the Azure Blob Storage on IoT Edge module. It’s available on the IoT Edge Marketplace. Just follow my previous blog to see how this is done.
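For reference, the container create options of the blob module in my deployment look roughly like this. This is a sketch based on the documentation; the local account name, key, and host path are placeholders you have to fill in yourself:

{
  "modules": {
    "blob": {
      "type": "docker",
      "settings": {
        "image": "mcr.microsoft.com/azure-blob-storage:latest",
        "createOptions": {
          "Env": [
            "LOCAL_STORAGE_ACCOUNT_NAME=blobaccount",
            "LOCAL_STORAGE_ACCOUNT_KEY=[local storage key]"
          ],
          "HostConfig": {
            "Binds": [ "/srv/containerdata:/blobroot" ],
            "PortBindings": {
              "11002/tcp": [ { "HostPort": "11002" } ]
            }
          }
        }
      }
    }
  }
}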

Once the blob module is deployed, it has to be ‘connected’ to an actual Azure Blob Storage account in the Azure cloud. This is done with the desired properties of the module twin:

{
  "properties.desired": {
    "deviceAutoDeleteProperties": {
      "deleteOn": true,
      "deleteAfterMinutes": 5,
      "retainWhileUploading": true
    },
    "deviceToCloudUploadProperties": {
      "uploadOn": true,
      "uploadOrder": "OldestFirst",
      "cloudStorageConnectionString": "DefaultEndpointsProtocol=https;AccountName=blobtestnestorage;AccountKey=[key of storage in the cloud];EndpointSuffix=core.windows.net",
      "storageContainersForUpload": {
        "updatedblockblobcontainer": {
          "target": "uploaded"
        }
      },
      "deleteAfterUpload": true
    }
  }
}

We configure:

  • the access to the Azure Blob Storage account,
  • the ability to upload blobs,
  • the ability to delete blobs after upload,
  • the “retain while uploading” ability to prevent incomplete uploads.

See the documentation for more detailed information.
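These desired properties can also be set from code. A minimal sketch with the service SDK (Microsoft.Azure.Devices); ‘MyEdgeDevice’ and ‘blob’ are example names for the edge device and the blob module:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;

class TwinUpdater
{
    static async Task Main()
    {
        // Use the IoT Hub service (not device) connection string.
        var registryManager = RegistryManager.CreateFromConnectionString(
            "[iot hub service connection string]");

        var twin = await registryManager.GetTwinAsync("MyEdgeDevice", "blob");

        // Patch only the desired property we want to change.
        var patch = @"
        {
          ""properties"": {
            ""desired"": {
              ""deviceToCloudUploadProperties"": {
                ""uploadOn"": true
              }
            }
          }
        }";

        await registryManager.UpdateTwinAsync(twin.DeviceId, "blob", patch, twin.ETag);
        Console.WriteLine("Module twin of the blob module updated.");
    }
}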

There are two main possible scenarios:

  1. Local files are always deleted after x minutes, whether they have already been synchronized with the cloud or not.
  2. Local files are deleted after x minutes once they are synchronized with the cloud, or at a later moment when synchronization eventually succeeds.

The first scenario only seems interesting when you are almost out of disk space, or when the value of the data depreciates so quickly that it’s not interesting to upload the data anymore.
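As far as I can tell from the documentation, the switch between these two scenarios is the retainWhileUploading flag. The configuration shown above implements the second scenario; for the first scenario you would set:

"deviceAutoDeleteProperties": {
  "deleteOn": true,
  "deleteAfterMinutes": 5,
  "retainWhileUploading": false
}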

Note: Watch this video for more information about the different scenarios and how to configure them.

The blob module is now up and running. Let’s generate blobs.

The Generator module

For this test, I created this simple example of uploading blocks to block blobs.

Every minute, it creates a new blob (the filename contains a timestamp, limited to minutes). Every five seconds, a message is added to the blob as a block:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Loader;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Transport.Mqtt;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class Program
{
    private static MemoryStream _stream;
    private static StreamWriter _writer;
    private static string _currentFileName;
    private const int _interval = 5000;
    private static UInt16 _counter;
    private static CloudBlobContainer _container;

    static void Main(string[] args)
    {
        Init().Wait();

        // Keep the module running until it is cancelled or unloaded.
        var cts = new CancellationTokenSource();
        AssemblyLoadContext.Default.Unloading += (ctx) => cts.Cancel();
        Console.CancelKeyPress += (sender, cpe) => cts.Cancel();
        WhenCancelled(cts.Token).Wait();
    }

    public static Task WhenCancelled(CancellationToken cancellationToken)
    {
        var tcs = new TaskCompletionSource<bool>();
        cancellationToken.Register(s => ((TaskCompletionSource<bool>)s).SetResult(true), tcs);
        return tcs.Task;
    }

    static async Task Init()
    {
        MqttTransportSettings mqttSetting = new MqttTransportSettings(TransportType.Mqtt_Tcp_Only);
        ITransportSettings[] settings = { mqttSetting };

        ModuleClient ioTHubModuleClient = await ModuleClient.CreateFromEnvironmentAsync(settings);
        await ioTHubModuleClient.OpenAsync();
        Console.WriteLine("IoT Hub module Generator client initialized.");

        // Blob container construction: connect to the local blob module
        // ('blob' is the name of the blob module in the deployment).
        var account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;BlobEndpoint=http://blob:11002/blobaccount;AccountName=blobaccount;AccountKey=[local storage key]");
        var client = account.CreateCloudBlobClient();
        _container = client.GetContainerReference("updatedblockblobcontainer");
        _container.CreateIfNotExistsAsync().Wait();

        // Start generating blocks on a background thread.
        var thread = new Thread(() => ThreadBody(ioTHubModuleClient));
        thread.Start();
    }

    private static void ThreadBody(object userContext)
    {
        while (true)
        {
            WriteBlockBlob().GetAwaiter().GetResult();
            Thread.Sleep(_interval);
        }
    }

    private static async Task WriteBlockBlob()
    {
        _counter += 1;
        var now = DateTime.UtcNow;

        // The timestamp in the filename is limited to minutes,
        // so a new blob is started every minute.
        var filename = "File" + now.ToString("yyyyMMddHHmm");
        var blob = _container.GetBlockBlobReference(filename);

        // Collect the block IDs already committed to this blob, if any.
        var blockList = new List<string>();
        if (await blob.ExistsAsync())
        {
            IEnumerable<ListBlockItem> existingBlockList = await blob.DownloadBlockListAsync();
            blockList.AddRange(existingBlockList.Select(a => a.Name));
        }

        // Block IDs must be base64 encoded and, within one blob, of equal length.
        string blockId = Convert.ToBase64String(Encoding.ASCII.GetBytes(now.ToString()));

        if (filename != _currentFileName)
        {
            // A new minute: release the previous blob by closing the StreamWriter
            // and start a fresh stream for the new blob.
            if (!string.IsNullOrEmpty(_currentFileName))
            {
                _writer.Close();
                _stream.Close();
            }
            _stream = new MemoryStream();
            _writer = new StreamWriter(_stream);
            _currentFileName = filename;
        }

        _writer.Write($"We add a test for counter {_counter} at {now:yyyyMMddHHmmss}");
        _writer.Flush();
        _stream.Position = 0;

        // Upload the new block and commit the extended block list.
        await blob.PutBlockAsync(blockId, _stream, null);
        blockList.Add(blockId);
        await blob.PutBlockListAsync(blockList.ToArray());

        // Empty the stream so the next block contains only the next message.
        _stream.SetLength(0);

        Console.WriteLine($"Block {_counter} written for file {filename} at {now:yyyy-MM-dd HH:mm:ss}");
    }
}

As you can see, I do not release the current blob before the minute is over (by closing the StreamWriter). So multiple blocks keep being added to the same file (normally 12 blocks per minute). So I expect the file to grow locally, and to grow in the cloud too due to the synchronization.

Deploy this module and we are ready to go.

Testing

Both modules are now up and running:

Once the generator module is started, it starts generating messages in separate blobs:

I see the creation of two files. Let’s look at these files on disk. I navigate to the local folder the blobs should be living in and look at the files in this folder:

Well, checking out these files appears to be harder than expected… But we know the blobs are synchronized constantly and disappear almost immediately, because they are deleted after being synchronized.

That is why we do not see anything on the IoT Edge device.

More important: is the synchronization working as expected? Is the data available in the cloud?

Let’s check out the Azure Blob Storage account. We use Microsoft Azure Storage Explorer for that:

Yes, the files are shown in the explorer, so the synchronization to the cloud is active.

If I check the content of the first two files for the appearance of the blocks, it looks like what we expect:

We see blocks 1 through 12 written in file File201908281125.

We see blocks 13 through 24 written to the next file, File201908281126.

So the blocks are arriving in the correct order.

And the blocks are arriving almost immediately! I have recorded the arrival of blocks by looking at the size of the files. A demonstration can be seen here:

You see the arrival of blocks by looking at the size: the blobs are growing while we look at them. And we can already download partially completed files to peek into.
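Peeking does not even require Storage Explorer; the committed part of a growing blob can also be inspected from code. A minimal sketch, reusing the cloud connection string and target container from the module configuration (the filename is one of the generated examples):

using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class PartialPeek
{
    static async Task Main()
    {
        var account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=blobtestnestorage;AccountKey=[key of storage in the cloud];EndpointSuffix=core.windows.net");
        var container = account.CreateCloudBlobClient().GetContainerReference("uploaded");
        var blob = container.GetBlockBlobReference("File201908281126");

        // List the blocks committed so far; blocks still in transit are not visible yet.
        var blocks = await blob.DownloadBlockListAsync(
            BlockListingFilter.Committed, null, null, null);
        foreach (var block in blocks)
        {
            Console.WriteLine($"Block {block.Name}: {block.Length} bytes");
        }

        // Download the content committed so far.
        Console.WriteLine(await blob.DownloadTextAsync());
    }
}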

Conclusion

This new synchronization feature of Azure Blob storage for IoT Edge is a useful tool.

The interesting part for me is the fact that I can immediately access files in the cloud which are still ‘under construction’ on the edge.

There are a few points we have to consider:

  • This does not work with binary data like zipped files (or images, sound recordings, etc.). These files are only manageable when they are complete; partial zips and images are normally marked as corrupted and are therefore not accessible.
  • Formatted files (XML, JSON) are also ‘corrupted’ while being partially uploaded. But because these are still human-readable, the part that has already arrived may already contain data of value for you.

So start using the new features of this Blob module when you want to upload large files to the cloud.

And sorry for the tongue twister in the title. I could not resist the temptation 🙂