Towards zero-touch IoT Edge with edgeAgent direct methods

Update: 27-9-2021 The method GetLogs is now called GetModuleLogs. It’s not experimental anymore.

The holy grail of IoT Edge compute is zero-touch configuration and monitoring.

If we look at the life cycle of an edge device, these are the phases where the device is rolled out to production:

The only time when we want to have a person near that edge device is during the initial deployment (Plan, Register), during decommission (Retire) and during physical changes or while repairing the device.

To make zero-touch possible we first need to have a secure cloud connection that supports both sending telemetry to the cloud and retrieving commands from the cloud. And that is supported by Azure IoT Edge by default.

But we still need a second communication channel to log in remotely in a secure way. This is typically done by hand to look at local settings, to check logging, to check connections, or to make repairs to e.g. the operating system or the Azure IoT Edge runtime. This could be done using SSH and/or a Remote Desktop (RDP) connection. Because this is typically an outbound connection, this is usually provided using a ‘jump box’ or a VPN connection so the connection is set up in a more secure way.

As said, this is done by hand… so much for zero-touch.

Now, let’s look at which tasks are typically performed on the IoT Edge device using an SSH connection:

  • Checking the log of running modules
  • Restarting modules if their performance is not trusted or to force them to pick up settings
  • Checking the cloud connectivity

What if exactly these three tasks could be performed from the cloud? What if these tasks could be automated?

Let’s check this out.

Recently, Microsoft released version 1.0.9 of the Azure IoT Edge runtime. This version introduces the following edgeAgent direct methods:

  1. Ping
  2. RestartModule
  3. GetModuleLogs

Note: At the time of writing, the GetModuleLogs method was still in an experimental phase (see the update at the top). I tested it, and it performs well.

Below, the execution of these edgeAgent direct methods is shown, together with the results they produce.

The Azure portal is used to demonstrate the calls. It should be easy to automate these calls.
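As an illustration of what such automation could look like, here is a minimal sketch in Python. It assumes the azure-iot-hub package is installed and that you have an IoT Hub connection string with service permissions. The helper names (build_method_request, invoke_on_edge_agent) are made up for this example. The direct methods are sent to the edgeAgent module, which has the module id $edgeAgent.

```python
from typing import Any, Dict, Optional, Tuple

EDGE_AGENT_MODULE_ID = "$edgeAgent"  # module id of the IoT Edge agent


def build_method_request(method_name: str,
                         payload: Optional[Dict[str, Any]] = None,
                         timeout_seconds: int = 30) -> Dict[str, Any]:
    """Collect the arguments of one direct method call in a plain dict."""
    return {
        "method_name": method_name,
        # An empty JSON object is sent when no payload is given (as for Ping).
        "payload": payload if payload is not None else {},
        "response_timeout_in_seconds": timeout_seconds,
    }


def invoke_on_edge_agent(connection_string: str, device_id: str,
                         request: Dict[str, Any]) -> Tuple[int, Any]:
    """Send the request to the edgeAgent module and return (status, payload)."""
    # Imported here so build_method_request() works without the SDK installed.
    from azure.iot.hub import IoTHubRegistryManager
    from azure.iot.hub.models import CloudToDeviceMethod

    manager = IoTHubRegistryManager.from_connection_string(connection_string)
    method = CloudToDeviceMethod(**request)
    response = manager.invoke_device_module_method(
        device_id, EDGE_AGENT_MODULE_ID, method)
    return response.status, response.payload
```

A call could then look like invoke_on_edge_agent(connection_string, "my-edge-device", build_method_request("ping")), where "my-edge-device" is a placeholder for your own device id.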

Ping

The simplest of the three methods is ‘Ping’:

The ping method is useful for checking whether IoT Edge is running on a device, or whether the device has an open connection to IoT Hub.

Call it like this:

Note: the name of the direct method is case-insensitive.

The body of the direct method is just an empty JSON message:

{}

If the edgeAgent is responding (which is a good thing), you receive this status:

In the logging of the edgeAgent module, you can see the method is called:

If the Ping fails, you get an answer like this:

{"message":"NotFound:{"Message": "{"errorCode":404103, "trackingId...

Note: I beautified the answer a little bit, the original answer is full of escape characters and line separators.

RestartModule

Another very useful edgeAgent direct method is this one, the ‘RestartModule’:

The RestartModule method allows for remote management of modules running on an IoT Edge device. If a module is reporting a failed state or other unhealthy behavior, you can trigger the IoT Edge agent to restart it.

Call it like this:

Note: the name of this direct method is case-insensitive.

The body of this direct method looks like this. I want to restart the default Microsoft simulated temperature sensor:

{
  "schemaVersion": "1.0",
  "id": "SimulatedTemperatureSensor"
}

If this module restart request is picked up, we get this status:

The logging of the edgeAgent module shows this output. We see the method being handled:

Of course, a misspelled module name could be provided:

{
  "schemaVersion": "1.0",
  "id": "NotExistingModule"
}

In that case, a clear error response is returned:

{
  "status":500,
  "payload": {
    "message":"Module NotExistingModule not found in the current environment"
  }
}
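When automating restarts, you probably want to build the request body and check the answer in code. The following is a small sketch that only uses the request and error formats shown above. The function names are hypothetical, and it assumes that a successful call answers with status 200.

```python
from typing import Any, Dict, Optional


def restart_module_payload(module_id: str) -> Dict[str, str]:
    """Build the RestartModule body for the given module name."""
    return {"schemaVersion": "1.0", "id": module_id}


def failure_message(response: Dict[str, Any]) -> Optional[str]:
    """Return None on success, or a readable error text otherwise.

    Assumption: status 200 means the restart request was accepted.
    """
    status = response.get("status")
    if status == 200:
        return None
    payload = response.get("payload")
    # Error responses carry a "message" field inside the payload.
    detail = payload.get("message") if isinstance(payload, dict) else payload
    return f"edgeAgent returned status {status}: {detail}"
```

For the error response above, failure_message returns the text "edgeAgent returned status 500: Module NotExistingModule not found in the current environment".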

Restarting modules is not limited to your own modules. The edgeAgent module and the edgeHub module can be restarted too!

GetModuleLogs (deprecated name: GetLogs)

The GetModuleLogs direct method is perfect for retrieving (a subset of) the most recent logging of modules as a response to the direct method call.

This third direct method is at this moment still in an experimental stage. This means it is not supported by the edgeAgent module without passing it an environment variable named ‘ExperimentalFeatures__EnableGetLogs’ first in a deployment manifest to your edge device.

This environment variable is not picked up without passing the ‘ExperimentalFeatures__Enabled’ variable too!

Note: make note of the double underscores.

So pass these two variables in the deployment manifest first before the direct method calls can be picked up:

Your deployment manifest looks like this:

…

# not needed anymore for GetModuleLogs and GetSupportBundle

"edgeAgent": {
  …
},
"env": {
  "ExperimentalFeatures__Enabled": {
    "value": "true"
  },
  "ExperimentalFeatures__EnableGetLogs": {
    "value": "true"
  }
},
…

Warning: once these variables are deployed, they are not always picked up! Due to the rather efficient update strategy of edgeAgent, the update could be ignored. You can force the update by switching between the more specific container URI (e.g. 1.0.9) and the less specific URI (the 1.0 version) and back.

Once the environment variables are put in place, calling the ‘GetModuleLogs’ method is pretty straightforward:

Note: the method name is case-insensitive.

The body could look like this. In this case, I filtered the most recent log of the edgeAgent module up to the last 100 lines. And I expect the output to be text-based:

{
  "schemaVersion": "1.0",
  "items": [
    {
      "id": "edgeAgent", 
      "filter": { 
        "tail": 100 
      } 
    } 
  ], 
  "encoding": "none", 
  "contentType": "text" 
}

See the documentation for more samples. These options are selectable:

"filter": {
  "tail": int,
  "since": int,
  "loglevel": int,
  "regex": "regex string"
}
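If you call this method from code, the body can be assembled from the options above. Here is a minimal sketch. The function name get_module_logs_payload is made up for this example, and it only emits the filter options you actually set.

```python
from typing import Any, Dict, List, Optional


def get_module_logs_payload(module_ids: List[str],
                            tail: Optional[int] = None,
                            since: Optional[int] = None,
                            loglevel: Optional[int] = None,
                            regex: Optional[str] = None) -> Dict[str, Any]:
    """Build a GetModuleLogs body requesting plain, uncompressed text."""
    options = {"tail": tail, "since": since,
               "loglevel": loglevel, "regex": regex}
    # Leave out every filter option that was not set by the caller.
    log_filter = {key: value for key, value in options.items()
                  if value is not None}
    return {
        "schemaVersion": "1.0",
        "items": [{"id": module_id, "filter": dict(log_filter)}
                  for module_id in module_ids],
        "encoding": "none",
        "contentType": "text",
    }
```

Calling get_module_logs_payload(["edgeAgent"], tail=100) produces the same body as the example shown earlier.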

The output of the call is:

What you see is an array of logs coming from the selected IoT Edge modules. The ‘id’ field can be a wildcard, so receiving logs from multiple modules is supported. E.g. ‘edge’ returns log files from both the edgeAgent and the edgeHub in separate array items.

Each module comes with a payload which is the actual log text. Log lines are text-based and separated by ‘\n’.

It’s not that hard to handle this output using source code if you are a programmer. I demonstrate how to handle the output using Visual Studio Code.

Start by copying the output from the Azure portal in a new file in Visual Studio Code.

We want to replace the ‘\n’ with a new line. This can be done using the Find and Replace dialog (CTRL-H):

  • The Find field must be filled with ‘\\n’
  • The Replace field must be filled with ‘\n’
  • Activate ‘use regular expressions’ by clicking the ‘.*’ button

This should look like this (Notice the blue ‘.*’ button):

After filling in the two field values, click the ‘Replace All’ button. The result looks like this:

Note: the actual log text is surrounded by the response JSON result. If needed, you can remove that JSON stuff first to get to the raw text output.
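The same replacement can be done in code. The sketch below assumes that the payload of the response is a JSON array in which each item has an ‘id’ (the module name) and a ‘payload’ (the log text), as described above. The sample text is a placeholder, not real edgeAgent output.

```python
import json
from typing import Any, Dict, List


def split_log_lines(response_payload: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each module id to its log text, split into separate lines."""
    return {item["id"]: item["payload"].splitlines()
            for item in response_payload}


# Placeholder response; json.loads turns the escaped \n into real newlines.
raw = '[{"id": "edgeAgent", "payload": "first line\\nsecond line\\n"}]'
logs = split_log_lines(json.loads(raw))

for module_id, lines in logs.items():
    for line in lines:
        print(f"{module_id}: {line}")
```

Parsing the JSON first also takes care of the surrounding response structure, so only the raw log lines remain.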

The direct method response size is limited to 128 KB according to the documentation. Longer logs are truncated when that limit is reached.

UploadModuleLogs method based on GetModuleLogs and GetTaskStatus; UploadSupportBundle method

As stated in the GetModuleLogs documentation, it is related to the ‘UploadModuleLogs’ method.

This other method makes it possible to push the logging to some Blob storage container. See the documentation to see how the SAS URL of the storage account must be passed.

This is a slightly more complicated solution, but it is great if you need to retrieve larger log files (larger than 128 KB).

It follows an asynchronous pattern. The UploadModuleLogs method only starts the upload. The ‘GetTaskStatus’ method helps you to find out what the current status of the logs upload is.
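The polling side of this pattern could be sketched as follows. This sketch makes several assumptions you should verify against the documentation: the upload call returns a correlation id, GetTaskStatus accepts it in a ‘correlationId’ field, and the ‘status’ field of the answer ends up as ‘Completed’ or ‘Failed’. The invoke argument is a hypothetical function of your own that sends a direct method to edgeAgent and returns the response payload as a dict.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: invoke(method_name, payload) -> response payload
Invoke = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_upload(invoke: Invoke, correlation_id: str,
                    poll_seconds: float = 5.0, max_polls: int = 60) -> str:
    """Poll GetTaskStatus until the upload finishes or we give up."""
    # Assumed request shape; check the GetTaskStatus documentation.
    request = {"schemaVersion": "1.0", "correlationId": correlation_id}
    for _ in range(max_polls):
        status = invoke("GetTaskStatus", request).get("status", "Unknown")
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
    # Not an edgeAgent status: this sketch ran out of polling attempts.
    return "TimedOut"
```

Passing the invoke function in as an argument keeps the polling logic independent of how the direct method is actually sent, so it can be tried out with a fake function first.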

The same goes for uploading the support bundle from your edge device.

Please check out my new blog where the use of support bundles is explained in detail.

Conclusion

Although the edgeAgent direct methods are truly helpful, Zero-touch IoT Edge remains the holy grail for IoT Edgers.

These methods are the holy hand grenade in your quest to find true zero-touch.