Introduction to the IoT Edge SDK, part 5: Watchdog

In the past weeks, I have shown how to work with the basics of the current Azure Edge SDK.

Today we will look at a more specific need, a watchdog.

Using this watchdog, we have more control over the quality of the service.

I created my watchdog when I discovered I was depending on the time-out of an HTTP request to find out that the sensors were not working correctly. These time-outs can take a long time (100 seconds, in one example).

Because I expect a message every five seconds, I want to be warned if a message is not generated in nine seconds!

This blog is the fifth part of a series:

  • Part one, how to use modules, gateway configuration and the broker
  • Part two, connecting to the IoT Hub
  • Part three, message to device
  • Part four, IoT Hub Routing
  • Part five, Watchdog

What is a watchdog?

According to Wikipedia, a watchdog

“is an electronic timer that is used to detect and recover from computer malfunctions”

So our watchdog needs a timer and a way to detect a malfunction.

Architecture

My idea is to listen to the incoming messages. Every time a sensor generates a message, I record when it was last seen.

I check this list of last-seen moments frequently, so I can detect that a message is missing and therefore that a sensor has stopped sending messages. And when a message is received again, I want to notify my user again!

This means that I need an extra message type which makes it possible to send both error messages and information messages.
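
Stripped of the gateway plumbing, the idea can be sketched as a small class. This is only an illustration and not the module itself: LastSeenTracker, Seen and FindNewlyMissing are hypothetical names, the class is not thread-safe, and the caller passes in the current time so the logic is easy to try out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only: tracks when each sensor was last seen.
public class LastSeenTracker
{
    private class Entry
    {
        public DateTime LastSeen;
        public bool Flagged;
    }

    private readonly Dictionary<string, Entry> _entries = new Dictionary<string, Entry>();
    private readonly TimeSpan _timeout;

    public LastSeenTracker(TimeSpan timeout)
    {
        _timeout = timeout;
    }

    // Records a message from a sensor. Returns true when the sensor was flagged
    // as missing before, so the caller can send an information message.
    public bool Seen(string sensorId, DateTime now)
    {
        Entry entry;
        if (!_entries.TryGetValue(sensorId, out entry))
        {
            _entries[sensorId] = new Entry { LastSeen = now, Flagged = false };
            return false;
        }

        var recovered = entry.Flagged;
        entry.LastSeen = now;
        entry.Flagged = false;
        return recovered;
    }

    // Flags and returns the sensors whose last message is older than the timeout.
    // A sensor is returned once; it is reported again only after it has recovered.
    public List<string> FindNewlyMissing(DateTime now)
    {
        var missing = new List<string>();
        foreach (var pair in _entries)
        {
            if (!pair.Value.Flagged && pair.Value.LastSeen + _timeout < now)
            {
                pair.Value.Flagged = true;
                missing.Add(pair.Key);
            }
        }
        return missing;
    }
}
```

With a timeout of nine seconds, a sensor seen at moment t is not reported by a check at t plus five seconds, is reported once by a check at t plus ten seconds, and makes Seen return true when its next message arrives.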

The Watchdog module

I have written the following module:

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using Newtonsoft.Json;
// Also needed: the gateway SDK namespace that provides Broker, Message and IGatewayModule.

public class WatchdogModule : IGatewayModule, IGatewayModuleStart
{
  private Broker broker;
  private ConfigWatchdog _configWatchdog;
  private Dictionary<string, DeviceHistory> _DeviceHistories;

  // Receive and ThreadBody run on different threads; this lock guards _DeviceHistories.
  private readonly object _sync = new object();
  private volatile bool _running;

  public void Create(Broker broker, byte[] configuration)
  {
    this.broker = broker;
    var config = Encoding.UTF8.GetString(configuration);
    _configWatchdog = JsonConvert.DeserializeObject<ConfigWatchdog>(config);
    _DeviceHistories = new Dictionary<string, DeviceHistory>();
  }

  public void Start()
  {
    _running = true;

    // A background thread cannot keep the gateway process alive on shutdown.
    var oThread = new Thread(new ThreadStart(ThreadBody)) { IsBackground = true };
    oThread.Start();
    Console.WriteLine("WatchdogModule started");
  }

  public void Destroy()
  {
    // Ask the watchdog loop to stop after its current iteration.
    _running = false;
  }

  public void Receive(Message received_message)
  {
    // Only handle sensor telemetry which carries a MAC address.
    // TryGetValue avoids a KeyNotFoundException when a property is missing.
    string source, route, macAddress;
    if (!received_message.Properties.TryGetValue("source", out source) || source != "sensor"
        || !received_message.Properties.TryGetValue("route", out route) || route != "sensor"
        || !received_message.Properties.TryGetValue("macAddress", out macAddress))
    {
      return;
    }

    // UTC, so daylight saving changes do not trigger the watchdog.
    var now = DateTime.UtcNow;
    var wasFlagged = false;

    lock (_sync)
    {
      DeviceHistory history;
      if (_DeviceHistories.TryGetValue(macAddress, out history))
      {
        wasFlagged = history.Flagged;
        history.LastSeen = now;
        history.Flagged = false;
      }
      else
      {
        _DeviceHistories.Add(macAddress,
                             new DeviceHistory
                             {
                               MacAddress = macAddress,
                               LastSeen = now,
                               Flagged = false
                             });
      }
    }

    if (wasFlagged)
    {
      // The sensor is back: tell the user the watchdog stopped barking.
      PublishWatchdogMessage(macAddress, ErrorSeverity.Info);
      Console.WriteLine($"Watchdog flag turned down for {macAddress}");
    }
  }

  public void ThreadBody()
  {
    while (_running)
    {
      var now = DateTime.UtcNow;
      var expired = new List<string>();

      lock (_sync)
      {
        foreach (var history in _DeviceHistories.Values)
        {
          if (!history.Flagged
              && history.LastSeen.AddMilliseconds(_configWatchdog.timeout) < now)
          {
            // Flag the sensor so we bark only once per outage.
            history.Flagged = true;
            expired.Add(history.MacAddress);

            Console.WriteLine($"Watchdog barks at {history.MacAddress}; Flagged:{history.Flagged}; LastSeen:{history.LastSeen}");
          }
        }
      }

      // Publish outside the lock, so Receive is not blocked while we send.
      foreach (var macAddress in expired)
      {
        PublishWatchdogMessage(macAddress, ErrorSeverity.Error);
      }

      Thread.Sleep(_configWatchdog.interval);
    }
  }

  // Builds and publishes a watchdog message with the given severity.
  private void PublishWatchdogMessage(string macAddress, ErrorSeverity severity)
  {
    var data = new Dictionary<string, string>
    {
      { "source", "sensor" },
      { "route", "error" },
      { "macAddress", macAddress },
      { "version", "1" }
    };

    var dataMessage = new ErrorMessage
    {
      code = (int)ErrorCode.Watchdog,
      severity = (int)severity,
      text = macAddress,
    };

    var message = JsonConvert.SerializeObject(dataMessage);
    var messageToPublish = new Message(message, data);
    this.broker.Publish(messageToPublish);
  }
}

public class DeviceHistory
{
  public string MacAddress { get; set; }
  public DateTime LastSeen { get; set; }
  public bool Flagged { get; set; }
}

public class ErrorMessage
{
  public int code { get; set; }
  public int severity { get; set; }
  public string text { get; set; }
}

public enum ErrorCode
{
  Read_Sensors = 1,
  Watchdog = 2,
}

public enum ErrorSeverity
{
  Error = 1,
  Info = 2,
}
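
The ConfigWatchdog class that Create deserializes into is not listed here. A minimal sketch could look like the following, assuming its property names simply mirror the "args" section of the module registration and the timeout and interval members the module reads. The WatchdogDevice class and the exact shape of watchdogDevices are my assumption, not taken from the original source.

```csharp
using System.Collections.Generic;

// Assumed shape of the watchdog configuration; a sketch, not the original class.
public class ConfigWatchdog
{
    // Sensors the watchdog should keep an eye on.
    public List<WatchdogDevice> watchdogDevices { get; set; }

    // Time between two checks, in milliseconds (passed to Thread.Sleep).
    public int interval { get; set; }

    // Maximum silence of a sensor before the watchdog barks, in milliseconds.
    public int timeout { get; set; }
}

// Assumed shape of one entry in the watchdogDevices array.
public class WatchdogDevice
{
    public string macAddress { get; set; }
}
```

The lower-case property names match the JSON keys, so Json.NET can map them without extra attributes.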

What does this Watchdog module do?

First, we read the configuration: the timeout (9 seconds), the interval between checks (every half second), and the MAC addresses of the sensors we want to check.

When the module starts, we start a separate thread which checks, every half second, an internal list of sensors which have sent a message.

If, during such a check, the time between the previous message of a sensor and the current time turns out to be longer than the timeout, the watchdog starts to bark (and we flag the sensor so we do not keep barking at it).

If a message is received, our registration of sensors is updated. When the sensor that sent the message is flagged, we unflag it and send an ‘Info’ message to tell that our Watchdog has stopped barking at this sensor.
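
One thing to note: as listed, the module only learns about a sensor when its first message arrives, and the configured MAC addresses are read but not used. A sensor that is silent from the very start therefore never triggers the watchdog. A possible extension (my assumption, not part of the module above) is to pre-register the configured sensors. The sketch below shows the idea with a hypothetical SeedHistories helper; DeviceHistory is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the DeviceHistory class of the module, repeated to keep the sketch self-contained.
public class DeviceHistory
{
    public string MacAddress { get; set; }
    public DateTime LastSeen { get; set; }
    public bool Flagged { get; set; }
}

public static class WatchdogSeeding
{
    // Hypothetical helper: registers every configured sensor as "last seen now",
    // so a sensor that never sends anything is flagged once the timeout has passed.
    public static Dictionary<string, DeviceHistory> SeedHistories(
        IEnumerable<string> macAddresses, DateTime now)
    {
        var histories = new Dictionary<string, DeviceHistory>();
        foreach (var macAddress in macAddresses)
        {
            // The indexer overwrites duplicates instead of throwing, unlike Add.
            histories[macAddress] = new DeviceHistory
            {
                MacAddress = macAddress,
                LastSeen = now,
                Flagged = false
            };
        }
        return histories;
    }
}
```

In Create, the result could take the place of the empty dictionary, as long as the start moment comes from the same clock the rest of the module uses.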

Registering the module

Once the Watchdog module is written, we have to register the class with:


{
  "name": "watchdog_module",
  "loader": {
    "name": "dotnet",
    "entrypoint": {
      "assembly.name": "DotNetModuleSample",
      "entry.type": "SensorModule.WatchdogModule"
    }
  },
  "args": {
    "watchdogDevices": [
      {
        "macAddress": "AA:BB:CC:DD:EE:FF"
      }
    ],
    "interval": 500,
    "timeout": 9000
  }
},

and we have to add the routing of the module in the Edge SDK:


{
  "source": "sensor_module",
  "sink": "watchdog_module"
},
{
  "source": "watchdog_module",
  "sink": "identity_map_module"
}

So the same message from the sensor module is sent both to the IoTHub and this Watchdog module.

Routing

I added an extra EventHub endpoint and an IoTHub route for these error messages.

So, for each error message (with severity Error or Info), an Event is added to the EventHub (registered as ErrorEndpoint).

Azure Function

We need the following code to react to the incoming Error EventHub events:

using System;
using Newtonsoft.Json;

public static async Task Run(string myEventHubMessage, IAsyncCollector<Notification> notification, TraceWriter log)
{
  log.Info($"on Error trigger function processed a message: {myEventHubMessage}");

  var errorMessage = JsonConvert.DeserializeObject<ErrorJsonObject>(myEventHubMessage);

  if (errorMessage.code == 2 && errorMessage.severity == 1)
  {
    log.Info($"Push Alarm: {errorMessage.deviceName} Watchdog");
  }

  if (errorMessage.code == 2 && errorMessage.severity == 2)
  {
    log.Info($"Push info: {errorMessage.deviceName} Watchdog");
  }
}

public class ErrorJsonObject
{
  public string deviceName { get; set; }

  /// <summary>
  /// 1 = Module telemetry, 2 = watchdog
  /// </summary>
  public int code { get; set; }

  /// <summary>
  /// 1 = error, 2 = info
  /// </summary>
  public int severity { get; set; }

  public string text { get; set; }

  public DateTime dateTime { get; set; }
}

This results in the following Azure function log output:

2017-09-24T21:51:57.039 Function started (Id=bf0eed29-efa6-4019-9d1c-ce7e8038e20b)
2017-09-24T21:51:57.070 on Error trigger function processed a message: {"code":2, "severity":1, "text":"Example error occurred","dateTime":"2017-09-14T19:39:06.7836877Z","deviceName":"Device155"}
2017-09-24T21:51:57.226 Push Alarm: Device155 Watchdog
2017-09-24T21:52:07.906 Function started (Id=c49c9774-6074-4f83-b875-d92926232516)
2017-09-24T21:52:07.906 on Error trigger function processed a message: {"code":2, "severity":2, "text":"Example error occurred","dateTime":"2017-09-14T19:39:06.7836877Z","deviceName":"Device155"}
2017-09-24T21:52:08.000 Push info: Device155 Watchdog

Conclusion

Using the Azure Edge SDK, it’s easy to build logic like a watchdog. But be aware, my example is a bit simplistic.

I implemented this simple logic with a push notification call. That is so easy to do, and what fun to receive notifications on your telephone!

But if a certain sensor has a bad connection and skips every second message, the Watchdog will keep informing you every few seconds: it’s missing, it’s back, it’s missing, it’s back, etc.

So try to limit the number of messages over time…
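
One way to do that, sketched here under my own assumptions, is a cool-down per sensor: after a notification for a sensor, further notifications for that same sensor are suppressed for a fixed period. NotificationThrottle and ShouldNotify are hypothetical names, and the length of the cool-down is something to tune.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: allows at most one notification per sensor per cool-down period.
public class NotificationThrottle
{
    private readonly Dictionary<string, DateTime> _lastNotified = new Dictionary<string, DateTime>();
    private readonly TimeSpan _coolDown;

    public NotificationThrottle(TimeSpan coolDown)
    {
        _coolDown = coolDown;
    }

    // Returns true when a notification for this sensor may be sent at 'now'
    // and remembers that moment. Returns false while the cool-down is running.
    public bool ShouldNotify(string sensorId, DateTime now)
    {
        DateTime last;
        if (_lastNotified.TryGetValue(sensorId, out last) && now - last < _coolDown)
        {
            return false;
        }

        _lastNotified[sensorId] = now;
        return true;
    }
}
```

With a cool-down of one minute, a sensor that flips between missing and back every ten seconds produces at most one notification per minute. The price is that a suppressed message may be the "it’s back" one, so the last state the user saw can be out of date.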

 
