Your own Cloud-IoT DIY project. Part 3: Device<->Cloud Communications.

Aug 20, 2023

This is the third part of “Your own Cloud-IoT DIY project”. The first part can be found here.

TL;DR: MQTT is the main protocol for cloud communication, with a documented topic naming convention and a documented JSON schema for the payload. An MTLS API can be used for OTA updates and other heavy backend operations. The Thing Model (endpoints, commands, etc.) will be defined on the device, not in the cloud.

Communication protocols

Our device’s main communication with the backend will go over MQTT: the device will send information (data, status, etc.) and receive commands using an MQTT client, so we need to design this communication properly. We’ve already mentioned that X.509 certificates will be used for device authentication, but we also need to design the MQTT topics and payloads. AWS has published MQTT design best practices, and we’ll follow most of them.

An additional (and optional) communication channel between devices and the cloud is an API with Mutual TLS (MTLS) authentication. This channel can cover the use case of transmitting large chunks of data (as MQTT has limits on message size). The first candidate for such functionality is firmware Over-The-Air (OTA) updates. Device certificate rotation could also be a good candidate.

MQTT communications

For our DIY project we’ll have three main information flows:

Telemetry data (typically referred to as the Data Plane): device-to-cloud flow with sensor data.
Status data: device-to-cloud flow of any information other than sensor data. This flow is separated from Telemetry to simplify data handling on the cloud side.
Commands (typically referred to as the Control Plane): cloud-to-device flow with commands and other data to be sent to devices. Note that we won’t use this plane for command responses from the device. Instead, the device will respond in either the Telemetry or the Status flow.

Note that we don’t have a device-to-device flow in our design. It could be added later but is ignored for now.

MQTT Client ID

We’ll follow best practice and use the Thing Name as the MQTT Client ID. This adds some restrictions on Thing Names, including the requirement to be unique at least within this application.

MQTT topics

AWS recommends this topic structure convention: <Plane>/<application>/<context>/<thing-name>/<data-type>, which we’ll follow with minor changes.

In our scenario we’ll use:

‘dt’, ‘sdt’ and ‘cmd’ as short names for the three flows described above
‘diyiot’ as the short name for the application
information about the thing location, thing group (diy) and thing type as <context>, like <building_id>/<in-building location_id>/<thing type>. Note that we’ll use ‘b001’ as the building id instead of an address or even a mnemonic description like ‘vacationhouse’. This is important, as you should not expose personally identifiable information (PII) in the topic name! The same rule applies to the in-building location id: loc001, loc002, etc. are better than bedroom, bathroom, etc.

So our final MQTT topic name convention will be:

<plane_id>/diyiot/<building_id>/<location_id>/diy/<thing_type>/<thing_name>

where plane_id has value of “dt” or “sdt” or “cmd”

Note that we included the standard Thing Group ‘diy’ in the topic name. This will help us distinguish DIY devices from other ones in our AWS account.

Note that the ‘thing_type’ included in the topic has to be a thing type created in AWS IoT. Creating thing types in AWS IoT will be the responsibility of our Lambdas, but all things, when provisioned, will be of the same type, ‘generic’, which is created during project bootstrapping.

Note that we don’t include <data type> in the topic name: we’ll send all data collected by the device as one MQTT message. You can of course make the topic name longer and include more information in it. However, be careful, as AWS has a hard limit of 7 slashes in a topic name (see here).

Note that we use the same topic naming convention for device-to-cloud and cloud-to-device communication.
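
The convention above can be captured in a small helper. This is only a sketch: the function name build_topic, its argument names and the example values (‘generic’, ‘thing-0001’) are assumptions made for illustration, not part of the project code.

```python
PLANES = ("dt", "sdt", "cmd")  # Data, Status and Control planes


def build_topic(plane_id, building_id, location_id, thing_type, thing_name,
                application="diyiot", thing_group="diy"):
    """Build <plane_id>/diyiot/<building_id>/<location_id>/diy/<thing_type>/<thing_name>."""
    if plane_id not in PLANES:
        raise ValueError(f"unknown plane: {plane_id!r}")
    levels = [plane_id, application, building_id, location_id,
              thing_group, thing_type, thing_name]
    for level in levels:
        # A level must be non-empty and free of MQTT separators and wildcards.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    # Seven levels give six slashes, which stays under the limit of 7.
    return "/".join(levels)


print(build_topic("dt", "b001", "loc001", "generic", "thing-0001"))
# dt/diyiot/b001/loc001/diy/generic/thing-0001
```

Building the topic in one place keeps device-to-cloud and cloud-to-device names in sync, since both directions share the same convention.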

MQTT message payload

We’ll use JSON as the message payload format (for multiple reasons, including simplified data handling on the cloud side). Naturally, the payload format differs between Planes.

The simplest format is for the Data Plane:

{
  "<endpoint name>|<data units>|<value type>": <data value>
}

Here <endpoint name> is self-explanatory, like “air-temperature”, and includes a predefined data type as a suffix separated by a hyphen (like “temperature”). So the <endpoint name> format is <description>-<data type>.
The second part (separated by “|”), <data units>, is limited to an enum with values like “C”, “mPa”, etc.
The third part (separated by “|”) describes the data value type.
The data value type is any simple Python type: int, float, string, etc.
An example of a Data Plane message:

{
  "air-temperature|F|float": 76.1,
  "atm-pressure|mmHg|int": 760,
  "device-surface-temperature|C|string": "Normal"
}

Note that we introduced a naming convention for the field names. This may not be the best way, but it helps keep the message structure flat. It also gives us an opportunity to define endpoints on the device side (at the cost of longer field names and more MQTT traffic).

Note that with this format we agree that the device will send only one set of data per message! I.e. we won’t support sending multiple measurements of the same parameter in one message.

Note that we do not include the Thing Name or Thing Type in the message payload, as we agreed that they will be part of the topic name and the MQTT Client ID will be equal to the Thing Name.
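
To illustrate how a backend could use the field naming convention, here is a minimal sketch that splits a field name into its three parts and checks each value against the declared type. The function names are hypothetical, and the set of supported value types (int, float, string) is an assumption limited to the types shown in the example above.

```python
import json

# Assumption: only the value types used in the example are supported.
# An integer literal is accepted for "float" because JSON does not
# distinguish 76 from 76.0.
VALUE_TYPES = {"int": (int,), "float": (int, float), "string": (str,)}


def parse_field_name(field_name):
    """Split '<endpoint name>|<data units>|<value type>' into its parts."""
    endpoint, units, value_type = field_name.split("|")
    if value_type not in VALUE_TYPES:
        raise ValueError(f"unsupported value type: {value_type!r}")
    return endpoint, units, value_type


def validate_data_message(payload):
    """Parse a Data Plane payload and check values against declared types."""
    message = json.loads(payload)
    for field_name, value in message.items():
        _, _, value_type = parse_field_name(field_name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, VALUE_TYPES[value_type]):
            raise TypeError(f"{field_name}: expected {value_type}, got {value!r}")
    return message


payload = '{"air-temperature|F|float": 76.1, "atm-pressure|mmHg|int": 760}'
print(validate_data_message(payload))
```

Because the type travels in the field name, the check needs no schema stored in the cloud, which matches the idea of defining endpoints on the device side.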

Payload format for Status Plane is also simple but less formal:

{
  "content": "<name of predefined status message content>",
  "data": <object with data specific to this content>,
  "session-id": "optional <id for command/status messages exchange session>",
  "res-topic": "optional <name of the topic to send a response>"
}

“session-id” and “res-topic” are two new parameters used for back-and-forth communication. For example, some content messages from the device will need a confirmation, or some commands from the cloud will need a response. The value of “res-topic” will be the standard Control topic, but we’ll be able to change that in the future.

An example of a Status Plane message payload used to publish Thing Model info to the cloud:

{
  "content": "update-model",
  "session-id": "2db0b7f9-729c-4e3d-8119-53c76524b602",
  "res-topic": "optional <name of the topic to send a response>",
  "data": {
    "measuringInterval|ms|int": 1800000,
    "attributes": {},
    "dataFieldNames": [ "air-temperature|F|float", "atm-pressure|mmHg|int"],
    "cmdFieldsByCommand": {
      "change-measuring-interval": ["measuring-interval|min|int"],
      "rollover-certificate": ["device-certificate", "private-key"]
    },
    "statusFieldsByType": {
      "update-model": ["attributes", "dataFieldNames", "cmdFieldsByCommand", "statusFieldsByType", "measuring-interval|min|int"]
    }
  }
}

Note that with this example we implicitly state that the device will be the source of truth for the Thing Model in the cloud. This is NOT typical for enterprise-level systems, where the Thing configuration is defined in the cloud and then pushed to the device. The reasons for this difference are DIY-specific needs and the limitations of our cloud UI:

enterprises typically have a number of identical devices deployed and benefit from defining the model once and pushing it to the devices. With DIY IoT, each of your devices will probably be unique (different firmware versions, different sensors, commands, etc.)
when a new DIY device, or the same one with new sensors, is provisioned, no changes on the cloud side are required
a proper definition of the Device (and overall IoT system) Model would require a much more complicated cloud backend and a UI able to handle a versioned ontology of the whole system.

However, this approach obviously makes the overall system more chaotic, which is not good even for DIY, so you need to keep track of what command/data format each device is using to reduce diversity and keep the system manageable. To synchronize potential firmware changes with the cloud backend, the device will send the “update-model” message on every change (or even on every boot) so the cloud backend can adapt (the same message can also be used as a device ‘heartbeat’).
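
A device could assemble the “update-model” message along these lines. This is a sketch under assumptions: the helper names are made up for illustration, only part of the model fields is filled in, and the example response topic is a hypothetical Control topic built from the naming convention.

```python
import json
import uuid


def build_status_message(content, data, res_topic=None, session_id=None):
    """Assemble a Status Plane payload; optional keys are left out when unset."""
    message = {"content": content, "data": data}
    if session_id is not None:
        message["session-id"] = session_id
    if res_topic is not None:
        message["res-topic"] = res_topic
    return message


def build_update_model(data_field_names, cmd_fields_by_command, res_topic):
    """Build the "update-model" message a device could publish on boot."""
    data = {
        "attributes": {},
        "dataFieldNames": list(data_field_names),
        "cmdFieldsByCommand": dict(cmd_fields_by_command),
    }
    # A fresh UUID lets the backend tie its confirmation to this message.
    return build_status_message("update-model", data, res_topic=res_topic,
                                session_id=str(uuid.uuid4()))


message = build_update_model(
    ["air-temperature|F|float", "atm-pressure|mmHg|int"],
    {"change-measuring-interval": ["measuring-interval|min|int"]},
    res_topic="cmd/diyiot/b001/loc001/diy/generic/thing-0001",
)
print(json.dumps(message, indent=2))
```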

Another important example of a Status Plane message payload is sending an Alarm to the backend. One of the design decisions was NOT to introduce a dedicated Plane for Alarms. This may be questionable, but given the Thing hardware choices (simple, inexpensive microcontrollers), data analysis and alarm generation on the edge (thing side) is not a main objective. With the Status Plane we can still send alarms without spending extra resources on the backend side. If the number of alarms generated by Things becomes large, a separate Alarm Plane may be introduced. For now we’ll just use this message payload:

{
  "content": "alarm",
  "data": {
    "ack|required|bool": true,
    ...
  },
  "session-id": "optional <alarm id to get ACK from the backend>",
  "res-topic": "optional <name of the topic to send ACK>"
}

Note that the only standard data field here is “ack|required|bool”, which indicates whether the Thing will wait for an Alarm ACKnowledgment.

The payload format for the Control Plane is very similar to the Status Plane, with the key “command” instead of “content”. The value of “res-topic” will be the standard Status topic, but we’ll be able to change that in the future.

{
  "command": "<name of predefined command>",
  "data": <object with data specific to this command>,
  "session-id": "optional <incoming command id this message responding to>",
  "res-topic": "optional <name of the topic to send a response>"
}
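
On the device side, handling a Control Plane message could look roughly like the sketch below. The handler table, the in-memory settings and the reply content names “command-done” and “command-failed” are assumptions made for illustration; the key name “res-topic” follows the text above. The reply is returned as a Status Plane message, since the device does not respond on the Control Plane.

```python
import json

# Hypothetical device state and handler, for illustration only.
settings = {"measuring-interval|min|int": 30}


def change_measuring_interval(data):
    settings["measuring-interval|min|int"] = int(data["measuring-interval|min|int"])
    return dict(settings)


HANDLERS = {"change-measuring-interval": change_measuring_interval}


def handle_command(payload, status_topic):
    """Run one command; return (topic, message) for the Status Plane reply."""
    command = json.loads(payload)
    name = command.get("command")
    handler = HANDLERS.get(name)
    if handler is None:
        # "command-failed" and "command-done" are assumed content names.
        reply = {"content": "command-failed", "data": {"command": name}}
    else:
        reply = {"content": "command-done",
                 "data": handler(command.get("data", {}))}
    if "session-id" in command:
        # Echo the session id so the backend can match reply and command.
        reply["session-id"] = command["session-id"]
    # Use the requested response topic, else the standard Status topic.
    return command.get("res-topic", status_topic), reply
```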

MQTT broadcast messages

Sometimes a message needs to be broadcast to all devices in your application. So, in addition to the Control Plane topic discussed above, all devices will also subscribe to the topic <plane_id>/<application> (where <plane_id> is “cmd” and <application> is “diyiot”).

The payload format for this topic is the same as for the standard Control Plane.
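
As a small sketch, the two Control Plane subscriptions of a device can be derived from the same identifiers. The function name and the example values are assumptions for illustration.

```python
def command_topics(building_id, location_id, thing_type, thing_name,
                   application="diyiot", thing_group="diy"):
    """Return the device-specific and broadcast Control Plane topics."""
    own = "/".join(["cmd", application, building_id, location_id,
                    thing_group, thing_type, thing_name])
    broadcast = f"cmd/{application}"
    return [own, broadcast]


print(command_topics("b001", "loc001", "generic", "thing-0001"))
# ['cmd/diyiot/b001/loc001/diy/generic/thing-0001', 'cmd/diyiot']
```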

AWS IoT Basic Ingest

To make our solution cost-effective we’ll use a special feature of AWS IoT Core, AWS IoT Basic Ingest, which per the AWS documentation:

Basic Ingest optimizes data flow for high volume data ingestion workloads by removing the pub/sub Message Broker from the ingestion path. As a result, you have a more cost-effective option to send device data to other AWS services while continuing to benefit from all the security and data processing features of AWS IoT Core.
In cases where devices do not require the publish and subscribe functionality of the Message Broker, Basic Ingest enables you to send data to cloud services through the Rules Engine.

We’ll be using Basic Ingest for our Data Plane so the final default topic name for our Data Plane will be:

$aws/rules/TelemetryInjectiondiyiot/dt/diyiot/<building_id>/<location_id>/diy/<thing_type>/<thing_name>

Note that this functionality can be turned on or off, and the Basic Ingest prefix updated, with a command from the cloud.
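
Since Basic Ingest only adds a prefix to the regular topic, the switch can be a one-line decision on the device. A minimal sketch, assuming a hypothetical helper name and a Data Plane topic built with the naming convention from the MQTT topics section:

```python
def with_basic_ingest(topic, rule_name, enabled=True):
    """Prefix a Data Plane topic with the Basic Ingest rule path when enabled."""
    if not enabled:
        return topic
    return f"$aws/rules/{rule_name}/{topic}"


topic = "dt/diyiot/b001/loc001/diy/generic/thing-0001"
print(with_basic_ingest(topic, "TelemetryInjectiondiyiot"))
# $aws/rules/TelemetryInjectiondiyiot/dt/diyiot/b001/loc001/diy/generic/thing-0001
print(with_basic_ingest(topic, "TelemetryInjectiondiyiot", enabled=False))
# dt/diyiot/b001/loc001/diy/generic/thing-0001
```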

Bootstrap topics and workflow

We need a few very special topics for bootstrapping (used in the “Provisioning by Claim” workflow). These topics have just one purpose: to support device provisioning.

With AWS we’ll use the “fleet provisioning” functionality, which has very specific topics (an AWS proprietary topic format) and message payloads.

Important note: the device will use a common ‘claim certificate’ when connecting to the host for provisioning. That ‘claim’ or ‘bootstrap’ certificate is generated by AWS IoT Core during project bootstrapping and can be delivered to the device either through the local API or hardcoded.

The very first topic we’ll publish to is:

$aws/certificates/create/json

We’ll publish a message with an empty payload to this topic and expect to receive back a message with the device certificate, private key and token as a JSON document on the topic $aws/certificates/create/json/accepted

{
    "certificateId": "string",
    "certificatePem": "string",
    "privateKey": "string",
    "certificateOwnershipToken": "string"
}

To correctly handle possible errors we also need to subscribe to

$aws/certificates/create/json/rejected

If for some reason our certificate request is rejected, we’ll receive a message on that topic with the payload

{
    "statusCode": int,
    "errorCode": "string",
    "errorMessage": "string"
}

After receiving the device certificate and key and storing them in NVS (non-volatile storage), we’ll use another topic to finally provision our device. First we’ll publish to the topic

$aws/provisioning-templates/<templateName>/provision/json

a message of the format

{
    "certificateOwnershipToken": <certificateOwnershipToken collected>,
    "parameters": {
        "appName": <name of our application - 'diyiot'>,
        "thingName": <unique name of our new thing>,
        "thingSerial": <our thing MAC address>,
        "thingGroup": <default one 'diy' or anything else>,
        "thingType": <provided at configuration process>,
        "buildingId": <provided at configuration process>,
        "locationId": <provided at configuration process>
    }
}

Parameters provided in the message payload will be used by our Lambda function (provisioning hook) to allow/disallow this thing registration (see details in the cloud section).

Then we’ll listen (subscribe) for messages on topic

$aws/provisioning-templates/<templateName>/provision/json/accepted

for confirmation of success in the form of

{
    "deviceConfiguration": {
        "string": "string",
        ...
    },
    "thingName": "string"
}

and we’ll also subscribe to the topic

$aws/provisioning-templates/<templateName>/provision/json/rejected

to handle possible provisioning errors in the format

{
    "statusCode": int,
    "errorCode": "string",
    "errorMessage": "string"
}

All of this may look complicated, but described as a sequence diagram it’s really simple:

“Provisioning by claim” sequence diagram without error handling
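
The exchange above can also be sketched in code. The class below is transport-agnostic: it assumes the MQTT client (already connected with the claim certificate and subscribed to the four reply topics) passes in a publish callable and forwards incoming messages to on_message. The class name, its attributes and this wiring are assumptions for illustration; only the topics and payload keys come from the description above.

```python
import json

CREATE_TOPIC = "$aws/certificates/create/json"


class FleetProvisioner:
    """Sketch of the claim-based provisioning exchange, without real I/O."""

    def __init__(self, publish, template_name, parameters):
        self.publish = publish  # callable(topic, payload) from the MQTT client
        self.provision_topic = (
            f"$aws/provisioning-templates/{template_name}/provision/json")
        self.parameters = parameters
        self.credentials = None  # set from the create/json/accepted reply
        self.thing_name = None   # set from the provision/json/accepted reply
        self.error = None        # set from any .../rejected reply

    def topics_to_subscribe(self):
        return [f"{base}/{suffix}"
                for base in (CREATE_TOPIC, self.provision_topic)
                for suffix in ("accepted", "rejected")]

    def start(self):
        # An empty payload requests a new device certificate and key.
        self.publish(CREATE_TOPIC, "")

    def on_message(self, topic, payload):
        message = json.loads(payload)
        if topic.endswith("/rejected"):
            self.error = message
        elif topic == f"{CREATE_TOPIC}/accepted":
            # Keep the certificate and key; a device would store them in NVS.
            self.credentials = {"certificatePem": message["certificatePem"],
                                "privateKey": message["privateKey"]}
            self.publish(self.provision_topic, json.dumps({
                "certificateOwnershipToken": message["certificateOwnershipToken"],
                "parameters": self.parameters,
            }))
        elif topic == f"{self.provision_topic}/accepted":
            self.thing_name = message["thingName"]
```

Keeping the flow separate from the MQTT client makes it easy to exercise with a fake publish function before trying it against AWS IoT Core.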

Descriptions of other options for thing registration with AWS IoT Core can be found here. However, note that some AWS documents have an error in the topic name (created instead of create). This reference can be used to verify topic names. Another important note: your MQTT client has to explicitly set client_id when connecting to IoT Core (otherwise the certificate may not be returned).

Last but not least, we need to define a <templateName> (and the template itself) for our provisioning process. To simplify our solution, all of our devices will use just one template, named after the project:

thingGroup = "diy"  # default thing group
templateName = f"{thingGroup}_template"
# for the default group this gives "diy_template"

This is not exactly typical (the template is expected to describe the thing model). But, as discussed above, we want the device to be the source of truth, so a status message will be used for model definition.

Final summary on device provisioning

The top-level device provisioning flow was covered in the previous section. Here we just summarize the flow with respect to the MQTT topics and “fleet provisioning” described above.

As described before, the device at start will check whether it has the required credentials (certificate and keys) and parameters (URLs, name, etc.) to start communicating with the cloud backend. If not, the device will start the provisioning process as described in the previous section, but with the MQTT details added this process becomes smarter:

If the device doesn’t have the required parameters or a ‘claim’ certificate, it’ll start a local web server with an API and an HTML form
If parameters and a ‘device’ certificate/key are provided, provisioning is complete and the device will send a status message with its model
If parameters and a ‘claim’ certificate are provided, the fleet provisioning MQTT exchange will start and the ‘device’ certificate/key will be collected from the backend. After that, provisioning is complete and the device will send a status message with its model

From the diagram below you can see that we end up with an extremely flexible provisioning process which covers most scenarios, from a ‘trusted app’ to fully automated fleet provisioning (if the claim certificate is added during software upload).

Smart provisioning process flow on device
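
The decision the device makes at start can be summarized as a small function. This is only a sketch of the three cases listed above; the function name and the returned step names are assumptions for illustration.

```python
def next_provisioning_step(has_parameters, has_device_cert, has_claim_cert):
    """Pick the next step from what is already stored on the device."""
    if has_parameters and has_device_cert:
        # Already provisioned: publish the Thing Model in a status message.
        return "send-model"
    if has_parameters and has_claim_cert:
        # Exchange the claim certificate for a device certificate over MQTT.
        return "fleet-provisioning"
    # Parameters or certificates are missing: ask for them locally.
    return "local-web-server"


print(next_provisioning_step(True, False, True))
# fleet-provisioning
```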

MTLS API

The general idea of MTLS API usage is:

Cloud sends a command with an MTLS API reference
Device communicates with the MTLS API to receive or send information (like sending collected historical data or receiving an OTA update)
Device confirms successful MTLS API communication over MQTT
Edge applications backend (chat API for example), where applications operate on behalf of the device (not on behalf of a user)

For this project the MTLS API will be partially a “future option”. The MTLS API implementation will be included in the Firmware and the Cloud Backend. An endpoint for fetching a device update (update size is limited to 6 MB) is also available. However, the implementation of particular command flows on the device is omitted for now.

Note that devices will be able to communicate with the MTLS API using the thing certificate as well as the bootstrap certificate. This may be helpful for ‘pre-provisioning’ firmware updates.

Previous topic: Your own Cloud-IoT DIY project. Part 2: On Premises.

Next topic: Your own Cloud-IoT DIY project. Part 4: Cloud backend and helper tools.