Case Study: The Most Important Thing Every IoT System Must Have
Avoid bankruptcy by installing this small module in your IoT device! Imagine what will happen if you have thousands of devices already installed and their firmware causes a system hang. Are you going to visit them one by one? Or?
The Long Story Short
The idea I’m going to share with you emerged as a solution to a real problem. We are a team of engineers and we love solving problems. Two years ago we started working on an IoT device for a project called ConnectedBin. The device is a battery-powered sensor that is retrofitted to garbage bins. It reports the fill level of the container, constantly checks for fire and bin tip-over, and pushes an alarm notification if either of these events happens. The data is visualized on a dashboard which garbage collection companies use to optimize their mileage.
The most important requirement for that device is that it be robust and vandal-proof. This means two things: it should be encapsulated and hard to detach from the bin, and its battery life should be at least 5 years. Once installed, it is very hard to remove. Everything was good, almost perfect, until we installed the first devices ‘in the field’. They worked for a few weeks and then stopped. ‘A few weeks’ was very far from the promised ‘5 years’. Initially, we thought that the batteries were exhausted, but we were wrong. The devices we disassembled were fully operational after a power-on reset. It was clear that all installed devices needed a reset at regular intervals. The only problem was that some of them were installed 260 km away from our office.
We started debugging and looking for the error in the source code that caused the problem. Everything was implemented according to best practices. The internal watchdog timer that was supposed to monitor the main loop was set up and did its job perfectly. The electronics and the PCB were designed following all the rules and standards for avoiding system crashes caused by electromagnetic interference.
The Problem
Finally, after a week, we found it! It was a really stupid bug: an overflow. It caused the devices to stop sending data after 45 days. For engineers: that is roughly the number of minutes that can be stored in a 16-bit unsigned integer (65,535 minutes is about 45.5 days; 65,535 seconds would be only about 18 hours). The problem was that after 45 days the device was fully operational, but its ability to send data was compromised. The most important feature of an IoT device is to send data, not to spin in a loop, right? The bug was fixed and the job was done, but from that moment on we were never sure that it would not happen again. Any future change in the source code could cause thousands of devices to stop connecting to the server, which would mean that they could never be updated with a fix.
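To make this class of bug concrete, here is a minimal sketch in C. It is not our actual firmware: the once-per-minute 16-bit counter, the function names and the 480-minute interval used below are all assumptions chosen for illustration. The buggy check computes an absolute deadline that can exceed what the counter is able to hold, so the device keeps running but never decides to transmit again. The wrap-safe check compares the elapsed time instead.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical uptime counter: incremented once per minute, 16 bits wide. */
#define MINUTES_PER_DAY 1440u

/* Whole days before a 16-bit minute counter wraps: 65536 / 1440 = 45. */
uint32_t counter_wrap_days(void) {
    return 65536u / MINUTES_PER_DAY;
}

/* Buggy (illustrative): the deadline is computed in a wider type, so it can
 * exceed 65535, a value the 16-bit counter 'now' can never reach. */
bool is_due_buggy(uint16_t now, uint16_t last, uint16_t interval) {
    uint32_t deadline = (uint32_t)last + interval;
    return now >= deadline;
}

/* Wrap-safe: the subtraction is reduced modulo 2^16 by the cast, so the
 * elapsed time is correct even after 'now' has wrapped past zero.
 * Assumes the check runs at least once per counter period. */
bool is_due_wrap_safe(uint16_t now, uint16_t last, uint16_t interval) {
    return (uint16_t)(now - last) >= interval;
}
```

Assume the last transmission happened at minute 65500 and the interval is 480 minutes. The counter then wraps, and at minute 444 the wrap-safe check reports that a transmission is due, while the buggy check never does, for any value of the counter.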
Until now, the only remedy for a problem of that kind has been to fix thousands of devices by hand.
The Solution
The solution looks quite simple: we need someone or something to restart our devices periodically, or whenever there has been no successful transmission for longer than expected. For always-online systems this task is a piece of cake. However, for devices that are in deep sleep most of the time and transmit data 2-3 times a day, it becomes quite difficult. What’s more, it was not a good idea to implement this monitoring in the main microcontroller, since the monitor could be compromised by the very firmware it is supposed to guard.
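The monitoring idea above can be sketched in a few lines of C. This is a minimal, hardware-agnostic sketch and not the firmware of any real watchdog: the names ext_wdt_t, ext_wdt_init, ext_wdt_kick and ext_wdt_tick, as well as the tick unit, are assumptions made for illustration. The monitor counts ticks from a slow timer, the main microcontroller kicks it only after a successful transmission, and if the window expires without a kick, the monitor requests a reset.

```c
#include <stdint.h>
#include <stdbool.h>

/* State of a hypothetical external watchdog with a long window. */
typedef struct {
    uint32_t timeout_ticks;  /* window length in ticks (unit is an assumption) */
    uint32_t elapsed_ticks;  /* ticks since the last kick or reset request */
} ext_wdt_t;

void ext_wdt_init(ext_wdt_t *w, uint32_t timeout_ticks) {
    w->timeout_ticks = timeout_ticks;
    w->elapsed_ticks = 0;
}

/* Called when the main microcontroller signals a successful transmission. */
void ext_wdt_kick(ext_wdt_t *w) {
    w->elapsed_ticks = 0;
}

/* Called once per timer tick. Returns true when the window has expired
 * and the reset line of the main microcontroller should be pulsed. */
bool ext_wdt_tick(ext_wdt_t *w) {
    w->elapsed_ticks++;
    if (w->elapsed_ticks >= w->timeout_ticks) {
        w->elapsed_ticks = 0;  /* start a fresh window after the reset */
        return true;
    }
    return false;
}
```

With a tick of one minute and a timeout of 1440 ticks, for example, a device that normally transmits 2-3 times a day would be reset after 24 hours of silence. Kicking only after a confirmed transmission, rather than from the main loop, is the design choice that lets the monitor catch a device that is still running but no longer sending.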
The final solution we chose was an external watchdog timer (WDT) with a very long window (a few hours) and a power consumption of less than 1 µA. Unfortunately, the watchdog timers already on the market with such low power consumption had a maximum time interval of 2 hours. This is why we developed a custom watchdog timer based on a 6-pin microcontroller. For convenience, we also developed a small breadboard-friendly PCB that can be retrofitted into existing devices.
What’s Our Goal
IoT looks easy, but only to those who are not deep in the business. We know that a lot of people struggle and need simple solutions like the one described above. Our ultimate goal is to share this solution with as many people as possible, so that we can reach a volume big enough to design a custom chip.