Introduction
If the system fails and freezes (hereafter, “hangs up”), it must be rebooted or otherwise reset.
Resetting it physically can be difficult: the failure may occur at a time when nobody can respond, the system may be installed where people cannot easily reach it, or a shortage of personnel may make it hard to find the right person to handle it.
And because the system has stopped, operating it remotely is also difficult.
This is where the watchdog (WD) mechanism comes in.
The WD’s role is simple: monitoring.
It periodically checks that the system is running, and if the system stops working properly, it resets the system.
This time, I set up WD on the industrial Raspi PL-R4.
I have summarized as clearly as possible the differences between an external WD, which is a separate component, and an internal WD, which is built into the microprocessor chip.
Because an external WD can reliably reset the device, it is one of the functions required in industrial applications.
Watchdog is a periodic monitoring function
WD is also known as WDT (watchdog timer). The word “watchdog” literally means a guard dog and, more generally, a watcher or monitor.
In other words, it is a function (mechanism) that periodically monitors a specific target, keeping watch like a guard dog.
What the watchdog timer is watching for is whether the system is working properly.
There are two types of WD: one is mounted externally as an IC component, and the other is built into the microprocessor chip.
In most common PCs today, the WD is built into the product. In the old days of microcomputers (PC-98, 88), a reset button was exposed on the main unit, and a person had to press it to restore the system when something went wrong.
Advantages of External WD
The commercial Raspberry Pi has a WD built into the microprocessor chip, and it is controlled through a configuration file.
The industrial Raspi “PL-R4”, on the other hand, implements WD as an external component, and its WD function is enabled or disabled with DIP switches.
The biggest advantage of external placement is that the WD operates reliably as long as the power is on.
An internal WD sits inside the microprocessor chip, so if the chip itself stops, the WD count running on it stops too, and the WD can no longer function.
In that case the system is not reset, and a person has to go to the device itself to resolve the problem.
In industrial applications, where some devices are installed in places that are not easily accessible, an external WD is necessary to avoid this situation.
An external WD can be added to a commercial Raspberry Pi with an expansion HAT, but mounting one on the main board is difficult because of board size constraints and increased cost.
Because the PL-R4 is an industrial Raspi, its WD is mounted externally, providing a nearly hang-up-free environment.
Role and function of the watchdog
It is somewhat difficult to describe exactly what WD does, but in essence it simply measures a fixed interval with a counter (timer).
To use a rough analogy, suppose you knock on a door and normally get a response after one second. You knock regularly, but at some point there is still no response after 2 or 3 seconds. After 5 seconds, you judge the situation to be abnormal and have the door opened (a restart).
As the name WDT (watchdog timer) suggests, WD always counts at a fixed interval, while the system repeatedly resets that count.
This is the same as checking on a person’s safety: you visit at regular intervals and ask, “How are you?”
Usually the person answers, but today, no matter how many minutes you wait, he or she doesn’t answer. That’s worrying, isn’t it?
Judging such an unusual condition to be an abnormal situation is the role and function of WD.
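The counting-and-resetting behavior described above can be sketched as a small software model. This is only an illustration of the idea, not the PL-R4 hardware: the class name, the 5-second timeout, and the time values are all assumptions chosen to mirror the door-knocking analogy.

```python
class WatchdogModel:
    """Toy software model of a watchdog counter (not the PL-R4 hardware)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # hypothetical timeout for illustration
        self.last_kick = 0.0         # time of the most recent counter reset

    def kick(self, now):
        """The monitored system reports in, so the count starts over."""
        self.last_kick = now

    def expired(self, now):
        """True once no kick has arrived within the timeout."""
        return (now - self.last_kick) >= self.timeout_s


wd = WatchdogModel(timeout_s=5.0)
for t in (1.0, 2.0, 3.0):    # the system answers every second
    wd.kick(t)
print(wd.expired(4.0))   # False: the last answer was 1 s ago
print(wd.expired(8.0))   # True: 5 s of silence, so a reset would be triggered
```

As long as the system keeps calling kick(), expired() stays False. Once the calls stop, the timeout elapses and a real watchdog would reset the system.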
WD Timer Datasheet
The timer values of the external WD in the industrial Raspi “PL-R4” are as follows.
① is the time the WD waits after the system boots. As the ‘60 180’ indicates, there is some variation, ranging from 1 to 3 minutes.
In practice, it takes about 2 minutes (120 seconds) for WD to start working.
The ‘1 3’ in part ② is the number of seconds the reset takes. Again, there is some variation, ranging from 1 to 3 seconds.
In practice, the system reset takes about 2 seconds.
To avoid edge cases and ensure reliable operation, the program should assume the shortest values: less than 60 seconds and less than 1 second, respectively.
The program is responsible for keeping the system running normally by repeatedly resetting the WD counter at an interval shorter than the time after which the WD resets the system.
If the WD count is not reset because of some anomaly, the system is reset (rebooted).
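The idea of designing against the shortest value in each range can be written down as a small check. This is a sketch under assumptions: the two ranges are the figures quoted above, while the function names and the planned times (20 s, 90 s, 0.5 s) are hypothetical values chosen only to show the comparison.

```python
# Ranges quoted from the datasheet values described above (seconds).
BOOT_WAIT_RANGE = (60, 180)   # (1) wait after boot before WD starts working
RESET_TIME_RANGE = (1, 3)     # (2) reset time


def worst_case(value_range):
    """Design against the shortest time the part may exhibit."""
    return min(value_range)


def within_budget(planned_s, value_range):
    """True if the planned time is strictly shorter than the worst case."""
    return planned_s < worst_case(value_range)


# Hypothetical plans, for illustration only.
print(within_budget(20, BOOT_WAIT_RANGE))    # True
print(within_budget(90, BOOT_WAIT_RANGE))    # False: too slow for a 60 s part
print(within_budget(0.5, RESET_TIME_RANGE))  # True
```

Budgeting against the minimum rather than the typical value means the program still works on a unit that happens to sit at the fast end of the range.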
WD settings (for PL-R4)
The following configuration method is specific to the “PL-R4” and differs from that of the commercially available Raspberry Pi.
The WD that comes preinstalled with Raspberry Pi OS is a software watchdog in the microprocessor chip. It is not directly involved here.
For the “PL-R4”, the following steps are required:
- Switching DIP switches
- Edit config.txt
- Run your own program
Switching DIP switches
Enable the WD function by turning the first of the two dip switches on the board to the ON side, respectively.
Specific locations are shown in the photos.
Switching the DIP switch alone does not enable WD.
You also need to modify config.txt and set up your own Python program to run automatically.
Edit config.txt
If the WD/CSI entry in the [all] section of config.txt is commented out, uncomment it to enable it. WD and the CSI port used by the camera module are mutually exclusive, so only one of them can be used at a time.
sudo nano /boot/config.txt
Make the following changes.
Check that the i2c1 lines under “#WD/CSI select” are enabled, that is, not commented out.
[all]
dtparam=i2c0_baudrate=400000
dtoverlay=i2c0,pins_0_1
#WD/CSI select
dtparam=i2c1_baudrate=400000
dtoverlay=i2c1,pins_44_45
#dtoverlay=imx708
This is generally the reverse of the setting used when the CSI port is used for the camera module.
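Checking by eye whether a line is commented out is easy to get wrong, so the check can be sketched as a small script. This is a minimal sketch, not a full config.txt parser: the function name is hypothetical, the required entries are the two i2c1 lines shown above, and the sample text is copied from the listing rather than read from a real file.

```python
def active_lines(config_text, section="all"):
    """Return the uncommented lines found under [section]."""
    found = []
    current = "all"   # lines before any header are treated as applying to all
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # entering a new section
            continue
        if current == section:
            found.append(line)
    return found


REQUIRED = ["dtparam=i2c1_baudrate=400000", "dtoverlay=i2c1,pins_44_45"]

sample = """[all]
dtparam=i2c0_baudrate=400000
dtoverlay=i2c0,pins_0_1
#WD/CSI select
dtparam=i2c1_baudrate=400000
dtoverlay=i2c1,pins_44_45
#dtoverlay=imx708
"""

lines = active_lines(sample)
missing = [entry for entry in REQUIRED if entry not in lines]
print(missing)   # an empty list means both WD lines are active
```

To check a real device, the sample string could be replaced with the contents of /boot/config.txt read from disk.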
Program Example
The last thing needed is a program that periodically resets the external WD.
As described earlier, the WD resets the system when its timer count is not cleared within a certain time, so the program has to clear the count regularly.
This test program uses GPIOs 44 and 45 to reset WD periodically.
Python program example:
import time
import RPi.GPIO as GPIO
WDEPIN = 44  # GPIO 44
WDRPIN = 45  # GPIO 45
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BCM)  # use BCM (GPIO) numbering
GPIO.setup(WDEPIN, GPIO.OUT)
GPIO.setup(WDRPIN, GPIO.OUT)
# Drive WDRPIN high for 1 s, then low for 1 s, forever, to keep resetting the WD.
while True:
    GPIO.output(WDEPIN, False)  # hold WDEPIN low
    GPIO.output(WDRPIN, True)
    time.sleep(1.0)
    GPIO.output(WDRPIN, False)
    time.sleep(1.0)
In practice, you will need to incorporate this into your own application.
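One possible way to fold the reset into an application is to pulse the line only while the application itself is making progress, so that a stalled application stops feeding the WD. The sketch below is an assumption about how that could be structured, not the PL-R4 vendor's method: the class, its parameters, and the 3-second threshold are hypothetical, and a fake pin and fake clock stand in for real GPIO so the logic can be tried on any machine. The 1-second high and low timing of the test program is left out for brevity.

```python
import time


class WatchdogFeeder:
    """Pulse a reset line only while the application reports progress."""

    def __init__(self, write_pin, max_silence_s, clock=time.monotonic):
        self.write_pin = write_pin          # callable(level); hypothetical hook
        self.max_silence_s = max_silence_s  # hypothetical health threshold
        self.clock = clock
        self.last_progress = clock()

    def report_progress(self):
        """Called by the application each time real work completes."""
        self.last_progress = self.clock()

    def feed(self):
        """Pulse the line if the application is healthy; return whether it did."""
        if self.clock() - self.last_progress > self.max_silence_s:
            return False                    # stay silent so the WD can reset us
        self.write_pin(True)
        self.write_pin(False)
        return True


# Demonstration with a fake pin and a fake clock instead of real GPIO.
levels, now = [], [0.0]
feeder = WatchdogFeeder(levels.append, max_silence_s=3.0, clock=lambda: now[0])
now[0] = 1.0
print(feeder.feed())    # True: progress was reported 1 s ago
now[0] = 10.0
print(feeder.feed())    # False: the application has been silent too long
print(levels)           # [True, False]
```

On the device, write_pin could be something like lambda level: GPIO.output(WDRPIN, level), with the application calling report_progress() from its main loop.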
For testing purposes, save the above program as wdrst.py and run it at startup by adding it to /etc/rc.local.
Conversely, if you do not use WD, comment out or delete the entry in rc.local so that the program does not start automatically.
Addition to rc.local:
#WD
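python /home/pi/wdrst.py &
exit 0
I won’t go into detail here, but rc.local is one way to run a program automatically at startup on the Raspberry Pi.
Any entry added to rc.local must be written before the exit 0 line, which is present from the beginning. Because wdrst.py loops forever, the line ends with & so that the program runs in the background and rc.local itself can finish.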
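The rule that the entry must come before exit 0 can be expressed as a small text-editing helper. This is a sketch that works on a string, not on the real /etc/rc.local: the function name and the sample file contents are hypothetical, and the trailing & on the command is an addition made here so the never-ending loop does not block rc.local.

```python
def add_before_exit(rc_text, command, marker="exit 0"):
    """Insert command just before the last 'exit 0' line, once only."""
    lines = rc_text.splitlines()
    if command in lines:
        return rc_text                       # already present, nothing to do
    for i in range(len(lines) - 1, -1, -1):  # search from the bottom up
        if lines[i].strip() == marker:
            lines.insert(i, command)
            break
    else:
        lines.append(command)                # no marker found: append at the end
    return "\n".join(lines) + "\n"


original = "#!/bin/sh -e\n# rc.local\nexit 0\n"
updated = add_before_exit(original, "python /home/pi/wdrst.py &")
print(updated)
```

Running the helper twice on its own output leaves the text unchanged, which avoids starting the program twice at boot.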
After a reboot, the WD function is enabled.
Let’s check the operation right away.
Confirmation of WD operation
One way to check if WD works after a reboot is to actually hang the system to see if it reboots automatically.
To artificially hang the system, use a command that runs a fork bomb or one that causes a kernel panic.
Fork bomb
A fork bomb makes processes replicate one after another until the total number of processes becomes enormous, leaving no room for other processes to run.
You will notice the entire system slowing down from the moment you run it.
If you are connected via SSH, even the SSH session stops responding immediately.
Fork bomb shell command:
【Caution】Do not execute if WD has not been successfully enabled.
:(){ :|:& };:
In this environment, the system rebooted 10 to 15 seconds after execution, so the test succeeded.
Before the reboot, the following messages were displayed in the terminal.
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Cannot allocate memory
-bash: fork: retry: Resource temporarily unavailable
(continues)
The system was rebooted because resource exhaustion left other processes hung or unable to run, so the WD count was no longer reset and exceeded its limit.
Kernel panic
A kernel panic can also reproduce a situation in which the system has stopped because of some anomaly.
Kernel panics, which may be unfamiliar to Windows users, occur when the kernel encounters a fatal internal error.
I have also encountered this a couple of times with the Raspberry Pi 3 and 4.
A kernel panic can be caused by either software or hardware. On the Raspberry Pi, I have experienced it because of a corrupted microSD card, which can result from excessive rewriting. There is no fixed number of occurrences to expect, but with bad luck I have run into it as many as 3 or 4 times.
The following commands can only be executed with root privileges, so use sudo su to temporarily become the superuser and execute them.
A command that causes a kernel panic:
【Caution】In some cases, execution may lead to system damage. Please execute in a test environment.
sudo su
echo c > /proc/sysrq-trigger
In both cases, we were able to confirm that the system reboots after roughly the set number of seconds.
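After such a test, one simple way to confirm from software that a reboot really happened is to look at how long the system has been up. The sketch below parses text in the format of /proc/uptime, whose first field is the number of seconds since boot. The function names, the sample string, and the 300-second window are assumptions for illustration.

```python
def parse_uptime(proc_uptime_text):
    """Return seconds since boot from the contents of /proc/uptime."""
    return float(proc_uptime_text.split()[0])


def rebooted_recently(proc_uptime_text, window_s):
    """True if the system booted within the last window_s seconds."""
    return parse_uptime(proc_uptime_text) < window_s


sample = "42.17 120.55\n"   # made-up sample in the /proc/uptime format
print(parse_uptime(sample))             # 42.17
print(rebooted_recently(sample, 300))   # True
```

On the device, the sample string would be replaced with the text read from /proc/uptime once SSH access comes back.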
If the WD is not functioning, the system remains hung, and the only way to restart it is for a person to physically disconnect and reconnect the power.
Moreover, if nobody notices the moment a problem occurs, recovery starts late and the system stays out of operation for some time.
An automatic reboot provides at least a minimum level of recovery.
Selecting the right device for the environment in which you operate
WD monitors the system to make sure it is working properly, just like a watchdog.
Most things, including a WD, can be tried on a commercial Raspberry Pi.
However, using the Raspberry Pi in industrial applications requires a more robust system.
This is because installation environments vary, from locations with high temperatures to locations that are not easily accessible to people.
While verifying the WD operation, I felt that if you are looking for a system that guards against sudden problems, an externally implemented WD is a sure bet.
This is one of the reasons the PL-R4, unlike the commercially available Raspberry Pi, is specialized for industrial applications.
Although the method differs from the one described here, the internal WD can also be used on a commercial Raspberry Pi.
With an internal WD, however, if the microprocessor chip itself stops, the WD stops too, which can leave the system unable to restart.
If you try it, you may find that it is somewhat less dependable.
In the end, an external WD placed outside the system is the more reliable mechanism. For a hang-up-free system, the industrial Raspi is a reliable choice.
Article contributed by Raspida
raspida.com is a Raspberry Pi information site that even non-engineers can enjoy and use. He contributes articles about the Raspberry Pi for industrial use to the PiLink site.