CANopen safeguards to decrease the effect of residual errors further

Data transfer over CANopen networks is extremely reliable, but CANopen also offers additional safeguards that further decrease the effect of residual errors.

In many applications, complex safety add-ons have been designed on top of existing control systems, which has led to significantly increased complexity and costs. The most significant increase of the performance level can be achieved by replacing all analog signal paths with digital communication: as long as an error can be detected and a control system can perform a reliable reaction, the error cannot cause any harm. When typical failures of systems are analyzed, it is obvious that analog signal paths are the weakest point of any control system.

Due to the fact that safety standards cover the entire life cycle of target systems, the design of modern control systems requires not only new technological solutions, but also an updated mindset. In addition to the design of theoretically safe systems, production and service processes must be updated in order to be able to reach the designed performance level (PL) and maintain it throughout the systems’ life cycle, as required by the standards.

This article explains in detail the application-level safeguards that are briefly listed, and reviews fundamental CANopen services that decrease the effect of the protocol-level residual error probability without introducing any additional life cycle costs. A review makes sense because most of the presented concepts have not only been standardized but are already implemented in most of the devices on the market and are just waiting to be used.

Device profiles

Traditionally, each system integrator implements their own low-level controls and management of analog sensors and actuators into PLCs in addition to the application logic. This approach leads to some major drawbacks:

Proprietary software components are not well specified, because each company uses their own. Using software components only internally gives users a false sense of “flexibility”, leading to unmanaged customizations from project to project. In the end, there are numerous components that are nearly identical, each of which must be tested and certified individually.
Often, a proprietary software component is optimized for a single company or even a single department or a single application and for current applications only. Usually, such components are maintained reactively when they are close to becoming obsolete and are thus continuously under development.
The quality assurance of a software component requires a lot of testing and typically some kind of certification(s). If each company tests similar components on their own, their work overlaps and they have to test a large number of items.

CANopen device profiles in general define a generic architecture of device categories and thus offer a common set of basic I/O, measurement, and drive functions. From a safety point of view, the most important consequence of relying on device profile conformant devices is that each basic function has been designed, implemented, tested, and certified once by the device vendor.

When a standardized, CANopen-compliant component is used, there is always a globally harmonized system integration interface, enabling the intrinsic re-usability of basic functions without a need for project-specific re-testing and re-certification.
Many companies are involved in the development and maintenance of each CANopen device profile. In contrast to proprietary functions, device profiles are more generic and provide a solid basis for various applications, also considering future developments.
By using standardized functions and components, the device vendor performs unit tests for a single device, the costs of which will be shared by all customers. In the long run, the quality of the functions can be improved faster because each device in each application provides test results to the device vendor.

Figure 1: Controlling a CANopen drive with two applications and signals; signal numbers refer to the text

NMT state-machine

Analog sensors and actuators start full operation immediately after power-up, without any self-tests and consistency check procedures. In case of a fatal internal failure, there is no service available that informs the rest of the system about the internal initial condition or prevents a faulty operation. The NMT state-machine provides a basic mechanism for safe and manageable behavior. During start-up, each device enters into a pre-operational state, in which the system structure can be checked and optionally correct parameter values can be set. In case of a fatal internal failure, each CANopen device can automatically enter into a defined – preferably stopped – state in order to minimize further failures on the system level.
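The start-up and failure behavior described above can be sketched as a small state machine. This is a simplified illustration, not a complete NMT implementation: the class and method names are hypothetical, while the state and command values follow the commonly used CANopen encoding (0x7F pre-operational, 0x05 operational, 0x04 stopped; commands 0x01 start, 0x02 stop, 0x80 enter pre-operational).

```python
from enum import Enum


class NmtState(Enum):
    INITIALISING = 0x00
    STOPPED = 0x04
    OPERATIONAL = 0x05
    PRE_OPERATIONAL = 0x7F


# NMT command specifiers sent by the NMT master
CMD_START = 0x01
CMD_STOP = 0x02
CMD_ENTER_PRE_OPERATIONAL = 0x80


class NmtNode:
    """Hypothetical, simplified model of a device's NMT state machine."""

    def __init__(self):
        self.state = NmtState.INITIALISING

    def boot(self):
        # After initialisation the device waits in pre-operational,
        # where the system structure and parameters can be checked.
        self.state = NmtState.PRE_OPERATIONAL

    def command(self, cmd):
        if self.state is NmtState.INITIALISING:
            return  # commands are ignored until boot-up has finished
        transitions = {
            CMD_START: NmtState.OPERATIONAL,
            CMD_STOP: NmtState.STOPPED,
            CMD_ENTER_PRE_OPERATIONAL: NmtState.PRE_OPERATIONAL,
        }
        # Unknown commands leave the state unchanged.
        self.state = transitions.get(cmd, self.state)

    def fatal_error(self):
        # Assumption: the device is configured to stop on a fatal failure.
        self.state = NmtState.STOPPED
```

The point of the sketch is that full operation is never entered implicitly: the device only becomes operational after an explicit start command, and a fatal failure moves it to a defined state.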

Device state-machines

Analog sensors and actuators are stateless, which enables the transmitting or receiving of continuous-time signals only. Thus, analog actuators cannot provide any local safeguards against e.g. cabling failures, which are common and typically lead to wrong behavior. Furthermore, analog actuators cannot enter a stable error state and safely go back to normal operation when requested.

Some device profiles, e.g. for I/O and measurement devices, have a simple fault mode for outputs, enabling the use of a safe value when set-point signals are not updated. Drive profiles for electric and hydraulic drives have comprehensive device state-machines, which are controlled by an additional signal pair. The state machine controls the operation of the entire drive. The use of a state-machine decreases the significance of a single communication failure or a communication failure in a single signal. The major benefit of the device state-machine is that error recovery can be accurately controlled and accidental recoveries can be avoided.

Another benefit of the device state-machine is that, together with the primary set-point signal, the additional state control and status signals provide dual-channel control for drives. The most common safe state is “stopped”, which can be triggered in two ways: First, the main control application can set the set-point value (2) to neutral or the control word to a different value than “device mode active”. Second, the monitoring application can force the control word value (1) to a different value than “device mode active”.

It has been shown that, in the optimum case, the main control application sends the set-point(s) directly to the drive (2) in order to minimize the control path latency. The control word (1) may be routed through the monitoring application, because it is used only in the initialization and recovery phases. It is a question of the dependability of the controller devices whether single or dual applications and/or PLCs are needed. The status word and actual values (3) may be used by both applications.
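The dual-channel arrangement can be illustrated with a minimal sketch. The numeric control word values and the function names below are assumptions made for illustration only; the actual encodings are defined by the respective drive profile.

```python
DEVICE_MODE_ACTIVE = 0x0F  # hypothetical control word value for "device mode active"
DEVICE_DISABLED = 0x00     # hypothetical; any other value than active stops the drive
NEUTRAL = 0                # neutral set-point, i.e. no motion requested


def monitored_control_word(requested, system_ok):
    """Channel (1): the monitoring application passes the control word
    through, or forces it away from 'device mode active'."""
    return requested if system_ok else DEVICE_DISABLED


def drive_output(control_word, set_point):
    """Drive side: the set-point (2) is followed only while the device
    state machine is in 'device mode active'."""
    if control_word != DEVICE_MODE_ACTIVE:
        return NEUTRAL
    return set_point
```

Either channel alone is sufficient to reach the stopped state: a neutral set-point from the main control application, or a control word forced by the monitoring application.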

Actuators to drives

Traditional analog actuators are just actuating components, typically without any internal intelligence and sensing. Additional analog sensors may be installed into analog actuators, but they are not used internally. Thus, such sensors are considered additional sensors, increasing the number of components and amount of cabling.

Modern drives can have internal measurements and control loops. Bi-directional communication interfaces make it easy to access the internal signals. Actual values of drives (S) may be used as a redundant and diverse feedback for controlled axes, enabling the plausibility checking of the primary axis sensor value (P) instead of additional sensors. The system complexity is not increased, because neither additional components nor additional cabling are required.
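A plausibility check between the primary axis sensor value (P) and the actual value of the drive (S) might look like the following sketch. The class name, the tolerance, and the mismatch counter are illustrative assumptions; the fault is latched so that recovery requires an explicit reset, in line with the principle of avoiding accidental recoveries.

```python
class PlausibilityMonitor:
    """Hypothetical check of a primary value (P) against a redundant value (S)."""

    def __init__(self, tolerance, max_mismatches):
        self.tolerance = tolerance            # largest accepted |P - S|
        self.max_mismatches = max_mismatches  # consecutive mismatches before a fault
        self.mismatches = 0
        self.faulted = False

    def update(self, primary, secondary):
        """Return True while the pair of values is considered plausible."""
        if abs(primary - secondary) <= self.tolerance:
            self.mismatches = 0
        else:
            self.mismatches += 1
            if self.mismatches >= self.max_mismatches:
                self.faulted = True  # latched until reset() is called
        return not self.faulted

    def reset(self):
        """Explicit, controlled recovery."""
        self.mismatches = 0
        self.faulted = False
```

Tolerating a small number of consecutive mismatches is one way to cope with the two values being sampled at slightly different times; suitable limits depend on the application.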

Membership monitoring

Analog sensors and actuators cannot provide identification. As a consequence, unintentional or unmanaged device installations or changes cannot be detected by the rest of the system, which may lead to degraded performance or even dangerous misbehavior of the entire system. Because CANopen is an integration framework and not only a set of protocol services, it includes comprehensive diagnostic services. Managed network start-up, in conjunction with the NMT state machine, provides a simple, efficient, and standardized mechanism for a detailed check of the system structure and device identification before full operation during system power-up. Optional checks may cover the full functionality of each device, including the application software version and device configurations. After the start-up phase, lightweight on-line membership monitoring provides continuous monitoring of structural changes. It enables the state monitoring of each device, which can also be used as an information source for the monitoring of received signals. Heartbeat is a point-to-multipoint protocol, enabling system consistency monitoring by an unlimited number of devices.
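On-line membership monitoring with heartbeat can be sketched as a simple consumer. The class and method names are hypothetical; the sketch relies on the heartbeat message using the identifier 0x700 plus the node-ID and carrying the producer's NMT state in its first data byte.

```python
from typing import Dict, List


class HeartbeatConsumer:
    """Hypothetical heartbeat consumer watching a set of expected nodes."""

    def __init__(self, consumer_times: Dict[int, float]):
        self.consumer_times = consumer_times   # node-ID -> allowed silence in seconds
        self.last_seen: Dict[int, float] = {}  # node-ID -> time of last heartbeat
        self.states: Dict[int, int] = {}       # node-ID -> last reported NMT state

    def on_frame(self, cob_id: int, data: bytes, now: float) -> None:
        node_id = cob_id - 0x700
        if node_id not in self.consumer_times or not data:
            return  # not a heartbeat of a monitored node
        self.last_seen[node_id] = now
        self.states[node_id] = data[0] & 0x7F

    def missing(self, now: float) -> List[int]:
        """Node-IDs that have never been seen or have been silent too long."""
        return sorted(
            node_id
            for node_id, limit in self.consumer_times.items()
            if node_id not in self.last_seen
            or now - self.last_seen[node_id] > limit
        )
```

Because every device on the network may run such a consumer, any number of devices can monitor the same producers without additional bus load.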

Figure 2: Redundant and diverse feedback may be available from the CANopen drives; signal letters refer to the text

PDO mapping

Analog signals are sensitive to deviations. Therefore, it is impossible to connect a single analog sensor to multiple input devices without additional active components. Thus, entirely parallel sensor channels are commonly used for monitoring.

PDO mapping is generally understood as a signal routing function only, but it may also be utilized for decreasing the residual error probability of the CANopen communication. In the case of PLCs, there are most probably free objects left in the process image and RPDOs not used for the control application signals. Signals to and from control applications of other PLCs may be mapped from PDOs into the local object dictionary of a PLC, which makes the PLC monitor the structure of those PDOs. This way, more PLCs are able to perform PDO message length checking and increase the spatial coverage of the potential residual errors passing the CAN layer 2 consistency check. The main constraint of this kind of RPDO monitoring is that only PDOs that are too short can be detected, not ones that are too long.
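The length check performed by a receiving device can be illustrated as follows. The function names and the example mapping are assumptions; the sketch only models the constraint stated above, namely that a PDO shorter than its mapping is detected while a longer one is not.

```python
from typing import List


def expected_length(mapped_bits: List[int]) -> int:
    """Number of data bytes required by an RPDO mapping,
    given the bit length of each mapped object."""
    return (sum(mapped_bits) + 7) // 8


def rpdo_length_ok(data: bytes, mapped_bits: List[int]) -> bool:
    """True if the received PDO carries at least the mapped number of bytes.
    Surplus bytes are not detected by this check."""
    return len(data) >= expected_length(mapped_bits)
```

For example, a mapping of two 16-bit objects and one 8-bit object requires five data bytes, so a four-byte PDO is rejected while an eight-byte PDO passes unnoticed.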

Signal validity

In traditional instrumentation, only primary signals are used without redundant services providing validity information of the primary signals or consistency of the system structure. Analog sensors and actuators cannot be identified and thus it is not possible to verify the structure and configuration. Using two analog sensors in parallel may enable the identification of a failure, but not necessarily in which sensor the failure is located. A third one is needed to enable the detection of a single failing sensor, but still it remains unclear whether the failure is in the sensor or cabling.

Membership monitoring provides a basic level monitoring of signal producers. Faster and more detailed validity monitoring of received signals can be based on RPDO timeout monitoring. Combining the information from membership monitoring and RPDO monitoring enables the identification of not only the error type, but also of the error location. If a signal plausibility checking is required, a CANopen design process can provide the necessary information. Additionally, most CANopen devices contain comprehensive self-monitoring functions and detected local failures are reported to the rest of the system by the emergency protocol.
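How the two monitoring results might be combined is sketched below. The diagnosis categories and their interpretation are illustrative assumptions, not definitions taken from the CANopen specifications.

```python
from enum import Enum
from typing import Optional


class Diagnosis(Enum):
    OK = "signal valid"
    SIGNAL_LOST = "producer alive, but its PDO is missing"
    NODE_LOST = "producer or its network connection lost"
    HEARTBEAT_LOST = "PDO received, but heartbeat missing"


def rpdo_fresh(last_rx: Optional[float], now: float, timeout: float) -> bool:
    """RPDO timeout monitoring: the signal is valid only if it was updated recently."""
    return last_rx is not None and now - last_rx <= timeout


def diagnose(heartbeat_ok: bool, pdo_ok: bool) -> Diagnosis:
    """Combine membership monitoring with RPDO timeout monitoring."""
    if heartbeat_ok and pdo_ok:
        return Diagnosis.OK
    if heartbeat_ok:
        return Diagnosis.SIGNAL_LOST
    if pdo_ok:
        return Diagnosis.HEARTBEAT_LOST
    return Diagnosis.NODE_LOST
```

With analog instrumentation, all of these cases would appear as the same symptom, a missing or implausible signal.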

Configuration management

Parameterization is a tool to reduce the number of different product items and increase re-use by adapting standard products to various system locations. A common misconception in the industry is that there are no configurable parameters in analog sensors and actuators. There are typically vendor- and device-specific mechanisms for adjusting calibration, filtering, etc. in sensors. Such services are often for the vendor’s use only, limiting the usage. In actuators, the case is totally different, especially in hydraulic valves. There are e.g. plenty of slightly different main spools, various springs with different spring forces or spring force adjustments with washers, pressure compensators, load-pins, and optional valve elements for protection purposes.

Some of those parameters, which have traditionally been configured by changing the spools and springs, may currently be adjusted by changing the parameter values of the internal valve controller. Pure mechanical and hydraulic options, which cannot be changed over a CANopen network, are still identified in the device identity, providing a detailed checking during the network startup phase. However, good quality assurance is required in order to avoid assembly errors that cause mismatches between planned and realized constructions. From a safety point of view, it is essential to be able to check that correct sensors and actuators are used. Experience shows that end users are “quite innovative” and that instructions alone are never enough, which is why the control system itself must be able to perform membership and configuration monitoring.

A clear division between factory calibration and user configuration is highly recommended. During the download process these also need to be kept separate, in order to prevent the categories from being mixed up. In addition to making parameters manageable, the configuration management process supports flexible production arrangements between system integrators, subcontractors, and component vendors. Storing parameters as numeric values enables a constant production quality and makes it possible to verify the assigned values after they have been set and stored.
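Verifying assigned values after set and store amounts to reading each parameter back and comparing it with the designed value. In the sketch below, the `read_back` callable stands in for whatever read service the tool chain provides (for example an SDO upload); it and the manufacturer-specific index 0x2000 are hypothetical, while 0x1017 is the producer heartbeat time object.

```python
from typing import Callable, Dict, List, Tuple

ObjectAddress = Tuple[int, int]  # (index, sub-index) in the object dictionary


def verify_configuration(
    expected: Dict[ObjectAddress, int],
    read_back: Callable[[int, int], int],
) -> List[ObjectAddress]:
    """Return the addresses whose stored value differs from the designed value."""
    return [
        address
        for address, value in expected.items()
        if read_back(*address) != value
    ]


# Usage with a dictionary standing in for a real device
designed = {(0x1017, 0): 100, (0x2000, 1): 42}
device = {(0x1017, 0): 100, (0x2000, 1): 40}  # one parameter was not stored correctly
mismatches = verify_configuration(designed, lambda index, sub: device[(index, sub)])
```

An empty result means that the stored configuration matches the design information; any listed address points directly to the parameter that needs attention.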

Design process

There is no uniform approach to the management of analog sensor and actuator interfaces. Instead, various written documents are used and each input and output must be configured manually at design time and calibrated after the assembly, before full operation. Any component change leads to the need for a re-calibration. The major problem is the significance of human effort in each phase of the process.

Most of the listed safeguards are supported by the comprehensively standardized CANopen design process. It is important to manage design information systematically in order to avoid errors. Human mistakes can be avoided by using the appropriate tools instead of human work. An appropriate tool chain enables the validation and re-use of information to/from a CANopen system project.

The design process can be considered as a procedure that provides consistent information for configuration management and various other monitoring functions. Not all of the required information is necessarily available in CANopen projects, which leads to interactions with other disciplines. Information sharing is not possible if information content is not well defined and structured.

In addition to its extremely reliable communication services, CANopen provides further safeguards in order to decrease the residual error probability of communication and to increase the diagnostic coverage. Device profiles enable the efficient re-use of standardized basic sensing and actuation functions. Implemented in off-the-shelf devices, such functions have already been tested and certified, without additional cost or effort. Device state-machines provide protection against communication errors, causing e.g. unintentional error recovery, which violates one of the main safety design principles.

Detailed membership monitoring is a function which cannot be implemented in analog sensors and actuators and is intrinsically available in each CANopen system. Receiving PDOs by multiple devices, regardless of the need of included signal values, can be used to increase error detection performance by extending the spatial distribution. The more receiving devices, the more reliable an operation will become due to spatial coverage.

CANopen defines a comprehensive configuration management, which applies equally to all kinds of compliant devices. A harmonized principle enables an efficient and reliable system-wide configuration management. Configuration management is supported by the standardized design process, providing a systematic approach for the management of design information, including meta information of signals and parameters, throughout the system’s life cycle. A well-defined design process also maximizes the possible re-use of design information.

Discussion

It has been concluded that determining the exact residual error probability of CANopen communication analytically is challenging, due to the structure of CAN messages. However, based on the existing information, the residual error probability is low enough for most applications. When compared with old analog instrumentation, the difference is significant. Based on existing analyses made by following the related standards and using real failure statistics, it may be concluded that it might not make sense to use analog sensors and actuators in safety relevant control system functions. While CANopen is not considered a safety bus, most of its basic concepts follow “inherently safe design measures”.

In addition to the communication, the dependability of applications is critical. The main methods for improving the dependability of application programs are a managed design process and testing. As long as additional costs are not acceptable, the re-use of applications enables a more complete testing. One of the best re-use methods is to use standardized basic functions, which are defined in device profiles in CANopen. From a system point of view, a higher dependability typically results in a better availability and profitability of target systems.

When analog instrumentation and CANopen networking are compared, the latter can be implemented much more simply while achieving the same performance level. Analog instrumentation is more traditional, typically has fewer functions, and seems simple to design. One reason for its apparent simplicity is that many of its systematic procedures have not been defined in much detail. Therefore the assembly and service of analog instrumentation are error-prone and need a lot of human effort. However, defined methods must be followed in order to get the benefits of CANopen.
