Physical layer really matters in CAN
Despite comprehensive knowledge of the CAN physical layer, there is a poor tradition of physical layer design for industrial and machinery CAN networks. Cables with the “wrong” impedance (not matching the termination) are commonly used, and many engineers select them intentionally because of a limited understanding of transmission line behavior.
Most engineers forget that a transmission line cannot be treated as a conductor with DC characteristics only. The “wrong” impedance leads not only to unmanaged reflections but also to a different wave propagation speed in the cable. Unmanaged reflections increase the bit-error probability of the network, and the difference in propagation speed invalidates the results of bit-timing analysis.
With the CAN physical layer, one should remember that there are only impedances, not plain conductors.
Turning back to discrete instrumentation is not an option in modern systems because of the high accuracy, dependability, and safety requirements. Dependability depends on many things, from the physical layer up to the application layer communication services and application processing. The case example in this article shows how well CAN error detection works, even when the physical layer design rules are heavily violated. The presented scenarios also exist in CAN FD, which has similar arbitration and acknowledge fields, but there the bit-rate transition phases may also be disturbed if transmission line deviations exist.
This article begins with a brief review of the most important transmission line characteristics. For more details, readers are advised to consult the referenced documents. Next, a case example with a violated topology, a transmission line mismatch, and too closely installed nodes is presented. After the starting point is described, the corrective actions and their results are shown. Finally, some discussion and concluding remarks are given.
Many variables
There are many variables behind the physical layer design constraints. The maximum network length is a direct consequence of the wave propagation speed in the network, the selected bit rate, and the bit-timing details specified by the application layer protocol in use. The signal shall propagate from one end to the other and back within the propagation segment, or problems will occur. Equation 1 shows clearly that any change in the transmission line characteristics leads to a change in the wave propagation time, λ.
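As a rough illustration of the round-trip constraint, the sketch below assumes a lossless line, for which the delay per metre follows from the per-metre inductance and capacitance, and a lumped one-way device delay. The function names, the device-delay term, and all example values are assumptions made for illustration; they are not taken from the article's equation 1 or from any standard.

```python
import math


def delay_per_metre(l_per_m: float, c_per_m: float) -> float:
    """Propagation delay per metre (s/m) of a lossless line: sqrt(L' * C')."""
    return math.sqrt(l_per_m * c_per_m)


def max_bus_length(t_prop_seg: float, t_device: float, delay: float) -> float:
    """Longest bus (m) for which the signal travels to the far end and back
    within the propagation segment.

    t_prop_seg: duration of the propagation segment (s)
    t_device:   assumed one-way delay of the node electronics (s)
    delay:      cable propagation delay (s/m)
    """
    # Round trip = 2 * (cable delay + device delay) must fit in t_prop_seg.
    return (t_prop_seg / 2 - t_device) / delay


# Assumed example values: L' and C' chosen to give about 120 ohm and 5 ns/m.
lam = delay_per_metre(600e-9, 41.6667e-12)
length = max_bus_length(1e-6, 100e-9, lam)
print(f"delay = {lam * 1e9:.2f} ns/m, max length = {length:.1f} m")
```

A cable with a higher capacitance per metre raises the delay and therefore shortens the usable length for the same bit timing.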
The line impedance, Z0, is specified by the corresponding physical layer standard, which also applies to the entire medium attachment circuitry of each device and to the terminators. Any deviation introduces non-ideal operation of the entire network. The actual line impedance depends on the transmission line characteristics according to equation 2. It clearly shows that series resistors in the signal lines, which are commonly used in passive star topology implementations, significantly increase the line impedance. Equation 3 shows that the increased line impedance results in over- and undershoots, and that inverse reflections are caused by the transition back to the standard line impedance.
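The effect of series resistance on the line impedance can be sketched with the general textbook expression for the characteristic impedance of a transmission line. This is a simplification: it treats the added resistance as if it were distributed along the line, whereas star-coupler resistors are lumped components. The function name, the 1 MHz evaluation frequency, and the 5 Ω/m figure are assumptions chosen only to show the trend.

```python
import cmath
import math


def line_impedance(r: float, l: float, c: float,
                   g: float = 0.0, freq: float = 1e6) -> complex:
    """Characteristic impedance sqrt((R' + jwL') / (G' + jwC')) in ohms.

    r, l, c, g are per-metre resistance, inductance, capacitance and
    conductance; freq is the evaluation frequency in Hz (assumed value).
    """
    w = 2 * math.pi * freq
    return cmath.sqrt(complex(r, w * l) / complex(g, w * c))


# Lossless reference line versus the same line with added series resistance.
z_lossless = line_impedance(0.0, 600e-9, 41.6667e-12)
z_with_r = line_impedance(5.0, 600e-9, 41.6667e-12)
print(f"|Z0| lossless: {abs(z_lossless):.1f} ohm")
print(f"|Z0| with series R: {abs(z_with_r):.1f} ohm")
```

With these assumed values the magnitude of the impedance rises above the lossless value once series resistance is present, which matches the direction of the effect described above.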
Termination has a significant effect on the overall dependability. It is much more than just a DC resistance: its main purpose is to prevent reflections from the network endpoints. Equation 3 gives the formula for the reflection factor, ρ. If the termination impedance ZT equals the line impedance, the reflection factor is zero and no reflections occur. Accordingly, if the termination impedance is lower than the line impedance, inverse-polarity reflections occur, and if it is higher than the line impedance, the reflection has the same polarity as the originating edge. The larger the mismatch, the higher the reflection amplitude.
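The three cases described above can be checked with the standard textbook form of the reflection factor, (ZT - Z0) / (ZT + Z0). The sketch below assumes purely resistive impedances; the function name and the example termination values are illustrative only.

```python
def reflection_factor(z_term: float, z_line: float) -> float:
    """Reflection factor at a termination: (ZT - Z0) / (ZT + Z0)."""
    return (z_term - z_line) / (z_term + z_line)


# Termination lower than, equal to, and higher than an assumed 120 ohm line.
for z_term in (60.0, 120.0, 240.0):
    rho = reflection_factor(z_term, 120.0)
    print(f"ZT = {z_term:5.1f} ohm -> reflection factor = {rho:+.3f}")
```

A negative result corresponds to an inverse-polarity reflection, zero to a matched termination, and a positive result to a reflection with the same polarity as the originating edge.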
The number of nodes depends mainly on the ratio of the fan-out to the fan-in of the transceivers and on the load caused by the transmission line. The nominal number of nodes typically applies when the transmission line characteristics meet the corresponding standard. If, for example, the line capacitance is higher than specified, the maximum achievable line length is reduced. Protection circuits shall be used carefully, especially so-called EMI capacitors and common-mode chokes. According to Figure 1, high-resistance termination of long drop-lines collapses the maximum number of nodes in a network and is thus not recommended for industrial systems.
The maximum length of a single drop-line, Li, is interesting because it is hard to find a well-explained derivation of the values given in the corresponding standard. Each drop-line is an unterminated end, which causes a reflection with the same polarity as the originating edge: overshoot for a rising edge and undershoot for a falling edge.
The reflection shall occur within the first 33 % of the propagation segment. TTRANS is the typical transition time of the transceivers. Down-and-back propagation shall be taken into account by using double the propagation time. The traditional extension mechanism with slew-rate control does not apply to higher bit rates and CAN FD, because there is no extra time margin for longer transitions. For example, for a standard CANopen network with λ = 5 ns/m and transceivers with average rise and fall times of 50 ns, equation 4 gives a maximum drop-line length of 1,67 m, which is perfectly in line with practical experience on high-speed CAN networks.
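The worked numbers above can be reproduced with a short calculation. The form used here, one third of the transition time divided by twice the delay per metre, is inferred from the quoted values (50 ns, 5 ns/m, 1,67 m) and is not necessarily the exact form of equation 4. The function and parameter names are assumptions.

```python
def max_drop_line_length(t_trans: float, delay_per_m: float,
                         fraction: float = 1 / 3) -> float:
    """Maximum drop-line length in metres.

    t_trans:     transceiver transition time (s)
    delay_per_m: cable propagation delay (s/m)
    fraction:    share of the transition time allowed for the reflection
                 (assumed to be one third, matching the quoted 33 %)
    """
    # The factor 2 accounts for down-and-back propagation on the drop-line.
    return fraction * t_trans / (2 * delay_per_m)


length = max_drop_line_length(t_trans=50e-9, delay_per_m=5e-9)
print(f"maximum drop-line length = {length:.2f} m")
```

Slower transceiver edges would permit longer drop-lines under this relation, which is the slew-rate extension mechanism that the text notes is unavailable at higher bit rates.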
The most interesting variable is the minimum distance between any two devices in the network, d. The background for this variable is very simple: a group of nodes connected close to each other introduces a lumped load capacitance, CL, which increases the nominal line capacitance, C0, of the short section of cable to which the nodes are connected. The result is two impedance junctions, from higher to lower and from lower to higher. Equation 5 is used to compute the minimum distance between the nodes.
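The impedance drop in a densely populated cable section can be estimated with the common loaded-line approximation, in which the node capacitance is spread evenly over the node spacing. This is a sketch only and not necessarily the article's equation 5. It assumes C0 is given per metre, and the example values of 10 pF per node and 40 pF/m, as well as the allowed impedance ratio, are hypothetical.

```python
import math


def loaded_impedance(z0: float, c0: float, c_load: float,
                     spacing: float) -> float:
    """Effective impedance (ohm) of a cable section loaded with nodes.

    z0:      unloaded line impedance (ohm)
    c0:      cable capacitance per metre (F/m)
    c_load:  lumped capacitance added by each node (F)
    spacing: distance between adjacent nodes (m)
    """
    return z0 / math.sqrt(1 + c_load / (c0 * spacing))


def min_spacing(c0: float, c_load: float, min_ratio: float) -> float:
    """Smallest node spacing (m) that keeps the loaded impedance at or
    above min_ratio times the unloaded impedance (0 < min_ratio < 1)."""
    return c_load / (c0 * (1 / min_ratio ** 2 - 1))


# Assumed example: 120 ohm cable, 40 pF/m, 10 pF per node.
for spacing in (0.5, 1.5):
    z = loaded_impedance(120.0, 40e-12, 10e-12, spacing)
    print(f"spacing {spacing} m -> loaded impedance {z:.1f} ohm")
print(f"spacing for 90 % of Z0: {min_spacing(40e-12, 10e-12, 0.9):.2f} m")
```

Under these assumed values, a longer spacing between nodes brings the loaded impedance closer to the unloaded value, so the two impedance junctions at the edges of the node group become smaller.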
Case example
The case example is based on a real troubleshooting case in which the original set-up was a dual star topology network with passive star-couplers. There was a relatively long continuous network segment without any devices between the two stars. Communication problems seemed to occur randomly, but often. Analysis of the log files revealed that the problems occurred when two devices at opposite ends of the network arbitrated.
The screenshot in Figure 2 shows an example of the CAN error detection capabilities and of how a locally detected error is globalized. The data frame starts normally with dominant start-of-frame (SOF) and ID10 bits, followed by a recessive ID9 bit. Then, ID8 seems to be dominant and ID7 to ID4 recessive. But the first active error flag, followed by another one, reveals the entire problem.
The most important observation is that the second active error flag has as high an amplitude as the acknowledge (ACK) bit. This means that the first active error flag, with its lower amplitude, is transmitted by a single node. The error flags also confirm that ID8 is erroneously interpreted as recessive by one node. Every local error is efficiently globalized by an active error flag, which violates the bit-stuffing rule and thereby triggers global error detection and reaction.
ID8 in the transmission retry that starts after the erroneous transmission seems to be almost as bad as in the erroneous transmission, but it was still received successfully. Detailed analysis showed that only four CAN-IDs appeared after error frames, which makes them the potentially failing ones. The key observation was that the same bit pattern from ID10 to ID4 existed in all of those CAN-IDs. The system documentation revealed that messages with such CAN-IDs were transmitted by at least one device close to each end of the network. The analysis confirmed that the passive star-couplers, with series resistors in CAN-high and CAN-low, and the long unterminated drop-lines were introducing the line mismatch.
Network system improvements
Because of the heavy topology violation, the first corrective action was to re-organize the cabling from a dual star into a linear bus topology. The star-couplers causing the transmission line mismatch were also removed. The result is clearly visible in Figure 3, where the beginning of bit ID8 is significantly more stable and the dominant state amplitude at the sample point, located at 75 % to 87,5 % of the bit-time, is approximately 1,8 V or higher. Error frames were no longer seen. However, the beginning of the bit-time still needed improvement, because the dominant state amplitude went below the 1,2 V threshold during the first quarter of the bit-time.
After the line impedance was corrected and the network topology changed from dual star to linear, the structure still contained a long continuous cable with many devices installed close to each other at both ends. Such a structure corresponds to the one presented in the literature, which gave clear advice for further improvement.
The second corrective action was to replace the 0,5 m long daisy-chaining cables in the node group at one end of the network with 1,5 m long cables. The nodes at the other end were installed in locations where such an improvement was too time-consuming, and thus that end was left intact in the system under repair. Figure 4 shows that the improvement was significant. The dominant state amplitude exceeded 1,5 V during the entire bit-time and 1,8 V at the sample point.
Discussion
The example scenario in the literature concentrates on the case where a single node controls a set of devices in an island over a long cable connection. The scenario is typical of old system architectures, where the capability of a single PLC is extended with I/O devices that concentrate sensor and actuator connections. Concentrating on a fixed scenario hides an important detail. Connected nodes increase the capacitance of the network cable. When the nodes are connected to a short section of the network cable and a long section of the cable has no nodes at all, the capacitance change is concentrated in the short section.
The modern approach is to use intelligent sensors and actuators connected directly to the network. When a higher number of devices is connected to a network, they are installed more evenly along the network cable. Such an approach changes the cable capacitance more evenly and results in much smaller impedance transitions in the cable. There are, for example, optical hub implementations in which the minimum distance between the nodes is approximately 20 mm. Problems do not occur, because the entire network is only approximately 200 mm long, the nodes are evenly distributed along it, and it is properly terminated at both ends.
It can be concluded from the referenced information that, instead of keeping to a fixed minimum distance between two nodes, the nodes should be distributed as evenly as possible along the network cable. Furthermore, if the system structure leads to groups of nodes, the minimum length of the daisy-chaining cables may need to be extended accordingly.
Increased distribution also provides advantages in the functional safety integrity level. The decrease in residual error probability with an increasing number of nodes in a network has been well known for a long time. There are also effective application layer safeguards, whose effect likewise increases with the number of nodes in the network. However, the most significant increase in functional safety may be achieved by replacing all discrete I/O signals with direct network communication. The increase is mainly an outcome of the significantly higher diagnostic coverage provided by digital packet communication.
While network communication is significantly more dependable and safe than discrete instrumentation, special attention shall still be paid to the safety integrity level. The latest published residual error probability analysis confirmed that the bit-error probability is one of the most fundamental parameters. Electromagnetic interference is an external threat for which there are well-known protection mechanisms, but topology violations and transmission line mismatches have a direct and permanent effect on the entire communication. If the transmission line has such problems, the bit-error probability of communication over the network increases, which may decrease the safety integrity level below the required level. It is too often forgotten that a proper cabling approach and proper components help in realizing installations that fulfill the requirements taken into account during the design.
Conclusions
This article briefly reviewed the most essential transmission line characteristics. In addition, a case example was presented to show that the theory really applies to real-world transmission lines. The main outcome of the case example was that all the essential knowledge is available to everybody; one just needs to learn and use it. Each system has something special that requires a deep understanding of the transmission line characteristics in order to avoid pitfalls. In particular, when star or mixed topologies are needed, it is a waste of time and money to implement them without active topology components. Ever-increasing accuracy, dependability, and safety integrity level requirements have led to a demand for changing from discrete to digital network-based instrumentation. Violating the transmission line specifications wastes the dependability and safety margin. In addition to increased dependability, network-based instrumentation provides a more flexible system architecture. However, doing the design work properly is not enough; it shall also be ensured that the proper designs can be implemented on the assembly lines and maintained in field service. As the world starts turning from CAN to CAN FD, physical layer implementations shall follow good design practices more closely.
The words of Kurt Lewin (1890-1947), German-American psychologist and father of modern social psychology, apply perfectly to transmission lines:
There’s nothing so practical as a good theory.