Jan. 19, 2026
Tool Evaluation and Qualification
Hello and welcome to another episode of “Applied FuSa”, a podcast for FuSa pragmatists.
Tool evaluation and qualification… there is hardly any other task in the FuSa universe that has caused so much uncertainty. At least, that’s the impression gained from many internal and external discussions (for instance, at FuSa conferences and workshops). Actually, the topic is not that complicated. But the devil is very much in the details. We will illustrate this using the example of compiler evaluation.
We will also share a few suggestions on how the effort involved can be significantly reduced.
00:00 - Moderator Intro
00:43 - Introduction
02:34 - Tool Evaluation
04:49 - Support TCL1 by dedicated measures
06:49 - Tool Qualification
12:34 - Summary and some lessons learned
16:28 - Moderator Outro
WEBVTT
00:00:00.000 --> 00:00:42.000
Hello and welcome to another episode of “Applied FuSa”, a podcast for FuSa pragmatists. Tool evaluation and qualification… there is hardly any other task in the FuSa universe that has caused so much uncertainty. At least, that’s the impression gained from many internal and external discussions (for instance, at FuSa conferences and workshops). Actually, the topic is not that complicated. But the devil is very much in the details. We will illustrate this using the example of compiler evaluation. We will also share a few suggestions on how the effort involved can be significantly reduced.
00:00:43.000 --> 00:02:33.000
Tool Evaluation and Qualification, or T-E-Q for short, is necessary because the tools used for the development and production of safety-related products can also contain faults that could compromise the safety of the product. If, for instance, a requirements management tool incorrectly changes the ASIL of a group of requirements from ASIL C to ASIL A, this error can have safety-critical consequences. Or if a test system does not execute a test case correctly and, as a result, a safety-critical fault condition cannot be detected, this too poses a threat to the functional safety of the product. As a final example, consider a torque wrench that applies a torque that is consistently a few percent lower than required, i.e., lower than the set torque. In that case, for instance, the housing of an ECU might not be sufficiently protected against splashing water, which could lead to a safety-critical issue. These are three examples of tool faults that can affect functional safety. The risks associated with them must be minimized, and that is precisely the purpose of T-E-Q. As the name already suggests, T-E-Q consists of two steps: the evaluation and the qualification of tools. While an evaluation must always be carried out for every tool, a qualification is only required if the result of the evaluation indicates that it is necessary. It should be mentioned right away that an evaluation is strongly recommended even in cases where you are firmly convinced that a tool definitely cannot affect the functional safety of the product. Even then, the evaluation should be performed, because this is the only way to properly document, for the Safety Case, the relevant potential faults and the conclusion that they are not safety-relevant.
00:02:34.000 --> 00:04:48.000
Let’s start with the tool evaluation. This consists of two steps, which must be carried out separately for each potential fault. That might sound like a lot of work, but it really isn’t. On the one hand, both steps are very simple and therefore quick to complete, and on the other hand, we will later show that there is a high potential for reuse, which means that the overall effort can be significantly reduced. First, the so-called Tool Impact (T-I) must be determined. In other words, you need to answer the question of whether there is a potential tool fault that could affect the functional safety of the product. If this is not the case, then the result is T-I-1. If you are unsure, or if it is already known that the evaluated tool error can have safety-relevant effects, then T-I equals T-I-2. This means the tool can potentially influence the functional safety of the product. If T-I equals T-I-2, then in the second step the so-called Tool Error Detection (T-D) must be determined. This refers to the probability that the tool error will be detected, either directly or indirectly through its effect. The following values are defined: T-D-1: The probability that the tool error will be detected is very high. T-D-2: The probability that the tool error will be detected is medium. T-D-3: The probability that the tool error will be detected is very low. From these two values (T-I and T-D), the so-called Tool Confidence Level (T-C-L) is derived: T-C-L equals T-C-L-1 if T-I equals T-I-1, or if T-I equals T-I-2 and T-D equals T-D-1. In other words, the result is T-C-L-1 if the tool cannot have any impact at all, or if the tool error detection is rated as very high. Otherwise, the result is T-C-L-2 (if T-D equals T-D-2) or T-C-L-3 (if T-D equals T-D-3). If the result of the evaluation is T-C-L-1, then the tool can be used unconditionally for safety-relevant products. As mentioned: this applies only to the tool error that was evaluated. Reminder: every tool error must be evaluated separately!
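The decision logic described in this segment can be sketched in a few lines. This is only an illustration, not part of ISO 26262; the function name and structure are our own, while the mapping of T-I and T-D to a TCL follows Table 3 of ISO 26262-8:

    # Illustrative sketch of the TI/TD -> TCL mapping (per ISO 26262-8, Table 3).
    def tool_confidence_level(ti, td=None):
        """Derive the TCL from Tool Impact (1 or 2) and Tool Error Detection (1, 2 or 3)."""
        if ti == 1:
            return "TCL1"      # no possible impact on functional safety
        if ti == 2 and td == 1:
            return "TCL1"      # impact possible, but detection is very likely
        if ti == 2 and td == 2:
            return "TCL2"      # medium detection probability: qualification required
        if ti == 2 and td == 3:
            return "TCL3"      # low detection probability: qualification required
        raise ValueError("TI must be 1 or 2; TD must be 1, 2 or 3 when TI is 2")

    # Example: a tool error that could affect safety and would rarely be detected
    print(tool_confidence_level(ti=2, td=3))   # -> TCL3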
00:04:49.000 --> 00:06:48.000
If the evaluation results in a Tool Confidence Level of T-C-L-2 or T-C-L-3, the tool must be qualified. There are four different methods available for qualification, which will be explained in detail shortly. First, a note regarding the T-D result: it is understandable that one would want to avoid qualification where possible, because qualification can be very time-consuming. A widely used and generally accepted approach is to implement preventive and reactive detection measures. Preventive measures aim to reduce the likelihood of a tool error occurring, or ideally, to prevent the tool error altogether. A typical example is avoiding certain configurations known to cause the specific tool error, while using an alternative configuration that does not exhibit the same issue. Reactive measures, on the other hand, are intended to detect the tool error before the corrupted output of the tool is used further. If, for instance, code reviews are performed on auto-generated code, errors introduced by the code generator can be identified and the compilation of faulty code can be avoided. In times of increasingly AI-supported code generation, this type of measure is becoming ever more important. Preventive and reactive measures can therefore potentially influence the outcome for T-I and/or T-D. In this way, a tool confidence level of TCL-1 can be achieved, making qualification unnecessary. However, it must be noted that these measures must be thoroughly documented, and evidence must be provided that they were actually applied in the project. The functional safety manager is responsible for ensuring that this is carefully verified during the final FS assessment. It is all too easy to define measures in order to achieve TCL-1 without actually implementing them. Of course, this would be completely unacceptable.
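As a rough idea of how such measures and the corresponding evidence could be documented alongside the evaluation result, here is a minimal sketch; all field names, tool names, and references below are assumptions made purely for illustration:

    from dataclasses import dataclass, field

    @dataclass
    class ToolErrorEvaluation:
        """One evaluated tool error, including the measures that justify the TD rating."""
        tool: str
        tool_error: str
        preventive_measures: list = field(default_factory=list)
        reactive_measures: list = field(default_factory=list)
        evidence: list = field(default_factory=list)   # proof that the measures were applied
        ti: int = 2
        td: int = 3

        @property
        def tcl(self) -> str:
            if self.ti == 1 or self.td == 1:
                return "TCL1"
            return "TCL2" if self.td == 2 else "TCL3"

    # A hypothetical code generator error that mandatory code reviews would catch:
    evaluation = ToolErrorEvaluation(
        tool="ExampleCodeGen 4.2",
        tool_error="generated code deviates from the model",
        reactive_measures=["review of all generated code before compilation"],
        evidence=["review reports referenced in the project review plan"],
        ti=2,
        td=1,   # detection is very likely thanks to the mandatory reviews
    )
    print(evaluation.tcl)   # -> TCL1, so no qualification is needed for this error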
00:06:49.000 --> 00:12:33.000
If the evaluation determines that qualification is necessary, ISO 26262 provides four methods that must be applied depending on the ASIL. In this case, the ASIL corresponds to the ASIL of the malfunction that could be caused by the tool error. If multiple malfunctions could be triggered, the highest ASIL among them must be used. The four methods are: 1. Increased confidence from use, 2. Evaluation of the tool development process, 3. Validation of the software tool, 4. Development in accordance with a safety standard. All four methods will be briefly introduced below. Increased confidence from use. Tools for which evidence can be provided that they have been used sufficiently often without the occurrence of the considered tool error can be qualified on the basis of this experience, as they can be regarded as sufficiently proven. In this case, the following evidence must be provided: 1. The tool has been used for identical or comparable use cases and in a comparable operating environment with similar functional constraints, 2. The justification for increased confidence from use is based on sufficient and appropriate data, 3. The specification of the tool is identical for the current use, 4. Malfunctions and resulting faulty outputs have been systematically recorded and analyzed. Furthermore, the experience gained with the tool in the past must be analyzed for the current project, taking the following information into account: 1. Unique identification and version number, 2. The configuration of the tool, 3. The details of the period of use and relevant data on its use, 4. The documentation of malfunctions and corresponding erroneous outputs of the software tool, including details of the conditions that led to them, 5. The list of previously monitored versions, including the malfunctions fixed in each relevant version, and 6. The safeguards, avoidance measures, or workarounds for known defects, or detection measures for corresponding erroneous outputs, if applicable. Last but not least, the increased confidence from use argument shall only be valid for the evaluated version of the tool. Method number 2: Evaluation of the tool development process… For this method, ISO 26262 specifies only two requirements: 1. The tool must have been developed in accordance with an appropriate standard, 2. The evaluation of the development process must have been performed in accordance with an appropriate national or international standard, and must have demonstrated that the tool was developed following an appropriate development process. The standard does not define what is to be considered “appropriate” in this context. The reason for this is that the criteria for appropriate development processes are expected to evolve over time. However, processes that are generally considered state of the art can typically be regarded as appropriate. Method number 3: Validation of the software tool. Ultimately, a user may choose to validate the tool independently. However, the effort required for such a validation should not be underestimated, as it must meet the following requirements: 1. The validation measures must provide evidence that the tool fulfills all specified requirements and that these requirements are appropriate for its intended use in the current project, 2. Malfunctions and corresponding erroneous outputs of the tool that occur during the validation must be thoroughly analyzed, taking into account all potential consequences for the functional safety of the current product, and appropriate preventive and reactive measures must be defined, 3. The reaction of the tool to anomalous operating conditions shall be examined. Even the first requirement of this method is often not feasible, as the specifications of the tools used are typically not available. And even if they were, it is evident how much effort would be required to fulfill this requirement alone. Method number 3 is therefore likely to be used only in very rare cases. The fourth qualification method applies in cases where a tool has already been developed in accordance with a safety standard, and where evidence of this, such as a certification, is available. Over the past 10 to 15 years, more and more tool vendors have had their tools certified according to ISO 26262. However, it is important to check up to which ASIL the tool has been certified and for which use cases. A certification according to ISO 26262 does not automatically mean that the tool can be used without restrictions in every project. Method number 4 can only be applied if the certification matches the conditions of the current project (use cases and ASIL). So much for the four methods that can be used for the qualification of tools when the result of the tool evaluation is TCL-2 or TCL-3. But which of these methods must actually be applied? ISO 26262 addresses this in Part 8, Section 11.4.6, where it provides two tables that recommend method combinations depending on the TCL value and the ASIL. For the sake of simplicity, we will only refer to these tables here. Reading them out in detail would be rather impractical.
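Purely as an illustration of how the outcome of such a qualification could be captured for the Safety Case, here is a small sketch; the names and values are invented, and the admissible method combinations themselves must be taken from the tables in ISO 26262, Part 8, Section 11.4.6:

    # Hypothetical record of a qualification decision for one tool.
    QUALIFICATION_METHODS = (
        "increased confidence from use",
        "evaluation of the tool development process",
        "validation of the software tool",
        "development in accordance with a safety standard",
    )

    qualification = {
        "tool": "ExampleTestRunner",
        "tool_version": "7.1.3",
        "tcl": "TCL2",
        "asil": "C",   # the highest ASIL of the malfunctions the tool error could cause
        "methods": [QUALIFICATION_METHODS[3]],   # e.g. an ISO 26262 certificate is available
        "evidence": ["certificate covering the intended use cases and ASIL", "safety manual"],
    }
    print(qualification["methods"])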
00:12:34.000 --> 00:16:27.000
In summary, the primary objective will typically be to achieve a tool evaluation result of TCL-1. If necessary, preventive and/or reactive measures must be defined, and their application must be demonstrated during the final functional safety assessment. Otherwise, ISO 26262 provides four methods for tool qualification. Ideally, a tool is already certified, but it is important to ensure that the certification is suitable for the intended use in the current project (regarding use cases and ASIL). If not, the methods must be applied according to the recommendations given in ISO 26262, Part 8, Section 11.4.6. Finally, we would like to deliver on a promise: the hint on how the effort required for tool evaluation can be significantly reduced. We’ll illustrate this using the example of evaluating a compiler. As part of an evaluation, use cases must be defined, and the relevant potential tool errors identified and assessed accordingly. For compilers, different configurations are often treated as separate use cases, which results in a high evaluation effort. But is this really necessary? In most cases, probably not. Why not? Well, a compiler essentially has only one task: to translate source code into object code. One possible tool error is that the generated object code does not correspond to the original source code. However, such an error would typically be detected during system or software verification. If appropriate test cases are defined to verify the correctness of the compiled output, this alone can be sufficient to achieve TCL-1. As a result, only a single evaluation is required, and qualification becomes unnecessary. Even if a specific compiler optimization is intended, it is generally not necessary to define separate use cases for the tool evaluation. Object code reviews or, again, appropriate test cases can uncover such compiler errors. One last example: compiler switches. If the compiler misinterprets some of the switches, such errors can also be detected afterwards. In general, it can be said that tool errors occurring in the left branch of the V-model can usually be detected by appropriate test cases, which supports achieving TCL-1. This, however, does not apply to test tools themselves. Test tools primarily help to detect errors of development tools: if the design has been corrupted by such an error, the corresponding test results will be negative, and a root cause analysis will reveal that the cause is a tool error. The situation is different for errors in the test tools themselves, because these can corrupt the test results without being noticed. Therefore, test tools must be developed and validated very carefully to ensure the correctness of the test results. Ideally, test tools are also certified according to ISO 26262. In conclusion, we also recommend using a tool that captures evaluation and qualification results in a central database. This makes it possible to reuse results from other projects and will significantly reduce the overall effort for T-E-Q, as it is expected that after 3–5 projects, most tools and use cases will already be available in such a database. In the worst case, T-E-Q would then be limited to re-evaluation and/or re-qualification if a new use case is added or a newer version of a tool is used. A full T-E-Q would only need to be planned and conducted if a new tool is introduced, which generally happens only rarely.
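A minimal sketch of what such reuse from a central T-E-Q database could look like follows; the database layout, tool names, and measures shown here are assumptions for illustration only:

    # Central store of previously documented evaluation results (structure is illustrative).
    TEQ_DATABASE = {
        # (tool, version, use case) -> documented result from an earlier project
        ("ExampleCC", "12.3", "translate C source code into object code"): {
            "tcl": "TCL1",
            "measures": ["software integration tests", "requirements-based software tests"],
        },
    }

    def lookup_previous_result(tool, version, use_case):
        """Return a reusable evaluation result, or None if a new evaluation is needed."""
        return TEQ_DATABASE.get((tool, version, use_case))

    result = lookup_previous_result("ExampleCC", "12.3", "translate C source code into object code")
    if result is None:
        print("New tool, version, or use case: plan a full evaluation (and qualification if required).")
    else:
        print("Reuse documented result:", result["tcl"], "based on", ", ".join(result["measures"]))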
00:16:28.000 --> 00:16:46.000
Applied FuSa – a podcast for Functional Safety pragmatists. Get your new piece of FuSa every other week.