Reliability prediction is an important element in the process of selecting equipment. This prediction provides necessary input to system-level reliability models for predicting expected downtime per year and system availability. Issue 3 of SR provides all the tools needed for predicting device and unit hardware reliability. The Telcordia Reliability Prediction Procedure has a long and distinguished history of use within and outside the telecommunications industry. Issue 3 of SR provides the only hardware reliability prediction procedure developed from the input and participation of a cross-section of major industrial companies.
In today's competitive electronic products market, having higher reliability than competitors is one of the key factors for success. To obtain high product reliability, consideration of reliability issues should be integrated from the very beginning of the design phase. This leads to the concept of reliability prediction. Historically, this term has been used to denote the process of applying mathematical models and component data for the purpose of estimating the field reliability of a system before failure data are available for the system.
However, the objective of reliability prediction is not limited to predicting whether reliability goals, such as MTBF, can be reached. It can also be used for:

Once the prototype of a product is available, lab tests can be utilized to obtain more accurate reliability predictions. Accurate prediction of the reliability of electronic products requires knowledge of the components, the design, the manufacturing process and the expected operating conditions.
Several different approaches have been developed to achieve the reliability prediction of electronic systems and components. Each approach has its unique advantages and disadvantages.
Among these approaches, three main categories are often used within government and industry: empirical (standards based), physics of failure and life testing.
In this article, we will provide an overview of all three approaches. First, we will discuss empirical prediction methods, which are based on the experiences of engineers and on historical data. Next, we will discuss physics of failure methods, which are based on root-cause analysis of failure mechanisms, failure modes and stresses. This approach is based upon an understanding of the physical properties of the materials, operation processes and technologies used in the design.
Finally, we will discuss life testing methods, which are used to determine reliability by testing a relatively large number of samples at their specified operation stresses or higher stresses and using statistical models to analyze the data.
Empirical prediction methods are based on models developed from statistical curve fitting of historical failure data, which may have been collected in the field, in-house or from manufacturers.
These methods tend to present good estimates of reliability for similar or slightly modified parts. Some parameters in the curve function can be modified by integrating engineering knowledge. The assumption is made that system or equipment failure causes are inherently linked to components whose failures are independent of each other.
There are many different empirical methods that have been created for specific applications. Some have gained popularity within industry in the past three decades. The table below lists some of the available prediction standards and the following sections describe two of the most commonly used methods in a bit more detail.
MIL-HDBK is probably the most internationally recognized empirical prediction method, by far. The MIL-HDBK predictive method consists of two parts: one is known as the parts count method and the other is called the part stress method. The parts count method assumes typical operating conditions of part complexity, ambient temperature, various electrical stresses, operation mode and environment, called reference conditions.
The failure rate for a part under the reference conditions is calculated as:

Since the parts may not operate under the reference conditions, the real operating conditions will result in failure rates that are different from those given by the parts count method. The failure rate for parts under specific operating conditions can be calculated as:

According to the handbook, the failure rate of a commercial ceramic capacitor of 0.
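The adjustment from reference conditions to specific operating conditions can be sketched as follows. This is only an illustration of the general idea that a reference failure rate is scaled by multiplicative adjustment factors; the function name, the factor names and all numeric values are hypothetical and are not taken from the handbook.

```python
from math import prod


def part_stress_failure_rate(reference_rate, adjustment_factors):
    """Scale a reference-condition failure rate by multiplicative factors.

    reference_rate: failure rate under the reference conditions
        (here in failures per 10^6 hours; the unit is an assumption).
    adjustment_factors: mapping of factor name -> multiplier. A value of
        1.0 means the operating condition matches the reference condition.
    """
    return reference_rate * prod(adjustment_factors.values())


# Hypothetical values for illustration only (not handbook data).
reference_rate = 2.0
adjustment_factors = {
    "temperature": 1.5,
    "electrical_stress": 2.0,
    "environment": 4.0,
    "quality": 1.0,
}

adjusted_rate = part_stress_failure_rate(reference_rate, adjustment_factors)
print(adjusted_rate)  # 24.0
```

When every factor equals 1.0, the adjusted rate reduces to the reference rate, which mirrors the relationship between the two parts of the method described above.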
Because of dissatisfaction with military handbook methods for their commercial products, Bellcore designed its own reliability prediction standard for commercial telecommunication products. Telcordia continues to revise and update the standard.
Method II is based on combining Method I predictions with data from laboratory tests performed in accordance with specific SR criteria.
Method III is a statistical prediction of failure rate based on field tracking data collected in accordance with specific SR criteria. In Method III, the predicted failure rate is a weighted average of the generic steady-state failure rate and the field failure rate. The failure rate is 9. So the result of 0. There are reasons for this variation.
Figure 2: Bellcore capacitor failure rate example.

Although empirical prediction standards have been used for many years, it is always wise to use them with caution. The advantages and disadvantages of empirical methods have been widely discussed over the past three decades. A brief summary from publications in industry, the military and academia is presented next.

In contrast to empirical reliability prediction methods, which are based on the statistical analysis of historical failure data, a physics of failure approach is based on understanding the failure mechanism and applying the physics of failure model to the data.
Several popularly used models are discussed next. One of the earliest and most successful acceleration models predicts how the time-to-failure of a system varies with temperature.
This empirically based model is known as the Arrhenius equation. Generally, chemical reactions can be accelerated by increasing the system temperature. Since it is a chemical process, the aging of a capacitor such as an electrolytic capacitor is accelerated by increasing the operating temperature.
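As a rough sketch of this temperature dependence, the snippet below computes an Arrhenius acceleration factor between a use temperature and a higher stress temperature. It assumes the commonly cited form in which life is proportional to exp(Ea / (k T)); the function name and the example activation energy and temperatures are hypothetical, not values from this article.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_k, stress_temp_k):
    """Ratio of life at the use temperature to life at the stress temperature.

    Assumes life is proportional to exp(Ea / (k * T)), with temperatures
    given in Kelvin and the activation energy in electron-volts.
    """
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / use_temp_k - 1.0 / stress_temp_k
    )
    return math.exp(exponent)


# Hypothetical example: Ea = 0.7 eV, use at 328 K, test at 398 K.
af = arrhenius_acceleration_factor(0.7, 328.0, 398.0)
print(f"Acceleration factor: {af:.1f}")
```

A factor greater than 1 means the part ages faster at the stress temperature, so one hour of testing represents more than one hour of use.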
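The model takes the following form:

$$L(T) = A \exp\left(\frac{E_a}{kT}\right)$$

where $L(T)$ is the life characteristic at temperature $T$, $A$ is a scaling factor, $E_a$ is the activation energy, $k$ is the Boltzmann constant and $T$ is the absolute temperature in Kelvin.

While the Arrhenius model emphasizes the dependency of reactions on temperature, the Eyring model is commonly used for demonstrating the dependency of reactions on stress factors other than temperature, such as mechanical stress, humidity or voltage. According to different physics of failure mechanisms, one more term i. Several models are similar to the standard Eyring model.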
They are:

Electronic devices with aluminum or aluminum alloy metallization containing small percentages of copper and silicon are subject to corrosion failures and therefore can be described with the following model:

Hot carrier injection describes the phenomenon observed in MOSFETs by which a carrier gains sufficient energy to be injected into the gate oxide, generate interface or bulk oxide defects and degrade MOSFET characteristics such as threshold voltage and transconductance.
Since electronic products usually have a long time period of useful life i. However, if you think your products do not exhibit a constant failure rate and therefore cannot be described by an exponential distribution, the life characteristic usually will not be the MTBF.
For example, for the Weibull distribution, the life characteristic is the scale parameter eta and for the lognormal distribution, it is the log mean.

Electromigration is a failure mechanism that results from the transfer of momentum from the electrons, which move in the applied electric field, to the ions, which make up the lattice of the interconnect material.
The most common failure mode is "conductor open." At the end of the 1960s, J. Black developed an empirical model to estimate the MTTF of a wire, taking electromigration into consideration, which is now generally known as the Black model.
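The Black model employs external heating and increased current density and is given by:

$$\mathrm{MTTF} = A J^{-n} \exp\left(\frac{E_a}{kT}\right)$$

where $A$ is a constant that depends on the interconnect, $J$ is the current density, $n$ is the current density exponent, $E_a$ is the activation energy, $k$ is the Boltzmann constant and $T$ is the absolute temperature in Kelvin.

The current density J and temperature T are factors in the design process that affect electromigration.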
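The effect of J and T on electromigration life can be sketched numerically. The snippet assumes the commonly cited form of the Black model, MTTF = A * J^(-n) * exp(Ea / (k T)); the function names and every parameter value in the example are hypothetical and chosen only to show how the terms interact.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def black_mttf(a, current_density, n, activation_energy_ev, temp_k):
    """MTTF under the assumed Black form: A * J**(-n) * exp(Ea / (k * T))."""
    return (
        a
        * current_density ** (-n)
        * math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k))
    )


def black_acceleration_factor(j_use, j_stress, t_use_k, t_stress_k, n, activation_energy_ev):
    """Ratio MTTF(use) / MTTF(stress); the constant A cancels out."""
    current_term = (j_stress / j_use) ** n
    thermal_term = math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    )
    return current_term * thermal_term


# Hypothetical example: stress test at twice the current density and 100 K hotter.
af = black_acceleration_factor(
    j_use=1.0e6, j_stress=2.0e6, t_use_k=358.0, t_stress_k=458.0,
    n=2.0, activation_energy_ev=0.7,
)
print(f"Acceleration factor: {af:.0f}")
```

Because the constant A cancels in the ratio, an acceleration factor can be estimated from n and Ea alone, which is why those two parameters receive most of the attention in test planning.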
Numerous experiments with different stress conditions have been reported in the literature, where the values have been reported in the range between 2 and 3. Usually, the lower the values, the more conservative the estimation.

Fatigue failures can occur in electronic devices due to temperature cycling and thermal shock. Permanent damage accumulates each time the device experiences a normal power-up and power-down cycle.
A model known as the modified Coffin-Manson model has been used successfully to model crack growth in solder due to repeated temperature cycling as the device is switched on and off.
This model takes the form:

The activation energy is usually related to certain failure mechanisms and failure modes, and can be determined by correlating thermal cycling test data and the Coffin-Manson model.
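As an illustration of how such a thermal cycling model is evaluated, the sketch below assumes one commonly cited form of the modified Coffin-Manson relationship, N_f = A * f^(-alpha) * dT^(-beta) * G(T_max), where G(T_max) = exp(Ea / (k T_max)) is an Arrhenius term at the maximum cycle temperature. The exact form intended by the article is not recoverable here, so the equation, the function name and all parameter values are assumptions.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def cycles_to_failure(a, cycle_freq, delta_t, t_max_k, alpha, beta, activation_energy_ev):
    """Cycles to failure under the assumed modified Coffin-Manson form.

    a: scaling coefficient
    cycle_freq: cycling frequency (any consistent unit)
    delta_t: temperature range of one cycle, in Kelvin
    t_max_k: maximum temperature reached in each cycle, in Kelvin
    alpha, beta: frequency and temperature-range exponents
    """
    arrhenius_term = math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * t_max_k))
    return a * cycle_freq ** (-alpha) * delta_t ** (-beta) * arrhenius_term


# Hypothetical comparison: field cycling versus a harsher lab test.
params = dict(a=1.0, alpha=-1.0 / 3.0, beta=2.0, activation_energy_ev=0.12)
n_use = cycles_to_failure(cycle_freq=2.0, delta_t=40.0, t_max_k=333.0, **params)
n_test = cycles_to_failure(cycle_freq=24.0, delta_t=120.0, t_max_k=398.0, **params)
print(f"Cycles acceleration factor: {n_use / n_test:.1f}")
```

The ratio of the two results is independent of the coefficient A, so only the exponents and the activation energy need to be estimated from thermal cycling test data.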
A given electronic component will have multiple failure modes and the component's failure rate is equal to the sum of the failure rates of all modes i. The system's failure rate is equal to the sum of the failure rates of the components involved.
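The additive roll-up described here can be sketched in a few lines. The component names and failure rates below are hypothetical; rates are expressed in FITs (failures per 10^9 device-hours), and the MTBF conversion assumes a constant failure rate.

```python
def component_failure_rate(mode_rates):
    """Failure rate of one component: the sum of its failure-mode rates."""
    return sum(mode_rates)


def system_failure_rate(components):
    """Failure rate of a series system: the sum over all components."""
    return sum(component_failure_rate(modes) for modes in components.values())


# Hypothetical failure-mode rates per component, in FITs.
components = {
    "capacitor": [5, 3],
    "resistor": [2],
    "ic": [40, 10, 15],
}

total_fit = system_failure_rate(components)
print(total_fit)  # 75

# With a constant failure rate, MTBF in hours is 10^9 / FIT.
mtbf_hours = 1e9 / total_fit
print(f"MTBF: {mtbf_hours:.3e} hours")
```

The simple sum relies on the assumption that failure modes and components fail independently of each other, which is the same assumption the empirical methods make.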
In using the above models, the model parameters can be determined from the design specifications or operating conditions. If the parameters cannot be determined without conducting a test, the failure data obtained from the test can be used to get the model parameters.
For this example, the life of an electronic component is considered to be affected by temperature. The component is tested under temperatures of , and Kelvin. The usage temperature level is Kelvin.
Figure 4 shows the data and calculated parameters. Figure 5 shows the reliability plot and the estimated B10 life at the usage temperature level.
Figure 5: Reliability vs. Time plot and calculated B10 life.

From Figure 4, we can see that the estimated activation energy in the Arrhenius model is 0. Using this equation, the parameters B and C calculated by ALTA can easily be transformed to the parameters described above for the Arrhenius relationship.

As mentioned above, time-to-failure data from life testing may be incorporated into some of the empirical prediction standards i. However, in this section of the article, we are using the term life testing method to refer specifically to a third type of approach for predicting the reliability of electronic products.
With this method, a test is conducted on a sufficiently large sample of units operating under normal usage conditions. Times-to-failure are recorded and then analyzed with an appropriate statistical distribution in order to estimate reliability metrics such as the B10 life.
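Once a distribution has been fitted to the recorded times-to-failure, a metric such as the B10 life follows directly from the fitted parameters. The sketch below assumes a two-parameter Weibull distribution whose shape (beta) and scale (eta) have already been estimated; the fitting step itself is not shown, and the parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of surviving past time t: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_bx_life(x_percent, beta, eta):
    """Time by which x percent of units are expected to have failed.

    Solves 1 - R(t) = x / 100 for t; B10 life corresponds to x_percent = 10.
    """
    unreliability = x_percent / 100.0
    return eta * (-math.log(1.0 - unreliability)) ** (1.0 / beta)


# Hypothetical fitted parameters: shape 1.5, scale 10,000 hours.
beta, eta = 1.5, 10_000.0
b10 = weibull_bx_life(10, beta, eta)
print(f"B10 life: {b10:.0f} hours")
print(f"Reliability at B10: {weibull_reliability(b10, beta, eta):.2f}")
```

By construction, the reliability evaluated at the B10 life is 0.90, which provides a quick sanity check on any fitted model.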
As an example, suppose that an IC board is tested in the lab and the failure data are recorded. Figure 7 shows the Reliability vs. Time plot and the calculated B10 life for the analysis.

Figure 7: Reliability vs. Time plot and calculated B10 life for the analysis.

The life testing method can provide more information about the product than the empirical prediction standards.
Therefore, the prediction is usually more accurate, given that enough samples are used in the testing. The life testing method may also be preferred over both the empirical and physics of failure methods when it is necessary to obtain realistic predictions at the system rather than component level.
This is because the empirical and physics of failure methods calculate the system failure rate based on the predictions for the components e. This assumes that there are no interaction failures between the components but, in reality, due to the design or manufacturing, components are not independent.
For example, if the fan is broken in your laptop, the CPU will fail faster because of the high temperature. Therefore, in order to consider the complexity of the entire system, life tests can be conducted at the system level, treating the system as a "black box," and the system reliability can be predicted based on the obtained failure data.
Bellcore/Telcordia Reliability Prediction in Lambda Predict
These predictions provide necessary input to system-level reliability models for predicting expected downtime per year and system availability. Issue 4 of SR provides all the tools needed for predicting device and unit hardware reliability, and contains important revisions since the document was last issued. The Telcordia Reliability Prediction Procedure has a long and distinguished history of use within and outside the telecommunications industry. Issue 4 of SR provides the only hardware reliability prediction procedure developed from the input and participation of a cross-section of major industrial companies. This lends the procedure and the predictions derived from it a high level of credibility free from the bias of any individual supplier or service provider.
These standards use a series of models for various categories of electronic, electrical and electro-mechanical components to predict steady-state failure rates, which are affected by environmental conditions, quality levels, electrical stress conditions and various other parameters. The models allow reliability prediction to be performed using three methods for predicting product reliability:

The Telcordia standard also documents a recommended method for predicting serial system hardware reliability. It contains instructions for suppliers to follow when providing predictions of their device, unit, or serial system reliability.