The Final Test
Edmond L. Kyser, Ph.D.
Cisco Systems
Copyright belongs to the author
Abstract
Manufacturers of complex electronic equipment invariably have a 'Final Test' - a test where product that passes is shipped to the customer, and product that fails is sent to rework for repair. Traditionally, these Final Tests fall into three categories: Burn-In (power on and functional at room temperature for an extended period of time), Conformance (power on and functional at design limit of temperature and possibly other stresses), and Overstress (power on and functional at stresses beyond design limits). The decision of which Final Test to employ is analyzed in this paper in terms of Type I and Type II errors (false failures and false passes), and the conceptual biases commonly found in this decision process.
Introduction
In the manufacturing process for complex electronic products, there are many test operations, each of which can be considered a 'confirmation' of the immediately preceding assembly operation. At the conclusion of each test, there are two possible outcomes - the product is either 'shipped' to the next assembly operation or is returned for rework. The 'good' or 'normal' outcome is that the board passes.
In statistical terms [1], we call the good outcome the Null Hypothesis. The Test is designed to confirm the Null Hypothesis. Here we will be concerned with the design of the Test - specifically, what are the criteria for the board to be judged 'good', and how accurate is the Test in confirming this 'goodness'? Setting aside for the moment the definition of 'good', the situation is as shown in Table 1.
 | Test: Positive | Test: Negative
Good board | Correct result | Type I error
Bad board | Type II error | Correct result
Table 1. Type I and Type II errors
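The four cells of Table 1 can be tallied from test records to estimate the two error rates. The following is a minimal sketch, not the paper's method: it assumes each board's true state is eventually known, it defines each rate relative to the boards of that true state, and the lot data and the name `error_rates` are hypothetical.

```python
from collections import Counter

def error_rates(results):
    """results: iterable of (board_is_good, test_passed) boolean pairs."""
    counts = Counter(results)
    good = counts[(True, True)] + counts[(True, False)]
    bad = counts[(False, True)] + counts[(False, False)]
    # Type I: good boards that the test failed (false fails).
    type_i = counts[(True, False)] / good if good else 0.0
    # Type II: bad boards that the test passed (false passes).
    type_ii = counts[(False, True)] / bad if bad else 0.0
    return type_i, type_ii

# Hypothetical lot: 90 good boards (2 falsely failed), 10 bad boards (3 falsely passed).
lot = ([(True, True)] * 88 + [(True, False)] * 2 +
       [(False, False)] * 7 + [(False, True)] * 3)
type_i_rate, type_ii_rate = error_rates(lot)
print(f"Type I rate: {type_i_rate:.3f}, Type II rate: {type_ii_rate:.3f}")
```

In practice the true state of a passed board is only revealed by field returns, so the Type II rate can only be estimated after the fact.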
A Type I error - designating a good board as bad - is called the producer's risk, or a false fail. A Type II error - designating a bad board good - is called the consumer's risk, or a false pass. In this case, these are the only types of error possible. A good test will minimize these errors.
Burn-in, Conformance and Overstress
It is common for complex electronic products to exhibit the reliability characteristic shown in Figure 1. A high initial failure rate (or parts return rate) is followed by a more-or-less asymptotic decline over a period of time to a steady failure rate, sometimes called the 'mature' failure rate. This is the same characteristic as the front end of the infamous 'bathtub curve'. However, the data in Figure 1 do not show an increasing failure rate as time passes, indicating that there is no evidence of 'end-of-life', or wearout. In what follows, we will use the generic characteristic of Figure 2 for the products being analyzed.
Figure 1. Part Replacement rate vs. Time for 5 computer products
Figure 2. Generic Field Failure Characteristic
Figure 2 is the metric of success for the benefits of the final Test. We expect product that passes the final test to perform better in the field - that is, to have fewer early life failures, and a larger MTTF.
Consider now three candidates for Final Test - Burn-in, Conformance, and Overstress.
Final Test | Characteristic
Burn-in | Functional test at ambient or elevated temp for extended time
Conformance | Functional test at design limits
Overstress | Functional test beyond design limits
Each of these candidates for Final Test has its champions, its history, its advantages and drawbacks [2,3,4]. The task before the manufacturing test engineer is which to recommend for a specific product, and the task set here in this paper is to systematize the decision.
At this point we have introduced two measures of 'badness' - Type I and Type II errors - and one measure of 'goodness' - MTTF improvement, or early life failure reduction. The three Final Test proposals differ primarily in the level of stress applied. In addition, the error rates at Final Test will depend on the stress level - assuming that the stresses selected are relevant (more on this later). This is illustrated in Figure 3.
Figure 3. Type I and Type II errors vs. Stress Level
Several comments are in order relating to Figure 3. Most importantly, it is a metaphor - a model of reality. As the advertisements say, your results may vary. However, the general characteristics of Figure 3 are representative of the products that the author has experience with. If Burn-in is performed at ambient conditions with stresses that are close to end user reality, then the Type I error rate (false fails) should be essentially zero. However, Type II errors (false passes) will be relatively high (remember that ALL field failures are by definition Type II errors). The appeal of Burn-in is that it is a low-cost, low-engineering-content, low-risk test - and the cost of test escapes is deferred until failed product is returned. It does not typically assure (or test for) product functionality over the entire design stress range. The producing company has promised performance that it has decided not to confirm in a Final Test.
A Conformance Test is designed to alleviate the deficiencies of Burn-in, possibly at the cost of increasing Type I errors. For example, a product specified to operate from 0 to 50°C, 15% to 85% humidity, ±5% supply voltage, and some maximum load in terms of users or traffic, should be tested at all combinations of these extreme values for a comprehensive Conformance test. In practice, this is seldom done. One alternative is to test at two corners of the conformance space - the 'fast' corner of high temperature and low voltage, and the 'slow' corner of low temperature and high voltage. The expectation for a conformance test is that Type II errors will be reduced and Type I errors will increase, as compared to Burn-in. The appeal of conformance testing is that the stresses are identical with the design criteria, and that it should produce a more robust product than Burn-in.
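To see why the comprehensive version is seldom run, it helps to enumerate the corners. The sketch below uses the example limits from the paragraph above; the dictionary keys and variable names are illustrative assumptions, and maximum load is treated as a single fixed condition applied at every corner.

```python
from itertools import product

# Extreme values taken from the example specification in the text.
limits = {
    "temperature_C": (0, 50),
    "humidity_pct": (15, 85),
    "supply_voltage_pct": (-5, +5),
}

# Every combination of low/high extremes: 2**3 = 8 corners,
# each of which would be run at the specified maximum load.
all_corners = [dict(zip(limits, combo)) for combo in product(*limits.values())]

# The reduced two-corner plan described in the text.
two_corners = [
    {"name": "fast", "temperature_C": 50, "supply_voltage_pct": -5},
    {"name": "slow", "temperature_C": 0, "supply_voltage_pct": +5},
]

for corner in all_corners:
    print(corner)
```

Each additional two-sided stress doubles the number of corners, which is one reason the two-corner plan is attractive.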
If only Burn-in and Conformance were represented in Figure 3, the curious engineer would likely ask the following questions: where is the optimum stress level - and could it possibly lie beyond the design/conformance level? After all, the curves for Type I and II errors are continuous through the design limit stress level - the best stress level could very likely be beyond the design limit. The real answers will require an assignment of relative importance to Type I and Type II errors. The use of stresses beyond design limits - overstress testing - began in the early 1980s, notably in the US Armed Forces, in an attempt to improve system reliability [5].
COSTS AND BENEFITS
The following cost-benefit analysis was developed in [4] as a method of 'netting out' all the positives and negatives relating to Environmental Stress Screening (ESS). The model allows for variables with unknown values to be estimated in terms of probability distributions, and the outcome calculated by Monte Carlo techniques. The output is a probability distribution of Net Present Savings (NPS) per board tested. The software used was ‘Analytica’, published by Lumina Decision Systems [6].
Figure 4 shows the relationships between the variables in the model. The costs of Type I errors are calculated in the node labeled ‘Cost of Test Failures’. The costs of Type II errors are calculated in the node labeled ‘total field Failure cost’. Since field failures occur, on average, at the MTTF, these must be discounted by the Cost of Capital to yield ‘Present Value of FF’, which can then be compared directly to ‘Cost of Test Failures’. The ‘NPS Importance’ node gives the relative importance of the variables in calculating Net Present Savings. This will be discussed in a following section.
Figure 4. Cost Model Variables and Relationships
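The discounting step can be illustrated with a small calculation. This is only a sketch: it assumes simple annual compounding at the Cost of Capital, which the paper does not spell out (the actual relationships are given in [4]), and the $1,000 cost and the function name `present_value` are hypothetical.

```python
def present_value(future_cost, annual_rate, years):
    """Discount a cost incurred `years` from now at a compound annual rate."""
    return future_cost / (1.0 + annual_rate) ** years

# Hypothetical $1,000 field failure occurring at a 5-year MTTF,
# discounted at a 15%/year cost of capital.
pv = present_value(1000.0, 0.15, 5)
print(f"Present value: ${pv:.2f}")
```

Under these assumptions a field failure five years out costs roughly half its face value in today's dollars, and the longer the MTTF, the smaller the present cost of each Type II error.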
It should be clear that cost considerations are not included in the definitions of Type I and Type II errors. The addition of cost in the model above allows us to evaluate the Net Present Savings per board generated by Final Test. The details of this model and the definitions of the mathematical relationships are given in [4]. Here we want to apply the model to 3 very different product scenarios in order to evaluate proposals to strengthen Final Test and to reduce Type II errors.
Table 4 shows the parameters for 3 products. Case 1 is a fault-tolerant high-end Server, Case 2 is a Controller with a small embedded processor, and Case 3 is a telecommunications Router. The question in each case is whether the Final Test should be strengthened to Overstress status.
Variable Name | Units | Comments | Case 1: Server | Case 2: Controller | Case 3: Router
Cost of Inventory | %/week | Includes Depreciation and liquidity effects | Lognormal(0.5, 1.5) | Lognormal(0.5, 1.5) | Lognormal(0.5, 1.5)
Test Failure Repair Cost | NP$ | Material and Labor costs for debug and repair | Lognormal(1500, 1.5) | Lognormal(8, 2) | Lognormal(1000, 2)
Time to Repair Test Failure | Weeks | WIP time | Lognormal(6, 1.5) | Lognormal(0.5, 2) | Lognormal(1, 0.5)
Replacement Cost of Field Failure | Future$ | Material and Labor (warranty costs) | Lognormal(2500, 1.25) | Linear(5, 10) | Lognormal(190+0.52H, 1.5)
MTTF of Unscreened Unit | Years | Mean Time To Failure | Lognormal(5, 1.5) | 8 | Normal(20, 2)
Impact of ESS on MTTF | % | Factor by which ESS improves unit MTTF | Normal(25%, 15%) | Normal(20%, 15%) | Normal(20%, 15%)
Operational Costs | NP$ | Variable cost only (no fixed costs) | Lognormal(50, 1.5) | 5 | Lognormal(20, 2)
Whole Product Cost | NP$ | Used to calculate inventory, depreciation, and replacement costs | 5000 | 100 | 8000
ESS Yield | - | Probability of passing ESS screen | 90% | 90% | 90%
IBP for Field Failure | Future$ | A measure of the intangible costs | 2500 | Linear(20, 40) | H/2
Cost of Capital | %/year | Time vs. money discount rate | 15% | 15% | 15%
Table 4. Input Variables for 3 Products
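For readers who want to reproduce inputs like those in Table 4 outside Analytica, the sketch below draws samples for one lognormal entry. It assumes the two arguments are the median and the geometric standard deviation; the paper does not state the parameterization, and one entry, Lognormal(1, 0.5), has a second argument below 1, which would be unusual for a geometric standard deviation, so this reading is uncertain. The function name `sample_lognormal` is hypothetical.

```python
import math
import random
import statistics

def sample_lognormal(median, gsd, n, rng):
    """Draw n lognormal samples given a median and geometric standard deviation."""
    mu = math.log(median)   # mean of the underlying normal
    sigma = math.log(gsd)   # standard deviation of the underlying normal
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Server "Test Failure Repair Cost", read as Lognormal(median=1500, gsd=1.5).
repair_cost = sample_lognormal(1500, 1.5, 20000, rng)
print(f"sample median: {statistics.median(repair_cost):.0f}")
print(f"sample mean:   {statistics.fmean(repair_cost):.0f}")
```

Because the lognormal is right-skewed, the sample mean sits above the median, which matters when a median is reported as the expected outcome.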
Once these values are input into the model, it calculates the probability distribution of Net Present Savings. For the Server, the results are shown in Figure 5.
Figure 5. NPS for Server
From Figure 5, the cumulative probability at break-even ($0 NPS) is 0.3, meaning that there is a 30% probability of a negative NPS (costs exceed benefits, and the Final Test loses money) and a 70% probability of a positive NPS. The Expected Value of NPS, defined here as the value at 50% cumulative probability, is about $180. We expect this version of Final Test to save us $180 per board tested. Therefore, this program should be implemented.
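The two numbers read off the cumulative curve, the probability of a loss and the 50% point, can also be computed directly from Monte Carlo output. The sketch below uses a short list of hypothetical NPS samples rather than the paper's data, and `summarize_nps` is an illustrative name.

```python
import statistics

def summarize_nps(samples):
    """Return (probability of negative NPS, median NPS) for a list of samples."""
    p_negative = sum(1 for s in samples if s < 0) / len(samples)
    return p_negative, statistics.median(samples)

# Hypothetical NPS samples in dollars per board (not the paper's data).
nps_samples = [-120, -40, -5, 30, 90, 150, 210, 260, 340, 480]
p_loss, median_nps = summarize_nps(nps_samples)
print(f"P(NPS < 0) = {p_loss:.0%}, median NPS = ${median_nps:.0f}")
```

The 50% point is strictly the median of the distribution; for a skewed NPS distribution it can differ from the arithmetic mean.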
Typically, a few uncertain inputs are responsible for most of the uncertainty in the final result. I have had opponents of overstress testing argue, for example, that the cost of repairing Type I failures would make the entire test worthless. This type of question is best answered by using Importance analysis – which, in statistical terms, is the absolute rank-order correlation between the sample of output values and the sample for each uncertain input. (See [6]).
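The importance measure described above, the absolute rank-order correlation between an input sample and the output sample, can be sketched in a few lines. This is a plain Spearman correlation written out by hand, not Analytica's implementation; the sample values and function names are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def importance(input_samples, output_samples):
    """Absolute rank-order (Spearman) correlation between one input and the output."""
    return abs(pearson(ranks(input_samples), ranks(output_samples)))

# Hypothetical samples: the output falls monotonically with input_a
# and has only a weak relationship with input_b.
input_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
input_b = [3.0, 1.0, 6.0, 2.0, 5.0, 4.0]
output = [50.0, 41.0, 33.0, 20.0, 12.0, 1.0]
print(importance(input_a, output), importance(input_b, output))
```

Taking the absolute value means an input that strongly lowers NPS ranks as high as one that strongly raises it; the measure says where the uncertainty comes from, not its direction.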
Figure 6 shows the Importance analysis for the Server. Here we see that ‘Impact of ESS on MTTF’ is an order of magnitude more important than ‘Test Failure Repair Cost’ – and thus, if the model is to be challenged or improved, one should focus on Impact of ESS.
Figure 6. Variable Importance for Server
Looking at Figure 7, NPS for an embedded Controller, the cumulative probability at break-even ($0 NPS) is 0.9, meaning that there is a 90% probability of a negative NPS (costs exceed benefits, and the Final Test loses money) and a 10% probability of a positive NPS. The Expected Value of NPS, defined here as the value at 50% cumulative probability, is about -$3.00. We expect this version of Final Test to lose $3 per board tested. Therefore, this program should not be implemented. Looking at the input values, we suspect that this conclusion is dominated by the high reliability of the unscreened board, and the low cost of a field failure.
Figure 7. NPS for Controller
Variable importance for the Controller is shown in Figure 8. Note that once again the impact of ESS on MTTF is the leading contributor to uncertainty in the final result of NPS. Also, importance is only calculable for uncertain variables, not input constants. For example, operational costs, which were uncertain for the Server and show up on its ‘importance’ chart, are input as a constant for the Controller and do not appear on its importance plot.
Figure 8. Variable Importance for Controller
Turning to the case of the Router in Figure 9, the cumulative probability at break-even ($0 NPS) is 0.4, meaning that there is a 40% probability of a negative NPS (costs exceed benefits, and the Final Test loses money) and a 60% probability of a positive NPS. The Expected Value of NPS, defined here as the value at 50% cumulative probability, is $52. We expect this version of Final Test to save us $52 per board tested. Here the risks are high, and the outcome is uncertain. A good policy would be to go back and re-evaluate the input parameters, or to run a preliminary test in an effort to reduce uncertainty.
Figure 9. NPS for Router
Router Variable Importance in Figure 10 shows Impact of ESS again as the leading cause of uncertainty, but now Test Failure Repair costs are significant.
Figure 10. Variable Importance for Router
Finally, we can extract from the preceding analyses the costs of Type I and Type II errors for each of the 3 cases being examined. These results are shown in Table 5. Note particularly how expensive a Type II error is compared to Type I, and the importance of including the cost of money in discounting Type II errors to current dollars for MTTF years. I believe that information like that given in Table 5 can serve well to overcome the reluctance to consider overstress testing. The bias against overstress is often based on fear of creating additional Type I errors.
 | Server | Controller | Router
Type I error (cost of test fail) | $201.00 | $1.13 | $146.00
Type II error (field failure) | $5,063.00 | $37.50 | $8,566.00
MTTF (years) | 5 | 8 | 20
Type II error in current $ | $2,486.00 | $14.62 | $501.00
Table 5. Cost of Type I and Type II errors
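The comparison the text draws from Table 5 can be made explicit by dividing the discounted Type II cost by the Type I cost for each product. The figures below are copied from Table 5; the dictionary layout and names are just one convenient arrangement.

```python
# Values copied from Table 5 (dollars per error).
table5 = {
    "Server": {"type_i": 201.00, "type_ii_current": 2486.00},
    "Controller": {"type_i": 1.13, "type_ii_current": 14.62},
    "Router": {"type_i": 146.00, "type_ii_current": 501.00},
}

# Ratio of the discounted field-failure cost to the in-house test-failure cost.
ratios = {name: row["type_ii_current"] / row["type_i"] for name, row in table5.items()}
for name, ratio in ratios.items():
    print(f"{name}: a Type II error costs about {ratio:.1f}x a Type I error")
```

Even after discounting, the ratio is about 12 for the Server and the Controller and about 3.4 for the Router.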
Conclusions
Optimizing the Final Test can be done in a systematic manner, using statistically represented variables where the true values are unknown at the time of the analysis. The decision of which 'Final Test' to employ can be analyzed in terms of Type I and Type II errors (false failures and false passes), and calculation of Net Present Savings per board to be realized by the test.
Overstress tests can be more effective than either burn-in or conformance, depending on product parameters such as MTTF and cost of repair. The analysis shows that improving the MTTF of the product is the predominantly important variable.
Each product must be analyzed. Variations from product to product are key, with variables assuming vastly different importance for different products.
Type II failures (field failures) are often an order of magnitude more costly than Type I failures (test failures in-house).
References
1. Statistical Analysis for Engineers and Scientists, J. Wesley Barnes, McGraw-Hill, 1994
2. Burn-In Testing, Dmitri Kececioglu and Feng-Bin Sun, Prentice Hall, 1997
3. Burn-In, Finn Jensen and Neils Eric Petersen, Wiley, 1982
4. “The Politics of Accelerated Stress Testing”, Edmond L. Kyser, Eugene Hnatek and Mark Roettgering, Proc. IEST, March 2000
5. Environmental Stress Screening, Dmitri Kececioglu and Feng-Bin Sun, Prentice Hall, 1995
6. Analytica Users Guide, Lumina Decision Systems, 1999
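[Repost] Reliability and Environmental Testing of Sn-Ag Lead-Free Solder Joints
Sn-Ag solder is currently the most promising lead-free solder in commercial adoption. Using high-temperature tests, thermal cycling tests, and combined thermal-vibration stress tests, this study examined the reliability of solder joints formed between Sn-3.5Ag-0.75Cu or Sn-2Ag-0.75Cu-3Bi solder and Sn-10Pb or Ni/Pd/Au lead plating. In addition, sample PCBs mass-produced with Sn-3.5Ag-0.75Cu solder underwent environmental testing and a field reliability test lasting three years, and their durability was compared with that of conventional Sn-Pb eutectic solder.
Test Method
Table 1 lists the reliability test conditions. The assembled component was a QFP with daisy-chain internal wiring (0.5 mm lead pitch, 100 leads). The copper leads of the QFP had either conventional Sn-10Pb plating or Ni/Pd/Au plating.
Table 1
Figure 1
Figure 1 shows the method used to measure the electrical continuity of the solder joints during the thermal shock test. Using a scanner and a milliohm meter, the continuity of the daisy-chain QFP was measured by the 4-wire method.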
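Results and Discussion
Figure 2 shows the change in solder joint strength after the high-temperature test. For Sn-3.5Ag-0.75Cu solder and Sn-Pb eutectic solder, joint strength decreased with increasing time and cycle count regardless of plating type. On components with Ni/Pd/Au plating, the joint strength of Sn-2Ag-0.75Cu-3Bi did not differ much from that of the other solders; on components with Sn-10Pb plating, however, its joint strength decreased markedly.
Figure 3 shows the Weibull plot for leads with Sn-10Pb plating in the combined environment test. Results for Sn-3.5Ag-0.75Cu and Sn-Pb eutectic solder differed little, with 50% failure (L50) at 40 to 50 cycles. PCBs using Sn-2Ag-0.75Cu-3Bi solder began to fail about 2 hours after the start of the test, and by 30 hours all evaluation PCBs with this solder had failed. For this solder on Sn-10Pb plating, L50 occurred at 15 cycles. When this solder was used on Ni/Pd/Au plating, however, no failures had occurred after 100 hours. When only vibration was applied, without high temperature, no failures occurred after 500 hours of testing regardless of plating type. These results indicate that temperature is the main cause of solder joint degradation.
Figure 2
Figure 3
Figure 4 shows backscattered electron images of solder joint cross-sections after 100 hours of combined thermal-vibration testing. In Sn-2Ag-0.75Cu-3Bi joints, cracking occurred within the solder and at the solder/PCB and solder/lead interfaces. In Sn-3.5Ag-0.75Cu joints, cracking occurred at the solder/PCB interface. The Sn-Pb eutectic solder microstructure coarsened, and cracks formed within the solder.
Figure 5 shows how the thickness of the interfacial intermetallic compound layer changed with test time. The intermetallic layer at the Sn-2Ag-0.75Cu-3Bi/Sn-10Pb plating interface thickened continuously with time at high temperature, growing faster than the layer at the Sn-3.5Ag-0.75Cu/Sn-10Pb interface. With Ni/Pd/Au plating, the intermetallic layer thickness differed little between solders.
Figure 4
Figure 5
Figure 6
The interface structure of Sn-2Ag-0.75Cu-3Bi on the different platings was analyzed. Figure 6 shows elemental area maps of the interface after 2000 hours of high-temperature testing. The results indicate that when the Bi-containing solder is used with Sn-10Pb plating, rapid intermetallic growth at high temperature leads to a drop in joint strength.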
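Testing of Mass-Produced PCBs
The temperature range for reliability testing of the mass-produced sample PCBs was set to -25°C to 80°C. The tests included a high-temperature test, a high-temperature/high-humidity test, and a thermal cycling test (-25°C/80°C, 60 minutes per cycle).
PCBs using Sn-Pb eutectic solder began to show open circuits at 2000 cycles, and all samples were open at 3000 cycles. The resulting functional problem was mainly failure of the digital circuitry. PCBs using Sn-3.5Ag-0.75Cu solder still showed no failures after 3000 cycles.
Alongside the reliability tests, we also conducted field reliability testing in actual products. The field reliability test included actual operation carried out and evaluated at the Utsunomiya and Fukuchiyama plants. As of July 2003, this field test had been running for nearly 3 years, with close to 20,000 operating hours. The field reliability test showed no product failures.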
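Conclusions
1. Regardless of plating type, the joint strength and reliability of Sn-3.5Ag-0.75Cu solder are comparable to those of Sn-Pb eutectic solder.
2. When Sn-2Ag-0.75Cu-3Bi solder is soldered to conventional Sn-10Pb plating, joint strength decreases.
3. Sample PCBs mass-produced with Sn-3.5Ag-0.75Cu solder can be assured to withstand at least 3000 cycles of temperature testing and at least 20,000 hours of field reliability testing.
This paper was presented at the Sixth International Conference on Reliability, Maintainability and Safety. Huang Weidong of ESPEC Test Technology (Shanghai) Co., Ltd. took part in the research. For the full text, see the conference proceedings, pp. 245-249.
Copyright belongs to the author. http://www.reliaonline.com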
[Repost] What is a “Bellcore test?”
Telecommunications suppliers for years have used test chambers to assure the quality of their equipment, making the phone system in the USA extremely reliable and cost-efficient. The regional phone companies have set common standards for quality, centrally issued by Bellcore (now called Telcordia). Up until the last couple years, these standards have remained in obscurity. But with today’s complete rebuilding of our telecommunications infrastructure with fiberoptics, they have come to the forefront.
Even if you aren’t involved in this industry, I think you may find it interesting to learn more. As more and more communications-related companies are created, the need to do this type of testing has increased dramatically (fiberoptic components are quite sensitive to environmental conditions, as opposed to copper-wire). Also increasing is the number of people unfamiliar with this type of test, but required to comply.
There is no single Bellcore test—each test is specific for a type of component and its application. They are all numbered. Most are called Generic Requirements, and are prefixed by GR. The name “Bellcore” has stuck to them, although they are now all officially called Telcordia standards.
For example, one of the popular standards is GR-1221-CORE for fiberoptic passive components. Within this standard are several different environmental tests including simulated storage at extremes, operational tests, and thermal shock tests.
In trying to comply with some of the specific test requirements, it has been our experience that the descriptions of the environmental conditions in the standards are not always written with the operation of a test chamber in mind. More likely, they describe what was programmed into the test chamber, without considering what the test chamber actually does.
For example, one test (GR-1209-CORE 5.1.2) requires going from –40 to 75°C. At 2°C, the humidity control is required to start, controlled at 80% ±2%. In practice, when the humidity system is turned on, it takes a few minutes to heat up the water to generate moisture, then a little while to gain control at 80%. Meanwhile the temperature has increased, as required by the program. So the chamber never actually achieves the beginning condition of 2°C/80% as the standard indicates!
I share this with you to let you know that following a test standard still requires you to think. What is acceptable to you, your company, and your customer? Only you can decide. Luckily, it has been our experience that those involved in the requirement for these tests have been reasonable and realistic.
Good luck and bye~
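Weather: sunny. Mood: gloomy.
A colleague is leaving... She is one of the sisters I get along with best, so it feels as if I were the one resigning~~
She wrote in her email: Dear sisters, things have gone badly for me at work, so I have no choice but to leave you. Don't be too sad.. Thinking back to when I first arrived, ........(5000 words omitted)......... I will miss you! byebye!
9m is right: who gets through life without being tired? Who works for others without putting up with grief? Pressure is everywhere, and you have to learn to manage it yourself..
I like watching "The Gods Must Be Crazy" (上帝也瘋狂), and I especially like one passage in it: humans create favorable environments, yet find it harder and harder to adapt to what they themselves have created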
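Not Going Home This Spring Festival
This afternoon A-Fei messaged me to say his mom is coming to see him tomorrow. How lucky; I'm envious. I haven't seen my mom in almost two years. When I miss my parents I often phone once a day, though in the last few months, because Dad is there, I have called less.
Last night Dad called again, and for some reason I brushed him off with a few words and hung up. I don't know when I started avoiding conversations with Dad. Maybe it's because I haven't done well enough, or rather have done too badly, so I feel too ashamed to face the folks back home. Mom asked whether I'm coming home for Spring Festival; she has been asking since May, probably a thousand times by now. I said I can't decide yet, and that I really don't want to come back at Spring Festival: the tickets are too expensive, there are too many people, and I get badly carsick. Mom said, so are you coming or not? I said, I guess not, I'll come next year. Mom was very disappointed and said, don't you want to see me? I really want to see you. I said, oh, I'm still the same as ever, hehe~ and laughed it off. In truth, of course I miss home and miss Mom. But~~ I don't know why, I just don't want to go back.
Maybe it's some things Dad once said. They didn't really hurt me; I just feel that I'm too useless. As the eldest daughter, I feel I should shoulder everything for the family. Dad is old, really old; the last time I saw him my heart truly ached. Yet it seems I can't do anything more for them. Whenever they nag, I can only bear it silently and listen quietly. I can't do more, so listening to their nagging is my way of atoning.
Xiao Lin is married, found a pretty girlfriend, and bought an apartment. He is my primary school classmate and my neighbor. My parents bring him up in front of me now and then; yesterday they again said that Xiao Lin is about to get married, and so on. I wasn't in the mood to listen. I only wonder when I, too, can be like Xiao Lin, a good child to my parents~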
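Locations of Traffic Violation Cameras in Tianjin! Drivers, Take Note!
As the saying goes, "walk by the river often enough and your shoes will get wet." Everyone drives around every day and will inevitably run into police officers with cameras. At great personal risk I have obtained the locations of some of the camera points, for the benefit of all you guys and gals.
Where only a road name is given, officers patrol and film along the whole road; intersections are usually covered by both fixed cameras and mobile officers. I hope everyone will first of all be strict with themselves and not break the rules, and at the same time keep their losses down and not donate all their hard-earned money to the traffic authorities!
Latest inside information!
Heping District:
Xinxing Road: violating temporary stopping rules
Diantai Road: violating temporary stopping rules
Jiefang Road: violating motor vehicle parking rules
Liuzhou Road: violating temporary stopping rules
Nanjing Road: violating temporary stopping rules
Jiefang / Zhangde: stopping in the crosswalk hatched zone
Rongye Street / Zhakou Street: driving the wrong way
Xing'an Road / Duolun Road: disobeying signs and road markings
Rongye Street: not driving in the prescribed lane
Zhang Zizhong Road: violating motor vehicle parking rules
Haiguangsi traffic post, north side: violating the rules for queued slow-moving traffic
Dagu Road: violating temporary stopping rules
Xining Road: violating temporary stopping rules
Xikang Road: violating temporary stopping rules
Nanjing / Xuzhou: disobeying signs and road markings
Weijin Road, in front of the radio station: stopping in the crosswalk hatched zone
Liuzhou Road / Tongguan Road intersection: failing to use turn signals as required
Jiefang North Road: driving the wrong way
Yingkou Road: violating lighting rules; Qixiangtai Road: illegal parking
Jiefang North Road, junction between the First Hotel (第一饭店) and the Fengguang Restaurant (峰光酒楼): there is a camera on the upper floor of the Fengguang Restaurant
Jiefang North Road from Tai'an Road toward Qufu Road: take care not to cross the yellow line
Kunming Road: violating temporary stopping rules; Diantai Road is one-way, with a camera at its junction with Weijin Road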
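Nankai:
Shuishang North Road / East Road intersection: disobeying signs and road markings
Nankai Sanma Road: violating temporary stopping rules
Yingshui Road / Feihong Road intersection: violating motor vehicle parking rules
Feihong Road, below Building 1 of Jiuhuali: violating motor vehicle parking rules
Huanghe Road, in front of the Nankai District Government: stopping in the crosswalk hatched zone
Anshan West Road: violating temporary stopping rules
Chengxiang Middle Road: violating motor vehicle parking rules
Nanma Road: violating motor vehicle parking rules
North of the overpass at the Balitai long-distance bus station: driving the wrong way
Dongnanjiao: running a red light
Nanfeng Road / Yixingli intersection: driving the wrong way
Changjiang Road, in front of the No. 3 Bus Depot (公交三厂): stopping in the crosswalk hatched zone
Under the Fukang Road bridge (Wangdingdi): driving the wrong way
Huayuan, Xinyi Road, right turn: violating lighting rules
Xihu Road, Nankai District: violating temporary stopping rules
Baidi Road: violating temporary stopping rules
Santan Road: violating temporary stopping rules
Xihu Road / Weijin Road junction: violating driving rules
Hedong:
Kunlun North Road, under the Weiguo Road overpass: disobeying signs and road markings
Tianshan Road: violating temporary stopping rules
In front of the KFC at the Hedong Lotus supermarket (易初莲花): illegal parking
Huachang Street / Xinzhao Road intersection: disobeying signs and road markings
Bawei Road / Bajing Road intersection: driving on the road outside the permitted hours
Guangning Road / Jintang Road junction, Hedong
Section between Liuwei / Liujing and Chifeng Bridge: not driving in the motor vehicle lane
Zhangguizhuang Road, beside Xuelian Bridge: driving the wrong way
Hedong Home World (that is, Sixth Avenue): wrong-way driving is filmed
Weiguo Road / Shaliu Road intersection: driving the wrong way
Hebei:
Jinzhongqiao Street: violating temporary stopping rules
Haihe East Road: disobeying signs and road markings
Zhongshan North Road: violating the rules for queued slow-moving traffic
Zhongshan Road sidewalk: violating temporary stopping rules
Haihe East Road / Ping'an Street: driving the wrong way
Shizilin Street: driving the wrong way
Under the Pujihe Road overpass: driving the wrong way
On the Pujihe Road overpass: driving the wrong way
Jinzhonghe Street: violating motor vehicle parking rules
Jinhai Road: disobeying signs and road markings
Shizilin Street: no left turn at the junction directly opposite Milan Jiayuan
Turning left onto Hualong Road after coming off the Ligonglou bridge: crossing the line
Under Qinjian Bridge: when turning left, take the turn wide, otherwise you are driving the wrong way
East Station / Jianguo Road junction: illegally dropping off passengers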
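Hexi:
Yong'an Road: violating temporary stopping rules
Youyi North Road: violating temporary stopping rules
Jiefang South Road: violating temporary stopping rules
Limin Road: violating temporary stopping rules
Zijinshan Road / Hanjiang Road intersection: running a red light
Longchang Road: violating temporary stopping rules
Fujian / Qiongzhou: driving on the road outside the permitted hours
Jianshan Road at the Facilities Office (设施处) junction: stopping in the crosswalk hatched zone
Huangpu South Road: violating temporary stopping rules
Dagu South Road / Hongze Road junction: driving the wrong way
Heiniucheng Road / Jianshan Road junction
Xiawafang, Qiongzhou Road from Jiefang South Road toward Hexi Hospital: rule violations
Binshui Road, Hexi District, in front of McDonald's: driving over the hatched grid lines
Hongqiao:
Yihao Road: violating motor vehicle parking rules
Xiqing / Fuxing: not driving in the prescribed lane
North of Jinhua Bridge: disobeying signs and road markings
Xiqing Road, exit of the Jiale supermarket: disobeying signs and road markings
Fuxing Road / Xianchunyuan West Street intersection: disobeying signs and road markings
Zhonghuan / Xiqing: disobeying signs and road markings
West side of the Guangrong Road / Xianyang North Road intersection: disobeying signs and road markings
Jieyuan Road: violating motor vehicle parking rules
Jinianguan Road / Pingjin Road intersection, Hongqiao: disobeying signs and road markings
Xiqing Road, in front of Dengfa: illegal U-turn
Jieyuan West Road / Yejin Road junction: running a red light
Xiqing Road, in front of Dengfa: driving in the non-motor-vehicle lane
Xianyang Road / Guangrong Road intersection: using the non-motor-vehicle lane to turn
In front of Ancient Culture Street: illegal parking
If you are recorded committing a violation inside the city, you must pay the fine at one of the detachments in the six urban districts.
Where and when traffic violations are handled: the person concerned may go to the following locations Monday to Friday, 8:00-12:00 and 14:00-17:30.
Heping Detachment: 35 Kangding Road, Annex No. 2, Heping District
Hexi Detachment: 35 Dongting Road, Annex No. 1, Hexi District
Hedong Detachment: 59 Zhangguizhuang Road, Hedong District
Hebei Detachment: 7 Fuqiang Road, Wangchuanchang, Hebei District
Hongqiao Detachment: inside the Hongqiao Motor Vehicle Inspection Line yard, Guangrong Road / Baokang Road, Hongqiao District
Nankai Detachment: 13 Ya'an Road, Nankai District
Weiguo Road Brigade: 93 Weiguo Road, Hedong District
Nanma Road Brigade: 899 Chengxiang Middle Road, Nankai District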
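A Bear Alone Is Not Lonely; Missing a Bear Is
Weather: cold. Mood: calm.
The son of the colleague at the desk opposite brought in a little something-or-other bear
It looks like a tiny, tiny little mouse
Its small eyes are bright and shiny
Its small ears are pricked up
Its small tail is just a tiny stub
Its two small front paws often lift up as it looks all around
Or tuck against its chest so that it stands as an oval ball
Then it slowly lowers its little head, closes its eyes, and curls into a gray-black ping-pong ball to sleep
When it eats, it also holds the food in both front paws and nibbles quickly
As if afraid someone might snatch it away
After eating it washes its little face with its paws and smooths its whiskers
Every day it just eats, sleeps, and wanders about
After watching for a long time, I suddenly felt he was so lonely
No companion to talk to, no companion to play with, living the lonely life of a single bear
Does he have worries? Does he feel lonesome?
Perhaps a bear alone is not lonely; it is missing a bear that is lonely...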
To Myself
May you soon get that so-called fixed-position management chart (定置管理图) sorted out
[Repost] The Politics Of Accelerated Stress Testing
Copyright belongs to the author
The Politics Of Accelerated Stress Testing
Edmond L. Kyser, Eugene R. Hnatek, and Mark H. Roettgering
Compaq Computer Corporation
Enterprise Computing Group - Tandem Business Unit
Cupertino, California
BIOGRAPHIES
Edmond L. Kyser is Principal Member of the Technical Staff for the Tandem Division of Compaq, where he has technical responsibility for Accelerated Stress Testing. He holds eight US patents and has published 12 articles, nine on Accelerated Stress Testing. His Ph.D. is from UC Berkeley in Applied Mechanics.
Eugene R. Hnatek is director of the Tandem Product Evaluation Center where he is involved in complete hardware product assurance activities from early design through first customer ship. In this regard, he is intimately involved with HALT and ESS processes. Prior to this assignment he was component Engineering Manager at Tandem. He is a recognized authority on integrated circuit quality and reliability having published 11 books on the topic.
Mark H. Roettgering is a Senior Member of the Technical Staff for the Tandem Division of Compaq, where he serves as a program manager and as an internal consultant on strategic and operational issues. Prior to this assignment, he worked on fault-tolerant system design and hardware quality assurance at Tandem. Mark holds a B.S. in Electrical Engineering from UC Davis and an M.S. in Engineering Economic Systems & Operations Research from Stanford.
ABSTRACT
The technical literature and various technical conferences delve into the myriad details of the ESS process, the ESS profiles to be used for testing, the required equipment characteristics, etc. Most everything that can be written about the virtues of ESS and the inherent technical details has been written.
We contend that it is not the technical aspects of ESS that dominate decision-making: The real issues, for most companies, are of a political nature. ESS implementations become political when the functional organizations that bear the short-term costs of ESS do not get credit for the long-term benefits. Various factions within most large corporations rise to the surface to question processes like ESS from a self-serving viewpoint. Justifying the need for continuing with ESS eats up a lot of time in meetings, evaluating databases and developing position presentations. In this paper we discuss these commonly encountered political issues, provide a process for resolution of these issues, and conclude with recommendations for corporate ESS management.
KEYWORDS
Environmental Stress Screen (ESS), Net Present Value, Uncertainty, Decision-Making.
BACKGROUND
Today’s fast time to market and concern with low price may be taking our focus off quality and reliability. Frank Burge of Electronic Engineering Times in his September 27, 1999 editorial put it this way. “In a world where price is king, are we painting ourselves into a corner—a corner where design quality gives way to price or time, eliminating steps in the design verification/test process or choosing suppliers strictly on price? Are we back to making the numbers at any cost?”
Figure 1: Product Flow for AST Programs
The decision whether or not to perform ESS on a specific product is a typical example of the quality vs. cost problem with which many companies struggle, including our own. One of the problems in being able to make a decision based on data is the fact that very little real data (whether from current or equivalent products) is available to determine the value of ESS. Typically at stake are millions of dollars in investment capital, thousands of square feet of manufacturing floor space, tens of person years, and the reputation for quality and possibly the profitability of the corporation. A typical product flow diagram for Accelerated Stress Test (AST) processes is shown in Figure 1. Many separate stakeholders of the corporation are involved in this complex process, the core of which is manufacturing ESS: Product Development, Manufacturing, Field Service, Engineering Services, Sustaining Engineering and Information Services. Figure 2 illustrates a common hierarchy of these groups, each of which typically has its own agenda and point of view. Traditional guidelines, established product requirements documents, and standard procedures may not be sufficient or appropriate. Benchmarking is difficult. Evangelizers for specific approaches to increasing reliability are quick to offer their services and opinions, often at loggerheads with one another. Industry standards are rare and often ambiguous. Perhaps most significantly, the benefits (and the associated costs) realized from the program do not accrue proportionately to the functional units that bear the costs.
Figure 2: Generic Corporate Reporting Structure
The goal of an ongoing AST program, such as implementation of manufacturing ESS, is to make cost effective improvements in the field reliability of the hardware being tested. Figure 3 shows a normalized field failure distribution for five recent Tandem products, all of which undergo 100% manufacturing ESS.
Figure 3: Field Data - Part Replacement Rate
Figure 4a: ESS Support
Figure 4b: ESS Opposition
All of the products represented in Figure 3 show the same pattern of a high initial return rate that decreases more or less asymptotically to a stable return rate in about two years. This is a classic characteristic of products that are most likely to benefit from an ESS program.
SURVEY OF ATTITUDES ON ESS
During a recent IEEE workshop on Accelerated Stress Testing, we conducted a survey of attitudes towards AST to determine if there was a ‘common experience’ among industry practitioners that could be leveraged as the science evolves. The issue was defined as “Within your company, where do you see support for or opposition to ESS, and why?” The organizational results are summarized in Figures 4a and 4b. The respondents’ reasons behind the support and opposition are shown in Tables 1a and 1b.
As the comments in the tables indicate, many of the reasons given are similar and can therefore be combined. The resulting ‘grouped’ categories of opposition and support are shown in Tables 2a and 2b. In cases where the reasons appeared to be ambiguous, require other processes to be considered, or deal with educational or organizational issues, the category ‘out of scope’ was used. ‘Out of scope’ does not imply that the reasons are invalid, just that they will not be addressed in detail in this paper.
Table 1a: Reasons for Supporting ESS
key | # | Stated Reason | Comments
a | 11 | Increased reliability / quality | Hard to measure - hard to quantify benefits - compare to n
b | 9 | Sales advantage / customer satisfaction | Same as a, but more difficult to quantify
c | 6 | Reduce field service costs | Equivalent to a
d | 4 | Reduce DOA / Early life fails | Equivalent to a
e | 3 | Identify failure modes in-house | Benefits seen only by redesigning to avoid failure modes
f | 2 | Better Product | Equivalent to a
g | 2 | Reduced field returns | Equivalent to a
h | 2 | More efficient than run-in | Weibull analysis can help determine this
i | 1 | Identify process failures | Equivalent to a + e
j | 1 | Improve yields | Equivalent to e
Table 1b: Reasons for Opposing ESS
key | # | Stated Reason | Comments
k | 12 | Additional cost | Virtually all opposition is cost based - Easier to measure than benefits
l | 10 | Outside of Component specs, design limits | Equivalent to n
m | 5 | Additional WIP Time | Additional step assumes all else equal - part of k
n | 5 | Decreases manufacturing yields | Easy to measure, easy to quantify. Compare to a
o | 5 | Afraid of damaging good product | See comments on a - effect on reliability is uncertain
p | 3 | Seen as critical of known good process | ‘Known good’ implies improved reliability is of no benefit or screen is no good
q | 2 | Don’t understand process | Education issue
r | 2 | Difficult test to run / diagnose failures | Part of k + t
s | 1 | Additional handling problem | Equivalent to k + m
t | 1 | Repair costs | Part of k, s
u | 1 | run-in more efficient | See h
v | 1 | Doesn’t believe in benefits | See a
As Tables 2a and 2b indicate, we are left with two potential sources of benefit, and a large bucket containing several cost factors: additional time, reduced manufacturing yields, test costs, and repair costs. The fear of product damage will be handled explicitly as part of the question of improved reliability.
Table 2a: Revised Reasons for ESS Support (Benefits)
key
#
Stated Reason
Comments
a
25
Increased reliability / quality
Hard to measure - hard to quantify benefits - compare to n
b
9
Sales advantage / customer satisfaction
Same as a, but more difficult to quantify
*
7
Out of scope
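The regrouping from Table 1a to Table 2a can be checked with a short tally. This is only a sketch: the counts come from Table 1a, but the mapping of each key to a group is an assumption inferred from the 'Comments' column (reasons marked 'Equivalent to a' go to a; e, h, i and j are treated as out of scope), chosen because it reproduces the totals in Table 2a.

```python
# Counts per stated reason, copied from Table 1a.
TABLE_1A = {"a": 11, "b": 9, "c": 6, "d": 4, "e": 3,
            "f": 2, "g": 2, "h": 2, "i": 1, "j": 1}

# Assumed grouping (not stated explicitly in the paper): "*" means out of scope.
GROUP_OF = {"a": "a", "c": "a", "d": "a", "f": "a", "g": "a",
            "b": "b",
            "e": "*", "h": "*", "i": "*", "j": "*"}


def regroup(counts, group_of):
    """Sum the per-reason counts into their assigned groups."""
    totals = {}
    for key, count in counts.items():
        group = group_of[key]
        totals[group] = totals.get(group, 0) + count
    return totals


if __name__ == "__main__":
    for group, total in sorted(regroup(TABLE_1A, GROUP_OF).items()):
        print(group, total)
```

Under this assumed mapping the totals agree with Table 2a (25, 9 and 7), and the 41 responses in Table 1a are all accounted for.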
Table 2b: Revised Reasons for ESS Opposition (Costs)
key
#
Stated Reason
Comments
k
19
Additional cost
Virtually all opposition is cost based - Easier to measure than benefits
m
15
Decreases manufacturing yields
Easy to measure, easy to quantify. Compare to a
n
5
Additional Time
Additional step assumes all else equal - part of k
o
5
Afraid of product damage
See comments on a - effect on reliability is uncertain
*
4
Out of scope
The survey results we have been discussing represent the opinions of 32 individuals from 22 corporations active in ESS. One of the most striking results is that the same issues, or organizations, appear in BOTH the positive and negative columns. Obviously, there are strong differences of opinion, and a lack of mutually acceptable (accurate and meaningful) data on which to base decisions. This is equivalent to stating that there is a high degree of uncertainty about many important aspects of a manufacturing ESS program. Without a structured methodology in place to address this uncertainty, a common ground within the corporation may never be found.
We maintain that what is needed is a common metric of success that accommodates all of the above ‘reasons’ – since all are valid in the opinion holder’s frame of reference. How is one to ‘net out’ all the above positives and negatives? The problem can be formulated as follows:
We propose that the metric of success is the dollar, and the method of ‘netting out’ the positives and negatives is to discount all cash flows to net present value and calculate a net present cost. Rather than taking a ‘best guess’ at exact amounts of the costs and benefits, all uncertainty should be explicitly stated so that conflicting opinions about possible outcomes can be addressed simultaneously. This process is detailed in the following section.
DECISION MODEL
A review of the pluses and minuses of ESS raised by the practicing community quickly reveals the major source of organizational problems that arise in an ESS implementation. The majority of the costs are easily identified and can be quantified with a high degree of accuracy. The manufacturing organization bears essentially all costs: by many common manufacturing metrics (end-to-end yield, inventory turns, WIP days, etc.), ESS is a negative. The benefits, on the other hand, while identifiable, are highly uncertain, difficult to quantify with any degree of accuracy, difficult to measure, require an explicit value statement by management, and are not immediately realized. The benefits are realized by the corporation as a whole, essentially through downstream cost-avoidance (lower field service and warranty costs) and through increased sales (product reputation).
It can be said that the problem with ESS acceptance is that it is high in both organizational and technical complexity. Technical complexity arises from the large number of strategic and operational decisions and processes that need to be in place for an ESS program to function in an efficient manner. Organizational complexity is inherent when...
Costs and benefits are realized by different groups.
Uncertainty allows a variety of advocates and opponents to champion opinions without fear of refutation by data.
There is a lack of strong cross-functional leadership from management.
Unfortunately, management attempts to solve problems of this nature by attacking the “people problem” first, through team-building, facilitation, consensus-building, etc. Despite these well-intentioned tactics, the underlying technical complexity invariably remains, and with it, the conflict. What is needed is a framework in which to solve the technical complexity first. Through creating a technically accurate and compelling business model, organizational disagreements can be addressed in a methodical and rigorous manner. Arguments like “Doesn’t believe the benefits” can be addressed by explicitly addressing which parts of the model are inconsistent with the beliefs of the opponent. Consequently, if the model is agreed to, and the inputs are agreed to, the resulting ‘netted-out’ cost or benefit of ESS should stand on its own, leaving nebulous and ambiguous arguments without legs.
We propose a normative decision model as the best method for solving the technical complexities of ESS. In this framework, we must first clearly identify what exactly we are modeling. Stated here: “What is the net present value of all future product costs for a unit which is to undergo ESS subtracted from the net present value of all future product costs for a unit which will not undergo ESS?” We call this quantity Net Present Savings or NPS.
The NPS we compute is a marginal savings on a per-unit basis. This eliminates the requirement to consider facility and capacity issues. We also assume that all other manufacturing processes remain the same: we do not explicitly consider the potential benefit of reduced run-in times here, although the framework allows for it. One last assumption is that we are discussing a particular ESS screen for a particular product: the selection or modification of screen parameters to maximize NPS is not performed here, although we have used the methodology to do parameter optimization at Tandem/Compaq. The model and theoretical results discussed in the following analysis were built using Analytica® analysis software from Lumina Decision Systems.
The influence diagram of Figure 5 illustrates the factors that have been included in our model. Based on the factors identified in the ESS survey, we will model seven uncertain, or random, variables (single ovals), and four constant variables (trapezoids). Double ovals indicate deterministic variables (those that are known exactly once the inputs are known). A summary of the model variables is given in Table 3. Values are representative of our experience with a broad range of CPU products.
Figure 5: Influence Diagram
Table 3: Model Variables and their Descriptions
Key | Variable Name | Units | Comments | Value
A | Cost of Inventory | %/week | Includes Depreciation and liquidity effects | Lognormal(0.5, 1.5)
B | Test Failure Repair Cost | NP$ | Material and Labor costs for debug and repair | Lognormal(1500, 1.5)
C | Time to Repair Test Failure | Weeks | WIP time | Lognormal(6, 1.5)
D | Replacement Cost of Field Failure | Future$ | Material and Labor (warranty costs) | Lognormal(H/2, 1.25)
E | MTBF of Unscreened Unit | Years | Mean Time Before Failure | Lognormal(5, 1.5)
F | Impact of ESS on MTBF | % | Factor by which ESS improves unit MTBF | Normal(20%, 15%)
G | Operational Costs | NP$ | Variable cost only (no fixed costs) | Lognormal(50, 1.5)
H | Whole Product Cost | NP$ | Used to calculate inventory, depreciation, and replacement costs | 5000
J | ESS Yield | - | Probability of passing ESS screen | 90%
K | IBP for Field Failure | Future$ | | 2000
M | Cost of Capital | %/year | Time vs. money discount rate | 15%
N | Cost of Test Failures | NP$ | Total cost of fail, debug, repair cycle | ((1/J)-1)(B+H(((1+A)^C)-1))
P | Total Field Failure Cost | Future$ | Includes direct and indirect costs | D+K
R | MTBF of Screened Unit | Years | See E. | E(1+F)
T | Total Cost | NP$ | Total additional cost of ESS | N+G(1/J)
W | Total Benefit | NP$ | Total downstream benefit per unit derived from ESS | (P/(1+M)^R)-(P/(1+M)^E)
X | NPS | NP$ | Per Unit Net Present Savings | W-T
NP$ is Net Present Dollars. Future$ is dollars not discounted to present value. Lognormal(x, y) is a distribution with mean x, and geometric standard deviation y. The range [x/y, x*y] contains about 68% of the probability mass.
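The deterministic relations in Table 3 can be evaluated at single point values as a rough check. The sketch below is not the authors' Analytica model and rests on several assumptions: every uncertain input is fixed at the first parameter of its distribution, A is read as 0.5% per week (0.005), and the benefit W is taken as the unscreened term minus the screened term. Table 3 prints W in the opposite order, which is negative whenever F > 0; the order used here follows the text's description of the benefit as the difference between the present value of the failure cost of a screened unit and an unscreened one. The function names are hypothetical.

```python
def total_cost(A, B, C, G, H, J):
    """T = N + G/J, with N = ((1/J) - 1) * (B + H * ((1 + A)**C - 1))."""
    n = (1.0 / J - 1.0) * (B + H * ((1.0 + A) ** C - 1.0))
    return n + G / J


def total_benefit(D, E, F, K, M):
    """Present value of a failure at year E minus the same failure at year R.

    Sign convention is an assumption (unscreened minus screened); see lead-in.
    """
    p = D + K              # P: total field failure cost, future dollars
    r = E * (1.0 + F)      # R: MTBF of the screened unit, years
    return p / (1.0 + M) ** E - p / (1.0 + M) ** r


def nps(A=0.005, B=1500.0, C=6.0, D=2500.0, E=5.0, F=0.20,
        G=50.0, H=5000.0, J=0.90, K=2000.0, M=0.15):
    """X = W - T, per-unit Net Present Savings at the given point values."""
    return total_benefit(D, E, F, K, M) - total_cost(A, B, C, G, H, J)


if __name__ == "__main__":
    print(f"T = {total_cost(0.005, 1500.0, 6.0, 50.0, 5000.0, 0.90):.2f}")
    print(f"W = {total_benefit(2500.0, 5.0, 0.20, 2000.0, 0.15):.2f}")
    print(f"X = {nps():.2f}")
```

A point evaluation like this ignores the spread of the inputs, so it should not be expected to match the expected value reported from the probabilistic model.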
We use the lognormal distribution to express the uncertainty in almost all random variables included in this model. The lognormal has a sharp lower bound of zero and is positively skewed. For most cost and time parameters, these characteristics are highly desirable.
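The properties claimed for the lognormal can be checked numerically with the Python standard library. This sketch assumes that the first parameter of Lognormal(x, y) acts as the median (geometric mean), since the statement that [x/y, x*y] holds about 68% of the mass is true around the median rather than the arithmetic mean; the helper names are hypothetical.

```python
import math
import random


def sample_lognormal(rng, median, gsd):
    """Draw one value given the median and geometric standard deviation."""
    # lognormvariate takes the mean and sigma of the underlying normal.
    return rng.lognormvariate(math.log(median), math.log(gsd))


def fraction_within_one_gsd(median, gsd, n=100_000, seed=1):
    """Estimate the probability mass inside [median/gsd, median*gsd]."""
    rng = random.Random(seed)
    lo, hi = median / gsd, median * gsd
    hits = sum(lo <= sample_lognormal(rng, median, gsd) <= hi for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Test Failure Repair Cost from Table 3: Lognormal(1500, 1.5).
    print(f"mass in [1000, 2250]: {fraction_within_one_gsd(1500.0, 1.5):.3f}")
```

Every draw is strictly positive, which is the sharp lower bound at zero that makes the distribution convenient for cost and time parameters.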
‘Field Failure Cost’ is one of the more difficult parameters in the model for most corporations to assess. We have broken it into two parts based on the results of the conference survey discussed above: Replacement cost, or warranty cost, and reputation cost. Replacement cost can be assessed directly through careful consideration of all contributing costs, but reputation cost (re-buy, word-of-mouth, etc.) may best be derived by discussing the Indifferent Buying Price (IBP) of a field failure. Suppose there were a wizard who was able to perform the following feat: Moments before a field failure is about to occur, the wizard calls the CEO of your company and offers to allow you to secretly swap out the failing unit before the failure takes place – for a price.
The CEO’s IBP for the field failure is the price at which she is indifferent between paying the wizard or not: the CEO would pay any lower price (in addition to the replacement cost), but would refuse to pay any more. Although IBP for field failures may be different for the same product depending on customer and application differences, a well thought out value for the IBP will be equivalent to the ‘reputation cost’ of a failure. Both replacement and reputation costs are valued at the time in the future at which the failure takes place.
With this groundwork in place, our model simply computes the ‘Total Benefit’ per unit for performing ESS as the difference between the present value of the total failure cost of a screened unit vs. an unscreened one.
Figure 6 displays the results of the model discussed above. The expected value of NPS is $180/unit: a good return on a $50 test. The cumulative distribution of NPS contains much more information, however. Indeed there is a 30% chance that this generic ESS program will lose money on a per unit basis. On the other hand, there is just as likely a chance that a net benefit of more than $325 per unit will be realized.
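The kind of output summarized here (a mean NPS and a probability of losing money) can be produced by simple Monte Carlo sampling of the Table 3 relations. The sketch below is an illustration under stated assumptions, not a reconstruction of the Analytica model: lognormal parameters are read as (median, geometric standard deviation), A is converted from %/week to a fraction, and the benefit is taken as unscreened minus screened present value. It is not expected to reproduce the figures quoted from Figure 6.

```python
import math
import random
import statistics

# Constants from Table 3.
H, J, K, M = 5000.0, 0.90, 2000.0, 0.15


def lognormal(rng, median, gsd):
    """Draw from a lognormal given median and geometric standard deviation."""
    return rng.lognormvariate(math.log(median), math.log(gsd))


def draw_nps(rng):
    """One Monte Carlo draw of per-unit NPS from the Table 3 relations."""
    a = lognormal(rng, 0.5, 1.5) / 100.0   # %/week -> fraction/week (assumed)
    b = lognormal(rng, 1500.0, 1.5)        # test failure repair cost
    c = lognormal(rng, 6.0, 1.5)           # weeks to repair
    d = lognormal(rng, H / 2.0, 1.25)      # replacement cost of field failure
    e = lognormal(rng, 5.0, 1.5)           # MTBF of unscreened unit, years
    f = rng.gauss(0.20, 0.15)              # impact of ESS on MTBF
    g = lognormal(rng, 50.0, 1.5)          # operational cost of the screen
    n = (1.0 / J - 1.0) * (b + H * ((1.0 + a) ** c - 1.0))
    t = n + g / J
    p = d + K
    r = e * (1.0 + f)
    w = p / (1.0 + M) ** e - p / (1.0 + M) ** r   # assumed sign convention
    return w - t


def simulate(n_draws=20_000, seed=0):
    """Return a list of NPS samples; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [draw_nps(rng) for _ in range(n_draws)]


def summarize(samples):
    """Mean NPS and the fraction of draws in which ESS loses money."""
    mean = statistics.fmean(samples)
    p_loss = sum(s < 0.0 for s in samples) / len(samples)
    return mean, p_loss


if __name__ == "__main__":
    mean, p_loss = summarize(simulate())
    print(f"mean NPS = {mean:.0f} $/unit, P(NPS < 0) = {p_loss:.2f}")
```

Because F is normal with a 15% standard deviation, some draws have F below zero, which is how the model lets a screen that damages product show up as a negative benefit.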
Figure 6: Probabilistic Model Output
Figure 7: Sensitivity of NPS to Variations in Screen Yield
Figure 8: Sensitivity of NPS to Yield and IBP
Any dispute with the conclusion that the hypothetical ESS program represented by this model and corresponding parameters is a ‘good bet’ should be stated in the context of the model or its parameters rather than in more abstract terms. A ‘good bet’ is a deal with an uncertain but positive expected outcome. By encoding differing points of view in the form of parametric uncertainty, and incorporating all stakeholders’ concerns into the model structure, discussions are moved from the political realm into the technical one.
Although this is a generic example, it is useful to demonstrate how insights may be gained through further analysis. One such analysis may be to address the following concern: “What if the screen yield required to achieve a 20% improvement in MTBF is either higher or lower than 90%?” Figure 7 shows the mean (average, or expected, value) NPS as a function of screen yield. As can be seen, any screen parameter set with a yield higher than 83% would be considered valuable. Similarly, the effect of IBP on the value of our generic ESS program can be investigated graphically in Figure 8. For a yield of 90%, the program would still have a mean value of $60 per unit even if the reputation cost (IBP) of a failure were valued at $0.
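A yield sweep of this kind reduces to finding the yield at which NPS crosses zero. The sketch below does this by bisection on a point-value version of the Table 3 relations. It is an assumption-laden simplification: the uncertain inputs are fixed at the first parameter of their distributions rather than averaged over them, A is read as 0.005 per week, and the benefit is taken as unscreened minus screened present value.

```python
# Point values taken from the first parameter of each Table 3 entry (assumed).
A, B, C, D, E, F, G = 0.005, 1500.0, 6.0, 2500.0, 5.0, 0.20, 50.0
H, K, M = 5000.0, 2000.0, 0.15


def nps_at_yield(j):
    """Per-unit NPS as a function of ESS yield j, all other inputs fixed."""
    n = (1.0 / j - 1.0) * (B + H * ((1.0 + A) ** C - 1.0))
    t = n + G / j
    p = D + K
    w = p / (1.0 + M) ** E - p / (1.0 + M) ** (E * (1.0 + F))
    return w - t


def break_even_yield(lo=0.5, hi=1.0, tol=1e-6):
    """Bisect for the yield where NPS = 0 (NPS rises monotonically with j)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nps_at_yield(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for j in (0.80, 0.85, 0.90, 0.95):
        print(f"yield {j:.2f}: NPS = {nps_at_yield(j):8.2f}")
    print(f"break-even yield = {break_even_yield():.3f}")
```

With these point values the break-even yield comes out near 88% rather than the 83% read from Figure 7, which is based on the mean NPS over the full input distributions.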
Finally, it is enlightening to examine the degree to which the uncertainty in the input variables contributes to the variation in the output variable, NPS. Table 4 lists the absolute rank-order correlation between NPS and the listed uncertain inputs. This analysis indicates that (as expected) the greatest opportunity to reduce uncertainty in the value of this hypothetical ESS program is to refine the ‘Impact of ESS on MTBF’ estimate.
Conversely, expenditures of effort on refining any of the bottom four variables in the table will do little to reduce the uncertainty in the estimate of per unit NPS.
Table 4: Input Variable Importance
Variable | Importance
Impact of ESS on MTBF | 0.871
Replacement Cost of Field Failure | 0.337
Test Failure Repair Cost | 0.211
MTBF of Unscreened Unit | 0.086
Time to Repair Test Failure | 0.040
Operational Costs | 0.013
Cost of Inventory | 0.001
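An importance ranking of this kind can be sketched as the absolute Spearman (rank-order) correlation between each sampled input and the sampled NPS. The code below is an illustrative stand-in for the analysis, not the tool's own implementation: the sampling conventions (median and geometric standard deviation for lognormals, A as a fraction per week, benefit as unscreened minus screened) are assumptions, so the numbers it prints need not match Table 4.

```python
import math
import random

H, J, K, M = 5000.0, 0.90, 2000.0, 0.15   # constants from Table 3
NAMES = ["A", "B", "C", "D", "E", "F", "G"]


def draw_inputs(rng):
    """Sample the seven uncertain inputs of Table 3 (assumed conventions)."""
    def ln(median, gsd):
        return rng.lognormvariate(math.log(median), math.log(gsd))
    return {"A": ln(0.5, 1.5) / 100.0, "B": ln(1500.0, 1.5), "C": ln(6.0, 1.5),
            "D": ln(H / 2.0, 1.25), "E": ln(5.0, 1.5),
            "F": rng.gauss(0.20, 0.15), "G": ln(50.0, 1.5)}


def nps(x):
    """Per-unit NPS for one dictionary of sampled inputs."""
    n = (1.0 / J - 1.0) * (x["B"] + H * ((1.0 + x["A"]) ** x["C"] - 1.0))
    t = n + x["G"] / J
    p = x["D"] + K
    r = x["E"] * (1.0 + x["F"])
    return p / (1.0 + M) ** x["E"] - p / (1.0 + M) ** r - t


def ranks(values):
    """Rank positions of the values (continuous draws, so ties are ignored)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = float(rank)
    return out


def pearson(xs, ys):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def importance(n_draws=5000, seed=0):
    """Absolute rank-order correlation of each input with NPS."""
    rng = random.Random(seed)
    draws = [draw_inputs(rng) for _ in range(n_draws)]
    out_ranks = ranks([nps(x) for x in draws])
    return {name: abs(pearson(ranks([x[name] for x in draws]), out_ranks))
            for name in NAMES}


if __name__ == "__main__":
    for name, value in sorted(importance().items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.3f}")
```

Ranking on ranks rather than raw values keeps the measure insensitive to the strong skew of the lognormal inputs.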
DATAWhen an ESS program is initiated, there must be decisions made in the face of many uncertainties. As the program progresses, real data becomes available, and the initial probability estimates can be replaced by real numbers. In this section, we show some of the manufacturing and field data collected at Tandem division of Compaq relating to our ESS program, and answer some of the issues raised earlier.
One roadblock to ESS is usually stated as follows: “We can’t afford the yield loss in manufacturing caused by ESS”. Translation: “We need to ship every unit we build in order to make our revenue target. We can’t worry about reliability at this point. ESS yields cost us money, both in shippable units (lost revenue) and reworked or scrapped PWAs”. Let’s look at two examples of the yield of CPU PWAs subjected to manufacturing ESS. Figure 9 is a composite bar chart showing combined manufacturing ESS yields for five different CPU products. Note that the ESS yield remains essentially constant. As process and component problems were solved, new problems emerged and were addressed. In this case, given the complexity of the products, 100% ESS was required for the entire life of each product.
Figure 9: PWA ESS Yields
Figure 10 shows a breakout detail of the products included in the last bar of Figure 9, and adds Post-ESS yields for each of the five. This chart shows the value of conducting ESS in production and the potential impact of loss in system test or the field if ESS were not conducted. Notice the high ESS yield of mature PWAs (PWAs #1-#3) but the low ESS yield of new boards (PWAs #4 and #5). The benefit of ESS for new products is evident here. Note particularly that the Post-ESS yields for both mature and immature products are equivalent, indicating that ESS is finding the latent defects. Nonetheless, the value of ESS must be constantly evaluated. At some point in time when yield is stable and high, it may make sense to discontinue its use for that PWA/product.
Figure 10: Manufacturing Yield by PWA type for 3Q97
The ideal data set would allow the creation of a ‘failure rate vs. time’ plot, or hazard function, for a split population of screened and unscreened product. Unfortunately, this type of data is never available until several months or even years have elapsed. Figure 11 displays one real-world example. It took careful data mining of over five years of run-time and of more than 2000 total field installations to produce this information for a Compaq CPU server product. If this data were known ahead of time, ESS implementation decisions would have been simple: Screen yields would be known (92%), and so would the effect of the screen on the MTBF of shipped product (14%). Assuming all other model parameters discussed earlier apply to the product producing this field data, the cumulative distribution in Figure 12 reflects the value of its ESS program. The expected value is $110, with only a 20% chance of being negative.
Figure 11: Failure Rate vs. Time for Split Population
Figure 12: Calculated NPS using Value Model and Field Data
The amazing thing about this field data when applied back into the model is that our screening decision and our expected value remain virtually the same. The main benefit we gained through the gathering of field data is that our uncertainty about the true value of NPS has been reduced. The 10%-90% range has shrunk from [-260, 650] to [-20, 260]. Consider, however, that five years after data gathering began, the product is long past end-of-life. The reason a structured framework for dealing with this uncertainty about the future is so valuable is that it allows corporations to make the best decision possible at the time the decision needs to be made.
CONCLUSIONS
Decisions relating to Environmental Stress Screening involve political aspects within a corporation to a far greater degree than almost any other manufacturing process. A correct decision on whether to perform ESS or not on a particular product requires a measure of success (metric) that is acceptable to all of the different corporate stakeholders. We suggest that Net Present Value of all associated costs and benefits is the metric of choice.
In addition to a robust cost-benefit model, the nature of ESS requires that there be a strong ‘company champion’ for ESS at a high level who can adjudicate the inevitable disagreements, focus on corporate goals, and provide direction for the discipline. However, the champion needs to remember that the goal of ESS is not “reliability at any price”, but rather “reliability at the right price”.
Finally, since ESS decisions assisted by a model of NPS will be based on probabilities estimated before actual data exists, yield and failure data must be obtained to verify the initial probabilistic estimates. The resulting data should be used to improve screening de
[Repost] Economic Justification of Halt Tests
Copyright belongs to the author
Economic Justification of Halt Tests: The relationship between operating margin, test costs, and the cost of field returns
Edmond L. Kyser and Nahum Meadowsong
Cisco Systems, Inc.
Email: ekyser#cisco.com nameadow#cisco.com
Introduction
Increasing pressures for cost reduction
Prototype build is a leading cost item in product development
Halt test requires destruction of (at least) one prototype at a critical stage (earliest stable hardware and software)
How is this cost best justified?
The benefit of a Halt test is AVOIDED COST –
Improved reliability (reduced RMA rate)
Reduced test cost (eliminate/reduce ORT, RDT)
What is the relationship between Halt results (operating margin) and RMA rate?
Under what conditions is the cost of Halt justified?
Operating Margin vs. Reliability
Can Halt tests be used to predict field performance?
No! A Halt test is NOT a highly accelerated life test; it is not a life test at all. There is no appropriate acceleration algorithm, no acceleration factor. The deliverables of a Halt test are operating margin and failure modes.
Yes! Operating margin is an indicator of field performance. Low margins indicate poor performance (short life), and high margins indicate good performance (long life). Halt tests determine operating margin, and failure modes show where margins may be improved.
The issue of relating Halt results (operating margin) to field reliability is NOT a yes/no issue, but rather how to express the relationship correctly.
It will be an Empirical relationship
It will not be independent (There will be other variables)
It will be probabilistic in nature - confidence interval
It will be product dependent
The following plot shows an Empirical relationship between Operating margin and Reliability for similar products (high performance line cards for Cisco routers)
Other independent variables influence RMA rate:
Board complexity, measured by active component count (includes ASICs, ICs, FPGAs, transistors, crystals, and diodes)
The following plot shows Normalized RMA rate vs. parts count, FOR A LARGER RANGE OF PRODUCT.
The correlation is about 1/10 that of Normalized RMA vs Operating Margin for Line Cards
Cost justification 1: Eliminate RDT
The following 2 slides show traditional RDT and Halt RDT
Traditional RDT (80% confidence in MTBF > 75,000 hours): the dashed blue line shows that RDT requires 40 boards tested for 10 weeks (assuming 1 failure and Arrhenius acceleration due to a 50 °C temperature).
Halt RDT requires 1 board for 1 week. The dashed blue line shows 80% confidence that an operating margin of 30 °C indicates a normalized RMA rate below 0.55.
[Figure: Standard RDT plot, 80% confidence level]
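The scale of a traditional RDT can be checked with a short calculation. The sketch below is not the authors' method; it assumes the standard time-terminated demonstration test with a constant failure rate, in which the number of failures is Poisson distributed (equivalent to the usual chi-square bound with 2r + 2 degrees of freedom). The slides do not state the activation energy or the use temperature, so the values passed to `arrhenius_af` (0.4 eV, 25 °C) are hypothetical and purely illustrative.

```python
import math

HOURS_PER_WEEK = 168
BOLTZMANN_EV_PER_K = 8.617333e-5


def poisson_cdf(r, lam):
    """P(at most r failures) when the expected number of failures is lam."""
    return math.exp(-lam) * sum(lam ** k / math.factorial(k) for k in range(r + 1))


def required_ratio(failures, confidence):
    """Required (total test hours) / (MTBF to demonstrate).

    Solves poisson_cdf(failures, lam) = 1 - confidence for lam by bisection.
    """
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(failures, mid) > 1 - confidence:
            lo = mid  # seeing this few failures is still too likely: test longer
        else:
            hi = mid
    return (lo + hi) / 2


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a use and a stress temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_stress_k))


# Figures taken from the slide: 80% confidence, MTBF > 75,000 h, 1 failure,
# 40 boards tested for 10 weeks.
lam = required_ratio(failures=1, confidence=0.80)
equivalent_hours = 75_000 * lam        # use-condition hours needed
chamber_hours = 40 * 10 * HOURS_PER_WEEK
implied_af = equivalent_hours / chamber_hours

print(f"equivalent use hours needed: {equivalent_hours:,.0f}")
print(f"board-hours in the chamber:  {chamber_hours:,}")
print(f"acceleration factor implied: {implied_af:.2f}")
# Hypothetical Ea and use temperature, for comparison only.
print(f"Arrhenius AF (0.4 eV, 25 -> 50 C): {arrhenius_af(0.4, 25.0, 50.0):.2f}")
```

Under these assumptions the 40-board, 10-week test only reaches the stated confidence if the elevated temperature supplies an acceleration factor of a little over 3, which shows how strongly the traditional approach depends on the acceleration model.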
Cost Justification 2: Improve Reliability
From slide 6, if the operating margin is increased by N °C, the normalized RMA rate is reduced by 0.0192N.
Each RMA costs approximately $WPC, where WPC is the cost of producing the board, the Whole Product Cost.
Thus the benefit of a Halt test that increases the operating margin by N °C is

\[ \text{Benefit} = (\text{number of RMAs prevented}) \times (\text{cost of an RMA}) = N \times 0.0192 \times (\text{RMA intercept}) \times \text{Pvol} \times \text{WPC} \]

where Pvol is the (annual) production volume and the benefit is expressed in dollars.

The cost of a Halt test is

\[ \text{Cost} = \text{WPC} + \text{Esalary} + \text{DEP} + \text{CON} + \text{CA} \]

where
Esalary = fully burdened weekly salary
DEP = weekly depreciation of equipment
CON = consumable costs
CA = corrective action costs
The break-even point is where costs = benefits.
For a Halt test to be cost effective, it must, on average, increase the operating margin by at least N °C, where

\[ N = \frac{\text{WPC} + \text{Esalary} + \text{DEP} + \text{CON} + \text{CA}}{0.0192 \times (\text{RMA intercept}) \times \text{Pvol} \times \text{WPC}} \]
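The break-even relation is easy to evaluate numerically. The sketch below simply codes the benefit, cost, and break-even expressions from these slides; every dollar figure, the production volume, and the RMA intercept in the example are hypothetical placeholders chosen for illustration, and the RMA intercept is assumed here to be the factor that converts the normalized RMA rate into an annual RMA fraction.

```python
SLOPE_PER_DEG_C = 0.0192  # reduction in normalized RMA rate per degree C (from the slides)


def halt_benefit(n_deg_c, rma_intercept, pvol, wpc, slope=SLOPE_PER_DEG_C):
    """Avoided RMA cost for a margin increase of n_deg_c degrees C."""
    return n_deg_c * slope * rma_intercept * pvol * wpc


def halt_cost(wpc, esalary, dep, con, ca):
    """Cost of one Halt test: one board plus labor, depreciation, consumables, fixes."""
    return wpc + esalary + dep + con + ca


def break_even_margin(wpc, esalary, dep, con, ca, rma_intercept, pvol,
                      slope=SLOPE_PER_DEG_C):
    """Margin increase N (degrees C) at which benefit equals cost."""
    return halt_cost(wpc, esalary, dep, con, ca) / (slope * rma_intercept * pvol * wpc)


# Hypothetical inputs, for illustration only.
example = dict(
    wpc=5000.0,          # whole product cost of one board, $
    esalary=3000.0,      # fully burdened weekly salary, $
    dep=1000.0,          # weekly equipment depreciation, $
    con=500.0,           # consumables, $
    ca=2500.0,           # corrective action, $
    rma_intercept=0.02,  # assumed annual RMA fraction at normalized rate 1
    pvol=1000,           # annual production volume, units
)

n_break_even = break_even_margin(**example)
print(f"break-even margin increase: {n_break_even:.2f} C")
```

Because WPC appears in both the numerator and the denominator, the break-even margin falls as production volume or the RMA intercept rises, so high-volume products justify a Halt test with a smaller margin gain.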
[Repost] The Next Generation of Environmental Testing
The Next Generation of Environmental Testing
by William Lagattolla, Trace Laboratories-Central
HALT and HASS are starting to supplant traditional vibration and thermal testing to meet today’s quality targets.
For decades, product quality has been determined through environmental testing such as vibration, thermal cycling, mechanical shock, and thermal shock. More recently, there has been a significant trend in the marketplace to improve product quality even further.
The Need for Increased Product Quality
One of the most pervasive trends across a wide range of the consumer, industrial, and military markets is the need for increased product quality. In consumer markets, a high rate of product failure can result in the manufacturer’s loss of credibility with an attendant loss of sales, from which it can take years to recover. In industrial markets, a high failure rate can result in expensive field service calls or, potentially worse, significant downtime. In military markets, product failures can translate into the loss of lives.
Although the need for quality is increasing, certain developments are making it more difficult to maintain existing quality levels. The most challenging development has been the increased use of manufacturing subcontractors. The manufacturer whose name goes on a product is likely to be relying on an outside resource, a subcontractor, over which the manufacturer does not have direct control.
This subcontractor is relying on a number of vendors, further weakening the control that the manufacturer has on product quality. Should a product fail, the customer will blame the manufacturer—the one responsible for its quality level.
Another challenge to maintaining quality is a continually decreasing number of engineers with comprehensive QA/QC backgrounds at these manufacturing companies. Many of the highly experienced QA/QC engineers are retiring or being replaced by younger engineers who are far less experienced.
Traditional Vibration and Temperature Testing
Traditional vibration and temperature testing has played an important role in the genesis of today’s reliable and sophisticated electronic and electromechanical products. The core philosophy of this testing method is to define a set of specifications, usually minimum or maximum temperatures and vibration levels, and conduct the tests by changing only one variable at a time. Vibration testing is performed one axis at a time. If the device still is functional after being tested according to the test specs, it is considered to have passed.
A passing result is a positive outcome. However, a pass result does not help identify the weakest link in the product. In other words, the traditional test cannot help the engineer make the product any more robust.
Furthermore, with the one-at-a-time change in environmental variables and the one-dimension vibration testing, the test specs are not similar to real-world operating environments. As a result, this kind of testing does not provide an accurate indication of how the product might perform in the field.
This critical look at traditional environmental testing is not intended to be a blanket condemnation of that process. After all, this kind of testing has played a key role in the evolution of today’s highly reliable products. Instead, this examination of certain weaknesses in classical environmental testing can be helpful in understanding how new testing methods, in particular HALT and HASS testing, can lead to even greater levels of product quality and reliability.
The Strengths of HALT and HASS
Highly Accelerated Life Testing
HALT exposes the product to a step-by-step cycling of environmental variables such as temperature, shock, and vibration. It involves simultaneous vibration testing in all three axes using a random mix of frequencies. Finally, HALT can include combinations of multiple environmental variables; for example, temperature cycling plus vibration testing.
Unlike conventional testing, the goal of HALT is to break the product. When the product fails, the weakest link is identified so engineers know exactly what needs to be done to improve product quality.
After a product has failed, weak components are upgraded or reinforced. The revised product then is subjected to another round of HALT, with the range of temperature, vibration, or shock further increased so the product fails again. This identifies the next weakest link.
Figure 1. Headlights, Front, On
By going through several iterations like this, the product can be made quite robust. With this informed approach, only the weak spots are identified for improvement. This type of testing provides so much information about the construction and performance of a product that it can be quite helpful for newer engineers assigned to a product with which they are not completely familiar.
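The step-and-fail procedure described above can be sketched in code. This is a minimal illustration, not Trace's procedure: `passes_at` is a hypothetical callback standing in for a functional test run in the chamber at a given stress level, and the start, step, and stop values in the example are invented.

```python
def find_operating_limit(passes_at, start, step, stop):
    """Step the stress level from start toward stop until the unit fails.

    Returns (last_passing_level, first_failing_level). The second item is
    None if the unit never failed within the range explored.
    """
    level = start
    last_pass = None
    while (step > 0 and level <= stop) or (step < 0 and level >= stop):
        if not passes_at(level):
            return last_pass, level
        last_pass = level
        level += step
    return last_pass, None


# Hypothetical unit that works between -45 C and +95 C.
def unit_works(temp_c):
    return -45 <= temp_c <= 95


hot_limit = find_operating_limit(unit_works, start=20, step=10, stop=150)
cold_limit = find_operating_limit(unit_works, start=20, step=-10, stop=-100)
print("hot step stress: ", hot_limit)
print("cold step stress:", cold_limit)
```

After each failure the weak component would be reinforced and the search repeated over a wider range, so each pass through the loop exposes the next weakest link.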
HALT must be performed during the design phase of a product to make sure the basic design is reliable. But it is important to note that the units being tested are likely to be handmade engineering prototypes. At Trace, we have found that HALT also should be performed on actual production units to ensure that the transition from engineering design to production has not resulted in a loss of product quality or robustness.
Some engineers may consider this approach as scientifically reasonable but financially unrealistic. However, our customers have repeatedly found that the cost of HALT is much less than the cost of field failures, service calls, blanket recalls, and loss of credibility or business due to poor product quality. One of our clients even includes HALT as a line item on its bill of materials to make sure this testing is included in the product cost right from the beginning.
Highly Accelerated Stress Screening
HASS, an abbreviated form of HALT, is an ongoing screening test performed on regular production units. Here, the idea is not to damage the product but rather to verify that actual production units continue to operate properly when subjected to the cycling of environmental variables used during the HASS test. The limits used in HASS testing are based on a skilled interpretation of the HALT parameters but do not exceed a product’s operating limits.
The importance of HASS testing can be appreciated when you consider today’s typical manufacturing scenario. Circuit boards are purchased from a vendor who uses materials purchased from other vendors. Components and subassemblies are obtained from manufacturers all over the world.
Often, the final assembly of the product is performed by a subcontractor. This means that the quality of the final product is a function of the quality or lack thereof of all the components, materials, and processes that are a part of that final product. These components, materials, and processes can and do change over time, affecting the quality and reliability of the final product. The best way to ensure that production units continue to meet reliability objectives is through HASS testing.
Case Histories
The benefits of HALT/HASS testing can be seen in two case histories.
Automotive Lamp Assembly
A manufacturer of automotive lamp assemblies (headlight, brake light, and third brake light units) provides an example of the benefits of using HALT/HASS throughout the development of a new product.
An engineer at this company decided to submit a production sample for an abbreviated suite of HALT. The unit failed, and it was redesigned. When submitted for a retest, a full HALT was performed, with the power to the bulbs in the assemblies cycled on and off during the testing process. During HALT, temperatures were varied over the range of -100°C to +85°C, with vibration parameters of 0 to 50g rms (Figure 1).
Special fixtures were made to hold the assemblies at the exact same angle and under the exact conditions they would experience when installed in a car. The manufacturer was careful to test actual production units to ensure that the test results were an accurate reflection of product quality.
Automakers have been champions of sophisticated quality testing for years. When they saw the test setup and the test results from this lamp assembly manufacturer, the automakers were so impressed that they made the manufacturer a prime vendor for these assemblies and started requiring HALT from all their vendors.
Power Supply
A manufacturer of custom power supplies used in telecom switching systems wanted to ensure reliability in the field, so the company contacted Trace Labs for HALT to verify and refine the basic design. After several iterations, the basic design was made reliable. The power supplies were HALT tested over the temperature range of -50°C to +130°C, with vibration levels ranging from 0 to 10g rms.
Next, the manufacturer developed the handmade units into production designs. We recommended the production units be HALT tested, but this recommendation was declined.
Unfortunately, when the first production units were placed in service, there were many failures. Eventually, some production units were brought into the lab, and a cursory examination revealed that the units had smaller heat sinks, the chassis were made of thinner metal, and the amount of structural bracing had been reduced compared to the original engineering design that had been subjected to HALT.
It turned out that in developing the design for production, the power supply manufacturer had reacted to price pressure from its customer by reducing the cost of various aspects of the production design, and in doing so had inadvertently compromised the high reliability of the original design.
Now facing a serious field-failure problem, the manufacturer submitted actual production units for HALT. After five iterations, the design of the production units had been refined to provide good field reliability. Ironically, the cost of the redesigned production units was only 2% more than the amount specified in the original contract—a cost the customer was willing to pay.
However, damage had been done to the power supply vendor’s relationship with the customer. The customer next required 100% HASS testing of all power supplies from this manufacturer, and the manufacturer was not invited to submit quotes on subsequent RFQs. The entire problem could have been avoided if the manufacturer had been willing to spend the upfront costs for HALT on the original production units.
Fortunately, this story does have a happy ending. After three years of producing reliable power supplies, proven through HASS testing as well as successful field experience, the manufacturer once again is regarded as a primary vendor.
Conclusion
Classic vibration and temperature testing definitely have helped improve product quality over the years, but today’s very high standards for product quality are requiring tests better able to reduce, or even eliminate, field failures.
HALT provides a controlled, repeatable method of determining product quality under conditions comparable to field operating conditions and is critical for proving the basic design of a product. HASS testing is a quick, effective screening process that can be used to ensure production units continue to meet quality standards.
While it is true that HALT and HASS testing can add to the short-term manufacturing cost of a product, the increment is surprisingly small in most cases. In the long run, the cost of the testing is much less than the cost of field failures or the loss of business due to reliability problems.
by William Lagattolla, Trace Laboratories-Central
HALT and HASS are starting to supplant traditional vibration and thermal testing to meet today’s quality targets.
For decades, product quality has been determined through environmental testing such as vibration, thermal cycling, mechanical shock, and thermal shock. More recently, there has been a significant trend in the marketplace to improve product quality even further.
The Need for Increased Product QualityOne of the most pervasive trends across a wide range of the consumer, industrial, and military markets is the need for increased product quality. In consumer markets, a high rate of product failure can result in the manufacturer’s loss of credibility with an attendant loss of sales, from which it can take years to recover. In industrial markets, a high failure rate can result in expensive field service calls or—potentially worse—significant downtime. In military markets, product failures can translate in the loss of lives.
Although the need for quality is increasing, certain developments are making it more difficult to maintain existing quality levels. The most challenging development has been the increased use of manufacturing subcontractors. The manufacturer whose name goes on a product is likely to be relying on an outside resource, a subcontractor, over which the manufacturer does not have direct control.
This subcontractor is relying on a number of vendors, further weakening the control that the manufacturer has on product quality. Should a product fail, the customer will blame the manufacturer—the one responsible for its quality level.
Another challenge to maintaining quality is a continually decreasing number of engineers with comprehensive QA/QC backgrounds at these manufacturing companies. Many of the highly experienced QA/QC engineers are retiring or being replaced by younger engineers who are far less experienced.
Traditional Vibration and Temperature TestingTraditional vibration and temperature testing has played an important role in the genesis of today’s reliable and sophisticated electronic and electromechanical products. The core philosophy of this testing method is to define a set of specifications, usually minimum or maximum temperatures and vibration levels, and conduct the tests by changing only one variable at a time. Vibration testing is performed one axis at a time. If the device still is functional after being tested according to the test specs, it is considered to have passed.
A passing result is a positive outcome. However, a pass result does not help identify the weakest link in the product. In other words, the traditional test cannot help the engineer make the product any more robust.
Furthermore, with the one-at-a-time change in environmental variables and the one-dimension vibration testing, the test specs are not similar to real-world operating environments. As a result, this kind of testing does not provide an accurate indication of how the product might perform in the field.
This critical look at traditional environmental testing is not intended to be a blanket condemnation of that process. After all, this kind of testing has played a key role in the evolution of today’s highly reliable products. Instead, this examination of certain weaknesses in classical environmental testing can be helpful in understanding how new testing methods, in particular HALT and HASS testing, can lead to even greater levels of product quality and reliability.
The Strengths of HALT and HASS Highly Accelerated Life Testing HALT exposes the product to a step-by-step cycling of environmental variables such as temperature, shock, and vibration. It involves simultaneous vibration testing in all three axes using a random mix of frequencies. Finally, HALT can include combinations of multiple environmental variables; for example, temperature cycling plus vibration testing.
Unlike conventional testing, the goal of HALT is to break the product. When the product fails, the weakest link is identified so engineers know exactly what needs to be done to improve product quality.
After a product has failed, weak components are upgraded or reinforced. The revised product then is subjected to another round of HALT, with the range of temperature, vibration, or shock further increased so the product fails again. This identifies the next weakest link.
Figure 1. Headlights, Front, On
By going through several iterations like this, the product can be made quite robust. With this informed approach, only the weak spots are identified for improvement. This type of testing provides so much information about the construction and performance of a product that it can be quite helpful for newer engineers assigned to a product with which they are not completely familiar.
HALT must be performed during the design phase of a product to make sure the basic design is reliable. But it is important to note that the units being tested are likely to be handmade engineering prototypes. At Trace, we have found that HALT also should be performed on actual production units to ensure that the transition from engineering design to production has not resulted in a loss of product quality or robustness.
Some engineers may consider this approach as scientifically reasonable but financially unrealistic. However, our customers have repeatedly found that the cost of HALT is much less than the cost of field failures, service calls, blanket recalls, and loss of credibility or business due to poor product quality. One of our clients even includes HALT as a line item on its bill of materials to make sure this testing is included in the product cost right from the beginning.
Highly Accelerated Stress Screening HASS, an abbreviated form of HALT, is an ongoing screening test performed on regular production units. Here, the idea is not to damage the product but rather to verify that actual production units continue to operate properly when subjected to the cycling of environmental variables used during the HASS test. The limits used in HASS testing are based on a skilled interpretation of the HALT parameters but do not exceed a product’s operating limits.
The importance of HASS testing can be appreciated when you consider today’s typical manufacturing scenario. Circuit boards are purchased from a vendor who uses materials purchased from other vendors. Components and subassemblies are obtained from manufacturers all over the world.
Often, the final assembly of the product is performed by a subcontractor. This means that the quality of the final product is a function of the quality or lack thereof of all the components, materials, and processes that are a part of that final product. These components, materials, and processes can and do change over time, affecting the quality and reliability of the final product. The best way to ensure that production units continue to meet reliability objectives is through HASS testing.
Case HistoriesThe benefits of HALT/HASS testing can be seen in two case histories.
Automotive Lamp Assembly A manufacturer of automotive lamp assemblies (headlight, brake light, and third brake light units) provides an example of the benefits of using HALT/HASS throughout the development of a new product.
An engineer at this company decided to submit a production sample for an abbreviated suite of HALT. The unit failed, and it was redesigned. When submitted for a retest, a full HALT was performed, with the power to the bulbs in the assemblies cycled on and off during the testing process. During HALT, temperatures were varied over the range of -100°C to +85°C, with vibration parameters of 0 to 50g rms (Figure 1).
Special fixtures were made to hold the assemblies at the exact same angle and under the exact conditions they would experience when installed in a car. The manufacturer was careful to test actual production units to ensure that the test results were an accurate reflection of product quality.
Automakers have been champions of sophisticated quality testing for years. When they saw the test setup and the test results from this lamp assembly manufacturer, the automakers were so impressed that they made the manufacturer a prime vendor for these assemblies and started requiring HALT from all their vendors.
Power Supply A manufacturer of custom power supplies used in telecom switching systems wanted to ensure reliability in the field, so the company contacted Trace Labs for HALT to verify and refine the basic design. After several iterations, the basic design was made reliable. The power supplies were HALT tested over the temperature range of -50°C to +130°C, with vibration levels ranging from 0 to 10g rms.
Next, the manufacturer developed the handmade units into production designs. We recommended the production units be HALT tested, but this recommendation was declined.
Unfortunately, when the first production units were placed in service, there were many failures. Eventually, some production units were brought into the lab, and a cursory examination revealed that the units had smaller heat sinks, the chassis were made of thinner metal, and the amount of structural bracing had been reduced compared to the original engineering design that had been subjected to HALT.
It turned out that in developing the design for production, the power supply manufacturer reacted to price pressure from its customer, reduced the cost of various aspects of the production design, and had inadvertently compromised the high reliability of the original design.
Now facing a serious field-failure problem, the manufacturer submitted actual production units for HALT. After five iterations, the design of the production units had been refined to provide good field reliability. Ironically, the cost of the redesigned production units was only 2% more than the amount specified in the original contract—a cost the customer was willing to pay.
However, damage had been done to the power supply vendor’s relationship with the customer. The customer next required 100% HASS testing of all power supplies from this manufacturer, and the manufacturer was not invited to submit quotes on subsequent RFQs. The entire problem could have been avoided if the manufacturer had been willing to spend the upfront costs for HALT on the original production units.
Fortunately, this story does have a happy ending. After three years of producing reliable power supplies, proven through HASS testing as well as successful field experience, the manufacturer once again is regarded as a primary vendor.
Conclusion
Classic vibration and temperature testing definitely have helped improve product quality over the years, but today’s very high standards for product quality are requiring tests better able to reduce, or even eliminate, field failures.
HALT provides a controlled, repeatable method of determining product quality under conditions comparable to field operating conditions and is critical for proving the basic design of a product. HASS testing is a quick, effective screening process that can be used to ensure production units continue to meet quality standards.
While it is true that HALT and HASS testing can add to the short-term manufacturing cost of a product, the increment is surprisingly small in most cases. In the long run, the cost of the testing is much less than the cost of field failures or the loss of business due to reliability problems.
[Repost] Managing Failure Analysis
To be a good failure analyst, one must also be a good manager. After all, failure analysis or problem solving is more than just brainstorming a solution to an identified problem. Successful analysis can be achieved only when a structured technique that uncovers the facts of the incident is used and adhered to at every step of the analysis process. As the manager or Principal Analyst for the failure, your management skills will not only be put to the test but will also be an integral part of the investigation.
Managing the Failure Definition
The first step in the analysis effort is to clearly define what constitutes a failure. This may sound simple, but I can assure you that it is not. Ask anyone and they will tell you that they know what their failures are. Explore a little deeper and you will find that they all know what is breaking down, but each cares for a different reason. Many factors directly affect why we care and thereby change our failure definition. For example, consider a plant whose production levels are low and whose maintenance, downtime, and parts costs are high. The Operations Manager considers the low production levels to be the failure, while the Maintenance Manager considers the Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) to be the failure. The Plant Manager considers the low bottom line to be the failure, while the maintenance staff cares about the number of times they must repair the equipment. What we have here is clearly a failure, but with a different failure definition at every level of the organization. Now consider another factor that affects how we feel about the failure: the business environment. Low production levels in a non-sold-out condition are not as big a problem as high maintenance costs. Conversely, in a sold-out condition, maintenance costs are not nearly as important as production levels and downtime. The job of the Principal Analyst is to recognize these factors and apply the necessary focusing tools (Impact-Effort Matrix, Decision by Pairs, Force Field Analysis, Failure Modes and Effects Analysis, etc.) to uncover those failures that represent the greatest potential return or unrealized opportunity, based on the right definition of failure for the facility.
Managing the Scope of the Analysis
Don’t bite off more than you can chew! The size and scope of the analysis you intend to tackle should not exceed the resources available for it. In other words, the scope of the analysis should be proportional to the resources available to conduct it. Always remember that the bigger the scope, the bigger the analysis. Process- or system-related analyses tend to be the largest because of the many variables associated with the modes of failure, whereas single-component analyses tend to be the smallest because relatively few variables are associated with a single item. The key is to determine what is really important and what you can reasonably manage. This is easily done if you have already determined the amount of opportunity by performing a Failure Modes and Effects Analysis (FMEA) and know the resources on hand; the scope and the opportunity have then already been identified. The goal is to eliminate failure and recover opportunity as quickly as possible by going after the biggest “bang for the buck”. In essence, limit the scope of the analysis at an early stage and get a payback as soon as possible. Doing so makes it easier to dedicate resources to analyses that are larger in scope and therefore more time-consuming to resolve. Although the analysis with the largest scope may have the greatest potential return, it is not always the best one to go after first. Managing the scope matters because an incomplete large effort is worse than a smaller problem resolution that is completed. In effect, don’t go after world hunger on your first attempt; attractive as the opportunity is, it may be more than you can chew with the resources at hand.
Managing the Failure Data
One of the most challenging aspects of any failure analysis effort is the management of the data necessary to solve the failure. Failure data provides the key that unlocks the mystery when problem solving. What the data tells you are the facts of the failure. Therefore, the management of failure data is vital to the successful outcome of the analysis.
It is not enough merely to sit down and identify the data necessary to find the root cause(s) of failure; you must also develop and implement a data collection strategy that ensures the integrity of the failure data is maintained. That means not just identifying the person responsible for data collection, but also specifying how they are going to obtain the data and what they are going to do with it once it has been collected. Think of it like a police investigation. The forensic strategy is handled in such a manner as to ensure that all the evidence is collected and stored until needed. Pictures are taken, evidence is bagged and tagged for use in the investigation and in court, all the witnesses are interviewed and their statements recorded, locations and times are noted to determine all the positional information, etc. The collection of failure data should receive the same stringent attention to detail as the evidence collected at any crime scene.
Managing the Analysis Team
Managing the analysis team consists of more than just managing the people. It includes making sure you have the right team, not only in size but also in makeup. A common mistake made by most organizations is to form an ad hoc committee composed entirely of subject matter experts (led by the most senior or experienced of the experts) to solve the egregious effects of the incident being investigated. The results tend to be pre-tailored solutions for the specific problem based on the expertise of the team. Make no mistake about it: subject matter experts are absolutely necessary to solve the failure, but to make sure all the possibilities are covered, they should be complemented by individuals who have little or no knowledge of the failure being investigated. Non-subject-matter experts bring the element of questioning to the table. When they ask a question such as “can this happen or occur?” the subject matter experts must think about the possibility and answer yes or no. The problem with a team composed solely of subject matter experts is that they often overlook possibilities because of their intimate knowledge of the failure. They believe that they already know why the failure is occurring and want to follow that path to uncover root cause(s). Non-subject-matter experts want to explore all the possibilities because they have no preconceived notions.
It is not necessary for the Principal Analyst to be a subject matter expert in the failure. Quite the contrary: such expertise is often a detriment to the analysis effort, because the Principal Analyst will also have developed preconceived notions as to why the failure is occurring. What the Principal Analyst needs to be an expert in is the science of Problem Solving or Failure Analysis.
The perfect analysis team is usually made up of 5 to 7 cross-functional people who have a common goal and commitment to solving the failure under investigation. Proper management of the team involves not only the selection of the right people, but also the correct assignment of individuals involved. Each must have clearly defined roles and duties based on their unique strengths and weaknesses. For example, every team needs a critic to keep the team honest. Fortunately every organization seems to have an abundance of people with this characteristic. The job of the Principal Analyst is to make sure this individual is critical but not to the point of disruption.
Managing the Analysis Effort
The first step in managing the actual analysis effort is to determine what you expect from the final outcome. This can be easily accomplished by developing a charter that clearly delineates the terminal objective of the analysis. This is further enhanced through the development of critical success factors that will tell you whether or not the terminal objective has been obtained. For example, if you are solving a problem involving an administrative issue such as slow invoice processing your charter could be something like the following:
“Uncover the root causes of the recurring invoice processing problems. This includes identifying deficiencies in or lack of management systems. Appropriate recommendations for root causes will be communicated to management for rapid resolution.”
Examples of possible critical success factors could include the following:
Reduce invoice processing turnaround time from two weeks to one week.
No lost invoices.
No incorrect invoices.
Maintain an invoice tracking system that is 100% accurate.
By developing a good charter and critical success factors for the analysis, the team has a common goal and a focusing mechanism to keep it on track and stop it from straying off on tangents.
When failure analysis begins, the goal of the Principal Analyst is to make sure that the logic is sound and that all hypotheses have been proven or disproved. Here it is good to understand that the Principal Analyst manages the analysis and is responsible for its successful outcome. He owns the process and the team owns the failure. Keeping this in mind, if the team can prove it to the Principal Analyst, then he can subsequently prove it to management.
Often during the logic tree development portion of the analysis, team members will disagree and some conflict will result. This conflict is not necessarily a bad thing. With conflict comes valuable discussion. As long as the conversation is pertinent to the analysis and provides benefit, it should be allowed to continue. The trick is to keep this conflict from becoming confrontational and therefore detrimental to the analysis. One management technique used to maintain control during the analysis is for the Principal Analyst to ask questions that help to clarify points. Questioning not only minimizes the amount of conflict between the team members but also keeps the team focused. This is especially important for those team members who are not subject matter experts in the failure under investigation.
Managing the Final Report
The final report is the alpha and omega of the failure. It represents the culmination of the analysis effort and the beginning of failure elimination. Remember that the goal of any failure analysis should be the elimination of identified causes. The final report is the tool used to obtain the resources necessary to implement solutions to the uncovered root cause(s) of the failure thereby achieving that goal. In essence, the final report can be thought of as a sales tool and should be developed with that in mind. At a minimum the final report should not only provide solutions with expected returns on investments but also identify how the failure occurred in the first place. To accomplish this an event summary, a description of the failure mechanism and list of recommendations should be included in the report.
The event summary is nothing more than a brief description of how the failure was first noticed, how long it has been going on and the method(s) used to isolate or mitigate the consequences of the failure.
The failure mechanism can be thought of as a summary of the root cause(s) that led to failure occurrence. It chronologically characterizes the things that must occur in order for the failure to manifest itself.
The list of recommendations should not only explain what is to be done, when, and who is going to be responsible for implementation; it should also include a detailed cost-benefit ratio for each recommendation.
Summary
The success or failure of your problem solving efforts often depends on the management strategies used to conduct the analysis. A sound management strategy must be devised and put into place for every step in the Root Cause Analysis process in order for the analysis to be both effective and efficient.
Obviously collecting and maintaining the paperwork associated with the failure investigation can be a daunting task. For this reason the use of software that is designed specifically for this purpose is extremely beneficial and is highly recommended. Although there are several packages on the market RCI’s PROACT® is by far the best and most complete of the software packages designed for this purpose.
RCI’s PROACT® software not only makes this difficult job seem almost effort free, but also provides a mechanism that allows easy and ready access to all the pertinent data associated with the analysis, including the structured logic tree. Failure data is maintained in a database unique to the failure and can be sorted by type, person responsible for its collection, date required, etc.
Of equal importance to the analysis is keeping track of the verification techniques used for the hypotheses pertaining to how the failure occurred. PROACT® automatically requires the completion of a verification log once a hypothesis is identified. This log can then be retrieved at any time to determine how to proceed with the analysis. In addition, PROACT® has many features that help the analyst do his job. It will help you to determine what your critical success factors are for the analysis, write a report on the analysis, communicate your findings to management, and track the results of your analysis efforts, just to name a few.
As a failure analyst I find that PROACT® is an invaluable tool for doing my job. My analysis efforts are not only easily managed, but are much quicker than ever before.
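Improvement Forum
A good forum on improvement that covers a great deal of material. Link: click to visit
Surveillance Audit Coming Up
Our plant is due for the annual surveillance audit of its integrated management system this December. Last year's nonconformities were raised against (1) MSA and (2) the full-dimension inspection plan. We need to be careful this year, otherwise they will be repeat findings, and the consequences would be serious!
A Leisurely Day
The weather has turned a bit cool, but sitting in my small office feels quite warm. There is not much work pressure and not much to do, so I am on my own reading novels online. Heh, perfectly content!
My Take on What Reliability Work Involves Today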
The Online Home for Reliability Engineers
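I don't know what everyone's work mainly consists of these days, but from what I have seen it is roughly as follows.
Telecom companies and their suppliers generally do the most. They put considerable emphasis on derating design; in fact, I personally consider it the foundation of their reliability work. They also pay particular attention to thermal design, which of course goes hand in hand with derating. Other activities are done rarely in China, or done with little success: reliability demonstration testing, FMEA, reliability allocation, reliability growth, reliability prediction, and so on.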
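However, their derating standards are essentially based on US military standards and Chinese national military standards (GJB); very few companies have standards of their own. Of course, having your own derating standard is expensive, not something every company can afford, and it takes a long period of practical accumulation. All of these standards ultimately derive from the US military standards; companies that have their own arrived at them through later industry experience and circuit design practice. That process requires analyzing every failure and keeping real records, and it also requires cooperation from component vendors. None of these can be missing, which is why it demands so much cost and time. Their thermal design work is generally in decent shape, mainly because a design that runs hot simply does not work, and that is something you can plainly see. They also pay close attention to EMC, mainly to obtain certification and enter international markets. If there is no plan to take a product international, they pay no attention to EMC at all for products sold domestically, which is a typically backward attitude.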
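To make the idea of a derating check concrete, here is a minimal sketch in Python. It simply compares each part's worst-case applied stress with its rated value and flags parts whose stress ratio exceeds an allowed limit. The part names, ratings, and limits are invented for illustration and are not taken from any MIL or GJB derating standard.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    rated: float      # rated value (e.g. volts or watts)
    applied: float    # worst-case applied stress, same unit as `rated`
    max_ratio: float  # allowed stress ratio (hypothetical derating limit)


def stress_ratio(part: Part) -> float:
    """Applied stress as a fraction of the rated value."""
    return part.applied / part.rated


def derating_violations(parts):
    """Return the names of parts whose stress ratio exceeds their limit."""
    return [p.name for p in parts if stress_ratio(p) > p.max_ratio]


# Hypothetical bill of materials, for illustration only.
parts = [
    Part("C12 capacitor (voltage)", rated=50.0, applied=24.0, max_ratio=0.6),
    Part("R7 resistor (power)", rated=0.25, applied=0.20, max_ratio=0.5),
]

print(derating_violations(parts))
```

In this made-up example the capacitor runs at 48% of its rating and passes, while the resistor runs at 80% of its rating and is flagged. A real derating standard would also tie the limits to temperature and part type.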
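Now for how the other activities are going. If a company has good reliability engineers, it will do FMEA on brand-new key projects. But this places high demands on the reliability engineer and is generally not done without a specific requirement; besides, few companies today have reliability engineers of that caliber. Reliability allocation fares slightly better, though usually only after its value is recognized late in the day. Typically it is only when a product's reliability prediction turns up a problem that is hard to fix that people realize how important allocation is, and that doing it at the outset would have avoided the outcome. I have been through such a project. I joined late and found that no reliability allocation had been done and that nobody had a clear idea of the reliability target required of the power module, yet the vendor had already built final prototypes to the stated requirement, so the only option was to change things elsewhere. We learned then that, because no allocation had been done, nobody knew what reliability target the power supply actually needed. The requirement had been set too low, which pushed the requirements on the other parts higher. In the end we had to switch to better components to raise the reliability of the system. It worked out, but the cost went up. Probably many people run into this situation.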
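The effect described above can be sketched numerically. The example below assumes a simple series model, in which system reliability is the product of the block reliabilities, and compares an equal allocation with the case where one block (here a power module) has already been fixed at a lower value. The target of 0.95, the four blocks, and the power module value of 0.97 are all hypothetical numbers chosen for illustration.

```python
import math


def equal_allocation(system_target: float, n_blocks: int) -> float:
    """Equal apportionment for a series system: each block gets R_sys ** (1/n)."""
    return system_target ** (1.0 / n_blocks)


def remaining_allocation(system_target: float, fixed, n_remaining: int) -> float:
    """Reliability each remaining block must reach once some blocks are fixed."""
    product_fixed = math.prod(fixed)
    if product_fixed <= system_target:
        raise ValueError("fixed blocks alone already miss the system target")
    return (system_target / product_fixed) ** (1.0 / n_remaining)


target = 0.95  # hypothetical system reliability target
blocks = 4     # hypothetical number of series blocks

planned = equal_allocation(target, blocks)
# Power module already built to a lower figure (assumed value 0.97).
late_fix = remaining_allocation(target, fixed=[0.97], n_remaining=blocks - 1)

print(f"planned per-block target:      {planned:.4f}")
print(f"target for other three blocks: {late_fix:.4f}")
```

Under these assumptions, the three remaining blocks must each reach a higher reliability than they would have needed under an up-front allocation, which is the extra cost the post describes.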
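Reliability growth is something hardly anyone seems to have much of a concept of, so I will not say more.
Reliability prediction is generally mandatory, heh. Telecom products all require a five-year reliability figure, and you cannot get by without data, so the numbers get calculated. But the results are useless. Many large international companies also set store by these figures. I do not know when this will change; I hope it is soon. I covered prediction in a separate post, so I will leave it at that here.
Next is consumer electronics. There are a great many consumer electronic products now, all under constant cost pressure, and everyone wants things that are both good and cheap. As a result priorities get inverted, or rather nobody has the attention for long-term planning, and the focus is mostly on reliability testing, in the hope that testing can act as the gate for all product quality.
Product life cycles in electronics are getting ever shorter, so design pressure and test pressure are both high. Hence the enthusiasm for HALT, a test method that finds design problems quickly. But the time it can save is limited. Greater challenges lie ahead, and at that point HALT will not be able to help us.
For now, what companies can compete on is the level of their testing. Large companies have comprehensive test specifications, understand where each test comes from, and have senior engineers who know testing well. They are also well ahead on HALT, so small manufacturers cannot yet compete with them on quality. I think the growth of domestic consumer electronics needs more qualified test engineers.
However, many people in telecom regard reliability testing as the fringe of reliability work, with little technical content. I think that is wrong. Reliability testing currently contributes far more to companies than the other activities, and doing it well is not easy, so let me set the record straight here on behalf of reliability test engineers. This is also why reliability engineers in China are still mainly test engineers, and why so much of the discussion is about testing. An industry takes time to develop. I trust that those of us who are pioneers will grow with it; that is the road I have traveled myself.
Then there is semiconductors. The industry involves huge investment and pays well, and I expect it is many people's dream. Here I will only talk about reliability, heh. Semiconductor design is cutting-edge work with very high design complexity. You might assume that their reliability work must therefore be excellent. You may be disappointed: they still rely mostly on testing for assurance. Hard to imagine, but that is how it is. Their tests run very long and cover a great deal, which is in fact why chips so rarely have problems. I do not know much about the specifics, so I will stop there. One thing to add: simulation plays an especially important role for chips, and without it reliability is nearly impossible.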
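Last is the mechanical industry. Those without a mechanical background will not know much about it. It really comes down to one term: safety factor, heh. Too simple? Bear with me. A safety factor can be thought of as designing in a safety margin, so that the design's capability is higher than what is required, which protects reliability in later use. But this relies largely on experience, and there is no way around that, which is why people say mechanical engineers become more valuable with age, heh. It is not just age, though. The main reason is that no mature theory covers all products, so much depends on the engineer's own experience, and accumulated knowledge becomes especially important. If a company has no good record-keeping procedure, then when an excellent engineer leaves, everything starts over from scratch. That is very dangerous and something such companies need to watch closely. We need to preserve the experience of those who came before, and we need safeguards built into the system rather than reliance on individuals. As with chips, simulation is very important. It will eventually become mainstream, but its current application still has room to improve.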
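For readers without a mechanical background, the safety factor mentioned above is simply the ratio of what a part can withstand to what it is expected to see. The sketch below shows that arithmetic; the strength, load, and required factor are hypothetical values, not figures from any design code.

```python
def safety_factor(strength: float, load: float) -> float:
    """Ratio of capability (e.g. yield strength) to expected load or stress."""
    if load <= 0:
        raise ValueError("load must be positive")
    return strength / load


def meets_requirement(strength: float, load: float, required_factor: float) -> bool:
    """True if the design's safety factor is at least the required factor."""
    return safety_factor(strength, load) >= required_factor


# Hypothetical bracket: 250 MPa yield strength, 100 MPa working stress.
sf = safety_factor(250.0, 100.0)
print(sf, meets_requirement(250.0, 100.0, required_factor=2.0))
```

Choosing the required factor is where the experience the post talks about comes in; the calculation itself is the easy part.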
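Well, that basically covers it. This is not my ideal of reliability engineering. I maintain that reliability must be quantified; otherwise it has no future.
Quantified reliability includes market research, product planning, patent competition, reliability design, reliability growth, reliability testing, reliability assurance, and life-cycle cost. I will cover each of these in later posts. That is all for today.
It is freezing and there is no air conditioning; my hands are ice cold, so let me take pity on them first. The end!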
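A Look at Today's Popular HALT (repost)
Reposted from: The Online Home for Reliability Engineers
These days a reliability engineer who has not heard of HALT seems badly behind the times; at least that is how I see it. Everyone holds HALT in high regard and hopes to master this special skill so that their reliability expertise can take a great leap forward.
HALT, too, originated abroad and is now very mature there. It really is a good way to find problems. It is fair to say the approach saves us precious time and cost and gives us the chance to find more problems in the shortest possible time. It really is good, heh.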
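HALT rests on the idea that raising the stress level accelerates product failure. The premise, however, is that the failure mode must not change as the stress is raised; otherwise HALT is meaningless. Few people pay attention to whether the failure mode has changed; it is usually just assumed that it has not. I have not seen any papers specifically on this, so I cannot say much, though foreign books are hard to get hold of here, so not having seen one is not surprising.
When several stresses are applied at once in a HALT test, the material I have read says that more failures are found than with a single stress and that efficiency improves greatly. On that point alone it is indeed very good. I am personally in favor of HALT and am still learning it myself.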
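But I do have doubts about HALT. I mentioned above the question of whether the failure mode changes. How do we confirm that? I have not seen a good theory to guide us (though I may simply be poorly read). Without one, how do we set the upper temperature limit for the test? And what do we do once a problem is found? Analysis is unavoidable, of course; the real question is what to do once we have a fix. Must the fix be implemented? If the schedule does not allow it, do we still have to act? If it adds too much cost, do we still have to act? Honestly, these last two questions cannot be answered. But if we cannot answer them, then our reliability engineering is still at a fairly elementary stage, unable to give the company more support or create more value, and we have no grounds to ask for better standing or better pay and benefits.
Also, the detailed HALT test content has to be drawn up with engineers who know the product very well. That is common abroad, and the R&D engineers are supportive, but I have not seen it happen in China. Do our reliability engineers know the product well? No, their familiarity with the product is still poor. That being so, we have to ask whether the problems we find are really the ones that matter, and whether we have really found the problems at all. I do not believe we have.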
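A HALT chamber costs upwards of a million RMB, not many have been bought so far, and I expect few people know the equipment well. I have even seen vendors start acting as sales agents for HALT equipment they barely understand. If they do not understand it themselves, is the customer supposed to figure it out? Nonsense. It is a distorted market. In fact a lot about HALT equipment needs to be understood by the reliability engineer, especially how vibration and temperature change are handled. Speaking of vibration, let me talk about fixtures. HALT vendors all tell you how important a good fixture is, but they never push customers to actually design one. That is not really the vendors' fault; they did say it matters, and it is a pity nobody listens. I would say both sides share the blame. The vendors' classic line is: "A good fixture must transfer as much of the table's vibration energy as possible to the unit under test, ideally 100%." A fine sentence, but is it really that simple? If it were, nobody would have problems. Without a deep understanding of the structure, you will not get a good fixture. Sometimes you design a fixture, try it, and it looks great. Do not celebrate too soon: measure every location first. Matching vibration energy at one point that happens to be bolted tightly to the table is not enough.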
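HALT is really still part of testing. It merely moves reliability verification testing a little earlier; we still have not truly reached into the design stage. The design ends up with just as many problems, and we only find them sooner, which does seem to shorten the development cycle. But if that is all, HALT should not enjoy the lofty status people give it today. All it can do for a company is shorten the design and development cycle, and it brings a downside as well: will many of the tested conditions ever actually occur in the field, and with what probability? Nobody can answer that; it mostly takes an engineer's experience to come up with an answer.
By now you may be saying that HALT is nothing special, that like other reliability methods it is an aid to reliability work. Yes, that is exactly my view.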
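To restate my position: reliability must help the company make money, or it is not good reliability. So how does it make money? By pushing reliability work earlier; the earlier it is done, the more value it creates. Many of you will have seen the chart showing how much cost reliability work influences at each stage; that is the point it makes.
That is enough about HALT. Next I will talk more about reliability design and product planning, and even market research. I think that when you read the later posts you will say that reliability can be done this way too, that we can help the company make money and our work need not be passive any more. If you come to think so, that will be the idea I managed to pass on. I hope so.
Quantified reliability (quantify reliability, measurable reliability)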
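I would like to ask admin a few questions. First, my understanding of HALT testing. Our company has a Typhoon 4.0 HALT chamber made by Qualmark; I heard it cost three million RMB, and that is the equipment alone. Because HALT uses liquid nitrogen to cool the product rapidly, we also have a large liquid nitrogen tank. A fill costs 12,000 RMB and lasts a little under two months, or only about three runs if used for thermal shock.
That covers the equipment; now my understanding of HALT itself. HALT has a temperature stress, a vibration stress, combined high temperature plus vibration, and combined low temperature plus vibration. Voltage stress can be added too (our company has not used it yet; we measure product voltages in an open chamber). Temperature stress first. I often hear people ask how to find a product's temperature limit, or how much temperature a product can withstand. I currently measure it by temperature stepping: starting from ambient, I step by 10°C every 20 minutes, or 10°C every 30 minutes, until the product shuts down or will not power on. That works for products that can be powered. How do you judge products that have no power input? (I have no practical experience with that and would rather not guess; I am sure someone has a method.) Low temperature works the same way in reverse.
On many forums I see people use an acceleration formula to derive the acceleration factor, for example whether going from 30°C to 40°C should be at 20°C/min or 60°C/min. I have never understood this; my math is poor, so I have not looked into it. Because our company is a contract manufacturer, the rate is specified as 40°C/min. I have wondered whether 40°C/min is reasonable and how it is calculated; perhaps someone experienced can give me some pointers. Then there is the dwell time at each temperature. I do not know whether people estimate it or calculate it from an acceleration formula. The test specs I have seen use dwells of 10, 15, 20, and 30 minutes. So for temperature stress there are two things you must know, the acceleration factor and the dwell time; the limit itself can be found by powering the product.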
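The temperature-stepping procedure described above can be written down as a simple schedule generator. This is only a sketch under stated assumptions: the 10°C step, the 20-minute dwell, and the 40°C/min ramp rate come from the post, while the 25°C start, the 95°C planned ceiling, and the names `Step` and `step_stress_profile` are hypothetical. In a real run the test stops when the product shuts down or fails to power on, which no schedule can predict, so the ceiling is only the highest setpoint planned in advance.

```python
from dataclasses import dataclass


@dataclass
class Step:
    setpoint_c: float  # chamber setpoint for this step, in °C
    start_min: float   # minutes from test start when the dwell begins
    end_min: float     # minutes from test start when the dwell ends


def step_stress_profile(start_c, stop_c, step_c, dwell_min, ramp_c_per_min):
    """Build a stepped temperature profile from start_c toward stop_c.

    Works for hot stepping (stop_c > start_c) and cold stepping (stop_c < start_c).
    """
    if step_c <= 0 or dwell_min <= 0 or ramp_c_per_min <= 0:
        raise ValueError("step, dwell and ramp rate must be positive")
    direction = 1 if stop_c >= start_c else -1
    n_steps = int(abs(stop_c - start_c) // step_c)
    steps = []
    clock = 0.0
    for i in range(n_steps + 1):
        if i > 0:
            clock += step_c / ramp_c_per_min  # time spent ramping to the next setpoint
        setpoint = start_c + direction * i * step_c
        steps.append(Step(setpoint, clock, clock + dwell_min))
        clock += dwell_min
    return steps


# Hot stepping: 10 °C every 20 minutes at 40 °C/min, from an assumed 25 °C
# ambient up to an assumed planned ceiling of 95 °C.
hot = step_stress_profile(25, 95, 10, 20, 40)
for s in hot:
    print(f"{s.setpoint_c:6.1f} °C  dwell {s.start_min:7.2f} to {s.end_min:7.2f} min")
```

Passing a `stop_c` below `start_c` produces the cold-stepping schedule in the same way. The sketch says nothing about whether a given ramp rate or dwell is the right one; it only lays out the timeline that the chosen values imply.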
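Now vibration. HALT uses six-degree-of-freedom vibration. Are there requirements for how the unit is fixed? Our products are a little larger than an A4 sheet, so we clamp them to the table with two aluminum bars. For a finished unit, the measured response is always lower than the input level, but for a bare motherboard (MB) the response is always higher than the input. We step in the same way, 10G every 30 minutes, increasing continuously. Our chamber can reach 60G, and our products are usually taken to 50G. But one question is where the 50G comes from; I have not figured that out either. Is there a formula for it?
I have run HALT on several products. The problems that showed up were parts falling off during vibration, units failing to power on or shutting down during vibration, and shutdowns during temperature testing as well. I have never seen how much HALT has actually helped the company. Perhaps our products are mature and simply have few problems.
These are just some scattered thoughts; I hope to discuss HALT more with admin.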
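My humble opinion:
HALT is still mostly at the theoretical level for most people. The way a HALT chamber precipitates failures is nothing new; it is just a derivative of the three-stress combined environmental chamber. The difference is that it can produce six-degree-of-freedom vibration and rapid temperature change. I personally think that is merely the equipment suppliers' selling point, and the advantage in precipitating failures is not as great as imagined. HALT is nevertheless an economical (the cost is certainly not as high as Clark says; it varies by company, so it is hard to compare here) and fast way to expose failures. Exposing the failure is not actually the hardest part. The hardest part is localizing a problem that only appears under special environmental conditions and then fixing it. In my long experience running HALT, many failures are exposed but cannot be localized; those that can be localized and fixed generally come to less than 50%. Operating a HALT chamber is itself very simple; I am sure anyone who buys one will get the hang of it within two months. But building up problem-solving experience, and distilling anything of practical value from it, is very hard without a real investment of money, equipment, and people!
Not many HALT labs are currently open to outside customers; most are for internal use. As far as I know, two accept outside work: Beihang University and QualMark (which has one lab each in Shenzhen and Wuxi). In addition, the No. 5 Electronics Research Institute placed an order in the first half of the year for two chambers; I do not know whether they have arrived. Beihang is cheap, but the chamber is poor and the response measurement system is incomplete. QualMark's chamber is good but expensive, quoted at around 1500 per hour.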
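Came across an article on SPC; here is an excerpt!
The SPC process control baseline grew out of the automotive industry's own quality requirements. Because quality control costs in that industry are relatively high, SPC is an economical method there. But applying SPC also takes investment. For other traditional industries, low-value product industries, industries whose process capability is not high to begin with, or companies whose customers have low requirements, using SPC is like "using a butcher's cleaver to kill a chicken". The outcome may be that quality improves but quality cost rises too; or quality does not improve, money is wasted, and new management problems appear. Process control is based on 3σ and requires capability to reach 1.00 or above, and genuinely raising capability is not easy. A common complaint on the shop floor is: "In-process inspection passes, outgoing inspection passes, and even customers have no defect complaints, so what is there to improve?" I will not go into companies that act on impulse, or do it purely for marketing, putting on a show beyond their means. The point here is that before committing to a management decision, a company should take its own pulse carefully: Does it fit the company's current stage of development? Can staff skills keep up? Can we sustain the effort? Is the price of the added competitiveness worth it? Do current methods already meet requirements? Would other methods work as well? Ask these questions honestly, and do not blindly throw away good traditional practices just to follow a trend.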
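For readers unsure what "capability of 1.00 or above on a 3σ basis" means in practice, here is a minimal sketch of the usual capability indices, Cp and Cpk. It assumes roughly normal data and, as a simplification, uses the overall sample standard deviation; SPC practice often estimates σ from within-subgroup variation instead. The measurements and specification limits are hypothetical and do not come from the excerpt.

```python
import statistics


def process_capability(samples, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper spec limits.

    Cp compares the spec width to six sigma; Cpk also penalizes a
    process mean that sits off-center between the limits.
    """
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # simplification: overall sample standard deviation
    if sigma == 0:
        raise ValueError("no variation in the samples; capability is undefined")
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk


# Hypothetical measurements and spec limits, for illustration only.
measurements = [10.02, 9.98, 10.05, 9.95, 10.01, 9.99, 10.03, 9.97]
cp, cpk = process_capability(measurements, lsl=9.85, usl=10.15)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
print("meets the 1.00 threshold" if cpk >= 1.0 else "below the 1.00 threshold")
```

A Cpk of exactly 1.00 means the nearer spec limit sits three standard deviations from the process mean, which is why the shop-floor complaint quoted above and a capability target can both be true at once: inspection may pass every part while the margin is still thin.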
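Finally Took Leave
Weather: overcast and rainy. Mood: calm.
I finally took leave. With fewer than three days left on a one-week sick note, I took leave, and I finally saw many things clearly.
To be honest, my health has not been good this year and I have taken a lot of sick leave, but at least I have done my job reasonably well. This illness, though, showed me a great deal. Even someone who treats you well is only in it for the benefit; once you are less useful to him, you stop mattering. I have been ill for a long time and I am sure he knows it. I kept coming to work while holding a one-week sick note. A hospital does not write a note for a week unless it thinks a week of rest is needed, because the key to this illness is treating the inflammation and resting so that walking and other activity do not aggravate it. In fact, every day I worked this week went the same way: fine and pain-free in the morning, unable to hold out by the afternoon because of the pain. The reason is simple: the inflamed area flared up again. Because I insisted on putting the company first, and putting him first, I suffered day after day, and what I got in return was blame and disdain. Was I too naive? Perhaps!
Today I took leave, and he was impatient about it. Is that my fault? I went two days and two nights without eating or drinking, vomiting and with diarrhea, clutching a hot-water bottle through a business trip and shifts at the plant; the burn the hot-water bottle left on my stomach is still peeling. I put up with it and did not take leave for treatment. The current acute infection cannot be unrelated to that episode, since only one day separated them. So I took leave: holding a one-week note from the hospital, I asked for two days to rest, because only two days were left and I had worked through the others. Maybe I brought it on myself.
I suddenly noticed the slogan in a friend's chat group: "Cherish life, stay away from the boss." It seems to be true, because the relationship between a boss and a subordinate is always one of interests, and that never changes! It makes a lot of sense. In any case, because of the steroid effect of the dexamethasone in my injections I can sleep less than three hours a night, yet I keep going. But there must be a limit. I also understand this: the company is not your savior. On the contrary, we are using up our lives bit by bit. Has he paid a salary worth using up our lives for? No!
Never mind. Tomorrow I will rest properly, because in the end my health is my own!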
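10/100M Ethernet Card
Lenovo D-LINK, model DFE-650TX. I wonder whether anyone still uses these.
If anyone wants it, send me a private message on the site. It is free; you just pay the courier fee (cash on delivery).
Card only, no cable, sorry.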
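I Hit the Jackpot!!
Walk at night often enough and you will meet a ghost, heh; the saying is right. I secretly installed MSN on my company computer, and just two days later there was trouble.
Over the past few days several people's computers had gone down, and I was feeling lucky that mine was fine. Then this morning IT called: the boss's computer was down and had to be carried over. Yikes; I wondered why mine was not on the list. I also figured I had barely used MSN since installing it and had not brought the virus in, so never mind. But at noon, just as I was about to go to lunch, a line of text appeared on my screen: "XXX, your computer is infected. Please bring it to the IT department immediately." Yikes. Nothing for it; I took it in. I did not bother uninstalling MSN, since they could find it in the registry anyway and removing it would be pointless.
The previous infections were blamed by some on QQ and by others on MSN, and in the end everyone with MSN or QQ installed hit the jackpot: all blacklisted by IT, with a reprimand and a fine. Heh. I had no computer the whole afternoon and could not handle some of my work. At first I was a little worried, but once I finished lunch I stopped thinking about it. Why fuss over it, heh.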
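Alarm Clock, Bookends, Piggy Bank
I bought myself a rather nice little alarm clock, a pair of iron bookends, and a piggy bank while I was at it. The room feels much tidier: the books are put away and coins are no longer lying around everywhere. The only one that has not proved its worth yet is the alarm clock, which I bought because my phone switched itself off one night and made me an hour late. I have not been to work these past few days, but it looks nice sitting on the desk.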
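Since I'm Here, I Might as Well Settle In
A friend wanted to see my blog, so I logged in to give her the address.
Since I am here, I should write something. But what?
Oh, right: the annual bearing exhibition (轴展会) is about to start. Is anyone here attending?
Our company has a booth, named ZEN, on the first floor. I hope to run into you there.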
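The Point of Wearing Anti-Static Finger Cots!
Our company is currently pushing hard on the importance of ESD protection, and a dispute has come up over the point of wearing anti-static finger cots.
As far as I know, anti-static finger cots are graded by surface resistance, with specifications ranging from 10^6 Ω to 10^11 Ω!
There are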

