
Article 45: A five-pronged approach to analyze process data

Last edited by 小编H on 2011-11-22 11:22


Translation: http://www.6sq.net/space-uid-393147.html Proofreading: xy_persist

Make Data Matter
A five-pronged approach to analyze process data
by Ronald D. Snee

Is data analysis an art or a science? Arguments exist for both sides, and many people simply come down in the middle. In my view, it’s both.

Regardless of which view you take, the discussion misses a critical element: the need for an explicitly articulated strategy for data analysis. In fact, the various attitudes toward the nature of data analysis often imply unreflective strategies.

Partisans of data analysis as an art might simply look at the data, manipulate it based on their intuition and experience, and proceed confidently to extract what they believe is useful information. The more scientific folks, with perhaps too much faith in numbers, go straight to statistical software and do some indisputable number crunching.

Those who stand on middle ground (possibly the great majority of practitioners) do a little of both: rely on their insight to manipulate the data, run the numbers, do some further manipulation and rerun the numbers until they achieve what they believe is a satisfactory result.

All of those approaches are likely to produce questionable results in terms of what the analysis addresses and the significance of the results.

Five activities

Practitioners can avoid the pitfalls of these unreflective or ad hoc approaches by adopting a clearly articulated, proven strategy for analyzing process data and systematically following that strategy.1 Such a strategy entails five essential activities:

  1. Understanding the context of the analysis.
  2. Examining the pedigree of the data.
  3. Graphically representing the process.
  4. Graphically representing the data.
  5. Statistically analyzing the data.

Note that these are iterative, as opposed to sequential, activities. Depending on the circumstances, the order of some of these activities may shift.

For example, in the mutually dependent iterations of this approach, the graphical representation of the process may precede the examination of the pedigree. In any case, most of these activities look forward and backward. The examination of the data’s pedigree (where it came from and how it was collected) may drive the analyst back to a fuller exploration of the context of the process to fill out that pedigree.

But the pedigree of the data also points to how the process should be graphically represented. That, in turn, could retrospectively suggest the need for additional types of data and prospectively affect the graphical representation. By engaging iteratively in these activities, you can arrive at important results that are ready to be fully and persuasively reported.

This approach offers at least three distinct advantages over less structured approaches. First, it is repeatable: it can be used in any situation that calls for greater understanding of a process. Second, like sound processes themselves, it’s robust: flexible enough to encompass the wide variation of particulars found in different situations. Third, and most importantly, it’s more likely to produce useful results.

Understanding the context

It’s difficult to know precisely how to proceed until you ask the most basic of questions: What is the purpose of the analysis? Are you trying to confirm a hypothesis?

For example, a manufacturer that uses raw materials from two different vendors suspects that differences in quality are causing defects in the finished product. Data analysis can confirm or disconfirm the hypothesis and, in this example, identify the offending vendor. Such contexts call for what is sometimes referred to as confirmatory data analysis.

Alternatively, let’s say you’re trying to solve specific problems, the causes of which you do not understand. For example, a chemical process is producing unacceptable variations in purity from batch to batch. Or a business process, like a bank loan approval process, is taking far too long to complete. Or perhaps a distributor’s percentage of on-time deliveries is fluctuating widely. These contexts call for exploratory data analysis, which must first generate a hypothesis to test.

In confirmatory and exploratory analyses of a process, the goal is the same: find the inputs and the controlled and uncontrolled variables that have a major impact on the output of the process.2

Examining the pedigree

Data analysis begins with a data table, which is either provided to or constructed by the analyst. In either case, you should always question the data because data can be, among many other things:

• Incorrect: Some of the information is wrong, for example, when someone monitoring a process records the data incorrectly or a measurement device is faulty.
• Irrelevant: Some of it is the wrong information, for example, when data on the wrong variables are captured.
• Incomplete: Crucial information is missing, for example, when data on an important variable are missing.
• Misleading/biased: The data point you in the wrong direction for analysis, for example, when an important variable has been examined only over a short time, thus making it appear to be a constant.
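
The checks in the list above can be partly automated before any analysis starts. The following is a minimal sketch, not a method from the article: the function name `screen_column`, the plausible-range limits and the purity readings are all hypothetical, and a real screen would also need knowledge of how the data were collected.

```python
def screen_column(values, low, high):
    """Return simple quality flags for one column of a data table.

    values: list of numbers, with None marking a missing entry.
    low, high: plausible limits for the variable (assumed to be known
    from the process; they are not derived from the data).
    """
    present = [v for v in values if v is not None]
    return {
        # Incomplete: entries that were never recorded
        "missing": len(values) - len(present),
        # Possibly incorrect: readings outside the plausible range
        "out_of_range": sum(1 for v in present if not low <= v <= high),
        # Possibly misleading: the variable never moved in this sample
        "constant": len(present) > 1 and max(present) == min(present),
    }


# Hypothetical purity readings (%), one per shift
purity = [98.2, 98.4, None, 97.9, 982.0, 98.1]
print(screen_column(purity, low=90.0, high=100.0))
```

A flag here is only a prompt to go back to the source of the data; it cannot say whether a reading is irrelevant or why a variable appears constant.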

An understanding of the context of the process can guard against these errors, but the context alone is insufficient. Given these and the many other shortcomings that can undermine the value of the data, it is absolutely critical to understand the pedigree of the data: where it came from and how it was collected.

For example, consider a batch manufacturing process in which a sample is taken every shift and carried to an analytical lab where it is tested for purity, and the results are recorded. Thus, the data trail is:

Production process ► sampling process ► testing process ► data-logging process.

To understand the resulting data, it is necessary to understand this data trail and the production process parameters. That is the pedigree of the data.

Incomplete understanding of the data’s pedigree can lead you down wrong analytical trails. Suppose, for example, a pharmaceutical company is experiencing differences in yield from batch to batch of a product because of the properties of the raw materials supplied by a vendor. Although the properties for each batch of raw materials are within specifications, the yield nevertheless varies unacceptably.

The analyst has been given a data table that includes the properties of the raw materials for each batch of product under consideration. But if the analyst does not know that some raw material batches were analyzed by the vendor’s quality assurance lab and some by the manufacturer, then there is a strong possibility the analysis will come up empty. By taking the time to understand the pedigree of the data fully, the analyst can save much frustration and fruitless work.

Some Guiding Principles

• The process provides the context for the problem being studied and the data being analyzed.
• Know the pedigree of the data: the who, what, when, where, why and how of its collection.
• Analysis is defined by how the data were generated.
• Understand the measurement system as well as the process.
• Be aware of human intervention in a process. Humans are often a large source of variation.

Graphing the process

A graphical representation of the process shows how the process works from end to end. Such representations fall into two broad categories: flow charts and schematics. A flow chart maps the sequence and flow of the process and often includes icons, such as pictures of a truck to represent a transportation step or smokestacks to indicate a factory.

A schematic representation is designed to exhibit the inputs and the controlled and uncontrolled variables that go into a process to produce its outputs. Both types of representation reinforce one another by suggesting what types of data are needed, where they can be found and how they can be analyzed.

Figure 1 is an elementary schematic representation of a process (such as pharmaceutical, chemical or loan approval). As the analyst knows, the context is unacceptable variations in yield from batch to batch of the finished product. Therefore, “yield” is the key output.

To get an accurate picture of the process, however, analysts again should not simply rely on the context. To find out how the process really works, they should also observe the process first-hand and question the people who operate it. This investigation might also lead the analyst to further refine the pedigree of the data: the who, when and why of its measurement and collection.

With yield as the key output of a manufacturing process, the analyst can now graphically represent the process and fill in the blanks with the sources of possible variation that led to the unacceptable variations in yield. For the inputs, sources of variation might be energy, raw materials and different lots of the same raw materials. Controlled variables that go into the process might include things like temperature, speed of flow and mixing time.

In essence, controlled variables are the things that can be adjusted with a knob or a dial. Uncontrolled variables that go into this process may include human intervention and differences in work teams, production lots, days of the week, machines or even heads on the same machine. In the output of the process, variation may result from the measurement system itself.

A good rule to follow when you have, for example, two production lines doing the same thing or two pieces of equipment performing the same task, is to assume they vary until proven otherwise. That’s especially true for the human factor. Experience shows that in creating the initial data table and in the graphical representation of the process, the human element is a frequently overlooked source of variation.

In the aforementioned pharmaceutical manufacturing process, the analyst may overlook that the process includes three shifts with four different work teams on the shifts.

As a result of the observation and investigation that goes into constructing the graphical representation of the process, however, the analyst makes sure the data table records which team produced which batches on which days and that the data are stratified in the analysis. The failure to take that human element into account results in a highly misleading data table and might obscure the ultimate solution to the problem.
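
Stratifying the data table by work team can be sketched as follows. This is an illustration under stated assumptions rather than the article’s own procedure: the batch records, the team labels and the helper name `stratify` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stratify(records, key, value):
    """Group `value` by the levels of `key`; return {level: (count, mean)}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    return {level: (len(vals), mean(vals)) for level, vals in groups.items()}


# Hypothetical data table: each batch records which team produced it
batches = [
    {"batch": 1, "team": "A", "yield_pct": 91.0},
    {"batch": 2, "team": "B", "yield_pct": 86.0},
    {"batch": 3, "team": "A", "yield_pct": 93.0},
    {"batch": 4, "team": "C", "yield_pct": 90.0},
    {"batch": 5, "team": "B", "yield_pct": 84.0},
    {"batch": 6, "team": "C", "yield_pct": 92.0},
]

for team, (n, avg) in sorted(stratify(batches, "team", "yield_pct").items()):
    print(f"team {team}: n={n}, mean yield={avg:.1f}")
```

The same helper works for any other recorded stratum, such as shift, day of the week or machine, provided the data table captured it in the first place.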

Graphing the data

The graphical representation of the process, and the understanding of the possible sources of variation it helps generate, suggests ways in which the analyst can graphically represent the data. Because data are almost always sequential, a run chart is often needed. In our example, the x-axis would register time and the y-axis would register yield.
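
A run chart can be approximated in plain text before reaching for plotting software. The sketch below assumes made-up yields listed in time order; the function name `runs_about_median` is hypothetical, and counting runs about the median is one common way to read a run chart, not a step the article prescribes.

```python
from statistics import median


def runs_about_median(series):
    """Return (center, runs): the median and the number of runs about it.

    A run is a stretch of consecutive points on the same side of the
    median. Points exactly on the median are skipped.
    """
    center = median(series)
    sides = [v > center for v in series if v != center]
    runs = 1 if sides else 0
    for prev, cur in zip(sides, sides[1:]):
        if cur != prev:
            runs += 1
    return center, runs


# Hypothetical batch yields (%), in the order the batches were produced
yields = [88, 89, 91, 90, 92, 93, 85, 84, 86, 83]
center, runs = runs_about_median(yields)

# Text run chart: time runs down the page, yield runs across
for t, y in enumerate(yields, start=1):
    side = "+" if y > center else "-"
    print(f"t={t:2d} {side} {'*' * (y - 80)} {y}")
print(f"median={center}, runs={runs}")
```

Few, long runs suggest a shift or trend over time, which is a prompt to ask what changed in the process at that point.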

A scatter plot also may be used, with process variables registered on the x-axis and process outputs registered on the y-axis. Other familiar graphical techniques include box plots, histograms, dot plots and Pareto charts.
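
Of the techniques just listed, the Pareto chart reduces to a simple tally. The sketch below builds the table behind such a chart; the defect categories and the function name `pareto_table` are hypothetical.

```python
from collections import Counter


def pareto_table(categories):
    """Return [(category, count, cumulative_percent)], largest count first."""
    counts = Counter(categories).most_common()
    total = sum(n for _, n in counts)
    table, running = [], 0
    for cat, n in counts:
        running += n
        table.append((cat, n, round(100.0 * running / total, 1)))
    return table


# Hypothetical defect log, one entry per rejected unit
defects = ["seal", "label", "seal", "fill", "seal",
           "label", "seal", "cap", "seal", "label"]

for cat, n, cum in pareto_table(defects):
    print(f"{cat:<6} {n:>2} {cum:>6.1f}%")
```

Sorting by count and tracking the cumulative percentage shows at a glance which few categories account for most of the problem.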

In using any of these techniques, the goal is to make sure you are exploring the relationships of potentially important variables and preparing an appropriate graphical representation for purposes of statistical analysis. Plotting the data in different ways can lead to insights and surprises about the sources of variation.

Statistically analyzing the data

The statistical analysis of the data, usually with the aid of statistical software, establishes what factors are statistically significant. For example, are the differences in yield produced by different work teams statistically significant? What about variations in temperature or flow? What about the measurement system itself?
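
The question of whether team-to-team differences in yield are statistically significant can be sketched as follows. This is a minimal illustration, not the author’s method: the yields are made up, the function names are hypothetical, and a permutation test on the one-way ANOVA F ratio stands in for whatever procedure a statistical package would apply.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F ratio: between-group over within-group mean square.

    Assumes at least two groups and some variation within the groups.
    """
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_shuffles=2000, seed=0):
    """Share of random relabelings whose F is at least the observed F."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical yields (%) for three work teams
team_yields = [
    [91, 93, 92, 94],  # team A
    [85, 84, 86, 87],  # team B
    [90, 92, 91, 89],  # team C
]
print(f"F = {f_statistic(team_yields):.1f}")
print(f"permutation p-value = {permutation_p_value(team_yields)}")
```

A small p-value says the team labels carry information about yield; it does not say why, which is where the context, the pedigree and the process graphics come back in.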

The key to success lies in intimately knowing the data from the context of the process, graphically representing it and formulating a model that includes the comparisons, relationships, tests and fits you are going to study.

Once you have created the graphics and done the statistical calculations, the results should be checked against the model. Does it account for all of the variation? In short, do the results make sense? If so, you can confidently report your results.3

Beyond analysis to action

The final point about reporting the results offers a reminder that analysis goes beyond the exploratory or confirmatory. The analyst also must be able to display and communicate results to decision makers. The most elegant analysis possible is wasted if it fails to communicate and the organization therefore fails to act.

Early in my career, I was asked to analyze whether a chemical company’s new product had adversely affected animals in safety studies. Personnel in the company’s lab insisted the data from the experiments showed adverse effects, and the company should therefore cease development of the product. Analysts on the company’s business side had concluded the data showed no adverse effects. My analysis reached the same conclusion, and in a showdown meeting between the business and the lab personnel, I presented my findings.

At the conclusion of my presentation, replete with analytical representations of the statistical significance of the data, the lab director remained unconvinced. So I handed him one final graph: a dot plot that, for some reason, I had not included in my presentation.

He looked at the graph and began to think aloud while everyone in the meeting sat silently. He continued to look and talk and look and talk. At last, he said emphatically, “Maybe there isn’t a difference.”

In the absence of that persuasive graphical representation and model of the data, the company might have ceased production of what turned out to be a valuable and harmless product. The bottom line is that the analyst must not only do data analysis that matters, but also make it matter.




