Using Language Processing Software For Auto-Review Of Oil Analysis Reports

Mar 14, 2018 2:58:19 PM


Bluefield has completed many reviews of the effectiveness of condition monitoring systems, including those involving oil analysis. Often we find that site engineers are overrun with tasks and rely on the red, yellow and green symbols (or the A, B, C or X gradings) assigned to samples by the lab performing the analysis. Unfortunately, this misses the real value of oil analysis, which is to enable the site to become proactive and take steps to avoid the onset of failure, not just to stay slightly ahead of breakdowns and act only when the lab has raised a red flag. The trends and the analysts’ comments interpreting the results offer proactive information. However, on a large site, the engineers or planners do not have sufficient time to read through many thousands of reports and comments. As a result, we often see that action is taken only when it is too late for the business to work in a predictive manner.

It seems like a difficult situation to overcome, especially with businesses in mining reducing their workforces but requiring more from the equipment.

Bluefield has worked with our partner, Relialytics, to use the latest technology in text, natural language processing and data visualisation in order to identify opportunities to improve oil condition monitoring and reduce the time required for engineering resources to extract the important issues requiring proactive action.

Text and natural language processing software tools generate and visualise networks that show the groupings of words and phrases and the linkages between them. This enables text-based recommendations and anomalies to be identified and highlighted automatically, rather than requiring someone to read through all of the comments one sample at a time.

Bluefield has completed a small-scale study using these technologies on oil sample data from Caterpillar 3508 engines and transmissions from the same machines over a period of almost 10 years (including several component replacements). From this study we have not only identified clear trends in the components throughout their lives, but have also been able to quickly identify actions that should have been taken even when the samples were reported as normal. Additionally, we have identified other anomalies that would need to be corrected for the oil condition analysis program to be effective.

The Study

The intent of this study was simply to identify what is possible with this technology. For this initial study, Bluefield used oil sample data recorded over almost 10 years from 3 Caterpillar 3508 engines and transmissions from the same machines.

The data was analysed by Relialytics using the oil analysts’ comments as the basis for the analysis and connecting the comments to numerical values. By comparing the language contained in the comments of each oil sample, Relialytics was able to construct networks showing how the samples connect to one another.
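The study does not describe the underlying method in detail, so the following is only a minimal sketch of the general idea, not the Relialytics approach. It assumes a simple bag-of-words cosine similarity between comments and a hypothetical similarity threshold for drawing a link; the sample IDs and comments are invented for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenise(comment):
    """Lower-case a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(comments, threshold=0.5):
    """Return (id_a, id_b, similarity) for every comment pair at or above threshold."""
    vectors = {sid: Counter(tokenise(text)) for sid, text in comments.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine_similarity(vectors[a], vectors[b])
        if sim >= threshold:
            edges.append((a, b, round(sim, 3)))
    return edges


# Invented example comments, keyed by a hypothetical sample ID.
comments = {
    "S1": "All wear levels appear normal. Resample at normal interval.",
    "S2": "All wear levels appear normal. Resample at normal interval.",
    "S3": "Copper level is slightly high. Monitor at next sample.",
    "S4": "Copper level is high. Cut open filters and inspect for debris.",
}

edges = build_network(comments, threshold=0.5)
for edge in edges:
    print(edge)
```

With these invented comments, the two identically worded "normal" samples are linked, while the two copper comments are linked only if the threshold is lowered. The threshold is a tuning choice in this sketch, not a value taken from the study.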

By examining relationships between samples via language and visualising the results it was possible to very quickly and efficiently:

  • Show groupings of normal and abnormal samples.
  • Highlight particular problems specific to a component, even for ‘A’ samples.
  • Quickly determine the events and trends that have resulted in mobile equipment being pulled out of service due to highly abnormal sample results.

Additionally, by associating each sample’s numerical data with the analysts’ comments on that sample, it is possible to determine whether the comments and gradings remain consistent with the numerical results as laboratory contracts and analysts change.

The Results

The 3508 engine network in Figures 1 to 8 below demonstrates what the analysis software has produced. To the uninitiated these may look difficult to interpret; however, the networks offer important insights that can be quickly identified and then examined in more detail against the numerical data.

In a network, each node represents an oil sample. The different coloured clusters of samples represent samples that contain similar language in their comments.  The links between samples show they are related by language.

The intent of this paper is not to teach the reader to interpret the network diagrams, as this requires some further effort. The key is to recognise that the software is able to group similar language, highlight outliers, and then cross-reference them against the numerical data. The network diagrams referred to below are shown only as a basis for visualising the results. Clusters of closely related language appear as the denser areas of a diagram: the denser the grouping of nodes, the more closely related the language.
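One simple way to put a number on "dense" is the fraction of possible links within a cluster that are actually present. The sketch below assumes that cluster membership has already been determined by some other means and that a link means two comments are similar enough to connect. The sample IDs, cluster names and links are invented, and this measure is our illustration rather than the one used to draw the figures.

```python
def density(cluster, edges):
    """Fraction of possible links inside the cluster that are actually present."""
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(1 for a, b in edges if a in cluster and b in cluster)
    return internal / (n * (n - 1) / 2)


# Invented links between hypothetical samples (each undirected link listed once).
edges = [
    ("S1", "S2"), ("S1", "S3"), ("S2", "S3"),  # every pair linked
    ("S4", "S5"), ("S5", "S6"), ("S6", "S7"),  # a chain of gradually changing language
]

# Hypothetical cluster assignments.
clusters = {
    "dense": {"S1", "S2", "S3"},
    "dispersed": {"S4", "S5", "S6", "S7"},
}

results = {name: density(members, edges) for name, members in clusters.items()}
for name, value in results.items():
    print(name, value)
```

In this toy example the cluster where every comment resembles every other has a density of 1.0, while the chain-like cluster, where each comment resembles only its neighbours, has a density of 0.5.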

Several important learnings emerge from this initial level of analysis:

  1. Dense clusters – see Figure 1.

In these condition monitoring networks, the dense or tightly packed clusters (highlighted) represent normal samples, i.e. within each cluster the sample comments use the same language.

  2. Dispersed clusters – see Figure 2.

The more dispersed clusters show some similar language in the comments, but the language changes from sample to sample. These tend to represent abnormal samples. Figure 2 shows how the language in sample comments progresses through a dispersed cluster from slightly high to high copper levels.

  3. Network spine shows abnormal connecting comments – see Figure 3.

The spine of the network is made up of sample comments that connect the various clusters. Variation in the language of comments on samples with abnormal readings gives the network its structure and makes those samples easy to identify. Over 60% of the samples in the spine shown in Figure 3 are “B” samples, or the probable beginnings of an issue.

  4. Identify clusters of comments specific to a component – see Figure 4.

Across its lifetime, an equipment component will produce any number of abnormal sample readings. It is likely that similar components in other pieces of mobile equipment will produce similar samples and comments. As a result, this type of network analysis enables very specific issues affecting small numbers of mobile equipment to be identified quickly. That is, we can quickly identify equipment components that stand out from the rest of the components in a large group.

Figure 4 shows an example where oil oxidation has been identified in samples. This has occurred in only one of the engines.
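The idea of a spine connecting clusters can be approximated in a few lines. The sketch below assumes each sample already carries a cluster label and a grade, and it treats a "spine" sample as any sample linked to a sample in a different cluster. That definition is our simplification for illustration, and all IDs, labels and grades are invented.

```python
def bridging_samples(cluster_of, edges):
    """Return samples that have at least one link to a sample in a different cluster."""
    bridges = set()
    for a, b in edges:
        if cluster_of[a] != cluster_of[b]:
            bridges.update((a, b))
    return sorted(bridges)


def grade_share(samples, grades, grade):
    """Fraction of the given samples that carry the given grade."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if grades[s] == grade) / len(samples)


# Invented cluster labels, links and gradings for hypothetical samples.
cluster_of = {"S1": 1, "S2": 1, "S3": 1, "S4": 2, "S5": 2, "S6": 2}
edges = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S5"), ("S5", "S6")]
grades = {"S1": "A", "S2": "A", "S3": "B", "S4": "C", "S5": "C", "S6": "C"}

spine = bridging_samples(cluster_of, edges)
print(spine, grade_share(spine, grades, "B"))
```

Here the two samples that join the clusters are flagged, and the share of "B" gradings among them can then be compared with the rest of the network.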


Figure 1 – Engine Network Showing Denser Normal Clusters (1, 2, 3, 8, 10)

Figure 2 – Engine Network Showing Abnormal Sample Comment Progression in a Dispersed Cluster

Figure 3 – Engine Network Spine Showing Abnormal Sample Comments


Figure 4 – Easy Identification of Abnormal Samples Specific to a Machine Component


Drilling down into the clusters and nodes, we can take more detailed learnings from this analysis, as shown in the example below.

In point 4 above, we used the network diagrams to identify that one of the engines examined in this paper was showing repeated occurrences of oil oxidation.
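A much cruder check than the network diagram can show the same kind of component-specific signal: look for watched terms that appear in the comments of only one component. This is a minimal sketch under that assumption; the component names, comments and watch list are invented for illustration.

```python
import re
from collections import defaultdict


def terms_by_component(samples):
    """Map each term to the set of components whose comments mention it."""
    seen = defaultdict(set)
    for component, comment in samples:
        for term in set(re.findall(r"[a-z]+", comment.lower())):
            seen[term].add(component)
    return seen


def component_specific_terms(samples, watch_terms):
    """Return watched terms that appear in the comments of exactly one component."""
    seen = terms_by_component(samples)
    return {t: next(iter(seen[t])) for t in watch_terms if len(seen[t]) == 1}


# Invented (component, comment) pairs.
samples = [
    ("Engine A", "Oil oxidation is high. Viscosity is high."),
    ("Engine A", "Oxidation remains high. Copper level is high."),
    ("Engine B", "All wear levels appear normal."),
    ("Engine C", "Copper slightly high. Monitor at next sample."),
]

result = component_specific_terms(samples, ["oxidation", "viscosity", "copper"])
print(result)
```

In this example "oxidation" and "viscosity" are attributed to a single engine, while "copper" is not, because it appears in comments for more than one engine.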

A plot of oil viscosity vs meter readings (see Figure 5) shows repeated samples of high viscosity and oxidation levels for the troubled engine.  After the engine was replaced the oil viscosity returned to more normal levels and no more oxidation issues were reported.

A plot of copper levels vs meter readings (see Figure 6, where node colour represents different engines) shows that, for the same engine and at the same time as there were high levels of oil oxidation, there were sustained levels of copper in the oil samples. Sample comments suggested that the copper levels were probably due to chemical leaching, but recommended that the oil filters be cut open and inspected for debris to determine whether other issues were present.

At the same time, by changing the node colour (in both the oil viscosity and copper vs meter reading plots) to represent the sample gradings (see Figures 7 and 8), it was possible to show regular inconsistencies in sample grading for high viscosity and copper readings for the engine exhibiting oil oxidation. Figure 8 highlights that even though some samples were commented on as having high viscosity and copper levels, they were graded as “B” samples. Meanwhile, other samples graded as “C” had lower viscosity and copper levels than those “B” samples.
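This kind of grading inconsistency can also be checked directly against the numerical data. The sketch below flags pairs of samples where the less severe grade has higher readings on every field considered. It assumes a severity order of A, B, C, X, and the sample IDs, field names and readings are invented for illustration.

```python
from itertools import product

# Assumed severity order of the lab gradings, least to most severe.
GRADE_RANK = {"A": 0, "B": 1, "C": 2, "X": 3}


def grading_inversions(samples, fields):
    """Find (milder, harsher) sample pairs where the milder grade reads higher on every field."""
    inversions = []
    for low, high in product(samples, repeat=2):
        milder = GRADE_RANK[low["grade"]] < GRADE_RANK[high["grade"]]
        if milder and all(low[f] > high[f] for f in fields):
            inversions.append((low["id"], high["id"]))
    return inversions


# Invented readings; units are not taken from the study.
samples = [
    {"id": "S1", "grade": "B", "viscosity": 17.5, "copper": 60},
    {"id": "S2", "grade": "C", "viscosity": 16.0, "copper": 45},
    {"id": "S3", "grade": "A", "viscosity": 14.5, "copper": 10},
    {"id": "S4", "grade": "C", "viscosity": 18.2, "copper": 75},
]

inversions = grading_inversions(samples, ["viscosity", "copper"])
print(inversions)
```

Here the "B" sample S1 is flagged against the "C" sample S2 because it reads higher on both viscosity and copper. A flag like this is a prompt for a closer look rather than proof of a grading error, since a lab may weigh other results when assigning a grade.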



Figure 5 – Oil Viscosity vs Meter Readings by Engine Type



Figure 6 – Copper Levels vs Meter Readings by Engine Type


Figure 7 – Oil Viscosity vs Meter Readings by Sample Grading


Figure 8 – Copper Levels vs Meter Readings by Sample Grading


The Possibilities

Even from this small-scale study it was possible to identify inconsistencies between sample grading and lab commentary that were more than likely preventing the early and proactive identification of issues with key mobile equipment components. The likely result was sustained component damage, leading to the replacement of components whose lives could have been extended.

While this study was on a small scale, the potential for this technology is compelling. Some of the possibilities include:

  1. Process large amounts of information (thousands of samples monthly) quickly and efficiently to identify the actions required to develop more proactive maintenance regimes.
  2. Quickly and efficiently identify clear trends in specific elements over the lives of components, which can be treated as a performance baseline. This provides insights into the normal progression of a component’s life without having to analyse thousands of samples manually.
  3. Minimise the resource time needed to review condition monitoring information and oil analysis reports, and raise subsequent actions proactively, which can extend equipment life.
  4. Understand the condition of components for an entire machine quickly in order to plan replacements and align downtime for groups of components.
  5. Enable a deeper understanding of analysts’ comments and open a dialogue with labs that can provide improved reporting and clarity.
  6. Analyse subsequent work orders resulting from the condition monitoring reports by using the language recognition software to review extended descriptions.
  7. Potentially build a level of automation into the subsequent work order generation.
  8. Analyse breakdown work order information to quickly identify problem areas even where failure codes are not utilised.
  9. Quickly compare like equipment across multiple sites and highlight site-specific conditions.
  10. Compare the performance of equipment types and manufacturers.