Ah, the swamp! My O&G career began with Texaco Exploration Ltd, mapping in Canada's frontier regions (i.e., everything except the Western Canada Sedimentary Basin or WCSB). A running disagreement erupted between an interpreter who couldn't get his loops to tie on one line, and the surveyor of that line. One Monday morning, the interpreter discovered, sitting on his desk, a sapling with the shot point tag still affixed. End of argument; start of re-interpretation.
My early lesson, then, was to engage with whoever had touched the data. That was not always possible when material came from an archive, but often during a coffee break I might turn up an inference ("Oh yeah, I remember that survey; check with Miss R ...."). Later on, working offshore West Africa, I learned that digital survey information could still be misleading: there have been three different Pointe Noire datums. On a project for the MMS (now BOEM) about 2008, neglecting to specify a datum & projection for EACH GIS layer in a NAD83 (North American Datum 1983) project meant trouble. The GIS software assumed NAD27 if it didn't find a .prj (projection) file for the given layer. So I apply mapping standards while, like Ronald Reagan, I also "trust but verify". Call me a "Slow Cooker".
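A minimal sketch of that kind of check, assuming the layers are shapefiles sitting in one folder with lowercase extensions. The function name `layers_missing_prj` is hypothetical and not part of any GIS package; it only reports which layers have no .prj sidecar file, so that a datum is never left to the software's default.

```python
from pathlib import Path


def layers_missing_prj(folder):
    """Return the names of .shp files in `folder` that have no sibling .prj file.

    Assumption: each layer is a shapefile whose projection, if defined,
    is stored in a file with the same base name and a .prj extension.
    """
    folder = Path(folder)
    missing = []
    for shp in sorted(folder.glob("*.shp")):
        # A layer without a .prj file leaves the software to guess the datum.
        if not shp.with_suffix(".prj").exists():
            missing.append(shp.name)
    return missing
```

Calling `layers_missing_prj("project_folder")` before loading a project gives a list of layers whose datum and projection still need to be stated explicitly.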
Dickson Intl. Geosciences(DIGs)
Original Message:
Sent: 09-11-2025 07:11 PM
From: David Feineman
Subject: Importance of data optimisation in the oil and gas industry
William
After reading your post, I remembered something I had seen a long time ago. At the beginning of the Mythical Man-Month there is a picture of a menu from a restaurant in New Orleans with a quote at the top. It reads: "Good cooking takes time. If you are made to wait, it is to serve you better, and to please you." The moral, that complex code takes time to prepare, carries over into achieving fit-for-purpose data quality. Glad to hear that you are making some inroads on projects that improve data quality when conflating data from multiple sources.
At the risk of showing how long ago I was in the data quality swamp, in the 1980s the whole issue of data quality was actively being discussed by cartographers and surveyors.
In 1988, they came out with a proposed standard for digital cartographic information that identified attributes like lineage, positional accuracy, attribute accuracy, logical consistency, and completeness, which in aggregate would help define a data set's fitness for a particular use and could then be encoded in some form of metadata label. I believe that showed quite insightful and innovative thinking on a tough problem. I suspect many of the folks working in this area within E&P companies have never been exposed to it, but then large organizations seem to minimize the value of past learning.
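As an illustration only, the five attributes named above could be carried as a small metadata label and tested against the needs of a particular use. The field units, the thresholds, and the names `QualityLabel` and `fit_for_use` are assumptions made for this sketch; they are not taken from the 1988 proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityLabel:
    """Hypothetical metadata label built from the five quality attributes."""
    lineage: str                  # where the data came from and how it was processed
    positional_accuracy_m: float  # assumed: estimated position error in metres
    attribute_accuracy: float     # assumed: fraction of attributes verified, 0..1
    logical_consistency: bool     # assumed: True if internal checks passed
    completeness: float           # assumed: fraction of expected records present, 0..1


def fit_for_use(label, max_position_error_m, min_attribute_accuracy, min_completeness):
    """Return True if the label meets the thresholds set for one specific use."""
    return (
        label.logical_consistency
        and label.positional_accuracy_m <= max_position_error_m
        and label.attribute_accuracy >= min_attribute_accuracy
        and label.completeness >= min_completeness
    )
```

The point of the design is that the same label can pass for one use (say, regional screening) and fail for another (say, well placement), since fitness depends on the thresholds supplied by the user.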
Original Message:
Sent: 09-11-2025 01:13 PM
From: William Dickson
Subject: Importance of data optimisation in the oil and gas industry
David,
Both your original post and its coda spoke loudly to me. My tiny consultancy chews on diverse data sets to optimize quality. Our resulting insights have taken 2 to 5 years in the most complex cases but even the quickest results follow the same pattern. What happens is that different data sets provide partial answers while offering apparent contradictions. For example, anomalies detected by ocean floor geochemical surveys will mismatch locations of both sea-surface slicks and seismic DHIs (direct hydrocarbon indicators).
Resolution includes an understanding of the radius of detection of each method, analysis of positional errors and even review of hand-written notes by the original surveyor. At some point, the data reach critical mass when we achieve a testable hypothesis that fits all the scrubbed-clean data while rejecting data that fails validation (bad coordinates; mislabeling and even fraud). We have made a living (just) by rolling such insights into non-exclusive studies.
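One simple way to picture the radius-of-detection reasoning is sketched here, under stated assumptions: coordinates are already in a common projected system in metres, and two observations are treated as possibly related when their separation is no larger than the sum of their detection radii and positional errors. The `Observation` class and `could_share_source` function are hypothetical names for illustration, not a description of any actual workflow.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """One anomaly from one method, in a shared projected coordinate system."""
    method: str                # e.g. "seabed geochemistry", "slick", "seismic DHI"
    x_m: float                 # easting in metres (assumed common datum and projection)
    y_m: float                 # northing in metres
    detection_radius_m: float  # assumed distance over which the method can sense a source
    position_error_m: float    # assumed uncertainty in the recorded location


def could_share_source(a, b):
    """Return True if two observations are close enough to reflect the same source."""
    separation = math.hypot(a.x_m - b.x_m, a.y_m - b.y_m)
    tolerance = (
        a.detection_radius_m + a.position_error_m
        + b.detection_radius_m + b.position_error_m
    )
    return separation <= tolerance
```

Under this rule, an apparent mismatch between two methods is only a real contradiction once the separation exceeds what their combined radii and errors can explain.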
AI tools help when we know how to prompt them, but that knowledge may come late in the process. Data management vendors know this and have developed scanning tools to identify metadata gaps and errors and either correct them (80 - 95% range) or flag issues that require manual correction. Smart organizations will access such vendors rather than try to build their own capacity or (shudder) continue to ignore the problem.
Disclaimer: I'm on the Board of AAPG's Datapages subsidiary. We are working on bringing data access solutions to our membership and publishing partners. See more at https://www.aapg.org/resources/datapages
------------------------------
William Dickson
Dickson Intl. Geosciences(DIGs)
Houston
billd@digsgeo.com
Original Message:
Sent: 09-05-2025 11:30 AM
From: David Feineman
Subject: Importance of data optimisation in the oil and gas industry
I would add one final thought since you framed the discussion as about "optimizing" data. When we look at end-to-end business processes for doing work, thinking about optimization generally boils down to choosing between three objectives. You could say it takes too long to get to the end product, so you want to optimize for speed and get results faster. You could say it costs too much to produce the end product, and therefore want to get the same product but cheaper than before. Or you could say you want to improve the quality of the product that you are getting. My opinion is that optimization to get better quality generally requires you to slow down the process and do new activities with people or machines that increase costs. And my interpretation of your optimizing data concept was to try to improve the overall quality of an enterprise's data asset. If the reality of implementing your vision adds costs and delays access, it may shed light on why gaining traction with management on this class of issues is difficult.
Original Message:
Sent: 09-04-2025 05:41 AM
From: Oluwatosin Abikoye
Subject: Importance of data optimisation in the oil and gas industry
Thank you, David.
Original Message:
Sent: 08-29-2025 09:15 AM
From: David Feineman
Subject: Importance of data optimisation in the oil and gas industry
Nice article describing the potential benefits if one re-engineered the workflows around data management and supported them with competent staff and useful technology.
· Regrettably, data management/data quality is a wicked problem generally associated with a high level of organizational irresponsibility and multiple dimensions based on the disciplines responsible for data acquisition and interpretation, and between the requirements of data made for immediate vs potential future use.
· Most data collected is never looked at or touched after the time of acquisition, which erodes the business case around the optimized data process you are advocating. So investment in data cleanup, metadata capture and data stewardship in the large looks like a bad business decision. To increase the likelihood of action, you need to be very clear on the specific data classes and data types you think need to be addressed and on the impact of lack of action, and explain it to the potential user management community, but retain skepticism about the likelihood of success.
· We all instinctively know that good data + good data = good data, and that bad data + bad data = bad data. What is difficult is that adding new good data to bad data produces bad data, where bad means untrustworthy for the purpose we intend to use it for. Conceptually, this means that what might have started out as a search for a deterministic answer has to be approached statistically due to the lack of reliability of the data.
· The rogues' gallery (that is, the sources of bad data) is large and can impact 2D, 3D, and 4D problems. Consider sources like uncalibrated, drifting, or dead sensors, incorrect interpretations, issues in upscaling or downscaling measurement spaces in x,y,z, wrong locations, and incorrect time stamps. The list is long and varied, making validation complex. That's all part of the landscape one would have to traverse to get to your optimized state.
· The data management / data quality morass is not new, although data sets have rapidly gotten larger and more complex due to technology. Perhaps data browsers that came in along with the drive to analytics allow users to find a company's data, but I am certain that companies can and do spend money on repurchasing data they own but can't find. Data validation remains on the user's shoulders at the time of need and probably does not result in improved enterprise data quality globally. Paul Simon wrote an additional verse to the song The Boxer that goes: "After changes upon changes we are more or less the same;
After changes we are more or less the same."
· If you made it this far, I will suggest there is an article in the New Yorker magazine that you should read called Vaunted by Zach Helfand, which describes the fact-checking process and activities they go through. Obviously, their use case is not scientific data flowing through E&P organizations. But it illustrates what can happen when an organization's management recognizes the need for accuracy, invests in people with the right skills, and is willing to slow down production cycle time to get to data quality on output.