DATA & ANALYTICS | DATA ACQUISITION

Eight Essential Best Practices for Data Acquisition Success

By Brian E. Bolton

April 11, 2022

In today’s industrial automation environment, the amount of data available is increasing so fast that businesses are hard-pressed to keep up with it. Data now comes from more sources than ever before. In addition, the cost of storage space has decreased, making the increase in Big Data affordable. Under these circumstances, businesses run the risk of missing valuable data that is critical to success or, worse, storing mounds of data that has little to no impact on the business.

In the past, engineers and instrumentation/electrical technicians tended to be the only ones to access and use the data, but now the entire organization uses it. So, how can businesses ensure that the quality, type and quantity of stored historized data remain relevant year after year and that the right people can access and use it?

With new advances in smart manufacturing and Industrial Internet of Things (IIoT)-enabled technologies on the market, businesses can proactively capture, collect and store data at every stage, from incoming raw materials through the final product delivered to consumers. They can discover which real-time data adds value to improve operational efficiencies and increase their competitive edge in the marketplace. The first step is to understand data acquisition systems and consider the eight essential best practices for data acquisition success.

Breaking Data Down Bit by Bit

In its simplest form, a data acquisition system (DAQ or DAS) samples signals that measure real-world physical conditions and converts the resulting samples into digital numeric values that a computer can manipulate. These systems typically convert analog waveforms into digital values for processing. Data acquisition system components include:

  • Sensors – to convert physical parameters to electrical signals
  • Signal-conditioning circuitry – to convert sensor signals into a form that can be converted to digital values
  • Analog-to-digital converters – to convert conditioned sensor signals to digital values
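
To make the chain concrete, here is a minimal, illustrative sketch (not tied to any particular DAQ hardware or vendor API) of how a conditioned sensor signal might be quantized by an analog-to-digital converter into the numeric values a computer stores; the gain, input range and resolution are assumptions:

```python
# Illustrative only: simulate the sensor -> signal conditioning -> ADC chain
# for a single reading. The gain, input range and resolution are hypothetical.

def condition_signal(sensor_millivolts: float, gain: float = 100.0) -> float:
    """Amplify a small sensor signal into the ADC's assumed 0-10 V input range."""
    return (sensor_millivolts / 1000.0) * gain

def adc_convert(voltage: float, full_scale: float = 10.0, bits: int = 12) -> int:
    """Quantize a conditioned voltage into a digital count (12-bit ADC assumed)."""
    voltage = max(0.0, min(voltage, full_scale))          # clamp to the input range
    return round((voltage / full_scale) * (2 ** bits - 1))

raw_mv = 42.7                        # hypothetical thermocouple output in millivolts
counts = adc_convert(condition_signal(raw_mv))
print(f"Sensor output {raw_mv} mV -> ADC value {counts} counts")
```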

Another data acquisition component is edge-computing technology. Obtaining data from the edge is becoming increasingly important as some applications need computing power and access to data immediately. 

  • Edge data – data delivered as a message via a REST API or from data storage devices. Data is created in a messaging format and transferred via cloud services or a web application programming interface (API).
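
As a rough illustration of edge data delivered as a message, the sketch below posts a single sensor reading as JSON to a hypothetical web API endpoint; the URL and payload fields are assumptions, not a specific vendor format:

```python
# Illustrative only: send one edge reading as a JSON message to a web API.
# The endpoint URL and field names are hypothetical.
import datetime
import requests

reading = {
    "asset": "PUMP-101",
    "tag": "discharge_pressure",
    "value": 87.3,
    "units": "psi",
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}

response = requests.post(
    "https://example.com/api/v1/edge-data",   # placeholder endpoint
    json=reading,
    timeout=5,
)
response.raise_for_status()   # fail loudly if the message was not accepted
```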

In addition to the above components, many companies have used a wide range of programming languages to develop data acquisition software that helps capture mission-critical data. At the same time, numerous vendors have developed their own versions of data historians (e.g., OSIsoft PI, AspenTech IP21, and Rockwell Automation’s FactoryTalk Historian), which are used to acquire and store selected data from instrumentation and control system sources. For the first data acquisition best practice, it is essential that businesses understand automation and control system sources and know which ones are right for their data acquisition needs.

1. Understand Automation and Control Sources

Field instruments use a variety of sensors to convert physical properties, such as valve position, temperature, pressure, level, density, viscosity, and more, to electrical signals that are interpreted via control systems. The control systems are the heart and brain of the automation process, so understanding how they fit in with data acquisition system requirements is key. The type of control system used depends on the complexity of the process being automated. Currently, the most common industrial instrumentation and control systems, platforms and devices are:

  • Supervisory control and data acquisition (SCADA) – A SCADA software tool is used to view, monitor and control process variable data, while providing a graphical representation of the process via human-machine interface (HMI) displays.
  • Programmable logic controllers (PLCs) – PLCs handle data up to about 3,000 I/O points.
  • Distributed Control Systems (DCSs) – A DCS handles data when the I/O point count is greater than 3,000.
  • Manufacturing execution systems (MESs)/manufacturing operations management (MOM) – MES/MOM systems help control warehouse inventory: packaged raw materials, packaging materials and parts.
  • Enterprise resource planning (ERP) systems – An ERP captures administrative data: load times, equipment utilization, personnel availability, orders and raw material availability.
  • Edge devices – These devices query and store remote data: lighting, weather sensors, pump and motor details, and electrical energy usage.

These platforms can collect, generate, organize, and manage data that will be valuable to the business through data-analytic tools. Figure 1 is an example of how data acquisition systems and tools are networked.

Figure 1. Data acquisition network connections

2. Understand Connectivity/Interfaces and Cloud Connectors

Getting process automation data from control system sources and writing it to data historians requires connectivity via interfaces. Understanding the different types of interfaces needed for collecting and storing the required data is the second data acquisition best practice. The interfaces typically reside on a separate server and are commonly referred to as interface nodes.

Some of the most used interface types include:

  • OLE for Process Control (OPC) – OPC is an interoperable software interface standard that allows Windows programs to communicate with industrial hardware devices. OPC servers are implemented in a client/server architecture. A control system uses a hardware communication protocol that the OPC server software program converts into an OPC protocol.
  • OLE for Process Control-Data Access (OPC-DA) – OPC-DA, developed by the OPC Foundation, was designed to eliminate the need for custom drivers/connectors to communicate with various sources. The OPC-DA standard has had multiple revisions to keep up with the changes in data sources.
  • OLE for Process Control Historical Data Access (OPC-HDA) – OPC-HDA is used to retrieve and analyze historical process data for multiple purposes: optimization, inventory control and regulatory compliance, to name a few. OPC-HDA servers are typically used for retrieving data from a process data historian, relational database, or a remote terminal unit (RTU).
  • UFL – Universal File and Stream Loading, known as PI UFL, was developed by OSIsoft for reading ASCII data sources and writing the data to the PI data historian.
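
PI UFL itself is configured with its own parsing definitions rather than application code, but as a rough sketch of the kind of work an ASCII-file interface performs, the following reads timestamped values from a CSV file and prepares them for writing to a historian; the file layout, file name and column names are hypothetical:

```python
# Illustrative only: parse an ASCII/CSV data source into (tag, timestamp, value)
# records, the same basic job a file-loading interface such as PI UFL performs.
# The file layout and names here are hypothetical.
import csv
from datetime import datetime

def read_ascii_source(path: str):
    """Read rows with columns tag, timestamp, value into historian-ready records."""
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            records.append({
                "tag": row["tag"],
                "timestamp": datetime.fromisoformat(row["timestamp"]),
                "value": float(row["value"]),
            })
    return records

for rec in read_ascii_source("lab_results.csv"):   # placeholder file name
    print(f"{rec['tag']} @ {rec['timestamp']}: {rec['value']}")
```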

In addition to these standard connections and interfaces, industries are also working with three different types of cloud service models:

  • Software as a Service (SaaS): a software distribution model where third-party providers host applications and make them available to customers over the internet. Examples include Google Apps, Salesforce, Dropbox, DocuSign and Slack.
  • Platform as a Service (PaaS) or application platform as a service (aPaaS): a type of cloud-computing offering where service providers deliver a platform that enables clients to develop, run and manage business applications without having to maintain the infrastructure such software development processes normally require. Examples of PaaS are AWS Elastic Beanstalk, Windows Azure, Apache Stratos, Force.com (SalesForce) and Google App Engine.
  • Infrastructure as a Service (IaaS): a service model that delivers computer infrastructure on an outsourced basis to support enterprise operations. IaaS provides hardware, storage, servers and data center space or network components. Examples of IaaS are DigitalOcean, Microsoft Azure, Amazon Web Services (AWS), Rackspace and Google Compute Engine (GCE).

These services are commonly referred to as the “cloud computing stack.” IaaS is on the bottom of the stack; PaaS is in the middle and SaaS is on top. Data collected via cloud services can be securely transferred from one data source to another via cloud connectors. This works very well when multiple locations need to collect data on their own servers and share across the enterprise.

3. Properly Set Up Buffering

“Buffering” is an interface node’s ability to access and temporarily store the collected interface data and forward it to the appropriate historian. Properly setting up buffering is the third data acquisition best practice.

To effectively perform data acquisition, it is recommended buffering is enabled on the interface nodes. Otherwise, if the interface node stops communicating with a historian, the collected data is lost. Buffering application programming interfaces (APIs) (e.g., API Buffer Server [Bufserv] and PI Buffer Subsystem [PIBufss]) can read the data in shared memory. If a connection from a data source to the historian server exists, the buffering application can also send the data to the historian server. If there is no connection to the historian server, it continues to store the data in shared memory (if shared storage memory is available) or writes the data to disk (if shared memory storage is full). When the buffering application re-establishes connection to the historian server, it writes to the historian server the interface data contained in both the shared memory storage and the disk.
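
The exact mechanics are specific to each buffering application (Bufserv and PIBufss are configured, not coded), but the store-and-forward pattern they implement looks roughly like the sketch below, in which send_to_historian stands in for whatever write call the historian actually exposes:

```python
# Illustrative only: the store-and-forward pattern behind interface buffering.
# send_to_historian() is a stand-in for the historian's real write API.
import json
from collections import deque

memory_buffer = deque(maxlen=10_000)     # bounded in-memory store
DISK_SPILL = "buffered_events.jsonl"     # overflow file when memory is full

def buffer_event(event: dict) -> None:
    """Hold an event in memory, spilling to disk if the memory buffer is full."""
    if len(memory_buffer) < memory_buffer.maxlen:
        memory_buffer.append(event)
    else:
        with open(DISK_SPILL, "a") as f:
            f.write(json.dumps(event) + "\n")

def flush(send_to_historian, connected: bool) -> None:
    """On reconnect, forward everything held in memory and on disk."""
    if not connected:
        return
    while memory_buffer:
        send_to_historian(memory_buffer.popleft())
    try:
        with open(DISK_SPILL) as f:
            for line in f:
                send_to_historian(json.loads(line))
        open(DISK_SPILL, "w").close()    # truncate the spill file once forwarded
    except FileNotFoundError:
        pass                             # nothing was spilled to disk
```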

4. Effectively Plan Backup and Archiving

Establishing efficient and effective backup and archiving plans is the fourth data acquisition best practice. It is especially important to understand the difference between backing up data versus data archiving. Data backups are used to restore data in case it is lost, corrupted, or destroyed. Data archives protect older/historical information that is not needed for everyday business operations but is occasionally needed for various business decisions.

Backup strategies are key for protecting current/immediate data. Most IT professionals have already established best practices for backing up all the networked systems. This applies to systems inside and outside of firewalls. Protocol documentation is critical to backing up and restoring data when things do not go as planned.

Data archiving is the practice of moving data that is no longer being used to a separate storage device. Data archives are indexed and have search capabilities to aid in locating and retrieving files. Several data backup software vendors (e.g., AWS Cloud Services, Rubrik and SolarWinds MSP) are addressing archiving in their current and future software releases. Several studies (e.g., SolarWinds MSP) are available online concerning backups versus archiving.

5. Properly Set Up Scan Classes

Understanding how to properly set up scan classes is the fifth data acquisition best practice. Historian interfaces use a setting called a scan class to scan tags at different time intervals and schedule data collection. A scan class defines a period of time in hours, minutes and seconds that tells the historian how often to collect the data. An interval and an offset define the scan class. The offset shifts when a scan fires within the interval, which helps avoid having two scan classes with the same frequency scanning at the same time.

The commands used for scan classes are as follows:

/f=SS (The frequency equals time in seconds)

/f=SS,SS (The frequency equals time in seconds with an optional offset time)

/f=HH:MM:SS (The frequency equals time in hours, minutes and seconds)

/f=HH:MM:SS,hh:mm:ss (The frequency equals time in hours, minutes and seconds with an offset time)

/f=00:01:00,00:00:15 /f=00:01:00,00:00:45 (Two scan classes with the same frequency but using offsets to avoid scanning at the same time)

Knowing the data to be collected is essential to setting up the scan class. For example, data for temperature, level, pressure, and flow will need a faster scan rate. Data for starting a pump or opening a valve may only need to be written when the state changes. Properly setting up the scan classes will ensure your system runs as efficiently as possible. 
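
As a simple illustration of how intervals and offsets stagger collection, the sketch below computes the first few scan times for two one-minute scan classes with different offsets; real scan classes are defined in the interface configuration (e.g., /f=00:01:00,00:00:15), not in application code:

```python
# Illustrative only: how a one-minute interval with different offsets keeps two
# scan classes from firing at the same moment.
def scan_times(interval_s: int, offset_s: int, count: int = 3):
    """Return the first few scan times, in seconds past the schedule origin."""
    return [offset_s + i * interval_s for i in range(count)]

class_1 = scan_times(60, 15)   # /f=00:01:00,00:00:15 -> fires at 15, 75, 135 s
class_2 = scan_times(60, 45)   # /f=00:01:00,00:00:45 -> fires at 45, 105, 165 s
print("Scan class 1 fires at:", class_1)
print("Scan class 2 fires at:", class_2)
```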

6. Organizing Data

A little over five years ago, the ability to organize collected data at the historian level was limited. As the amount of data being collected continued to grow, it became increasingly difficult to find and group data in ways that made sense to the people consuming it. Organizing data is the sixth data acquisition best practice.

Various software programs make it easier to organize data. The widely used PI Server and its Asset Framework (AF) component, for example, make organizing and sharing data much easier. The AF component can integrate, contextualize, refine, reference, and further analyze data from multiple sources, including external relational databases. Users can create a hierarchy of elements/assets and all of their attributes, including metadata. For example, a major dog treat manufacturer has four facilities that manufacture chicken, beef, and pork flavored dog treats. All four facilities also have the same types of equipment: raw material storage, blenders, presses, ovens, and packaging. Figure 2 gives a high-level view of the dog treat manufacturer’s element/asset hierarchy. Setting up an AF structure properly requires individuals who have a high-level understanding of the elements/assets within the organization.

Figure 2. Dog treat manufacturer’s element/asset hierarchy

Attributes for the assets should be added at the asset detail level. For Raw Material Storage Tank CRM01, you may have the following:

  • Level
  • High level alarm
  • Temperature
  • Cooling on/off
  • Tank capacity
  • Inlet valve open/closed
  • Discharge valve open/closed
  • Product name

Metadata from other sources can be set up as well:

  • LOT number
  • Date received
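
PI AF structures are built with the AF client tools and SDK rather than scripts, but the idea of the hierarchy can be sketched generically. The structure below mirrors the dog treat example, with a site containing an equipment element that carries both process attributes and metadata; all names and values are illustrative:

```python
# Illustrative only: a generic element/attribute hierarchy in the spirit of the
# dog treat example. Real AF structures are built with PI AF tools, not this code.
hierarchy = {
    "Site A": {
        "Raw Material Storage Tank CRM01": {
            "attributes": {
                "Level": 62.5,                    # percent
                "High level alarm": False,
                "Temperature": 4.1,               # deg C
                "Cooling on/off": "on",
                "Tank capacity": 20_000,          # kg
                "Inlet valve open/closed": "closed",
                "Discharge valve open/closed": "open",
                "Product name": "Chicken meal",
            },
            "metadata": {
                "LOT number": "CH-2204-117",      # hypothetical values
                "Date received": "2022-04-01",
            },
        },
    },
}

tank = hierarchy["Site A"]["Raw Material Storage Tank CRM01"]
print(f"Level: {tank['attributes']['Level']}%  LOT: {tank['metadata']['LOT number']}")
```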

7. Metadata Use

Understanding the effects and use of metadata is the seventh data acquisition best practice. Metadata is defined as “a set of data that describes and gives information about other data.” Using software-coded connectors, access to data from all types of data sources is possible. Having the ability to link metadata to assets provides some unique ways to collect, analyze, visualize, and report on process conditions.

Linking data from MESs, ERPs, or even maintenance planning sources makes the available information even more relevant to users. Generating templatized displays allows a user to visualize similar assets with a single mouse click. These content-rich displays present process-related assets and attributes, as well as various metadata details. The displays can then show not only what is currently being monitored but also other details, such as “time until next maintenance due” and “name, model, date of installation, and runtime hours.”
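
As a rough sketch of what such a linkage amounts to, the example below merges live process attributes with maintenance metadata pulled from another system so that a single display record can show both; the field names and the get_* helper functions are hypothetical stand-ins, not a particular product's API:

```python
# Illustrative only: combine live attributes with maintenance metadata so one
# display record shows both. The helper functions and fields are hypothetical.
def get_live_attributes(asset: str) -> dict:
    return {"runtime hours": 11_482, "discharge pressure": 87.3}   # stand-in data

def get_maintenance_metadata(asset: str) -> dict:
    return {                                                        # stand-in data
        "name": "Transfer Pump 101",
        "model": "XP-400",
        "date of installation": "2018-06-12",
        "time until next maintenance due": "32 days",
    }

def display_record(asset: str) -> dict:
    """One record a templatized display could render for any similar asset."""
    return {
        "asset": asset,
        **get_live_attributes(asset),
        **get_maintenance_metadata(asset),
    }

print(display_record("PUMP-101"))
```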

Having such a high level of detail makes for better informed, data-driven business decisions. Businesses that think through their processes and identify every piece of data that can contribute to their success, and work toward acquiring that data, will be the most successful. It is extremely important to approach the use of metadata as an extension of the live attributes being collected.

8. Obtain Data from the Edge

Understanding the advantage of obtaining data from the edge and knowing how to obtain it is the eighth data acquisition best practice. Collecting data from the edge is not necessarily a new concept, but it is now more affordable than ever.

Edge computing streamlines the flow of traffic from IIoT devices for real-time data analysis. Data from sensors in the field is written to edge devices and then written to the edge infrastructure. From the edge infrastructure, the data is replicated to the centralized data center (typically in the cloud) at low roundtrip speeds of 5 to 10 milliseconds.

Edge devices provide an entry point into an enterprise core network. Some of the latest edge devices have historian databases embedded for collecting data and synchronizing it via multiple connection methods, and new small form-factor devices can be purchased for as little as $299. Data from the edge brings information from the most remote areas of the business to the heart of the data collection system at near real-time speed. Having as much real-time, quality data available as possible for decision making will keep businesses competitive.

Valuable Data Leads to Success

To ensure historized data remains relevant year after year and the right people can access it, consider these eight best practices as the most practical means to help determine data acquisition objectives and strategies. Also, consider consulting a third-party automation solutions provider to help implement a quality, high-availability data acquisition system. Such a provider can offer a holistic view of data acquisition systems and software while helping review the various vendor options on the market, including historians and data-analytic tools.

Today’s data acquisition technologies provide the opportunity to improve asset utilization and fully realize the benefits of Big Data and enhanced production processes. Achieve business gains and stay ahead of the competition with the most dependable data acquisition system and software in place.

ABOUT THE AUTHOR

Brian E. Bolton (brian.bolton@mavtechglobal.com) is a consultant for MAVERICK Technologies. He has 35+ years of experience in chemical manufacturing, including more than 20 years involved with the OSIsoft PI Suite of Applications, Quality Assurance, Continuous Improvement and Data Analysis.
