Data analytics, the process of transforming data into actionable information, is a core need for a successful digital transformation strategy, yet it continues to be addressed from an Industry 3.0 perspective. Identifying and addressing key blind spots is critical for scalable and, ultimately, successful data analytics. Digitally transforming our operations requires a shift in mindset and practices, leaving behind the standard architectures and deployments deeply ingrained in our organizational culture for the past forty years.
Most organizations are more reactive than strategic in how they handle data analytics. Without forming a clear picture of the desired result, analytics solutions are pieced together, resulting in unwieldy architectures and mounting technical debt.
The goal of a Unified Analytics Framework (UAF) is to provide an analytic architecture that reduces or eliminates the production and management of custom code and scripting, centralizes integration work, and creates a single source of information that functions as a bi-directional information gateway. In conjunction with the Unified Namespace (UNS), the UAF provides a single source of truth for the information an organization requires, uniting structure and events from the UNS with actionable information from siloed databases lacking context.
The UAF provides a centralized environment for cleansing, contextualizing, and combining various datasets. With all underlying data sources and formats normalized, and with little to no code required, the UNS model is extended into a universal information model, providing calculations, KPIs, and events. By selecting a hub/spoke architecture over custom point-to-point integration, we realize economies of scale while keeping any custom work isolated in one location.
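As a minimal sketch of this hub role, the snippet below normalizes two hypothetical "spoke" sources (a historian with ISO timestamps and an ERP table with epoch seconds), contextualizes the measurements with batch information, and derives a KPI once, centrally. All source names, tags, and field names are illustrative, not part of any specific product.

```python
from datetime import datetime, timezone

# Hypothetical spoke sources with different shapes and timestamp formats.
historian_rows = [  # time-series source: ISO timestamps
    {"ts": "2024-05-01T08:00:00+00:00", "tag": "FIC101.PV", "value": 120.0},
    {"ts": "2024-05-01T08:01:00+00:00", "tag": "FIC101.PV", "value": 118.0},
]
erp_rows = [  # relational source: epoch seconds, batch context
    {"epoch": 1714550400, "batch": "B-1001", "product": "SKU-7"},
]

def normalize(ts):
    """Normalize either supported timestamp format to an aware UTC datetime."""
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    return datetime.fromisoformat(ts)

# Contextualize: attach the batch active at each measurement time.
model = []
for row in historian_rows:
    t = normalize(row["ts"])
    batch = next((b for b in erp_rows if normalize(b["epoch"]) <= t), None)
    model.append({"time": t, "tag": row["tag"], "value": row["value"],
                  "batch": batch["batch"] if batch else None})

# Derive the KPI once, centrally, instead of in every consuming application.
avg_flow = sum(r["value"] for r in model) / len(model)
```

The point of the hub is that the normalization and contextualization logic lives in exactly one place; every spoke added later reuses it rather than re-implementing it.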
As an analytics hub, the UAF must in turn publish all information back to the UNS and other connected data sources through an open, universal protocol with a single, standardized data structure. Shared data must include all calculations, KPIs, events, model information, and metadata created within the UAF, as well as access to the raw data contained within the connected data sources. This extended directory function enables all connected applications to query every connected data source from a single access point and receive a standardized data package in return, regardless of the underlying data source format. Additionally, the UAF persists all raw data in its original, and often most efficient, form, structure, and location, avoiding unnecessary transfer and replication.
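One way to picture "a standardized data package regardless of source format" is a single payload builder: every KPI published back to the UNS carries its value plus the same metadata and lineage fields. The topic path and field names below are assumptions for illustration; the transport itself (commonly an MQTT client such as paho-mqtt in UNS deployments) is deliberately omitted.

```python
import json
from datetime import datetime, timezone

def build_uns_payload(value, unit, source, model_path):
    """Assemble a standardized data package: the value plus the metadata
    and lineage every consumer receives, whatever the underlying source.
    Field names and the topic structure here are illustrative."""
    return {
        "topic": f"enterprise/site1/{model_path}",  # hypothetical UNS path
        "payload": json.dumps({
            "value": value,
            "unit": unit,
            "source": source,  # lineage back to the raw data source
            "quality": "good",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }),
    }

msg = build_uns_payload(94.2, "%", "historianA/FIC101.PV",
                        "packaging/line2/oee")
# An MQTT client would then publish msg["payload"] to msg["topic"];
# the network call is omitted to keep the sketch self-contained.
```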
Replicating uncontextualized data to data warehouses or data lakes, to be incorporated into an analytics process later, has proven expensive in both labor and storage costs. Data arrives in various formats, without any time normalization between points, and lacking contextualization that likely needs to be added by team members local to the data source. The cost of replicating and storing the raw data is immense, but it pales in comparison to the staff hours required to make the data usable. Instead, if a data lake is part of your digital strategy, send contextualized and standardized data aggregates or KPIs to the lake, while making raw data accessible as needed from a single, performant endpoint.
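The aggregate-before-shipping recommendation can be sketched in a few lines: raw samples stay at the source, and only compact, hourly summaries travel to the lake. The bucket granularity and record shape below are illustrative choices, not a prescribed schema.

```python
# Hypothetical sketch: instead of replicating every raw sample to the lake,
# ship a small hourly aggregate and keep raw history queryable at the source.
raw = [("2024-05-01T08", 120.0), ("2024-05-01T08", 118.0),
       ("2024-05-01T09", 131.0), ("2024-05-01T09", 129.0)]

aggregates = {}
for hour, value in raw:
    bucket = aggregates.setdefault(hour, {"count": 0, "total": 0.0})
    bucket["count"] += 1
    bucket["total"] += value

lake_records = [
    {"hour": h, "avg": b["total"] / b["count"], "samples": b["count"]}
    for h, b in sorted(aggregates.items())
]
# Four raw points become two contextualized records; the raw data never moves.
```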
Attempting to build an analytics framework on top of a data historian can also prove costly. This approach requires complete standardization of all data historian instances to a single vendor, large license commitments, and the likely replacement of previous investments that function adequately for the specific job they have been tasked with. An additional burden of this approach is forcing the ingestion of relational data onto a time-series-optimized platform solely for the benefit of contextualization.
Finally, the limitations of historian calculation functions should be considered in two areas: calendar referencing (and flexibility) for shift, production, and finance schedules, and the ability to version, or re-run, calculations. The historian is as critical as ever in data analytics; however, it is best positioned solely as a data node for storing large amounts of time series data and for diagnostic trending.
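The calendar-referencing gap can be made concrete with a small sketch: the analytics layer, not the historian, owns the shift schedule and tags any timestamp with its period, including shifts that wrap past midnight. The shift names and times below are hypothetical.

```python
from datetime import time

# Hypothetical shift calendar owned by the analytics layer; historians
# typically cannot express schedules like this natively.
SHIFTS = [("Night", time(22, 0), time(6, 0)),   # wraps past midnight
          ("Day",   time(6, 0),  time(14, 0)),
          ("Swing", time(14, 0), time(22, 0))]

def shift_for(t):
    """Return the shift name covering time-of-day t."""
    for name, start, end in SHIFTS:
        if start < end:
            if start <= t < end:
                return name
        elif t >= start or t < end:  # handle the midnight wrap-around
            return name
```

Because the schedule is plain data, the finance or production calendar can be swapped in the same way, and re-running a versioned calculation is just replaying history against a different calendar.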
The crux of collaborating with talented engineers and developers is that their default response to most problems is to build a solution that addresses the issue as quickly and efficiently as possible, often without considering the long-term consequences, costs, and scalability.
Software solutions built to solve a problem for one situation at one location do not reliably scale to many situations across many locations. The code base grows exponentially as more and more custom situations must be accounted for with each new deployment. What starts as an efficient fix becomes an 'internal product' that requires a team of half a dozen employees with specialized skills (and high salaries) to manage and execute.
Adding further complexity, the team building and managing the internal product often lacks deep understanding of the manufacturing process. This results in difficulties contextualizing data and increasingly disconnects the team from the many nuances, all too familiar to the operations team, that must be addressed within the product.
Over time, managing the software solution becomes the key focus and demands the majority of the team's time. More tools need to be developed and integrated to centrally manage the solution. Templatization needs to be incorporated to decrease deployment times, and version support must be added to allow the solution to work across numerous locations with hardware restrictions outside the team's control. These necessities shift development away from new features, and product usefulness begins to decline.
Over just a few years, this approach costs millions of dollars in salaries, lengthens the time to value, and requires organizations to manage hundreds of thousands of lines of code as they shift their focus from manufacturing specialists to software developers.
Manufacturers continue to struggle with data readiness: the pre-integration, cleansing, and combining of multiple disparate data sources into a ready-to-use information gateway. Despite these struggles, they often pursue machine learning proofs of concept and spend time evaluating applications and tools before they are data ready. The better approach is to pause these efforts and address the two key issues facing them: data quality and integration management.
Data of poor quality, such as inaccurate, inconsistent, or incomplete data, is a major barrier to data readiness. Manufacturers need to ensure that their data is accurate, up-to-date, and relevant before they can use it to inform decision-making. Engineers who use uncleansed or untrustworthy data risk making errors resulting in poor or inefficient process design, quality control issues, increased rework costs, reputation damage, health and safety risks, and even legal liability. Worse yet, moving inaccurate data further downstream corrupts the output of other systems.
With such severe consequences resulting from acting upon bad data, it only takes one or two instances of poor data integrity to cause engineers and data teams to no longer trust data sources. Therefore, it's crucial for manufacturers to use properly cleansed and trustworthy data to make informed decisions and produce high-quality products.
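A minimal quality gate, screening each reading for the failure modes named above (incomplete, inaccurate, stale) before it moves downstream, might look like the following. The field names, limits, and staleness window are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta, timezone

def validate(reading, max_age=timedelta(minutes=5), lo=0.0, hi=200.0):
    """Return a list of quality problems; empty list means the reading
    may proceed downstream. Thresholds here are hypothetical."""
    errors = []
    if reading.get("value") is None:
        errors.append("incomplete: missing value")
    elif not (lo <= reading["value"] <= hi):
        errors.append("inaccurate: value out of range")
    age = datetime.now(timezone.utc) - reading["timestamp"]
    if age > max_age:
        errors.append("stale: older than max_age")
    return errors

bad = {"value": 512.0,
       "timestamp": datetime.now(timezone.utc) - timedelta(hours=1)}
issues = validate(bad)  # flags both the out-of-range value and the stale age
```

Rejecting a reading at this boundary is far cheaper than letting it corrupt downstream KPIs and then rebuilding trust in the source.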
Another major issue is the lack of a centralized and standardized approach to data integration and management. Manufacturers often have data scattered across different physical locations, departments, systems, and databases, making it difficult to access and use the data effectively. Accessing all of these systems, providing proper data context, and defining the calculations, combinations, and transformations that turn data into usable information must be a collaborative effort between operations and data teams. This process should be defined and centralized within the enterprise so that all people and systems can benefit from the effort.
Once the data is accurate and available, time to value drops dramatically, allowing new systems to be incorporated and new business use cases to be achieved. The benefits include increased efficiency, better quality control, increased agility, less downtime, better resource management, and improved collaboration between departments, suppliers, and customers.
The successful digital transformation of operations requires a shift away from traditional approaches and a refocus on scalability. This can be achieved through the creation of a Unified Analytics Framework (UAF), which centralizes integration work, creates a single source of truth, and reduces the production and management of custom code. Additionally, it is crucial to avoid common blind spots such as writing custom software and code, not prioritizing data readiness, and lacking data integrity safeguards. By addressing these blind spots, organizations can realize the benefits of a scalable and successful data analytics strategy, ultimately leading to a successful digital transformation.