Is there anything else I can help you with today?

It was great to be back in Malmö after several years away and to catch up with old friends and colleagues at the recent European Bootcamp. Having lived in Malmö for close on 3 years from 2017 to 2020, it was surprisingly nostalgic to be back there, and it was fun to be able to visit some of my favourite old haunts (dragging others along with me), and to see what had changed in the time since I had been away. As with the Boston event last October, the Malmö Bootcamp was without question a success, being well-attended by an enthusiastic audience full of good ideas and questions and packed with great talks by some excellent speakers.

I ended up giving two talks: one on alternative solution ideas (some of which are discussed in previous blog posts) and one on the currently somewhat fashionable topic of Event Driven Architecture (EDA) and how it can be applied to OpenVMS applications. The latter talk seemed like a good idea when I submitted the abstract, as I could see how an EDA approach could help many of our customers to enhance business agility and to facilitate highly scalable, flexible, and performant integration with other systems. However, when I started trying to pull a presentation together, I began to regret the idea; it just was not coming together well and was shaping up to be a rather dull and largely theoretical talk, and I was struggling to find interesting and readily relatable ways to get across many of the key points I wanted to make. But I remained optimistic that inspiration would strike, and eventually it did, thanks to conversations with two customers over the space of just a few days.

In the first of these conversations, the customer in question was describing their batch-based application environment, in which records pertaining to the distribution and shipping of product stock are written to sequential RMS data files and processed on a daily basis, and how inefficient this process is by today's standards. Each of the data records written to these sequential files is essentially an event, and the value of knowing about an event and being able to react to it typically degrades over time; the quicker you can get information where it needs to go, the more responsive the business can be (and the happier the boss will be). For various reasons, the time and effort required by the customer to modify the application to operate in a more event-oriented manner was considered prohibitive. However, it occurred to me during the conversation that this was exactly the type of scenario that the RMS change data capture solution we have been working on could address: without changing a line of code in the application, change data capture could be used to intercept writes to the data files in near-real-time and publish the individual data records to an event bus, where they could be readily consumed by any other interested systems without significant delay.

The second customer conversation was about a completely unrelated subject; however, at the end of this discussion I was for some reason reminded of what happens when you call your insurance company or some other such provider, where after they have dealt with your request or problem they will invariably ask you something like "is there anything else I can help you with today?". Given that we had some time to spare on the call and being in a somewhat mischievous frame of mind, I posed exactly this question. What turned out to be top of mind with the customer basically boiled down to observability and event correlation: being able not only to promptly detect application-related problems (in what is a highly complex application environment) but also to correlate such problems with other events observed across the enterprise and perform timely root-cause analysis and resolution.

For the customer in question, from an OpenVMS perspective this equated to a desire to have data from various log files, including ACMS audit and error logs, Apache HTTPD logs, OpenVMS accounting data, and various other more application-specific log files, loaded into a centralised log management facility so that the information could be used for this purpose. It is not uncommon for customers to ask about loading log files (or the records in those files) into such tools, as they do for other operating systems such as Linux or Windows using agents provided by the vendor of the centralised log management solution. However, there are generally no such agents available for OpenVMS, and while it might in some cases be technically possible to port or develop an appropriate agent, there are invariably going to be limitations. For example, while Apache web server logs are plain text, other log files such as ACMS error and audit logs and OpenVMS accounting, audit, and error logs are binary files, with important RMS attributes and in some cases quite complex record structures containing OpenVMS-specific data formats. Loading such files, or raw records from those files, into any commonly used centralised log management facility such as those provided by Splunk or Dynatrace, or into search engines such as Elasticsearch, is unlikely to yield useful results.
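To make the problem concrete, the sketch below shows why raw binary records are of little use to a log indexer until they are decoded. The record layout here is entirely hypothetical (real OpenVMS accounting and ACMS records are considerably more complex, with RMS attributes and OpenVMS-specific field types), but it illustrates the conversion step that an OpenVMS-aware component must perform:

```python
import json
import struct

# Hypothetical fixed-layout binary record: 4-byte little-endian status
# code, 8-byte little-endian timestamp (seconds since the epoch), and a
# 16-byte space-padded username. This is an illustration only, not any
# actual OpenVMS log record format.
RECORD_LAYOUT = struct.Struct("<IQ16s")

def record_to_json(raw: bytes) -> str:
    """Unpack one binary record and render it as a JSON document that a
    log management tool can index meaningfully."""
    status, timestamp, user = RECORD_LAYOUT.unpack(raw)
    return json.dumps({
        "status": status,
        "timestamp": timestamp,
        "username": user.decode("ascii").rstrip(),
    })

# Build a sample binary record and convert it.
sample = RECORD_LAYOUT.pack(1, 1700000000, b"SYSTEM".ljust(16))
print(record_to_json(sample))
```

Fed the raw 28-byte record, a search engine would index meaningless bytes; fed the JSON form, the individual fields become searchable and graphable.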

After the call I pondered the discussion whilst walking the dog, and with the work I had recently been doing around RMS change data capture still fresh in my mind, it occurred to me that a reasonable approach to this customer problem might be to extend that work with a reasonably generic mechanism to process these and other such files, pushing the data off the OpenVMS system into queues or streams as quickly and efficiently as possible, and in a format that could be readily consumed using popular tools such as Logstash, which can ingest data from a multitude of sources, transform it in various ways, and send it to a range of destinations. Alternatively, customers could readily implement their own tools on Linux or Windows to consume and process the data, if so desired. In other words, we could potentially provide a tool that puts any such data somewhere more readily accessible, in a standard format, whereupon customers could more easily do something with it themselves; and this could be done in a near-real-time manner, subject to file flush frequency and a few other factors.

The net result of these customer discussions and the subsequent pondering of ideas was that my Event Driven Architecture presentation lurched off on a bit of a tangent, and a prototype solution (affectionately named "Vole") was created that provides functionality to monitor almost any sequential log file and publish records onto an event bus (essentially a message queue or stream) in near-real-time as they are written to those files. For good measure, the prototype was further enhanced to support data sources other than those mentioned previously, including audit log data and the ability to intercept and publish OPCOM and intrusion messages (although the latter two obviously do not come from log files). It should also be noted that the solution is able to handle situations where the administrator creates new versions of files (such as new versions of ACMS audit and error logs), dynamically transitioning to processing records in a new version after processing all records in the current one.
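The core mechanism here is conceptually simple: poll a sequential file and emit any records appended since the last check. The following is a deliberately crude stand-in for what Vole does, not its actual implementation; it handles only plain-text lines and ignores binary records and file version rollover, both of which a real implementation must deal with:

```python
import time

def follow(path, poll_interval=5.0, max_polls=None):
    """Yield lines appended to a sequential text file, checking for new
    records at a fixed interval. A toy illustration of tail-style file
    monitoring; max_polls bounds the number of empty polls so the
    generator can terminate (a real monitor would run indefinitely)."""
    polls = 0
    with open(path, "r") as f:
        while max_polls is None or polls < max_polls:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                polls += 1
                time.sleep(poll_interval)
```

Each yielded record would then be transformed (for example, into JSON) and published to the event bus, with one such monitor running per file.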

As noted above, the data written to some of these log files comprises quite complex binary records containing OpenVMS-specific data formats. To handle this in a flexible and extensible manner, the prototype log processing functionality provided by Vole permits the specification of a user-written shareable image that implements functions to convert such data records into a format more amenable to ingestion by log management solutions or other target systems. For example, the following command file would monitor ACMS audit and error logs, using functions in the shareable images VOLE$ACMSAUDIT.EXE and VOLE$SWL.EXE respectively to convert data records into JSON documents that would then be published to the target RabbitMQ message bus using the specified routing details (exchange name and routing key). In this particular example, the respective log files are checked for new records at 5-second intervals, and the log files are monitored in parallel on separate threads, courtesy of the /DETACH qualifier. It should be noted that there is no particular requirement for the solution to use RabbitMQ as the message bus; other options such as Kafka streams, Azure Service Bus, Azure Event Hubs, or ActiveMQ could certainly be used. However, I had a RabbitMQ broker instance close to hand, along with various code fragments that could be readily reused to quickly pull together a RabbitMQ-based prototype.

Figure 1. Simple Vole command sequence to consume records from ACMS audit and error logs as they are written and publish those records to the specified event broker using the defined routing criteria. Each file is processed using a separate thread (via the /DETACH qualifier), and read records are transformed prior to publication by functionality implemented in the shareable images specified via the /CALLBACK qualifier.

This general model for processing sequential files also fitted in very nicely with what we were trying to achieve with RMS change data capture: the reading and processing of the so-called RLF files created by the RMS interceptor process can be handled in much the same way as illustrated in the above example, by implementing a shareable image that understands the structure of the RLF file records and what to do with them. But that is a topic for another day.

At this point I sensed that my Event Driven Architecture talk was finally coming together and that I might actually have something vaguely interesting to talk about. However, knowing when to stop running with an idea has never been one of my strengths, and in a moment of madness I decided that the inclusion of a live demonstration was in order, having categorically stated after the Boston Bootcamp that I would never do another live demonstration. But what might such a demonstration look like? While at least some of the data sources mentioned above are most definitely of considerable relevance to many OpenVMS users, they are not necessarily the most interesting or exciting from a demonstration perspective. It occurred to me, though, that there is another rather obvious and widely used (and essentially batch-oriented) data source that could benefit significantly from this solution, namely T4. Instead of waiting for T4 batch cycles to complete before being able to graph and analyse the collected data, Vole could read records from at least some of the T4 data files as they are written to disk, perform any necessary transformation processing on those records, and publish the resultant data to an event broker, from which it could be consumed by any interested applications to facilitate near-real-time analysis, display, and monitoring of any desired metrics.

Having had some (admittedly somewhat limited) experience with Elasticsearch, Logstash, and Kibana in my previous role with HP's Cloud Services team, I concluded that the so-called "ELK" stack would be a good option for demonstration purposes to consume, store, and display such T4 time series data, resulting in the high-level architecture illustrated in the following diagram. Logstash is configured to consume T4 data from the relevant event broker queue(s) and send it to Elasticsearch for storage in its time series database, whereupon it can be queried to display and analyse metrics in near-real-time using the browser-based Kibana data visualization tool. For the purposes of the demonstration, other non-T4 data was consumed by a Microsoft Windows-based transformation engine that will become part of our change data capture solution and inserted into tables in a Microsoft SQL Server database (Linux and PostgreSQL can also be used).

Figure 2. High-level architecture implemented for the Bootcamp demonstration. Data records from various file sources are read by Vole threads as they are written, transformed as necessary, and published to event broker queues. T4 records were consumed from the broker using Logstash and fed into the Elasticsearch time series database from which they could be queried and analysed using Kibana.

However, it was quickly apparent that the ELK stack products had evolved very considerably over the course of the decade since I last used them into extremely powerful and sophisticated tools, requiring me to take the drastic step of reading the documentation (more than once) in order to get everything installed and configured. To simplify things somewhat, Elasticsearch and Kibana were hosted on Vultr cloud using their Elasticsearch Marketplace Application (one-click deployment), and Logstash was configured to run on the VSI network, consuming data from a locally hosted RabbitMQ event broker. After a bit of reading and some mild frustration (due to my poor comprehension), data began to flow smoothly, whereupon it was straightforward to create dynamic time series graphs such as those illustrated below for selected T4 metrics.
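The Logstash pipeline for this arrangement is conceptually simple. A sketch along the following lines, where the host, queue, and index names are purely illustrative rather than those used in the actual demonstration, would consume JSON messages from a RabbitMQ queue and index them into Elasticsearch:

```
input {
  rabbitmq {
    host  => "broker.example.com"   # hypothetical broker host
    queue => "t4_metrics"           # hypothetical queue name
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["https://elastic.example.com:9200"]
    index => "t4-metrics-%{+YYYY.MM.dd}"
  }
}
```

With the messages already in JSON form (courtesy of the Vole callback images), little or no transformation is needed in a filter block; Logstash simply moves the events from broker to index.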

Figure 3. Simple illustration of a dashboard created using Kibana to display values of two T4 metrics being collected by Vole in near-real-time. The period over which data values can be viewed and the refresh rate can be configured as appropriate to provide the desired level of resolution (as constrained by the T4 sampling frequency).
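Behind a Kibana panel such as those in Figure 3 sits an Elasticsearch aggregation query. As a rough sketch, with entirely hypothetical index and field names rather than those used in the demonstration, a per-minute average of a single T4 metric might be requested as follows:

```json
{
  "size": 0,
  "query": { "term": { "metric.keyword": "CPU_BUSY" } },
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
      "aggs": { "avg_value": { "avg": { "field": "value" } } }
    }
  }
}
```

Kibana generates and refreshes such queries automatically as the dashboard's time range and refresh interval are adjusted, which is what makes the near-real-time display essentially free once the data is flowing.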

It should be noted that these graphs are an extremely simple illustration of what is possible using Kibana (or similar modern graphing and data analysis tools). And while I have used the term "near-real-time" liberally throughout the preceding text, there will of course be some lag between measurement and display, depending on factors such as file flush frequencies and the time taken for data to flow through the event broker and other infrastructure (with file flush frequency being by far the largest contributor). Nevertheless, being able to view, store, and act on performance metrics, security events, and other such data within a few seconds (at most) of their measurement is clearly better than collecting and reviewing such data on a daily basis, so long as this near-real-time processing can be achieved without impacting the operational performance of the OpenVMS environment. That is indeed the case here, with the solution consuming minimal resources when operating in steady state. On start-up, Vole may consume quite considerable CPU and I/O resources if it is required to read and publish large numbers of pre-existing records to the event broker before reaching steady state operation; however, this will invariably be a relatively short-lived spike, with Vole routinely able to process in excess of 16,000 records per second, depending primarily on record size.

Much to my relief, the talk and demonstration went well, with several customers expressing considerable interest in what had been presented, in addition to the two customers that precipitated this whole chain of events in the first place. There is work to be done before we could look to provide a production-ready Vole-based solution to customers (for example, not all T4 datasets are currently handled), however I would like to think that this work can and will happen. That being said, this would likely take the form of a VSI solution offering as opposed to the provision of a shrink-wrapped product, as just about every customer use-case will be somewhat different, requiring at least some level of customisation.

So, as a consequence of the prototyping work done to support my "Event-driven architecture and why you need it" presentation, we have a prototype solution that can, in a fairly generic way, be used to push data from log files and various other file-based data sources off OpenVMS quickly and efficiently, such that it can be readily consumed and used by other systems. This is all very well and good, but what about Event Driven Architecture in general and its applicability to legacy OpenVMS application environments?

Putting it somewhat simplistically, EDA is a software design pattern in which systems react to events, which represent changes in state, by publishing and consuming messages that trigger actions in other parts of the system. In some respects EDA is similar to the popular Service Oriented Architecture (SOA) model, and in others it has much in common with message queueing; however, SOA promotes a more tightly coupled and inherently request-response model, while traditional message queuing tends to be more point-to-point than the publish/subscribe approach advocated by EDA. There is certainly overlap, though. This link provides a good general overview of EDA and gives links to additional information.

Event-driven architecture has been widely adopted by the software industry in recent years, driven by increasing software modularity and the decomposition of monolithic applications into discrete components. EDA provides a flexible, scalable, and near-real-time approach to processing actions and responses quickly and efficiently and to handling large data volumes, with different system components reacting asynchronously and independently to events without tight coupling or the need to constantly poll for updates. This facilitates easier integration with other systems, more rapid development, and seamless updates to individual system components. However, EDA is not only applicable to modern applications; it is also highly applicable to many legacy environments, providing opportunities for improved business agility, integration, and incremental modernization without full application replacement. EDA is well-suited to many common use-cases, including payment processing, business process automation, tracking and auditing system events, change data capture (data replication), and integration in general.
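The decoupling at the heart of the pattern can be shown in a few lines. The sketch below is a minimal in-process illustration only (a real deployment would use a broker such as RabbitMQ or Kafka rather than a Python dictionary): producers publish events by topic, and any number of consumers react independently, with no knowledge of one another.

```python
from collections import defaultdict

class EventBus:
    """Toy in-memory publish/subscribe bus illustrating EDA decoupling."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a handler to be invoked for events on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver an event to every handler subscribed to the topic."""
        for handler in self._subscribers[topic]:
            handler(event)

# Two independent consumers react to the same shipping event; the
# publisher knows nothing about either of them.
bus = EventBus()
audit_log, notifications = [], []
bus.subscribe("stock.shipped", lambda e: audit_log.append(e))
bus.subscribe("stock.shipped", lambda e: notifications.append(f"shipped {e['item']}"))
bus.publish("stock.shipped", {"item": "WIDGET-42", "qty": 10})
```

Adding a third consumer, say a dashboard or a downstream warehouse system, requires no change to the publisher or to the existing consumers, which is precisely the agility argument made above.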

An event-driven architecture can be integrated with legacy systems by allowing the older applications to publish events, which can then be consumed by newer, more flexible systems, effectively acting as a bridge between the old and new application environments while maintaining loose coupling and enabling asynchronous communication between them. However, integrating legacy systems with EDA often requires careful consideration of the potential impact on legacy system operations and of the increased complexity (there's no free lunch). It is also important to appreciate that not all legacy systems necessarily have any notion of events; they may, however, have (or be able to expose) interfaces that can be used to send and receive data, allowing them to participate in an EDA environment.

A lot of software engineering today is about creating cohesive applications by gluing together pre-existing services and other such building blocks, as opposed to designing and building something from the ground up. EDA can be seen as a form of glue that provides flexibility without compromising performance or availability. Applying event-driven strategies to legacy application environments such as those found on many OpenVMS systems can improve integration, scalability, and responsiveness by enabling asynchronous communication and the decoupling of components, allowing for easier and more efficient integration with new technologies and services being deployed across the business. Reading log files in near-real-time, as discussed in this post, is perhaps a rather crude EDA use-case, but it is a useful one. It also serves to illustrate that there may be events lurking in your systems and application environments that you have not previously considered, which could be leveraged in a similarly useful manner. We have covered quite a bit of ground here, and even if the topic of this post is not of any particular relevance to your situation, it will hopefully at least help to trigger some ideas about how you might do more with your OpenVMS applications. And of course, we are always happy to talk with you about what might be possible (and to ask if there's "anything else we can help you with today").