VSI Application Services – Replacing Your Plumbing

"There are many OpenVMS customers running large, complex, business-critical custom-written software applications. While these applications continue to serve the business well, many of them now need to interoperate and exchange data with external systems and applications running on other operating systems. The options available are to replace the existing OpenVMS-based system or to modernize it in some way so that it can continue to operate in a modern heterogeneous computing environment."

I originally wrote these words (or something very similar) back in 2008 for a Bootcamp presentation on application modernization. At that time I was working closely with people in the OpenVMS Engineering and BCS (Business Critical Servers) groups at HP to help OpenVMS users move from Alpha to Integrity, travelling around the globe to exotic (and sometimes not so exotic) locations, working closely with customers to help them port their applications to the Integrity platform and to help them modernize their applications in various ways, typically through the use of open source technologies to facilitate integration via open standards with other systems. During this time, I encountered many interesting and challenging problems (technical and otherwise) and made many new friends. There are plenty of stories to be told from these adventures, and it is probably fair to say that this work in no small part paved the way to me joining VSI some 6 or 7 years later. It was also during this period that – in reference to all of the integration work I was doing – my manager at the time started referring to me as "the plumber", hence the title of the post.

Conceptually, not much has changed since I wrote those words 16 years ago, but from a technical perspective a great deal has changed, both for OpenVMS and for the software industry as a whole, creating both new opportunities and new challenges for OpenVMS users. For example, porting OpenVMS to x86-64 decouples the operating system from proprietary hardware and offers the opportunity to deploy and use OpenVMS in new and exciting ways. However, the fundamental problems alluded to in my words from 2008 still exist, with OpenVMS applications still often being viewed as some sort of mysterious island of information that data cannot readily escape from or travel to. Indeed, this view of things has perhaps been further exacerbated over time by an incremental decline in the number of ISV solutions available for OpenVMS that can be used to solve these types of integration problems. For example, in recent years Qlik have dropped support on OpenVMS for what used to be the Attunity suite of products, Oracle have discontinued support for the Oracle RDBMS on OpenVMS (including the Oracle client), and in all probability Oracle MessageQ (formerly DEC and BEA MessageQ) will not be ported to VSI OpenVMS x86-64. My initial reaction to these sorts of announcements is obviously one of concern. However, it has been said that there are no problems, only opportunities, and indeed it is invariably possible to devise alternative solutions for such scenarios. By leveraging open-source technologies it is often possible to devise alternative solutions that are not only more cost effective but also provide users with new opportunities to enhance their OpenVMS application environments to take advantage of modern non-proprietary protocols, simplifying integration with external systems, and providing the potential to leverage those applications that have served the business so well for many years in new and exciting ways.

At some point over the coming months, I will try to write something about some of the replacement solutions we have been working on, and in this post, I thought I'd start off by spending some time talking about Oracle MessageQ. As many readers will know, Oracle MessageQ started its life as DEC MessageQ, Digital Equipment Corporation's offering into the message queueing market in the early 1990s, aimed at competing with other products emerging in this space from the likes of IBM and TIBCO. DEC MessageQ was well-received by the market, seeing rapid adoption in the financial services, manufacturing, and telecommunications sectors, and the product was ported to Windows and to various flavours of UNIX.

“Top marks for DEC MessageQ, none for marketing. Digital Equipment Corp’s DEC MessageQ is by far and away the best message-oriented middleware product in the market but is being let down by an almost non-existent marketing strategy, according to a report from London, UK-based market consultant Ovum Ltd…” https://techmonitor.ai/technology/top_marks_for_decmessageq_none_for_marketing, July 1996

However, true to form, Digital Equipment Corporation did not do a particularly great job of marketing the product, and it was eventually sold to BEA Systems in the late 1990s along with several other software assets, such as DEC Object Broker, a leading-edge CORBA implementation that was initially used by BEA Systems to underpin their early WebLogic implementations. BEA Systems did a good job of capitalising on these acquired software assets, supporting customers well, and looking to merge the capabilities of these products with their flagship TUXEDO open systems transaction processing suite. However, BEA Systems was subsequently acquired by Oracle in 2008, and there has been minimal enhancement of MessageQ since this time.

As I commented above, in all probability Oracle MessageQ will not be ported to OpenVMS x86-64, making it problematic for those using the product on OpenVMS to move their OpenVMS application environments to OpenVMS x86-64. In general terms, most message queueing products work in essentially the same way and provide many of the same features, suggesting that in theory at least it should be possible to replace MessageQ with an alternative product. However, life is never quite that simple. For example, some users have very extensive MessageQ deployments across multiple platforms, meaning that any replacement solution must not only support all such platforms and existing functionality, but it should ideally do so in such a way that it can be implemented without significant changes being required to existing application code. In addition (and despite my previous comment about all message queueing products working in largely the same way), MessageQ is in fact operationally and programmatically significantly different from most other message queueing products in some respects, providing some "unique" features that make it harder to replace than most other such products. But we like a challenge, and at a high level at least the plan is simple: to devise a minimally disruptive, cost effective, and future-proof way to replace MessageQ with an open-source message queueing solution!

With the above lofty aim in mind (and for the moment ignoring some of the unique features of Oracle MessageQ), the first order of business is to identify a suitable replacement message queueing technology to form the base around which a more complete replacement solution can be developed. There are several high-quality open-source message queueing technologies, including products such as Apache ActiveMQ, RabbitMQ, and LavinMQ, all of which support open standards-based messaging protocols, and are well-proven in terms of key non-functional requirements such as performance, scalability, availability, security, and product support. It should also be noted that unlike Oracle MessageQ these (and indeed most other proprietary message queueing products) are broker-based message queueing technologies, with a central broker (or a cluster of brokers) essentially acting somewhat like a network switch or router, routing messages received from publishing clients to consumers via queues in accordance with defined routing topologies and several other criteria. This is of little relevance with regard to selecting a replacement product; the point is that the routing topologies supported by these more modern message queueing solutions significantly surpass the basic capabilities of Oracle MessageQ, opening the door to new integration possibilities, particularly when it is also taken into consideration that these more modern products support multiple messaging protocols, such that messages published via one protocol may (with some limitations) be consumed via another protocol. The fact that these newer technologies conform to open standards also means that there are client API implementations available for most programming languages.

After some consideration, it was decided to base the replacement solution on RabbitMQ. A detailed discussion of the full rationale behind this decision is beyond the scope of this post, but certainly some of the contributing factors were existing in-house product knowledge and a simple and flexible AMQP 0.9.1 C API for solution development, coupled with the fact that some customers we talked with had noted they were already using RabbitMQ elsewhere within the business. Without question, much the same approach as described in the subsequent text could be taken using one or another of the other products mentioned, and as an aside I would note that we have recently ported the Apache Qpid Proton API (https://qpid.apache.org/proton/) to VSI OpenVMS (Integrity and x86-64 only), which may be used not only with products that support AMQP 1.0 such as ActiveMQ and RabbitMQ, but also with AMQP 1.0-compliant cloud-based message queueing services such as Azure Service Bus. From a protocol perspective, AMQP (the Advanced Message Queuing Protocol) provides an ideal choice for enterprise messaging, be it the 0.9.1 variant used natively by RabbitMQ and LavinMQ or the AMQP 1.0 OASIS standard.

But back to the problem at hand!

Fundamentally, Oracle MessageQ provides a relatively small and concise set of API functions for attaching to and detaching from queues, publishing and consuming messages, and so on. BEA Systems added a number of additional functions to the original API to facilitate better integration with TUXEDO, but these functions have tended not to be used by OpenVMS users, and accordingly we do not need to concern ourselves with them (although they could likely be supported by the replacement solution, if required). Oracle MessageQ also provides various command line tools for creating message queues, defining queue characteristics, and for administering how queues can be accessed and by whom. Based on these general observations (and looking at things somewhat superficially), the high-level plan for implementing a minimally disruptive replacement solution for Oracle MessageQ using RabbitMQ is simply to implement an equivalent set of API functions using the RabbitMQ C API and to provide a comparable set of command line tools. But life is never that simple, and as alluded to previously, Oracle MessageQ has some rather unique features that need to be taken into careful consideration, and indeed the closer we look at how Oracle MessageQ works, the more interesting things get. A complete and detailed description of these challenges and the solutions devised to deal with them to create a mostly complete replacement solution exceeds the scope of this post (and would probably put most readers to sleep), but the following text hopefully provides some insight into some of the more fundamental matters that have needed to be considered without going into excessive detail regarding the actual implementation. Note that the following text assumes some level of product knowledge, and in the interests of keeping things at least somewhat brief some details have been glossed over.

Queue characteristics

Probably (and perhaps somewhat obviously) the first thing to consider is how queues work in Oracle MessageQ. Essentially everything pertaining to the configuration of the message queue topology, queue characteristics, access restrictions, and so on is pre-configured using command line tools; there are no GUI management tools, and applications cannot programmatically create queues (other than temporary queues) dynamically at runtime. Queues are created within so-called queue groups, with both queues and queue groups being identified by integer values in the range 0 to 9999. Additionally, these queue numbers have some level of meaning, with queue numbers in the range 4000 to 6000 being reserved for "broadcast" queues. Queues can optionally be assigned a unique name that can be used to locate the queue in question if you know the name but not the queue and queue group numbers. Non-broadcast queues can also be journaled and can have an associated dead-letter queue.

All of these matters can be readily accommodated by RabbitMQ. It is straightforward enough to provide comparable command line tools; there is no particular problem deriving and using queue and exchange names based on the numerical queue and group model employed by Oracle MessageQ, broadcast queues can be readily handled using a fanout exchange that applications can bind to via a temporary queue, journaling is nicely handled using RabbitMQ streams, and dead-letter queues are likewise readily supported by RabbitMQ. The fact that queues in MessageQ can optionally be assigned a unique name for lookup purposes poses a slight complication, but this requirement can be readily addressed by providing a simple name server that can be queried by applications to determine queue and group number details for a given queue name.

The following graphics (taken from the RabbitMQ management interface) illustrate some of these points, showing how Oracle MessageQ queue and group numbers have been used to derive names for RabbitMQ queues and exchanges. In the queues example, queue number 1 in group 0 additionally has an associated journal and a dead-letter queue. And if you are wondering about the "PAMS" prefix, this is historical and relates to the evolution of MessageQ, whereby it was originally developed by Digital Equipment Corporation for a specific customer project, later evolving into a commercial product. I am not entirely sure what the acronym “PAMS” stands for, but I think it is something like "Programmable Application Messaging System" (if you have some insights here, let me know). It should also be noted that the way in which the queue and queue group numbers have been used here to name "PAMS" queues and exchanges makes it possible within RabbitMQ to specify access controls comparable to the ACL-based access controls employed by Oracle MessageQ.

Figure 1. Queues (top) and exchanges (bottom), illustrating the naming conventions used to map Oracle MessageQ queue and group numbers to RabbitMQ queue and exchange names. In addition to the primary pams.direct exchange used for routing messages to target queues, the bottom graphic also shows a broadcast (fanout) exchange, and the dead letter exchange used to route messages that cannot be processed into a dead-letter queue, if defined.

Message routing and consuming messages

Message routing within Oracle MessageQ is very simple and is readily handled by RabbitMQ. There is no concept of an exchange or similar such routing component, with messages simply being published directly into their target queues. This basic queueing functionality could easily be replicated in RabbitMQ by designing our replacement PAMS API to publish messages directly into target queues via the RabbitMQ default exchange, but to provide greater flexibility and control (particularly with regard to journaling) it was instead decided to provide a direct exchange (pams.direct) through which messages would be routed into their relevant queues, with the queue group and queue number being used to define the routing key. As noted previously, MessageQ also supports a basic broadcast service, such that all consumers attached to a broadcast queue will receive a copy of any messages published to that queue. This functionality is readily implemented in RabbitMQ using the fanout exchange type, with exchange names including the group and queue numbers, as illustrated in the above diagram, where there is one such exchange, named "pams.broadcast-0000.5000". For any clients wishing to subscribe to messages published to such an exchange, our replacement API simply creates a temporary queue and binds that queue to the exchange.
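
As a rough illustration of the routing scheme just described, the following Python sketch derives an exchange name and routing key from a group and queue number. The pams.direct exchange and the "pams.broadcast-0000.5000" naming pattern come from the text above; the routing key format ("GGGG.QQQQ"), the treatment of the 4000 to 6000 broadcast range as inclusive, and the function names are assumptions made purely for illustration, and the real replacement API is written in C.

```python
# Illustrative only: how a (group, queue) pair might map to a RabbitMQ
# exchange and routing key. Names and formats other than "pams.direct" and
# the "pams.broadcast-GGGG.QQQQ" pattern are assumptions.

BROADCAST_MIN = 4000  # broadcast range as described in the post,
BROADCAST_MAX = 6000  # assumed here to be inclusive at both ends
MAX_ID = 9999         # queue and group numbers run from 0 to 9999


def is_broadcast(queue: int) -> bool:
    """Return True if the queue number falls in the broadcast range."""
    return BROADCAST_MIN <= queue <= BROADCAST_MAX


def routing_target(group: int, queue: int) -> tuple[str, str]:
    """Return (exchange, routing_key) for a message addressed to group/queue."""
    for value in (group, queue):
        if not 0 <= value <= MAX_ID:
            raise ValueError(f"identifier out of range: {value}")
    address = f"{group:04d}.{queue:04d}"
    if is_broadcast(queue):
        # Fanout exchanges ignore the routing key, so it is left empty.
        return f"pams.broadcast-{address}", ""
    # Everything else goes through the single direct exchange.
    return "pams.direct", address
```

With these assumptions, routing_target(0, 5000) yields ("pams.broadcast-0000.5000", "") and routing_target(0, 1) yields ("pams.direct", "0000.0001").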

While publishing and routing of messages in Oracle MessageQ is very simple, consuming messages is a somewhat different story. Oracle MessageQ provides functionality to facilitate selective consumption of messages from queues based on a variety of selection criteria, including message properties such as message class, type, and priority, and messages can also be selected based on their content. These are somewhat evil anti-patterns as far as RabbitMQ (and most other modern message queueing systems) is concerned and this is functionality that cannot realistically be supported. RabbitMQ does of course have priority queues, and it is possible to assign a message priority in the message header, but it is not possible to consume messages based on priority. It would certainly be possible to consume messages and re-queue those messages that did not meet the desired selection criteria, but this could create a nasty "poison message" problem (messages that are repeatedly requeued) and could also cause problems with regard to message order. After some deliberation the decision was made to not even try to support these features, as it would likely not end well and would compromise other aspects of the overall solution. This will impact some users, but likely not significantly, and the benefits derived from the new RabbitMQ-based solution easily outweigh the loss of such functionality (we could of course be proved wrong). Oracle MessageQ can also select (consume) messages based on queue number, which is functionality readily handled with RabbitMQ by using a separate channel for each such queue and maintaining relevant context within our PAMS API.

Additionally, Oracle MessageQ provides three functions for consuming messages, each with slightly different characteristics. The simplest of these functions is pams_get_msg(), which simply attempts to retrieve the next message from the queue or returns immediately if there are no messages available. In addition to this function there is pams_get_msgw(), which allows the specification of a timeout for how long the function should wait for messages to become available, and there is also the pams_get_msga() function, which retrieves messages asynchronously, waiting indefinitely in a separate non-blocking thread for messages to become available and processing them via a callback function. In general, it is straightforward to replicate the functionality of these three functions with our RabbitMQ-based replacement API implementation. However, there is some complexity associated with the implementation of the first function variant related to how messages are received by client applications, which is discussed in more detail below. It should also be noted that while I used the word "thread" above, in Oracle MessageQ the pams_get_msga() function uses OpenVMS ASTs as opposed to POSIX threads, whereas our replacement API uses the latter, making it more readily possible for us to port the replacement API to other platforms.
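
To make the difference between the three "get" variants a little more concrete, here is a small Python sketch that models them against an in-memory buffer. It is not the replacement API (which is written in C against RabbitMQ); the LocalQueue class, its method names, and the choice to deliver exactly one message per asynchronous request are all assumptions for illustration.

```python
import queue
import threading


class LocalQueue:
    """In-memory stand-in for the client-side buffer of delivered messages."""

    def __init__(self):
        self._buffer = queue.Queue()

    def deliver(self, message):
        """Simulate the broker pushing a message to this consumer."""
        self._buffer.put(message)

    def get_msg(self):
        """Like pams_get_msg(): return the next message, or None at once."""
        try:
            return self._buffer.get_nowait()
        except queue.Empty:
            return None

    def get_msgw(self, timeout):
        """Like pams_get_msgw(): wait up to `timeout` seconds for a message."""
        try:
            return self._buffer.get(timeout=timeout)
        except queue.Empty:
            return None

    def get_msga(self, callback):
        """Like pams_get_msga(): wait in a background thread, then call back.

        Handling a single message per call is an assumption of this sketch.
        """
        def worker():
            callback(self._buffer.get())  # blocks until a message arrives

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread
```

A caller might use get_msg() when polling opportunistically, get_msgw(0.5) when it can afford to wait briefly, and get_msga(handler) when it wants to carry on with other work until a message arrives.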

Finally with regard to consuming messages it should be noted that acknowledgement of consumed messages is performed within the replacement API as opposed to being done at the application level. This is not particularly ideal, as if the application is unable for any reason to process a message or if the application crashes, messages could be lost. However, there is not really any better way to deal with this situation, as the Oracle MessageQ API itself does not provide specific ack/nack calls (although it does provide some facilities to guarantee delivery, but that is not quite the same thing).

Publishing messages

In contrast to consuming messages, publishing messages in Oracle MessageQ via the PAMS API is relatively straightforward, being handled by the pams_put_msg() function, which accepts a myriad of fascinating parameters in addition to queue details and the actual message data, including parameters specifying message type, class, and priority, correlation ID, reply-queue details, and so on, with some of these values being required, and others (such as correlation ID and reply-queue) being optional, depending on the messaging scenario at hand. With the exception of message class, all of these properties can be readily mapped to RabbitMQ equivalents, and message class is readily accommodated by our replacement API by combining it as a string with the message type and using the resultant value as the RabbitMQ message type property in our replacement pams_put_msg() function, and separating out the values as integers in our replacement "get" functions.
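
The class/type mapping described above can be sketched in a few lines of Python. The post says only that the two values are combined as a string and split apart again on receipt; the "class.type" format with a dot separator, and the function names, are assumptions for illustration.

```python
def encode_type(msg_class: int, msg_type: int) -> str:
    """Combine message class and type into one string for the AMQP type property."""
    return f"{msg_class}.{msg_type}"


def decode_type(value: str) -> tuple[int, int]:
    """Split a combined string back into (class, type) integers."""
    class_part, separator, type_part = value.partition(".")
    if not separator:
        raise ValueError(f"not a combined class/type value: {value!r}")
    return int(class_part), int(type_part)
```

Under this assumed format, encode_type(5, 12) gives "5.12", and decode_type("5.12") gives back (5, 12).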

Name server

As noted previously, queues in Oracle MessageQ are identified by a group number and a queue number, however it is also possible to assign queues a meaningful textual name, which can be used to determine the group and queue number via a call to the pams_locate_q() function. To provide comparable functionality in our replacement solution, it was necessary to implement a simple name server that could be called by clients to perform the required name resolution. The name server was implemented as a RabbitMQ client using the RPC message pattern and runs on OpenVMS as a detached process. The name server implements a small set of functions that support the following operations, with messages being exchanged between applications and the name server as JSON objects:

Ping (aliveness test, keep-alive)
Get queue and group numbers based on queue name
Get queue name based on queue number and group number
Register queue
Get next free queue number (for temporary queues)
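
The operations listed above might be handled along the following lines. This Python sketch shows only the JSON request/reply dispatch and an in-memory registry; the RPC transport over RabbitMQ is omitted, and the field names ("op", "name", "group", "queue", "start"), the operation names, and the reply layout are all hypothetical rather than the actual wire format used by the name server.

```python
import json


class NameServer:
    """In-memory registry mapping queue names to (group, queue) pairs."""

    def __init__(self):
        self._by_name = {}
        self._by_address = {}

    def handle(self, request_json: str) -> str:
        """Process one JSON request and return the JSON reply."""
        request = json.loads(request_json)
        op = request.get("op")
        if op == "ping":
            reply = {"ok": True}
        elif op == "register":
            address = (request["group"], request["queue"])
            self._by_name[request["name"]] = address
            self._by_address[address] = request["name"]
            reply = {"ok": True}
        elif op == "lookup_name":
            address = self._by_name.get(request["name"])
            if address is None:
                reply = {"ok": False, "error": "not found"}
            else:
                reply = {"ok": True, "group": address[0], "queue": address[1]}
        elif op == "lookup_address":
            name = self._by_address.get((request["group"], request["queue"]))
            if name is None:
                reply = {"ok": False, "error": "not found"}
            else:
                reply = {"ok": True, "name": name}
        elif op == "next_free":
            # Lowest unregistered queue number in the group, starting from a
            # caller-supplied value (the starting point is an assumption).
            group, number = request["group"], request.get("start", 1)
            while (group, number) in self._by_address:
                number += 1
            reply = {"ok": True, "queue": number}
        else:
            reply = {"ok": False, "error": "unknown operation"}
        return json.dumps(reply)
```

A real implementation would also need to bound the "next free" search, avoid reserved ranges, and cope with concurrent requests; none of that is attempted here.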

As mentioned above, the name server is typically run as an OpenVMS detached process, using a command procedure similar to that shown below, which also serves to illustrate some of the basic functionality provided by the command line interface that forms part of the overall replacement solution. The first point to note is that the replacement solution is affectionately known as "Otter" (for absolutely no good reason, other than the fact that they're cute animals). As can be seen from this simple example, the command line interface provides commands to attach to the RabbitMQ message broker, to create exchanges and queues, and so on. Queues are assigned a name in addition to group and queue numbers, and details as to whether queues are journaled or have associated dead letter exchanges can be optionally specified. The command line interface also provides commands to delete queues and exchanges, to purge queues, to register and de-register queues with the name server, and to ping the name server.

Journals

As mentioned previously, queues in Oracle MessageQ can have an associated journal, which is essentially just a sequential RMS file containing all messages published to the associated queue, with journal file names being derived using the queue and group numbers. This journaling functionality can be readily supported by RabbitMQ using RabbitMQ streams that get a copy of each message that is routed to the associated queue. There is also a separate set of API functions provided by Oracle MessageQ for reading messages from these journal files, the functionality of which can again be easily replicated by opening a new connection to the RabbitMQ broker, reading messages from the relevant stream until end-of-stream is detected, and closing the connection. It is noted that Oracle MessageQ does in fact support several journal types, however not all are relevant in a RabbitMQ context and are therefore not supported by the replacement solution.

Connecting and disconnecting

As mentioned elsewhere, Oracle MessageQ is not broker-based or distributed, requiring the use of separate gateway processes to facilitate remote access to the central queueing bus via various (and sometimes ancient) protocols. Accordingly, there are no explicit connect and disconnect functions provided by the Oracle MessageQ PAMS API. Because of the way Oracle MessageQ works, such functions are not required, with connection essentially equating to applications attaching exclusively to their nominated primary queue, and disconnection equating to detaching from the primary queue. In contrast, it is necessary to explicitly connect to the RabbitMQ broker (via TCP/IP) before any other operations can be performed. While we have added explicit connect and disconnect functions to our replacement API, using these functions obviously requires minor changes to application code, and to avoid the need for any such changes it is instead possible to define the broker connect string as a logical name that can be used by functions within our API implementation to connect to the broker if it is not already connected. The discerning reader will note that this means any application can have only one connection to the broker. However, it is important to appreciate that connections to the RabbitMQ broker are multiplexed with channels that can be thought of as lightweight connections sharing a single TCP connection. In other words, only a single instance of our replacement PAMS API can be initiated by an application program, but this is consistent with the Oracle MessageQ API implementation.
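
The "connect on first use" idea can be illustrated with the following Python sketch, in which an environment variable stands in for the OpenVMS logical name. The variable name PAMS_BROKER_URL, the use of an amqp:// URL as the connect string, and the open_connection() placeholder are all assumptions; no real broker connection is made. The default ports (5672 for amqp, 5671 for amqps) are the standard AMQP ones.

```python
import os
from urllib.parse import urlparse

# Hypothetical variable name, standing in for an OpenVMS logical name.
BROKER_VAR = "PAMS_BROKER_URL"

_connection = None  # at most one broker connection per process


def parse_connect_string(url: str) -> dict:
    """Extract host and port from an amqp:// or amqps:// connect string."""
    parts = urlparse(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    default_port = 5671 if parts.scheme == "amqps" else 5672
    return {"host": parts.hostname or "localhost",
            "port": parts.port or default_port}


def open_connection(params: dict) -> dict:
    """Placeholder for the real connect call; returns a connection record."""
    return {"host": params["host"], "port": params["port"], "channels": []}


def ensure_connected() -> dict:
    """Connect on first use so existing application code needs no new calls."""
    global _connection
    if _connection is None:
        url = os.environ.get(BROKER_VAR)
        if url is None:
            raise RuntimeError(f"{BROKER_VAR} is not defined")
        _connection = open_connection(parse_connect_string(url))
    return _connection
```

Each API function would call ensure_connected() before doing anything else, so repeated calls share the one connection and open additional channels on it as needed.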

Once connected to RabbitMQ, we emulate Oracle MessageQ behaviour, requiring applications to first "attach" to their nominated primary queue as an exclusive consumer before they can bind to and consume messages (in a non-exclusive manner) from any other so-called secondary and multi-reader queues. Access to queues and other resources in RabbitMQ is controlled via the RabbitMQ permissions system, which provides comparable if not better access control than the ACL-based approach that is employed by Oracle MessageQ on OpenVMS.

In addition to defining a logical name for the broker connect string, the replacement API also uses logical names to specify the default group ID, message read timeout, and the RabbitMQ pre-fetch count. Note that the message read timeout is required because with the RabbitMQ-based solution we are "consuming" messages as opposed to "getting" them, with the broker pushing messages down the wire to consumers (within various constraints), as opposed to clients periodically polling the broker for new messages, which is considerably less efficient. The idea of the aforementioned read timeout is to provide time for the broker to push messages down the wire if our application is calling the pams_get_msg() API function in a tight loop and we want to avoid bogus "queue empty" scenarios, although if the pre-fetch count is greater than 1, this situation becomes less of a concern (I will not bore you with a detailed description of the pre-fetch count, but it essentially limits how many unacknowledged messages the broker can push onto a channel or connection).

Interesting features

The Oracle MessageQ PAMS API has some rather interesting features. For example, all function arguments are without exception passed by reference, and additional arguments have been added over time to some API functions to support new functionality, such as support for message sizes greater than 64KB. These things are arguably not best practice from an API perspective, but such considerations are largely irrelevant and any purist views with regard to API design must be set aside, as our replacement API must obviously provide the same interface, whether we are enamoured of it or not. It is also possible to rationalize at least some of these API features. For example, the fact that all function arguments are passed by reference, with string arguments and their lengths being passed separately, makes it somewhat easier to use the API with languages other than C/C++, such as Fortran, COBOL, or BASIC.

Utility routines

In addition to providing the core set of functions for attaching to and detaching from queues, and for publishing and consuming messages, Oracle MessageQ also includes the utility function putil_show_pending, which can be used to determine the number of messages currently pending in one or more queues. This function appears to be little used by Oracle MessageQ users, but for the sake of completeness we have included a simple implementation in our replacement API that performs a passive queue declare operation for the queue of interest, which will return a count of the number of undelivered messages that are currently in the queue. It should be noted that this approach will not take into account any delivered but unacknowledged messages, and there is certainly scope for improvement in terms of how we have implemented things here, but given the apparently little-used nature of this function any such additional work was not considered worthwhile at this time.
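
The passive declare technique can be sketched as follows. The actual implementation uses the RabbitMQ C API; this Python version assumes a channel object shaped like the one provided by the pika client library, where queue_declare(queue=..., passive=True) returns a frame whose method.message_count holds the number of ready messages. A small stand-in channel is included so the sketch runs without a broker, and the function and class names are hypothetical.

```python
from types import SimpleNamespace


def pending_count(channel, queue_name: str) -> int:
    """Return the number of ready (undelivered) messages in a queue.

    A passive declare does not create or modify the queue; it only reports
    on an existing one. Delivered but unacknowledged messages are not counted.
    """
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count


class FakeChannel:
    """Stand-in for a broker channel, for demonstration only."""

    def __init__(self, counts):
        self._counts = counts

    def queue_declare(self, queue, passive=False):
        # Mimic the shape of the declare-ok reply frame.
        return SimpleNamespace(
            method=SimpleNamespace(message_count=self._counts[queue])
        )
```

Supporting "one or more queues", as putil_show_pending does, would then just be a matter of calling pending_count() once per queue of interest.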

Conclusion

If you have managed to read this far, well done, and I shall torment you no further! If nothing else, this post hopefully serves to illustrate the types of solutions VSI Application Services can implement and the sorts of problems we can help you to solve, or possibly some of the ideas discussed here will serve as inspiration for the development of your own solutions to similar such problems. The key point is that OpenVMS is not some sort of strange island that data cannot escape from or travel to, and without too much effort OpenVMS-based applications can indeed readily participate in modern heterogeneous computing environments.

In terms of the specific problem considered in this post, Oracle MessageQ has served users well for many years, but the product now has limited support, and it will likely not be ported to OpenVMS x86-64, meaning that users wishing to move to OpenVMS x86-64 will require an alternative solution. Any such solution should be modern, fully supported, require minimal change to existing application code, and ideally open up new opportunities from an integration perspective. While there remain some functional gaps that are still to be addressed, the solution outlined here goes a very considerable way to meeting these requirements. The general approach taken to develop this solution is also quite generic, and with a little ingenuity and imagination it could be readily applied to the solution of other problems.

In terms of current status, the solution described here has been implemented and is working very well on OpenVMS, however there are a few things that still need to be done. In particular, the replacement PAMS API needs to be ported to Linux and Windows, timeouts need to be supported for some API operations, and of course some documentation needs to be developed. We also need to develop the solution into a professional services offering that we can provide to customers to help them migrate.