Large(r) Data Comes to RPG IV

Article ID: 57131

Working with Big Data

In November of 2007, I talked about how to work with more than 64K of data in RPG IV. In i5/OS v6.1 (now "IBM i"), IBM has added this capability to RPG IV, so it's now built in. Let's see how they did it.

New Data Size Limits in RPG IV

When you move to v6.1, you'll see the old 64K limit for fields and other data elements removed. Technically, IBM tells me that the limit is really removed this time, not just increased. This means that the limitation is now an operating system limit, not a language limitation. Currently the system allows you to use data elements up to just shy of 16 megabytes. Given those operating system restrictions, here are the v6.1 capabilities in RPG IV:

Data Construct           RPG III   RPG IV at v5.4   RPG IV at v6.1
Field Length                 256            65535       16,773,104
Data Structure Length       9999            65535       16,773,104
Array Elements               999            32767       16,773,104
Constants/Literals           256             2048           16,380
Table 1: RPG IV Size Limits

The limits in Table 1 are substantially different in v6.1 than back in RPG III. The interesting thing is that in RPG III the limitation was largely due to the width of the from/to columns on the Input specifications and the Result Field Length column on the Calculation specification. Mirroring that problem, in RPG IV, the Definition specification has 7 positions for length notation, so you might be wondering how you can enter an eight-digit value in a seven-position area. The answer is keywords.

The New LEN Keyword

To circumvent the columnar notation, in v6.1 IBM has introduced the LEN keyword on the Definition specifications. Use this keyword to specify the length of the field you're defining. Since the keyword area is free format and goes from position 44 to 80, you have more than enough space to define a field as large as necessary.

.....DName+++++++++++EUDSFrom+++To+++++TDc.Functions++++++++++++++++++++++++++++
     D myData          S               A   Len(1365286)

In this example, the field named MYDATA is defined with a length of 1,365,286. The keyword does not currently support relative sizing; for example, there is no way to mimic the following using the LEN keyword:

.....DName+++++++++++EUDSFrom+++To+++++TDc.Functions++++++++++++++++++++++++++++
     D yourData        S             +5    Like(myData)

Relative sizing will probably be ported to keyword notation if and when IBM moves RPG to a totally free-format syntax. For now, you'll need to switch back to the conventional positional syntax when you need this kind of capability.

Arrays and Data Structures

Fortunately, the DIM keyword existed and was simply enhanced to handle the larger size capability; nothing new needed to be introduced. Regular Arrays and Data Structure Arrays can continue to have the number of elements specified using the DIM keyword. Simply indicate the number of elements up to 16,773,104; however, be warned -- the "length x width" of the array is also restricted to 16,773,104 bytes of data. In other words, the total size of the array (all of its elements) cannot exceed 16,773,104 bytes. This means you can have 16,773,104 elements, so long as each is only one byte in length, or you can have two elements 8,386,552 bytes each, and so on.

Rather than set a limit on the number of elements, the limitation is on the total size of the array's data (all elements). Hopefully this means that if and when IBM removes the 16 MB scalar limit, RPG IV implicitly has the ability to leverage the new capabilities. After all, in 1981 16 MB was a huge amount of data, but today I have an 8 GB "thumb drive" attached to my keychain, so 16 MB doesn't mean much by today's standards. And I find it interesting that on a system that supports 286 trillion addresses, user spaces and fields have a size limitation of 16 MB. That's a bit embarrassing.

Suppose you need an array with 10-byte elements and you need about a million of them. To be precise, suppose you need a "megabyte" of them (1,048,576 elements, or 10,485,760 bytes in total). That array would be declared as follows:

.....DName+++++++++++EUDSFrom+++To+++++TDc.Functions++++++++++++++++++++++++++++
     D bigArray        S             10A   Dim(1048576)
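
To make the "length x width" arithmetic concrete, here is a small sketch of three alternative declarations that sit at or just under the 16,773,104-byte total described above. The field names are hypothetical, and the declarations are meant to be tried one at a time; I have not verified whether several fields of this size can coexist in a single module.

```rpgle
      * All names below are hypothetical, for illustration only.
      * 16,773,104 elements x 1 byte  = 16,773,104 bytes (at the limit)
     D byteArray       S              1A   Dim(16773104)
      * 2 elements x 8,386,552 bytes  = 16,773,104 bytes (at the limit)
     D wideArray       S               A   Len(8386552) Dim(2)
      * 1,677,310 elements x 10 bytes = 16,773,100 bytes (just under)
     D tenByteArray    S             10A   Dim(1677310)
```

Note that the 8,386,552-byte element needs the LEN keyword because its length does not fit in the seven-position length column, while the 1-byte and 10-byte elements can still use positional notation.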

Data Structures are enhanced as well. The total size of a Data Structure cannot exceed 16,773,104 bytes. This means you can pass a very large data structure to an i5/OS API or collect a lot more records from an SQL fetch all at once. The database restricts record format lengths to 32K of data, so dividing 16,773,104 by 32K gives about 512 maximum-length records that you can read into storage at one time. And since most applications have database records of a few hundred bytes (well, except for that one...), by using SQL FETCH you can read literally thousands of records into storage, into a data structure array, at one time. Necessary? I don't think so, but that's how those darn kids are writing code these days.

Data Structure Space Galore

.....DName+++++++++++EUDSFrom+++To+++++TDc.Functions++++++++++++++++++++++++++++
     D hostData        DS                  Qualified
     D  custmast                           LikeRec(custrec) Dim(1024)
     D  itemmast                           LikeRec(itemrec) Dim(1024)
     D  invoice                            LikeRec(invrec)  Dim(10)
     

It isn't like me to create gigantic data structures like those used in several i5/OS APIs. However, some people have cool ideas to make something work, and the new size capabilities help them do more of that kind of thing. For example, nesting several database record formats in a single data structure and making them arrays was unheard of. But with the new larger capabilities, doing so is no longer impossible. In this example, I've embedded three database file record formats into a single, qualified data structure. In addition, I've made them arrays, two of them with 1K (1,024) elements each. Certainly this would have blown out the 64K limit on v5.4 and earlier; before v6.1 it was impossible to do this.

Side Effects

To me a more important feature that has finally been implemented in RPG IV as a consequence of relaxing the data size limits is that all the built-in functions and operation codes now support data greater than 64K. Before v6.1 we had to resort to C runtime functions if we breached that barrier (as was often the case when using a user space with APIs). But now instead of using MEMCPY or some other C runtime function, a simple EVAL or perhaps a %SUBST() function will do the same job for us. And this is worth the cost of moving to v6.1 all by itself. The relaxing/removal of limits in the programming language is a great thing, and has been added to my list of the top four most important things added to RPG IV since it was first introduced.
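
As a rough illustration of that point, the following sketch uses %SUBST and a plain assignment on a field well beyond 64K, where a C runtime call would previously have been needed. The field names and sizes are hypothetical, and the sketch assumes a v6.1 or later compiler.

```rpgle
      * Hypothetical fields: a 1,000,000-byte buffer and a 100,000-byte chunk.
     D bigBuffer       S               A   Len(1000000)
     D chunk           S               A   Len(100000)

      /free
        // Copy 100,000 bytes starting at position 500,001. Both the
        // start position and the length are past the old 64K boundary.
        chunk = %subst(bigBuffer : 500001 : 100000);

        // %SUBST also works as the target of an assignment.
        %subst(bigBuffer : 1 : %len(chunk)) = chunk;

        *inlr = *on;
        return;
      /end-free
```

Because CHUNK is a fixed-length field, %LEN(chunk) returns its declared length of 100,000, so the second assignment overlays the first 100,000 bytes of BIGBUFFER.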

Bob Cozzi's Top 4 RPG IV Enhancements

  1. Removing the Size Limit
  2. EXTFILE and EXTMBR Keywords
  3. Qualified Data Structures
  4. Removing the Field Name Length Limit

Bob Cozzi lectures on RPG IV and System i development at corporate customers and user groups all year long. He is author of "RPG TnT: 101 Dynamic Tips 'n Techniques with RPG IV" and hosts RPG World Live, a video podcast for RPG developers and Managers aired live on Fridays from 11:00 AM to Noon Central time (16:00 GMT) on ustream.tv and on RPGWorld.com. Bob also produces RPG World, the most popular RPG IV developer conference of the year. Visit RPGWorld.com for details on the dates and location of the 2009 RPG World Conference.